X-Codec

Unified Semantic and Acoustic Codec for Audio Language Model.

Paper

Title: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Authors: Zhen Ye, Peiwen Sun, Jiahe Lei, Hongzhan Lin, Xu Tan, Zheqi Dai, Qiuqiang Kong, Jianyi Chen, Jiahao Pan, Qifeng Liu, Yike Guo*, Wei Xue*

Overview

Experiments on VALL-E

Highlight

You can easily apply our approach to enhance any existing acoustic codec:

For example

import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoModel


# Encoder, Decoder and RVQ are placeholders for your codec's own modules.
class Codec(nn.Module):
    def __init__(self):
        super().__init__()
        # Acoustic codec components
        self.encoder = Encoder(...)       # Acoustic encoder
        self.decoder = Decoder(...)       # Acoustic decoder
        self.quantizer = RVQ(...)         # Residual Vector Quantizer (RVQ)

        # Adding the semantic module
        self.semantic_model = AutoModel.from_pretrained(...)  # e.g., Hubert, WavLM

        # Adding Projector
        self.fc_prior = nn.Linear(...)    # fused features -> quantizer input
        self.fc_post1 = nn.Linear(...)    # quantized -> semantic reconstruction
        self.fc_post2 = nn.Linear(...)    # quantized -> acoustic decoder input

    def forward(self, x, bw):
        # Encode the input acoustically and semantically
        e_acoustic = self.encoder(x)
        e_semantic = self.semantic_model(x)

        # Combine acoustic and semantic features along the feature axis
        # (dim=-1 assumes features on the last axis, which is what nn.Linear expects)
        combined_features = torch.cat([e_acoustic, e_semantic], dim=-1)

        # Apply prior transformation
        transformed_features = self.fc_prior(combined_features)

        # Quantize the unified semantic and acoustic features
        quantized, codes, bandwidth, commit_loss = self.quantizer(transformed_features, bw)

        # Post-process the quantized features
        quantized_semantic = self.fc_post1(quantized)
        quantized_acoustic = self.fc_post2(quantized)

        # Decode the quantized acoustic features
        output = self.decoder(quantized_acoustic)

        # Return the reconstruction plus the terms needed for the losses
        return output, quantized_semantic, codes, commit_loss

    def semantic_loss(self, semantic, quantized_semantic):
        # MSE between the semantic features and their reconstruction
        return F.mse_loss(semantic, quantized_semantic)
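
To make the tensor shapes in the example above concrete, here is a minimal runnable sketch. It is not the repository's implementation: the class name `ToyUnifiedCodec`, the layer sizes, the small convolutions standing in for the acoustic encoder/decoder and the pretrained semantic model, and the rounding step standing in for the RVQ are all assumptions, chosen only so that the data flow can be executed end to end.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyUnifiedCodec(nn.Module):
    """Shape-level stand-in for the Codec example; all sizes are illustrative."""

    def __init__(self, acoustic_dim=8, semantic_dim=12, hidden_dim=16):
        super().__init__()
        # Stand-ins (assumptions): stride-4 convolutions instead of a real
        # acoustic encoder/decoder and a pretrained HuBERT/WavLM model.
        self.encoder = nn.Conv1d(1, acoustic_dim, kernel_size=4, stride=4)
        self.semantic_model = nn.Conv1d(1, semantic_dim, kernel_size=4, stride=4)
        self.decoder = nn.ConvTranspose1d(acoustic_dim, 1, kernel_size=4, stride=4)
        # Projectors around the quantizer, mirroring fc_prior / fc_post1 / fc_post2.
        self.fc_prior = nn.Linear(acoustic_dim + semantic_dim, hidden_dim)
        self.fc_post1 = nn.Linear(hidden_dim, semantic_dim)
        self.fc_post2 = nn.Linear(hidden_dim, acoustic_dim)

    def forward(self, x):
        e_acoustic = self.encoder(x)                 # (B, acoustic_dim, T/4)
        with torch.no_grad():                        # assumption: semantic model is not trained
            e_semantic = self.semantic_model(x)      # (B, semantic_dim, T/4)

        # Concatenate along the channel axis, then move channels last for nn.Linear.
        combined = torch.cat([e_acoustic, e_semantic], dim=1)
        z = self.fc_prior(combined.transpose(1, 2))  # (B, T/4, hidden_dim)

        # Placeholder for the RVQ: rounding with a straight-through gradient.
        quantized = z + (torch.round(z) - z).detach()

        q_semantic = self.fc_post1(quantized).transpose(1, 2)  # (B, semantic_dim, T/4)
        q_acoustic = self.fc_post2(quantized).transpose(1, 2)  # (B, acoustic_dim, T/4)
        output = self.decoder(q_acoustic)                      # (B, 1, T)
        return output, e_semantic, q_semantic


model = ToyUnifiedCodec()
x = torch.randn(2, 1, 64)                    # two mono clips of 64 samples
output, e_semantic, q_semantic = model(x)
loss = F.mse_loss(q_semantic, e_semantic)    # semantic reconstruction loss
print(output.shape, q_semantic.shape, float(loss))
```

The point of the sketch is the layout handling: the two feature streams are concatenated per frame, projected, quantized once, and then split back into a semantic branch (used only for the loss) and an acoustic branch (fed to the decoder).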

For more details, please refer to our code.
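
The RVQ named in the example quantizes in stages: each codebook encodes whatever residual the previous stages left behind. The following is a generic sketch of that idea, not the quantizer shipped in the repository. The function name, the codebook sizes, and the random codebooks are assumptions for illustration, and the sketch ignores the `bw` argument and the commitment loss.

```python
import torch


def residual_vector_quantize(x, codebooks):
    """Quantize rows of x (n, d) with a list of (k, d) codebooks, stage by stage."""
    residual = x
    quantized = torch.zeros_like(x)
    codes = []
    for codebook in codebooks:
        # Pick the codeword closest (Euclidean) to what is still unexplained.
        indices = torch.cdist(residual, codebook).argmin(dim=1)  # (n,)
        chosen = codebook[indices]                               # (n, d)
        quantized = quantized + chosen
        residual = residual - chosen
        codes.append(indices)
    return quantized, torch.stack(codes), residual


torch.manual_seed(0)
x = torch.randn(5, 4)                                         # 5 frames, 4-dim features
codebooks = [torch.randn(16, 4) * 0.5**i for i in range(3)]   # 3 stages, 16 codewords each
quantized, codes, residual = residual_vector_quantize(x, codebooks)
print(codes.shape)  # torch.Size([3, 5]): one row of indices per stage
```

Each frame ends up represented by one index per stage, which is the kind of discrete token sequence an audio language model is trained on.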

Available models

🤗 links to the Huggingface model hub.

Model name	Hugging Face	Config	Semantic Model	Domain	Training Data
xcodec_hubert_librispeech	🤗	🤗	🤗 Hubert-base	Speech	Librispeech
xcodec_wavlm_mls (not mentioned in paper)	🤗	🤗	🤗 Wavlm-base-plus	Speech	MLS English
xcodec_wavlm_more_data (not mentioned in paper)	🤗	🤗	🤗 Wavlm-base-plus	Speech	MLS English + Internal data
xcodec_hubert_general_audio	🤗	🤗	🤗 Hubert-base-general-audio	General audio	200k hours internal data
xcodec_hubert_general_audio_more_data (not mentioned in paper)	🤗	🤗	🤗 Hubert-base-general-audio	General audio	More balanced data

Inference

To run inference, first download the model and config from Hugging Face.

python inference.py

Training

Prepare the training_file and validation_file in config. Each file should list the paths to your audio files, one per line:

/path/to/your/xxx.wav
/path/to/your/yyy.wav
...
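
One way to produce such lists is to scan a directory for `.wav` files. This is a small sketch and not part of the repository: the function name, the output file names, and the random train/validation split are assumptions. Point `training_file` and `validation_file` in the config at whatever files you write.

```python
import random
from pathlib import Path


def write_file_lists(audio_dir, train_list, valid_list, valid_fraction=0.05, seed=0):
    """Write one absolute .wav path per line into train_list and valid_list."""
    paths = sorted(str(p.resolve()) for p in Path(audio_dir).rglob("*.wav"))
    random.Random(seed).shuffle(paths)            # reproducible split
    n_valid = int(len(paths) * valid_fraction)    # rounds down; may be 0 for tiny sets
    valid, train = paths[:n_valid], paths[n_valid:]
    Path(train_list).write_text("".join(p + "\n" for p in train))
    Path(valid_list).write_text("".join(p + "\n" for p in valid))
    return len(train), len(valid)
```

For example, `write_file_lists("/path/to/your", "train.txt", "valid.txt")` would write both lists to the current directory and return the number of training and validation files.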

Then:

torchrun --nnodes=1 --nproc-per-node=8 main_launch_vqdp.py

Acknowledgement

I would like to extend special thanks to the authors of Uniaudio and DAC, since our code base is mainly borrowed from Uniaudio and DAC.

Citation

If you find this repo helpful, please consider citing in the following format:

@article{ye2024codecdoesmatterexploring,
      title={Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model},
      author={Zhen Ye and Peiwen Sun and Jiahe Lei and Hongzhan Lin and Xu Tan and Zheqi Dai and Qiuqiang Kong and Jianyi Chen and Jiahao Pan and Qifeng Liu and Yike Guo and Wei Xue},
      journal={arXiv preprint arXiv:2408.17175},
      year={2024},
}

Additional information

Version: 1.0.0
Type: AI source code
Last updated: 2024-12-08
Size: 1.82 MB
Source: GitHub
