X-Codec

Unified Semantic and Acoustic Codec for Audio Language Model.

Paper

Title: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Authors: Zhen Ye, Peiwen Sun, Jiahe Lei, Hongzhan Lin, Xu Tan, Zheqi Dai, Qiuqiang Kong, Jianyi Chen, Jiahao Pan, Qifeng Liu, Yike Guo*, Wei Xue*

Overview

Experiments on VALL-E

Highlight

You can easily apply our approach to enhance any existing acoustic codec:

For example

# Schematic sketch: Encoder, Decoder, RVQ and the elided (...) arguments are
# placeholders, and tensor-layout handling around nn.Linear is omitted.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoModel


class Codec(nn.Module):
    def __init__(self):
        super().__init__()
        # Acoustic codec components
        self.encoder = Encoder(...)      # Acoustic encoder
        self.decoder = Decoder(...)      # Acoustic decoder
        self.quantizer = RVQ(...)        # Residual Vector Quantizer (RVQ)

        # Adding the semantic module
        self.semantic_model = AutoModel.from_pretrained(...)  # e.g., Hubert, WavLM

        # Adding the projectors
        self.fc_prior = nn.Linear(...)   # applied before quantization
        self.fc_post1 = nn.Linear(...)   # recovers the semantic features
        self.fc_post2 = nn.Linear(...)   # recovers the acoustic features

    def forward(self, x, bw):
        # Encode the input acoustically and semantically
        e_acoustic = self.encoder(x)
        e_semantic = self.semantic_model(x)

        # Combine acoustic and semantic features along the feature axis
        # (dim=1 assumes (batch, features, time) tensors)
        combined_features = torch.cat([e_acoustic, e_semantic], dim=1)

        # Apply prior transformation
        transformed_features = self.fc_prior(combined_features)

        # Quantize the unified semantic and acoustic features
        quantized, codes, bandwidth, commit_loss = self.quantizer(transformed_features, bw)

        # Post-process the quantized features
        quantized_semantic = self.fc_post1(quantized)
        quantized_acoustic = self.fc_post2(quantized)

        # Decode the quantized acoustic features
        output = self.decoder(quantized_acoustic)

        # Return what the reconstruction, semantic, and commitment losses need
        return output, quantized_semantic, commit_loss

    def semantic_loss(self, semantic, quantized_semantic):
        # MSE between the semantic features and their reconstruction after quantization
        return F.mse_loss(semantic, quantized_semantic)

For more details, please refer to our code.
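To make the data flow above concrete, here is a minimal runnable sketch under loud assumptions: small convolutions stand in for the acoustic encoder, the decoder, and the pretrained semantic model, the quantizer is replaced by a pass-through, and every name and size (`ToyUnifiedCodec`, `acoustic_dim`, `hop`, and so on) is hypothetical. It only illustrates channel-wise concatenation, the three projectors, and the semantic MSE loss. It is not the repository's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyUnifiedCodec(nn.Module):
    """Toy stand-in for the unified codec; all sizes are illustrative assumptions."""

    def __init__(self, acoustic_dim=16, semantic_dim=24, hidden_dim=32, hop=4):
        super().__init__()
        # Stand-ins for the acoustic encoder/decoder and the pretrained semantic model.
        self.encoder = nn.Conv1d(1, acoustic_dim, kernel_size=hop, stride=hop)
        self.semantic_model = nn.Conv1d(1, semantic_dim, kernel_size=hop, stride=hop)
        self.decoder = nn.ConvTranspose1d(acoustic_dim, 1, kernel_size=hop, stride=hop)
        # Projectors placed around the (omitted) quantizer.
        self.fc_prior = nn.Linear(acoustic_dim + semantic_dim, hidden_dim)
        self.fc_post1 = nn.Linear(hidden_dim, semantic_dim)
        self.fc_post2 = nn.Linear(hidden_dim, acoustic_dim)

    def forward(self, x):
        e_acoustic = self.encoder(x)               # (B, acoustic_dim, T)
        with torch.no_grad():                      # assumption: semantic model stays frozen
            e_semantic = self.semantic_model(x)    # (B, semantic_dim, T)
        combined = torch.cat([e_acoustic, e_semantic], dim=1)  # concatenate on channels
        # nn.Linear acts on the last axis, so move channels last first.
        hidden = self.fc_prior(combined.transpose(1, 2))       # (B, T, hidden_dim)
        quantized = hidden  # placeholder: a real codec would apply RVQ here
        q_semantic = self.fc_post1(quantized).transpose(1, 2)  # (B, semantic_dim, T)
        q_acoustic = self.fc_post2(quantized).transpose(1, 2)  # (B, acoustic_dim, T)
        output = self.decoder(q_acoustic)                      # (B, 1, samples)
        semantic_loss = F.mse_loss(q_semantic, e_semantic)
        return output, semantic_loss


torch.manual_seed(0)
model = ToyUnifiedCodec()
wave = torch.randn(2, 1, 64)  # batch of 2 mono clips, 64 samples each
recon, sem_loss = model(wave)
print(recon.shape, sem_loss.item())
```

With a hop of 4, the 64-sample input becomes 16 frames, and the transposed convolution maps those frames back to 64 samples, so `recon` has the same shape as `wave`.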

Available models

🤗 links to the Hugging Face model hub.

Model name	Hugging Face	Config	Semantic Model	Domain	Training Data
xcodec_hubert_librispeech	🤗	🤗	🤗 Hubert-base	Speech	Librispeech
xcodec_wavlm_mls (not mentioned in paper)	🤗	🤗	🤗 Wavlm-base-plus	Speech	MLS English
xcodec_wavlm_more_data (not mentioned in paper)	🤗	🤗	🤗 Wavlm-base-plus	Speech	MLS English + Internal data
xcodec_hubert_general_audio	🤗	🤗	🤗 Hubert-base-general-audio	General audio	200,000 hours of internal data
xcodec_hubert_general_audio_more_data (not mentioned in paper)	🤗	🤗	🤗 Hubert-base-general-audio	General audio	More balanced data

Inference

To run inference, first download the model and config from Hugging Face.

python inference.py
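The download step can also be scripted before running the command above. The sketch below assumes the `huggingface_hub` package is installed. The repo id and file names are placeholders to be copied from the model's hub page, and `fetch_checkpoint` is a hypothetical helper, not part of this repository. How `inference.py` locates the downloaded files is not covered here, so check its arguments.

```python
def fetch_checkpoint(repo_id, filenames, local_dir="checkpoints"):
    """Download each named file from a Hugging Face repo and return the local paths."""
    # Imported lazily so this module loads even where huggingface_hub is absent.
    from huggingface_hub import hf_hub_download

    return [
        hf_hub_download(repo_id=repo_id, filename=name, local_dir=local_dir)
        for name in filenames
    ]


# Placeholder values: substitute the repo id and file names shown on the hub page.
# paths = fetch_checkpoint("<user>/<model-repo>", ["<config file>", "<checkpoint file>"])
```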

Training

Prepare the training_file and validation_file in the config. Each file should list the paths to your audio files, one per line:

/path/to/your/xxx.wav
/path/to/your/yyy.wav
...
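A file list in this format can be generated with a few lines of standard-library Python. The helper name `write_file_list`, the `*.wav` pattern, and the directory layout in the demo are assumptions for illustration. Point it at your own audio folders and write one list for training and one for validation.

```python
import tempfile
from pathlib import Path


def write_file_list(audio_dir, list_path, pattern="*.wav"):
    """Write one absolute audio path per line and return the number of paths written."""
    paths = sorted(p.resolve() for p in Path(audio_dir).rglob(pattern) if p.is_file())
    Path(list_path).write_text("".join(f"{p}\n" for p in paths), encoding="utf-8")
    return len(paths)


# Demo on a throwaway directory containing two empty .wav files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "clips").mkdir()
    (root / "clips" / "a.wav").touch()
    (root / "clips" / "b.wav").touch()
    (root / "notes.txt").touch()  # outside the scanned folder, so it is not listed
    count = write_file_list(root / "clips", root / "train.txt")
    lines = (root / "train.txt").read_text(encoding="utf-8").splitlines()

print(count, [Path(line).name for line in lines])
```

The paths are sorted so that repeated runs over the same folder produce an identical list.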

Then:

torchrun --nnodes=1 --nproc-per-node=8 main_launch_vqdp.py

Acknowledgement

I would like to extend special thanks to the authors of Uniaudio and DAC, since our code base is mainly borrowed from Uniaudio and DAC.

Citation

If you find this repo helpful, please consider citing it in the following format:

@article{ye2024codecdoesmatterexploring,
      title={Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model},
      author={Zhen Ye and Peiwen Sun and Jiahe Lei and Hongzhan Lin and Xu Tan and Zheqi Dai and Qiuqiang Kong and Jianyi Chen and Jiahao Pan and Qifeng Liu and Yike Guo and Wei Xue},
      journal={arXiv preprint arXiv:2408.17175},
      year={2024},
}
