X-Codec

Unified Semantic and Acoustic Codec for Audio Language Model.

Paper

Title: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Authors: Zhen Ye, Peiwen Sun, Jiahe Lei, Hongzhan Lin, Xu Tan, Zheqi Dai, Qiuqiang Kong, Jianyi Chen, Jiahao Pan, Qifeng Liu, Yike Guo*, Wei Xue*

Overview

Experiments on VALL-E

Examples

Highlight

You can easily apply our approach to enhance an existing acoustic codec:

For example:

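import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoModel


class Codec(nn.Module):
    def __init__(self):
        super().__init__()
        # Acoustic codec components. Encoder, Decoder and RVQ stand for the
        # modules of your existing codec; `...` marks their arguments.
        self.encoder = Encoder(...)       # Acoustic encoder
        self.decoder = Decoder(...)       # Acoustic decoder
        self.quantizer = RVQ(...)         # Residual Vector Quantizer (RVQ)

        # Adding the semantic module
        self.semantic_model = AutoModel.from_pretrained(...)  # e.g., HuBERT, WavLM

        # Adding projectors
        self.fc_prior = nn.Linear(...)    # fuses the concatenated features before quantization
        self.fc_post1 = nn.Linear(...)    # maps quantized features back to the semantic space
        self.fc_post2 = nn.Linear(...)    # maps quantized features back to the acoustic space

    def forward(self, x, bw):
        # Encode the input acoustically and semantically.
        # Assumption: both branches yield (batch, frames, channels) at the same
        # frame rate, so frames line up and nn.Linear acts on the channel axis.
        e_acoustic = self.encoder(x)
        # Hugging Face models return an output object; here we take the last hidden layer
        e_semantic = self.semantic_model(x).last_hidden_state

        # Combine acoustic and semantic features along the channel axis
        combined_features = torch.cat([e_acoustic, e_semantic], dim=-1)

        # Apply prior transformation
        transformed_features = self.fc_prior(combined_features)

        # Quantize the unified semantic and acoustic features
        quantized, codes, bandwidth, commit_loss = self.quantizer(transformed_features, bw)

        # Post-process the quantized features
        quantized_semantic = self.fc_post1(quantized)
        quantized_acoustic = self.fc_post2(quantized)

        # Decode the quantized acoustic features
        output = self.decoder(quantized_acoustic)

        # Return the reconstruction, the codes and the losses used for training
        sem_loss = self.semantic_loss(e_semantic, quantized_semantic)
        return output, codes, commit_loss, sem_loss

    def semantic_loss(self, semantic, quantized_semantic):
        # The quantized representation should still reconstruct the semantic features
        return F.mse_loss(quantized_semantic, semantic)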

For more details, please refer to our code.
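The `RVQ` quantizer in the snippet above is a residual vector quantizer: each stage picks the codeword nearest to whatever the previous stages left unexplained, so every extra stage refines the reconstruction. The following is a minimal pure-Python sketch of that idea for a single frame vector, not the repository's implementation; the function names and the toy codebooks are hypothetical, and we assume (as in EnCodec-style RVQ) that the `bw` argument effectively selects how many stages are used.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Codebook = Sequence[Sequence[float]]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rvq_encode(x: Sequence[float], codebooks: Sequence[Codebook]) -> Tuple[List[int], Vector]:
    """Quantize one frame with a stack of codebooks (hypothetical helper).

    Returns one code index per stage and the reconstruction, i.e. the sum
    of the selected codewords.
    """
    residual = list(x)
    recon = [0.0] * len(x)
    codes = []
    for codebook in codebooks:
        # Pick the codeword closest to the part not yet explained
        idx = min(range(len(codebook)), key=lambda i: _sq_dist(residual, codebook[i]))
        codes.append(idx)
        chosen = codebook[idx]
        residual = [r - c for r, c in zip(residual, chosen)]
        recon = [q + c for q, c in zip(recon, chosen)]
    return codes, recon


def rvq_decode(codes: Sequence[int], codebooks: Sequence[Codebook]) -> Vector:
    """Rebuild a frame by summing the codewords selected at each stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Toy 2-D codebooks: a coarse first stage and a finer second stage (made up for illustration)
EXAMPLE_CODEBOOKS = [
    [[0, 0], [1, 1], [-1, -1]],
    [[0, 0], [0.25, 0], [0, 0.25], [-0.25, 0]],
]

if __name__ == "__main__":
    codes, recon = rvq_encode([1.2, 1.0], EXAMPLE_CODEBOOKS)
    print(codes, recon)  # → [1, 1] [1.25, 1.0]
```

Encoding with only `EXAMPLE_CODEBOOKS[:1]` mimics a lower bandwidth: the code gets shorter and the reconstruction error grows.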

Available models

🤗 links to the Hugging Face model hub.

Model name	Hugging Face	Config	Semantic Model	Domain	Training Data
xcodec_hubert_librispeech	🤗	🤗	🤗 Hubert-base	Speech	Librispeech
xcodec_wavlm_mls (not mentioned in the paper)	🤗	🤗	🤗 Wavlm-base-plus	Speech	MLS English
xcodec_wavlm_more_data (not mentioned in the paper)	🤗	🤗	🤗 Wavlm-base-plus	Speech	MLS English + in-house data
xcodec_hubert_general_audio	🤗	🤗	🤗 Hubert-base-general-audio	General audio	200k hours of in-house data
xcodec_hubert_general_audio_more_data (not mentioned in the paper)	🤗	🤗	🤗 Hubert-base-general-audio	General audio	More balanced data

Inference

To run inference, first download the model and config from Hugging Face.

python inference.py

Training

Prepare the training file and the validation file in the config. Each file should list the paths to your audio files, one per line:

/path/to/your/xxx.wav
/path/to/your/yyy.wav
...
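If your audio lives in a directory tree, the file lists can be generated with a short script. This is a minimal sketch, not part of the repository: the helper names, the random train/validation split and the `valid_fraction` default are hypothetical choices; it only assumes the one-absolute-path-per-line format shown above.

```python
import random
from pathlib import Path
from typing import Iterable, List, Tuple


def collect_wavs(root: str) -> List[str]:
    """Return absolute paths of all .wav files under root, sorted for reproducibility."""
    return sorted(str(p.resolve()) for p in Path(root).rglob("*.wav") if p.is_file())


def split_train_valid(
    paths: Iterable[str], valid_fraction: float = 0.01, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Shuffle deterministically and hold out a fraction for validation (hypothetical split)."""
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)
    n_valid = max(1, int(len(shuffled) * valid_fraction)) if shuffled else 0
    return shuffled[n_valid:], shuffled[:n_valid]


def write_file_list(paths: Iterable[str], out_path: str) -> int:
    """Write one path per line and return how many paths were written."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as f:
        for path in paths:
            f.write(path + "\n")
            count += 1
    return count


if __name__ == "__main__":
    train, valid = split_train_valid(collect_wavs("/path/to/your"))
    write_file_list(train, "train_list.txt")
    write_file_list(valid, "valid_list.txt")
```

Point the training and validation file entries in the config at the two generated lists.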

Then:

torchrun --nnodes=1 --nproc-per-node=8 main_launch_vqdp.py

Acknowledgement

I would like to extend special thanks to the authors of Uniaudio and DAC, since our codebase is mostly borrowed from Uniaudio and DAC.

Citation

If you find this repo helpful, please consider citing it in the following format:

@article{ye2024codecdoesmatterexploring,
  title   = {Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model},
  author  = {Zhen Ye and Peiwen Sun and Jiahe Lei and Hongzhan Lin and Xu Tan and Zheqi Dai and Qiuqiang Kong and Jianyi Chen and Jiahao Pan and Qifeng Liu and Yike Guo and Wei Xue},
  journal = {arXiv preprint arXiv:2408.17175},
  year    = {2024}
}


Additional Information

Version: 1.0.0
Type: AI source code
Updated: 2024-12-08
Size: 1.82MB
Source: GitHub
