xcodec下载 - xcodec源码下载

xcodec

Ai源码

1.0.0

下载

X编解码器

音频语言模型的统一语义和声学编解码器。

纸

标题：编解码器确实很重要：探索音频语言模型编解码器的语义缺点

作者：叶震、孙培文、雷嘉禾、林红战、谭旭、戴哲琪、孔秋强、陈建一、潘家豪、刘奇峰、郭一克*、薛伟*

VALL-E上的实验

经验值

强调

您可以轻松应用我们的方法来增强任何现有的声学编解码器：

例如

 class Codec ():
    def __init__ ( self ):
        # Acoustic codec components
        self . encoder = Encoder (...)       # Acoustic encoder
        self . decoder = Decoder (...)       # Acoustic decoder
        self . quantizer = RVQ (...)         # Residual Vector Quantizer (RVQ)

        # Adding the semantic module
        self . semantic_model = AutoModel . from_pretrained (...)  # e.g., Hubert, WavLM

        # Adding Projector
        self . fc_prior = nn . Linear (...)     
        self . fc_post1 = nn . Linear (...)     
        self . fc_post2 = nn . Linear (...)     

    def forward ( self , x , bw ):
        # Encode the input acoustically and semantically
        e_acoustic = self . encoder ( x )
        e_semantic = self . semantic_model ( x )

        # Combine acoustic and semantic features
        combined_features = torch . cat ([ e_acoustic , e_semantic ])

        # Apply prior transformation
        transformed_features = self . fc_prior ( combined_features )

        # Quantize the unified  semantic and acoustic features
        quantized , codes , bandwidth , commit_loss = self . quantizer ( transformed_features , bw )

        # Post-process the quantized features
        quantized_semantic = self . fc_post1 ( quantized )
        quantized_acoustic = self . fc_post2 ( quantized )

        # Decode the quantized acoustic features
        output = self . decoder ( quantized_acoustic )



    def semantic_loss ( self , semantic , quantized_semantic ):
        return F . mse_loss ( semantic , quantized_semantic )

欲了解更多详情，请参阅我们的代码。

可用型号

？链接到 Huggingface 模型中心。

型号名称	抱脸	配置	语义模型	领域	训练数据
xcodec_hubert_librispeech	？	？	？休伯特基	演讲	书本演讲
xcodec_wavlm_mls（论文中未提及）	？	？	？ Wavlm-base-plus	演讲	木林森英语
xcodec_wavlm_more_data（论文中未提及）	？	？	？ Wavlm-base-plus	演讲	MLS 英文+内部数据
xcodec_hubert_general_audio	？	？	?Hubert-base-通用音频	通用音频	20万小时内部数据
xcodec_hubert_general_audio_more_data（论文中未提及）	？	？	?Hubert-base-通用音频	通用音频	数据更均衡

推理

要运行推理，首先从 Hugging Face 下载模型和配置。

python inference.py

训练

在config中准备training_file和validation_file。该文件应列出音频文件的路径：

/path/to/your/xxx.wav
/path/to/your/yyy.wav
...

然后：

torchrun --nnodes=1 --nproc-per-node=8 main_launch_vqdp.py

致谢

我要特别感谢 Uniaudio 和 DAC 的作者，因为我们的代码库主要借鉴了 Uniaudio 和 DAC。

引文

如果您发现此存储库有帮助，请考虑按以下格式引用：

 @article { ye2024codecdoesmatterexploring ,
      title = { Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model } , 
      author = { Zhen Ye and Peiwen Sun and Jiahe Lei and Hongzhan Lin and Xu Tan and Zheqi Dai and Qiuqiang Kong and Jianyi Chen and Jiahao Pan and Qifeng Liu and Yike Guo and Wei Xue } ,
      journal = { arXiv preprint arXiv:2408.17175 } ,
      year = { 2024 } ,
}