Version 0.4.8

Spear-TTS - Pytorch

Implementation of Spear-TTS, a multi-speaker text-to-speech attention network, in Pytorch

The text-to-semantic module built here will be used for conditioning SoundStorm.

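Appreciation

Stability, for their generous sponsorship to work on and open source cutting-edge artificial intelligence research
Lucas Newman, for completing the backtranslation portion as well as beam search decoding
Lucas Newman, for completing the final text-to-semantic transformer training code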

Install

$ pip install spear-tts-pytorch

Usage

import torch

from audiolm_pytorch import HubertWithKmeans

from spear_tts_pytorch import (
    TextToSemantic,
    SemanticToTextDatasetGenerator,
    GeneratedAudioTextDataset,
    MockDataset
)

# HuBERT with k-means, loaded from local checkpoint files
wav2vec = HubertWithKmeans(
    checkpoint_path = './hubert_base_ls960.pt',
    kmeans_path = './hubert_base_ls960_L9_km500.bin'
)

model = TextToSemantic(
    wav2vec = wav2vec,
    dim = 512,
    num_text_token_ids = 256,
    heads = 8,
    target_kv_heads = 2, # grouped query attention, for memory efficient decoding
    source_depth = 1,
    target_depth = 1
)

# mock dataset with 10 entries
ds = MockDataset(10)

dataset_generator = SemanticToTextDatasetGenerator(
    model = model,
    dataset = ds,
    folder = './output_folder'
)

# generate the dataset into ./output_folder
dataset_generator(max_length = 2)

# load the generated dataset back from the same folder
generated_dataset = GeneratedAudioTextDataset(
    folder = './output_folder'
)

assert len(generated_dataset) == 10
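
The `target_kv_heads = 2` setting above enables grouped query attention, where several query heads share one key/value head so that less has to be cached during decoding. The sketch below is illustrative arithmetic only, not code from spear-tts-pytorch: `heads` and `kv_heads` mirror the values above, while `dim_head`, `seq_len`, `bytes_per_value`, both helper functions, and the contiguous grouping of query heads are assumptions made for the example.

```python
# Illustrative sketch only, not taken from spear-tts-pytorch.
# heads / kv_heads mirror `heads = 8` and `target_kv_heads = 2` above.
# dim_head, seq_len and bytes_per_value are assumed values.

def kv_head_for_query_head(query_head, heads, kv_heads):
    """Map a query head to the key/value head it shares (contiguous groups assumed)."""
    assert heads % kv_heads == 0, "heads must be divisible by kv_heads"
    group_size = heads // kv_heads
    return query_head // group_size


def kv_cache_bytes(seq_len, kv_heads, dim_head, bytes_per_value=2):
    """Bytes needed to cache keys and values for one layer and one sequence."""
    return 2 * seq_len * kv_heads * dim_head * bytes_per_value  # 2 = keys + values


heads, kv_heads = 8, 2
dim_head, seq_len = 64, 1024  # assumed, not from the README

# which key/value head each of the 8 query heads reads from
mapping = [kv_head_for_query_head(h, heads, kv_heads) for h in range(heads)]

# cache size with one key/value head per query head vs. grouped heads
full = kv_cache_bytes(seq_len, heads, dim_head)
grouped = kv_cache_bytes(seq_len, kv_heads, dim_head)

print(mapping)          # [0, 0, 0, 0, 1, 1, 1, 1]
print(full // grouped)  # 4
```

Under these assumptions the key/value cache shrinks by a factor of `heads / kv_heads`, which is where the memory saving during decoding comes from.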

Todo

Citations

@misc{kharitonov2023speak,
    title   = {Speak, Read and Prompt: High-Fidelity Text-to-Speech with Minimal Supervision},
    author  = {Eugene Kharitonov and Damien Vincent and Zalán Borsos and Raphaël Marinier and Sertan Girgin and Olivier Pietquin and Matt Sharifi and Marco Tagliasacchi and Neil Zeghidour},
    year    = {2023},
    eprint  = {2302.03540},
    archivePrefix = {arXiv},
    primaryClass = {cs.SD}
}

@inproceedings{dao2022flashattention,
    title   = {Flash{A}ttention: Fast and Memory-Efficient Exact Attention with {IO}-Awareness},
    author  = {Dao, Tri and Fu, Daniel Y. and Ermon, Stefano and Rudra, Atri and R{\'e}, Christopher},
    booktitle = {Advances in Neural Information Processing Systems},
    year    = {2022}
}

@misc{shi2023enhance,
    title   = {Enhance audio generation controllability through representation similarity regularization},
    author  = {Yangyang Shi and Gael Le Lan and Varun Nagaraja and Zhaoheng Ni and Xinhao Mei and Ernie Chang and Forrest Iandola and Yang Liu and Vikas Chandra},
    year    = {2023},
    eprint  = {2309.08773},
    archivePrefix = {arXiv},
    primaryClass = {cs.SD}
}

@article{Ainslie2023GQATG,
    title   = {GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints},
    author  = {Joshua Ainslie and James Lee-Thorp and Michiel de Jong and Yury Zemlyanskiy and Federico Lebr{\'o}n and Sumit K. Sanghai},
    journal = {ArXiv},
    year    = {2023},
    volume  = {abs/2305.13245},
    url     = {https://api.semanticscholar.org/CorpusID:258833177}
}

@inproceedings{Leviathan2022FastIF,
    title   = {Fast Inference from Transformers via Speculative Decoding},
    author  = {Yaniv Leviathan and Matan Kalman and Y. Matias},
    booktitle = {International Conference on Machine Learning},
    year    = {2022},
    url     = {https://api.semanticscholar.org/CorpusID:254096365}
}