Version 0.4.8

Spear-TTS - Pytorch

Implementation of Spear-TTS, a multi-speaker text-to-speech attention network, in Pytorch

The text-to-semantic module built here will be used in SoundStorm for conditioning.
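The semantic tokens this module predicts are discrete units. In the usage example they come from `HubertWithKmeans`, whose k-means file name (`hubert_base_ls960_L9_km500.bin`) suggests HuBERT layer-9 features clustered into 500 centroids. As an illustration of only the final quantization step under that assumption, the pure-Python sketch below assigns each feature frame to its nearest centroid. The helper names and toy data are hypothetical, and this is not the `audiolm_pytorch` implementation.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize_to_semantic_ids(
    frames: Sequence[Sequence[float]],
    centroids: Sequence[Sequence[float]],
) -> List[int]:
    """Replace each feature frame with the index of its nearest k-means centroid."""
    return [
        min(range(len(centroids)), key=lambda i: squared_distance(frame, centroids[i]))
        for frame in frames
    ]


# Toy 2-D frames and 3 centroids (hypothetical); real HuBERT frames are much
# higher-dimensional, and the k-means file name implies 500 centroids.
centroids = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
frames = [[0.1, -0.2], [0.9, 1.2], [4.0, 6.0], [0.2, 0.1]]
print(quantize_to_semantic_ids(frames, centroids))  # [0, 1, 2, 0]
```

The resulting integer ids are the kind of target sequence a text-to-semantic model learns to produce from text token ids.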

Appreciation

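Stability for their generous sponsorships to work on and open source cutting-edge artificial intelligence research
Lucas Newman for completing the backtranslation portion, as well as beam search decoding!
Lucas Newman for completing the final text-to-semantic transformer training code!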
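The acknowledgement above mentions beam search decoding. As a generic illustration, not this repository's decoder, here is a minimal pure-Python beam search over a hypothetical next-token scoring function `next_log_probs`. The toy model and token ids are made up to show a case where keeping two hypotheses finds a more probable sequence than greedy decoding (a beam width of 1).

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical model interface: given a prefix of token ids, return {next_token_id: log_prob}.
NextLogProbs = Callable[[Sequence[int]], Dict[int, float]]


def beam_search(
    next_log_probs: NextLogProbs,
    start: Sequence[int],
    eos_id: int,
    beam_width: int,
    max_steps: int,
) -> Tuple[List[int], float]:
    """Return the highest-scoring (sequence, total log-prob) found with a fixed-width beam."""
    beams: List[Tuple[List[int], float]] = [(list(start), 0.0)]
    finished: List[Tuple[List[int], float]] = []
    for _ in range(max_steps):
        # expand every live hypothesis by every possible next token
        candidates = [
            (seq + [tok], score + logp)
            for seq, score in beams
            for tok, logp in next_log_probs(seq).items()
        ]
        # keep only the beam_width best continuations; completed ones leave the beam
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            if seq[-1] == eos_id:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    # hypotheses still open when max_steps runs out are also eligible
    return max(finished + beams, key=lambda c: c[1])


def toy_model(prefix: Sequence[int]) -> Dict[int, float]:
    # token 0 is end-of-sequence; greedy takes token 1 first and ends with p = 0.6 * 0.4 = 0.24
    table = {
        (): {1: math.log(0.6), 2: math.log(0.4)},
        (1,): {0: math.log(0.4), 1: math.log(0.3), 2: math.log(0.3)},
        (2,): {0: math.log(0.9), 1: math.log(0.05), 2: math.log(0.05)},
    }
    return table.get(tuple(prefix), {0: 0.0})


best_seq, best_score = beam_search(toy_model, [], eos_id=0, beam_width=2, max_steps=3)
print(best_seq, round(math.exp(best_score), 2))  # [2, 0] 0.36
```

Scores are summed log-probabilities, so comparing them is equivalent to comparing sequence probabilities without underflow.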

Install

$ pip install spear-tts-pytorch

Usage

import torch

from audiolm_pytorch import HubertWithKmeans

from spear_tts_pytorch import (
    TextToSemantic,
    SemanticToTextDatasetGenerator,
    GeneratedAudioTextDataset,
    MockDataset
)

# HuBERT features quantized with k-means provide the semantic token ids
wav2vec = HubertWithKmeans(
    checkpoint_path = './hubert_base_ls960.pt',
    kmeans_path = './hubert_base_ls960_L9_km500.bin'
)

model = TextToSemantic(
    wav2vec = wav2vec,
    dim = 512,
    num_text_token_ids = 256,
    heads = 8,
    target_kv_heads = 2, # grouped query attention, for memory efficient decoding
    source_depth = 1,
    target_depth = 1
)

# placeholder dataset of 10 items, standing in for real audio
ds = MockDataset(10)

# backtranslation: generate text for each audio item and save the pairs to ./output_folder
dataset_generator = SemanticToTextDatasetGenerator(
    model = model,
    dataset = ds,
    folder = './output_folder'
)

dataset_generator(max_length = 2)

# load the generated pairs back as a dataset, one entry per source item
generated_dataset = GeneratedAudioTextDataset(
    folder = './output_folder'
)

assert len(generated_dataset) == 10
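The `target_kv_heads = 2` argument above enables grouped query attention (Ainslie et al., 2023, listed in the citations) on the target side: with `heads = 8`, the 8 query heads share 2 key/value heads. The pure-Python sketch below shows the head grouping and the resulting key/value cache size per decoded token. It assumes contiguous groups of query heads and that keys and values are cached once per key/value head; the function names and `dim_head = 64` are hypothetical, not this library's internals.

```python
from typing import List


def kv_head_for_each_query_head(heads: int, kv_heads: int) -> List[int]:
    """Assign each query head to the key/value head it reads from.

    Assumes contiguous groups: with 8 query heads and 2 kv heads,
    heads 0-3 share kv head 0 and heads 4-7 share kv head 1.
    """
    if kv_heads <= 0 or heads % kv_heads != 0:
        raise ValueError("heads must be a positive multiple of kv_heads")
    group_size = heads // kv_heads
    return [h // group_size for h in range(heads)]


def kv_cache_values_per_token(kv_heads: int, dim_head: int, layers: int) -> int:
    """Cached values per decoded token: one key and one value vector per kv head per layer."""
    return 2 * kv_heads * dim_head * layers


# heads = 8 and target_kv_heads = 2 as in the usage example; dim_head = 64 is hypothetical
print(kv_head_for_each_query_head(8, 2))  # [0, 0, 0, 0, 1, 1, 1, 1]

full = kv_cache_values_per_token(kv_heads=8, dim_head=64, layers=1)  # ordinary multi-head attention
grouped = kv_cache_values_per_token(kv_heads=2, dim_head=64, layers=1)
print(full // grouped)  # 4
```

Under these assumptions the cache shrinks by a factor of `heads / target_kv_heads` at every decoding step and layer, which is where the memory saving during decoding comes from.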

Todo

Citations

@misc{kharitonov2023speak,
    title   = {Speak, Read and Prompt: High-Fidelity Text-to-Speech with Minimal Supervision},
    author  = {Eugene Kharitonov and Damien Vincent and Zalán Borsos and Raphaël Marinier and Sertan Girgin and Olivier Pietquin and Matt Sharifi and Marco Tagliasacchi and Neil Zeghidour},
    year    = {2023},
    eprint  = {2302.03540},
    archivePrefix = {arXiv},
    primaryClass = {cs.SD}
}

@inproceedings{dao2022flashattention,
    title   = {Flash{A}ttention: Fast and Memory-Efficient Exact Attention with {IO}-Awareness},
    author  = {Dao, Tri and Fu, Daniel Y. and Ermon, Stefano and Rudra, Atri and R{\'e}, Christopher},
    booktitle = {Advances in Neural Information Processing Systems},
    year    = {2022}
}

@misc{shi2023enhance,
    title   = {Enhance audio generation controllability through representation similarity regularization},
    author  = {Yangyang Shi and Gael Le Lan and Varun Nagaraja and Zhaoheng Ni and Xinhao Mei and Ernie Chang and Forrest Iandola and Yang Liu and Vikas Chandra},
    year    = {2023},
    eprint  = {2309.08773},
    archivePrefix = {arXiv},
    primaryClass = {cs.SD}
}

@article{Ainslie2023GQATG,
    title   = {GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints},
    author  = {Joshua Ainslie and James Lee-Thorp and Michiel de Jong and Yury Zemlyanskiy and Federico Lebr{\'o}n and Sumit K. Sanghai},
    journal = {ArXiv},
    year    = {2023},
    volume  = {abs/2305.13245},
    url     = {https://api.semanticscholar.org/CorpusID:258833177}
}

@inproceedings{Leviathan2022FastIF,
    title   = {Fast Inference from Transformers via Speculative Decoding},
    author  = {Yaniv Leviathan and Matan Kalman and Y. Matias},
    booktitle = {International Conference on Machine Learning},
    year    = {2022},
    url     = {https://api.semanticscholar.org/CorpusID:254096365}
}