Spear-TTS - Pytorch

Implementation of Spear-TTS, a multi-speaker text-to-speech network, in Pytorch

The text-to-semantic module built here will be used in SoundStorm for conditioning.

Appreciation

Stability for their generous sponsorship of cutting-edge, open source artificial intelligence research
Lucas Newman for completing the backtranslation portion, as well as beam search decoding!
Lucas Newman for completing the final text-to-semantic transformer training code!

Install

$ pip install spear-tts-pytorch
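
To confirm that the install succeeded, the package version can be queried from Python. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and is not part of spear-tts-pytorch.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("spear-tts-pytorch")
print(version if version is not None else "spear-tts-pytorch is not installed")
```

If this prints a version string, the import lines in the usage example should resolve, provided the package's own dependencies were installed as well.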

Usage

import torch

from audiolm_pytorch import HubertWithKmeans

from spear_tts_pytorch import (
    TextToSemantic,
    SemanticToTextDatasetGenerator,
    GeneratedAudioTextDataset,
    MockDataset
)

# HuBERT with k-means quantization; both checkpoint files must already exist at these paths
wav2vec = HubertWithKmeans(
    checkpoint_path = './hubert_base_ls960.pt',
    kmeans_path = './hubert_base_ls960_L9_km500.bin'
)

model = TextToSemantic(
    wav2vec = wav2vec,
    dim = 512,
    num_text_token_ids = 256,
    heads = 8,
    target_kv_heads = 2, # grouped query attention, for memory efficient decoding
    source_depth = 1,
    target_depth = 1
)

# mock dataset with 10 entries, standing in for real audio data
ds = MockDataset(10)

dataset_generator = SemanticToTextDatasetGenerator(
    model = model,
    dataset = ds,
    folder = './output_folder'
)

# write the generated samples to ./output_folder
dataset_generator(max_length = 2)

# read the generated samples back from disk
generated_dataset = GeneratedAudioTextDataset(
    folder = './output_folder'
)

# one generated sample per entry in the mock dataset
assert len(generated_dataset) == 10
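
The `target_kv_heads = 2` setting above enables grouped query attention, where several query heads share one key/value head. The sketch below illustrates the idea in plain Python for `heads = 8` and `target_kv_heads = 2`. The function names, the contiguous grouping of heads, and the `seq_len` and `dim_head` values are assumptions made for illustration; the library's internal layout may differ.

```python
def kv_head_for_query_head(query_head: int, num_heads: int, num_kv_heads: int) -> int:
    """Return the key/value head index shared by the given query head.

    Assumes contiguous grouping: the first group of query heads uses
    key/value head 0, the next group uses head 1, and so on.
    """
    if num_heads % num_kv_heads != 0:
        raise ValueError("num_heads must be divisible by num_kv_heads")
    group_size = num_heads // num_kv_heads
    return query_head // group_size


def kv_cache_elements(num_kv_heads: int, seq_len: int, dim_head: int) -> int:
    """Count cached elements for one layer: keys plus values."""
    return 2 * num_kv_heads * seq_len * dim_head


heads, kv_heads = 8, 2

# Each of the 2 key/value heads serves 4 of the 8 query heads
mapping = [kv_head_for_query_head(h, heads, kv_heads) for h in range(heads)]
print(mapping)  # [0, 0, 0, 0, 1, 1, 1, 1]

# Hypothetical sequence length and head dimension, for the size comparison only
full = kv_cache_elements(heads, seq_len=1024, dim_head=64)
grouped = kv_cache_elements(kv_heads, seq_len=1024, dim_head=64)
print(full // grouped)  # 4
```

Under these assumptions the key/value cache held during decoding shrinks by the ratio of query heads to key/value heads, which is where the memory saving mentioned in the code comment comes from.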

Todo

Citations

 @misc { kharitonov2023speak ,
    title   = { Speak, Read and Prompt: High-Fidelity Text-to-Speech with Minimal Supervision } , 
    author  = { Eugene Kharitonov and Damien Vincent and Zalán Borsos and Raphaël Marinier and Sertan Girgin and Olivier Pietquin and Matt Sharifi and Marco Tagliasacchi and Neil Zeghidour } ,
    year    = { 2023 } ,
    eprint  = { 2302.03540 } ,
    archivePrefix = { arXiv } ,
    primaryClass = { cs.SD }
}

 @inproceedings { dao2022flashattention ,
    title   = { Flash{A}ttention: Fast and Memory-Efficient Exact Attention with {IO}-Awareness } ,
    author  = { Dao, Tri and Fu, Daniel Y. and Ermon, Stefano and Rudra, Atri and R{\'e}, Christopher } ,
    booktitle = { Advances in Neural Information Processing Systems } ,
    year    = { 2022 }
}

 @misc { shi2023enhance ,
    title   = { Enhance audio generation controllability through representation similarity regularization } , 
    author  = { Yangyang Shi and Gael Le Lan and Varun Nagaraja and Zhaoheng Ni and Xinhao Mei and Ernie Chang and Forrest Iandola and Yang Liu and Vikas Chandra } ,
    year    = { 2023 } ,
    eprint  = { 2309.08773 } ,
    archivePrefix = { arXiv } ,
    primaryClass = { cs.SD }
}

 @article { Ainslie2023GQATG ,
    title   = { GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints } ,
    author  = { Joshua Ainslie and James Lee-Thorp and Michiel de Jong and Yury Zemlyanskiy and Federico Lebr{\'o}n and Sumit K. Sanghai } ,
    journal = { ArXiv } ,
    year    = { 2023 } ,
    volume  = { abs/2305.13245 } ,
    url     = { https://api.semanticscholar.org/CorpusID:258833177 }
}

 @inproceedings { Leviathan2022FastIF ,
    title   = { Fast Inference from Transformers via Speculative Decoding } ,
    author  = { Yaniv Leviathan and Matan Kalman and Y. Matias } ,
    booktitle = { International Conference on Machine Learning } ,
    year    = { 2022 } ,
    url     = { https://api.semanticscholar.org/CorpusID:254096365 }
}

Additional information

Version: 0.4.8
Type: AI code
Updated: 2025-01-15
Size: 110.97KB
Source: Github
