OuteTTS-0.1-350M: A novel text-to-speech synthesis method with zero-shot voice cloning

Author: Eve Cole | Update Time: 2025-02-13 03:32:01

Oute AI has launched OuteTTS-0.1-350M, a streamlined text-to-speech (TTS) model based on the LLaMA architecture. It needs no external adapter, uses WavTokenizer to generate audio tokens directly, and supports zero-shot voice cloning, reproducing a new voice from only a few seconds of reference audio. Although the model is relatively small, it achieves performance comparable to larger and more complex systems, and its compatibility with llama.cpp makes it well suited to real-time applications. Its efficiency and ease of use give it broad potential in areas such as personalized assistants, audiobooks, and content localization.

Recently, Oute AI released a novel text-to-speech synthesis method called OuteTTS-0.1-350M. The approach relies on pure language modeling, without external adapters or complex architectures, which simplifies the TTS pipeline. OuteTTS-0.1-350M is based on the LLaMA architecture and uses WavTokenizer to generate audio tokens directly, making the process more efficient.
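To make the idea of "pure language modeling" for speech more concrete, the toy sketch below shows one way discrete audio codes could share a single token sequence with text tokens, so that an ordinary language model can predict them. It is only an illustration under stated assumptions: the vocabulary size, the codebook size, and every function name are hypothetical and are not taken from the OuteTTS or WavTokenizer code.

```python
# Toy illustration only: every name and number here is hypothetical,
# not taken from the OuteTTS or WavTokenizer code bases.

TEXT_VOCAB_SIZE = 32_000     # assumed size of the text vocabulary
AUDIO_CODEBOOK_SIZE = 4_096  # assumed number of discrete audio codes


def audio_code_to_token_id(code: int) -> int:
    """Map an audio codebook index to an id placed after the text vocabulary."""
    if not 0 <= code < AUDIO_CODEBOOK_SIZE:
        raise ValueError(f"audio code out of range: {code}")
    return TEXT_VOCAB_SIZE + code


def token_id_to_audio_code(token_id: int) -> int:
    """Inverse mapping, used to turn generated ids back into audio codes."""
    code = token_id - TEXT_VOCAB_SIZE
    if not 0 <= code < AUDIO_CODEBOOK_SIZE:
        raise ValueError(f"not an audio token id: {token_id}")
    return code


def build_sequence(text_ids: list[int], audio_codes: list[int]) -> list[int]:
    """Concatenate text token ids and audio token ids into one sequence."""
    return text_ids + [audio_code_to_token_id(c) for c in audio_codes]


def extract_audio_codes(sequence: list[int]) -> list[int]:
    """Keep only the ids in the audio range and map them back to codes."""
    return [token_id_to_audio_code(t) for t in sequence if t >= TEXT_VOCAB_SIZE]


if __name__ == "__main__":
    sequence = build_sequence([5, 17], [0, 4095])
    print(sequence)
    print(extract_audio_codes(sequence))
```

In a real system, the recovered audio codes would be passed to the audio tokenizer's decoder to produce a waveform; that step is omitted here because it depends on the actual model.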

The model supports zero-shot voice cloning and can reproduce a new voice from only a few seconds of reference audio. OuteTTS-0.1-350M is designed for on-device performance and is compatible with llama.cpp, making it well suited to real-time applications. Although the model is relatively small (350 million parameters), its performance is comparable to that of larger and more complex TTS systems.

The accessibility and efficiency of OuteTTS-0.1-350M make it suitable for a wide range of applications, including personalized assistants, audiobooks, and content localization. The model is released under a CC-BY license, and Oute AI encourages further experimentation and integration into different projects to democratize advanced TTS technology.

The release of OuteTTS-0.1-350M marks a key step forward in text-to-speech technology, using a simplified architecture to deliver high-quality speech synthesis with minimal computational requirements. It builds on the LLaMA architecture, uses WavTokenizer, and performs zero-shot voice cloning without complex adapters, which distinguishes it from traditional TTS models.

Address: https://www.outeai.com/blog/OuteTTS-0.1-350M

OuteTTS-0.1-350M's efficient, simplified architecture and zero-shot voice cloning open new possibilities for text-to-speech technology and give developers a more convenient, easy-to-use tool. Its open-source nature also promotes technical development and wider adoption of applications in this field.