Fish Audio releases Fish Agent V0.1 3B for real-time voice cloning

Author: Eve Cole  Update Time: 2024-12-24 19:48:01

Downcodes has learned that Fish Audio has released Fish Agent V0.1 3B, a speech processing model that has drawn attention in the AI speech field for its fast and accurate speech generation and processing. The model is particularly good at simulating and cloning voices, which improves the fidelity and response speed of AI voice assistants and makes voice interaction feel more natural and fluid. Its architecture is built for near-instant voice cloning and text-to-speech conversion.

Thanks to this architecture, Fish Agent V0.1 3B generates high-quality, natural-sounding speech quickly, with a time to first audio (TTFA) of only 200 milliseconds for voice cloning and text-to-speech conversion. That makes it well suited to scenarios that need fast voice feedback, such as voice assistants and automated customer service.
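To make the TTFA figure concrete, here is a minimal sketch of how time to first audio could be measured for any streaming speech generator. It does not call Fish Agent: `fake_tts_stream` is a hypothetical stand-in that sleeps for a configurable delay and then yields placeholder bytes, and `measure_ttfa` is an illustrative helper name, not part of any Fish Audio API.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def fake_tts_stream(text: str, first_chunk_delay_s: float = 0.2,
                    chunk_count: int = 3) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS call (not a real model)."""
    time.sleep(first_chunk_delay_s)  # simulated latency before the first chunk
    for _ in range(chunk_count):
        yield b"\x00" * 320  # placeholder bytes, not real audio


def measure_ttfa(stream_factory: Callable[[], Iterable[bytes]]) -> Tuple[float, int]:
    """Return (seconds until the first chunk arrives, total chunks received)."""
    start = time.perf_counter()
    ttfa = None
    chunks = 0
    for _chunk in stream_factory():
        if ttfa is None:
            # The first chunk marks the moment audio playback could begin.
            ttfa = time.perf_counter() - start
        chunks += 1
    if ttfa is None:
        raise ValueError("stream produced no audio")
    return ttfa, chunks


if __name__ == "__main__":
    ttfa_s, n = measure_ttfa(lambda: fake_tts_stream("Hello"))
    print(f"TTFA: {ttfa_s * 1000:.0f} ms over {n} chunks")
```

The timer starts before the stream is created and stops at the first chunk, so the measurement covers request setup as well as generation. Swapping the stand-in for a real streaming client would leave `measure_ttfa` unchanged.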

Fish Agent V0.1 3B supports multiple languages, including English, Chinese, German, Japanese, French, Spanish, Korean and Arabic, and was trained on approximately 700,000 hours of multilingual audio data. As a result, it can handle a range of languages and contexts and produce speech that sounds closer to a real speaker.

In addition to speech-to-speech generation and text-to-speech conversion, Fish Agent V0.1 3B includes the following key features:

Zero-shot voice cloning: a voice can be cloned without any additional training.

Compact 3B size: the model uses 3 billion parameters, which makes it easier to develop with.

Text and audio input: the model accepts both text and audio as input.

Fish Audio has open-sourced the Fish Agent V0.1 3B model and provides a preliminary demo for users to try. The release is expected to advance AI voice technology and open up more possibilities for applications such as voice assistants and virtual humans.

GitHub: https://github.com/fishaudio/fish-speech

Fish Agent Demo: https://huggingface.co/spaces/fishaudio/fish-agent

Model download: https://huggingface.co/fishaudio/fish-agent-v0.1-3b

Technical report: https://arxiv.org/abs/2411.01156

The open-source release of Fish Agent V0.1 3B marks a new milestone in AI voice technology. It gives developers and researchers a powerful tool and suggests that AI voice applications will become richer and more convenient. We look forward to Fish Audio bringing more innovations to the field of AI voice!