Doubao releases real-time speech large model with top-tier Chinese dialogue and both high IQ and high EQ

Author: Eve Cole  Update Time: 2025-01-28 11:48:02

Doubao's latest real-time speech model marks a breakthrough in Chinese spoken dialogue and is now fully available in version 7.2.0 (the New Year edition) of the Doubao app. The model deeply integrates speech understanding and speech generation into a single end-to-end speech dialogue system, significantly improving vocal expressiveness, controllability, and emotional responsiveness. With low latency and the ability to be interrupted at any point in a conversation, it gives users a more natural and fluid interactive experience. The update also adds a new real-time voice call feature that lets users flexibly adjust conversational details, imitates a range of voices and dialects, and can even sing some songs, further enhancing the realism of human-machine dialogue.

Doubao recently announced its new real-time speech model, claiming a "cliff-like lead" (that is, a commanding lead) in Chinese dialogue and a significant step up in AI dialogue capabilities. The model is fully available in the Doubao app (version 7.2.0, New Year edition), giving users a richer and more realistic voice communication experience.

According to reports, Doubao's real-time speech large model deeply integrates speech understanding and generation into an end-to-end speech dialogue system. This design lets the model perform very well in vocal expressiveness, controllability, and emotional responsiveness, while its low latency and support for interruption at any time greatly improve the interactive experience. The company stated that the model combines high "IQ" with high "EQ", enabling it to better understand and express emotions.

The update also includes a real-time voice call feature, powered by Doubao's latest large model, that can flexibly adjust details such as conversational rhythm, tone of voice, volume, and even breathing sounds to suit different scenarios. The new voice feature can also imitate different voices, supports multiple dialects as well as conversations in English, and can even sing some songs. Together, these capabilities raise the realism of human-machine dialogue to a new level, approaching the point where it is "difficult to tell human from machine".

Doubao's R&D team stated that the new technology is built on an end-to-end framework that natively integrates the speech and text modalities for unified modeling. This design not only streamlines speech recognition and generation but also gives the AI a richer "soul", helping it communicate better with humans.

In Chinese voice dialogue, the launch of Doubao's real-time speech large model promises users an unprecedented interactive experience and should help advance intelligent voice technology.

The model's strong performance in Chinese dialogue marks significant progress in intelligent voice interaction and is genuinely exciting. As the technology continues to develop, I believe similar speech models will bring more convenience and surprises to people's lives.