iFlytek has officially released the Spark (Xinghuo) simultaneous interpretation large speech model, China's first large model with end-to-end speech simultaneous interpretation capabilities. The model is already in use in the iFlytek Translator, where it delivers English-to-Chinese simultaneous interpretation with almost no delay and significantly improves translation speed and accuracy. It suits a range of international communication scenarios, such as overseas travel and international exhibitions. Its core technology supports reverse control of translation length and uses streaming speech synthesis to make the translated output more natural and fluent. The model's performance surpasses the interpretation capabilities of Google Gemini 2.0 and OpenAI GPT-4o, with a simultaneous interpretation delay of under 5 seconds at its fastest, on par with human expert interpreters.
Today, iFlytek officially released its newly developed Spark simultaneous interpretation large speech model, the first large model in China with end-to-end speech simultaneous interpretation capabilities. Compared with iFlytek's previous translation technology, the new model significantly improves translation quality across all scenarios and greatly shortens end-to-end response time.
The release of the Spark simultaneous interpretation large speech model brings users a smoother and more accurate simultaneous interpretation experience. In iFlytek's demonstration, an iFlytek Translator equipped with the model delivered English-Chinese speech simultaneous interpretation with almost no delay, which makes it well suited to scenarios such as overseas travel and international exhibitions. The optimization significantly speeds up the display of translated subtitles while maintaining the accuracy of translation between Chinese and English.
According to reports, the Spark simultaneous interpretation large speech model supports reverse control of translation length. During end-to-end speech-to-text translation, it performs meaning-group segmentation, context understanding, and information reorganization in a streaming manner. Its streaming speech synthesis also supports prosodic linking between meaning groups and adaptive speech-rate adjustment, further improving the naturalness and fluency of the translated output.
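To make the idea of streaming meaning-group segmentation more concrete, here is a minimal Python sketch. It is purely illustrative and is not iFlytek's method: the function name `stream_meaning_groups`, the `BOUNDARY_TOKENS` set, and the `max_len` cap are all hypothetical, and a real system would rely on a learned model rather than punctuation and a fixed length limit. The sketch only shows the general pattern of emitting short chunks as soon as they are complete instead of waiting for the full sentence.

```python
from typing import Iterable, Iterator, List

# Hypothetical boundary markers; a real system would predict boundaries with a model.
BOUNDARY_TOKENS = {",", ".", "?", "!", ";"}


def stream_meaning_groups(tokens: Iterable[str], max_len: int = 6) -> Iterator[List[str]]:
    """Yield chunks of tokens as soon as a boundary or the length cap is reached."""
    buffer: List[str] = []
    for token in tokens:
        if token in BOUNDARY_TOKENS:
            # A boundary closes the current chunk, if there is one.
            if buffer:
                yield buffer
                buffer = []
            continue
        buffer.append(token)
        # The length cap bounds how long a listener waits for the next chunk.
        if len(buffer) >= max_len:
            yield buffer
            buffer = []
    # Flush whatever remains when the input stream ends.
    if buffer:
        yield buffer


if __name__ == "__main__":
    words = "welcome to the exhibition , our booth is in hall three .".split()
    for group in stream_meaning_groups(words):
        print(" ".join(group))
```

In this toy version, a smaller `max_len` trades context for lower delay, since chunks are handed on sooner. That is the same tension between latency and completeness that any simultaneous interpretation system has to balance.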
In international communication scenarios, from daily conversation and business communication to demanding industry-specific interpretation, the Spark simultaneous interpretation large speech model has performed strongly. Its content completeness, information accuracy, and language quality are all at an industry-leading level, surpassing translation technologies such as Google Gemini 2.0 and OpenAI GPT-4o. At its fastest, the simultaneous interpretation delay is within 5 seconds, on par with human expert interpreters.
The release of iFlytek's Spark simultaneous interpretation large speech model represents a major breakthrough in China's AI translation technology and suggests that international communication will become more convenient and efficient.
The arrival of the Spark simultaneous interpretation large speech model marks a new milestone in AI translation technology. Going forward, it is expected to better serve international exchange and cooperation and to make global communication more efficient and convenient. Continued advances in the technology will keep improving people's lives and open up more possibilities worldwide.