Caiyun Technology has released its general-purpose large model "Yun Jintianzhang" and version V3.5 of Caiyun Xiaomeng, both built on the DCFormer architecture, marking a major breakthrough in model-architecture efficiency for AI. By replacing the fixed head combinations of conventional Transformers with a dynamically composable multi-head attention mechanism, DCFormer significantly improves model expressiveness, addresses the inefficiency of traditional Transformer architectures, and offers a response to the energy challenges facing AI development. The work was published at ICML, a top international conference, where it was highly rated.
In the AI field, the Transformer architecture has long been the core technical foundation of mainstream large models such as ChatGPT and Gemini. This year, Caiyun Technology's paper "Improving Transformers with Dynamically Composable Multi-Head Attention", published at ICML, a top international conference, was the first to propose the DCFormer architecture. Tests show that DCPythia-6.9B, a model built on this architecture, achieves a significant 1.7-2x performance improvement over the traditional Transformer model.

On the energy challenges facing AI development, Yuan Xingyuan, CEO of Caiyun Technology, pointed out that, according to forecasts, global AI power consumption could reach 8 times the earth's current electricity generation capacity by 2050. Nvidia CEO Jensen Huang put it more vividly: at the current pace of development, AI may in the future need "14 planets, 3 galaxies, and 4 suns" to supply its energy.

Facing this dilemma, Caiyun Technology chose to start from the model's underlying architecture. By introducing a Dynamically Composable Multi-Head Attention (DCMHA) mechanism, DCFormer removes the fixed binding of attention heads in the traditional multi-head attention (MHA) module, allowing heads to be composed flexibly and dynamically and thereby greatly improving the model's expressiveness (see the sketch below). This innovation earned Caiyun Technology a high average score of 7 across three papers at the ICML conference and made it one of only two Chinese companies invited to present at ICML 2024 in Vienna.

As the first product built on the DCFormer architecture, the new version of Caiyun Xiaomeng performs impressively: it supports long-text input of up to 10,000 words, story background settings can run to 10,000 words, and overall fluency and coherence improve by 20%. In practice, this means the AI can better maintain plot coherence, keep characters' personalities consistent, and reflect on and revise plot developments.

As one of the earliest Chinese companies to work on large language models, Caiyun Technology currently operates three profitable AI products: Caiyun Weather, Caiyun Xiaomeng, and Caiyun Xiaoyi. The company says it will continue to increase its R&D investment in DCFormer, aiming to break the traditional pattern of a "foreign technology layer and domestic application layer" and help domestic AI technology take an advantageous position in global competition. With this breakthrough, Caiyun Technology not only demonstrates the strength of Chinese companies in innovating AI's underlying architecture, but also offers a new approach to the energy bottleneck in AI development, which is expected to accelerate the sustainable development of AI technology.
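To make the DCMHA idea above more concrete, the following is a minimal, hypothetical PyTorch sketch, not Caiyun Technology's implementation. The class name ComposableAttentionSketch, the identity-initialized static mixing matrix, and the simplified per-query sigmoid gate are all illustrative assumptions; the published DCFormer composes attention maps across heads with both static and input-dependent weights in a more elaborate way. The sketch keeps only the core idea: attention score maps are no longer bound one-to-one to heads, but are linearly recombined across heads in a way that depends on the input.

```python
# Minimal, illustrative sketch of cross-head composition in the spirit of
# DCMHA. NOT the official DCFormer implementation; names, shapes, and the
# simplified gating below are assumptions made for clarity.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ComposableAttentionSketch(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.h, self.dk = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=False)
        self.out = nn.Linear(d_model, d_model, bias=False)
        # Static cross-head mixing matrix; identity initialization makes the
        # static branch start out as a no-op (i.e. standard MHA routing).
        self.static_mix = nn.Parameter(torch.eye(n_heads))
        # Projects each query position to a dynamic per-head gate, so head
        # composition depends on the input rather than being fixed.
        self.gate_proj = nn.Linear(self.dk, n_heads, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # (B, T, d_model) -> (B, n_heads, T, head_dim)
        q, k, v = (t.view(B, T, self.h, self.dk).transpose(1, 2)
                   for t in (q, k, v))
        scores = q @ k.transpose(-2, -1) / self.dk ** 0.5      # (B, h, T, T)
        # Static composition: each output head's score map is a learned
        # linear combination of ALL heads' score maps, removing the fixed
        # one-to-one binding between score maps and heads.
        mixed = torch.einsum('ij,bjqk->biqk', self.static_mix, scores)
        # Dynamic composition (greatly simplified here): a query-dependent
        # gate rescales each composed head at every query position.
        gate = torch.sigmoid(self.gate_proj(q.mean(dim=1)))    # (B, T, h)
        mixed = mixed * gate.permute(0, 2, 1).unsqueeze(-1)    # (B, h, T, T)
        # Causal mask and softmax, as in a standard decoder-only model.
        causal = torch.triu(torch.full((T, T), float('-inf'),
                                       device=x.device), diagonal=1)
        attn = F.softmax(mixed + causal, dim=-1)
        y = attn @ v                                           # (B, h, T, dk)
        return self.out(y.transpose(1, 2).reshape(B, T, self.h * self.dk))

if __name__ == "__main__":
    layer = ComposableAttentionSketch(d_model=64, n_heads=8)
    x = torch.randn(2, 16, 64)
    print(layer(x).shape)  # torch.Size([2, 16, 64])
```

The identity initialization is a deliberate choice in this sketch: before training, the static branch reduces to ordinary multi-head attention, so any cross-head composition the model learns is a strict addition to the baseline behavior.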
Caiyun Technology's innovation brings new hope to the development of AI. Its breakthrough at the level of the underlying architecture should drive continued progress in AI technology and help it secure a favorable position in global competition; its future development is worth watching.