Zhipu AI has released CogVideoX, a new generation of its video generation model that is designed to significantly improve both the efficiency and the quality of video generation, marking a major step for the company in multimodal technology. CogVideoX introduces several technical innovations and is available to users through the Zhipu Qingyan platform, while enterprises and developers can access it through an API.
CogVideoX’s core technical features include:
Three-dimensional variational autoencoder (3D VAE): Developed in-house by Zhipu AI, this structure compresses the original video data to about 2% of its original size, which reduces the cost and difficulty of training. Combined with a 3D RoPE positional encoding module, it improves the model's ability to capture relationships between frames along the time dimension and to establish long-range dependencies within a video.
End-to-end video understanding model: This strengthens the model's ability to understand text and follow instructions, so that generated videos match user intent more closely and very long, complex prompts can be handled.
Transformer architecture that fuses the text, time, and space dimensions: An Expert Block is designed to align the text and video modality spaces, and a Full Attention mechanism optimizes the interaction between the two modalities.
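To make the "about 2%" figure concrete, the arithmetic behind a 3D VAE compression ratio can be sketched as follows. The article does not state the downsampling factors, so the values used here (4x along time, 8x along height and width, 16 latent channels) are assumptions chosen for illustration, and the class and function names are hypothetical. The sketch counts tensor elements only and ignores numeric precision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VaeShape:
    """Hypothetical 3D VAE downsampling factors (assumed, not from the article)."""
    temporal_factor: int = 4    # input frames merged into one latent frame (assumption)
    spatial_factor: int = 8     # downsampling along height and along width (assumption)
    latent_channels: int = 16   # channels in the latent tensor (assumption)


def element_count(frames, height, width, channels):
    """Number of scalar values in a frames x height x width x channels tensor."""
    return frames * height * width * channels


def compression_ratio(frames, height, width, vae=VaeShape(), rgb_channels=3):
    """Latent element count divided by raw RGB element count."""
    raw = element_count(frames, height, width, rgb_channels)
    latent = element_count(
        frames // vae.temporal_factor,
        height // vae.spatial_factor,
        width // vae.spatial_factor,
        vae.latent_channels,
    )
    return latent / raw


ratio = compression_ratio(frames=48, height=480, width=720)
print(f"latent size is {ratio:.2%} of the raw video")
```

Under these assumed factors the ratio reduces to 16 / (3 x 4 x 8 x 8), roughly 2%, regardless of clip length or resolution, as long as the dimensions divide evenly by the factors.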
The CogVideoX model is now live in the Zhipu Qingyan PC client, mobile app, and mini-program. Users can try AI text-to-video and image-to-video generation for free through the "Qingying" feature. Qingying's main strengths are fast generation, strong instruction following, content coherence, and flexible shot composition.
In addition, Qingying has been deployed on bigmodel.cn, Zhipu's open platform for large models, where enterprises and developers can use it through API calls. Zhipu AI says it has verified that scaling laws hold in video generation. It plans to keep expanding data and model scale, and to research new model architectures that compress video information more efficiently and fuse text and video content more fully.
Try it here: https://top.aibase.com/tool/qingying-ai-shipinshengchengfuwu
The launch of CogVideoX gives users a more convenient way to generate video with AI and signals that AI video generation is entering a new stage of development. Zhipu AI says it will continue to explore more advanced model architectures and technical approaches to keep advancing the technology.