The editor of Downcodes reports: The Zhipu AI technical team today open-sourced CogVideoX v1.5, a major upgrade of its video generation model series since its August release. The new version makes significant breakthroughs in video generation, supporting longer videos, higher resolutions, and smoother frame rates. Combined with the newly launched CogSound sound-effect model, it powers the upgraded "New Qingying" platform, giving users a higher-quality video creation experience. This update not only improves video quality but also strengthens the model's understanding of complex semantics, providing developers with more powerful tools.
This update substantially upgrades the generation specs: video lengths of 5 and 10 seconds, 768P resolution, and 16 fps generation. The I2V (image-to-video) model now also supports arbitrary aspect ratios, and the model's understanding of complex semantics is further enhanced.
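As a back-of-the-envelope check of what these settings imply, the sketch below computes raw frame and pixel counts. The exact width for "768P" is an assumption (the article gives only the vertical resolution), so `WIDTH` below is hypothetical.

```python
# Rough frame/pixel counts for the advertised CogVideoX v1.5 settings.
# WIDTH is an assumption: the article states only "768P" (768 vertical
# lines), not the horizontal resolution.

FPS = 16       # frames per second reported for v1.5
HEIGHT = 768   # "768P" vertical resolution
WIDTH = 1360   # assumed width, for illustration only

def frame_count(seconds: int, fps: int = FPS) -> int:
    """Total frames for a clip of the given duration."""
    return seconds * fps

for seconds in (5, 10):
    frames = frame_count(seconds)
    pixels = frames * HEIGHT * WIDTH
    print(f"{seconds}s clip: {frames} frames, {pixels:,} raw pixels")
```

At 16 fps, a 5-second clip is 80 frames and a 10-second clip is 160 frames, which is the sequence length the model must keep coherent.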
CogVideoX v1.5 comprises two models, CogVideoX v1.5-5B (text-to-video) and CogVideoX v1.5-5B-I2V (image-to-video), designed to give developers more powerful video generation tools.
Even more noteworthy, CogVideoX v1.5 launches simultaneously on the Qingying platform, where, combined with the newly released CogSound sound-effect model, it becomes the "New Qingying". New Qingying offers several distinctive capabilities, including marked improvements in video quality, aesthetics, and motion plausibility, and supports generating 10-second, 4K, 60 fps ultra-high-definition videos.
The official introduction is as follows:
Quality improvement: image-to-video generation is significantly enhanced in quality, aesthetics, motion plausibility, and the semantic understanding of complex prompts.
Ultra-HD resolution: supports generating 10-second, 4K, 60 fps ultra-high-definition videos.
Variable aspect ratio: supports any aspect ratio to suit different playback scenarios.
Multi-channel output: the same prompt or image can generate 4 videos at once.
AI video with sound effects: New Qingying can generate sound effects that match the footage.
On the data side, the CogVideoX team focused on improving data quality, building an automated framework to filter out low-quality video data and releasing the end-to-end video understanding model CogVLM2-caption to generate accurate content descriptions. This captioning model handles complex instructions effectively and helps ensure the generated video matches the user's intent.
To improve content coherence, CogVideoX uses an efficient three-dimensional variational autoencoder (3D VAE), which significantly reduces training cost and difficulty. In addition, the team developed a Transformer architecture that fuses the text, temporal, and spatial dimensions: by removing the traditional cross-attention module in favor of joint attention over text and video tokens, it strengthens text-video interaction and improves generation quality.
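The cost saving from the 3D VAE comes from shrinking the sequence the Transformer must model. The sketch below illustrates the latent-shape arithmetic; the 4x temporal and 8x spatial compression factors are assumptions typical of 3D causal VAEs of this kind, not figures stated in this article, and the clip width is hypothetical.

```python
# Sketch of 3D VAE latent-shape arithmetic for video generation.
# Compression factors are assumptions (typical of 3D causal VAEs):
# 4x along time, 8x along each spatial axis.

TEMPORAL_FACTOR = 4  # assumed temporal compression
SPATIAL_FACTOR = 8   # assumed spatial compression per axis

def latent_shape(frames: int, height: int, width: int) -> tuple[int, int, int]:
    """(T, H, W) of the latent video the Transformer actually models."""
    return (
        frames // TEMPORAL_FACTOR,
        height // SPATIAL_FACTOR,
        width // SPATIAL_FACTOR,
    )

# A 10 s, 16 fps, 768P clip (width of 1360 assumed for illustration):
t, h, w = latent_shape(frames=160, height=768, width=1360)
print(f"latent grid: {t} x {h} x {w} = {t * h * w:,} positions")
```

Under these assumed factors, a 160-frame 768P clip collapses from over 167 million pixel positions to a few hundred thousand latent positions, which is what makes long, high-resolution training tractable.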
Going forward, the Zhipu technical team plans to keep scaling data volume and model size and to explore more efficient model architectures for a better video generation experience. Open-sourcing CogVideoX v1.5 not only gives developers powerful tools but also injects new vitality into the field of video creation.
Code: https://github.com/thudm/cogvideo
Model: https://huggingface.co/THUDM/CogVideoX1.5-5B-SAT
Highlights:
CogVideoX v1.5 is open source, supporting 5- and 10-second videos, 768P resolution, and 16 fps generation.
The New Qingying platform launches alongside it, combining with the CogSound sound-effect model and offering ultra-high-definition 4K video generation.
Innovations in data processing and model architecture ensure the quality and coherence of the generated videos.
In summary, the open-sourcing of CogVideoX v1.5 and the launch of the New Qingying platform mark an important step forward for AI video generation, bringing more powerful tools and broader creative possibilities to developers and creators alike. We look forward to seeing more exciting applications built on CogVideoX.