On January 22, 2025, Shanghai Step Star Intelligent Technology Co., Ltd. released the V2 version of its video generation model Step-Video. This version has been significantly upgraded in several respects, adopting a more efficient VAE model and an optimized DiT architecture to improve the efficiency and quality of video generation. Step-Video V2 also incorporates a self-developed multimodal understanding large model and a video knowledge base so that generated videos are closer to the real world, and adds a basic text generation capability that further expands its application scenarios. The upgrade demonstrates Step Star's technical strength in video generation and gives creators more powerful tools for video creation.
On January 22, 2025, Shanghai Step Star Intelligent Technology Co., Ltd. announced that its video generation model Step-Video had been officially upgraded to V2. The upgrade brings significant technical breakthroughs and feature improvements, further strengthening the model's ability to simulate the real world.
Step-Video V2 has been optimized and innovated across several core technologies. First, this version uses a VAE with a higher compression ratio: by efficiently compressing the video in both space and time, it significantly reduces computational complexity and improves generation efficiency while preserving reconstruction quality. Second, Step-Video V2 deeply optimizes the DiT architecture and introduces reinforcement learning to further improve the smoothness and detail of generated video. In addition, this version integrates a self-developed multimodal understanding large model and a video knowledge base, allowing it to describe video content and camera language more accurately and to generate videos that are closer to the real world.
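To make the two components mentioned above concrete, the sketch below shows, in PyTorch, a generic video VAE encoder that compresses both space and time with strided 3D convolutions, followed by a single DiT-style transformer block operating on the flattened latent tokens. This is a minimal illustration of the general technique only; the module names, compression ratios, and dimensions are assumptions for illustration and do not reflect Step-Video V2's actual, unpublished architecture.

```python
# Illustrative sketch only: a generic spatio-temporal video VAE encoder feeding a
# DiT-style transformer block. Module names, compression ratios, and dimensions are
# assumptions, not Step-Video V2's actual design.
import torch
import torch.nn as nn

class VideoVAEEncoder(nn.Module):
    """Compresses a video in both space and time with strided 3D convolutions."""
    def __init__(self, in_channels=3, latent_channels=16):
        super().__init__()
        self.net = nn.Sequential(
            # each stride-2 stage halves time, height, and width
            nn.Conv3d(in_channels, 64, kernel_size=3, stride=2, padding=1),
            nn.SiLU(),
            nn.Conv3d(64, 128, kernel_size=3, stride=2, padding=1),
            nn.SiLU(),
            nn.Conv3d(128, latent_channels, kernel_size=3, stride=2, padding=1),
        )

    def forward(self, video):          # video: (B, C, T, H, W)
        return self.net(video)         # latent: (B, latent_channels, T/8, H/8, W/8)

class DiTBlock(nn.Module):
    """A single diffusion-transformer block over flattened latent tokens."""
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, tokens):         # tokens: (B, N, dim)
        h = self.norm1(tokens)
        tokens = tokens + self.attn(h, h, h, need_weights=False)[0]
        return tokens + self.mlp(self.norm2(tokens))

if __name__ == "__main__":
    clip = torch.randn(1, 3, 16, 64, 64)            # 16 frames of 64x64 RGB
    latent = VideoVAEEncoder()(clip)                # compressed in time and space
    tokens = latent.flatten(2).transpose(1, 2)      # (B, N, C) token sequence
    tokens = nn.Linear(latent.shape[1], 512)(tokens)
    out = DiTBlock()(tokens)
    print(latent.shape, out.shape)
```

In designs of this kind, each additional stride-2 stage shrinks the token count by a factor of eight (2x in each of time, height, and width), which is why a higher-compression VAE directly lowers the computational cost of the transformer that follows.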
In practical applications, Step-Video V2 has demonstrated strong complex-motion generation, smoothly rendering dynamic scenes such as ballet, karate, and badminton. The model also performs well at capturing human expressions, delicately rendering the facial expressions and lighting of real or fictional characters. In addition, Step-Video V2 supports a rich camera language, including push, pull, pan, and tracking movements as well as cuts between scenes, offering more possibilities for video creation.
It is also worth noting that Step-Video V2 adds a basic text generation capability, integrating text naturally into video content with results noticeably better than the previous generation of the model. This feature further expands the application scenarios of video generation.
Step-Video V2 is currently open for trial applications on the Yuewen website (https://yuewen.cn/videos), where users can experience the upgraded capabilities.
This upgrade not only marks Step Star's technical progress in video generation, but also gives creators more powerful tools, pushing video creation into a new stage.
The Step-Video V2 upgrade opens new possibilities for video creation. Its capabilities and ease of use promise users a more convenient and efficient creation experience, and its future development and applications are worth watching.