Alibaba launches AtomoVideo, a text-and-image-to-video model comparable to Gen-2 and Pika

Author: Eve Cole | Update Time: 2025-01-04 19:48:01

Alibaba's technical team recently released AtomoVideo, an AI model that generates high-fidelity video from text and images. Its main technical contributions are multi-granularity image injection and temporal modeling, and in evaluations it performed comparably to commercial models. The release marks notable progress in video generation, opens new options for video content creation, and points to continued advances in multimodal AI content generation.

In short, AtomoVideo generates high-fidelity video from text and images. Its technical innovations are multi-granularity image injection and temporal modeling, and evaluations show it to be comparable to commercial models, bringing new possibilities to the field of video generation.

AtomoVideo not only improves the efficiency and quality of video generation but also gives video creators across industries new tools and ideas. As the technology continues to advance, the model is expected to find use in more fields and to enable a wider range of video content.