The Luchen team's open-source Open-Sora project has made major progress in 720p high-definition video generation, combining fast generation with high-quality output. The project has quickly gained over 17.5K stars on GitHub and drawn widespread industry attention; Lambda Labs has even built a digital LEGO universe on top of its model weights. Open-Sora is easy to use, as convenient as ordering takeout, and more importantly it releases both the model weights and a detailed technical roadmap, so that more developers and enthusiasts can take part in advancing text-to-video technology.
Recently, the Luchen Open-Sora team made a breakthrough in both the quality and the generation time of 720p high-definition text-to-video. On top of that, they open-sourced the whole thing, and the community is excited!
It is no exaggeration to say that their open-source project makes video generation as easy as ordering takeout. Since its debut in March, it has collected 17.5K stars on GitHub.
Open source address: https://github.com/hpcaitech/Open-Sora
Open-Sora can generate 16-second 720p high-definition videos with one click. Exquisite portraits, sci-fi blockbuster scenes, lively animations, and smooth zoom effects are all within its reach. Even Lambda Labs, an AI company in which Nvidia holds a stake, has created a digital LEGO universe based on the Open-Sora model weights, giving LEGO fans a new space for creativity.
The Luchen team not only open-sourced the model weights but also published its technical roadmap on GitHub, so that anyone can get hands-on with a large video model. The technical report analyzes the key points of model training in depth, from the video compression network to the diffusion model algorithm to controllability. The team uses a 1.1B-parameter diffusion model to address the pain points of video model training.
Report address: https://github.com/hpcaitech/Open-Sora/blob/main/docs/report_03.md
Like OpenAI's Sora, Open-Sora introduces a video compression network. It compresses the time dimension by a factor of 4 without dropping frames, so videos can be generated at their original FPS. The team proposes a simple video compression network (i.e., a VAE) that first applies 8×8 compression in the spatial dimensions and then 4× compression in the time dimension.
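To make the compression factors above concrete, here is a minimal sketch of how they shrink a clip into a latent grid. It is not the project's code: the function name `latent_shape`, the 24 FPS frame rate, and the use of ceiling division for sizes that do not divide evenly are all assumptions made for illustration.

```python
import math

def latent_shape(frames, height, width, temporal_factor=4, spatial_factor=8):
    """Return (latent_frames, latent_height, latent_width) after compression.

    Hypothetical helper: ceiling division is an assumption about how
    sizes that are not exact multiples of the factors would be handled.
    """
    return (
        math.ceil(frames / temporal_factor),
        math.ceil(height / spatial_factor),
        math.ceil(width / spatial_factor),
    )

# Hypothetical clip: 16 seconds at an assumed 24 FPS, 1280x720 pixels.
frames = 16 * 24
shape = latent_shape(frames, 720, 1280)

# Ratio of pixel-space positions to latent positions (channels ignored).
reduction = (frames * 720 * 1280) / (shape[0] * shape[1] * shape[2])
print(shape, reduction)
```

Under these assumptions the 384-frame clip maps to a 96 × 90 × 160 latent grid, which is 4 × 8 × 8 = 256 times fewer positions for the diffusion model to process.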
Stable Diffusion 3, the latest diffusion model in that family, improves generation quality through rectified flow. The techniques adopted by Luchen's team include rectified flow training and logit-normal timestep sampling, among others, which speed up model training and reduce inference waiting time.
The report also reveals the core details of model training, including data cleaning, model tuning techniques, and the construction of a model evaluation system. The team also provides a Gradio application that can be deployed with one click and supports adjusting various parameters.
By going open source, Luchen Open-Sora breaks open a closed ecosystem and injects vitality into the innovation and development of text-to-video. Users move from being content consumers to being creators, and enterprise users gain the ability to develop independently.
Open-sourcing Open-Sora lowers the entry barrier to text-to-video technology and opens up many possibilities for future creative content generation. Its further development and the exploration of more application scenarios are worth looking forward to.
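As a rough illustration of the two ideas named above, the sketch below builds one rectified flow training pair with a timestep drawn from a logit-normal distribution. It follows the general formulation (a straight-line path between data and noise, with a constant velocity as the regression target) and is not taken from the Open-Sora code base. The function names, the choice of t = 0 for data and t = 1 for noise, and the mean and standard deviation of 0 and 1 are assumptions.

```python
import math
import random

def sample_logit_normal_t(rng, mean=0.0, std=1.0):
    """Draw t in (0, 1) by passing a Gaussian sample through a sigmoid.

    This concentrates samples around the middle of the path instead of
    spreading them uniformly over [0, 1].
    """
    u = rng.gauss(mean, std)
    return 1.0 / (1.0 + math.exp(-u))

def rectified_flow_pair(x0, noise, t):
    """Return the noisy input x_t and the velocity target for one sample.

    Assumed convention: t = 0 is clean data, t = 1 is pure noise.
    """
    # Point on the straight line between data and noise.
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, noise)]
    # The velocity along a straight line is constant: noise minus data.
    velocity = [b - a for a, b in zip(x0, noise)]
    return x_t, velocity

rng = random.Random(0)
x0 = [0.5, -1.0, 2.0]                      # stand-in for a latent vector
noise = [rng.gauss(0.0, 1.0) for _ in x0]  # Gaussian noise of the same size
t = sample_logit_normal_t(rng)
x_t, v = rectified_flow_pair(x0, noise, t)
```

In training, a network would receive `x_t` and `t` and be asked to predict `v`. Because the path is a straight line, sampling can in principle take fewer integration steps, which is one plausible reading of the reduced inference time mentioned above.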