MDTv2 released: training of DiT, Sora’s core component, becomes 10 times faster

Author: Eve Cole  Update Time: 2025-02-10 13:00:04

MDTv2, the latest work from the team of Yan Shuicheng and Cheng Mingming, marks a significant advance in AI image generation. The model improves on DiT, the core component of Sora, greatly speeding up training and achieving the best results on the ImageNet benchmark. Its core innovation is the Masked Diffusion Transformer, which effectively addresses a bottleneck of diffusion models, namely their difficulty in learning semantic relationships. The result is a clear gain in both the quality and the efficiency of image generation, setting a new bar for AI image generation technology.

The article focuses on:

The team of Yan Shuicheng and Cheng Mingming has released MDTv2, which speeds up the training of DiT, the core component of Sora, and sets a new best result on the ImageNet benchmark. By introducing the Masked Diffusion Transformer, the model overcomes the difficulty diffusion models have in learning semantic relationships. MDTv2 improves markedly on both training speed and generation quality.
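The article does not describe the mechanism in detail. As a rough illustration of the masking idea only, the sketch below hides a random subset of latent image tokens so that a model would have to infer them from the visible ones during training. The function name, the mask ratio, and the patch labels are hypothetical and are not taken from the MDTv2 code.

```python
import random

def mask_latent_tokens(tokens, mask_ratio=0.3, seed=None):
    """Split latent tokens into visible and masked subsets.

    Illustrative only: a model trained with masking would see the
    visible tokens and be asked to reconstruct the masked ones.
    """
    rng = random.Random(seed)
    n = len(tokens)
    num_masked = int(n * mask_ratio)
    # Choose which positions to hide, without repetition.
    masked_idx = set(rng.sample(range(n), num_masked))
    visible = [(i, t) for i, t in enumerate(tokens) if i not in masked_idx]
    masked = [(i, t) for i, t in enumerate(tokens) if i in masked_idx]
    return visible, masked

# Hypothetical 4x4 grid of latent patches, flattened to 16 tokens.
tokens = [f"patch_{i}" for i in range(16)]
visible, masked = mask_latent_tokens(tokens, mask_ratio=0.25, seed=0)
```

Keeping the original positions alongside each token lets the masked entries be placed back in the right locations when the full sequence is reconstructed.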

The significance of MDTv2 lies not only in its strong performance but also in its improvements to diffusion model techniques, which point to a new direction for AI image generation. More applications and research building on MDTv2 are likely to follow, driving further progress in the field.