ByteDance has unveiled OmniHuman, a new AI system that can generate realistic full-body videos from a single photo, showing a character speaking, singing, and moving naturally. The technology combines multiple inputs such as text, audio, and human motion, and uses a "full-condition" training approach to learn from massive amounts of data, significantly improving video quality and surpassing earlier AI models that could only handle the face or upper body. OmniHuman signals coming changes in digital entertainment and communication, opening up new possibilities for video creation, educational content production, and digital communication.
OmniHuman generates full-body videos that capture a character's gestures and body dynamics while speaking, going beyond AI models that can only simulate the face or upper body. At its core, the technology combines multiple inputs such as text, audio, and human motion, allowing the AI to learn from larger and richer datasets through an innovative approach called "full-condition" training.
The research team reports that OmniHuman shows significant progress after training on more than 18,700 hours of human video data. By introducing multiple conditioning signals (such as text, audio, and pose), the approach not only improves the quality of generated video but also makes more effective use of training data that would otherwise be wasted.
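To make the idea of mixed-condition training concrete, the sketch below shows one simple way multiple conditioning signals could be fused while allowing samples that lack some signals to still contribute to training. This is only an illustrative toy, not ByteDance's actual architecture; the module name, dimensions, and fusion-by-summation choice are assumptions made for the example.

```python
import torch
import torch.nn as nn

class MultiConditionFusion(nn.Module):
    """Toy illustration of mixing multiple conditioning signals.

    Each condition (text, audio, pose) is projected to a shared hidden size
    and summed. Conditions missing for a given sample are masked to zero, so
    clips annotated with only some signals can still be used for training,
    which is the "less data is wasted" idea described in the article.
    NOTE: hypothetical sketch; not the OmniHuman implementation.
    """

    def __init__(self, text_dim=512, audio_dim=128, pose_dim=64, hidden=256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, hidden)
        self.audio_proj = nn.Linear(audio_dim, hidden)
        self.pose_proj = nn.Linear(pose_dim, hidden)

    def forward(self, text, audio, pose, mask):
        # mask: (batch, 3) booleans marking which conditions are available
        # for each sample; unavailable conditions contribute zeros.
        feats = torch.stack(
            [self.text_proj(text), self.audio_proj(audio), self.pose_proj(pose)],
            dim=1,
        )  # (batch, 3, hidden)
        feats = feats * mask.unsqueeze(-1).float()
        return feats.sum(dim=1)  # fused conditioning vector per sample


if __name__ == "__main__":
    fusion = MultiConditionFusion()
    text = torch.randn(4, 512)
    audio = torch.randn(4, 128)
    pose = torch.randn(4, 64)
    # Sample 0 has all conditions, sample 1 lacks pose, sample 2 is audio-only, etc.
    mask = torch.tensor([[1, 1, 1],
                         [1, 1, 0],
                         [0, 1, 0],
                         [1, 0, 1]], dtype=torch.bool)
    cond = fusion(text, audio, pose, mask)
    print(cond.shape)  # torch.Size([4, 256])
```

The key point the example illustrates is the masking: rather than discarding clips that lack, say, pose annotations, they are kept in the batch with the missing condition zeroed out, so weaker-labeled data still improves the model.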
In a paper published on arXiv, the researchers note that although end-to-end human animation has made significant progress in recent years, existing methods still face limitations when scaling up to broader applications.
OmniHuman has broad application potential, from producing speech videos to demonstrating musical instrument performances. In testing, the technology outperforms existing systems on multiple quality benchmarks. The development comes amid intensifying competition in AI video generation, with companies such as Google, Meta, and Microsoft actively pursuing similar technologies.
However, while OmniHuman promises to transform entertainment production, educational content creation, and digital communication, it has also raised concerns about potential misuse of synthetic media. The research team plans to present their results at an upcoming computer vision conference, though the specific venue and date have not yet been announced.
Paper: https://arxiv.org/pdf/2502.01061
Key points:
OmniHuman is a new AI model that can transform a single photo into a realistic full-body video.
Trained on more than 18,700 hours of human video data, the technology combines multiple input signals to improve generation quality.
Despite its broad application potential, it has also raised concerns about the possible misuse of synthetic media.
OmniHuman's breakthrough sets a new benchmark for AI video generation, but its potential ethical risks also deserve attention. Future applications should be handled with care to ensure the technology is used responsibly and negative impacts are avoided. More applications and research results building on OmniHuman can be expected.