A research team from the National University of Singapore has developed OminiControl, a new image generation framework that significantly improves the flexibility and efficiency of controllable image generation. By cleverly combining image conditioning with a pre-trained Diffusion Transformer (DiT), it achieves a new level of control: even complex subject integration becomes straightforward. In this article, the Downcodes editor takes an in-depth look at what makes OminiControl unique and the changes it brings to the field of image generation.
Simply put, given a reference image, OminiControl can integrate the subject of that image into a newly generated picture. For example, the Downcodes editor uploaded the reference image on the left and entered the prompt "The chip man is placed next to the table in a doctor's office, with a stethoscope placed on the table." The generated result is fairly ordinary, as shown below:
The core of OminiControl is its "parameter reuse mechanism", which lets the DiT model handle image conditions with very few additional parameters: compared to existing methods, OminiControl needs only about 0.1% more parameters to achieve its full functionality. Furthermore, it handles multiple image-conditioning tasks in a unified way, covering both subject-driven generation and spatially aligned conditions such as edge maps and depth maps. This flexibility is particularly valuable for subject-driven generation tasks.
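The parameter-reuse idea can be illustrated with a minimal sketch (my own simplification, not the authors' code): condition-image tokens are appended to the noisy-latent token sequence and processed by the same frozen pre-trained weights, while only a small LoRA-style low-rank adapter is trained on top. This is how the extra parameter count can stay around 0.1% of the base model.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2048                     # hidden size (toy value standing in for the DiT width)
n_latent, n_cond = 16, 16    # token counts for noisy latents and the condition image

# Frozen pre-trained projection, reused for BOTH token streams.
W_frozen = rng.normal(size=(d, d))

# Tiny trainable low-rank (LoRA-style) adapter -- the only new parameters.
rank = 1
A = rng.normal(size=(d, rank)) * 0.01
B = rng.normal(size=(rank, d)) * 0.01

latent_tokens = rng.normal(size=(n_latent, d))
cond_tokens = rng.normal(size=(n_cond, d))

# Parameter reuse: concatenate both streams into one sequence and run it
# through the shared frozen weights plus the low-rank trainable path.
seq = np.concatenate([latent_tokens, cond_tokens], axis=0)
out = seq @ W_frozen + seq @ A @ B

extra = A.size + B.size
overhead = extra / W_frozen.size
print(f"extra parameter overhead: {overhead:.1%}")  # prints "extra parameter overhead: 0.1%"
```

With rank 1 and width 2048, the adapter adds 2·d parameters against d² frozen ones, i.e. roughly 0.1%, matching the order of magnitude the article cites.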
The research team also emphasized that OminiControl acquires these capabilities by training on images generated by the model itself, which is particularly important for subject-driven generation. In extensive evaluations, OminiControl significantly outperformed existing UNet-based models and DiT adapters on both subject-driven generation and spatially aligned conditional generation tasks. This result opens up new possibilities for the creative field.
To support broader research, the team also released a training dataset called Subjects200K, which contains more than 200,000 identity-consistent images, along with an efficient data synthesis pipeline. The dataset gives researchers a valuable resource for further exploring subject-consistent generation.
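The article does not describe the synthesis pipeline itself, but the core idea behind identity-consistent pairs can be sketched as follows (a hypothetical illustration, not the authors' pipeline): the same subject description is inserted into two different scene templates, so the two generated images share one identity while their contexts differ.

```python
# Illustrative sketch of building paired prompts for identity-consistent
# image synthesis. The templates and subject below are made up; they are
# not taken from the actual Subjects200K pipeline.
SCENES = [
    "sitting on a wooden desk in a sunlit office",
    "standing on a rock by a mountain lake at dawn",
]

def make_prompt_pair(subject: str) -> tuple[str, str]:
    """Return two prompts describing the SAME subject in DIFFERENT scenes,
    so the two generated images form an identity-consistent training pair."""
    return tuple(f"A photo of {subject}, {scene}" for scene in SCENES)

pair = make_prompt_pair("a small robot made of computer chips")
for p in pair:
    print(p)
```

Feeding such prompt pairs to a generator and keeping pairs whose subjects match is one plausible way such a dataset could be scaled to 200,000+ images.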
The launch of OminiControl not only improves the efficiency and quality of image generation, but also opens up more possibilities for artistic creation.
Online demo: https://huggingface.co/spaces/Yuanshi/OminiControl
GitHub: https://github.com/Yuanshi9815/OminiControl
Paper: https://arxiv.org/html/2411.15098v2
The emergence of OminiControl marks a significant step forward in image generation technology. Its efficient parameter reuse mechanism and strong control capabilities open new paths for artistic creation and scientific research. As the technology continues to develop, OminiControl is likely to play a role in ever more fields and deliver even more impressive image generation experiences.