Nvidia's latest open-source image generation model, SANA, has made waves in AI image generation with its small size and strong performance. SANA has only 600 million parameters, yet it can generate high-resolution images of up to 4096 × 4096 pixels and reaches sub-second generation speeds on a 16GB graphics card. This comes from its deep compression autoencoder and linear diffusion transformer, together with optimized text encoding and inference strategies. Its performance stands out among comparable models and holds up even against models with far more parameters.
Nvidia recently open-sourced an image generation model called SANA. The model has only 600 million parameters, which greatly lowers the hardware requirements for running it.
SANA can reportedly generate images at up to 4096 × 4096 resolution and runs on a 16GB graphics card, where it produces a high-quality 1024 × 1024 image in less than 1 second. That speed stands out among comparable models.
First, the research team introduced a deep compression autoencoder (DC-AE). Compared with traditional autoencoders, SANA's compression ratio is as high as 32 times, which greatly reduces the number of latent tokens. This is important for generating ultra-high-resolution images. Second, SANA uses a linear diffusion transformer (DiT), which replaces traditional quadratic attention with linear attention, reducing the complexity to O(N). The design also aims to preserve the model's ability to capture local detail. Together, these choices make SANA about 1.7 times faster when generating 4K images.
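The two claims above, fewer tokens and O(N) attention, can be illustrated with a minimal pure-Python sketch. It is not SANA's actual code: the 8× baseline compression factor, the patch size of 1, the ReLU feature map, and all function names are assumptions chosen for illustration.

```python
def latent_tokens(image_size: int, downsample: int, patch_size: int = 1) -> int:
    """Number of latent tokens for a square image (patch_size=1 is an assumption)."""
    side = image_size // (downsample * patch_size)
    return side * side


def relu(x: float) -> float:
    return x if x > 0.0 else 0.0


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def quadratic_relu_attention(q, k, v):
    """Reference form: builds the full N x N similarity matrix, so cost is O(N^2)."""
    fq = [[relu(x) for x in row] for row in q]
    fk = [[relu(x) for x in row] for row in k]
    scores = matmul(fq, transpose(fk))  # N x N
    out = matmul(scores, v)             # N x d
    return [[x / (sum(s) + 1e-6) for x in o] for o, s in zip(out, scores)]


def linear_relu_attention(q, k, v):
    """Same result, but K^T V is computed first, so cost grows linearly in N."""
    fq = [[relu(x) for x in row] for row in q]
    fk = [[relu(x) for x in row] for row in k]
    kv = matmul(transpose(fk), v)            # d x d summary of all keys and values
    k_sum = [sum(col) for col in zip(*fk)]   # length-d sum of key features
    out = matmul(fq, kv)                     # N x d
    norms = [sum(a * b for a, b in zip(row, k_sum)) + 1e-6 for row in fq]
    return [[x / n for x in o] for o, n in zip(out, norms)]


# Token counts for a 4096 x 4096 image: 32x compression vs. an assumed 8x baseline.
tokens_32x = latent_tokens(4096, 32)
tokens_8x = latent_tokens(4096, 8)
```

Under these assumptions, the 32× autoencoder yields 16 times fewer tokens than the 8× baseline, and the linear form never materializes the N × N matrix, which is where the savings at high resolution would come from.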
For text encoding, SANA uses Gemma, a small decoder-only large language model, in place of the traditional T5 model. Gemma is better at understanding and following complex instructions, which strengthens the alignment between images and text. SANA also optimizes its training and inference strategies: it improves text-image consistency by captioning images automatically and selecting the captions with high CLIP scores. The newly proposed Flow-DPM-Solver algorithm reduces the number of inference steps to 14-20, which significantly improves performance.
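The caption selection step can be sketched as follows. The article only says that captions with high CLIP scores are selected; the temperature-weighted softmax sampling, the function names, and the example captions and scores below are assumptions for illustration, not SANA's published implementation.

```python
import math
import random


def caption_probabilities(clip_scores, temperature=0.1):
    """Softmax over CLIP scores; a lower temperature favors the top caption more."""
    scaled = [s / temperature for s in clip_scores]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def pick_caption(captions, clip_scores, temperature=0.1, rng=None):
    """Sample one caption, weighted toward those with higher CLIP scores."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    probs = caption_probabilities(clip_scores, temperature)
    return rng.choices(captions, weights=probs, k=1)[0]


# Hypothetical captions for one training image, with made-up CLIP scores.
captions = [
    "a tabby cat playing in tall grass under a starry sky",
    "a cat outdoors at night",
    "an animal in a field",
]
scores = [0.31, 0.27, 0.22]
probs = caption_probabilities(scores)
```

Sampling rather than always taking the top caption keeps some variety in the training text while still favoring the captions that match the image best; as the temperature approaches zero, the choice collapses to the highest-scoring caption.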
In overall performance, SANA compares well with several state-of-the-art text-to-image diffusion models. At 512 × 512 resolution, the throughput of SANA-0.6B is 5 times that of PixArt-Σ, and its image quality is excellent. At 1024 × 1024 resolution, SANA-0.6B also has a significant advantage among models with fewer than 3 billion parameters.
SANA-0.6B not only performs strongly but can also generate images quickly on a 16GB laptop GPU, helping content creators reach their creative goals efficiently. SANA-0.6B is also said to be competitive with FLUX-12B: it has only 1/20 of the parameters but is 100 times faster.
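A rough back-of-the-envelope calculation shows why the parameter count matters on a 16GB GPU. The sketch below assumes 16-bit weights (2 bytes per parameter) and counts the diffusion model's weights only; activations, the text encoder, and the autoencoder are ignored, and the function name is hypothetical.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


sana_params = 0.6e9   # SANA-0.6B
flux_params = 12e9    # FLUX-12B

sana_gb = weight_memory_gb(sana_params)   # about 1.2 GB at 16-bit precision
flux_gb = weight_memory_gb(flux_params)   # about 24 GB at 16-bit precision
param_ratio = sana_params / flux_params   # 0.05, i.e. 1/20
```

Under these assumptions, the 12-billion-parameter model's weights alone would exceed a 16GB card, while the 0.6-billion-parameter model leaves most of the memory free for activations and the other components.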
Interestingly, SANA accepts prompts in English, Chinese, and emoji. Users can enter Chinese poems to generate artistic images based on them. SANA also includes a safety mechanism: when a user enters inappropriate words, the system automatically replaces them with a red heart to avoid generating objectionable content.
For example, AIBASE entered the prompt "A cat is playing in the grass, the stars"; generation was fast and the result was distinctive.
In another example, given the prompt "A cute one, ink painting style", the model accurately recognized the emoji.
It is worth mentioning that SANA has official ComfyUI support and comes with LoRA training tools, which makes it more convenient and considerably more practical to use. Interested readers can try it for themselves.
Project page: https://nv-sana.mit.edu/
Key points:
**Efficient generation**: SANA can quickly generate high-quality images at resolutions of up to 4096 × 4096 and is suitable for ordinary laptop GPUs.
**Innovative design**: The deep compression autoencoder and linear diffusion transformer greatly improve generation speed and quality.
**Excellent performance**: SANA performed well in a number of tests, with significantly higher throughput than other advanced models, and supports fast content creation.
All in all, SANA gives users a new AI image generation experience through its fast generation, high-quality image output, and ease of use, and its future development is worth watching.