AI image generation has a new contender! The open source model FLUX.1 has arrived. Should Midjourney and DALL·E 3 be nervous?

Author: Eve Cole  Update Time: 2024-12-05 17:16:01

AI image generation is changing by the day. Right after Midjourney's latest update, the open source model FLUX.1 arrived. It is reported to outperform closed source models such as DALL·E 3 and Midjourney V6, as well as the open source SD3 series, and it has drawn wide attention in the industry. The editor of Downcodes will walk you through this new work from Robin Rombach, a leading expert on diffusion models, along with the technical innovations behind it and its future prospects.

In artificial intelligence, disruptive changes can happen on any given day. Just one day after Midjourney's major update, open source image generation gained an eye-catching dark horse: FLUX.1. This newcomer claims not only to significantly outperform closed source models such as DALL·E 3 and Midjourney V6, but also to beat the entire open source SD3 series, and it immediately set the AI community abuzz.

Let's first get to know the person behind FLUX.1. Its creator, Robin Rombach, is no unknown: he is a leading expert on diffusion models whose representative works include VQGAN, Taming Transformers and Latent Diffusion. He previously served as chief scientist at Stability AI, where he led the world-renowned Stable Diffusion series of projects. It is fair to say that Robin Rombach is a veteran among veterans in AI image generation.

In March 2024, amid internal turmoil at Stability AI, Robin chose to leave. After four months of hard work, he returned with the new open source model family FLUX.1. Even more striking, the project debuted with a US$32 million seed round led by the well-known venture capital firm Andreessen Horowitz, which gives the future development of FLUX.1 a strong boost.

So what makes FLUX.1 stand out? First, it is built on a Vision Transformer architecture, is trained with flow matching, and uses rotary position embeddings and parallel attention layers to improve model performance and hardware efficiency. This 12 billion parameter model is released in three versions:

Pro version: available only through the API, with the strongest performance.
Dev version: a guidance-distilled model for non-commercial use that retains most of the Pro version's performance.
Schnell version: an open source model that can be used commercially and still performs very well.
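
To make one of the techniques named above more concrete, here is a minimal sketch of rotary position embedding in its basic one-dimensional form. It is an illustration only and not FLUX.1's actual code: the function name `rotary_embed`, the `base` value of 10000 and the sample vectors are assumptions chosen for the example. The idea is that each pair of vector components is rotated by an angle proportional to the token position, so the dot product between a query and a key depends only on how far apart the two positions are.

```python
import math


def rotary_embed(vec, position, base=10000.0):
    """Rotate consecutive component pairs of `vec` by position-dependent angles.

    Hypothetical helper for illustration; `base` is an assumed default.
    """
    if len(vec) % 2 != 0:
        raise ValueError("vector length must be even")
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        # Each pair gets its own frequency; later pairs rotate more slowly.
        theta = position * base ** (-i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


# Made-up query and key vectors, for demonstration only.
q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.3, -0.7, 1.0]

# Both pairs of positions are 2 apart, so the attention scores match.
score_a = dot(rotary_embed(q, 3), rotary_embed(k, 1))
score_b = dot(rotary_embed(q, 12), rotary_embed(k, 10))
print(abs(score_a - score_b) < 1e-9)
```

Because every step is a pure rotation, the length of each vector is unchanged; only the relative angle between queries and keys carries the position information.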

According to the FLUX.1 team's own test data, even the open source Schnell version surpasses mainstream models such as Midjourney v6.0, DALL·E 3 (HD) and SD3-Ultra in fidelity to the text prompt, image quality, action consistency, coherence and diversity. FLUX.1 shows a particularly clear advantage when rendering text inside images.

Here, AIbase has selected several official sample generations for your reference:

Photorealistic images

AIbase also tried the "cat patron saint" prompt from an earlier test, and FLUX.1 handled it without any trouble, interpreting the prompt more accurately.

Of course, FLUX.1's ambitions clearly don't stop there. The team says that text-to-image is only the beginning. They also plan to launch a text-to-video model to challenge leading products such as Sora, Gen-3 and Luma.

For developers and AI enthusiasts, the arrival of FLUX.1 is very good news. The Schnell version is fully open source and supported by ComfyUI. If you have more than 36 GB of video memory, you can even run the fp16 version of the T5 text encoder. Note, however, that the text encoder files (t5xxl_fp16.safetensors and clip_l.safetensors) and the VAE need to be downloaded separately.
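
As a rough sanity check on hardware requirements, the sketch below estimates how much memory the weights of the 12 billion parameter model occupy at different precisions. The parameter count comes from this article; the helper name `weight_memory_gib` and the assumption of 2 bytes per parameter for fp16 and 1 byte for fp8 are illustrative. The estimate covers the image model's weights only and ignores the text encoders, the VAE and activation memory, so real usage will be higher.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Return the size of the raw weights in GiB (hypothetical helper)."""
    return num_params * bytes_per_param / 1024 ** 3


# Parameter count as stated in the article.
FLUX_PARAMS = 12_000_000_000

fp16_gib = weight_memory_gib(FLUX_PARAMS, 2)  # assumed 2 bytes per weight
fp8_gib = weight_memory_gib(FLUX_PARAMS, 1)   # assumed 1 byte per weight

print(f"fp16 weights: {fp16_gib:.1f} GiB")
print(f"fp8 weights:  {fp8_gib:.1f} GiB")
```

Halving the bytes per weight halves the footprint, which is why lower precision variants are attractive for consumer graphics cards.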

FLUX.1 brings new hope to open source AI image generation and fresh energy to the AI industry as a whole. Its strong performance and open source nature are likely to speed up the adoption of, and innovation in, AI image generation technology. For ordinary users, this means we may soon be able to run image generation models on home computers that rival or even surpass Midjourney.

Project address: https://github.com/black-forest-labs/flux

Trial address: https://replicate.com/black-forest-labs/flux-pro

ComfyUI workflow: https://comfyanonymous.github.io/ComfyUI_examples/flux/

All in all, FLUX.1 marks a new stage for open source AI image generation. We look forward to FLUX.1 bringing more surprises in the future!