Stability AI releases Stable Diffusion 3.5 image generation models in three versions, with greatly improved speed

Author: Eve Cole  Update Time: 2024-11-27 20:36:01

The editor of Downcodes has learned that Stability AI recently released its text-to-image generation model family, Stable Diffusion 3.5, which comes in three versions: Stable Diffusion 3.5 Large, Stable Diffusion 3.5 Large Turbo, and Stable Diffusion 3.5 Medium. The versions are designed to meet the needs of different users, from professionals to casual enthusiasts. The update is Stability AI's response to the shortcomings of the previous version and aims to make the company more competitive with platforms such as OpenAI's DALL-E and Midjourney. The new models bring significant improvements in image quality, generation speed, and ease of use, and introduce Query-Key Normalization to improve customizability and responsiveness to prompts.

Stability AI recently launched its latest deep learning text-to-image generation model family, Stable Diffusion 3.5. The release includes three improved open source models designed to meet the needs of different users, including researchers, enterprise customers, and enthusiasts.

Among them, Stable Diffusion 3.5 Large is the most powerful model in the series, with 8.1 billion parameters. Its excellent image quality and strong responsiveness to prompts make it well suited to professional users, and it can produce high-quality images at resolutions up to 1 megapixel.

In addition, Stable Diffusion 3.5 Large Turbo is a distilled version of Stable Diffusion 3.5 Large. It generates high-quality images much faster, completing image generation in only 4 steps. It is more efficient than the previous version and is suitable for users who need to create quickly.

Another new model is Stable Diffusion 3.5 Medium, which has 2.5 billion parameters. It uses an improved MMDiT-X architecture and training method, and is designed to work "out of the box" and run smoothly even on consumer-grade hardware. It strikes a good balance between image quality and ease of customization, and produces images from 0.25 to 2 megapixels.
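To make the 0.25 to 2 megapixel range concrete, the following is a minimal sketch that checks whether a given width and height fall inside it. The helper names are hypothetical, and the sketch assumes that one megapixel means 1024 x 1024 pixels (so that 512 x 512 is exactly 0.25); the article does not say which definition Stability AI uses.

```python
# Assumption: one "megapixel" is taken to be 1024 x 1024 pixels,
# so that 512 x 512 comes out at exactly 0.25.
PIXELS_PER_MEGAPIXEL = 1024 * 1024

# Range quoted in the article for Stable Diffusion 3.5 Medium.
MEDIUM_MIN_MP = 0.25
MEDIUM_MAX_MP = 2.0


def megapixels(width, height):
    """Return the image size in megapixels (hypothetical helper)."""
    return (width * height) / PIXELS_PER_MEGAPIXEL


def fits_medium_range(width, height):
    """True if the size lies within the quoted 0.25 to 2 megapixel range."""
    return MEDIUM_MIN_MP <= megapixels(width, height) <= MEDIUM_MAX_MP


# Print a few candidate sizes and whether they fall in the quoted range.
for width, height in [(512, 512), (1024, 1024), (1440, 1440), (2048, 2048)]:
    size = megapixels(width, height)
    print(f"{width}x{height}: {size:.2f} MP, in range: {fits_medium_range(width, height)}")
```

Under this assumption, 512 x 512 sits at the lower bound and 2048 x 2048 (4 megapixels) falls outside the range.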

The background to this launch is that Stable Diffusion 3 Medium, released in June, failed to meet expectations, and Stability AI decided to follow it with a more thoroughly reworked release. The company said it hopes this update will help it regain market competitiveness against platforms such as OpenAI's DALL-E and Midjourney.

An important technical change in the new models is the introduction of Query-Key Normalization. It improves customizability and responsiveness to prompts, giving users more consistent results with specific prompts and more varied image interpretations with broader prompts.
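The article does not describe how Query-Key Normalization works. In general terms, the technique normalizes the query and key vectors before the attention dot product, which keeps the attention scores within a bounded range. The following is a minimal sketch of that idea in plain Python. It assumes RMS normalization without a learned scale and uses hypothetical function names; it is not Stability AI's implementation.

```python
import math


def rms_normalize(vec, eps=1e-6):
    """Rescale a vector so its root-mean-square is approximately 1.

    Assumption: RMS normalization with no learned scale parameter.
    """
    mean_sq = sum(x * x for x in vec) / len(vec)
    scale = math.sqrt(mean_sq + eps)
    return [x / scale for x in vec]


def attention_logit(query, key, qk_norm=True):
    """Scaled dot-product score between one query and one key."""
    if qk_norm:
        query, key = rms_normalize(query), rms_normalize(key)
    dot = sum(q * k for q, k in zip(query, key))
    return dot / math.sqrt(len(query))


def attention_weights(query, keys, qk_norm=True):
    """Softmax over the scores of one query against several keys."""
    logits = [attention_logit(query, key, qk_norm) for key in keys]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Large-magnitude vectors: without normalization the score grows with
# the vector magnitudes; with it, the score is bounded by sqrt(dimension).
query = [100.0, -200.0, 300.0, 50.0]
key = [150.0, -100.0, 250.0, 80.0]
print(attention_logit(query, key, qk_norm=False))
print(attention_logit(query, key, qk_norm=True))
```

With normalization, each score is bounded in magnitude by the square root of the vector dimension regardless of how large the raw activations are, which is the usual motivation given for the technique.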

The Stable Diffusion 3.5 series is released under Stability AI's community license, which allows free non-commercial use. Entities with annual revenue of less than US$1 million can also use the models for free, while those with revenue above this threshold need to apply for an enterprise license.

All models and the weights required for self-hosting will be available on Hugging Face, and the models will also be accessible through Stability AI's API. In addition, ControlNets, which provide advanced image customization options, are expected to launch in the coming days.

Official page:

https://stability.ai/stable-image

Hugging Face pages for the three versions:

https://huggingface.co/stabilityai/stable-diffusion-3.5-large

https://huggingface.co/stabilityai/stable-diffusion-3.5-large-turbo

https://huggingface.co/stabilityai/stable-diffusion-3.5-medium
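For readers who want to try self-hosting, the following is a minimal sketch of loading one of the repositories listed above with the Hugging Face diffusers library and its StableDiffusion3Pipeline class. The repository IDs come from the links above and the 4-step setting for Large Turbo comes from this article. The other step counts and guidance values, the bfloat16 data type, and the generate helper are assumptions for illustration and should be checked against each model card.

```python
# Repository IDs taken from the Hugging Face links listed in the article.
MODEL_REPOS = {
    "large": "stabilityai/stable-diffusion-3.5-large",
    "large-turbo": "stabilityai/stable-diffusion-3.5-large-turbo",
    "medium": "stabilityai/stable-diffusion-3.5-medium",
}

# Sampling settings. Only the 4 steps for Large Turbo are stated in the
# article; every other value here is an assumption, not a confirmed default.
DEFAULT_SETTINGS = {
    "large": {"num_inference_steps": 28, "guidance_scale": 3.5},
    "large-turbo": {"num_inference_steps": 4, "guidance_scale": 0.0},
    "medium": {"num_inference_steps": 40, "guidance_scale": 4.5},
}


def generate(variant, prompt, device="cuda"):
    """Generate one image with the chosen variant (hypothetical helper).

    The heavy libraries are imported lazily so the tables above can be
    inspected without torch or diffusers installed.
    """
    import torch
    from diffusers import StableDiffusion3Pipeline

    pipe = StableDiffusion3Pipeline.from_pretrained(
        MODEL_REPOS[variant],
        torch_dtype=torch.bfloat16,  # assumption: reduced precision to save memory
    )
    pipe = pipe.to(device)
    result = pipe(prompt, **DEFAULT_SETTINGS[variant])
    return result.images[0]  # a PIL image
```

Calling generate("large-turbo", "a lighthouse at dusk, watercolor") would download the weights on first use and return an image that can be saved with its save method. This requires a recent diffusers release with Stable Diffusion 3.5 support and a GPU with enough memory, and it may require logging in to Hugging Face and accepting the license terms first.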

All in all, the launch of the Stable Diffusion 3.5 series marks an important advancement in text-to-image generation technology, providing users with more choices and more powerful features. The editor of Downcodes looks forward to the emergence of more innovative features in the future.