ByteDance's commercialization technology team has open-sourced Infinity, its latest text-to-image model. The model makes significant gains in image generation quality and inference speed, surpassing leading models such as Stable Diffusion 3, HART, and LlamaGen. Its core innovations are a Bitwise Token autoregressive framework and an infinite vocabulary, which let the model capture finer image details and raise the upper limit on generated image quality. This article covers Infinity's technical details, performance, and open-source release.
Infinity, the latest work from ByteDance's commercialization technology team, has taken the lead among autoregressive text-to-image models. The newly open-sourced model not only surpasses Stable Diffusion 3 in image generation quality but also shows a significant advantage in inference speed.
The core innovation of Infinity is its Bitwise Token autoregressive framework. At each step the model predicts the next resolution level as fine-grained "Bitwise Tokens" composed of +1 or -1 values, which significantly improves its ability to capture high-frequency signals and produces more detailed images. In addition, Infinity expands the vocabulary to infinity, which greatly enlarges the representation space of the image tokenizer and raises the performance ceiling of autoregressive text-to-image generation.
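To make the idea of a +1/-1 token concrete, here is a minimal sketch in Python. It assumes a simple sign-based quantizer (the article only says that tokens are composed of +1 or -1, not how they are computed), and the function names and feature values are hypothetical. It also shows why the vocabulary grows so quickly: a token of d bits corresponds to one of 2^d entries in a flat vocabulary, while predicting the bits directly needs only d binary outputs.

```python
from typing import List


def to_bitwise_token(features: List[float]) -> List[int]:
    """Quantize each channel to +1 or -1 by sign (assumed rule; zero maps to +1)."""
    return [1 if x >= 0 else -1 for x in features]


def bits_to_index(bits: List[int]) -> int:
    """Pack a +1/-1 token into the integer index it would have in a flat vocabulary."""
    index = 0
    for b in bits:
        index = (index << 1) | (1 if b > 0 else 0)
    return index


def implicit_vocab_size(num_bits: int) -> int:
    """Number of distinct tokens that num_bits binary channels can express."""
    return 2 ** num_bits


features = [0.7, -1.2, 0.05, -0.3]  # made-up values for illustration
token = to_bitwise_token(features)
print(token)                    # [1, -1, 1, -1]
print(bits_to_index(token))     # 10
print(implicit_vocab_size(4))   # 16
print(implicit_vocab_size(32))  # 4294967296
```

With 32 bits per token, a conventional classifier over token indices would need more than four billion output classes, whereas a bitwise predictor needs 32 binary decisions.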
In performance comparisons, Infinity stands out among autoregressive methods, far surpassing HART, LlamaGen, Emu3, and others; in human evaluation it beat HART with a win rate of nearly 90%. Infinity also beat the state-of-the-art diffusion models PixArt-Sigma, SD-XL, and SD3-Medium with win rates of 75%, 80%, and 65% respectively, demonstrating its advantage among models of the same size.
Another major feature of Infinity is that it scales well. As model size and training compute increase, validation loss steadily decreases and validation accuracy steadily increases. Infinity also introduces bit self-correction, which strengthens the model's ability to correct its own mistakes and mitigates the accumulation of errors during autoregressive inference.
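The article does not describe how bit self-correction works. One plausible reading, offered here purely as an assumption, is that some bits of the ground-truth tokens are randomly flipped during training so that the model learns to continue from inputs that already contain mistakes, as it must at inference time. The sketch below shows only that perturbation step; `flip_bits` and the flip probability are hypothetical and are not taken from the Infinity code.

```python
import random
from typing import List


def flip_bits(bits: List[int], flip_prob: float, rng: random.Random) -> List[int]:
    """Return a copy of a +1/-1 token with each bit negated with probability flip_prob."""
    return [-b if rng.random() < flip_prob else b for b in bits]


rng = random.Random(0)  # fixed seed so the run is repeatable
clean = [1, -1, 1, 1, -1, -1, 1, -1]
noisy = flip_bits(clean, 0.25, rng)  # 0.25 is an arbitrary illustrative rate

# Count how many positions were corrupted in this draw.
num_flipped = sum(1 for a, b in zip(clean, noisy) if a != b)
print(noisy, num_flipped)
```

Under this assumption, the corrupted tokens would be fed to the model as context while the training targets remain the clean ones, which is what would push the model to repair earlier errors rather than compound them.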
In terms of inference speed, Infinity inherits the speed advantage of VAR. The 2B model generates a 1024x1024 image in just 0.8 seconds, 3 times faster than the similarly sized SD3-Medium and 14 times faster than the 12B Flux Dev. The 8B model is 7 times faster than the similarly sized SD3.5. The 20B model takes 3 seconds to generate a 1024x1024 image, nearly 4 times faster than the 12B Flux Dev.
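As a quick sanity check on these figures, the short sketch below converts the reported speedups into the baseline latencies they imply. It assumes that "N times faster" means the baseline takes N times as long per image; the derived numbers are implied by the article's figures, not measured, and the helper name is hypothetical.

```python
def implied_baseline_seconds(infinity_seconds: float, speedup: float) -> float:
    """Baseline latency implied by Infinity's latency and a reported speedup factor."""
    return infinity_seconds * speedup


sd3_medium = implied_baseline_seconds(0.8, 3)     # from the 2B comparison
flux_from_2b = implied_baseline_seconds(0.8, 14)  # from the 2B comparison
flux_from_20b = implied_baseline_seconds(3.0, 4)  # from the 20B comparison ("nearly 4x")

print(f"{sd3_medium:.1f} {flux_from_2b:.1f} {flux_from_20b:.1f}")  # 2.4 11.2 12.0
```

The two independent estimates for Flux Dev (about 11.2 and just under 12 seconds) agree closely, which suggests the reported speedups are internally consistent.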
The training and inference code, demo, and model weights for Infinity are now available in the GitHub repository, and a web demo is also provided so that users can try the model and evaluate its output.
Project page: https://foundationvision.github.io/infinity.project/
All in all, Infinity brings a new breakthrough to autoregressive text-to-image generation through its advanced architecture, strong performance, and open-source release, and it deserves attention and further research. Its fast inference and high-quality image generation give it considerable potential in practical applications.