At the FORCE Motive Power Conference on December 18, 2024, Volcano Engine released a comprehensive upgrade of the Doubao large model family. The headline release is a new visual understanding model that accepts text and images together, offers stronger recognition, understanding, and reasoning capabilities, and is priced very competitively. The upgrade extends what the Doubao models can do across a range of fields and signals a new stage for visual understanding technology, giving enterprises and developers more convenient and efficient AI solutions.
The announcement came at the Volcano Engine FORCE Motive Power Conference on December 18, 2024, where the company presented the upgraded Doubao large model family alongside the brand-new visual understanding model.
Tan Dai, president of Volcano Engine, said that average daily token usage of the Doubao models has grown rapidly over the past few months and now exceeds 4 trillion, a 33-fold increase over the level at launch in May. The growth reflects how widely the Doubao models are being used across application scenarios.
With the new visual understanding model, users can submit text and images in the same query, and the model interprets them together to give an accurate answer. This is expected to simplify application development considerably and open up more scenarios for large models.
The visual understanding model has stronger content recognition capabilities. Beyond identifying basic elements such as object categories and shapes, it can understand the relationships between objects, their spatial layout, and the overall meaning of a scene. Examples include recognizing shadows and applying common knowledge about the natural world.
It also has stronger understanding and reasoning capabilities. In addition to recognizing content more reliably, it can carry out complex logical reasoning over the text and image information it has recognized, such as reasoning about diagrams and about physical situations.
Finally, its visual descriptions are more fine-grained. It can describe the content of an image in greater detail and can write about an image in a variety of literary styles, for example a story or a poem based on a picture.
The Doubao visual understanding model has broad application prospects in fields such as education, tourism, and e-commerce. In education, it can help students improve their essays and learn popular science; in tourism, it can translate foreign-language menus for travelers and explain the background of buildings; in e-commerce marketing, it can help merchants describe product features in detail and so improve advertising effectiveness.
The visual understanding model is also inexpensive to use. It costs 0.003 yuan per thousand tokens, 85% below the industry average. At that price, one yuan can process up to 284 720P images, which brings visual understanding pricing into the "li era," with prices measured in thousandths of a yuan. Volcano Engine also offers enterprises and developers initial traffic support of up to 15,000 to help them put the technology to use.
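The pricing figures above can be checked with a little arithmetic. The sketch below is illustrative only: it assumes the 284-image figure refers to one unit of the same currency as the per-token price (yuan) and that image cost is driven purely by token count, neither of which the announcement spells out. The function names are hypothetical, and the derived values are implied estimates, not published numbers.

```python
# Figures quoted in the article.
PRICE_PER_1K_TOKENS_YUAN = 0.003   # price per thousand tokens
IMAGES_PER_YUAN = 284              # 720P images processed per currency unit
DISCOUNT_VS_INDUSTRY = 0.85        # "85% lower than the industry average"


def tokens_per_yuan(price_per_1k: float) -> float:
    """Number of tokens that one yuan buys at the given price."""
    return 1000 / price_per_1k


def implied_tokens_per_image(price_per_1k: float, images_per_yuan: int) -> float:
    """Tokens per image implied by the quoted images-per-yuan figure.

    Assumption: cost per image depends only on the tokens it consumes.
    """
    return tokens_per_yuan(price_per_1k) / images_per_yuan


def implied_industry_price(price_per_1k: float, discount: float) -> float:
    """Industry-average price per thousand tokens implied by the discount."""
    return price_per_1k / (1 - discount)


if __name__ == "__main__":
    print(f"Tokens per yuan: {tokens_per_yuan(PRICE_PER_1K_TOKENS_YUAN):,.0f}")
    print(
        "Implied tokens per 720P image: "
        f"{implied_tokens_per_image(PRICE_PER_1K_TOKENS_YUAN, IMAGES_PER_YUAN):,.0f}"
    )
    print(
        "Implied industry average (yuan per 1K tokens): "
        f"{implied_industry_price(PRICE_PER_1K_TOKENS_YUAN, DISCOUNT_VS_INDUSTRY):.3f}"
    )
```

Under these assumptions, one yuan buys roughly 333,000 tokens, which works out to a little under 1,200 tokens per 720P image, and the implied industry average is about 0.02 yuan per thousand tokens.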
Beyond the visual understanding model, Volcano Engine upgraded several other models at the conference. The overall task-handling capability of Doubao Universal Model Pro has improved by 32% since May, with significant gains in reasoning, instruction following, coding, and mathematics. The Doubao video generation model will open to the public in January 2025, and companies can already reserve access.
To improve how enterprises retrieve information and power search and recommendation, Volcano Engine has also launched a global AI search service, which helps enterprises match information to user needs and supports the intelligent transformation of various industries.
Highlights:
Average daily token usage of the Doubao large models has exceeded 4 trillion, a 33-fold increase since May.
The newly launched visual understanding model accepts text and images together and suits fields such as education, tourism, and e-commerce.
Usage costs only 0.003 yuan per thousand tokens, 85% below the industry average.
In short, the Doubao large model upgrade and the new visual understanding model show Volcano Engine's continued innovation in artificial intelligence and its attention to user needs, and they provide technical support for the intelligent transformation of various industries.