This article reviews eight key development stages of the Doubao large model since its release on May 15, 2024, covering its progress in speech recognition, music creation, video generation, image editing, programming, text understanding, and visual perception. From the initial speech recognition breakthrough to the Doubao-pro general model, which is described as aligned with GPT-4 capabilities, the Doubao model has advanced rapidly in just 230 days. The article details the technological breakthroughs and application scenarios at each stage, and illustrates some of its functions with pictures.
1. Breakthroughs in speech recognition and emotional expression
In July, the Doubao model achieved a major breakthrough in speech recognition: it can understand mixed conversations in more than 20 dialects and can think while listening. It has also learned to express emotion in conversation, can interject freely during interactions, and even retains human speech habits such as slurred syllables and accents. The core technologies behind this are the Doubao speech recognition model Seed-ASR and the speech generation base model Seed-TTS. These models integrate a wider range of data and reasoning chains, which gives them strong generalization capabilities.
2. The birth of the AI band
In September, the Doubao large model realized the concept of an "AI band". From songwriting to instrumental performance generation to vocal singing, Doubao has mastered more than 10 music creation skills and can bring unexpected inspiration to music creation. The technology behind it is the Seed-Music framework, which combines the strengths of language models and diffusion models into a general framework for music generation with highly controllable editing.
3. Accurate video generation and camera control
In the same month, the Doubao model pushed the boundaries of creation further: it can follow complex prompts, generate high-definition videos with multiple subjects, and accurately control the camera angle. With the two video generation models PixelDance and Seaweed, the Doubao large model can generate high-quality video and sound effects simultaneously, giving creators a visual experience that is both more realistic and more dreamlike.
4. Upgraded image editing and creation capabilities
In November, the Doubao large model gained "one-sentence photo editing" and "one-click poster generation" capabilities. Users need only simple text commands to perform precise image editing and text generation. Through the continuously iterated text-to-image model SeedEdit, Doubao can accurately render complex scenes and provide image editing driven by natural language.
5. A leap in programming ability
In December, Doubao's programming capabilities improved greatly, and it became an AI programmer and data analyst. Through Doubao MarsCode, users can easily write code, process data, and perform visual analysis. Doubao's large code model, Doubao-coder, offers in-depth support for 16 programming languages and can meet full-stack programming needs such as front-end development, back-end development, and machine learning.
6. Extreme text understanding and processing capabilities
The Doubao large model also pushed past previous context window limits, extending the window to 3 million words so that it can process much larger texts, with a processing latency of only 15 seconds per million tokens. Through linked-data algorithms such as STRING, the Doubao large model can quickly draw on massive amounts of external knowledge and understand text more accurately.
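To put the quoted latency figure in perspective, the short Python sketch below estimates how long a full context window might take to process. It rests on two assumptions that the article does not state: that latency scales linearly with input size, and that the 3 million "words" correspond to roughly 3 million tokens. The function name and constant are illustrative only.

```python
# Figure quoted in the article: 15 seconds of latency per million tokens.
SECONDS_PER_MILLION_TOKENS = 15.0

def estimated_latency_seconds(num_tokens: int) -> float:
    """Estimate processing latency, ASSUMING it scales linearly with tokens."""
    return num_tokens / 1_000_000 * SECONDS_PER_MILLION_TOKENS

# ASSUMPTION: one token per word, so a 3-million-word window is ~3M tokens.
full_window_tokens = 3_000_000
print(f"Estimated latency: {estimated_latency_seconds(full_window_tokens):.0f} s")
```

Under these assumptions a full window would take on the order of 45 seconds; the real figure depends on the tokenizer and on how latency actually scales.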
7. Breakthroughs in visual perception and deep thinking
In mid-December, the Doubao large model gained visual perception and became able to combine multiple senses for in-depth thinking. It can not only understand images accurately but also carry out complex tasks, such as working through a photographed calculus problem, demonstrating strong cross-modal learning and reasoning capabilities.
8. Fully upgraded general model Doubao-pro
In mid-December, the Doubao general model Doubao-pro was fully upgraded. Its capabilities are described as fully aligned with GPT-4, and it learned to "reflect" while answering. The upgrade improves Doubao-pro's comprehension accuracy and generation quality, making it an efficient "hexagon warrior" (an all-rounder with balanced performance across abilities) and another benchmark in the AI field.
This year, the Doubao large model team has also made significant progress in basic AI research. The team has published 57 papers, with work appearing at top conferences such as ICLR, CVPR, and NeurIPS. In addition, the team cooperates closely with many top universities and has established joint laboratories to advance AI technology.
The Doubao large model is not only a technological breakthrough but is also widely used across many industries. Through Volcano Engine, the Doubao large model serves more than 30 industries, and its average daily token calls exceed 4 trillion, a 33-fold increase since its release in May.
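As a quick sanity check on these usage figures, the Python sketch below back-calculates the daily token volume implied at launch. It assumes that "33-fold" means the current volume is 33 times the launch volume; the article does not give the launch figure directly, so the result is only an implied estimate, and the function name is illustrative.

```python
# Figures quoted in the article.
current_daily_tokens = 4e12  # "exceed 4 trillion" average daily token calls
growth_factor = 33           # "33-fold" relative to the May release

def implied_launch_volume(current: float, factor: float) -> float:
    """Daily volume at launch, ASSUMING current = factor * launch volume."""
    return current / factor

launch_daily_tokens = implied_launch_volume(current_daily_tokens, growth_factor)
print(f"Implied daily tokens at launch: {launch_daily_tokens / 1e9:.0f} billion")
```

On this reading, the launch volume works out to roughly 120 billion tokens per day.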
Official address: https://mp.weixin.qq.com/s/KVfu86njzyK2iK4j6VJONw
All in all, the rapid development and widespread adoption of the Doubao large model point to the great potential of artificial intelligence across many fields, and its future development is worth watching.