A Chinese research team has achieved a major breakthrough by creating Infinity-MM, the largest public multimodal AI dataset to date, and using it to train Aquila-VL-2B, a small model with excellent performance. The dataset contains a large volume of image captions, visual instruction data, and data generated by AI models such as GPT-4. Its synthesis pipeline combines the RAM++ and MiniCPM-V models to ensure data quality and diversity. Despite having only 2 billion parameters, Aquila-VL-2B achieved excellent results across multiple benchmarks, scoring 54.9% on the MMStar benchmark and performing strongly on math and image-comprehension tasks. Thanks to the effective use of synthetic data, model performance improved by 2.4%. More importantly, both the dataset and the model have been released to the research community, advancing the development of open-source AI.
This result marks significant progress in China's multimodal AI field. The success of Aquila-VL-2B and the release of the Infinity-MM dataset will provide valuable resources to the global AI research community and promote the further development and application of multimodal AI technology. The Infinity-MM paper and the Aquila-VL-2B project are publicly available; please visit the relevant links to learn more.