In recent years, artificial intelligence has made great progress in integrating vision and language, and the emergence of large language models in particular has injected new vitality into the development of multimodal AI systems. However, building strong foundation models for vision and vision-language tasks remains a challenge. To address it, researchers from several well-known universities and research institutions collaborated on an innovative model called InternVL, which aims to scale up vision foundation models and broaden their versatility so they can better handle a wide range of vision-language tasks.
Recently, the field of artificial intelligence has focused on seamlessly integrating vision and language, and the emergence of large language models (LLMs) in particular has driven significant progress. For multimodal AGI systems, however, the development of vision and vision-language foundation models has lagged behind. To close this gap, researchers from Nanjing University, OpenGVLab, Shanghai Artificial Intelligence Laboratory, the University of Hong Kong, the Chinese University of Hong Kong, Tsinghua University, the University of Science and Technology of China, and SenseTime Research proposed InternVL. The model scales up the vision foundation model and adapts it to general-purpose vision-language tasks. InternVL outperforms existing methods on 32 general vision-language benchmarks, covering tasks as diverse as image and video classification, image and video text retrieval, image captioning, visual question answering, and multimodal dialogue.
The emergence of InternVL marks a new stage in the development of vision-language models. Its strong results across multiple benchmarks open new directions for building future multimodal AI systems, and the model is expected to find use in more practical applications and to help advance the development and adoption of AI technology.