Alibaba Cloud recently released Qwen2.5-VL, the new vision model in its Tongyi Qianwen family of large models, and open-sourced it in three sizes: 3B, 7B and 72B. The release is a notable step forward for AI vision. The flagship Qwen2.5-VL-72B took first place in visual understanding across 13 authoritative evaluations, surpassing competitors such as GPT-4o and Claude 3.5. Beyond strong image understanding, Qwen2.5-VL can understand videos more than one hour long, efficiently extract key information from them, and carry out complex multi-step operations such as sending greetings, editing pictures, and booking tickets.
Alibaba Cloud's Tongyi Qianwen team has open-sourced the new vision model Qwen2.5-VL in three sizes: 3B, 7B and 72B.
Among them, the flagship Qwen2.5-VL-72B took first place in visual understanding across 13 authoritative evaluations, surpassing GPT-4o and Claude 3.5. According to Alibaba Cloud, the new Qwen2.5-VL parses image content more accurately and can understand videos more than 1 hour long. The model can search for specific events in a video and summarize the key points of its different time segments, helping users extract key information quickly and efficiently.
In addition, Qwen2.5-VL can act as a visual agent that controls mobile phones and computers without fine-tuning, carrying out complex multi-step operations such as sending greetings to a designated friend, editing photos on a computer, and booking tickets on a phone. Qwen2.5-VL is not only good at identifying common objects, such as flowers, birds, fish and insects, but can also analyze text, charts, icons, graphics, and layouts in images. Alibaba Cloud has also improved the OCR capabilities of Qwen2.5-VL, strengthening text recognition and text localization across multiple scenes, languages and text orientations.
At the same time, its information extraction capability has been greatly enhanced to meet the growing demand for digitalization and intelligent automation in fields such as qualification review, finance and commerce.
Key points:
Alibaba Cloud's Tongyi Qianwen has open-sourced Qwen2.5-VL in three sizes: 3B, 7B and 72B.
Qwen2.5-VL-72B surpasses GPT-4o and Claude 3.5 in visual understanding evaluations.
Qwen2.5-VL can understand videos more than 1 hour long and has enhanced OCR capabilities.
The open-sourcing of Qwen2.5-VL is expected to advance the development of AI vision and open up more possibilities for innovative applications across industries. Its strong performance and broad application prospects should help drive the further development and adoption of artificial intelligence technology.