Zhiyuan Research Institute launches FlagEval large model arena, a model battle evaluation service that includes text-to-video models

Author: Eve Cole | Updated: 2024-12-11 14:16:01

The Beijing Academy of Artificial Intelligence (BAAI, also known as the Zhiyuan Research Institute) launched the FlagEval large model arena on September 4, 2024, describing it as the world's first model battle evaluation service to include text-to-video models. The service is open to the public and covers about 40 large models from China and abroad. It supports customized online or offline evaluation across four major tasks: language question answering, multimodal image-text understanding, text-to-image generation, and text-to-video generation. It also introduces a subjective-preference tiered scoring system intended to evaluate model performance more accurately. Beyond offering preset questions that test simple understanding, knowledge application, coding, reasoning, and other abilities, FlagEval uses an anonymous mechanism to keep the evaluation process fair and objective. Users can take part through the web or a mobile client and view scores and arena rankings in real time.

On September 4, 2024, BAAI announced the launch of the FlagEval large model arena, the world's first model battle evaluation service that includes text-to-video models.

The service is open to users and covers about 40 large models from China and abroad. It supports customized online or offline evaluation of four major tasks: language question answering, multimodal image-text understanding, text-to-image generation, and text-to-video generation. Besides evaluating models on a range of preset questions covering simple understanding, knowledge application, coding, and reasoning, the arena introduces, for the first time, a subjective-preference tiered scoring system to reveal differences in model performance more precisely.

Evaluations are conducted anonymously to keep the process fair. Users can participate through the web page or through what BAAI describes as the first mobile access portal of its kind in China, for a streamlined model battle experience. Scores from the FlagEval large model arena are published immediately to form an arena leaderboard showing how each model performs in head-to-head battles.

Zhiyuan Research Institute stated that it will open-source the full-pipeline data from its model battle evaluations to support the development of the large model evaluation ecosystem. The FlagEval large model arena further extends the institute's work on tools and methods for model evaluation and gives researchers and practitioners in artificial intelligence a new testing and evaluation tool.

Try it here: https://flageval.baai.ac.cn/#/home

By open-sourcing the FlagEval large model arena data, Zhiyuan Research Institute aims to foster the healthy development of the large model evaluation ecosystem and support continued progress in artificial intelligence. Visit the link above to take part in the evaluation and help advance AI technology.