OpenAI releases multilingual AI dataset to promote global language equality

Author: Eve Cole | Updated: 2024-12-01 20:50:01

The editor of Downcodes has learned that OpenAI recently released MMMLU, a major multilingual dataset for evaluating AI performance in 14 languages, including Arabic, German, and Swahili. The dataset is publicly available on the Hugging Face platform. The release is a notable step for OpenAI in the global AI field: it is intended to help close the gap in AI research on low-resource languages and to give enterprises and governments a better way to check how well AI systems serve users worldwide. It is also expected to encourage the development and application of multilingual AI technology.

OpenAI recently launched a major multilingual dataset designed to evaluate the performance of artificial intelligence in 14 languages, including Arabic, German, Swahili, Bengali, and Yoruba.

The dataset, called "Multilingual Massive Multitask Language Understanding" (MMMLU), has been released on the open data platform Hugging Face.

Dataset link: https://huggingface.co/datasets/openai/MMMLU

The earlier "Massive Multitask Language Understanding" (MMLU) dataset was available only in English and covered 57 subjects, such as mathematics, law, and computer science. The newly released MMMLU dataset extends that benchmark to multiple languages and aims to fill the gap in AI research on low-resource languages. OpenAI's move responds to growing demand from enterprises and governments for AI systems that can interact well with users around the world.

To ensure the accuracy of the dataset, OpenAI relied on professional human translators to create MMMLU. This matters because many automated translation tools are prone to subtle errors when handling low-resource languages, and such errors can have serious consequences in precision-critical fields such as healthcare, law, and finance. Human translation is meant to give the dataset a reliable basis for evaluating multilingual AI models.

At the same time, OpenAI announced the launch of "OpenAI Academy", a program designed to support developers and mission-driven organizations, especially in low- and middle-income countries, in using AI technology to solve local problems. OpenAI will provide training, technical guidance, and US$1 million in API usage credits to help local AI talent access the latest resources.

For enterprises, the MMMLU dataset offers a way to evaluate how their AI systems perform across global markets. Whether the task is customer service, content moderation, or data analysis, AI systems that perform well in multiple languages can help companies reduce communication barriers and improve the user experience.
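As a rough illustration of what such an evaluation involves, the sketch below computes accuracy per language from a list of graded answers. It assumes multiple-choice questions with a single correct letter; the record fields (`language`, `answer`, `predicted`) and the sample rows are hypothetical and are not taken from the MMMLU dataset or from any OpenAI tooling.

```python
from collections import defaultdict


def accuracy_by_language(records):
    """Return {language: fraction of records where predicted == answer}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        lang = rec["language"]
        total[lang] += 1
        if rec["predicted"] == rec["answer"]:
            correct[lang] += 1
    return {lang: correct[lang] / total[lang] for lang in total}


# Hypothetical sample rows for illustration only, not real MMMLU results.
sample = [
    {"language": "ar", "answer": "A", "predicted": "A"},
    {"language": "ar", "answer": "C", "predicted": "B"},
    {"language": "sw", "answer": "D", "predicted": "D"},
    {"language": "sw", "answer": "B", "predicted": "B"},
]

scores = accuracy_by_language(sample)
for lang, acc in sorted(scores.items()):
    print(f"{lang}: {acc:.2f}")
```

Reporting one score per language, rather than a single average, makes it easier to see where a system lags in a particular language.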

As more companies and researchers begin to test against this multilingual benchmark, the multilingual capabilities of AI systems are likely to become an increasingly important point of comparison. The release strengthens OpenAI's position in multilingual AI and should encourage further work in the area.

Highlights:

OpenAI released the MMMLU dataset, covering 14 languages, to promote the research and application of multilingual AI.

The dataset was produced by professional human translators to ensure high accuracy, which makes it especially suitable for precision-critical industries.

OpenAI Academy was launched to support AI developers in low- and middle-income countries.

All in all, the MMMLU dataset and the accompanying OpenAI Academy program show OpenAI's commitment to advancing AI globally and making the technology more inclusive. Both are likely to benefit multilingual AI research and applications, and they deserve the industry's attention.