The editor of Downcodes has learned that Cohere recently released two powerful open-source AI models, Aya Expanse 8B and 32B, now officially available on the Hugging Face platform. The two models aim to bridge the performance gap between foundation models across languages, significantly improving AI capabilities in 23 languages and giving researchers worldwide more accessible tools with stronger multilingual capabilities. The Aya project is committed to expanding access to foundation models beyond English; its data-arbitrage method and "global preferences" training strategy help the models avoid generating low-quality content while improving overall performance and safety. Let's dive into the specifics of both models.
Cohere recently announced the launch of two new open-source AI models that aim to narrow the language gap in foundation models through its Aya project. The new models, Aya Expanse 8B and 32B, are now available on Hugging Face, and their release significantly improves AI performance in 23 languages.
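For readers who want to try the models directly, below is a minimal sketch of loading the 8B model with the Hugging Face `transformers` library. The model id `CohereForAI/aya-expanse-8b` reflects the Hugging Face listing but should be verified on the hub; the prompt is just an illustrative multilingual example.

```python
# Minimal sketch: loading Aya Expanse 8B from Hugging Face (verify the model id on the hub).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/aya-expanse-8b"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# The model targets 23 languages; here we try a simple translation prompt.
messages = [{"role": "user", "content": "Translate to Turkish: How are you today?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```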
Cohere said in its blog that the 8B parameter model makes breakthroughs more accessible to researchers around the world, while the 32B parameter model delivers industry-leading multilingual capabilities.
The goal of the Aya project is to expand access to foundation models in languages beyond English. Cohere's research arm launched the Aya project last year and released the Aya 101 large language model (LLM), covering 101 languages, in February. Cohere also released the Aya dataset to support model training in other languages.
The Aya Expanse models follow many of the core methods used to build Aya 101. Cohere says the improvements in Aya Expanse are the result of years of rethinking core building blocks of machine-learning breakthroughs. Its research has focused on closing the language gap and has produced several key advances: data arbitrage, preference training for general performance and safety, and model merging.
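Cohere has not published its exact merging recipe in this announcement, but the general idea of model merging is straightforward: average the parameters of several fine-tuned checkpoints into one model. Below is a generic, hypothetical sketch of that technique, not Cohere's implementation.

```python
# Generic sketch of model merging by weight averaging ("model soup" style).
# This illustrates the technique in general; it is not Cohere's specific recipe.
from typing import Dict, List, Optional
import torch


def merge_state_dicts(
    state_dicts: List[Dict[str, torch.Tensor]],
    weights: Optional[List[float]] = None,
) -> Dict[str, torch.Tensor]:
    """Average parameter tensors across checkpoints, optionally weighted."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return merged
```

In practice the checkpoints being merged must share an architecture and parameter names; the weights let you favor, say, a safety-tuned checkpoint over a raw multilingual one.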
Cohere says that across multiple benchmarks, both Aya Expanse models outperformed similarly sized AI models from companies such as Google, Mistral, and Meta.
Aya Expanse 32B outperformed Gemma 2 27B, Mixtral 8x22B, and even the much larger Llama 3.1 70B on multilingual benchmarks. The smaller 8B model likewise surpassed Gemma 2 9B, Llama 3.1 8B, and Ministral 8B, with win rates ranging from 60.4% to 70.6%.
To avoid generating incoherent content, Cohere uses a data-sampling method called data arbitrage, which enables better training, especially for low-resource languages. Cohere has also focused on steering models toward "global preferences," taking the perspectives of different cultures and languages into account to improve model performance and safety.
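The core intuition behind data arbitrage, as described publicly, is that no single teacher model is best at every language, so synthetic training data should be sourced from whichever model performs best per language. The sketch below illustrates that routing idea; the `teachers` pool and `score` function are hypothetical stand-ins, not Cohere's pipeline.

```python
# Illustrative sketch of the data-arbitrage idea: rather than distilling from a
# single teacher model, route each language to the teacher that scores best on it.
# All names below are hypothetical placeholders, not Cohere's implementation.
from typing import Callable, Dict, List


def arbitrage_sample(
    prompts_by_lang: Dict[str, List[str]],
    teachers: Dict[str, Callable[[str], str]],   # teacher name -> generate function
    score: Callable[[str, str], float],          # (prompt, completion) -> quality score
) -> List[dict]:
    """Build a synthetic training set by picking, per language, the teacher
    whose completions score highest on a small probe set."""
    dataset = []
    for lang, prompts in prompts_by_lang.items():
        probe = prompts[:8]  # small probe set for this language
        # Rank teachers by mean probe score and keep the best one.
        best_name = max(
            teachers,
            key=lambda name: sum(score(p, teachers[name](p)) for p in probe) / len(probe),
        )
        # Generate the full set for this language with the winning teacher only.
        for p in prompts:
            dataset.append({"lang": lang, "prompt": p, "completion": teachers[best_name](p)})
    return dataset
```

The payoff is largest for low-resource languages, where a general-purpose teacher may produce low-quality text that a specialized teacher handles well.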
Cohere's Aya program seeks to ensure that LLMs perform better in non-English-language research. Although many LLMs are eventually released in other languages, they often face a shortage of training data, especially for low-resource languages, which makes Cohere's efforts to build multilingual AI models particularly important.
Official blog: https://cohere.com/blog/aya-expanse-connecting-our-world
Highlights:
**Cohere launches two new AI models**, committed to narrowing the language gap in foundation models and improving performance in 23 languages.
**The Aya Expanse models perform well**, outperforming many competitors on multilingual benchmarks.
**The data-arbitrage method** helps the models avoid generating low-quality content, while "global preferences" training incorporates cultural and linguistic perspectives from around the world, improving multilingual training.
All in all, Cohere's Aya Expanse models mark significant progress in multilingual AI, and their open-source release provides a valuable resource for the global AI research community. The editor of Downcodes believes this will further advance multilingual AI technology and promote global information exchange and sharing.