Meta's latest release, the Llama 3.1 series of open-source models, achieves significant performance gains, and its 405B-parameter version even surpasses some closed-source models. Within the series, Llama3.1-8B-Instruct supports multiple languages, offers a context length of up to 131,072 tokens, and was trained on large amounts of synthetic data to improve its reasoning in areas such as code and mathematics. Building on this model, the OpenBuddy team released OpenBuddy-Llama3.1-8B-v22.1-131K, which supports Chinese question answering and cross-language translation, demonstrating the potential of open-source models in multilingual applications.
Meta recently released Llama 3.1, a new generation of open-source models whose 405B-parameter version approaches or even surpasses closed-source models such as GPT-4 on some benchmarks. Llama3.1-8B-Instruct is the 8B-parameter version in the series; it supports English, German, French, Italian, Portuguese, Spanish, Hindi, and Thai, offers a context length of up to 131,072 tokens, and has a knowledge cutoff of December 2023.
To enhance Llama3.1-8B-Instruct, Meta trained it on more than 25 million synthetic examples generated by the larger 405B model. As a result, Llama3.1-8B-Instruct shows cognitive and reasoning capabilities comparable to GPT-3.5 Turbo on coding, mathematics, and other tests.
OpenBuddy fine-tuned Llama3.1-8B-Instruct on a small amount of Chinese data to release OpenBuddy-Llama3.1-8B-v22.1-131k, a new-generation open-source cross-language model with Chinese question-answering and cross-language translation capabilities. Although Llama 3.1 itself has no Chinese capability, after training the model can, on questions prone to conceptual confusion, generate answers that usually only larger models produce, showing stronger cognitive potential.
However, due to limitations of the training dataset and training time, OpenBuddy-Llama3.1-8B-v22.1 still has gaps in Chinese knowledge, especially knowledge of traditional culture. Even so, the model performs relatively stably on tasks such as long-text understanding, benefiting from the base model's native long-context capability.
In the future, OpenBuddy plans larger-scale training of the 8B and 70B models to strengthen their Chinese knowledge, long-text ability, and cognitive ability, and to explore the feasibility of fine-tuning the 405B model.
Project address: https://modelscope.cn/models/OpenBuddy/openbuddy-llama3.1-8b-v22.1-131k
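For readers who want to try the model, below is a minimal, illustrative sketch of loading it from ModelScope with the `modelscope` and `transformers` libraries. The repo id comes from the project address above; the dtype, device placement, and generation settings are assumptions for a typical single-GPU setup, not settings published by the OpenBuddy team.

```python
# Minimal sketch: download the model from ModelScope, then load it with
# transformers. Dtype and generation settings below are assumptions.
import torch
from modelscope import snapshot_download
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id taken from the project address above.
model_dir = snapshot_download("OpenBuddy/openbuddy-llama3.1-8b-v22.1-131k")

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    torch_dtype=torch.bfloat16,  # assumption: bf16 fits common GPU setups
    device_map="auto",
)

# Chat-style prompt via the tokenizer's built-in chat template; the Chinese
# prompt ("briefly introduce yourself in Chinese") exercises the model's
# Chinese Q&A capability described in the article.
messages = [{"role": "user", "content": "用中文简要介绍一下你自己。"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The same checkpoint should also load directly with `AutoModelForCausalLM.from_pretrained` from a local path if you download the weights by other means; `snapshot_download` is used here simply because the model is hosted on ModelScope.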
The release of OpenBuddy-Llama3.1-8B-v22.1-131k marks a new stage in the development of open-source multilingual models. Although its Chinese knowledge still has room to grow, its potential is promising: as the scale of training expands, its performance is expected to improve further. We look forward to more surprises from the OpenBuddy team.