Nvidia recently open-sourced two new models, Nemotron-4-Minitron-4B and Nemotron-4-Minitron-8B, which mark a significant breakthrough in training efficiency. Through structured pruning and knowledge distillation, the training data required for these two models was reduced by a factor of 40, and compute cost by a factor of 1.8, while performance remains comparable to other well-known large models. This represents not only a leap in AI technology but also new possibilities for the field, contributing valuable resources to the AI community.
Traditional AI model training requires vast amounts of data and compute. Nvidia has significantly reduced this demand by using structured pruning and knowledge distillation: compared with training from scratch, the new models require 40x fewer training tokens and cut compute cost by 1.8x. Behind this achievement is Nvidia's in-depth optimization of the existing Llama 3.1 8B model.
Structured pruning is a neural network compression technique that simplifies the model by removing unimportant weights. Unlike unstructured (random) pruning, structured pruning preserves the regular shape of the weight matrices: by removing entire neurons or attention heads, the pruned model remains a dense computation that runs efficiently on hardware such as GPUs and TPUs.
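To make the idea concrete, here is a minimal sketch of structured pruning on a feed-forward block, assuming neuron importance is estimated by the L2 norm of each neuron's incoming weights. Nvidia's actual method uses activation-based importance scores computed on calibration data; this toy version only illustrates the core mechanic of dropping whole neurons (matched rows and columns of the two weight matrices), and all names here are illustrative.

```python
# A hedged sketch, not Nvidia's implementation: prune whole hidden neurons
# from an MLP block so the result is still a smaller dense matmul.
import torch
import torch.nn as nn

def prune_ffn_neurons(fc1: nn.Linear, fc2: nn.Linear, keep: int):
    """Remove the least important hidden neurons of an MLP block.

    fc1: d_model -> d_hidden, fc2: d_hidden -> d_model.
    Entire rows of fc1 and the matching columns of fc2 are dropped,
    which is what makes the pruning 'structured'.
    """
    # Toy importance score: L2 norm of each neuron's incoming weights.
    importance = fc1.weight.norm(p=2, dim=1)            # shape: (d_hidden,)
    keep_idx = importance.topk(keep).indices.sort().values

    new_fc1 = nn.Linear(fc1.in_features, keep, bias=fc1.bias is not None)
    new_fc2 = nn.Linear(keep, fc2.out_features, bias=fc2.bias is not None)
    with torch.no_grad():
        new_fc1.weight.copy_(fc1.weight[keep_idx])
        if fc1.bias is not None:
            new_fc1.bias.copy_(fc1.bias[keep_idx])
        new_fc2.weight.copy_(fc2.weight[:, keep_idx])
        if fc2.bias is not None:
            new_fc2.bias.copy_(fc2.bias)
    return new_fc1, new_fc2

# Example: shrink a 1024-wide hidden layer to 512 neurons.
fc1, fc2 = nn.Linear(256, 1024), nn.Linear(1024, 256)
fc1, fc2 = prune_ffn_neurons(fc1, fc2, keep=512)
x = torch.randn(4, 256)
print(fc2(torch.relu(fc1(x))).shape)  # torch.Size([4, 256])
```

Because the surviving weights stay in contiguous dense matrices, no sparse kernels are needed, which is exactly why structured pruning maps well onto GPU and TPU hardware.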
Knowledge distillation is a technique in which a smaller student model improves its performance by imitating a larger teacher model. In Nvidia's practice, logit-based knowledge distillation lets the student learn the teacher's full output distribution, so that excellent performance is maintained even with greatly reduced training data.
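Below is a minimal sketch of the standard logit-based distillation loss: a KL divergence between temperature-softened teacher and student distributions. Nvidia's report distills over the output distribution in this same spirit, but the temperature value and function names here are assumptions for illustration, not their exact recipe.

```python
# A hedged sketch of logit-based knowledge distillation (standard KD loss).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL(teacher || student) over temperature-softened token distributions."""
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=-1)
    p_teacher = F.softmax(teacher_logits / t, dim=-1)
    # Scale by t^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (t * t)

# Example: a batch of 4 token positions over a 32k-token vocabulary.
student = torch.randn(4, 32000, requires_grad=True)
teacher = torch.randn(4, 32000)  # in practice: frozen teacher forward pass
loss = distillation_loss(student, teacher)
loss.backward()
print(loss.item())
```

Because the student matches the teacher's whole distribution rather than a single hard label per token, each training example carries far more signal, which is what allows strong results from many fewer tokens.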
Trained with structured pruning and knowledge distillation, the Minitron-4B and Minitron-8B models improved their MMLU scores by 16% compared with training from scratch, with performance comparable to well-known models such as Mistral 7B, Gemma 7B, and Llama-3 8B. This result demonstrates the effectiveness of Nvidia's method and opens new possibilities for training and deploying large AI models.
This open-source release not only demonstrates Nvidia's leadership in AI technology but also provides valuable resources to the AI community. As AI technology continues to advance, we look forward to more innovative methods that push AI toward greater efficiency and intelligence.
Model addresses:
https://huggingface.co/nvidia/nemotron-4-minitron-4b-base
https://huggingface.co/nvidia/nemotron-4-minitron-8b-base
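For readers who want to try the checkpoints, here is a quick-start sketch using the Hugging Face transformers library. It assumes a recent transformers version with Nemotron architecture support; the prompt and generation settings are illustrative only.

```python
# A hedged quick-start sketch for loading one of the released checkpoints.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "nvidia/nemotron-4-minitron-4b-base"  # or the 8B variant
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumes a GPU with bf16 support
    device_map="auto",
)

inputs = tokenizer("Structured pruning is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```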
Nvidia's two open-source models offer new ideas for improving efficiency in the AI field, and they point toward further reductions in the cost of training AI models and a broader range of applications in the future. We look forward to more innovative applications built on this foundation.