Nous Research has launched DisTrO, a new AI training optimizer that challenges the assumption that training large AI models is the preserve of major corporations. DisTrO dramatically reduces the volume of data that must be exchanged between multiple GPUs, making it possible to train AI models efficiently even over ordinary internet connections. This sharply lowers the barrier to AI model training and allows more individuals and institutions to participate in AI development. Development is still ongoing, and the technology is expected to reshape how AI research and development is done and to accelerate the broader adoption of AI.
Recently, the research team at Nous Research brought exciting news to the technology world: they launched a new optimizer called DisTrO (Distributed Training Over-the-Internet). Its arrival means that powerful AI models are no longer the exclusive domain of large companies; ordinary people now have the chance to train them efficiently on their own computers at home.
The appeal of DisTrO lies in how sharply it cuts the amount of information that must be transferred between graphics processing units (GPUs) while training an AI model. This innovation allows powerful AI models to be trained over ordinary network connections, and even lets individuals and institutions around the world pool their resources to develop AI technology together.
According to Nous Research's technical paper, DisTrO's efficiency gains are striking: it is 857 times more efficient than the standard All-Reduce algorithm, and the amount of information transmitted in each training step drops from 74.4 GB to 86.8 MB. These improvements not only make training faster and cheaper, but also mean more people have a chance to participate in the field.
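The 857x figure follows directly from those two payload sizes; a quick back-of-the-envelope check (assuming decimal gigabytes and megabytes):

```python
# Sanity check of the reported per-step payloads (decimal GB/MB assumed).
full_sync_bytes = 74.4e9  # 74.4 GB per step with All-Reduce
distro_bytes = 86.8e6     # 86.8 MB per step with DisTrO

print(f"~{full_sync_bytes / distro_bytes:.0f}x less communication")  # ~857x
```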
Nous Research stated on social media that with DisTrO, researchers and institutions no longer need to rely on a single company to manage and control the training process, giving them more freedom to innovate and experiment. This more open, competitive environment helps drive technological progress and ultimately benefits society as a whole.
In AI training, hardware requirements are often prohibitive. High-performance Nvidia GPUs in particular have grown increasingly scarce and expensive, so only a handful of well-funded companies can shoulder the cost of such training. Nous Research's philosophy is exactly the opposite: they are committed to opening up AI model training to the public at lower cost so that more people can take part.
DisTrO works by cutting the need for full gradient synchronization between GPUs, reducing communication overhead by four to five orders of magnitude. This allows AI models to be trained over slower internet connections; the 100 Mbps download and 10 Mbps upload speeds readily available to many households today are sufficient.
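Nous Research has not published the full details of how DisTrO achieves this, so the following sketch is purely illustrative: it uses top-k gradient sparsification, a well-known stand-in for the general idea of exchanging a small compressed update instead of the full gradient each step. The parameter count and keep ratio below are assumptions for illustration, not figures from the paper.

```python
# Illustration only: DisTrO's exact mechanism is unpublished, so this
# sketch uses top-k gradient sparsification as a stand-in for the general
# idea of exchanging a compressed update instead of the full gradient.

NUM_PARAMS = 1_200_000_000  # ~1.2B parameters (Llama-2-scale, assumed)
BYTES_PER_FP32 = 4

def full_gradient_bytes(num_params: int) -> int:
    """Per-step traffic if every fp32 gradient entry is all-reduced."""
    return num_params * BYTES_PER_FP32

def sparse_update_bytes(num_params: int, keep_ratio: float) -> int:
    """Per-step traffic if only a fraction of entries is sent,
    each as a (float32 value, int32 index) pair."""
    k = int(num_params * keep_ratio)
    return k * (BYTES_PER_FP32 + 4)

full = full_gradient_bytes(NUM_PARAMS)
sparse = sparse_update_bytes(NUM_PARAMS, keep_ratio=1e-4)  # keep 0.01%
print(f"full sync : {full / 1e9:.1f} GB/step")
print(f"compressed: {sparse / 1e6:.1f} MB/step (~{full / sparse:.0f}x less)")
```

At these assumed settings the compressed exchange is thousands of times smaller than the full gradient, which is the regime of savings the paper describes.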
In preliminary tests on Meta's Llama 2 large language model, DisTrO achieved training results comparable to traditional methods while requiring far less communication. The researchers note that they have so far tested only smaller models, but they tentatively speculate that as model size grows, the reduction in communication requirements could become even larger, perhaps reaching 1,000 to 3,000 times.
It is worth noting that although DisTrO makes training more flexible, it still depends on GPUs; those GPUs, however, no longer need to sit in the same place and can instead be scattered around the world, collaborating over the ordinary internet. In rigorous tests using 32 H100 GPUs, DisTrO matched the traditional AdamW + All-Reduce method in convergence speed while dramatically reducing communication requirements.
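For context, the AdamW + All-Reduce baseline is ordinary data-parallel training: every rank computes its own gradient, all ranks average their gradients with an all-reduce, and each then takes an AdamW step. Below is a minimal PyTorch sketch of that baseline (assuming torch.distributed has already been initialized with init_process_group); the full per-step gradient exchange in the loop is exactly the traffic DisTrO is reported to avoid.

```python
import torch
import torch.distributed as dist

def train_step(model: torch.nn.Module,
               optimizer: torch.optim.AdamW,
               batch: torch.Tensor,
               targets: torch.Tensor) -> None:
    """One data-parallel step of the AdamW + All-Reduce baseline."""
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(batch), targets)
    loss.backward()
    # Average gradients across all ranks -- the expensive synchronization
    # step, sending the full gradient over the network every iteration.
    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is not None:
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
            param.grad /= world_size
    optimizer.step()
```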
DisTrO is not limited to large language models; it may also be used to train other types of AI, such as image generation models, and the prospects for future applications are exciting. By improving training efficiency, DisTrO may also reduce the environmental impact of AI training, since it makes better use of existing infrastructure and reduces the need for large data centers.
Through DisTrO, Nous Research is not only advancing the technology of AI training but also fostering a more open and flexible research ecosystem, opening up vast possibilities for the future of AI.
Reference: https://venturebeat.com/ai/this-could-change-everything-nous-research-unveils-new-tool-to-train-powerful-ai-models-with-10000x-efficiency/
The emergence of DisTrO signals the democratization of AI training: it lowers the barrier to participation, accelerates the development and adoption of AI technology, and brings new vitality and possibility to the field. We look forward to the further surprises DisTrO may bring to AI's development.