Nous Research is conducting a groundbreaking experiment: using globally distributed machines to pre-train a 1.5-billion-parameter large language model (LLM). The experiment breaks with the traditional centralized training model, avoiding expensive, energy-hungry data centers, and the training run is broadcast in real time on the website distro.nousresearch.com, which shows model performance and a map of the participating hardware. The approach not only reduces training costs; more importantly, it promises to lower the entry barrier for large language models, allowing more small teams and individuals to take part in generative AI research and development.
In the rapidly developing field of generative AI, the Nous Research team is conducting a unique experiment: using machines distributed around the world to pre-train a 1.5-billion-parameter large language model (LLM), a process that avoids the traditional requirement of centralized development in expensive, power-hungry data centers or superclusters.
Nous Research also broadcasts the pre-training process live on its dedicated website, distro.nousresearch.com, showing the model's performance on various evaluation benchmarks in real time, along with a map of the hardware locations participating in the training, which span multiple sites in the United States and Europe. As of this writing, approximately 57 hours (about 2.4 days) of pre-training remain, and more than 75% of the training is complete.
Pre-training is the first and most fundamental step in training an LLM: the model is trained on a large amount of text data to learn the statistical properties and structure of language. At this stage, the model captures language patterns, syntax, and the contextual relationships between words by processing extensive text datasets. This process gives the model a broad understanding of language, the ability to generate coherent text, and the capacity to perform a variety of language-related tasks. After pre-training, the model still needs to be fine-tuned for specific tasks or domains.
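To make the objective concrete, the sketch below shows the standard next-token-prediction setup used in pre-training, with a deliberately tiny toy model in PyTorch. The model, vocabulary size, and random token data are illustrative placeholders only and bear no relation to Nous Research's actual 1.5B-parameter architecture or training corpus.

```python
# Minimal sketch of the pre-training objective: predict the next token.
# All sizes and data here are toy placeholders for illustration.
import torch
import torch.nn as nn

vocab_size, embed_dim, seq_len, batch_size = 1000, 64, 32, 8

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, embed_dim, batch_first=True)
        self.head = nn.Linear(embed_dim, vocab_size)

    def forward(self, tokens):
        hidden, _ = self.rnn(self.embed(tokens))
        return self.head(hidden)  # logits for every position

model = TinyLM()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Random token ids stand in for a real text corpus.
tokens = torch.randint(0, vocab_size, (batch_size, seq_len + 1))
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # each position predicts the next token

optimizer.zero_grad()
logits = model(inputs)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1)
)
loss.backward()
optimizer.step()
```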
If the plan succeeds, Nous Research will have shown that cutting-edge LLMs can be trained without expensive superclusters or low-latency interconnects, marking a new era of distributed AI training. This open approach to training could shift the power dynamics of generative AI, making smaller teams and non-corporate actors more competitive in the space.
The new technology Nous is using is called Nous DisTrO (Distributed Training Over-the-Internet), which is designed to reduce the communication bandwidth required between GPUs during pre-training. According to Nous Research's latest release, DisTrO can cut communication requirements by up to 10,000x while maintaining competitive convergence rates and loss curves over slower, more affordable internet connections.
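As a rough back-of-the-envelope illustration of what a 10,000x reduction would mean for a model of this size, consider the arithmetic below. The fp32 gradients and one-full-exchange-per-step baseline are simplifying assumptions made for illustration, not figures reported by Nous Research.

```python
# Rough arithmetic for an up-to-10,000x reduction in communication for a
# 1.5B-parameter model. fp32 gradients and a full per-step exchange are
# assumed here purely for illustration.
params = 1_500_000_000   # 1.5B parameters
bytes_per_value = 4      # fp32

full_exchange_gb = params * bytes_per_value / 1e9
reduced_exchange_mb = full_exchange_gb * 1e3 / 10_000

print(f"naive full-gradient exchange: ~{full_exchange_gb:.1f} GB per worker per step")
print(f"with a 10,000x reduction:     ~{reduced_exchange_mb:.1f} MB per worker per step")
```

At that scale, the payload per exchange drops from several gigabytes to well under a megabyte, which is what makes ordinary internet links viable.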
The core breakthrough of DisTrO is that it compresses the amount of data exchanged between GPUs without degrading model performance. The technique builds on the earlier Decoupled Momentum Optimization (DeMo) algorithm, which likewise aims to drastically reduce inter-GPU communication requirements while maintaining training performance.
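To give a feel for the kind of trade-off such compression involves, the sketch below uses generic top-k gradient sparsification: only the largest-magnitude gradient entries and their indices are transmitted, and the receiver rebuilds an approximate gradient from them. This is a common illustrative technique, not the actual DeMo or DisTrO algorithm, whose details are described in Nous Research's own publications.

```python
# Generic gradient-compression sketch (top-k sparsification), NOT DeMo/DisTrO.
import torch

grad = torch.randn(1_000_000)            # stand-in for one layer's gradient
k = grad.numel() // 1000                 # keep 0.1% of the entries

_, indices = torch.topk(grad.abs(), k)   # positions of the largest magnitudes
values = grad[indices]                   # their original (signed) values

# What actually travels over the network: k fp32 values + k int64 indices.
payload_bytes = k * (4 + 8)
full_bytes = grad.numel() * 4
print(f"compression ratio: ~{full_bytes / payload_bytes:.0f}x")

# Receiver reconstructs an approximate (sparse) gradient from the payload.
reconstructed = torch.zeros_like(grad)
reconstructed[indices] = values
```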
On the hardware side, Nous Research's pre-training run is supported by well-known partners including Oracle, Lambda Labs, Northern Data Group, Crusoe Cloud, and the Andromeda Cluster, which jointly provide the heterogeneous hardware needed to fully test DisTrO's capabilities in a real-world distributed environment.
Blog: https://nousresearch.com/
This experiment by Nous Research is not only a technical breakthrough; more importantly, it offers AI researchers around the world a new approach and new possibilities, heralding a shift in how AI models are trained. More distributed training projects of this kind may emerge in the future, further lowering the entry threshold for AI and driving vigorous development across the field.