NovaSky, a research team at the Sky Computing Laboratory at the University of California, Berkeley, recently released a reasoning model called Sky-T1-32B-Preview that performs well on several key benchmarks, on par with an early version of OpenAI's o1. Even more striking, the model was extremely cheap to train, pointing to a new trend toward efficient, economical AI development.
Sky-T1-32B-Preview is the first truly open source reasoning model in the sense that it can be replicated from scratch: the NovaSky team released not only the model itself but also the training dataset and the necessary training code. According to the team's blog, "Sky-T1-32B-Preview's training costs are less than $450, which proves that advanced reasoning capabilities can be achieved at a low cost." By contrast, training a model of similar performance used to require millions of dollars. The sharp drop in cost is attributed mainly to the use of synthetic training data. For example, Palmyra X004, a model recently released by the AI company Writer, was trained almost entirely on synthetic data and cost only $700,000 to develop.
Reasoning models differ from ordinary AI models in that they can check their own work, which helps them avoid some common mistakes. The trade-off is speed: a reasoning model often takes longer to arrive at a solution, anywhere from a few seconds to several minutes. In return, these models tend to be more reliable in fields such as physics, science, and mathematics, which makes them well suited to those domains.
The NovaSky team said it used Alibaba's QwQ-32B-Preview reasoning model to generate the initial training data for Sky-T1, then curated the data and used OpenAI's GPT-4o-mini to restructure it into a more usable format. Training the 32-billion-parameter Sky-T1 took about 19 hours on a rack of 8 Nvidia H100 GPUs. (Parameter count roughly corresponds to a model's problem-solving ability.)
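As a rough sanity check on the reported figures, the sketch below works out the GPU-hours and the per-GPU-hour price they imply. It assumes that all 8 GPUs ran for the full 19 hours and that the $450 figure covers only this final training run; neither assumption is confirmed in the team's statement, and the function name is illustrative only.

```python
def implied_gpu_hour_rate(total_cost_usd: float, gpus: int, hours: float) -> float:
    """Return the per-GPU-hour price implied by a total training cost."""
    gpu_hours = gpus * hours
    return total_cost_usd / gpu_hours


# Figures reported in the article: 8 H100 GPUs, about 19 hours, under $450.
gpu_hours = 8 * 19
rate = implied_gpu_hour_rate(450, 8, 19)

# Because the cost is reported as "less than $450", the rate is an upper bound.
print(f"{gpu_hours} GPU-hours, at most about ${rate:.2f} per GPU-hour")
```

Under these assumptions the run amounts to 152 GPU-hours at just under $3 per GPU-hour.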
In benchmark tests, Sky-T1 outperformed an early preview version of o1 on MATH500, a set of "competition-level" math problems, and also beat the o1 preview on a set of hard coding problems from LiveCodeBench. However, Sky-T1 falls short of the o1 preview on GPQA-Diamond, which contains physics, biology, and chemistry questions that a PhD graduate would be expected to answer. In addition, OpenAI's generally available (GA) release of o1 is more capable than the preview version, and OpenAI expects to release an even stronger reasoning model, o3, in the coming weeks.
Nevertheless, the NovaSky team said Sky-T1 is only the starting point in its effort to develop open source models with advanced reasoning capabilities. "Looking forward, we will focus on developing more efficient models that maintain strong reasoning performance, and exploring advanced techniques to further improve the models' efficiency and accuracy at test time," the team wrote in the post. "Stay tuned as we make progress on these exciting plans." The arrival of this open source reasoning model brings new opportunities and challenges to the field of artificial intelligence, and its future development merits continued attention.