Open source AI reaches new heights: DeepSeek V3 surpasses Llama 3.1 in scale, with training data reaching 14.8 trillion tokens

Author: Eve Cole Update Time: 2024-12-30 17:48:02

China has made a major breakthrough in artificial intelligence. DeepSeek has released DeepSeek V3, an open source large language model with 671 billion parameters, whose performance surpasses many mainstream closed-source models, including GPT-4. DeepSeek V3 not only performed well in programming competitions and code integration tests, but also stood out for its low development cost: just two months and $5.5 million, in sharp contrast to the investment behind comparable products. Behind this achievement is the quantitative hedge fund High-Flyer Capital Management, which invested in building powerful server clusters.

Chinese artificial intelligence company DeepSeek recently released DeepSeek V3, a landmark open source large language model. With 671 billion parameters, the model not only exceeds Meta's Llama 3.1 in scale, but also outperforms mainstream closed-source models, including GPT-4, in multiple benchmark tests.

DeepSeek V3 stands out for its strong performance and efficient development process. The model performed well in competitions on the programming platform Codeforces and led its competitors on the Aider Polyglot test, which measures code integration capabilities. It was trained on a data set of 14.8 trillion tokens, and its parameter count is about 1.6 times that of Llama 3.1.

Even more striking, DeepSeek completed model training in only two months at a cost of US$5.5 million, far less than the development investment of similar products.

DeepSeek is backed by the Chinese quantitative hedge fund High-Flyer Capital Management. The fund invested in a server cluster of 10,000 Nvidia A100 GPUs worth approximately $138 million. Liang Wenfeng, founder of High-Flyer, said that open source AI will eventually break the monopoly advantage of today's closed models.

DeepSeek V3 is released under a permissive license, allowing developers to download, modify and use it for various applications, including commercial purposes. Although running the full version still requires powerful hardware support, the release of this open source model marks an important step for open innovation in the field of AI.

The open source release of DeepSeek V3 not only advances artificial intelligence technology, but also gives developers around the world more opportunities, pointing to a more open and diverse future for the field. Its low-cost, efficient training process also offers a valuable reference for other research institutions and companies, and its subsequent development is worth watching.