Small but mighty! A team of 10 people built the first fine-tuned Llama 3.1 405B

Author: Eve Cole  Update Time: 2024-12-22 08:32:01

The AI field is crowded with giants, and competition is fierce. Yet Nous Research, a startup of only 10 people, is challenging the big technology companies with strong engineering and an open source philosophy. Its newly released Hermes 3 model is a fine-tune of Llama 3.1 at the 405B parameter scale, and the team's models have been downloaded more than 33 million times in total, making them a phenomenon in the AI industry. This article looks at what Hermes 3 can do, how it was trained, and the innovative spirit of Nous Research.

A team of only 10 people has dared to take on technology giants such as Meta. It is a real-life David and Goliath story!

Nous Research is no unknown. Its newly launched Hermes 3 is fine-tuned from the 405B version of Llama 3.1. The team may be small, but its strength should not be underestimated: these ten people have fine-tuned models from several families, including Mistral, Yi, and Llama, and those models have been downloaded more than 33 million times. That makes the team a hit factory of the AI industry!

Hermes 3 arrives as a shot in the arm for the AI world. Even after FP8 quantization, its performance remains strong. Quantization significantly reduces the model's VRAM and disk requirements, which lets Hermes 3 run on a single node. That is great news for developers!
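To see why FP8 matters at this scale, here is a back-of-the-envelope sketch of the weight memory alone. The node size of 8 GPUs with 80 GB each is an assumption for illustration, not a figure from Nous Research, and the estimate ignores the KV cache, activations, and runtime overhead.

```python
# Rough weight-memory estimate for a 405B-parameter model.
# Assumptions (not from the article): weights only, 1 GB = 1e9 bytes,
# and a hypothetical single node with 8 GPUs of 80 GB each.

PARAMS = 405e9  # parameter count of Llama 3.1 405B

# Storage cost per parameter for each number format.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}

NODE_GB = 8 * 80  # hypothetical node capacity in GB


def weight_gb(dtype: str) -> float:
    """Return the approximate size of the weights in gigabytes."""
    return PARAMS * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    size = weight_gb(dtype)
    verdict = "fits" if size < NODE_GB else "does not fit"
    print(f"{dtype}: {size:.0f} GB of weights, {verdict} in {NODE_GB} GB")
```

Under these assumptions, halving the bytes per parameter is what brings the weights under the capacity of one such node. Real deployments need extra headroom beyond the weights.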

In conversation, Hermes 3 is an all-rounder. Whether the task is long-term memory, multi-turn dialogue, role-playing, or internal monologue, it handles it with ease. Thanks to the 128K context window of Llama 3.1, Hermes 3 keeps long conversations coherent like a seasoned diplomat.

But the capabilities of Hermes 3 don't stop there. Beyond traditional language modeling, it can understand and evaluate the quality of generated text in a nuanced way. In other words, it is not only an eloquent speaker but also a strict text critic!

Even more impressive, Hermes 3 integrates several agentic capabilities: structured output, output of intermediate steps, and internal monologue that makes its decision-making transparent. It is like giving the AI a transparent brain that lets us watch it think.
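As a rough illustration of how a developer might consume such output, the sketch below separates a visible reasoning section from a machine-readable answer. The `<scratchpad>` tag, the JSON fields, and the reply text are all hypothetical stand-ins chosen for this example. They are not taken from the Hermes 3 documentation.

```python
import json
import re

# Hypothetical model reply: the tag name and JSON fields are illustrative only.
reply = """<scratchpad>
The user wants the capital of France. That is Paris.
</scratchpad>
{"answer": "Paris", "confidence": "high"}"""

SCRATCHPAD = re.compile(r"<scratchpad>(.*?)</scratchpad>", flags=re.DOTALL)


def split_reply(text: str) -> tuple[str, dict]:
    """Separate the visible reasoning from the machine-readable answer."""
    match = SCRATCHPAD.search(text)
    reasoning = match.group(1).strip() if match else ""
    # Everything outside the scratchpad is expected to be one JSON object.
    payload = SCRATCHPAD.sub("", text)
    return reasoning, json.loads(payload)


reasoning, answer = split_reply(reply)
print(reasoning)
print(answer["answer"])
```

Keeping the reasoning and the structured answer in separate channels lets an application log the former for inspection while passing only the latter to downstream code.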

Training Hermes 3 was a boot camp by AI standards. It went through two stages: supervised fine-tuning (SFT), followed by direct preference optimization (DPO). The team spent a full 5 months curating and building the SFT dataset, a level of dedication and patience that commands respect.
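For readers unfamiliar with the second stage, here is a minimal sketch of the standard DPO loss for a single preference pair, as it is usually written in the literature. It is not Nous Research's training code, and the `beta` value and the log-probabilities used below are placeholders chosen for illustration.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) response pair.

    Inputs are sequence log-probabilities under the model being trained
    (policy) and under a frozen reference model. beta is a placeholder value.
    """
    # How much more the policy favors each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), computed as a numerically stable softplus(-logits).
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))


# Placeholder log-probabilities, not real model outputs.
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)  # policy equals reference
improved = dpo_loss(-8.0, -12.0, -10.0, -12.0)   # policy favors chosen more
print(baseline, improved)
```

The loss falls as the policy raises the probability of the preferred response relative to the reference model, which is how preference data shapes the model without a separate reward model.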

Nous Research, a private applied research group founded in 2023 and headquartered in New York, is an upstart shaking up the AI world. The team firmly believes in the power of open source and has vowed to break through the limits that closed technology places on innovation. The company's slogan is bold: We challenge the assumption that closed technologies will always occupy the pinnacle of innovation and, instead, deliver powerful open source code.

In just over a year, Nous Research has released 5 datasets and 89 models. That output sends a clear message to the world: size does not matter, capability does!

Technical report: https://nousresearch.com/wp-content/uploads/2024/08/Hermes-3-Technical-Report.pdf

Official introduction: https://nousresearch.com/freedom-at-the-frontier-hermes-3/

The success of Nous Research and Hermes 3 demonstrates the power of open source and brings new vitality and possibilities to the field of AI. Small teams can achieve remarkable things too, which should encourage every AI practitioner. We look forward to seeing what Nous Research delivers next.