xAI's Grok-2 climbs to second place in the chatbot rankings, closely trailing GPT-4o

Author: Eve Cole | Update Time: 2024-12-23 10:48:02

The xAI team's latest large language models, Grok-2 and Grok-2-Mini, have posted impressive results on the LMSys Chatbot Arena leaderboard. On the strength of its overall performance, and especially its results on mathematical tasks, Grok-2 ranked second, tied with Google's Gemini model and ahead of the May version of OpenAI's GPT-4o. The ranking is based on votes from more than 6,000 community users. Grok-2-Mini also performed well, finishing fifth. The jump in ranking demonstrates the xAI team's strength in AI model research and development, and points to new directions for the future development of large language models.

The leaderboard data shows that xAI's two models, Grok-2 and Grok-2-Mini, have officially entered the LMSys Chatbot Arena rankings. Grok-2 stands out in second place, surpassing the May version of OpenAI's GPT-4o and sharing the position with the latest Gemini model, based on votes from more than 6,000 community users.

Notably, Grok-2 performed particularly well on math tasks, taking first place in that category. It also placed second in several other categories, including hard prompts, coding, and instruction following. Grok-2-Mini, by comparison, entered the rankings in fifth place, a strong showing in its own right.

Grok-2-Mini has also become significantly faster and now runs twice as fast as before. The improvement comes from xAI's inference team, which rewrote the inference stack from scratch using SGLang, enabling more efficient multi-host inference and slightly better accuracy. The team also introduced new computation and communication kernels, along with better batch scheduling and quantization, to further improve overall performance.

Some observers remain skeptical of Grok-2's performance and consider OpenAI's GPT-4o the better model, but many users report that in practice Grok-2 does perform well on programming and mathematics tasks. The Grok-2 series was released in beta this month, and users can try the models on the X platform. The models also support image creation through the FLUX.1 image generation model.

Highlights:

✨ Grok-2 ranked second on the LMSys Chatbot Arena leaderboard, surpassing GPT-4o (May version) and tying with Gemini.

✨ Grok-2 took first place in math tasks and ranked near the top in many other categories.

✨ Grok-2-Mini now runs twice as fast as before, further improving its performance.

The strong showing of Grok-2 and Grok-2-Mini not only demonstrates the xAI team's capacity for innovation in AI, but also offers a new reference point for the future development of large language models. Their advantages in specific fields, such as mathematics and programming, point to the considerable potential of large language models in professional applications. I believe the xAI team will bring more surprises in the future.