The editor of Downcodes learned that Nvidia's latest Blackwell platform delivered remarkable results in the MLPerf Training 4.1 benchmark, greatly outperforming the previous-generation Hopper platform. The results show significant performance gains across multiple benchmark tests, which has attracted widespread attention in the industry and signals a new breakthrough in AI accelerator technology. In particular, Blackwell demonstrated impressive advantages in LLM fine-tuning and pre-training tasks, opening up new possibilities for the AI field.
Recently, NVIDIA released its new Blackwell platform and published preliminary results from the MLPerf Training 4.1 benchmark. According to those results, Blackwell's performance in some workloads has more than doubled compared to the previous-generation Hopper platform, a result that has attracted widespread attention in the industry.
In the MLPerf Training 4.1 results, the Blackwell platform achieved 2.2 times the per-GPU performance of Hopper in the Llama 2 70B fine-tuning task of the LLM (Large Language Model) benchmark, and a 2x improvement in GPT-3 175B pre-training. In other benchmarks, such as Stable Diffusion v2 training, the new-generation Blackwell also surpassed the previous generation, with a 1.7x advantage.
Notably, Hopper itself continues to improve: it delivered 1.3 times the language-model pre-training performance compared to the previous round of MLPerf Training, showing that Nvidia's software stack keeps advancing. In the latest GPT-3 175B benchmark, Nvidia submitted a run using 11,616 Hopper GPUs, setting a new scaling record.
Regarding the technical details of Blackwell, Nvidia said the new architecture uses optimized Tensor Cores and faster high-bandwidth memory. This allows the GPT-3 175B benchmark to be run on just 64 GPUs, whereas the Hopper platform would require 256 GPUs to achieve the same performance.
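As a back-of-envelope sketch, the GPU counts quoted above imply roughly a 4x per-GPU efficiency gain on this workload. The figures come straight from the paragraph above; the calculation itself is only illustrative arithmetic, not an official Nvidia methodology:

```python
# Illustrative arithmetic based on Nvidia's stated GPU counts for
# running the GPT-3 175B benchmark at the same performance level.
hopper_gpus = 256      # Hopper GPUs required (per the article)
blackwell_gpus = 64    # Blackwell GPUs required (per the article)

# How many times fewer GPUs Blackwell needs, which (at equal overall
# throughput) is also the implied per-GPU efficiency gain.
implied_per_gpu_gain = hopper_gpus / blackwell_gpus
print(f"Implied per-GPU gain: {implied_per_gpu_gain:.0f}x")  # prints "Implied per-GPU gain: 4x"
```

Note that this implied 4x is workload-specific and assumes equal total throughput for both runs; per-GPU gains on other benchmarks in the same round (2.2x, 1.7x) were measured directly rather than inferred from cluster sizes.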
At the press conference, Nvidia also emphasized that software and networking updates continue to improve the performance of Hopper-generation products, and Blackwell is expected to improve further with future submissions. In addition, NVIDIA plans to launch its next-generation AI accelerator, Blackwell Ultra, next year, which is expected to offer more memory and stronger computing power.
Blackwell debuted last September on the MLPerf Inference v4.1 benchmark, achieving an impressive four times the per-GPU performance of the H100 in AI inference, notably by using lower FP4 precision. This trend aims to address the growing demand for low-latency chatbots and reasoning models such as OpenAI's o1.
The outstanding performance of the Blackwell platform marks a major leap forward in AI accelerator technology, and its performance improvements in LLM training and inference will greatly promote the development and application of AI technology. The editor of Downcodes will continue to pay attention to the subsequent development of the Blackwell platform and bring more related reports.