SemiAnalysis recently released a report stating that serious software flaws in AMD's new-generation MI300X AI chip keep it from reaching its full performance and from effectively challenging Nvidia's dominance of the AI chip market. Based on a five-month investigation, the report details AMD's shortcomings in building its software ecosystem and makes recommendations for the company's future development.
The technology analysis firm SemiAnalysis found, after five months of investigation, that AMD's latest MI300X AI chip has major software problems that prevent it from performing as its specifications suggest, leaving it unable to challenge Nvidia's dominance in the AI chip market.
The report says AMD's software contains so many bugs that training AI models is nearly impossible without users spending a great deal of time debugging. Meanwhile, Nvidia continues to roll out new features, libraries, and performance updates that widen the gap between the two. The analysts ran extensive tests, including GEMM benchmarks and single-node training, and the results showed that AMD has been unable to overcome the so-called "CUDA moat", that is, Nvidia's strong advantage in software.
On paper, the MI300X's hardware specifications are impressive: 1,307 TeraFLOPS of FP16 compute and 192 GB of HBM3 memory. By comparison, Nvidia's H100 offers 989 TeraFLOPS and 80 GB of memory, although Nvidia's newer H200 narrows the memory gap with a 141 GB configuration. AMD systems also have an advantage in total cost of ownership, thanks to lower prices and cheaper Ethernet-based networking.
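To put the spec-sheet gap in concrete terms, the ratios can be worked out directly from the figures quoted above. The following is a minimal sketch; the dictionary layout and the `ratio` helper are illustrative names, not anything from the report.

```python
# Spec-sheet figures exactly as quoted in this article
# (FP16 TeraFLOPS and memory capacity in GB).
SPECS = {
    "MI300X": {"fp16_tflops": 1307, "memory_gb": 192},
    "H100": {"fp16_tflops": 989, "memory_gb": 80},
}

# The article gives only the memory size for the H200.
H200_MEMORY_GB = 141


def ratio(metric, a="MI300X", b="H100"):
    """Return how many times larger chip `a` is than chip `b` on `metric`."""
    return SPECS[a][metric] / SPECS[b][metric]


if __name__ == "__main__":
    print(f"FP16 compute, MI300X vs H100: {ratio('fp16_tflops'):.2f}x")
    print(f"Memory, MI300X vs H100: {ratio('memory_gb'):.2f}x")
    h200_ratio = SPECS["MI300X"]["memory_gb"] / H200_MEMORY_GB
    print(f"Memory, MI300X vs H200: {h200_ratio:.2f}x")
```

These are ratios of advertised peaks only; they say nothing about the throughput a real workload achieves.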
In practice, however, these hardware advantages do not translate into the expected results. SemiAnalysis likens judging the chip by its specifications to "comparing cameras by pixel count alone," suggesting that AMD is caught up in a numbers game while failing to deliver real-world performance. To obtain usable benchmark results, the analysts had to work directly with AMD engineers to fix multiple software bugs, whereas Nvidia's systems worked out of the box without additional tweaks.
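The difference between advertised and delivered performance is usually expressed by converting a timed GEMM run into achieved TeraFLOPS and dividing by the advertised peak. The sketch below shows that arithmetic only; the function names are illustrative, and the matrix size and timing in the usage example are hypothetical values, not measurements from the SemiAnalysis report.

```python
def gemm_flops(m, n, k):
    """Floating-point operations in C = A @ B, with A of shape (m, k) and B of shape (k, n).

    Each of the m * n outputs needs k multiplies and k adds.
    """
    return 2 * m * n * k


def achieved_tflops(m, n, k, seconds):
    """Throughput in TeraFLOPS for one GEMM that took `seconds` to run."""
    return gemm_flops(m, n, k) / seconds / 1e12


def utilization(achieved, peak):
    """Fraction of the advertised peak that was actually delivered."""
    return achieved / peak


if __name__ == "__main__":
    # Hypothetical timing, for illustration only.
    measured = achieved_tflops(8192, 8192, 8192, seconds=2.0e-3)
    # 1307 is the advertised MI300X FP16 figure quoted in this article.
    share = utilization(measured, peak=1307)
    print(f"achieved: {measured:.1f} TFLOPS, {share:.0%} of advertised peak")
```

A chip with a higher advertised peak can still lose such a comparison if its software stack delivers a smaller share of that peak.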
The report also notes that TensorWave, the largest cloud provider of AMD GPUs, went so far as to give the AMD team free access to GPUs it had purchased in order to help fix software problems. SemiAnalysis therefore recommends that AMD CEO Lisa Su increase investment in software development and testing, in particular by allocating a large number of MI300X chips to automated testing, simplifying the complex environment variables, and improving default settings so that the out-of-the-box experience is better.
Although SemiAnalysis hopes AMD can become a strong competitor to Nvidia, it says that "unfortunately, there is still a lot of work to be done." Without major software improvements, AMD risks falling further behind, especially as Nvidia prepares to launch its next-generation Blackwell chips, although there are also reports that the Blackwell launch will not be entirely smooth.
Highlights:
AMD's MI300X AI chip faces serious software issues that make AI model training difficult.
Nvidia continues to expand its market advantage with its powerful CUDA platform and frequent software updates.
SemiAnalysis recommends that AMD increase investment in software development and improve user experience to enhance competitiveness.
All in all, the report lays out the major software challenges facing AMD's MI300X and the areas where AMD needs to improve. Whether AMD can cross Nvidia's software "moat" will largely determine its success or failure in the AI chip market.