OpenAI's big move: o3, its strongest reasoning model, and the lightweight o3-mini unveiled

Author: Eve Cole | Update Time: 2024-12-26 17:32:02

OpenAI has unveiled o3, a new-generation reasoning model, and its lightweight version, o3-mini. The two models succeed the o1 series and are designed to answer questions more accurately by reasoning at greater length before responding. o3 made a breakthrough on the ARC-AGI benchmark, demonstrating near-human-level problem-solving on that test. o3-mini focuses on speed and cost-effectiveness and is especially well suited to programming tasks. The o3 series is not yet available to the general public, but OpenAI has opened preview access to safety researchers.

The o3 model performs well on multiple benchmarks. For example, its accuracy on the SWE-bench Verified benchmark is more than 20 percentage points higher than o1's, and its accuracy on competition mathematics and GPQA Diamond is also significantly improved. OpenAI has also introduced a new safety alignment method called “deliberative alignment” to help the models comply with its safety specifications. OpenAI is currently conducting external safety testing and is accepting early access applications.

OpenAI’s strongest reasoning model o3 unveiled: ARC-AGI performance soars, approaching human level

The o3 model has demonstrated remarkable capabilities in programming and mathematical problem solving. On the SWE-bench Verified benchmark, o3's accuracy is approximately 71.7%, more than 20 percentage points higher than the o1 model. In competitive programming (Codeforces), o3 earned an Elo rating of 2727, while o1 reached only 1891. In addition, o3's accuracy in competition mathematics reached 96.7%, and its accuracy on GPQA Diamond reached 87.7%, nearly 10 percentage points higher than o1.
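To give a feel for what the gap between the two Elo ratings means, here is a minimal sketch using the standard Elo expected-score formula. The function name `elo_expected_score` is our own, and it is an assumption that the reported ratings behave like classical Elo ratings; competitive programming platforms use their own rating variants, so treat the result as a rough illustration rather than an official figure.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo model.

    Assumption: classical Elo with a 400-point scale; real contest
    platforms may use a modified rating system.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Ratings quoted in the article.
O3_RATING = 2727
O1_RATING = 1891

gap = O3_RATING - O1_RATING
expected = elo_expected_score(O3_RATING, O1_RATING)

print(f"Rating gap: {gap} points")
print(f"Expected score of the higher-rated side: {expected:.3f}")
```

Under this model, a gap of more than 800 points means the higher-rated side would be expected to come out ahead in nearly every head-to-head comparison.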

OpenAI also introduced a new safety alignment method, deliberative alignment. It is a new paradigm that teaches the model the text of the safety specifications directly and trains it to explicitly recall those specifications and reason over them accurately before answering. OpenAI uses this approach to align its o-series models and reports highly precise compliance with its safety policies.

OpenAI is currently expanding external safety testing and has opened early access applications on its website. Applicants need to fill out an online form and provide relevant information. Selected researchers will be granted access to o3 and o3-mini to explore their capabilities and contribute to safety assessments.

The unveiling of the OpenAI o3 series marks a significant improvement in AI reasoning capabilities, and its strong results in programming, mathematics, and science benchmarks point to a new direction for future AI development. We will continue to follow the progress and application of the o3 series models.