Alibaba Cloud's Tongyi team has launched Qwen2.5-Math-PRM, a new process reward model for mathematical reasoning. The model comes in 72B and 7B sizes and significantly outperforms comparable open-source models, especially at identifying reasoning errors. Notably, the 7B version even surpasses the popular GPT-4o, marking a breakthrough for Alibaba Cloud in reasoning model research and development. To evaluate model performance more comprehensively, the team also open-sourced ProcessBench, the first step-level evaluation benchmark. It contains 3,400 mathematical problems, including problems at Mathematical Olympiad difficulty, and experts have annotated each one with a detailed reasoning process to keep the evaluation rigorous.
Today, the Alibaba Cloud Tongyi team officially released Qwen2.5-Math-PRM, a new process reward model for mathematical reasoning. The model is available in 72B and 7B sizes, and it performs significantly better than comparable open-source process reward models, especially at identifying reasoning errors.
Surprisingly, the 7B version of Qwen2.5-Math-PRM surpassed GPT-4o, a model widely used across the industry. This achievement marks an important step for Alibaba Cloud in developing reasoning models. To comprehensively evaluate the model's mathematical reasoning, the Tongyi team also open-sourced the first step-level evaluation benchmark, ProcessBench. The benchmark contains 3,400 mathematical test cases, including difficult problems from the International Mathematical Olympiad, and human experts have annotated each case with a detailed reasoning process so that the evaluation is rigorous and comprehensive.
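The paragraph above describes ProcessBench only at a high level. As an illustration of how step-level evaluation of a process reward model can work, the following Python sketch assumes that each test case is labeled with the index of its earliest erroneous step (or -1 if every step is correct), that the PRM assigns a score to each step, and that the first step scoring below a threshold is taken as the predicted error. The function names, the 0.5 threshold, and the sample scores are hypothetical. The F1 metric (harmonic mean of accuracy on erroneous cases and on fully correct cases) follows the general approach reported for ProcessBench, but this is not the official evaluation code.

```python
from typing import List, Sequence

# Assumed label convention: index of the earliest erroneous step, or -1 if all steps are correct.
NO_ERROR = -1


def first_error_step(step_scores: Sequence[float], threshold: float = 0.5) -> int:
    """Predict the earliest erroneous step from per-step PRM scores.

    A step counts as erroneous when its score falls below `threshold`
    (a hypothetical cutoff chosen for illustration).
    """
    for index, score in enumerate(step_scores):
        if score < threshold:
            return index
    return NO_ERROR


def step_level_f1(predictions: List[int], labels: List[int]) -> float:
    """Harmonic mean of accuracy on erroneous cases and accuracy on correct cases."""
    erroneous = [(p, y) for p, y in zip(predictions, labels) if y != NO_ERROR]
    correct = [(p, y) for p, y in zip(predictions, labels) if y == NO_ERROR]
    # A prediction counts only if it matches the labeled step exactly (or -1 for correct cases).
    acc_err = sum(p == y for p, y in erroneous) / len(erroneous) if erroneous else 0.0
    acc_ok = sum(p == y for p, y in correct) / len(correct) if correct else 0.0
    if acc_err + acc_ok == 0:
        return 0.0
    return 2 * acc_err * acc_ok / (acc_err + acc_ok)


if __name__ == "__main__":
    # Hypothetical per-step scores for four solutions, with their gold labels.
    scores = [
        [0.9, 0.8, 0.2, 0.7],  # gold: error at step 2
        [0.95, 0.9, 0.85],     # gold: fully correct
        [0.3, 0.9],            # gold: error at step 0
        [0.9, 0.6, 0.4],       # gold: fully correct, but the PRM flags step 2
    ]
    labels = [2, NO_ERROR, 0, NO_ERROR]
    predictions = [first_error_step(s) for s in scores]
    print(predictions)                                    # [2, -1, 0, 2]
    print(round(step_level_f1(predictions, labels), 4))   # 0.6667
```

Requiring an exact match on the earliest erroneous step, rather than a single correct/incorrect verdict for the whole solution, is what makes this kind of evaluation step-level. The harmonic mean also keeps a model from scoring well by simply flagging every solution as erroneous, or every solution as correct: either strategy drives one of the two accuracies to zero.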
Evaluating Qwen2.5-Math-PRM on ProcessBench, the research team found that both the 72B and 7B models performed well. The 7B version in particular not only surpasses open-source models of the same size but also outperforms the closed-source GPT-4o-0806 in some respects. These results demonstrate the strong potential of process reward models (PRMs) for improving the reliability of reasoning and offer new ideas for future techniques that supervise the reasoning process.
This work by the Alibaba Cloud Tongyi team not only advances AI reasoning technology but also provides a valuable reference for other developers in the industry. By open-sourcing it, the Tongyi team hopes to share its experience with more researchers and drive technological progress across the industry.
The release of Qwen2.5-Math-PRM marks a new breakthrough for large models in mathematical reasoning, and because it is open source, academia and industry can readily use it in research and applications. It will be worth watching what further possibilities it opens up as artificial intelligence continues to develop.