A research team from The Chinese University of Hong Kong, Shenzhen and the Shenzhen Research Institute of Big Data recently released HuatuoGPT-o1, a medical large language model (LLM) built for complex reasoning. The release marks an important step for medical AI in this area. The model aims to improve the accuracy and reliability of medical diagnosis and decision-making. Whereas earlier reasoning-focused LLMs concentrated on mathematics, HuatuoGPT-o1 targets medicine, and it opens a new path for medical AI by simulating the rigorous thinking process doctors follow in their actual work.
The main challenge the team faced during development is that reasoning in medicine often lacks clear steps and is difficult to verify. To solve this problem, they selected 40,000 difficult questions with unique, objective correct answers from medical exam question banks and rewrote them as open-ended questions, building a set of verifiable medical problems. These problems require the model to reason in depth, and because each one has a known answer, the outcome of that reasoning can be checked as right or wrong. This gives model training a reliable signal.
To improve the model's reasoning ability, the research team adopted a two-stage training method. In the first stage, they use verifier feedback (correct or wrong) to guide a strategy-based search that generates complex reasoning trajectories. The model first produces an initial chain of thought (CoT). If the verifier judges the current CoT to be incorrect, the model applies one of several strategies (backtracking, exploring a new path, verifying, or correcting) and repeats this until it reaches the correct answer. The successful trajectories are then used to fine-tune the LLM, giving it the ability to reason through iterative reflection. In the second stage, the team uses the sparse rewards provided by the verifier to further improve the model's complex reasoning through reinforcement learning (RL).
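The first-stage search loop and the second-stage reward can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `propose` stands in for the LLM, `verifier` stands in for the component that judges an answer as correct or wrong, and all function names, the step limit, the random choice of strategy, and the 1.0/0.0 reward values are hypothetical.

```python
import random
from typing import Callable, List, Optional, Tuple

# One step of a trajectory: (strategy used, chain of thought, answer).
Step = Tuple[str, str, str]

# The four refinement strategies named in the text.
STRATEGIES = ("backtrack", "explore_new_path", "verify", "correct")


def search_trajectory(
    question: str,
    reference: str,
    propose: Callable[[str, Optional[str], List[Step]], Tuple[str, str]],
    verifier: Callable[[str, str], bool],
    max_steps: int = 4,   # assumed limit on refinement attempts
    seed: int = 0,
) -> Optional[List[Step]]:
    """Refine a chain of thought until the verifier accepts the answer.

    Returns the full trajectory on success, or None if no accepted
    answer is found within max_steps refinements.
    """
    rng = random.Random(seed)
    cot, answer = propose(question, None, [])       # initial CoT
    trajectory: List[Step] = [("init", cot, answer)]
    for _ in range(max_steps):
        if verifier(answer, reference):
            return trajectory                       # keep for fine-tuning
        strategy = rng.choice(STRATEGIES)           # assumption: sampled uniformly
        cot, answer = propose(question, strategy, trajectory)
        trajectory.append((strategy, cot, answer))
    return trajectory if verifier(answer, reference) else None


def sparse_reward(answer: str, reference: str,
                  verifier: Callable[[str, str], bool]) -> float:
    """Second-stage reward: a signal only on the final answer (values assumed)."""
    return 1.0 if verifier(answer, reference) else 0.0


# --- Toy stand-ins so the sketch runs without a real model -------------
def toy_propose(question: str, strategy: Optional[str],
                history: List[Step]) -> Tuple[str, str]:
    # Pretend the model needs two refinements before it gets it right.
    candidates = ["viral pneumonia", "asthma", "pulmonary embolism"]
    answer = candidates[min(len(history), len(candidates) - 1)]
    label = strategy or "init"
    return f"[{label}] reasoning that leads to {answer}", answer


def toy_verifier(answer: str, reference: str) -> bool:
    return answer.strip().lower() == reference.strip().lower()


if __name__ == "__main__":
    traj = search_trajectory("Sudden dyspnea after a long flight?",
                             "pulmonary embolism", toy_propose, toy_verifier)
    for strategy, cot, answer in traj or []:
        print(strategy, "->", answer)
```

Only trajectories that end in an accepted answer are returned, which mirrors the idea of fine-tuning on successful reasoning paths. The reward function is deliberately coarse: it says nothing about intermediate steps, which is what makes it sparse.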
The experimental results show that the two-stage training method is effective. Using only 40,000 verifiable problems, a model with 8 billion parameters gained 8.5 points on medical benchmarks. A model with 70 billion parameters also outperformed other open-source general-purpose and medical-specific LLMs on multiple medical benchmarks. These results confirm that complex reasoning helps in solving medical problems, and they demonstrate the significant role reinforcement learning plays in improving model performance.
HuatuoGPT-o1 is innovative in that it is the first to use verifiable medical problems together with a medical verifier to strengthen an LLM's complex reasoning in medicine. With this approach, the model can think deeply like a doctor, checking and correcting itself before giving an answer. This increases the model's potential for use in medicine and also offers a reference for improving reasoning in other professional fields.
To check the reliability of this approach, the researchers used GPT-4o as the verifier, and the results showed that its accuracy reached 96.5% in the first stage and 94.5% in the second stage. They also confirmed that an LLM-based verifier is more reliable than traditional exact-match methods. In addition, the researchers applied the method to Chinese-language medicine and again achieved strong results, demonstrating that the method adapts to different fields and languages.
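A small example may help show why exact matching is brittle for open-ended answers, and how the accuracy of an answer checker can be measured against human labels. This is only a sketch: `lenient_match` is a simple normalized substring check used as a stand-in for an LLM judge (it is not GPT-4o and not the authors' method), and the sample cases and labels are made up for illustration.

```python
import re
from typing import Callable, List, Tuple

# (model answer, reference answer, human judgment of correctness)
Case = Tuple[str, str, bool]


def exact_match(answer: str, reference: str) -> bool:
    """Traditional check: the strings must be identical."""
    return answer == reference


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def lenient_match(answer: str, reference: str) -> bool:
    """Stand-in for an LLM judge: accept if the reference appears in the answer."""
    return normalize(reference) in normalize(answer)


def agreement(check: Callable[[str, str], bool], cases: List[Case]) -> float:
    """Fraction of cases where the checker agrees with the human label."""
    hits = sum(check(ans, ref) == label for ans, ref, label in cases)
    return hits / len(cases)


# Hypothetical sample cases, invented for this illustration.
CASES: List[Case] = [
    ("The most likely diagnosis is pulmonary embolism.", "Pulmonary embolism", True),
    ("Asthma", "Pulmonary embolism", False),
    ("pulmonary embolism", "Pulmonary embolism", True),
    ("Pulmonary embolism", "Pulmonary embolism", True),
]

if __name__ == "__main__":
    print("exact match agreement:  ", agreement(exact_match, CASES))
    print("lenient match agreement:", agreement(lenient_match, CASES))
```

Free-form answers rarely reproduce the reference string character for character, so exact matching rejects correct answers that differ only in phrasing or capitalization. A real LLM judge goes further than this stand-in, since it can also recognize synonyms and paraphrases.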
Overall, HuatuoGPT-o1 marks significant progress for medical AI in complex reasoning. It offers a more reliable tool for medical diagnosis and decision-making, and it suggests new ideas for applying AI in other professional fields. Although the model is still at the research stage and cannot be applied directly in clinical practice, its potential has attracted widespread attention, and it is expected to play a greater role in medicine in the future.
Paper: https://arxiv.org/pdf/2412.18925