A research team from Peking University and the Hong Kong University of Science and Technology has made an eye-catching breakthrough. They developed an innovative training method that raised the performance of an 8B-parameter medical expert model to GPT-4 level. The research introduces the concept of the "stability gap" to explain the performance fluctuations large language models exhibit during continual pre-training, and, more importantly, proposes three effective strategies to address the problem. The team also open-sourced the Llama-3-Physician-8B model, bringing revolutionary progress to the field of medical AI. The model's performance on medical question-answering tasks surpasses open-source models of the same size and approaches the level of GPT-4, which indicates the great potential of medical AI.
First, they found that during continual pre-training, the model's performance on the target domain first drops and then recovers, much like a roller coaster ride. To address this, they propose three strategies. The first is to perform multiple rounds of pre-training on an appropriately sized subset of the data, which restores performance faster than a single round of pre-training on the full large dataset. The second is to select the highest-quality sub-corpus for these multiple rounds of pre-training. The third is to mix in data that approximates the original pre-training distribution, which makes the model more stable.
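To make the three strategies concrete, here is a minimal data-preparation sketch, not the authors' code: the function names, the quality heuristic, and all parameters (subset_size, mix_ratio, epochs) are illustrative assumptions, and a real pipeline would replace the heuristic with a proper quality filter and feed the documents to an actual trainer.

```python
# Sketch of the three mitigation strategies:
# (1) multiple epochs over a properly sized subset,
# (2) selecting the highest-quality sub-corpus,
# (3) mixing in general-domain data to approximate the original
#     pre-training distribution.
# All names and the quality heuristic below are illustrative assumptions.
import random


def quality_score(doc: str) -> float:
    """Placeholder quality heuristic (lexical diversity); a real pipeline
    might use a trained classifier or perplexity-based filter instead."""
    tokens = doc.split()
    return len(set(tokens)) / max(len(tokens), 1)


def build_training_corpus(medical_docs, general_docs,
                          subset_size=1000, mix_ratio=0.2, epochs=4):
    # Strategy 2: keep only the highest-quality medical documents.
    ranked = sorted(medical_docs, key=quality_score, reverse=True)
    high_quality = ranked[:subset_size]

    # Strategy 3: mix in general-domain data so the training distribution
    # stays close to the original pre-training distribution.
    n_general = min(int(len(high_quality) * mix_ratio), len(general_docs))
    mixed = high_quality + random.sample(general_docs, n_general)

    # Strategy 1: iterate over this modest, high-quality corpus for several
    # epochs instead of a single pass over a much larger corpus.
    for epoch in range(epochs):
        random.shuffle(mixed)
        for doc in mixed:
            yield epoch, doc  # hand each document to the trainer


if __name__ == "__main__":
    medical = [f"medical document {i} about diagnosis and treatment" for i in range(50)]
    general = [f"general web document {i}" for i in range(50)]
    for epoch, doc in build_training_corpus(medical, general, subset_size=10, epochs=2):
        pass  # a real training loop would tokenize `doc` and feed it to the model
```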
These strategies achieved remarkable results in continual pre-training and instruction fine-tuning in the medical field, improving effectiveness while reducing the amount of computation. Moreover, their open-source Llama-3-Physician-8B model is already available on HuggingFace.
The significance of this research goes further. They also found that with these strategies, the OpenLLaMa model only needed to be trained on 5 billion tokens of high-quality data for 4 epochs to significantly surpass all baselines on medical tasks. This not only improves performance but also greatly reduces the consumption of computing resources.
What's even more impressive is that their Llama-3-Physician-8B-Instruct model's performance on medical question-answering tasks is not only better than that of other open-source models of the same size; it even surpasses the closed-source GPT-3.5 model and approaches GPT-4 level. This is nothing short of a revolution for the medical field.
This research not only provides a new training method but also shows the huge potential of large language models in the medical field. Through continual pre-training and instruction fine-tuning, a model can achieve higher performance in a specific domain while reducing computational cost. This is undoubtedly a huge boon for the medical industry.
This study also reminds us that training large language models is not achieved overnight; it requires continuous optimization and adjustment. By introducing the concept of the "stability gap," we can better understand and resolve problems in model training, allowing models to play a greater role in specific fields. This is not only a technological breakthrough but also a profound insight for the medical industry.
Paper link: https://arxiv.org/abs/2406.14833
Open source address: https://huggingface.co/YiDuo1999/Llama-3-Physician-8B-Instruct
This research result points the way for the development of medical AI and provides valuable experience for model training in other fields. In the future, as technology continues to advance, we can expect the application of large language models in the medical field to become broader and deeper, making greater contributions to human health.