In recent years, the application of artificial intelligence in medicine has attracted considerable attention, especially chatbots such as ChatGPT, which many hope will improve physicians' diagnostic efficiency. However, a recent study published in the journal JAMA Network Open found that ChatGPT did not significantly improve physicians' diagnostic performance, prompting a rethinking of the potential and limitations of AI in medical diagnosis. In the experiment, which involved 50 physicians, those who used ChatGPT showed little difference in diagnostic accuracy from those who used only traditional resources, in sharp contrast to the high accuracy ChatGPT achieved when diagnosing on its own. The researchers also note that the complexity of real clinical settings and physicians' own cognitive biases may affect the effectiveness of AI-assisted diagnosis.
Image note: the image was generated by AI; image licensing provided by Midjourney.
The study's participants were 50 physicians, including 26 attending physicians and 24 residents, who were asked to diagnose six real patient cases within one hour. To evaluate ChatGPT's value as an aid, the researchers divided the physicians into two groups: one could use ChatGPT in addition to traditional medical resources, while the other could rely only on traditional resources, such as the clinical reference platform UpToDate.
The results showed that physicians using ChatGPT achieved a diagnostic score of 76%, while those relying solely on traditional resources scored 74%. By comparison, ChatGPT on its own achieved a diagnostic score of 90%. Although ChatGPT performed well when working independently, pairing it with physicians produced no significant improvement, which surprised the research team.
Ethan Goh, co-first author of the study and a postdoctoral researcher at the Stanford Clinical Excellence Research Center, said the study was not conducted in a real clinical setting but was based on simulated data, so the applicability of its results is limited. He pointed out that the complexity physicians face when treating actual patients cannot be fully captured in an experiment.
Although the study shows that ChatGPT outperformed some physicians at diagnosis, this does not mean AI can replace physicians' decision-making. Rather, Goh emphasized that physicians still need to exercise oversight and judgment when using AI tools. In addition, physicians can anchor on the preliminary diagnoses they have already formed, which may reduce their willingness to accept AI recommendations; this is a direction future research needs to examine.
Moreover, once a diagnosis is made, physicians must answer a series of follow-up questions, such as "What are the correct treatment steps?" and "What tests are needed to guide the patient's next steps?" This suggests that AI still has broad prospects in medicine, but its effectiveness and applicability in actual clinical practice require deeper exploration.
Highlights:
The study shows that physicians using ChatGPT scored only slightly higher at diagnosis (76% vs. 74%) than physicians using traditional methods, with no significant improvement.
ChatGPT scored 90% when diagnosing on its own, an excellent result, but its use still requires physician oversight and judgment.
Future research is needed to explore how to optimize the use of AI in medical diagnosis and improve its effectiveness.
In short, this research provides valuable evidence on the use of AI in medical diagnosis and points toward directions for future work. Although AI tools such as ChatGPT show real potential, they still need further refinement for actual clinical use, and physicians should use them with caution, combining them with their own clinical experience and judgment in order to better serve patients.