A new study from Brigham and Women's Hospital reveals racial and gender bias in medical applications of the large language model GPT-4. The research team conducted an in-depth evaluation of GPT-4's performance in clinical decision-making, including generating patient cases, formulating diagnostic and treatment plans, and assessing patient characteristics. The findings show that GPT-4 exhibits clear biases in each of these areas, raising concerns about the use of large language models in medicine and highlighting the importance of evaluating AI models for bias so that they do not exacerbate social inequality.
The researchers found significant biases across all three tasks and call for systematic bias assessment of large language models to ensure their use in medicine does not reinforce existing social biases. The findings have been published in the journal The Lancet Digital Health.
The findings are a warning that potential bias must be fully considered and addressed when applying artificial intelligence to critical areas such as healthcare. Going forward, fairer and more equitable AI models will be needed to ensure that these systems benefit everyone rather than deepening social injustice. The study also provides an important reference for the development and deployment of large language models, prompting developers to pay closer attention to AI ethics and social responsibility.