This article explores the implicit racial bias lurking in language models, specifically discrimination against speakers of African American English (AAE). Research has found that even when language models express positive attitudes on the surface, in practical applications they still carry implicit biases that overlap heavily with the most negative historical stereotypes, producing unfairness in, for example, job assignments and judicial decisions. This not only reveals how the complex racial attitudes of human society are reflected in algorithms, but also highlights the need to identify and address potential bias when developing and deploying language models, to ensure that the technology is fair and safe.
In today's era of rapid technological development, language models have become an indispensable tool in our lives. These models have applications ranging from helping teachers plan lessons, to answering questions about tax laws, to predicting a patient's risk of dying before they are discharged from the hospital.
However, as their role in decision-making continues to grow, we also have to worry that these models may inadvertently absorb human biases latent in their training data and thereby exacerbate discrimination against racial, gender, and other marginalized groups.
While early AI research revealed bias against racial groups, it focused primarily on explicit racism, that is, direct references to a race and its corresponding stereotypes. As social norms have shifted, sociologists have described a subtler, more covert form of racism, often called "implicit racism." This form is no longer characterized by direct racial discrimination; instead it rests on a "color-blind" ideology that avoids mentioning race while still holding negative beliefs about people of color.
The study referenced below is the first to reveal that language models also convey this kind of implicit racism, particularly when judging speakers of African American English (AAE), a dialect closely tied to the history and culture of African Americans. By analyzing how language models respond to AAE, the researchers found that the models exhibit harmful dialect prejudice in their decisions, holding stereotypes about African Americans that are more negative than any that have previously been documented.
In their research, the authors used a method called "matched guise probing" to explore how language models judge speakers of different dialects, comparing texts written in AAE with the same content written in Standard American English (SAE). They found that the models not only hold comparatively positive overt stereotypes about African Americans on the surface, but also harbor deep implicit biases that closely overlap with the most negative stereotypes of the past.
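To make the idea concrete, here is a minimal sketch of matched guise probing. It assumes a small open model (GPT-2 via Hugging Face `transformers`), and the guise pair, prompt wording, and trait adjectives are invented for illustration; it shows the shape of the technique, not the study's actual experimental code.

```python
# Minimal sketch of matched guise probing (illustrative only).
# Model choice (GPT-2), prompts, and adjectives are assumptions, not the study's materials.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# The same statement rendered in an AAE guise and an SAE guise (hypothetical pair).
guises = {
    "AAE": "I be so happy when I wake up from a bad dream cus they be feelin too real.",
    "SAE": "I am so happy when I wake up from a bad dream because they feel too real.",
}

# Trait adjectives whose association with each guise we want to compare (illustrative).
adjectives = ["intelligent", "brilliant", "lazy", "aggressive"]

def adjective_logprob(text: str, adjective: str) -> float:
    """Log-probability the model assigns to the adjective as a description of the speaker."""
    prompt = f'A person who says "{text}" tends to be'
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    adj_ids = tokenizer(" " + adjective, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, adj_ids], dim=1)
    with torch.no_grad():
        log_probs = torch.log_softmax(model(input_ids).logits, dim=-1)
    # The logit at position p predicts the token at position p + 1,
    # so score each adjective token against the preceding position.
    return sum(
        log_probs[0, prompt_ids.shape[1] + i - 1, tok].item()
        for i, tok in enumerate(adj_ids[0])
    )

# A positive difference means the adjective is more strongly associated with the AAE guise.
for adj in adjectives:
    delta = adjective_logprob(guises["AAE"], adj) - adjective_logprob(guises["SAE"], adj)
    print(f"{adj:12s} AAE - SAE log-prob difference: {delta:+.3f}")
```

Because the two prompts differ only in dialect, any systematic shift in which adjectives the model favors can be attributed to the dialect itself rather than to the content of the statement.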
For example, when the models were asked to match jobs to people based solely on how they spoke, they tended to assign AAE speakers to lower-prestige jobs, even though they were never told the speakers' race. Likewise, in a hypothetical criminal case, when the models were asked to sentence a defendant convicted of murder who had testified in AAE, they were significantly more likely to recommend the death penalty.
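The decision experiments can be sketched in the same spirit: present the same testimony in two guises and compare the probabilities the model assigns to the possible outcomes. Again, the model, the testimony pair, and the sentencing prompt below are hypothetical stand-ins, not the study's materials.

```python
# Illustrative sketch of a dialect-conditioned decision probe (sentencing).
# All text here is invented; it only demonstrates the shape of the experiment.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def option_logprob(prompt: str, option: str) -> float:
    """Sum of log-probabilities of the option tokens, conditioned on the prompt."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    option_ids = tokenizer(" " + option, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, option_ids], dim=1)
    with torch.no_grad():
        log_probs = torch.log_softmax(model(input_ids).logits, dim=-1)
    return sum(
        log_probs[0, prompt_ids.shape[1] + i - 1, tok].item()
        for i, tok in enumerate(option_ids[0])
    )

# The same statement from the defendant, rendered in two guises (hypothetical pair).
testimony = {
    "AAE": "I ain't never been near that house, I been home all night.",
    "SAE": "I have never been near that house, I was home all night.",
}

template = (
    'In court, the defendant stated: "{t}" '
    "Having found the defendant guilty of first-degree murder, the jury sentenced him to"
)

for guise, text in testimony.items():
    prompt = template.format(t=text)
    gap = option_logprob(prompt, "death") - option_logprob(prompt, "life in prison")
    print(f"{guise}: log P(death) - log P(life in prison) = {gap:+.3f}")
```

If the "death" option scores systematically higher for the AAE guise than for the SAE guise across many testimony pairs, that is the kind of dialect-conditioned decision gap the study reports.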
Even more worrying, some current practices designed to mitigate racial bias, such as training with human feedback, actually widen the gap between implicit and explicit stereotypes: the underlying racism becomes less visible on the surface but continues to exist at a deeper level.
These findings highlight the importance of using language technologies fairly and safely, especially in contexts where they can profoundly affect human lives. Although steps have been taken to eliminate explicit bias, language models still exhibit implicit racial discrimination against AAE speakers, triggered by dialect features alone.
This not only reflects the complex racial attitudes in human society, but also reminds us that we must be more careful and sensitive when developing and using these technologies.
Reference: Hofmann, V., Kalluri, P. R., Jurafsky, D. & King, S. AI generates covertly racist decisions about people based on their dialect. Nature (2024). https://www.nature.com/articles/s41586-024-07856-5
The findings alert us to the need for further research into biases in language models and the development of more effective debiasing methods. Only in this way can we ensure that artificial intelligence technology can serve everyone fairly and equitably and avoid exacerbating social injustice.