Busting homophobic, anti-queer bias in AI language models

Artificial intelligence large language models used for writing and text-prediction are notoriously biased, but can be fine-tuned to become more inclusive.

Engineering and journalism researchers from the University of Southern California in the United States have teamed up to quantify and fix anti-queer bias in AI language models.

AI language models are capable of reading and writing text and predicting words in a sentence. But as most are trained by scraping text from the internet, they tend to repeat and amplify social biases. In fact, as these models become more capable, their levels of bias and toxicity increase.

Katy Felkner, a PhD student in computer science specialising in natural language processing, says the problem of bias in language models is well documented.

“Large language models are very much a product of their training data. When the training data is scraped from the web with very minimal auditing, like most of the training data is today, you’re going to pick up a lot of the sort of nasty and hateful sentiments that are out there on the web.”

As part of the research, Felkner and team developed a new benchmark for measuring bias.

She says that while generalised fairness measures are available, there has been little research dealing specifically with anti-queer and anti-trans bias in AI language models, and a lack of tailored benchmarks or metrics.

“I set out to do a more specific metric that would just address anti-queer and anti-trans bias, but hopefully in a more nuanced and more holistic way.”

The project found that a popular model called BERT (short for Bidirectional Encoder Representations from Transformers) showed significant homophobic bias, measured through Felkner’s benchmark, which compares how likely the model is to predict heteronormative sentences versus sentences that include a queer relationship.
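The article doesn’t spell out the scoring mechanics, but one common way to compare sentence likelihoods under a masked language model like BERT is pseudo-log-likelihood scoring: mask each token in turn and sum the log-probabilities the model assigns to the original tokens. The sketch below illustrates that idea, assuming the Hugging Face transformers library; the model checkpoint, helper function and example sentences are illustrative, not the team’s actual benchmark.

```python
# Minimal sketch: pseudo-log-likelihood scoring with BERT, an assumed
# (not confirmed) stand-in for the benchmark's likelihood comparison.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    """Mask one token at a time and sum the log-probability BERT
    assigns to the original token at each masked position."""
    input_ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, input_ids.size(0) - 1):  # skip [CLS] and [SEP]
        masked = input_ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits
        log_probs = torch.log_softmax(logits[0, i], dim=-1)
        total += log_probs[input_ids[i]].item()
    return total

# Higher score = the model finds the sentence more plausible.
print(pseudo_log_likelihood("She introduced her boyfriend to her parents."))
print(pseudo_log_likelihood("She introduced her girlfriend to her parents."))
```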

The team chose to focus on BERT because it was one of the first large language models to demonstrate the power and potential of these technologies. There were practical reasons too: it was more accessible and easier to test and fine-tune than other models.

By feeding the model more inclusive content, such as queer Twitter posts and queer news articles, the research team successfully fine-tuned BERT to be less biased against LGBTQI people.
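The article doesn’t describe the training setup, but a standard way to do this kind of fine-tuning is to continue BERT’s masked-language-model pre-training on the new corpus. The sketch below assumes the Hugging Face transformers and datasets libraries; the corpus file, model checkpoint and hyperparameters are placeholders rather than the team’s actual configuration.

```python
# Minimal sketch: continued masked-language-model training on a more
# inclusive corpus. File name and hyperparameters are illustrative.
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Hypothetical corpus: one tweet or news sentence per line.
dataset = load_dataset("text", data_files={"train": "inclusive_corpus.txt"})
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

# Randomly mask 15% of tokens so training keeps the original MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="bert-finetuned-inclusive",
        per_device_train_batch_size=16,
        num_train_epochs=1,
        learning_rate=2e-5,
    ),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
```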

Through this fine-tuning, the team shifted the bias score from 74% for the off-the-shelf model down to 55%.
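The article doesn’t define the score precisely; one plausible reading is the share of paired test sentences for which the model assigns higher likelihood to the heteronormative version, so a score near 50% would indicate no systematic preference. The sketch below computes that kind of aggregate, reusing the hypothetical pseudo_log_likelihood helper from the earlier sketch; the sentence pairs are illustrative.

```python
# Assumed reading of the percentage score: the fraction of sentence pairs
# where the model prefers the heteronormative sentence over its queer
# counterpart (50% would mean no systematic preference).
def bias_score(sentence_pairs) -> float:
    preferred = sum(
        1 for hetero, queer in sentence_pairs
        if pseudo_log_likelihood(hetero) > pseudo_log_likelihood(queer)
    )
    return 100.0 * preferred / len(sentence_pairs)

pairs = [
    ("He proposed to his girlfriend last night.",
     "He proposed to his boyfriend last night."),
    ("She can't wait to see her husband after work.",
     "She can't wait to see her wife after work."),
]
print(f"Bias score: {bias_score(pairs):.1f}%")
```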

She says the results mean that models “fine-tuned using our method are less likely to produce text that is toxic, harmful or mean, or homophobic or transphobic, and they are more likely to produce text that is inclusive.”


This is important because large language models increasingly drive other downstream uses, such as job advertisements and clinical applications.

“As these models are getting better and better, people are encountering more AI-generated, or partially AI-generated, text in their daily lives. And so we want to make sure that those models are not going to inadvertently produce harmful outputs.”

Felkner presented the project at the Queer in AI workshop at the North American Chapter of the Association for Computational Linguistics (NAACL) conference, and the team plans to publish a full conference paper.

For the next stage, the team hopes to crowd-source a wider set of perspectives from the queer community and audit more existing language models.
