It turns out that even language models “think” they are biased. When asked, ChatGPT responded: “Yes, language models can be biased, because the training data reflects the biases present in the society from which the data was collected. For example, gender and racial biases are common in many real-world datasets, and if a language model is trained on them, it can entrench and reinforce these biases in its predictions.” A common but perilous problem.
When learning, people (usually) can draw on both logical and stereotypical reasoning. Yet language models tend to lean on the latter, an unfortunate pattern that plays out ad nauseam when reasoning and critical-thinking skills are lacking. So, is injecting logic into the fray enough to mitigate such behavior?
Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) had a hunch this might be the case, so they set out to examine whether logic-aware language models could meaningfully avoid harmful stereotypes. They trained a language model to predict the relationship between two sentences based on context and semantic meaning, using a dataset in which pairs of text fragments are labeled according to whether the second phrase “entails”, “contradicts”, or is neutral with respect to the first. Using this dataset – natural language inference – they found that the newly trained models were significantly less biased than other baseline models, without requiring any extra data, data editing, or additional training algorithms.
For example, given the premise “this person is a doctor” and the hypothesis “this person is male”, a logic-trained model classifies the relationship as “neutral”, because nothing in the premise logically implies that the person is male. With more common language models, the two statements may appear to be correlated due to bias in the training data, e.g., “doctor” may be associated with “male” even when there is no evidence that the statement is true.
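To make the doctor example concrete, here is a minimal sketch of that premise/hypothesis check using an off-the-shelf natural language inference model from the Hugging Face hub (roberta-large-mnli is a generic stand-in, not the authors’ model):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Off-the-shelf NLI model used purely for illustration.
model_name = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

premise = "This person is a doctor."
hypothesis = "This person is male."

# Encode the premise/hypothesis pair and classify the relationship.
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

probs = logits.softmax(dim=-1).squeeze()
labels = [model.config.id2label[i] for i in range(probs.shape[0])]
print(dict(zip(labels, probs.tolist())))
# A logic-aware model should put most of its probability mass on NEUTRAL:
# nothing in the premise entails (or contradicts) the hypothesis.
```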
At this point, the ubiquity of language models is well known: there are countless applications in natural language processing, speech recognition, conversational AI, and generative tasks. While this is not a new field of research, growing pains can take center stage as models rise in complexity and capability.
“Current language models suffer from issues with fairness, computational resources, and privacy,” says Hongyin Luo, a postdoc at MIT CSAIL and lead author of a new paper on this work. “Many estimates say that the CO2 emissions of training a language model can be higher than the lifetime emissions of a car. Running these large language models is also very expensive because of the number of parameters and the computational resources they need. When it comes to privacy, state-of-the-art language models like ChatGPT or GPT-3 have APIs where you must upload your language, but there’s no place for sensitive information regarding things like health care or finance. To address these challenges, we proposed a logical language model that we qualitatively assessed as fair, is 500 times smaller than state-of-the-art models, can be deployed locally, and requires no human-annotated training samples for downstream tasks. Our model uses 1/400 of the parameters of the largest language models, performs better on some tasks, and significantly saves computational resources.”
This model, which has 350 million parameters, outperformed some very-large-scale language models with 100 billion parameters on logical language understanding tasks. For example, the team evaluated popular pre-trained BERT language models against their “textual entailment” counterparts on stereotype, occupation, and emotion bias tests. The latter outperformed the other models with significantly less overhead while retaining language modeling ability. “Fairness” was assessed using so-called ideal context association tests (iCAT), where higher iCAT scores indicate fewer stereotypes. The model achieved iCAT scores above 90%, while other models with strong language understanding ability scored between 40 and 80.
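As a rough illustration of why such a score rewards both fluency and neutrality, here is a small sketch assuming the StereoSet-style definition of the idealized CAT score, in which a language modeling score is discounted by how far a stereotype score drifts from the unbiased value of 50 (the exact metric used in the paper may differ):

```python
def icat(lms: float, ss: float) -> float:
    """Idealized CAT score (assumed StereoSet-style formula).

    lms: language modeling score, 0-100 (higher = more fluent).
    ss:  stereotype score, 0-100 (50 = no preference for stereotypes).
    """
    return lms * min(ss, 100.0 - ss) / 50.0

# A fluent, nearly unbiased model keeps most of its score...
print(icat(90.0, 52.0))  # 86.4
# ...while an equally fluent but heavily stereotyped model is penalized.
print(icat(90.0, 75.0))  # 45.0
```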
Luo wrote the paper with MIT Senior Research Scientist James Glass. They will present their work at the conference of the European Chapter of the Association for Computational Linguistics in Croatia.
Unsurprisingly, the original pre-trained language models the team examined were rife with bias, as confirmed by a battery of reasoning tests showing how occupation and emotion terms were significantly skewed toward feminine or masculine words in the gender vocabulary.
For occupations, the biased language model assumes that “stewardess”, “secretary”, and “physician assistant” are female occupations, while “fisherman”, “lawyer”, and “judge” are male occupations. For emotions, it considers “anxious”, “depressed”, and “devastated” to be feminine words.
While we may still be far from the utopia of a neutral language model, research continues in that direction. For now, the model handles only language understanding, relying on reasoning over existing sentences. It cannot yet generate text, so the researchers’ next step will be to target the enormously popular generative models, building them on logical learning to ensure greater fairness and computational efficiency.
“Although stereotypical reasoning is a natural part of human cognition, fairness-conscious people reason with logic rather than stereotypes when necessary,” Luo says. “We show that language models have similar properties. A language model without explicit logic learning produces plenty of biased reasoning, but adding logic learning can significantly mitigate this behavior. Moreover, with its demonstrated robust zero-shot adaptability, the model can be directly deployed to a variety of tasks with greater fairness, privacy, and speed.”
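Entailment-based zero-shot classification is one common way that kind of adaptability is put to work: the entailment model decides which candidate label a piece of text best supports, with no task-specific training. The sketch below uses Hugging Face’s zero-shot pipeline with a generic NLI model as a stand-in for the authors’ system:

```python
from transformers import pipeline

# Generic NLI model repurposed as a zero-shot classifier; the model name and
# example labels are illustrative assumptions, not the authors' setup.
classifier = pipeline("zero-shot-classification", model="roberta-large-mnli")

result = classifier(
    "The patient reported chest pain and shortness of breath.",
    candidate_labels=["cardiology", "dermatology", "billing inquiry"],
)
print(result["labels"][0])  # highest-scoring label for this text
```

Because the classifier runs locally and needs no labeled examples for the new task, it reflects the privacy and deployment advantages Luo describes, though with an ordinary NLI model rather than the team’s logic-trained one.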