Machine learning models can fail when they try to make predictions for individuals who were underrepresented in the datasets they were trained on.
For example, a model that predicts the best treatment option for someone with a chronic disease may be trained on a dataset that contains mostly male patients. That model may then make incorrect predictions for female patients once it is deployed in a hospital.
To improve outcomes, engineers can try rebalancing the training dataset by removing data points until all subgroups are represented equally. Although dataset balancing is promising, it often requires removing an enormous amount of data, which hurts the model's overall performance.
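As a rough illustration of why balancing is so costly (this is a generic sketch, not the researchers' method, and `group_of` is a hypothetical helper that returns an example's subgroup label), naive balancing downsamples every subgroup to the size of the smallest one, discarding everything beyond that:

```python
import random
from collections import defaultdict

def balance_by_downsampling(dataset, group_of, seed=0):
    """Naively balance a dataset by keeping only as many examples per
    subgroup as the smallest subgroup has."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for example in dataset:
        groups[group_of(example)].append(example)

    smallest = min(len(members) for members in groups.values())
    balanced = []
    for members in groups.values():
        balanced.extend(rng.sample(members, smallest))

    # Everything beyond `smallest` examples per group is thrown away,
    # which is why balancing can remove so much training data.
    return balanced
```

If one subgroup is much smaller than the others, most of the majority groups' data is dropped, which is the performance cost the researchers set out to avoid.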
MIT researchers have developed a new technique that identifies and removes the specific points in a training dataset that contribute most to a model's failures on minority subgroups. By removing far fewer data points than other approaches, this technique maintains the overall accuracy of the model while improving its performance for underrepresented groups.
In addition, the technique can identify hidden sources of error in a training dataset that lacks labels. In many applications, unlabeled data are far more common than labeled data.
This method could also be combined with other approaches to improve the fairness of machine learning models deployed in high-stakes situations. For example, it might one day help ensure that underrepresented patients are not misdiagnosed due to a biased AI model.
“Many other algorithms that try to address this issue assume each data point matters as much as every other data point. In this paper, we show that this assumption is not true. There are specific points in our dataset that are contributing to this bias, and we can find those data points, remove them, and get better performance,” says Kimia Hamidieh, an electrical engineering and computer science (EECS) graduate student at MIT and co-lead author of a paper on this technique.
She wrote the paper with co-authors Saachi Jain PhD ’24 and fellow EECS graduate student Kristian Georgiev; Andrew Ilyas MEng ’18, PhD ’23, a Stein Fellow at Stanford University; and senior authors Marzyeh Ghassemi, an associate professor in EECS and a member of the Institute for Medical Engineering and Sciences and the Laboratory for Information and Decision Systems, and Aleksander Madry, the Cadence Design Systems Professor at MIT. The research will be presented at the Conference on Neural Information Processing Systems.
Removing bad examples
Machine learning models are often trained on huge datasets gathered from many sources across the Internet. These datasets are far too large to be carefully curated by hand, so they may contain bad examples that harm model performance.
Scientists also know that some data points affect model performance on some downstream tasks more than others.
The MIT researchers combined these two ideas into an approach that identifies and removes problematic data points. They seek to address a problem known as worst-group error, which occurs when a model underperforms on minority subgroups in a training dataset.
The researchers’ new technique builds on prior work in which they introduced a method called TRAK, which identifies the training examples that are most important to a specific model output.
For the new technique, the researchers take the model’s incorrect predictions on minority subgroups and use TRAK to determine which training examples contributed most to those incorrect predictions.
“By aggregating this information from bad test predictions in the right way, we are able to find the specific parts of the training that are driving worst-group accuracy down overall,” Ilyas explains.
They then remove these specific samples and retrain the model on the remaining data.
Since having more data usually yields better overall performance, removing only the samples that drive worst-group failures preserves the model’s overall accuracy while boosting its performance on minority subgroups.
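The pipeline described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the attribution scores are assumed to come from a data-attribution method such as TRAK, arranged as a hypothetical (number of mispredictions) × (number of training examples) matrix, and `retrain` is a user-supplied training function.

```python
import numpy as np

def remove_harmful_examples(train_set, attribution_scores, k, retrain):
    """Sketch of the remove-and-retrain idea described in the article.

    attribution_scores[i, j] is assumed to estimate how much training
    example j contributed to the model's i-th misprediction on the
    worst-performing subgroup (hypothetical layout, not TRAK's real API).
    """
    # Aggregate each training example's contribution across all
    # minority-subgroup mispredictions.
    harm = attribution_scores.sum(axis=0)

    # Flag the k training examples that contribute most to those errors.
    worst_indices = set(np.argsort(harm)[-k:].tolist())
    kept = [ex for j, ex in enumerate(train_set) if j not in worst_indices]

    # Retrain on the remaining data; only k examples were removed,
    # far fewer than full dataset balancing would discard.
    return retrain(kept)
```

The design choice that matters here is that only the examples implicated in worst-group errors are dropped, rather than downsampling whole subgroups.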
A more accessible approach
Their method outperformed multiple techniques across three machine learning datasets. In one instance, it boosted worst-group accuracy while removing about 20,000 fewer training samples than a conventional data balancing method. Their technique also achieved higher accuracy than methods that require changing a model’s inner workings.
Because the MIT method instead works by changing the dataset, it would be easier for a practitioner to use and can be applied to many types of models.
It can also be used when the bias is unknown because the subgroups in the training dataset are not labeled. By identifying the data points that contribute most to a feature the model is learning, researchers can understand the variables it uses to make predictions.
“It’s a tool that anyone can use when training a machine learning model. They can look at these data points and see if they match the capabilities they are trying to teach the model,” Hamidieh says.
Using this technique to detect unknown subgroup bias would require intuition about which groups to look for, so the researchers hope to validate it and explore it more fully through future human studies.
They also want to improve the efficiency and reliability of their technique and ensure that the method is accessible and easy to use for practitioners who could one day apply it in real-world environments.
“When you have the tools to look at data critically and figure out which data points will lead to bias or other undesirable behavior, that is the first step toward building models that are more fair and reliable,” says Ilyas.
This work is funded in part by the National Science Foundation and the U.S. Defense Advanced Research Projects Agency.