Monday, December 23, 2024

Artificial intelligence predicts patients’ race based on their medical images


Miseducation of algorithms is a critical problem; when artificial intelligence mirrors the unconscious thoughts, racism, and biases of the people who created it, it can lead to serious harm. Computer programs have, for example, wrongly flagged Black defendants as twice as likely to reoffend as white defendants. When an AI used cost as a proxy for health needs, it falsely labeled Black patients as healthier than equally sick white patients, because less money had been spent on their care. AI has even been used to write a play that relied on harmful stereotypes for casting.

Removing sensitive features from the data seems like a viable fix. But what happens when that's not enough?

Examples of bias in natural language processing are boundless, but MIT researchers have investigated another important, largely underexplored modality: medical images. Using both private and public datasets, the team found that AI could accurately predict the self-reported race of patients from medical images alone. Using imaging data that included chest X-rays, limb X-rays, chest CT scans, and mammograms, the team trained a deep learning model to identify race as white, Black, or Asian, even though the images themselves contained no explicit mention of the patient's race. This is a feat that even the most seasoned physicians cannot accomplish, and it is not clear how the model was able to do it.
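For readers curious what such a setup looks like in practice, the sketch below fine-tunes a standard convolutional network to predict self-reported race labels from X-ray images. It is only an illustration of the general technique the article describes; the folder layout, label set, backbone, and hyperparameters are assumptions, not details from the study.

```python
# Hypothetical sketch: fine-tune a standard CNN to predict self-reported race
# labels from chest X-ray images. Dataset layout, label set, and
# hyperparameters are illustrative assumptions, not the study's configuration.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Assume X-rays are stored as image files in folders named by the
# self-reported race label, e.g. data/train/{white,black,asian}/*.png
transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),  # X-rays are single channel
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder("data/train", transform=transform)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)

# A standard ResNet backbone with a 3-way classification head
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(5):
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```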

The researchers conducted numerous experiments to try to understand the mysterious "how" of it all. To investigate possible mechanisms of race detection, they looked at variables such as differences in anatomy, bone density, image resolution, and more, and the models still prevailed, maintaining a high ability to detect race from chest X-rays. "These results were initially confusing, because members of our research team could not come anywhere close to identifying a good proxy for this task," says paper co-author Marzyeh Ghassemi, an assistant professor in MIT's Department of Electrical Engineering and Computer Science and the Institute for Medical Engineering and Science (IMES), who is an affiliate of the Computer Science and Artificial Intelligence Laboratory (CSAIL) and the MIT Jameel Clinic. "Even when you filter medical images past the point where they are recognizable as medical images at all, deep models maintain very high performance. That is concerning, because superhuman capacities are generally much harder to control, regulate, and prevent from harming people."

In a clinical setting, algorithms can help us determine whether a patient is a candidate for chemotherapy, decide how patients should be triaged, or decide whether they need to be transferred to an intensive care unit. "We think the algorithms are only looking at vital signs or laboratory tests, but it is possible that they are also looking at your race, ethnicity, sex, whether you are incarcerated or not, even if all that information is hidden," says paper co-author Leo Anthony Celi, principal research scientist at IMES at MIT and associate professor of medicine at Harvard Medical School. "Just because you include different groups in your algorithms does not guarantee they will not perpetuate or magnify existing disparities and inequities. Feeding the algorithms more data with representation is not a panacea. This paper should make us pause and truly reconsider whether we are ready to bring AI to the patient's bedside."

The study, "AI Recognition of Patient Race in Medical Imaging: A Modeling Study," was published May 11. Celi and Ghassemi wrote the paper with 20 other authors from four countries.

In setting up the tests, the researchers first showed that the models could predict race across multiple imaging modalities, varied datasets, and diverse clinical tasks, as well as across a range of academic centers and patient populations in the United States. They used three large chest X-ray datasets, and tested the model both on an unseen subset of the dataset used to train it and on a completely different one. Next, they trained racial identity detection models on non-chest-X-ray images from multiple body locations, including digital radiography, mammography, lateral cervical spine radiographs, and chest CT scans, to see whether the model's performance was limited to chest X-rays (a rough sketch of this internal versus external evaluation appears below).
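As a loose illustration of that evaluation design (not the study's actual pipeline), the sketch below scores one trained model on both an internal held-out split and an entirely separate external dataset. It reuses the `transform`, `model`, and `device` names from the earlier sketch; the directory paths are assumptions.

```python
# Hypothetical sketch of internal vs. external validation: the same trained
# model is scored on a held-out split of its training dataset and on a
# completely different dataset. Paths and the trained model are illustrative.
import torch
from torch.utils.data import DataLoader
from torchvision import datasets

def accuracy(model: torch.nn.Module, loader: DataLoader, device: str) -> float:
    """Fraction of correctly predicted self-reported race labels."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images.to(device)).argmax(dim=1).cpu()
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total

# "internal" = unseen subset of the training dataset;
# "external" = a dataset from entirely different institutions (assumed layout).
for name, path in [("internal", "data/test_internal"),
                   ("external", "data/test_external")]:
    loader = DataLoader(datasets.ImageFolder(path, transform=transform),
                        batch_size=32)
    print(name, accuracy(model, loader, device))
```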

In trying to explain the model's behavior, the team examined several possible explanations: differences in physical characteristics between racial groups (body habitus, breast density), disease distribution (previous research has shown that Black patients have a higher incidence of health issues such as cardiac disease), location-specific or tissue-specific differences, the effects of societal bias and environmental stress, the ability of deep learning systems to detect race when multiple demographic and patient factors are combined, and whether specific regions of the image contributed to recognizing race.

The results were truly astonishing: the models’ ability to predict race based on diagnostic labels alone was significantly lower than that of models based on chest X-ray images.

For example, the bone density tests used images in which the thicker part of the bone appears white and the thinner part appears more gray or translucent. The researchers assumed that because Black people generally have higher bone mineral density, these brightness differences helped the AI models detect race. To isolate this, they clipped the images with a filter so the model could not use the brightness differences. It turned out that cutting off this signal did not faze the model; it could still accurately predict race. (The "area under the curve," a measure of the accuracy of a quantitative diagnostic test, was 0.94-0.96.) As such, the learned features of the model appeared to rely on all regions of the image, which means that controlling this type of algorithmic behavior is a messy, challenging problem.
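To make that ablation concrete, here is a minimal sketch of clipping pixel intensities so absolute brightness (a rough proxy for bone-density cues) is removed, and then measuring discrimination with AUC. The clipping threshold, model, and test loader are assumptions for illustration, not the paper's exact procedure.

```python
# Hypothetical sketch of the ablation described above: saturate bright pixels
# so dense (white) bone regions look alike, then measure how well race is
# still predicted, using one-vs-rest AUC as the accuracy metric.
import numpy as np
import torch
from sklearn.metrics import roc_auc_score

def clip_brightness(images: torch.Tensor, max_value: float = 0.6) -> torch.Tensor:
    """Clamp pixel values above a threshold and rescale to [0, 1]."""
    return images.clamp(0.0, max_value) / max_value

# `model`, `device`, and `test_loader` are assumed (e.g. from the sketches
# above); each batch yields images in [0, 1] and integer race labels.
all_probs, all_labels = [], []
model.eval()
with torch.no_grad():
    for images, labels in test_loader:
        logits = model(clip_brightness(images).to(device))
        all_probs.append(torch.softmax(logits, dim=1).cpu().numpy())
        all_labels.append(labels.numpy())

# One-vs-rest AUC averaged over the three classes; the paper reports values
# around 0.94-0.96 even after this kind of filtering.
auc = roc_auc_score(np.concatenate(all_labels), np.concatenate(all_probs),
                    multi_class="ovr", average="macro")
print(f"Race-prediction AUC on brightness-clipped images: {auc:.2f}")
```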

The researchers acknowledge the limited availability of racial identity labels, which led them to focus on Asian, Black, and white populations, and note that their ground truth was self-reported. Forthcoming work will include potentially looking at isolating different signals before image reconstruction, because, as with the bone density experiments, they could not account for residual bone tissue that remained in the images.

Notably, other work by Ghassemi and Celi, led by MIT graduate student Hammaad Adam, has shown that models can also identify a patient's self-reported race from clinical notes, even when those notes omit explicit mentions of race. As in this work, human experts were not able to accurately predict patient race from the same redacted clinical notes.

"We need to bring in social scientists. Domain experts, who are usually the clinicians, public health practitioners, computer scientists, and engineers, are not enough. Health care is a sociocultural problem just as much as it is a medical problem. We need another group of experts to weigh in and provide input and feedback on how we design, develop, implement, and evaluate these algorithms," says Celi. "We also need to ask the data scientists, before any exploration of the data: Are there disparities? Which patient groups are marginalized? What are the drivers of those disparities? Is it access to care? Is it the subjectivity of the care providers? If we don't understand that, we won't have a chance of identifying the unintended consequences of the algorithms, and there's no way we'll be able to keep the algorithms from perpetuating biases."

"The fact that algorithms 'see' race, as the authors convincingly document, can be dangerous. But an important and related fact is that, when used carefully, algorithms can also work to counter bias," says Ziad Obermeyer, an associate professor at the University of California at Berkeley, whose research focuses on AI applied to health. "In our own work, led by computer scientist Emma Pierson of Cornell, we show that algorithms that learn from patients' pain experiences can find new sources of knee pain in X-rays that disproportionately affect Black patients, and that are disproportionately missed by radiologists. So just like any tool, algorithms can be a force for evil or a force for good; which one depends on us, and on the choices we make when we build them."

The work is supported in part by the National Institutes of Health.
