Thursday, April 30, 2026

Solving the mole dilemma: A smarter way to challenge AI vision models


In today’s hospitals and clinics, a dermatologist can use an artificial intelligence model to classify skin lesions and assess whether a lesion is cancerous or benign. However, if the model is biased against certain skin tones, it may fail to identify a high-risk patient.

Perhaps one of the most well-known and enduring challenges that AI research continues to grapple with is bias. Bias is often discussed in relation to training data, but model architecture can also contain and amplify bias, negatively impacting model performance in real-world settings. In high-stakes medical situations, the very real consequences of poor performance have made bias a quintessential safety issue.

A new paper from researchers at MIT, Worcester Polytechnic Institute, and Google, which was accepted to the 2026 International Science Representation Conference, proposes a novel debiasing approach called “Weighted Rotational DebiasING” (WRING) that can be applied to vision-language models (VLMs) such as OpenCLIP.

VLMs are multimodal models that can simultaneously understand and interpret different data modalities, such as video, images, and text. Although various debiasing approaches exist for VLMs, the most commonly used is projection-based debiasing, which leads to what is called the “Whac-A-Mole dilemma,” an empirical observation that was formally introduced into artificial intelligence research in 2023.

Projection-based debiasing is a post-processing approach that removes unwanted, biased information from a model’s embeddings by “projecting out” certain subspaces from the representation space, thereby eliminating the bias. But this approach has its drawbacks.
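As a rough illustration (a toy sketch, not the paper’s implementation), projecting a hypothetical bias direction out of a set of embeddings can be written in a few lines of NumPy:

```python
import numpy as np

def project_out(embeddings, bias_direction):
    """Remove the component along a (hypothetical) bias direction
    from each embedding by orthogonal projection."""
    v = bias_direction / np.linalg.norm(bias_direction)
    # Subtract each embedding's projection onto the bias direction.
    return embeddings - np.outer(embeddings @ v, v)

# Toy example: three 4-D embeddings, with the "bias" along the first axis.
emb = np.array([[1.0, 2.0, 0.0, 1.0],
                [3.0, 0.0, 1.0, 0.0],
                [0.5, 1.0, 1.0, 1.0]])
bias = np.array([1.0, 0.0, 0.0, 0.0])
debiased = project_out(emb, bias)
# After projection, no embedding has any component along the bias axis.
print(np.allclose(debiased @ bias, 0))  # True
```

The drawback the article describes is visible even here: subtracting the projection moves every embedding, so any relationship that happened to depend on the removed subspace is altered along with the bias.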

“When you do this, you inadvertently blow everything up,” says Walter Gerych, first author of the paper, who conducted this research last year as a postdoc at MIT. “All the other relationships the model learns change when you do this.”

Gerych, currently an assistant professor of computer science at Worcester Polytechnic Institute, is joined on the paper by MIT graduate students Cassandra Parent and Quinn Perian; Rafiya Javed of Google; and MIT electrical engineering associate professors Justin Solomon and Marzyeh Ghassemi, who is an affiliate of the Abdul Latif Jameel Clinic for Machine Learning and Health and the Laboratory for Information and Decision Systems.

Although projection-based debiasing stops the model from using the information that has been projected out of the subspace, it can end up amplifying or creating other biases, hence the Whac-A-Mole dilemma. According to Ghassemi, the unintended amplification of model errors is “both a technical and practical challenge. For example, debiasing a VLM that retrieves images of clinical staff, if racial bias is removed, could have the unintended consequence of reinforcing gender bias.”

WRING works by rotating certain coordinates in the model’s high-dimensional representation space, those that appear to be responsible for the bias, so that the model can no longer distinguish between different groups within a particular concept. This changes the representation of the targeted concept while leaving the model’s other relationships intact. Like projection-based debiasing, WRING is a post-processing method, which means it can be applied “on the fly” to a pretrained VLM.
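To illustrate the intuition behind a rotation-based edit (again a toy sketch of my own, not the authors’ method): a rotation confined to one coordinate plane of the embedding space leaves every direction outside that plane untouched, and it preserves vector norms, which is why rotating only the implicated coordinates can avoid disturbing the model’s other relationships:

```python
import numpy as np

def givens_rotation(dim, i, j, theta):
    """Rotation acting only in the (i, j) coordinate plane of a
    dim-dimensional space; all other coordinates are left untouched."""
    R = np.eye(dim)
    c, s = np.cos(theta), np.sin(theta)
    R[i, i], R[j, j] = c, c
    R[i, j], R[j, i] = -s, s
    return R

# Rotate only the plane spanned by axes 0 and 1 (a stand-in for the
# coordinates implicated in the bias); axes 2 and 3 are unaffected.
R = givens_rotation(4, 0, 1, np.pi / 4)
x = np.array([1.0, 0.0, 2.0, 3.0])
y = R @ x
print(y[2:])                                              # same as x[2:]
print(np.isclose(np.linalg.norm(y), np.linalg.norm(x)))   # norm preserved
```

Choosing which coordinates to rotate, and by how much, so that groups become indistinguishable is the substance of the WRING method itself and is not captured by this sketch.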

“People have already spent a lot of resources and money training these huge models, and we really don’t want to go in and modify something during training because then you have to start from scratch,” explains Gerych. “[WRING is] very efficient. It requires no major model training and is minimally invasive.”

In their results, the researchers found that WRING significantly reduced bias toward the target concept without increasing bias in other areas. For now, however, the approach is limited to contrastive language-image pre-training (CLIP) models, a type of VLM that pairs images with text for retrieval or classification purposes.
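For context, a CLIP-style model classifies an image by comparing its embedding against text embeddings of candidate labels using cosine similarity. A minimal sketch with made-up toy vectors (real CLIP embeddings come from trained image and text encoders and are hundreds of dimensions wide):

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, labels):
    """CLIP-style zero-shot classification: pick the label whose text
    embedding has the highest cosine similarity with the image embedding."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = txt @ img  # cosine similarities, one per candidate label
    return labels[int(np.argmax(sims))]

# Toy embeddings standing in for encoder outputs.
image = np.array([0.9, 0.1, 0.0])
texts = np.array([[1.0, 0.0, 0.0],   # e.g. "a photo of a benign mole"
                  [0.0, 1.0, 0.0]])  # e.g. "a photo of a malignant lesion"
print(zero_shot_classify(image, texts, ["benign", "malignant"]))  # benign
```

Because a post-processing method like WRING edits these embeddings directly, it can slot into such a pipeline without touching the encoders.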

“Extending this to ChatGPT-style generative language models is a reasonable next step for us,” Gerych says.

This work was supported in part by a National Science Foundation CAREER Award, an AI2050 Early Career Fellowship, a Sloan Research Fellowship, a Gordon and Betty Moore Foundation award, and an MIT-Google Computing Innovation award.
