To the untrained eye, a medical image such as an MRI or X-ray appears as a blur of black and white dots. It can be tough to decipher where one structure ends (like a tumor) and another begins.
Once trained to understand the boundaries of biological structures, AI systems can segment (or delineate) areas of interest that doctors and biomedical workers want to monitor for disease and other abnormalities. Instead of clinicians wasting valuable time manually tracing anatomy across multiple images, an AI assistant could do it for them.
The catch? Scientists and clinicians need to label countless images to train their AI system before it can segment accurately. For example, you’d have to annotate the cerebral cortex in numerous MRI scans to train a supervised model to understand how the shape of the cortex might vary across different brains.
To bypass such tedious data collection, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts General Hospital (MGH), and Harvard Medical School have developed an interactive “ScribblePrompt” framework: a flexible tool that can help users quickly segment any medical image, even types it hasn’t seen before.
Instead of having people manually label each image, the team used algorithms to simulate how users would scribble and click on more than 50,000 scans, including MRIs, ultrasounds, and photographs, covering structures in the eyes, cells, brains, bones, skin, and more. In addition to commonly labeled regions, the team used superpixel algorithms, which find parts of an image with similar values, to identify potentially new areas of interest to medical researchers and teach ScribblePrompt to segment them. This synthetic data prepared ScribblePrompt to handle real-world segmentation requests from users.
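As a rough illustration of this kind of simulation, the sketch below picks a superpixel region as a stand-in “structure of interest” and derives a synthetic click and scribble from it. It assumes NumPy and scikit-image, and the helper names are hypothetical; this is not the team’s actual data-generation code.

```python
import numpy as np
from skimage.morphology import skeletonize
from skimage.segmentation import slic

def sample_superpixel_target(image, n_segments=100, rng=None):
    """Pick one superpixel as a pseudo 'structure of interest' (hypothetical helper)."""
    if rng is None:
        rng = np.random.default_rng()
    labels = slic(image, n_segments=n_segments, channel_axis=None)  # 2D grayscale image
    chosen = rng.choice(np.unique(labels))
    return labels == chosen  # boolean mask of the sampled region

def simulate_click(mask, rng=None):
    """Sample a single positive click at a random pixel inside the target mask."""
    if rng is None:
        rng = np.random.default_rng()
    ys, xs = np.nonzero(mask)
    i = rng.integers(len(ys))
    return int(ys[i]), int(xs[i])

def simulate_scribble(mask):
    """Approximate a user scribble as a thin curve running through the target mask."""
    return skeletonize(mask)  # boolean image of 'scribbled' pixels

# Usage with a 2D grayscale array `image` scaled to [0, 1]:
# target = sample_superpixel_target(image)
# click = simulate_click(target)
# scribble = simulate_scribble(target)
```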
“AI has enormous potential to analyze images and other high-dimensional data to help people perform tasks more productively,” says Hallee Wong SM ’22, an MIT PhD candidate, CSAIL affiliate, and lead author of a new paper on ScribblePrompt. “We want to supplement, not replace, the efforts of healthcare professionals with an interactive system. ScribblePrompt is a simple model that is efficient enough to let clinicians focus on the more interesting parts of their analysis. It is faster and more accurate than comparable interactive segmentation methods, reducing annotation time by 28 percent compared with the Meta Segment Anything Model (SAM) framework, for example.”
The ScribblePrompt interface is simple: users can scribble over a rough area they want to segment, or click on it, and the tool highlights the entire structure or the background as desired. For example, you can click on individual veins in a retinal (eye) scan. ScribblePrompt can also segment a structure from a bounding box drawn around it.
The tool can then make adjustments based on user feedback. If you want to highlight a kidney in an ultrasound scan, you can use a bounding box, then scribble additional parts of the structure if ScribblePrompt missed any edges. If you want to edit your segment, you can use a “negative scribble” to exclude specific areas.
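For a sense of how such interactions can be fed to a segmentation network, here is a minimal sketch that rasterizes clicks, positive and negative scribbles, and a bounding box into extra input channels stacked with the image, a common encoding in interactive segmentation. The channel layout and function name are assumptions for illustration, not ScribblePrompt’s published architecture.

```python
import numpy as np

def encode_prompts(image, pos_scribble=None, neg_scribble=None, clicks=None, box=None):
    """Return an (H, W, 4) array: image plus positive, negative, and box prompt channels."""
    h, w = image.shape
    pos = np.zeros((h, w), dtype=np.float32)
    neg = np.zeros((h, w), dtype=np.float32)
    box_chan = np.zeros((h, w), dtype=np.float32)

    if pos_scribble is not None:          # boolean mask of positively scribbled pixels
        pos[pos_scribble] = 1.0
    if clicks is not None:                # list of (row, col) positive clicks
        for r, c in clicks:
            pos[r, c] = 1.0
    if neg_scribble is not None:          # "negative scribble" marking regions to exclude
        neg[neg_scribble] = 1.0
    if box is not None:                   # (r0, c0, r1, c1) bounding box, end-exclusive
        r0, c0, r1, c1 = box
        box_chan[r0:r1, c0:c1] = 1.0

    return np.stack([image.astype(np.float32), pos, neg, box_chan], axis=-1)

# A network would predict a mask from this stack; after each user correction, the new
# scribbles or clicks are added to the prompt channels and the prediction is refined.
```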
These self-correcting, interactive capabilities made ScribblePrompt the preferred tool among MGH neuroimaging researchers in a user study: 93.8 percent of these users favored the MIT approach over SAM when refining its segmentations in response to scribble corrections, and for click-based edits, 87.5 percent preferred ScribblePrompt.
ScribblePrompt was trained on simulated scribbles and clicks on 54,000 images across 65 datasets, including scans of eyes, chest, spine, cells, skin, abdominal muscles, neck, brain, bones, teeth, and lesions. The model was exposed to 16 types of medical images, including microscopy images, CT scans, X-rays, MRIs, ultrasounds, and photographs.
“Many existing methods don’t respond well when users scribble on images because it’s hard to simulate such interactions during training. In the case of ScribblePrompt, we were able to get our model to pay attention to different inputs using our synthetic segmentation tasks,” says Wong. “We wanted to train what is essentially a base model on a wide variety of data so that it could generalize to new types of images and tasks.”
After training on this data, the team evaluated ScribblePrompt on 12 new datasets. Although the model hadn’t seen these images before, it outperformed four existing methods, segmenting them more efficiently and making more accurate predictions about the exact regions users wanted to highlight.
“Segmentation is the most pervasive task in biomedical image analysis, widely used in both routine clinical practice and research, which makes it both highly diverse and a critical, influential step,” says senior author Adrian Dalca SM ’12, PhD ’16, a CSAIL research scientist and assistant professor at MGH and Harvard Medical School. “ScribblePrompt was carefully designed to be practically useful to clinicians and researchers, and therefore to significantly speed up this step.”
“Most segmentation algorithms that have been developed in image analysis and machine learning rely at least to some extent on our ability to manually annotate images,” says Harvard Medical School radiology professor and MGH neuroscientist Bruce Fischl, who was not involved in the work. “The problem is much more acute in medical imaging, where our ‘images’ are typically 3D volumes because humans have no evolutionary or phenomenological reason to have any competence in annotating 3D images. ScribblePrompt enables manual annotation to be much, much faster and more accurate by training networks on exactly the types of interactions a human would typically have with an image during manual annotation. The result is an intuitive interface that allows annotators to interact naturally with image data with much greater productivity than was previously possible.”
Wong and Dalca wrote the paper with two other CSAIL collaborators: John Guttag, the Dugald C. Jackson Professor of EECS at MIT and principal investigator of CSAIL, and MIT doctoral student Marianne Rakic SM ’22. Their work was supported in part by Quanta Computer Inc., the Eric and Wendy Schmidt Center at the Broad Institute, Wistron Corp., and the National Institute of Biomedical Imaging and Bioengineering of the National Institutes of Health, with equipment support from the Massachusetts Life Sciences Center.
Wong and her colleagues’ work will be presented at the 2024 European Conference on Computer Vision and was given as an oral presentation at the DCAMI workshop at the Computer Vision and Pattern Recognition Conference earlier this year, where it received a Bench-to-Bedside Paper Award for ScribblePrompt’s potential clinical impact.