Tuesday, May 6, 2025

Machine learning and the microscope


Thanks to recent advances in imaging, genomics, and other technologies, the life sciences are awash in data. If, for example, a biologist is examining cells taken from the brain tissue of Alzheimer’s patients, she might want to examine any number of characteristics—the type of cell it is, the genes it expresses, its location in the tissue, or more. But although cells can now be studied experimentally using different types of measurements at the same time, when it comes to data analysis, scientists can usually only work with one type of measurement at a time.

Working with “multimodal” data, as it is called, requires new computational tools, and that is where Xinyi Zhang comes in.

The fourth-year MIT PhD student combines machine learning with biology to understand fundamental biological principles, especially in areas where conventional methods have limitations. Working in the lab of MIT professor Caroline Uhler in the Department of Electrical Engineering and Computer Science, the Laboratory for Information and Decision Systems, and the Institute for Data, Systems, and Society, and collaborating with researchers at the Eric and Wendy Schmidt Center at the Broad Institute and elsewhere, Zhang has led numerous efforts to build computational frameworks and principles for understanding cell regulation.

“These are all small steps toward the ultimate goal of trying to answer the question of how cells work, how tissues and organs work, why they are diseased, and why sometimes they can be cured and sometimes they can’t,” Zhang says.

Zhang’s free-time pursuits are no less ambitious. Her list of hobbies at the Institute includes sailing, skiing, ice skating, rock climbing, performing with the MIT Concert Choir, and flying single-engine planes. (She earned her pilot’s license in November 2022.)

“I think I like going places I’ve never been and doing things I’ve never done before,” she says with characteristic restraint.

Her advisor, Uhler, says Zhang’s serene humility means there’s something surprising to be found “in every conversation.”

“Every time you learn something like, ‘Okay, now she’s learning to fly,’” Uhler says. “It’s just amazing. Whatever she does, she does it for the right reasons. She wants to be good at the things that interest her, which I think is really exciting.”

Zhang first became interested in biology as a high school student in Hangzhou, China. She enjoyed the fact that teachers couldn’t answer her questions in biology classes, which led her to consider it the “most interesting” topic to study.

Her interest in biology eventually turned into an interest in bioengineering. After her parents, who were both high school teachers, suggested she study in the United States, she pursued the latter, along with electrical engineering and computer science, as an undergraduate at the University of California, Berkeley.

Zhang was set to immediately begin her EECS PhD at MIT after graduating in 2020, but the Covid-19 pandemic delayed her first year. Still, she, Uhler, and two other co-authors published a paper in December 2022.

The groundwork for the paper was laid by Xiao Wang, one of the co-authors. She had previously worked at the Broad Institute to develop a form of spatial cell analysis that combined multiple forms of cell imaging and gene expression for the same cell, and also mapped the location of a cell within the tissue sample from which it came, something that had never been done before.

This innovation had many potential applications, including new ways to track the progression of various diseases, but there was no way to analyze all the multimodal data the method produced. That gap drew in Zhang, who became interested in designing a computational method that could.

The team focused on chromatin staining, an imaging method that is relatively low-cost but still reveals a lot of information about the cells. The next step was to integrate the spatial analysis techniques developed by Wang, and to that end Zhang began designing an autoencoder.

Autoencoders are a type of neural network that typically encodes and shrinks large amounts of high-dimensional data, then expands the transformed data back to its original size. In this case, Zhang’s autoencoder did the opposite, taking the input data and making it higher-dimensional. This allowed the team to combine data from different animals and remove technical differences that weren’t due to significant biological differences.
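
To make the idea concrete, here is a minimal sketch of an autoencoder whose internal code has more dimensions than its input. It is not the model from the paper: the `LinearAutoencoder` class, the layer sizes, the learning rate, and the random toy data are all assumptions chosen for illustration, and the network is a bare linear one written in plain Python.

```python
import random


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def random_matrix(rows, cols, rng, scale=0.5):
    """Small random weights drawn uniformly from [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class LinearAutoencoder:
    """Toy autoencoder: one linear encoder layer, one linear decoder layer."""

    def __init__(self, input_dim, latent_dim, seed=0):
        rng = random.Random(seed)
        self.encoder = random_matrix(latent_dim, input_dim, rng)
        self.decoder = random_matrix(input_dim, latent_dim, rng)

    def encode(self, x):
        return matvec(self.encoder, x)

    def decode(self, z):
        return matvec(self.decoder, z)

    def train_step(self, x, lr=0.01):
        """One gradient step on the loss 0.5 * ||decode(encode(x)) - x||^2."""
        z = self.encode(x)
        error = [r - t for r, t in zip(self.decode(z), x)]
        # Backpropagate the error to the latent code before changing the decoder.
        grad_z = [sum(self.decoder[i][j] * error[i] for i in range(len(error)))
                  for j in range(len(z))]
        for i, e in enumerate(error):
            for j, zj in enumerate(z):
                self.decoder[i][j] -= lr * e * zj
        for j, g in enumerate(grad_z):
            for k, xk in enumerate(x):
                self.encoder[j][k] -= lr * g * xk


def reconstruction_error(model, data):
    """Mean squared reconstruction error over a dataset."""
    total = 0.0
    for x in data:
        x_hat = model.decode(model.encode(x))
        total += sum((r - t) ** 2 for r, t in zip(x_hat, x))
    return total / len(data)


# Toy data: 20 random 4-dimensional points (a stand-in, not real cell measurements).
rng = random.Random(1)
data = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(20)]

# latent_dim > input_dim: the code has more dimensions than the input.
model = LinearAutoencoder(input_dim=4, latent_dim=8)
before = reconstruction_error(model, data)
for _ in range(500):
    for x in data:
        model.train_step(x)
after = reconstruction_error(model, data)

# Sizes of the latent code and of the reconstruction for one input point.
print(len(model.encode(data[0])), len(model.decode(model.encode(data[0]))))
print(after < before)
```

Setting `latent_dim` below `input_dim` instead would give the more typical compressing autoencoder that the paragraph describes first. A real model would likely use nonlinear layers and a deep learning library; the plain-Python version only shows the shape of the idea.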

In the paper, they used the technology, STACI for short, to identify how cells and tissues reveal the progression of Alzheimer’s disease when viewed through a range of spatial and imaging techniques. The model could also be used to analyze any number of diseases, Zhang says.

Given unlimited time and resources, her dream would be to build a fully complete model of human life. Unfortunately, both time and resources are finite. Her ambition is not, however, and she says she wants to continue using her skills to solve “the hardest questions that we don’t have the tools to answer.”

She is currently working on completing several projects, including one that focuses on studying neurodegeneration by analyzing frontal cortex imaging and another on predicting protein images based on protein sequences and chromatin imaging.

“There are still a lot of unanswered questions,” she says. “I want to choose questions that have biological significance, that help us understand things we didn’t know before.”
