Friday, March 6, 2026

3 questions: Using computation to study the world’s best single-cell chemists

Q: What led you to study microbes in extreme environments, and what are the challenges of studying them?

A: Extreme environments are great places to look for fascinating biology. Growing up, I wanted to be an astronaut, and the closest thing to astrobiology is the study of extreme environments on Earth. The only things that live in these extreme environments are microbes. On a sampling expedition off the coast of Mexico, we discovered a colorful microbial mat about 2 km underwater that was flourishing because the bacteria were breathing sulfur instead of oxygen – but none of the microbes I had hoped to study grew in the lab.

The biggest challenge in studying microbes is that most of them cannot be cultured, which means the only way to study their biology is through a method called metagenomics. My latest work is genomic language modeling. We hope to develop a computational system that will allow us to study an organism as precisely as possible “in silico”, using only sequence data. A genomic language model is technically a large language model, except that the language is DNA rather than human language. It is trained in a similar way, only on biological language as opposed to English or French. If our goal is to learn the language of biology, we should take advantage of the diversity of microbial genomes. Even though we have a lot of data, and more samples keep becoming available, we have only scratched the surface of microbial diversity.

Q: Given how diverse microbes are and how little we know about them, how can studying microbes in silico using genomic language modeling advance our understanding of the microbial genome?

A: A genome consists of many millions of letters. A human cannot look at it and make sense of it. However, we can program a machine to divide the data into useful pieces. This is how bioinformatics works with a single genome. But a gram of soil can contain thousands of unique genomes, which is simply too much data to work with by hand – it takes a human and a computer together to deal with that data.

During my PhD and master’s degrees, we were discovering new genomes and new lineages that were very different from anything that had been characterized or grown in the lab. These were things we simply called “microbial dark matter.” When there are a lot of uncharacterized things, that’s when machine learning can be really useful, because we’re just looking for patterns – but that’s not the ultimate goal. We hope to map these patterns to the evolutionary connections between every genome, every microbe, and every instance of life.

Previously, we thought of proteins as standalone entities – this gives us a decent level of information because proteins are related by homology, so things that are evolutionarily related may have a similar function.

What is known in microbiology is that proteins are encoded in genomes, and the context in which a protein sits – which regions come before and after it – is evolutionarily conserved, especially if there is functional coupling. This makes total sense, because when you have three proteins that need to be expressed together because they form a unit, you might want them right next to each other.

I want to incorporate more genomic context into the way we search for and annotate proteins and understand protein function, so that we can go beyond sequence or structural similarity and add contextual information to how we understand proteins and hypothesize about their function.

Q: How can your research be applied to harness the functional potential of microorganisms?

A: Microbes are probably the best chemists in the world. Harnessing microbial metabolism and biochemistry will lead to more sustainable and efficient methods of producing new materials, new therapeutics and new types of polymers.

But it’s not just about efficiency – microorganisms perform chemical processes that we don’t even know how to think about. Understanding how microbes work, and being able to interpret their genomes and functional capabilities, will also be very important as we think about how our world and climate are changing. Microorganisms are responsible for most carbon sequestration and nutrient cycling; if we do not understand how a given microbe is able to fix nitrogen or carbon, then we will have difficulty modeling nutrient flows on Earth.

On the more therapeutic side, infectious diseases pose a real and growing threat. Understanding how microbes behave in diverse environments compared to the rest of our microbiome is really important as we think about the future and about combating microbial pathogens.
