Wednesday, May 20, 2026

Creating artificial intelligence models that understand chemical principles


It is estimated that between 10^20 and 10^60 of all possible chemical compounds may have potential as small-molecule drugs.

Experimentally evaluating each of these compounds would be far too time-consuming for chemists. In recent years, researchers have therefore begun to use artificial intelligence to identify compounds that could be good drug candidates.
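To get a feel for that scale, here is a back-of-the-envelope sketch in Python. The screening rate is an invented, deliberately generous assumption and not a figure from any real laboratory; only the 10^20 lower bound comes from the estimate above.

```python
# Rough arithmetic only: how long would exhaustive testing take?
SECONDS_PER_YEAR = 365.25 * 24 * 3600

compounds = 10**20           # lower end of the estimate quoted above
rate_per_second = 1_000_000  # ASSUMPTION: hypothetical screening throughput

years = compounds / rate_per_second / SECONDS_PER_YEAR
print(f"{years:.1e} years")  # about 3.2e+06 years
```

Even at this unrealistic rate, the smallest estimate of the search space would take millions of years to cover, which is why computational prioritization is attractive.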

One of those researchers is MIT Associate Professor Connor Coley PhD ’19, the Class of 1957 Career Development Associate Professor, with joint appointments in the departments of Chemical Engineering and Electrical Engineering and Computer Science, and in the MIT Schwarzman College of Computing. His research straddles the boundary between chemical engineering and computer science: he develops and deploys computational models to analyze the vast number of possible chemical compounds, design new compounds, and predict the reaction pathways that could produce them.

“It’s a very general approach that can be applied to any application of organic molecules, but the main application we’re thinking about is small-molecule drug discovery,” he says.

The intersection of artificial intelligence and science

Coley’s interest in science runs in the family. In fact, he says, there are more scientists than non-scientists in his family, including his father, a radiologist; his mother, who earned a degree in molecular biophysics and biochemistry before going to MIT Sloan School of Management; and his grandmother, a mathematics professor.

As a high school student in Dublin, Ohio, Coley competed in Science Olympiads and graduated from high school at the age of 16. He then went to Caltech, where he chose chemical engineering as a major because it allowed him to combine his interests in science and mathematics.

During his undergraduate studies, he also developed an interest in computer science, working in a structural biology lab where he used the Fortran programming language to solve the crystal structures of proteins. After graduating from Caltech, he decided to pursue a graduate degree in chemical engineering and came to MIT in 2014 to begin his Ph.D.

Advised by professors Klavs Jensen and William Green, Coley worked on ways to optimize automated chemical reactions. His work focused on combining machine learning and cheminformatics – the application of computational methods to analyze chemical data – to plan reaction pathways that could produce new drug molecules. He also worked to design equipment that could be used to perform these reactions automatically.

Some of this work was done as part of a DARPA-funded program called Make-It, which focused on using machine learning and data analytics to improve the synthesis of drugs and other useful compounds from simple building blocks.

“That was my real starting point for thinking about cheminformatics, about machine learning, and how we can use models to understand how different chemicals can be made and what reactions are possible,” Coley says.

Coley began applying for faculty positions while still a graduate student and accepted an offer from MIT at the age of 25. He received advice both for and against taking a job at the same school where he had studied, but ultimately decided that the position at MIT was too tempting to turn down.

“MIT is a unique place in terms of resources and fluidity between departments. MIT seemed to be doing a really good job supporting the intersection of AI and science, and it was a vibrant ecosystem worth staying in,” he says. “The caliber of the students, their enthusiasm and just the incredible power of collaboration far outweighed any potential concerns about staying in the same place.”

Chemical intuition

Coley deferred his faculty position for a year to pursue a postdoctoral fellowship at the Broad Institute, where he sought more experience in chemical biology and drug discovery. There, he worked on ways to identify small molecules, from among the billions of candidates in DNA-encoded libraries, that might have binding interactions with mutant disease-related proteins.

Upon returning to MIT in 2020, he built a lab group whose mission was to use artificial intelligence not only to synthesize existing compounds with therapeutic potential, but also to design new molecules with desired properties and new ways of producing them. Over the past few years, his lab has developed a variety of computational approaches to achieve these goals.

“We try to think about how best to combine a challenge in chemistry with a potential computational solution. Often this combination motivates the development of new methods,” says Coley.

One of the models developed in his lab, known as ShEPhERD, was trained to evaluate potential new drug molecules according to how they interact with target proteins, based on the molecules’ three-dimensional shapes. This model is currently used by pharmaceutical companies to help them discover new drugs.

“We’re trying to give the generative model more of a medicinal chemistry intuition so that the model is aware of the right criteria and considerations,” Coley says.

In another project, Coley’s lab developed a generative artificial intelligence model called FlowER that can be used to predict the reaction products that result from combining different chemicals.

To design this model, the researchers built in knowledge of basic physical principles, such as the law of conservation of mass. They also required the model to account for the feasibility of the intermediate steps that must take place along the path from reactants to products. The researchers found that these constraints improved the accuracy of the model’s predictions.
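As a toy illustration of what a conservation-of-mass constraint means in practice, the Python sketch below checks that a proposed reaction has the same atoms on both sides. It is not FlowER's implementation, which the article does not describe; the helper names `atom_counts` and `conserves_atoms` are hypothetical, and the parser handles only simple formulas without parentheses or charges.

```python
import re
from collections import Counter


def atom_counts(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C2H6O'."""
    counts = Counter()
    # Each match is an element symbol followed by an optional count.
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def conserves_atoms(reactants: list[str], products: list[str]) -> bool:
    """Return True if both sides of the reaction contain the same atoms."""
    left = sum((atom_counts(f) for f in reactants), Counter())
    right = sum((atom_counts(f) for f in products), Counter())
    return left == right


# Esterification: acetic acid + ethanol -> ethyl acetate + water
balanced = conserves_atoms(["C2H4O2", "C2H6O"], ["C4H8O2", "H2O"])

# The same prediction with the water omitted loses two H and one O.
unbalanced = conserves_atoms(["C2H4O2", "C2H6O"], ["C4H8O2"])

print(balanced, unbalanced)  # True False
```

A predictor that is free to emit the second, unbalanced answer can produce chemically impossible outputs; ruling such answers out by construction is the kind of constraint the paragraph above refers to.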

“Thinking about these intermediate steps, the mechanisms involved, and how reactions evolve is something that chemists do very naturally. That’s how you learn chemistry, but it’s not something that models inherently think about,” Coley says. “We spent a lot of time thinking about how to ensure that our machine learning models are based on an understanding of reaction mechanisms, in the same way an experienced chemist would.”

Students in his lab also work on a wide variety of areas related to chemical reaction optimization, including computer-assisted structure elucidation, laboratory automation, and optimal experimental design.

“Through many different research strands, we hope to expand the boundaries of artificial intelligence in chemistry,” says Coley.
