Artificial intelligence is already proving that it can speed up drug development and improve our understanding of diseases. But to translate those advances into novel treatments, we need to make the latest and most powerful models available to researchers.
The problem is that most scientists are not experts in machine learning. Today, OpenProtein.AI helps scientists stay at the forefront of artificial intelligence with a no-code platform that gives them access to powerful base models and a suite of tools for protein design, protein structure and function prediction, and training models.
The company, founded by Tristan Bepler PhD ’20 and former MIT associate professor Tim Lu PhD ’07, already equips researchers at pharmaceutical and biotech companies of all sizes with its tools, including internally developed core protein engineering models. OpenProtein.AI also offers its platform for free to academic researchers.
“This is a truly exciting time because these models can not only increase the efficiency of protein engineering – which shortens development cycles in therapeutic and industrial applications – but can also increase our ability to design new proteins with specific characteristics,” says Bepler. “We are also looking at applying these approaches to non-protein modalities. The overall picture is that we are creating a language for describing biological systems.”
Advances in biology thanks to artificial intelligence
Bepler came to MIT in 2014 as part of the computational and systems biology doctoral program, studying under Bonnie Berger, the Simons Professor of Applied Mathematics at MIT. It was there that he realized how little we knew about the molecules that make up the building blocks of biology.
“We haven’t characterized biomolecules and proteins well enough to create good predictive models of how, say, an entire genome circuit will behave, or how the protein interaction network will behave,” Bepler recalls. “I became interested in understanding proteins at a more detailed level.”
Bepler began exploring ways to predict the amino acid chains that make up proteins by analyzing evolutionary data. This was before Google released AlphaFold, a powerful protein structure prediction model. The work led to one of the first generative artificial intelligence models for understanding and designing proteins – what the team calls a protein language model.
“I was really excited about the classical protein framework and the relationships between their sequence, structure, and function. We don’t understand these relationships very well,” Bepler says. “So how could we use these basic models to skip the ‘structure’ component and go straight from sequence to function?”
After receiving his PhD in 2020, Bepler began a postdoctoral fellowship in Lu’s lab in MIT’s Department of Biological Engineering.
“It was around the time that the idea of integrating artificial intelligence with biology was starting to emerge,” Lu recalls. “Tristan helped us build better computational models for biological design. We also realized that there was a disconnect between the cutting-edge tools available and biologists who would like to use these things but don’t know how to program. OpenProtein was born from the idea of expanding access to these tools.”
As part of his PhD, Bepler was at the forefront of artificial intelligence. He knew this technology could help scientists speed up their work.
“We started with the idea of building a general-purpose protein engineering platform based on machine learning in the loop,” says Bepler. “We wanted to build something user-friendly because machine learning ideas are quite esoteric. They require implementation, GPUs, tuning, sequence library design. Especially at that time, biologists had a lot to learn.”
The OpenProtein platform includes an intuitive web interface that allows biologists to upload data and conduct protein engineering work using machine learning. It includes a range of open source models, including PoET, OpenProtein’s flagship protein language model.
PoET, short for Protein Evolutionary Transformer, was trained on groups of proteins to generate sets of related proteins. Bepler and his colleagues showed that it can generalize protein evolutionary constraints and incorporate new protein sequence information without requiring retraining, allowing other researchers to add experimental data to improve the model.
“Scientists can use their own data to train models and optimize protein sequences, and then they can use our other tools to analyze those proteins,” Bepler says. “People are creating libraries of protein sequences in silico [on computers] and then running them through predictive models to obtain validation and structural predictors. It’s basically a no-code interface, but we also have APIs for people who want to access it through code.”
The models help scientists design proteins more quickly and then decide which ones are promising enough for further laboratory testing. Scientists can also input proteins of interest, and the models can generate new ones with similar properties.
Since its founding, the OpenProtein team has continually added tools to its platform for researchers, regardless of their lab size or resources.
“We really tried to make the platform an open toolkit,” Bepler says. “It has specific workflows, but it’s not tied specifically to one protein function or class of proteins. One of the great advantages of these models is that they understand proteins very well across a broad scope. They learn about the entire space of possible proteins.”
Enabling next-generation therapies
Major pharmaceutical company Boehringer Ingelheim began using the OpenProtein platform in early 2025. The companies recently announced an expanded collaboration in which the OpenProtein platform and models will be embedded in Boehringer Ingelheim’s work in developing proteins to treat diseases such as cancer and autoimmune or inflammatory diseases.
Last year, OpenProtein also released a new version of its protein language model, PoET-2, which outperforms much larger models while using a small fraction of the computational resources and experimental data.
“We really want to solve the question of how we describe proteins,” Bepler says. “What is the meaningful, domain-specific protein constraint language that we use when generating them? How can we introduce more evolutionary constraints? How can we describe the enzymatic reaction carried out by a protein in such a way that the model can generate the sequences carrying out the reaction?”
Moving forward, the founders hope to create models that take into account the changing, interconnected nature of protein function.
“The area that excites me goes beyond protein binding events and uses these models to predict and design dynamic features where a protein must engage two, three, or four biological mechanisms simultaneously or change its function upon binding,” says Lu, who currently serves in an advisory role at the company.
As the field of artificial intelligence advances, OpenProtein continues to see its mission as providing scientists with the best tools to develop new treatments faster.
“As work becomes more complex and approaches include elements such as protein logic and dynamic therapies, existing experimental toolkits become limiting,” Lu says. “It is really important to create open ecosystems around artificial intelligence and biology. There is a risk that artificial intelligence resources may become so concentrated that the average researcher will not be able to use them. Open access is extremely important for scientific progress.”
