In July 2022, we published AlphaFold protein structure predictions for nearly all catalogued proteins known to science. Read the latest blog post here.
Today, I am extremely proud and excited to announce that DeepMind is making a significant contribution to humanity’s understanding of biology.
When we announced AlphaFold 2 last December, it was hailed as a solution to the 50-year-old protein folding problem. Last week, we published the scientific paper and source code explaining how we created this highly innovative system, and today we are sharing high-quality predictions for the shape of every single protein in the human body, as well as for the proteins of 20 additional organisms that scientists rely on for their research.
As scientists search for cures for diseases and solutions to other major challenges facing humanity (including antibiotic resistance, microplastic pollution, and climate change), they will benefit from fresh insights into the structure of proteins. Proteins are like small, exquisite biological machines. Just as a machine's structure tells us what it does, a protein's structure helps us understand its function. Today, we are sharing a trove of information that doubles our understanding of the human proteome and reveals the structures of proteins found in 20 other biologically significant organisms, from E. coli to yeast and from fruit flies to mice.
“It will be one of the most essential data sets since the mapping of the human genome.”
Ewan Birney, Deputy Director General of EMBL and Director of EMBL-EBI
As a powerful tool to support the efforts of researchers, we believe this is the most significant contribution AI has made to scientific knowledge to date, and a shining example of the benefits AI can bring to humanity. These insights will be the foundation for many exciting future advances in our understanding of biology and medicine. Thanks to five years of tireless work and great ingenuity by the AlphaFold team, and close collaboration over the past few months with our partners at EMBL's European Bioinformatics Institute (EMBL-EBI), we can share this enormous and valuable resource with the world.
Proteins are sophisticated biological machines. Their three-dimensional structures are often not only aesthetic but also functional, and are the building blocks of life.
This latest work builds on the announcement we made last December at CASP14, when DeepMind unveiled a radically new version of our AlphaFold system, which the organizers of the assessment hailed as a solution to the 50-year-old grand challenge of understanding the three-dimensional structure of proteins. Determining protein structures experimentally is time-consuming and painstaking, but AlphaFold demonstrated that AI can accurately predict the shape of a protein, at scale and in minutes, down to atomic accuracy. At CASP, we committed to sharing our methods and ensuring broad access to this knowledge.
Improvement in median prediction accuracy in the free-modeling category for the best team in each CASP, measured as best-of-5 GDT.
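For readers unfamiliar with the metric in the chart above, the following is a minimal sketch of how a GDT-style score can be computed. It makes several simplifying assumptions: the predicted and experimental C-alpha coordinates are already superimposed (the full GDT procedure also searches over superpositions), the commonly used GDT_TS cutoffs of 1, 2, 4 and 8 Å apply, and "best-of-5" means the highest score among up to five submitted models. The function names `gdt_ts` and `best_of_5` are illustrative and are not taken from the AlphaFold code or the CASP tooling.

```python
import math


def gdt_ts(predicted, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS on a 0-100 scale.

    Assumes both structures are lists of (x, y, z) C-alpha coordinates in
    angstroms, residue-aligned and already superimposed.
    """
    if not predicted or len(predicted) != len(reference):
        raise ValueError("structures must be non-empty and of equal length")

    # Per-residue distance between predicted and experimental positions.
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]

    # Fraction of residues that fall within each distance cutoff.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]

    # GDT_TS is the mean of those fractions, expressed as a percentage.
    return 100.0 * sum(fractions) / len(cutoffs)


def best_of_5(models, reference):
    """Highest simplified GDT_TS among up to five candidate models."""
    return max(gdt_ts(model, reference) for model in models[:5])
```

A score of 100 means every residue lies within 1 Å of its experimentally determined position under the given superposition; lower scores indicate that more residues fall outside the tighter cutoffs.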
This month, we completed a tremendous amount of hard work to fulfill that commitment. We published two peer-reviewed papers in Nature (1,2) and made the AlphaFold code available as open source. Today, in partnership with EMBL-EBI, we are incredibly proud to launch the AlphaFold Protein Structure Database, which offers the most complete and accurate picture of the human proteome to date, more than doubling humanity's accumulated knowledge of high-accuracy human protein structures.
In addition to the human proteome (all ~20,000 proteins expressed by the human genome), we are providing open access to the proteomes of 20 other biologically significant organisms, totaling more than 350,000 protein structures. The study of these organisms has been the subject of countless scientific papers and numerous groundbreaking discoveries, and has led to a deeper understanding of life itself. In the coming months, we plan to significantly expand the coverage to almost every sequenced protein known to science: over 100 million structures covering most of the UniProt reference database. It is, in effect, a protein almanac of the world. The system and database will be updated periodically as we continue to invest in future improvements to AlphaFold.
Most excitingly, in the hands of scientists around the world, this new protein almanac will enable and accelerate research that expands our understanding of these fundamental building blocks of life. Already, through our early collaborations, we have seen promising signals from researchers using AlphaFold in their own work. For example, the Drugs for Neglected Diseases initiative (DNDi) has advanced its research into life-saving drugs for diseases that disproportionately affect poorer parts of the world, and the Centre for Enzyme Innovation (CEI) at the University of Portsmouth is using AlphaFold to help engineer faster enzymes for recycling some of our most polluting single-use plastics. For scientists who rely on experimental protein structure determination, AlphaFold's predictions have helped speed up their research. As another example, a team at the University of Colorado Boulder is finding promise in using AlphaFold predictions to study antibiotic resistance, while a group at the University of California, San Francisco has used them to deepen their understanding of SARS-CoV-2 biology. And this is just the beginning of what we hope will be a revolution in structural bioinformatics. With AlphaFold out in the world, there is a treasure trove of data now waiting to be turned into future advances.
“AlphaFold is opening up new research horizons, and it's inspiring to see cutting-edge AI being used to address diseases that almost exclusively affect impoverished populations.”
Ben Perry, Discovery Open Innovation Program Lead, Drugs for Neglected Diseases Initiative (DNDi)
For the AlphaFold team at DeepMind, this work is the culmination of five years of enormous effort, including creatively overcoming many difficult setbacks, which resulted in a host of new, sophisticated algorithmic innovations that were needed to finally solve the problem. It builds on the discoveries of generations of scientists, from the early pioneers of protein imaging and crystallography to the thousands of prediction specialists and structural biologists who have spent years experimenting with proteins. Our dream is that AlphaFold, by providing this foundational knowledge, will help countless scientists in their work and open up entirely new avenues of scientific discovery.
“What took us months and years, AlphaFold managed to do in one weekend.”
Professor John McGeehan, Professor of Structural Biology and Director of the Centre for Enzyme Innovation (CEI) at the University of Portsmouth
At DeepMind, our thesis has always been that AI can radically accelerate breakthroughs in many fields of science and thus advance humanity. We built AlphaFold and the AlphaFold Protein Structure Database to support and elevate the efforts of scientists worldwide in the essential work they do. We believe that AI has the potential to revolutionize the way science is done in the 21st century, and we look forward to the discoveries that AlphaFold can help the scientific community make.
To learn more, see our peer-reviewed papers in Nature describing our full method and the human proteome, and read more about them in our technical blog. If you want to explore our system, we are sharing the AlphaFold source code and a Colab notebook for running individual sequences. To explore our structures, EMBL-EBI, a world leader in biological data, hosts them in a searchable database that is open and free to everyone.
We would love to hear your feedback and understand how AlphaFold has been useful in your research. Share your stories at alphafold@deepmind.com.