Tests
A new AI tool classifies the effects of 71 million “missense” mutations.
Discovering the root causes of disease is one of the greatest challenges in human genetics. With millions of possible mutations and limited experimental data, it is still largely a mystery which ones can cause disease. This knowledge is crucial for faster diagnosis and the development of life-saving therapies.
Today we are releasing a catalog of “missense” mutations, where researchers can learn more about their effects. Missense variants are genetic mutations that can affect the function of human proteins. In some cases, they can lead to diseases such as cystic fibrosis, sickle cell anemia or cancer.
The AlphaMissense catalog was developed using AlphaMissense, our new AI model that classifies missense variants. In a paper published in Science, we show that it classified 89% of all 71 million possible missense variants as either likely pathogenic or likely benign. By contrast, only 0.1% have been confirmed by human experts.
Artificial intelligence tools that can accurately predict the impact of variants can accelerate research in fields ranging from molecular biology to clinical and statistical genetics. Experiments aimed at discovering disease-causing mutations are costly and labor-intensive: each protein is unique and each experiment must be designed separately, which can take months. Using AI predictions, researchers can preview results for thousands of proteins at once, which can help prioritize resources and accelerate more complex studies.
We have made all of our predictions available for commercial and research use, and we have open-sourced the AlphaMissense model code.
AlphaMissense predicted the pathogenicity of all 71 million possible missense variants. It classified 89% of them, predicting that 57% were likely benign and 32% were likely pathogenic.
What is a missense variant?
A missense variant is the substitution of a single letter in DNA, resulting in a different amino acid in the protein. If you think of DNA as a language, changing one letter can change a word and completely change the meaning of a sentence. In this case, the substitution changes the amino acid being translated, which can affect the function of the protein.
The average person carries more than 9,000 missense variants. Most of them are benign and have little or no effect, but others are pathogenic and can seriously disrupt protein function. Missense variants can be used in the diagnosis of rare genetic diseases, where a few or even a single missense variant can directly cause the disease. They are also crucial in studying complex diseases such as type 2 diabetes, which can be caused by a combination of many different types of genetic changes.
Classifying missense variants is a crucial step in understanding which of these protein changes may cause disease. Of the more than 4 million missense variants that have already been observed in humans, only 2% have been annotated as pathogenic or benign by experts, representing approximately 0.1% of the total 71 million possible missense variants. The rest are considered “variants of unknown significance” due to a lack of experimental or clinical data on their impact. Thanks to AlphaMissense, we now have the clearest picture yet, classifying 89% of variants using a threshold that yields 90% precision on a database of known disease variants.
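The proportions quoted above can be checked with a few lines of arithmetic. This is only a rough sketch: it treats “more than 4 million” as exactly 4 million, so the results are approximate, and the variable names are ours.

```python
# Figures taken from the text; "more than 4 million" is rounded down to 4 million.
observed_variants = 4_000_000   # missense variants already observed in humans
expert_fraction = 0.02          # share of observed variants annotated by experts
total_possible = 71_000_000     # all possible human missense variants
classified_fraction = 0.89      # share classified by AlphaMissense

expert_classified = observed_variants * expert_fraction
share_of_total = expert_classified / total_possible
model_classified = total_possible * classified_fraction

print(f"Expert-annotated: {expert_classified:,.0f}")        # 80,000
print(f"Share of all possible: {share_of_total:.2%}")       # 0.11%
print(f"Classified by the model: {model_classified:,.0f}")  # 63,190,000
```

Roughly 80,000 expert-annotated variants out of 71 million is about 0.1%, which matches the figure in the text.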
Pathogenic or benign: How AlphaMissense classifies variants
AlphaMissense is based on our groundbreaking model AlphaFold, which predicted the structures of almost all proteins known to science from their amino acid sequences. Our adapted model can predict the pathogenicity of missense variants that change individual amino acids of proteins.
To train AlphaMissense, we fine-tuned AlphaFold on labels that distinguish variants observed in human and closely related primate populations. Variants that are commonly seen are treated as benign, and variants never seen are treated as pathogenic. AlphaMissense does not predict changes in protein structure due to mutations or other effects on protein stability. Instead, it uses databases of related protein sequences and the structural context of the variants to produce a score from 0 to 1 that approximately rates the likelihood that the variant is pathogenic. The continuous score allows users to select a threshold for classifying variants as pathogenic or benign that meets their accuracy requirements.
An illustration of how AlphaMissense classifies human missense variants. A missense variant is given as input and the AI system scores it as likely pathogenic or likely benign. AlphaMissense combines structural context and protein language modeling, and is fine-tuned on human and primate population frequency databases.
AlphaMissense achieves state-of-the-art predictions across a wide range of genetic and experimental benchmarks, all without direct training on such data. Our tool outperformed other computational methods when used to classify variants from ClinVar, a public archive of data on the association between human variants and disease. Our model was also the most accurate method for predicting laboratory results, demonstrating that it is consistent with various ways of measuring pathogenicity.
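To make the role of the threshold concrete, here is a minimal Python sketch of how a user might turn a continuous score into a class, and how a cutoff could be chosen to reach a target precision on a labeled set. The function names, the default cutoffs (0.3 and 0.7) and the toy data are all illustrative assumptions; they are not the values or the code used by AlphaMissense.

```python
def classify(score, benign_below=0.3, pathogenic_above=0.7):
    """Map a score in [0, 1] to a class using two hypothetical cutoffs."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score < benign_below:
        return "likely benign"
    if score > pathogenic_above:
        return "likely pathogenic"
    return "uncertain"


def precision_at(threshold, labeled):
    """Fraction of variants scored >= threshold that are truly pathogenic."""
    called = [is_pathogenic for score, is_pathogenic in labeled if score >= threshold]
    return sum(called) / len(called) if called else None


def lowest_threshold_for_precision(labeled, target):
    """Smallest observed score whose cutoff reaches the target precision."""
    for threshold in sorted({score for score, _ in labeled}):
        precision = precision_at(threshold, labeled)
        if precision is not None and precision >= target:
            return threshold
    return None


# Toy labeled set of (score, is_pathogenic) pairs, invented for illustration.
labeled = [(0.05, False), (0.2, False), (0.4, True), (0.55, False),
           (0.7, True), (0.85, True), (0.95, True)]

print(classify(0.1), "|", classify(0.5), "|", classify(0.9))
print(lowest_threshold_for_precision(labeled, target=0.9))
```

A stricter precision target pushes the cutoff higher and leaves more variants in the uncertain band, which is the trade-off a user makes when choosing a threshold.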
AlphaMissense outperforms other computational methods in predicting the effects of missense variants.
Left: Performance comparison of AlphaMissense and other methods for classifying variants from the ClinVar public archive. The methods shown in gray were trained directly on ClinVar, and their performance on this benchmark is likely overestimated because some of their training variants are included in this test set.
Right: A chart comparing the performance of AlphaMissense and other methods in predicting measurements from biological experiments.
Building a community resource
AlphaMissense builds on AlphaFold to advance the world’s understanding of proteins. A year ago, we released 200 million protein structures predicted using AlphaFold, which is helping millions of scientists around the world accelerate research and pave the way to new discoveries. We look forward to seeing how AlphaMissense can help solve open questions at the heart of genomics and life sciences.
We have made AlphaMissense predictions available to both the commercial and scientific communities. Together with EMBL-EBI, we are also making them more usable through the Ensembl Variant Effect Predictor.
In addition to our missense mutation lookup table, we have provided extended predictions of all 216 million possible single amino acid substitutions in over 19,000 human proteins. We also included the average prediction for each gene, which is similar to measuring a gene’s evolutionary constraint: it indicates how essential the gene is to the survival of the organism.
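As a rough illustration of what a per-gene average means, the sketch below groups variant scores by gene and takes the mean. The input layout, the gene names and the scores are invented for the example and do not reflect the format of the released files.

```python
from collections import defaultdict
from statistics import mean


def gene_mean_scores(rows):
    """Average the pathogenicity scores of all variants listed for each gene."""
    by_gene = defaultdict(list)
    for gene, score in rows:
        by_gene[gene].append(score)
    return {gene: mean(scores) for gene, scores in by_gene.items()}


# Hypothetical (gene, score) pairs; real genes have thousands of possible variants.
rows = [("GENE_A", 0.9), ("GENE_A", 0.7), ("GENE_B", 0.1), ("GENE_B", 0.3)]

for gene, avg in gene_mean_scores(rows).items():
    print(f"{gene}: {avg:.2f}")
```

In this toy example GENE_A has the higher average, so under the interpretation above it would be the gene less tolerant of amino acid changes.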
Examples of AlphaMissense predictions superimposed on AlphaFold predicted structures (red = predicted to be pathogenic, blue = predicted to be benign, gray = uncertain). Red dots represent known pathogenic missense variants, blue dots represent known benign variants from the ClinVar database.
Left: HBB protein. Variants of this protein can cause sickle cell disease.
Right: CFTR protein. Variants of this protein can cause cystic fibrosis.
Accelerating research on genetic diseases
A key step in translating the results of this research is collaboration with the scientific community. We worked with Genomics England to explore how these predictions could help study the genetics of rare diseases. Genomics England compared the AlphaMissense findings with variant pathogenicity data previously collected from human participants. Their evaluation confirmed that our predictions are accurate and consistent, providing another real-world benchmark for AlphaMissense.
Although our predictions are not intended to be used directly in clinical practice, and must be interpreted alongside other sources of evidence, this work has the potential to improve the diagnosis of rare genetic disorders and aid in the discovery of new disease-causing genes.
Ultimately, we hope that AlphaMissense, along with other tools, will enable scientists to better understand diseases and develop new life-saving treatments.