Wednesday, January 15, 2025

Novel computational chemistry techniques are accelerating the prediction of molecules and materials

In the old days – the really old days – materials design was labor-intensive. For more than a thousand years, people tried to make gold by combining lead, mercury, and sulfur in what they hoped were the right proportions. Even famous scientists such as Tycho Brahe, Robert Boyle, and Isaac Newton tried their hands at the fruitless endeavor we call alchemy.

Materials science has obviously come a long way. For the last 150 years, researchers have had the benefit of the periodic table, which shows that different elements have different properties and cannot magically transform into one another. And over the past decade, machine learning tools have greatly improved our ability to determine the structure and physical properties of molecules and materials. New research by a group led by Ju Li, the Tokyo Electric Power Company Professor of Nuclear Engineering at MIT and professor of materials science and engineering, promises a significant leap in capabilities that could make materials design considerably easier. The results of their investigation are described in a paper published in December 2024.

Currently, most machine learning models used to characterize molecular systems are based on density functional theory (DFT), a quantum mechanical approach to determining the total energy of a molecule or crystal from the electron density distribution – essentially the average number of electrons found in a unit volume around each point in space near the molecule. (Walter Kohn, who co-developed the theory 60 years ago, received the Nobel Prize in Chemistry for it in 1998.) Although the method has proven very effective, it has some drawbacks, according to Li: “First, the accuracy is not uniformly great. And second, it tells you only one thing: the lowest total energy of the molecular system.”
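For a sense of what such a calculation looks like in practice, here is a minimal sketch using the open-source PySCF package – an illustrative choice, since the article does not name a specific code – with a hypothetical molecule, functional, and basis set:

```python
# Minimal sketch: a single-point DFT total-energy calculation with PySCF.
# Molecule, functional, and basis set are illustrative choices.
from pyscf import gto, dft

# Build a water molecule (coordinates in Angstrom).
mol = gto.M(
    atom="O 0.0000 0.0000 0.0000; H 0.7570 0.5860 0.0000; H -0.7570 0.5860 0.0000",
    basis="def2-svp",
)

# Kohn-Sham DFT with the B3LYP exchange-correlation functional.
mf = dft.RKS(mol)
mf.xc = "b3lyp"
e_total = mf.kernel()  # self-consistent total energy, in Hartree

print(f"DFT total energy: {e_total:.6f} Ha")
```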

“Coupled clusters” to the rescue

His team instead relies on a different computational chemistry technique, also derived from quantum mechanics, known as coupled-cluster theory – specifically CCSD(T). “This is the gold standard of quantum chemistry,” Li comments. The results of CCSD(T) calculations are much more accurate than those obtained from DFT and can be as reliable as those currently obtained from experiments. The problem is that these calculations are very slow to run on a computer, he says, “and the scaling is bad: if you double the number of electrons in the system, the calculation becomes about 100 times more expensive.” (The cost of CCSD(T) grows roughly as the seventh power of system size, and 2^7 = 128.) For this reason, CCSD(T) calculations have typically been limited to molecules with a small number of atoms – on the order of about 10. Anything much beyond that would simply take too long.
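A CCSD(T) calculation layers the coupled-cluster treatment on top of a Hartree-Fock reference. Here is another minimal, illustrative PySCF sketch (the package and settings are assumptions, not the authors' setup):

```python
# Minimal sketch: a CCSD(T) "gold standard" energy with PySCF.
from pyscf import gto, scf, cc

mol = gto.M(
    atom="O 0.0000 0.0000 0.0000; H 0.7570 0.5860 0.0000; H -0.7570 0.5860 0.0000",
    basis="cc-pvdz",
)

mf = scf.RHF(mol).run()    # Hartree-Fock reference wavefunction
mycc = cc.CCSD(mf).run()   # coupled cluster with single and double excitations
e_t = mycc.ccsd_t()        # perturbative triples correction, the "(T)"

print(f"CCSD(T) total energy: {mycc.e_tot + e_t:.6f} Ha")
```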

This is where machine learning comes in. CCSD(T) calculations are first carried out on conventional computers, and the results are then used to train a neural network with a novel architecture specially developed by Li and his colleagues. Once trained, the neural network can perform the same calculations much faster by approximating them. What's more, their model can extract much more information about a molecule than just its energy. “In previous work, people used many different models to evaluate different properties,” says Hao Tang, an MIT doctoral candidate in materials science and engineering. “Here we use just one model to evaluate all of these properties, which is why we call it a ‘multi-task’ approach.”
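A toy sketch of the multi-task idea – one shared trunk feeding several property heads, trained against a sum of per-property losses – might look as follows in PyTorch. This is not the authors' MEHnet code; all sizes, heads, and data are hypothetical placeholders:

```python
# Illustrative sketch of multi-task learning: one shared trunk, several
# prediction heads. NOT the authors' MEHnet; everything here is a placeholder.
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self, n_features: int = 64):
        super().__init__()
        # Shared representation of the molecule.
        self.trunk = nn.Sequential(
            nn.Linear(n_features, 128), nn.SiLU(),
            nn.Linear(128, 128), nn.SiLU(),
        )
        # One small head per property.
        self.heads = nn.ModuleDict({
            "energy": nn.Linear(128, 1),
            "dipole": nn.Linear(128, 3),   # a vector quantity
            "gap": nn.Linear(128, 1),
        })

    def forward(self, x: torch.Tensor) -> dict:
        h = self.trunk(x)
        return {name: head(h) for name, head in self.heads.items()}

# One training step: the loss sums over all property targets, so a single
# model learns every task at once.
model = MultiTaskNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(8, 64)                       # fake molecular features
targets = {"energy": torch.randn(8, 1),      # fake CCSD(T)-level labels
           "dipole": torch.randn(8, 3),
           "gap": torch.randn(8, 1)}
preds = model(x)
loss = sum(nn.functional.mse_loss(preds[k], targets[k]) for k in targets)
opt.zero_grad()
loss.backward()
opt.step()
```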

The “Multi-task Electronic Hamiltonian network” (MEHnet) sheds light on a number of electronic properties, such as the dipole and quadrupole moments, electronic polarizability, and the optical excitation gap – the amount of energy needed to take an electron from the ground state to the lowest excited state. “The excitation gap affects the optical properties of materials,” explains Tang, “because it determines the frequency of light that a molecule can absorb.” Another advantage of their CCSD(T)-trained model is that it can reveal properties of excited states, not just the ground state. The model can also predict a molecule's infrared absorption spectrum, which is tied to its vibrational properties – the vibrations of atoms within a molecule couple to one another, leading to various collective behaviors.
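The link between the excitation gap and absorbed light follows from E = hc/λ; a handy rule of thumb is λ(nm) ≈ 1239.84 / E(eV). A quick illustration (the 3.1 eV gap is a purely hypothetical value):

```python
# The optical excitation gap sets which light a molecule can absorb:
# E = h*c / lambda, i.e. lambda(nm) ~= 1239.84 / E(eV).
def absorption_wavelength_nm(gap_ev: float) -> float:
    """Wavelength of light matching a given excitation gap (in eV)."""
    HC_EV_NM = 1239.84  # Planck constant times speed of light, in eV*nm
    return HC_EV_NM / gap_ev

# Example: a hypothetical 3.1 eV gap corresponds to ~400 nm (violet light).
print(absorption_wavelength_nm(3.1))  # ~399.9 nm
```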

The strength of their approach owes much to the network architecture. Building on the work of MIT Assistant Professor Tess Smidt, the team uses a so-called E(3)-equivariant graph neural network, says Tang, “in which the nodes represent atoms and the edges connecting the nodes represent the bonds between atoms. We also use custom algorithms that incorporate physics principles – related to how people calculate molecular properties in quantum mechanics – directly into our model.”
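The symmetry idea can be demonstrated with a toy model: a network that sees only interatomic distances cannot change its scalar output when a molecule is rotated or translated. (MEHnet itself uses richer equivariant features; this sketch only illustrates the invariance principle, and the "energy" function is made up.)

```python
# Toy demonstration of E(3) symmetry: distance-based scalar predictions are
# unchanged by rotations and translations of the molecule.
import torch

def pairwise_distances(pos: torch.Tensor) -> torch.Tensor:
    """All interatomic distances for atom positions of shape (n_atoms, 3)."""
    return torch.cdist(pos, pos)

def toy_scalar_model(pos: torch.Tensor) -> torch.Tensor:
    # A made-up invariant "energy": a smooth function of distances only.
    d = pairwise_distances(pos)
    return torch.exp(-d).sum()

pos = torch.randn(5, 3)  # random 5-atom geometry

# Random rotation (orthogonal matrix via QR decomposition) plus a translation.
q, _ = torch.linalg.qr(torch.randn(3, 3))
moved = pos @ q.T + torch.tensor([1.0, -2.0, 0.5])

# Both calls print the same value (up to floating-point error).
print(toy_scalar_model(pos), toy_scalar_model(moved))
```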

Testing, 1, 2, 3

When tested against analyses of known hydrocarbon molecules, the model of Li and colleagues outperformed its DFT counterparts and closely matched experimental results taken from the published literature.

Qiang Zhu, a materials discovery specialist at the University of North Carolina at Charlotte (who was not involved in this study), is impressed with what has been achieved so far. “Their method enables effective training with a small dataset while achieving superior accuracy and computational efficiency compared to existing models,” he says. “This is exciting work that illustrates the powerful synergy between computational chemistry and deep learning, offering fresh ideas for developing more accurate and scalable electronic structure methods.”

The MIT group first applied their model to small nonmetallic elements – hydrogen, carbon, nitrogen, oxygen, and fluorine, from which organic compounds can be made – and has since moved on to heavier elements: silicon, phosphorus, sulfur, chlorine, and even platinum. Once trained on small molecules, the model can be generalized to bigger and bigger molecules. “Previously, most calculations were limited to analyzing hundreds of atoms with DFT and just tens of atoms with CCSD(T),” says Li. “Now we're talking about handling thousands of atoms and, eventually, perhaps tens of thousands.”

For now, scientists are still evaluating known molecules, but the model can be used to characterize molecules that have not been seen before, as well as predict the properties of hypothetical materials composed of different types of molecules. “The idea is to use our theoretical tools to select promising candidates that meet a certain set of criteria before suggesting them to the experimenter for testing,” Tang says.
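Such a screening loop might look like the following sketch, in which every name, threshold, and the stand-in predictor are hypothetical placeholders:

```python
# Sketch of the screening workflow Tang describes: predict properties for
# many candidates with a fast trained model, then pass only those that meet
# the design criteria to experimentalists. All values here are placeholders.
import random

def predict_properties(name: str) -> dict:
    """Stand-in for the trained model's fast inference call (hypothetical)."""
    random.seed(name)  # deterministic fake values per molecule
    return {"gap_ev": random.uniform(0.5, 4.0),
            "dipole_debye": random.uniform(0.0, 5.0)}

def passes_criteria(props: dict) -> bool:
    # Example criteria: an optical gap in the visible range, a modest dipole.
    return 1.5 <= props["gap_ev"] <= 3.0 and props["dipole_debye"] < 2.0

candidates = [f"candidate_{i:03d}" for i in range(1000)]
shortlist = [c for c in candidates if passes_criteria(predict_properties(c))]
print(f"{len(shortlist)} of {len(candidates)} candidates suggested for testing")
```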

It all depends on the application

Looking to the future, Zhu is optimistic about possible applications. “This approach holds the potential for high-throughput molecular screening,” he says. “This is a task where achieving chemical accuracy may be essential to identifying new molecules and materials with desired properties.”

Once they demonstrate the ability to analyze large molecules, perhaps composed of tens of thousands of atoms, Li says, “we should be able to invent new polymers or materials” that could be used in drug design or semiconductor devices. Examining heavier transition-metal elements could lead to new battery materials – an area of urgent need right now.

The future, as Li sees it, is wide open. “It's not about one area anymore,” he says. “Ultimately, our ambition is to cover the whole periodic table with CCSD(T)-level accuracy, but at lower computational cost than DFT. This should enable us to solve a wide range of problems in chemistry, biology, and materials science. It's hard to say right now just how wide that range might be.”

This work was supported by the Honda Research Institute. Hao Tang acknowledges support from the MathWorks Engineering Fellowship. The calculations in this work were performed in part on the Matlantis high-speed universal atomistic simulator, the Texas Advanced Computing Center, the MIT SuperCloud, and the National Energy Research Scientific Computing Center.
