A few years before ChatGPT started chattering, Google developed an entirely different kind of AI program, called AlphaGo, that, through relentless practice, learned to play the board game Go with superhuman skill.
Scientists at the company have just published research combining the capabilities of a gigantic language model (the artificial intelligence behind today’s chatbots) with the capabilities of AlphaZero, a successor to AlphaGo that can also play chess, to solve very complex mathematical proofs.
Their modern Frankenstein-esque creation, dubbed AlphaProof, has proven its mettle by solving several 2024 problems International Mathematical Olympiad (IMO), a prestigious competition for high school students.
AlphaProof uses the Gemini gigantic language model to convert naturally formed mathematical questions into a programming language called Lean. This provides training material for the second algorithm, which learns through trial and error how to find proofs that can be confirmed to be correct.
Earlier this year, Google DeepMind another mathematical algorithm revealed called AlphaGeometry, which also combines a language model with another AI approach. AlphaGeometry uses Gemini to convert geometric problems into a form that can be manipulated and tested with a program that handles geometric elements. Google also announced a modern and improved version of AlphaGeometry today.
The researchers found that their two math programs could provide proofs for IMO puzzles just as well as the silver medalist. The programs solved two algebra problems and one number theory problem out of six in total. They solved one problem in a few minutes, but the others took several days to solve. Google DeepMind did not disclose how much computing power it devoted to the problems.
Google DeepMind calls the approach used in AlphaProof and AlphaGeometry “neuro-symbolic” because it combines the pure machine learning of an artificial neural network, the technology underpinning most recent advances in AI, with a conventional programming language.
“What we saw here was that you could combine the approach that was so successful, and things like AlphaGo, with large language models and create something that was incredibly efficient,” says David Silver, a Google DeepMind researcher who led the work on AlphaZero. Silver says the techniques demonstrated with AlphaProof should, in theory, extend to other areas of mathematics.
Indeed, these studies raise the prospect of addressing the worst tendencies of gigantic language models by applying logic and reasoning in a more grounded way. While gigantic language models can be wonderful, they often struggle to understand even basic mathematics or reason logically about problems.
In the future, the neural symbolic method could provide AI systems with a way to transform questions or tasks into a form that can be rationalized in a way that produces reliable results. Rumors also suggest that OpenAI is working on such a system, codenamed “Strawberry”.
But there’s a key limitation to the systems revealed today, Silver acknowledges. The mathematical solutions are either correct or incorrect, which allows AlphaProof and AlphaGeometry to work toward the right answer. Many real-world problems—thinking about the ideal route to travel, for example—have multiple possible solutions, and which one is ideal can be unclear. Silver says that a solution for more ambiguous questions might be for the language model to try to figure out what constitutes a “correct” answer during training. “There’s a spectrum of different things you can try,” he says.
Silver also notes that Google DeepMind won’t put mathematicians out of work. “Our goal is to create a system that can prove anything, but that’s not the end of what mathematicians do,” he says. “A big part of math is posing problems and finding interesting questions to ask. You could think of it as another tool, similar to a slide rule or a calculator or computational tools.”
