Even the smartest artificial intelligence models are essentially imitators. They learn either by consuming examples of human work or by trying to solve problems set for them by human instructors.
But perhaps AI can learn in a more human way: by coming up with compelling questions to ask itself and trying to find the right answers. A project from Tsinghua University, the Beijing Institute of General Artificial Intelligence (BIGAI), and Pennsylvania State University shows that AI can learn to reason this way by playing with computer code.
The researchers developed a system called Absolute Zero Reasoner (AZR), which first uses a large language model to generate hard but solvable coding problems in Python. It then uses the same model to solve those problems, checking its work by actually running the code. Finally, AZR uses the successes and failures as a signal to improve the original model, increasing its ability both to pose better problems and to solve them.
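Stripped to its essentials, that propose-solve-verify loop can be sketched as follows. This is an illustrative assumption, not the authors' implementation: the function names and the toy task generator are invented stand-ins, and in the real system a single large language model plays both the proposer and solver roles, with the rewards driving reinforcement-learning updates.

```python
# Illustrative sketch of AZR-style self-play (not the authors' code).
# A real system would use one large language model as both proposer and
# solver; simple stubs stand in here so the control flow is runnable.
import random

def propose_task(rng):
    """Proposer role: emit a small Python program plus an input for it."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    code = f"def f(x):\n    return x * {a} + {b}"
    return code, rng.randint(0, 9)

def execute(code, x):
    """Verifier: run the generated code to obtain the ground-truth output."""
    ns = {}
    exec(code, ns)  # in practice this would run in a sandboxed interpreter
    return ns["f"](x)

def solve_task(code, x, rng):
    """Solver role: predict the program's output (a stand-in for the LLM)."""
    # Simulate a solver that is right most of the time.
    return execute(code, x) if rng.random() < 0.7 else rng.randint(0, 99)

rng = random.Random(0)
rewards = []
for step in range(100):
    code, x = propose_task(rng)
    truth = execute(code, x)   # running the code checks the task is solvable
    guess = solve_task(code, x, rng)
    rewards.append(1.0 if guess == truth else 0.0)
    # In AZR, these rewards would update the shared model so it both poses
    # better problems and solves them more reliably over time.

print(f"solver accuracy over 100 self-generated tasks: {sum(rewards)/len(rewards):.2f}")
```

The key property the sketch preserves is that no human-written training data appears anywhere: the tasks, the answers, and the reward signal are all produced by the system itself, with code execution serving as the impartial judge.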
The team found that their approach significantly improved coding and reasoning skills in both the open-source 7-billion- and 14-billion-parameter versions of the Qwen language model. Impressively, the model even outperformed some models trained on human-curated data.
I talked over Zoom with Andrew Zhao, a graduate student at Tsinghua University who came up with the original idea for Absolute Zero, as well as Zilong Zheng, a researcher at BIGAI who worked with him on the project.
Zhao told me that this approach mirrors the way human learning goes beyond rote memorization and imitation. "At first you imitate your parents and your teachers, but then you basically have to ask your own questions," he said. "And you may finally surpass those who taught you in school."
Zhao and Zheng noted that the idea of training AI this way, sometimes called "self-play," goes back many years and has previously been explored by, among others, Jürgen Schmidhuber, a renowned AI pioneer, and Pierre-Yves Oudeyer, a computer scientist at Inria in France.
According to Zheng, one of the most exciting aspects of the project is the way the model's problem-posing and problem-solving skills scale together. "The level of difficulty increases as the power of the model increases," he said.
The key challenge is that, for now, the system only works for problems whose answers can be easily checked, such as math or coding. As the project progresses, it may become possible to apply it to AI agent tasks such as browsing the web or performing office work. That might involve having the AI model itself judge whether the agent's actions are correct.
The tantalizing possibility of an approach like Absolute Zero is that it could, in theory, allow models to go beyond what human-generated data can teach them. "Once we achieve this, it will be a kind of way to reach superintelligence," Zheng told me.
There are early signs that the Absolute Zero approach is gaining traction in some major AI labs.
A project called Agent0, from Salesforce, Stanford, and the University of North Carolina at Chapel Hill, involves an agent that uses a software tool and improves through self-play. As with Absolute Zero, the model gets better at general reasoning through experimental problem solving. And a recent paper by researchers at Meta, the University of Illinois, and Carnegie Mellon University presents a system that uses a similar kind of self-play for software engineering. The authors suggest it represents "a first step toward training paradigms for superintelligent software agents."
Finding new ways for artificial intelligence to learn is likely to be a major theme in the tech industry this year. As conventional data sources become scarcer and more expensive, and as labs look for new ways to make models more capable, a project like Absolute Zero could lead to AI systems that look less like copycats and more like human learners.
