AI labs traveling the road to superintelligent systems realize they may have to take a different path.
According to several investors, founders, and CEOs who spoke to TechCrunch, “AI scaling laws,” the methods and expectations that labs have used to increase the capabilities of their models over the past five years, are now showing signs of diminishing returns. Their sentiments echo recent reports indicating that models inside leading AI labs are improving more slowly than they used to.
Everyone now seems to be admitting that you can’t just use more compute and more data to pre-train large language models and expect them to turn into some sort of all-knowing digital god. That may sound obvious, but these scaling laws were a key factor in developing ChatGPT, making it better, and likely influencing many CEOs’ bold predictions that AGI would emerge in just a few years.
OpenAI co-founder and Safe Superintelligence co-founder Ilya Sutskever told Reuters last week that “everyone is looking for the next thing” to scale their AI models. Earlier this month, a16z co-founder Marc Andreessen said on a podcast that AI models currently seem to be converging on the same ceiling of capabilities.
But now, almost as soon as these concerning trends began to emerge, AI CEOs, researchers, and investors are already declaring that we have entered a new era of scaling laws. “Test-time compute,” which gives AI models more time and computation to “think” before answering a question, is a particularly promising contender to be the next big thing.
“We are seeing the emergence of a new scaling law,” said Microsoft CEO Satya Nadella onstage at Microsoft Ignite on Tuesday, referring to the test-time compute research underpinning OpenAI’s o1 model.
He’s not the only one currently pointing to o1 as the future.
“We are now in a second era of scaling laws, which is test-time scaling,” Andreessen Horowitz partner Anjney Midha, who also sits on Mistral’s board and was an angel investor in Anthropic, said in a recent interview with TechCrunch.
If the unexpected success – and now the sudden slowdown – of previous AI scaling laws tells us anything, it’s that it is very hard to predict how and when AI models will improve.
Regardless, there appears to be a paradigm shift taking place: The way AI labs seek to improve their models over the next five years will likely bear no resemblance to the last five.
What are AI scaling laws?
The rapid improvements in AI models that OpenAI, Google, Meta, and Anthropic have achieved since 2020 can largely be attributed to one key insight: use more compute and more data during an AI model’s pre-training phase.
When researchers give machine learning systems ample resources at this stage – in which the AI finds and stores patterns in large datasets – models have tended to perform better at predicting the next word or phrase.
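Researchers have formalized this relationship as an empirical power law: loss falls predictably as parameters and training tokens grow, but each doubling buys less than the last. The toy Python sketch below illustrates the shape of that curve; the constants in it are made-up placeholders for demonstration, not fitted values from any published paper.

```python
# Illustrative only: a toy power-law loss curve in the spirit of published
# scaling-law fits. All constants below are hypothetical placeholders.

def pretraining_loss(params: float, tokens: float) -> float:
    E = 1.7                  # hypothetical irreducible loss
    A, alpha = 400.0, 0.34   # hypothetical parameter-count term
    B, beta = 410.0, 0.28    # hypothetical dataset-size term
    return E + A / params**alpha + B / tokens**beta

# Doubling the data helps a lot at small scale and much less at large scale.
for tokens in [1e9, 1e10, 1e11, 1e12]:
    gain = pretraining_loss(7e10, tokens) - pretraining_loss(7e10, 2 * tokens)
    print(f"{tokens:.0e} tokens -> loss drop from doubling data: {gain:.4f}")
```

Running it shows the drop from each doubling shrinking by roughly an order of magnitude across the sweep, which is the “diminishing returns” the article describes.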
This first generation of AI scaling laws pushed the boundaries of what computers could do, as engineers increased the number of GPUs used and the amount of data those GPUs were fed. Even if this particular method has run its course, it has already redrawn the map. Every Big Tech company has essentially gone all in on AI, while Nvidia, which supplies the GPUs all these companies train their models on, is now the most valuable publicly traded company in the world.
However, these investments were also made with the assumption that scaling would continue as expected.
It is important to note that scaling laws are not laws of nature, physics, math, or government. Nothing and no one guarantees they will continue at the same pace. Even Moore’s Law, another famed scaling law, eventually stopped holding – though it certainly had a longer run.
“If you just put in more compute, you put in more data, you make the model bigger – there are diminishing returns,” Anyscale co-founder and former CEO Robert Nishihara said in an interview with TechCrunch. “In order to keep the scaling laws going, in order to keep the rate of progress increasing, we also need new ideas.”
Nishihara is quite familiar with AI scaling laws. Anyscale reached a billion-dollar valuation by developing software that helps OpenAI and other AI model developers scale their AI training workloads to tens of thousands of GPUs. Anyscale has been one of the biggest beneficiaries of pre-training scaling laws on the compute side, but even its co-founder recognizes that the season is changing.
“When you’ve read a million reviews on Yelp, maybe the next million Yelp reviews won’t give you as much,” said Nishihara, referring to the limits of scaling data. “But that’s pre-training. The methodology around post-training, I would say, is quite immature and leaves a lot of room for improvement.”
To be clear, AI model developers will likely continue chasing larger compute clusters and bigger datasets for pre-training, and there is probably more improvement to squeeze out of those methods. Elon Musk recently finished building a supercomputer with 100,000 GPUs, dubbed Colossus, to train xAI’s next models. There will be more, and larger, clusters to come.
However, trends suggest that simply using more GPUs with existing strategies will not deliver exponential growth, so new methods are suddenly getting more attention.
Test-time compute: the AI industry’s next big thing
OpenAI improved its GPT models mainly through traditional scaling laws: more data, more compute during pre-training. But now that method reportedly isn’t gaining them much. The o1 models rely on a new concept, test-time compute, so called because the computing resources are used after a prompt, not before it. The technique hasn’t been explored much yet in the context of neural networks, but it’s already showing promise.
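One concrete way to picture test-time compute is best-of-N sampling with majority voting, sometimes called self-consistency: draw many candidate answers to the same prompt, spending extra compute after the question arrives, and keep the most common one. The sketch below is a minimal illustration of that idea only; `sample_answer` is a hypothetical stand-in for a stochastic model call, not any lab’s real API.

```python
import random
from collections import Counter

def sample_answer(prompt: str) -> str:
    """Hypothetical stand-in for one stochastic model call.

    A real implementation would sample a completion from an LLM with
    temperature > 0; here we fake a noisy solver for illustration.
    """
    return random.choice(["42", "42", "42", "41", "43"])

def answer_with_test_time_compute(prompt: str, n_samples: int) -> str:
    # More samples means more compute spent *after* the prompt arrives.
    candidates = [sample_answer(prompt) for _ in range(n_samples)]
    # Majority vote: return the answer the model converges on most often.
    return Counter(candidates).most_common(1)[0][0]

print(answer_with_test_time_compute("What is 6 * 7?", n_samples=32))
```

The design point is that accuracy here scales with `n_samples`, a knob turned at inference time, rather than with the size of the underlying model.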
Some are already pointing to test-time computation as another method for scaling AI systems.
“A lot of experiments are showing that even though pre-training scaling laws may be slowing down, the test-time scaling laws – where you give a model more compute at inference – can deliver increasing gains in performance,” said a16z’s Midha.
About a decade ago, Noam Brown, who now leads OpenAI’s work on o1, was trying to build AI systems that could beat humans at poker. In a recent talk, Brown said he noticed at the time that human poker players took time to consider different scenarios before playing a hand. In 2017, he introduced a method that let a model “think” for 30 seconds before playing. In that time, the AI played out different subgames, figuring out how different scenarios would unfold, to determine the best move.
Ultimately, the AI performed seven times better than his previous attempts.
It’s true that Brown’s 2017 research didn’t use neural networks, which weren’t as popular at the time. But MIT researchers published a paper last week showing exactly that: test-time compute can significantly improve an AI model’s performance on reasoning tasks.
It’s not immediately clear how test-time compute would scale. It could mean that AI systems need a really long time to think through hard questions, maybe hours or even days. Another approach could be letting an AI model “think” through a question on lots of chips in parallel.
Midha says that if test-time compute does become the next frontier for scaling AI systems, demand for AI chips that specialize in fast inference could increase dramatically. That could be good news for startups like Groq and Cerebras, which build exactly such chips. If finding the answer is as compute-heavy as training the model, the pick-and-shovel providers of AI win again.
The AI world hasn’t panicked yet
Most of the AI world doesn’t seem to be losing its cool over the slowdown of the old scaling laws. Even if test-time compute doesn’t prove to be the next wave of scaling, some feel we’re only scratching the surface of what current AI models can be applied to.
Popular new products could buy AI model developers time to find new ways to improve their underlying models.
“I’m absolutely confident that we’ll see at least a 10-20x increase in model performance just by working at the application level, just letting the models shine through intelligent prompts, UX decisions, and pushing context at the right time to the models,” Midha said.
For example, ChatGPT’s advanced voice mode is one of the more impressive applications of current AI models. However, it was largely a user experience innovation, not necessarily a leap in the underlying technology. You can see how further UX innovations, such as giving that feature access to the web or to apps on your phone, would make the product that much better.
Kian Katanforoosh, CEO of the AI startup Workera and an assistant professor of deep learning at Stanford, tells TechCrunch that companies building AI applications, like his, don’t necessarily need exponentially smarter models to build better products. He also says there’s a lot of room to improve the products around current models.
“Let’s say you’re building AI applications and your AI is hallucinating on a specific task,” Katanforoosh said. “There are two ways you can avoid that. Either the LLM has to get better, and it will no longer hallucinate, or the tooling around it has to get better, and you’ll have ways to fix the problem.”
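As a rough illustration of that second path – improving the tooling rather than the model – here is a minimal sketch of a guardrail that refuses to return an answer it cannot match against retrieved reference text. Every name in it (`retrieve_passages`, `model_answer`, the crude overlap heuristic) is a hypothetical placeholder for demonstration, not Workera’s or anyone else’s actual product.

```python
def retrieve_passages(question: str) -> list[str]:
    """Hypothetical retriever: returns reference text for the question."""
    return ["The Colossus supercomputer was built with 100,000 GPUs."]

def model_answer(question: str) -> str:
    """Hypothetical model call; may hallucinate."""
    return "Colossus was built with 100,000 GPUs."

def grounded_answer(question: str) -> str:
    answer = model_answer(question)
    # Crude grounding check: require meaningful word overlap between the
    # answer and at least one retrieved passage before trusting it.
    answer_words = set(answer.lower().split())
    for passage in retrieve_passages(question):
        if len(answer_words & set(passage.lower().split())) >= 3:
            return answer
    return "I couldn't verify that answer against my sources."

print(grounded_answer("How many GPUs does Colossus have?"))
```

A production system would use a far stronger check (an entailment model, citation matching), but the shape is the same: the fix lives in the tooling around the model, not in the model itself.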
Regardless of what happens on the frontier of AI research, users probably won’t feel the effects of these shifts for some time. That said, AI labs will do whatever it takes to keep shipping bigger, smarter, faster models at the same rapid clip. That means several leading tech companies may now pivot how they push the boundaries of AI.