For OpenAI, o1 is a step toward the broader goal of human-like AI. More practically, it’s better at writing code and solving multistep problems than previous models. But it’s also more expensive and slower to use than GPT-4o. OpenAI calls this version of o1 a “preview” to emphasize how early-stage it is.
ChatGPT Plus and Team users are getting access to o1-preview and o1-mini starting today, while Enterprise and Edu users will get access early next week. OpenAI says it plans to make o1-mini available to all free ChatGPT users, but has not yet set a release date. Developer access to o1 is really expensive: In the API, o1-preview costs $15 per 1 million input tokens (the chunks of text the model reads) and $60 per 1 million output tokens. For comparison, GPT-4o costs $5 per 1 million input tokens and $15 per 1 million output tokens.
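To make the price gap concrete, here is a minimal Python sketch that applies the per-million-token rates quoted above to a single request. The function name, the price table, and the example token counts are illustrative assumptions, not part of OpenAI’s API, and the rates are only the ones cited in this article.

```python
# Launch prices cited in this article, in US dollars per 1 million tokens:
# (input price, output price). The table and names are illustrative only.
PRICES_PER_MILLION = {
    "o1-preview": (15.00, 60.00),
    "gpt-4o": (5.00, 15.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one request at the cited rates."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 10,000 input tokens and 2,000 output tokens.
    for model in PRICES_PER_MILLION:
        print(f"{model}: ${request_cost(model, 10_000, 2_000):.2f}")
    # o1-preview: $0.27
    # gpt-4o: $0.08
```

On that hypothetical request, o1-preview comes to about 27 cents against 8 cents for GPT-4o.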
The training behind o1 is fundamentally different from that of its predecessors, OpenAI research manager Jerry Tworek tells me, although the company doesn’t provide many details. He says o1 “was trained using an entirely new optimization algorithm and a new training dataset tailored specifically to it.”
OpenAI trained previous GPT models to mimic patterns in training data. In the case of o1, it trained the model to solve problems on its own, using a technique known as reinforcement learning, which teaches the system through rewards and punishments. It then uses a “chain of thought” to process queries, much like humans process problems, walking through them step by step.
As a result of this new training methodology, OpenAI says the model should be more accurate. “We noticed that this model has fewer hallucinations,” Tworek says. But the problem remains. “We can’t say we’ve solved hallucinations.”
The main feature that sets this new model apart from GPT-4o is its ability to solve complex problems, such as coding and math, much better than its predecessors, and also, according to OpenAI, its ability to explain its reasoning.
“The model definitely does a better job of solving the AP math test than I did, and I was a math minor in college,” Bob McGrew, OpenAI’s director of research, tells me. He says that OpenAI also tested o1 on the International Mathematical Olympiad qualifying exam, and while GPT-4o only got 13 percent of the problems correct, o1 got 83 percent right.
In online programming contests known as Codeforces competitions, this new model reached the 89th percentile of participants, and OpenAI says the next update of the model will perform “similarly to PhD students solving difficult test problems in physics, chemistry, and biology.”
At the same time, o1 isn’t as capable as GPT-4o in many areas. It doesn’t handle factual knowledge about the world as well. It also lacks the ability to browse the web or process files and images. Still, the company believes it represents a whole new class of capabilities. It’s been named o1 to indicate “resetting the counter back to 1.”
“I’ll be honest: I think we’re terrible at naming, traditionally,” McGrew says. “So I hope this is the first step toward new, more sensible names that better communicate what we do to the rest of the world.”
I wasn’t able to try o1 myself, but McGrew and Tworek showed it to me over a video call this week. They asked it to solve this puzzle:
“The princess is as old as the prince will be when the princess is twice as old as the prince was when the princess’s age was half the sum of their present ages. How old are the prince and the princess? Give all solutions to this question.”
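For readers who want to check the puzzle themselves, here is a minimal brute-force sketch in Python. It assumes the commonly circulated reading of the riddle: the princess is now as old as the prince will be when the princess is twice as old as the prince was when the princess’s age was half the sum of their present ages. The function name and the search range are illustrative choices, and this is not how o1 approached the problem.

```python
def satisfies(princess: int, prince: int) -> bool:
    """Check one pair of present ages against the assumed reading of the riddle."""
    # Work in doubled units so that "half the sum" never needs fractions.
    half_sum_x2 = princess + prince            # 2 * (half the sum of present ages)
    years_ago_x2 = 2 * princess - half_sum_x2  # 2 * years since the princess was that age
    prince_then_x2 = 2 * prince - years_ago_x2 # 2 * the prince's age back then
    princess_future = prince_then_x2           # twice the prince's age back then
    years_ahead = princess_future - princess   # years until the princess reaches that age
    prince_future = prince + years_ahead       # the prince's age at that future point
    return prince_then_x2 > 0 and prince_future == princess


# Search a small, arbitrary range of whole-number ages.
solutions = [
    (princess, prince)
    for princess in range(1, 41)
    for prince in range(1, 41)
    if satisfies(princess, prince)
]

if __name__ == "__main__":
    for princess, prince in solutions:
        print(f"princess {princess}, prince {prince}")
```

Under that reading, every pair the search finds has the same 4-to-3 ratio, for example a princess of 8 and a prince of 6, which is why the riddle asks for all solutions rather than one.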
The model buffered for 30 seconds, then gave the correct answer. OpenAI designed the interface to show the steps of reasoning as the model thought. What was striking to me was not that it showed its work—GPT-4o can do that if asked—but how deliberately o1 seemed to mimic human thinking. Phrases like “I’m curious,” “I’m thinking about it,” and “Okay, let me see” created the illusion of step-by-step thinking.
But this model doesn’t think, and it’s certainly not human. So why design it to look like it is?
OpenAI doesn’t believe in equating AI model thinking with human thinking, Tworek says. But the interface is meant to show how the model spends more time processing and digging deeper into problems, he says. “There are ways in which it feels more human than previous models.”
“I think you’ll see that there are a lot of ways where it feels a little alien, but there are also ways where it feels surprisingly human,” McGrew says. The model is given a limited amount of time to process queries, so it might say something like, “Oh, I’m running out of time, let me find the answer real quick.” Early on in its chain of thought, it might also seem like it’s brainstorming and say something like, “I could do this or that, what should I do?”
Large language models aren’t exactly smart as they exist today. They essentially predict sequences of words to arrive at an answer, based on patterns learned from huge amounts of data. Take ChatGPT, which tends to wrongly claim that the word “strawberry” has only two R’s because it doesn’t break down the word correctly. For what it’s worth, the new o1 model did get that query right.
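For contrast, counting letters is trivial for ordinary code, which works on individual characters rather than on the multi-character tokens a language model typically sees. A minimal Python sketch, with an illustrative function name:

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


if __name__ == "__main__":
    print(count_letter("strawberry", "r"))  # prints 3
```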
OpenAI is reportedly looking to raise more funding at a staggering $150 billion valuation, and its momentum depends on further research breakthroughs. The company is bringing reasoning capabilities to LLMs because it sees a future with autonomous systems, or agents, that can make decisions and take actions on your behalf.
For AI researchers, cracking reasoning is the next big step toward human-level intelligence. The thinking is that if a model can do more than recognize patterns, it could unlock breakthroughs in fields like medicine and engineering. For now, though, o1’s reasoning abilities are relatively slow, not agent-like, and expensive for developers to use.
“We spent many months working on reasoning because we think this is really the critical breakthrough,” McGrew says. “It’s basically a new modality for models to be able to solve the really hard problems that are needed to get to near-human intelligence.”
