On Thursday, OpenAI released a $200-a-month chatbot — and the AI community wasn’t quite sure what to make of it.
The company’s newly launched ChatGPT Pro plan gives you access to “o1 pro mode,” which OpenAI says “uses more compute for the best answers to the hardest problems.” An upgraded version of OpenAI’s o1 reasoning model, o1 pro mode should answer questions about science, math, and coding in a more “reliable” and “comprehensive” way, OpenAI says.
Almost immediately, people started asking it to draw unicorns:
I asked ChatGPT o1 Pro mode to create an SVG of a unicorn.
(This is the model you get access to for $200 a month) pic.twitter.com/h9HwY3aYwU
— Rammy (@rammydev) December 5, 2024
And design a “crab-based” computer:
Finally, putting o1-pro to its ultimate use. pic.twitter.com/nX4JAjx71m
— Ethan Mollick (@emollick) December 6, 2024
And wax poetic about the meaning of life:
I just signed up for an OpenAI subscription for $200/month.
Please reply with questions to ask and I will re-post them in this thread. pic.twitter.com/oTQxbPxnoP
— Garrett Scott 🕳 (@thegarrettscott) December 5, 2024
But many people on X weren’t convinced that o1 pro mode’s answers were $200-a-month good.
“Has OpenAI shared any concrete examples of prompts that fail in regular o1 but succeed in o1 pro mode?” asked British computer scientist Simon Willison. “I want to see a single concrete example that shows its advantage.”
It’s a fair question; after all, this is the most expensive chatbot subscription on the market. The plan does come with other benefits, such as lifted rate limits and unlimited access to OpenAI’s other models. But $2,400 a year isn’t chump change, and the value proposition of o1 pro mode in particular remains unclear.
It didn’t take long for failure cases to surface. o1 pro mode struggles with Sudoku, and it gets tripped up by an optical-illusion-style trick question that any human would see through instantly.
Both o1 and o1-pro failed here, probably still due to vision limitations (same with Sudoku puzzles)https://t.co/mAVK7WxBrq pic.twitter.com/O9boSv7ZGt
— Tibor Blaho (@btibor91) December 5, 2024
OpenAI’s own internal benchmarks show that o1 pro mode performs only slightly better than standard o1 on coding and math problems:
To demonstrate o1 pro mode’s consistency, OpenAI ran a “stricter” evaluation on the same benchmarks: a model was considered to have solved a question only if it answered correctly in four out of four attempts. But even in these tests, the improvement wasn’t dramatic:
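To make the scoring rule concrete, here is a minimal sketch of that “four out of four” criterion in Python. This is an illustration of the rule as described in the article, not OpenAI’s actual evaluation code; the function name and data layout are invented for the example.

```python
# Hypothetical sketch of the "stricter" scoring rule described above:
# a question only counts as solved if the model answers it correctly
# on 4 out of 4 independent attempts.

def strict_pass_rate(attempts_per_question, required=4):
    """attempts_per_question: one list of booleans per question,
    each boolean marking whether an attempt was correct."""
    solved = sum(
        1 for attempts in attempts_per_question
        if len(attempts) >= required and all(attempts[:required])
    )
    return solved / len(attempts_per_question)

# Example: three questions, four attempts each.
results = [
    [True, True, True, True],      # solved under the strict rule
    [True, True, True, False],     # 3/4 correct -> not solved
    [False, False, False, False],  # not solved
]
print(strict_pass_rate(results))  # prints 0.3333...
```

The point of the stricter rule is that a model that gets lucky once in a while scores far worse than a model that is consistently right, which is exactly the reliability property OpenAI was trying to highlight.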
OpenAI CEO Sam Altman, who once wrote that OpenAI is on a path “towards intelligence too cheap to meter,” found himself explaining repeatedly on Thursday that ChatGPT Pro isn’t for most people.
“Most users will be very happy with o1 in the [ChatGPT] Plus tier!” he wrote on X. “Almost everyone will be best served by our free tier or the Plus tier.”
So who is it for? Are there really people willing to pay $200 a month to ask toy questions like “Write a 3-paragraph essay about strawberries without using the letter ‘e’” or “solve this Math Olympiad problem”? Will they happily part with their hard-earned cash with little assurance that standard o1 couldn’t answer the same questions just as well?
I asked Ameet Talwalkar, associate professor of machine learning at Carnegie Mellon and venture partner at Amplify Partners, for his opinion. “I think it’s a big risk to raise the price tenfold,” he told TechCrunch by email. “I think in just a few weeks we’ll have a much better sense of the appetite for this functionality.”
UCLA computer scientist Guy Van den Broeck was more candid in his assessment. “I don’t know if this price makes sense,” he told TechCrunch, “or whether expensive reasoning models will become the norm.”
o1 is “better than most humans at most tasks” because, yes, humans only exist in disembodied, amnesic, multi-turn chat interfaces https://t.co/zbLY2BG5pQ
— Aidan McLau (@aidan_mclau) December 6, 2024
The charitable view is that this is a marketing misstep. Describing o1 pro mode as best at tackling the “toughest problems” tells prospective customers very little. Neither do vague statements about how the model can “think longer” and demonstrate “intelligence.” As Willison points out, without concrete examples of the supposedly improved capabilities, it’s hard to justify paying more at all, let alone 10x the price.
Presumably, the target audience is experts in specialized fields. OpenAI says it plans to grant a handful of medical researchers at “leading institutions” free access to ChatGPT Pro, which includes o1 pro mode. Mistakes carry serious consequences in healthcare, and, as Bob McGrew, OpenAI’s former research director, noted on X, greater reliability is probably o1 pro mode’s main unlock.
Played a bit with o1 and o1-pro.
They are very good and a little weird. They are also not for most people most of the time. You really need to have particularly hard problems to solve to get value out of them. But if you have such problems, this is a very big deal.
— Ethan Mollick (@emollick) December 5, 2024
McGrew also suggested that o1 pro mode is an example of what he calls an “intelligence overhang”: users (and perhaps the model’s creators) don’t yet know how to extract value from the “extra intelligence” because of the fundamental limits of a basic, text-based interface. As with OpenAI’s other models, the only way to interact with o1 pro mode is through ChatGPT, and, per McGrew, ChatGPT isn’t a perfect fit.
Still, $200 sets high expectations, and judging by its early reception on social media, ChatGPT Pro isn’t exactly a slam dunk.