Why is ChatGPT so bad at math?


If you’ve ever tried using ChatGPT as a calculator, you’ve almost certainly noticed the following: The chatbot is bad at math. And it’s not unique among AI models in this respect.

Anthropic’s Claude can’t solve basic word problems. Gemini doesn’t understand quadratic equations. And Meta’s Llama struggles with simple addition.

So how is it that these bots can write monologues and yet still get tripped up by grade-school arithmetic?

Tokenization has something to do with it. The process of dividing data into chunks (e.g., splitting the word “fantastic” into the syllables “fan,” “tas” and “tic”), tokenization helps AI densely encode information. But because tokenizers (the AI models that do the tokenizing) don’t actually know what numbers are, they often end up destroying the relationships between digits. For example, a tokenizer might treat the number “380” as one token but represent “381” as a pair of tokens (“38” and “1”).
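To make the “380” versus “381” example concrete, here is a minimal sketch of a greedy longest-match tokenizer. The vocabulary below is invented purely to reproduce that example; it is not the vocabulary of ChatGPT or of any real model, and production tokenizers use more elaborate schemes.

```python
# Toy greedy longest-match tokenizer. VOCAB is a made-up vocabulary chosen
# only to reproduce the "380" vs. "381" example; it is not any real model's.
VOCAB = {"380", "38", "0", "1", "3", "8"}


def tokenize(text: str) -> list[str]:
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate starting at position i, then shorter ones.
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return tokens


print(tokenize("380"))  # ['380']
print(tokenize("381"))  # ['38', '1']
```

Two numbers that differ by one end up with entirely different token sequences, so nothing in the model’s input marks them as neighbors.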

But tokenization isn’t the only reason math is a weak spot for AI.

AI systems are statistical machines. Trained on many examples, they learn the patterns in those examples to make predictions (e.g., that the phrase “to whom” in an email often precedes the phrase “it may concern”). For example, given the multiplication problem 57,897 x 12,832, ChatGPT, having seen many multiplication problems, will likely infer that the product of a number ending in “7” and a number ending in “2” will end in “4.” But it will struggle with the middle part. ChatGPT gave me the answer 742,021,104; the correct one is 742,934,304.
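The arithmetic here is easy to check. The sketch below reads the two operands as the integers 57,897 and 12,832, the reading under which the correct answer quoted above holds, and separates the part a pattern-matcher can guess (the last digit) from the exact product.

```python
a, b = 57_897, 12_832

# Pattern a model can pick up from examples: the last digit of a product
# depends only on the last digits of the factors (7 * 2 = 14, so "4").
last_digit = (a % 10) * (b % 10) % 10

# Exact integer arithmetic, which is what a calculator performs.
exact = a * b

print(last_digit)    # 4
print(f"{exact:,}")  # 742,934,304
```

The answer ChatGPT gave, 742,021,104, matches the exact product in its leading “742” and its final “04” and is wrong in between.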

Yuntian Deng, an assistant professor at the University of Waterloo specializing in AI, thoroughly benchmarked ChatGPT’s multiplication capabilities in a study conducted earlier this year. He and his co-authors found that the default model, GPT-4o, struggled to multiply two numbers once they reached four or more digits each (e.g., 3,459 x 5,284).

“GPT-4o struggles with multi-digit multiplication, achieving less than 30% accuracy for four-digit by four-digit problems,” Deng told TechCrunch. “Multi-digit multiplication is challenging for language models because an error in any intermediate step can compound, leading to incorrect final results.”
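Deng’s point about intermediate steps can be illustrated with schoolbook long multiplication, where the answer is the sum of one shifted partial product per digit. The sketch below is only an illustration of that procedure, not a description of how GPT-4o computes internally; the function name and the injected error are hypothetical.

```python
def partial_products(a: int, b: int) -> list[int]:
    """Schoolbook multiplication: one shifted partial product per digit of b."""
    products = []
    for place, digit in enumerate(reversed(str(b))):
        products.append(a * int(digit) * 10 ** place)
    return products


steps = partial_products(3459, 5284)
correct = sum(steps)

# Hypothetical slip: a single wrong digit in one intermediate step.
faulty = steps.copy()
faulty[2] += 1000
wrong = sum(faulty)

print(steps)    # [13836, 276720, 691800, 17295000]
print(correct)  # 18277356
print(wrong)    # 18278356
```

Every partial product has to be right for the total to be right, and the number of partial products grows with the number of digits, so longer operands give an error more places to creep in.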

Is OpenAI’s o1 a good calculator? We tested it on multiplication up to 20×20. o1 solves up to 9×9 multiplication with decent accuracy, while gpt-4o struggles beyond 4×4. For context, this task is solvable by a small LM using implicit CoT with stepwise internalization. 1/4 pic.twitter.com/et5DB9bhNL

— Yuntian Deng (@yuntiandeng) September 17, 2024

So will math skills elude ChatGPT forever? Or is there reason to believe that one day a bot will become as proficient with numbers as humans are (or as a TI-84, for that matter)?

Deng is hopeful. In the study, he and his colleagues also tested o1, an OpenAI “reasoning” model that recently came to ChatGPT. o1, which “thinks through” problems step by step before answering them, performed significantly better than GPT-4o, solving nine-digit by nine-digit multiplication problems about half the time.

“The model might be solving the problem in a different way than we solve it manually,” Deng said. “This makes us curious about the model’s internal approach and how it differs from human reasoning.”

Deng believes this progress indicates that at least some types of math problems, multiplication problems being one of them, will eventually be “fully solved” by ChatGPT-like systems. “It’s a well-defined task with known algorithms,” Deng said. “We are already seeing significant improvement from GPT-4o to o1, so it is clear that reasoning ability is improving.”

Just don’t get rid of your calculator any time soon.
