Large language models (LLMs) have shown impressive performance on a variety of reasoning and problem-solving tasks. However, there are open questions about how these reasoning abilities work and where their limits lie.
In a new study, researchers from the University of California, Los Angeles, and Amazon conducted a comprehensive examination of the deductive and inductive reasoning capabilities of LLMs. Their findings show that while LLMs can be very good at inferring a task's rules from solved examples, they struggle to follow explicit instructions. The findings could have important implications for how we use LLMs in applications that require reasoning.
Inductive vs. Deductive Reasoning
Reasoning can generally be divided into two distinct types: deductive and inductive. Deductive reasoning, often described as “top-down” logic, starts with a general principle or rule and applies it to infer specific conclusions. For example, given the formula for converting Celsius to Fahrenheit, you can use it to convert new measurements.
Inductive reasoning, on the other hand, takes a “bottom-up” approach. It involves observing specific cases or examples and drawing general conclusions or patterns from them. For example, you might observe several Celsius and Fahrenheit measurements on a thermometer and try to infer a pattern that converts one into the other.
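The distinction can be made concrete in a few lines of code. A minimal sketch, using the article's Celsius-to-Fahrenheit example: the deductive path applies a known formula, while the inductive path fits a line to observed measurement pairs (a simple least-squares fit, not anything from the study itself) and recovers the same rule.

```python
# Deductive reasoning: apply a known rule to new inputs.
def c_to_f(celsius):
    """Known formula: F = C * 9/5 + 32."""
    return celsius * 9 / 5 + 32

# Inductive reasoning: infer the rule from observed (input, output) pairs.
# Here we fit a line to example measurements via least squares.
observations = [(0, 32.0), (100, 212.0), (37, 98.6)]

n = len(observations)
sum_x = sum(c for c, _ in observations)
sum_y = sum(f for _, f in observations)
sum_xy = sum(c * f for c, f in observations)
sum_xx = sum(c * c for c, _ in observations)

slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
intercept = (sum_y - slope * sum_x) / n

print(round(slope, 2), round(intercept, 2))  # recovers 1.8 and 32.0
print(c_to_f(25))  # 77.0
```

Both paths end at the same rule, but they start from opposite ends: one from the principle, the other from the data.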
Both types of reasoning are vital to intelligence, but they involve different cognitive processes. And while LLMs are often assessed for their reasoning abilities, most studies do not make a clear distinction between their inductive and deductive abilities.
A New Framework for Testing LLM Reasoning
The researchers designed experiments that probe both abilities on the same underlying tasks. For example, in an arithmetic task, they tested the LLMs’ ability to apply a given mathematical function to solve problems (deductive reasoning) and their ability to infer an underlying mathematical function from a set of input-output examples (inductive reasoning).
To further separate inductive from deductive reasoning, the researchers developed SolverLearner, a two-step framework that isolates and evaluates the inductive reasoning process in LLMs.
SolverLearner first asks the LLM to generate a function that maps input data points to their corresponding output values based solely on a set of input-output examples. This step focuses on the ability of the LLM to learn an underlying pattern or rule from the data.
In the second step, SolverLearner uses an external code interpreter to execute the proposed function on new test data. This separation ensures that the LLM is not involved in applying the function, preventing its deductive reasoning ability from influencing the evaluation of its inductive reasoning.
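The two-step structure described above can be sketched as a small evaluation loop. This is an illustrative mock-up, not the paper's actual harness: `mock_llm_induce` is a hypothetical stand-in for the LLM call that, in a real system, would prompt the model with the examples and return the code of a candidate function.

```python
# A minimal sketch of a SolverLearner-style two-step evaluation.
def mock_llm_induce(examples):
    # Hypothetical stand-in for an LLM prompt; hard-coded to return a
    # plausible answer for the Celsius-to-Fahrenheit examples below.
    return "def solver(x):\n    return x * 9 / 5 + 32"

def evaluate_induction(examples, test_inputs, expected_outputs):
    # Step 1: ask the (mocked) LLM to propose a mapping function
    # based solely on the input-output examples.
    code = mock_llm_induce(examples)
    # Step 2: execute the proposed function externally, so the LLM
    # plays no role in applying the rule it induced.
    namespace = {}
    exec(code, namespace)
    solver = namespace["solver"]
    predictions = [solver(x) for x in test_inputs]
    correct = sum(p == y for p, y in zip(predictions, expected_outputs))
    return correct / len(test_inputs)

accuracy = evaluate_induction(
    examples=[(0, 32.0), (100, 212.0)],
    test_inputs=[10, 20],
    expected_outputs=[50.0, 68.0],
)
print(accuracy)  # 1.0
```

The key design choice is that accuracy is scored on the executed function's outputs, so any failure to apply the rule correctly cannot be blamed on the model's deductive step.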

“By focusing on inductive reasoning and setting aside LLM-based deductive reasoning, we can isolate and study LLM inductive reasoning in its pure form using SolverLearner,” the researchers write.
LLMs demonstrate contrasting strengths in inductive and deductive reasoning
The researchers used SolverLearner to assess the inductive and deductive reasoning abilities of GPT-3.5 and GPT-4 on a variety of tasks, including syntactic reasoning, arithmetic, and spatial reasoning.
The results showed that both models demonstrated remarkable inductive reasoning skills, achieving near-perfect accuracy on tasks that required them to learn from examples and infer the underlying mapping function.
However, the LLMs struggled when they had to apply explicit rules or instructions, especially when those instructions involved scenarios rarely seen in their training data. This was especially true for “counterfactual” reasoning tasks that differ from conventional cases. For example, the LLMs performed well on deductive reasoning involving base-10 arithmetic but very poorly on unconventional number bases such as base 11 and base 9.
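A short example, independent of the study's own prompts, shows why counterfactual bases are a good stress test: the same digit strings denote different quantities and sums depending on the base, so a model that has merely memorized base-10 patterns will produce the wrong digits.

```python
# The same digit strings sum to different results in different bases.
def add_in_base(a, b, base):
    """Add two digit strings interpreted in `base`; return the sum as a
    digit string in that base (digits 0-9 suffice for this illustration)."""
    value = int(a, base) + int(b, base)
    digits = []
    while value:
        value, r = divmod(value, base)
        digits.append(str(r))
    return "".join(reversed(digits)) or "0"

print(add_in_base("27", "16", 10))  # base 10: 27 + 16 = "43"
print(add_in_base("27", "16", 9))   # base 9:  25 + 15 = 40 -> "44"
print(add_in_base("27", "16", 11))  # base 11: 29 + 17 = 46 -> "42"
```

A model pattern-matching on base-10 examples would answer "43" in every case, which is exactly the failure mode the counterfactual tasks expose.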
The results suggest that LLMs may be better at learning from examples and discovering patterns in data than at following explicit instructions. This has important implications for the use of LLMs in real-world applications. While at first glance LLMs may show an impressive ability to follow logical instructions, it is likely that they are simply reproducing patterns observed during training, meaning their performance will degrade as soon as the examples they see deviate from the training distribution.
On the other hand, SolverLearner provides a framework that ensures that the model learns the correct rules that map inputs to outputs. However, SolverLearner is only applicable in settings where a verification mechanism, such as a code interpreter, is available.
This study is a stark reminder that we still have a lot to learn about the capabilities of these black-box models as they become part of a growing number of applications.
