Wednesday, December 25, 2024

LLMs develop their own understanding of reality as their language abilities improve


Ask a large language model (LLM) like GPT-4 to smell a rain-soaked campsite, and it will politely decline. Ask the same system to describe that smell to you, and it will wax poetic about “the air thick with anticipation” and “a scent that’s both fresh and earthy,” despite having no prior experience with rain and no nose to help it make such observations. One possible explanation is that the LLM is simply mimicking the text in its massive training data, rather than working with any real understanding of rain or smell.

But does the lack of eyes mean that language models can never “understand” that a lion is “bigger” than a housecat? Philosophers and scientists have long considered the ability to make sense of language to be a hallmark of human intelligence—and have wondered what necessary ingredients enable us to do so.

Delving into this conundrum, researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have uncovered intriguing results suggesting that language models can develop their own understanding of reality as a way to improve their generative abilities. The team first created a set of small Karel puzzles, which involved coming up with instructions to control a robot in a simulated environment. They then trained an LLM on the solutions, but without demonstrating how those solutions actually worked. Finally, using a machine-learning technique called “probing,” they peered into the model’s “thought process” as it generated new solutions.
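For a sense of what such puzzles involve, a Karel-style task boils down to a short sequence of primitive instructions executed in a grid world. The sketch below is a minimal, hypothetical simulator; the instruction names (move, turnLeft, turnRight) and grid conventions are illustrative assumptions, not the paper’s exact setup:

```python
# Minimal sketch of a Karel-style grid world, for illustration only.
# Instruction names and grid layout are assumptions, not the paper's exact format.

from dataclasses import dataclass

DIRS = ["north", "east", "south", "west"]          # clockwise order
STEP = {"north": (0, 1), "east": (1, 0), "south": (0, -1), "west": (-1, 0)}

@dataclass
class Robot:
    x: int = 0
    y: int = 0
    facing: str = "north"

    def execute(self, instruction: str) -> None:
        if instruction == "move":                   # step one cell forward
            dx, dy = STEP[self.facing]
            self.x, self.y = self.x + dx, self.y + dy
        elif instruction == "turnLeft":             # rotate 90 degrees counterclockwise
            self.facing = DIRS[(DIRS.index(self.facing) - 1) % 4]
        elif instruction == "turnRight":            # rotate 90 degrees clockwise
            self.facing = DIRS[(DIRS.index(self.facing) + 1) % 4]

# A candidate "solution" is just a sequence of such instructions; the model only
# ever sees these token sequences, never the simulator that gives them meaning.
robot = Robot()
for token in ["move", "turnRight", "move", "move"]:
    robot.execute(token)
print(robot)   # Robot(x=2, y=1, facing='east')
```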

After training on more than a million random puzzles, they found that the model spontaneously developed its own conception of the underlying simulation, despite never having been exposed to that reality during training. Such findings challenge our intuitions about what kinds of information are necessary for learning linguistic meaning—and whether LLMs might one day understand language at a deeper level than they do today.

“At the beginning of these experiments, the language model was generating random instructions that didn’t work. By the time we finished training, our language model was generating correct instructions at a rate of 92.4 percent,” says Charles Jin, a doctoral student in electrical engineering and computer science (EECS) at MIT and a CSAIL affiliate who is the lead author of a new paper on the work. “That was a really exciting moment for us because we thought, if your language model can do that with that level of accuracy, we can expect it to understand the meanings within the language. That gave us a starting point to investigate whether LLMs actually understand text, and now we see that they can do a lot more than just blindly stitch words together.”

Inside the mind of an LLM

The probe helped Jin observe this progress firsthand. Its role was to interpret what the LLM thought the instructions meant, revealing that the LLM had developed its own internal simulation of how the robot would move in response to each instruction. As the model’s ability to solve puzzles improved, those conceptions also became more accurate, indicating that the LLM was beginning to understand the instructions. Before long, the model was consistently putting the pieces together correctly, producing working instructions.
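In this kind of setup, a probe is typically a small classifier trained to predict some aspect of the robot’s state from the LLM’s hidden activations; if the classifier succeeds, that state must be recoverable from the activations. The sketch below is a minimal illustration using scikit-learn’s logistic regression, with random vectors standing in for real hidden states, so it shows the mechanics of probing rather than the paper’s actual probe:

```python
# Minimal sketch of a probing classifier: a simple model trained to read the
# robot's state (here, its facing direction) out of the LLM's hidden states.
# Hidden states are faked with random vectors; in the real setup they would be
# activations recorded while the LLM generates each instruction.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_examples, hidden_dim = 1000, 64

hidden_states = rng.normal(size=(n_examples, hidden_dim))   # placeholder activations
robot_facing = rng.integers(0, 4, size=n_examples)          # 0-3 = north/east/south/west

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, robot_facing, test_size=0.2, random_state=0
)

probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
# With random features this accuracy sits near chance (~25%); if the LLM really
# encodes the robot's state, probe accuracy on true activations climbs well above it.
print("probe accuracy:", probe.score(X_test, y_test))
```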

Jin notes that the LLM’s understanding of language develops in phases, much as a child learns to speak in multiple steps. At first, it’s like a baby’s babbling: repetitive and largely incomprehensible. Then the language model acquires the syntax, or rules, of the language. This allows it to generate instructions that may look like real solutions but still don’t work.

The LLM’s instructions gradually improve, however. Once the model acquires meaning, it begins to produce instructions that correctly implement the desired specifications, like a child forming coherent sentences.

Separating Method from Model: A “Bizarro World”

The probe was intended merely to “get inside the LLM’s brain,” as Jin characterizes it, but there was a small possibility that it was also doing some of the thinking for the model. The researchers wanted to make sure that their model understood the instructions independently of the probe, rather than having the probe infer the robot’s movements from the LLM’s grasp of syntax alone.

“Imagine you have a pile of data that encodes the LLM’s thought process,” Jin suggests. “The probe is like a forensic analyst: You give that pile of data to the analyst and say, ‘Here’s how the robot moves, now try to find the robot’s movements in the pile of data.’ The analyst later tells you that they know what’s happening to the robot in the pile of data. But what if the pile of data is actually just encoding the raw instructions, and the analyst has figured out some clever way to extract the instructions and follow them accordingly? Then the language model hasn’t really learned what the instructions mean at all.”

To untangle their roles, the researchers reversed the meanings of the instructions for a new probe. In this “Bizarro World,” as Jin calls it, directions like “up” now meant “down” in the instructions moving the robot around its grid.

“If the probe translates instructions into robot positions, it should be able to translate the instructions according to the bizarro meanings just as well,” Jin says. “But if the probe actually finds encodings of the original robot movements in the language model’s thought process, it should have difficulty extracting the robot’s bizarro movements from the original thought process.”
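One way to picture this control is to generate a second set of probe targets by running the same instruction sequences under flipped meanings, then training a fresh probe on those targets. The sketch below illustrates the idea with hypothetical instruction names and a made-up flip table; it mirrors the logic of the experiment, not the paper’s exact alternate semantics:

```python
# Sketch of the "Bizarro World" control: build the probe's training targets by
# executing the same instructions under flipped semantics, then test whether a
# probe can still read those states out of the same hidden activations.
# The flip table is illustrative; the paper defines its own alternate semantics.

DIRS = ["north", "east", "south", "west"]
STEP = {"north": (0, 1), "east": (1, 0), "south": (0, -1), "west": (-1, 0)}
BIZARRO = {"move": "move", "turnLeft": "turnRight", "turnRight": "turnLeft"}

def trace(program, flip=False):
    """Robot state after each instruction, under original or flipped meanings."""
    x, y, facing = 0, 0, "north"
    states = []
    for token in program:
        token = BIZARRO[token] if flip else token
        if token == "move":
            dx, dy = STEP[facing]
            x, y = x + dx, y + dy
        elif token == "turnLeft":
            facing = DIRS[(DIRS.index(facing) - 1) % 4]
        elif token == "turnRight":
            facing = DIRS[(DIRS.index(facing) + 1) % 4]
        states.append((x, y, facing))
    return states

program = ["turnRight", "move", "turnLeft", "move"]
print("original:", trace(program))
print("bizarro: ", trace(program, flip=True))
# A probe trained to predict the bizarro states from the LLM's activations should
# succeed if the probe itself is doing the interpreting, and struggle if the
# original semantics are what the language model actually encodes.
```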

As it turned out, the new probe experienced translation errors, unable to interpret a language model that had encoded different meanings for the instructions. This meant the original semantics were embedded in the language model, indicating that the LLM understood what the instructions meant independently of the original probing classifier.

“This research directly addresses a central question in modern AI: Do the surprising capabilities of large language models stem simply from large-scale statistical correlations, or do large language models develop a meaningful understanding of the reality they are asked to work with? This research suggests that the LLM develops an internal model of the simulated reality, even though it was never trained to develop that model,” says Martin Rinard, an MIT professor in EECS, CSAIL member, and senior author of the paper.

This experiment further supported the team’s analysis that language models can develop a deeper understanding of language. Still, Jin acknowledges that their work has a few limitations: they used a very simple programming language and a relatively small model to derive their insights. In follow-up work, they will try to use a more general setting. While Jin’s latest research doesn’t describe how to make a language model learn meaning faster, he thinks future work could build on these insights to improve how language models are trained.

“An intriguing open question is whether the LLM actually uses its internal model of reality to reason about that reality as it solves the robot navigation problem,” Rinard says. “Although our results are consistent with the LLM using the model in this way, our experiments are not designed to answer that next question.”

“There’s a lot of debate right now about whether LLMs really ‘understand’ language, or whether their success can be attributed to what are essentially tricks and heuristics that come from consuming large amounts of text,” says Ellie Pavlick, an assistant professor of computer science and linguistics at Brown University, who was not involved in the paper. “These are questions that are at the heart of how we build AI and what we expect to be the inherent capabilities or limitations of our technology. This is a good paper that explores this question in a controlled way—the authors exploit the fact that computer code, like natural language, has both syntax and semantics, but unlike natural language, semantics can be directly observed and manipulated for experimental purposes. The experimental design is elegant, and their findings are optimistic, suggesting that LLMs may be able to learn something deeper about what language ‘means.’”

Jin and Rinard’s work was funded in part by grants from the U.S. Defense Advanced Research Projects Agency (DARPA).
