Friday, February 21, 2025

Like human brains, large language models reason about diverse data in a general way


While early language models could only process text, contemporary large language models now perform a wide variety of tasks on different types of data. For instance, LLMs can understand many languages, generate computer code, solve math problems, or answer questions about images and audio.

MIT researchers probed the inner workings of LLMs to better understand how they process such assorted data, and found evidence that they share some similarities with the human brain.

Neuroscientists believe the human brain has a “semantic hub” in the anterior temporal lobe that integrates semantic information from various modalities, such as visual data and tactile inputs. This semantic hub is connected to modality-specific “spokes” that route information to the hub. The MIT researchers found that LLMs use a similar mechanism by abstractly processing data from diverse modalities in a central, generalized way. For instance, a model that has English as its dominant language would rely on English as a central medium to process inputs in Japanese or to reason about arithmetic, computer code, and so on. Furthermore, the researchers show that they can intervene in a model’s semantic hub by using text in the model’s dominant language to change its outputs, even when the model is processing data in other languages.

These findings could help scientists train future LLMs that are better able to handle diverse data.

“LLMs are big black boxes. They have achieved very impressive performance, but we have very little knowledge about their internal working mechanisms. I hope this can be an early step to better understand how they work so we can improve upon them and better control them when needed,” says Zhaofeng Wu, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on this research.

His co-authors include Xinyan Velocity Yu, a PhD student at the University of Southern California (USC); Dani Yogatama, an associate professor at USC; Jiasen Lu, a scientist at Apple; and senior author Yoon Kim, an assistant professor of EECS at MIT and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL). The research will be presented at the International Conference on Learning Representations (ICLR).

Integrating diverse data

The researchers based the new study on prior work which hinted that English-centric LLMs use English to perform reasoning processes on text in other languages.

Wu and his collaborators expanded this idea, launching an in-depth study of the mechanisms LLMs use to process diverse data.

An LLM, which is composed of many interconnected layers, splits input text into words or sub-words called tokens. The model assigns a representation to each token, which enables it to explore the relationships between tokens and generate the next word in a sequence. In the case of images or audio, those tokens correspond to particular regions of an image or sections of an audio clip.
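To make the token-and-representation idea concrete, the sketch below shows how text is split into tokens and how each layer produces a vector for every token. It is not code from the paper; it assumes the Hugging Face transformers library, and GPT-2 is used purely as a stand-in model.

```python
# Minimal sketch: tokenize text and inspect per-layer token representations.
# Assumes the Hugging Face `transformers` library; GPT-2 is an illustrative stand-in.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

text = "Large language models process many kinds of data."
inputs = tokenizer(text, return_tensors="pt")

# The input is split into sub-word tokens.
print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist()))

with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# One (1, sequence_length, hidden_dim) tensor per layer, i.e. a vector
# representation for every token at every layer of the model.
print(len(outputs.hidden_states), outputs.hidden_states[-1].shape)
```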

The researchers found that the model’s initial layers process data in its specific language or modality, like the modality-specific spokes in the human brain. Then the LLM converts tokens into modality-agnostic representations as it reasons about them throughout its internal layers, much as the brain’s semantic hub integrates diverse information.

The model assigns similar representations to inputs with similar meanings, regardless of their data type, including images, audio, computer code, and arithmetic problems. Even though an image and its text caption are distinct data types, because they share the same meaning, the LLM would assign them similar representations.

For instance, an English-dominant LLM “thinks” about a Chinese-text input in English before generating an output in Chinese. The model has a similar reasoning tendency for non-text inputs like computer code, math problems, and even multimodal data.

To test this hypothesis, the researchers passed a pair of sentences with the same meaning, but written in two different languages, through the model. They measured how similar the model’s representations were for each sentence.

Then they conducted a second set of experiments in which they fed an English-dominant model text in a different language, such as Chinese, and measured how similar its internal representation was to English versus Chinese. The researchers conducted similar experiments for other data types.
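A rough illustration of this kind of measurement might look like the following. It assumes one compares mean-pooled hidden states of parallel sentences with cosine similarity; the paper’s exact probing procedure may differ, and the model and sentences here are placeholders.

```python
# Sketch: compare intermediate-layer representations of two sentences
# with the same meaning in different languages, layer by layer.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in model
model = AutoModel.from_pretrained("gpt2")

def layer_embeddings(text: str) -> list[torch.Tensor]:
    """Mean-pooled token representation at every layer."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs, output_hidden_states=True).hidden_states
    return [h.mean(dim=1).squeeze(0) for h in hidden]

english = layer_embeddings("The cat sleeps on the sofa.")
chinese = layer_embeddings("猫在沙发上睡觉。")  # same meaning, different language

for layer, (e, c) in enumerate(zip(english, chinese)):
    sim = F.cosine_similarity(e, c, dim=0).item()
    print(f"layer {layer:2d}: cosine similarity = {sim:.3f}")
```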

They consistently found that the model’s representations were similar for sentences with similar meanings. In addition, across many data types, the tokens the model processed in its internal layers were more similar to English-centric tokens than to the input data type.

“Many of these input data types seem extremely different from language, so we were very surprised that we can probe out English tokens when the model processes, for example, mathematical or coding expressions,” Wu says.

Leveraging the semantic hub

The researchers think LLMs may learn this semantic hub strategy during training because it is an economical way to process varied data.

“There are thousands of languages, but a lot of the knowledge is shared, like commonsense knowledge or factual knowledge. The model doesn’t need to duplicate that knowledge across languages,” says Wu.

The researchers also tried intervening in the model’s internal layers using English text while it was processing other languages. They found that they could predictably change the model’s outputs, even though those outputs were in other languages.
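One generic way to perform that kind of intervention is activation steering: derive a direction from English text and add it to an intermediate layer’s activations while the model processes a prompt in another language, then watch the output shift. The sketch below illustrates that technique with a stand-in model and an arbitrary layer; it is not necessarily the authors’ exact method.

```python
# Sketch: steer an intermediate layer using a direction derived from English
# text while the model processes a non-English prompt. Model, layer index,
# steering strength, and phrases are all illustrative choices.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
LAYER = 6  # an intermediate layer, chosen for illustration

def mean_hidden(text: str) -> torch.Tensor:
    ids = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        h = model(**ids, output_hidden_states=True).hidden_states[LAYER]
    return h.mean(dim=1)  # shape (1, hidden_dim)

# Direction pointing from one English concept toward another.
steer = mean_hidden("cold weather") - mean_hidden("hot weather")

def add_steering(module, inputs, output):
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + 4.0 * steer  # steering strength is a free parameter
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handle = model.transformer.h[LAYER].register_forward_hook(add_steering)
prompt = tokenizer("Le temps aujourd'hui est", return_tensors="pt")  # French prompt
generated = model.generate(**prompt, max_new_tokens=10)
handle.remove()
print(tokenizer.decode(generated[0]))
```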

Scientists could leverage this phenomenon to encourage the model to share as much information as possible across diverse data types, potentially improving efficiency.

On the other hand, there may be concepts or knowledge that do not translate across languages or data types, such as culturally specific knowledge. Scientists might want LLMs to retain some language-specific processing mechanisms in those cases.

“How do you maximally share whenever possible, but also allow languages to have some language-specific processing mechanisms? That could be explored in future work on model architectures,” says Wu.

In addition, researchers could use these insights to improve multilingual models. Often, an English-dominant model that learns to speak another language loses some of its accuracy in English. A better understanding of an LLM’s semantic hub could help researchers prevent this language interference, Wu says.

“Understanding how language models process inputs across languages and modalities is a key question in artificial intelligence. This paper makes an interesting connection to neuroscience and shows that the proposed ‘semantic hub hypothesis’ holds in modern language models, where semantically similar representations of different data types are created in the model’s intermediate layers,” says Mor Geva Pipek, an assistant professor in the School of Computer Science at Tel Aviv University, who was not involved in this work. “The hypothesis and experiments nicely tie together and extend findings from previous works, and could influence future research on creating better multimodal models and on studying the links between them and brain function and cognition in humans.”

This research is funded, in part, by the MIT-IBM Watson AI Lab.
