Socrates once said, “What really counts is not the size of a thing, but its quality. For true value is found in the nature of a substance, not in its volume.”
Does size always matter for large language models (LLMs)? In a technology landscape where LLMs are taking center stage, a team of researchers at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) believes that smaller models should not be overlooked, especially for natural language understanding products widely used in industry.
To this end, the researchers developed an approach to the long-standing problems of inefficiency and privacy associated with large, text-based AI models: a logic-aware model that outperforms counterparts 500 times its size on certain language understanding tasks, without any human-generated annotations, while preserving privacy and robustness with high efficiency.
LLMs, which have shown promise in language, graphics, and code generation, are computationally expensive, and their data requirements can risk privacy leaks when application programming interfaces are used to transfer data. Smaller models have historically been less capable than their larger counterparts, particularly at multitasking and weakly supervised tasks.
So what makes these smaller models perform so powerfully? Something called “textual entailment” helps the models understand a wide range of linguistic tasks: if one sentence (the premise) is true, then another sentence (the hypothesis) is likely to be true as well. For example, if the premise is “all cats have tails,” then it entails the hypothesis “the tabby cat has a tail.” This concept is used to train an “entailment model,” which, according to the team’s previous research, is less biased than other language models. The researchers then created “prompts” that the model can use to determine whether a given sentence or phrase entails certain information for a particular task. This method improved the model’s ability to adapt to different tasks without additional training, a capability known as zero-shot adaptation.
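To make the premise/hypothesis idea concrete, here is a minimal sketch of an entailment check using an off-the-shelf natural language inference model from the Hugging Face transformers library. The model named below is a generic stand-in for illustration, not the 350-million-parameter model described in the paper.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# An off-the-shelf NLI model used purely for illustration;
# it is not the entailment model from the MIT study.
model_name = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

premise = "All cats have tails."
hypothesis = "The tabby cat has a tail."

inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# roberta-large-mnli orders its labels: contradiction, neutral, entailment
probs = logits.softmax(dim=-1)[0]
for label, p in zip(["contradiction", "neutral", "entailment"], probs):
    print(f"{label}: {p.item():.3f}")
```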
In the field of “natural language understanding,” many applications hinge on determining the relationship between two pieces of text. For example, in sentiment classification, the statement “I think the movie is good” can be inferred, or entailed, from a movie review that says, “I like the story and the acting is great,” indicating positive sentiment. Another application is news classification, where the topic of an article can be inferred from its content: the statement “the news article is about sports” is entailed if the main content of the article reports on an NBA game. The key insight is that many existing natural language understanding tasks can be recast as an entailment task (i.e., logical inference in natural language).
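As an illustration of that insight, the sketch below recasts the two examples above as entailment problems using the Hugging Face zero-shot classification pipeline, where each candidate label is rewritten as a hypothesis and scored against the input text as a premise. The pipeline and backbone model are generic stand-ins, not the authors’ system.

```python
from transformers import pipeline

# A generic NLI backbone; any entailment model could be swapped in here.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

# Sentiment classification recast as entailment.
review = "I like the story and the acting is great."
print(classifier(review,
                 candidate_labels=["positive", "negative"],
                 hypothesis_template="The sentiment of this review is {}."))

# News-topic classification recast as entailment.
article = "The Celtics beat the Heat in game seven of the NBA playoffs."
print(classifier(article,
                 candidate_labels=["sports", "politics", "business"],
                 hypothesis_template="This news article is about {}."))
```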
“Our research aims to improve the ability of computer programs to understand and process natural language, the way people speak and write. Our self-trained entailment models, with 350 million parameters and no human-generated labels, outperform supervised language models with 137 to 175 billion parameters,” says MIT CSAIL postdoc Hongyin Luo, lead author of a new paper about the study. “This has the potential to reshape the AI and machine learning landscape by providing a more scalable, trustworthy, and cost-effective language modeling solution,” Luo says. “By proving that smaller models can perform at the same level of language understanding as larger ones, this work paves the way for more sustainable and privacy-preserving AI technologies.”
The team found that it could further improve the model’s performance using a technique called “self-training,” in which the model learns from its own predictions, effectively teaching itself without human supervision or additional annotated training data. The self-training method significantly improved performance on a host of downstream tasks, including sentiment analysis, question answering, and news classification. In zero-shot capability it outperformed Google’s LaMDA and FLAN, GPT models, and other supervised algorithms.
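The self-training idea can be summarized in a short, hypothetical sketch: the model pseudo-labels unlabeled text with its own confident predictions and is then retrained on those labels. The function and method names here (predict_proba, fine_tune) are placeholders for illustration, not part of the paper’s released code.

```python
def self_train(model, unlabeled_texts, num_rounds=3, threshold=0.9):
    """Hypothetical self-training loop: pseudo-label, filter, retrain."""
    for _ in range(num_rounds):
        pseudo_labeled = []
        for text in unlabeled_texts:
            probs = model.predict_proba(text)  # e.g. {"positive": 0.95, "negative": 0.05}
            label, confidence = max(probs.items(), key=lambda kv: kv[1])
            if confidence >= threshold:        # keep only confident pseudo-labels
                pseudo_labeled.append((text, label))
        model = model.fine_tune(pseudo_labeled)  # retrain on the model's own labels
    return model
```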
However, one challenge with self-training is that the model can sometimes generate incorrect or noisy labels that harm performance. To overcome this, the team developed an algorithm called “SimPLE” (Simple Pseudo-Label Editing), a process for reviewing and modifying the pseudo-labels created in the initial rounds of learning. By correcting mislabeled instances, the overall quality of the self-generated labels improved. This made the models not only more effective at understanding language, but also more robust when faced with adversarial data.
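The following is a hedged sketch of pseudo-label editing in the spirit of SimPLE (and of the confidence-based filtering mentioned later by Karlinsky): predictions over several augmented views of the same example are compared, and pseudo-labels the views do not agree on are relabeled by majority vote or discarded. It illustrates the general idea only and is not the authors’ exact algorithm.

```python
from collections import Counter

def edit_pseudo_labels(examples, view_predictions, min_agreement=0.8):
    """examples: list of (text, pseudo_label) pairs from an earlier round.
    view_predictions: for each example, labels predicted on augmented views."""
    cleaned = []
    for (text, _), votes in zip(examples, view_predictions):
        counts = Counter(votes)
        majority_label, count = counts.most_common(1)[0]
        if count / len(votes) >= min_agreement:
            cleaned.append((text, majority_label))  # keep, possibly relabeled
        # otherwise the unstable pseudo-label is discarded
    return cleaned
```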
As with most research, there are some limitations. Self-training on multi-class classification tasks did not perform as well as on binary natural language understanding tasks, highlighting the challenge of applying entailment models to multiple-choice tasks.
“This research presents an efficient and effective way to train large language models (LLMs) by framing natural language understanding tasks as contextual entailment problems and using a pseudo-labeling self-training mechanism to incorporate large quantities of unlabeled text data in the training process,” adds James Glass, a CSAIL senior research scientist who is also an author of the paper. “Although the LLM field is undergoing rapid and dramatic change, this research shows that it is possible to produce relatively compact language models that perform very well on benchmark understanding tasks compared to their peers of roughly the same size, or even much larger language models.”
“The entailment task is a popular way to evaluate an AI model’s understanding of a given context,” says Leonid Karlinsky, a research associate at the MIT-IBM Watson AI Lab. “It is used in many areas to analyze the inputs of unimodal models, such as LLMs, and multimodal models, such as VLMs [visual language models], simplifying the task of answering questions about a given input context to a binary classification problem: does this context entail a certain (e.g., textual) conclusion or not? This paper makes two contributions to this space. First, it proposes a way to improve zero-shot NLU performance (without additional tuning) and robustness to adversarial attacks via tuning with synthesized (specialized) entailment tasks generated for the original NLU task. Second, it offers a self-supervised SimPLE method, including pseudo-labeling and confidence-based filtering, to further improve the NLU performance of large LLMs.”
Luo and Glass wrote the paper with Yoon Kim, a CSAIL member and assistant professor in MIT’s Department of Electrical Engineering and Computer Science, and Jiaxin Ge of Peking University. Their work will be presented this July at the Association for Computational Linguistics meeting in Toronto, Ontario. This research was supported by a grant from the Hong Kong Innovation AI program.