Saturday, April 19, 2025

Dynamic Language Understanding: Adapting to New Knowledge in Parametric and Semiparametric Models

Much of the recent success in language models (LMs) has come from a “static paradigm,” where the focus is on improving performance on benchmarks built without regard to the temporal aspect of the data: for example, answering questions about events the model could have learned about during training, or evaluating on text sampled from the same period as the training data. However, our language and knowledge are dynamic and constantly evolving. Therefore, to enable more realistic evaluation of question-answering models and the next leap in performance, it is essential to ensure they remain adaptable and robust when encountering new, unseen data.

In 2021 we released Mind the Gap: Assessing Temporal Generalization in Neural Language Models, along with dynamic language modeling benchmarks for WMT and arXiv, to facilitate the evaluation of language models in a way that takes temporal dynamics into account. In that paper, we highlighted the problems that current state-of-the-art large LMs face with temporal generalization and found that knowledge-intensive tokens suffer a considerable drop in performance.
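
To make this evaluation setup concrete, below is a minimal sketch of time-stratified perplexity measurement in the spirit of that work: score a causal LM on documents bucketed by publication year and compare perplexity before and after the training cutoff. The model name, toy corpus, and yearly bucketing are illustrative assumptions, not the paper's exact protocol.

```python
# A minimal sketch of time-stratified perplexity evaluation (assumptions:
# a HuggingFace-style causal LM and a toy time-stamped corpus).
import math
from collections import defaultdict

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; any causal LM with a known training cutoff
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

# Hypothetical time-stamped corpus, e.g. news text or arXiv abstracts.
documents = [
    ("2018-05-02", "Placeholder article text from before the cutoff."),
    ("2021-11-30", "Placeholder article text from after the cutoff."),
]

nll_sum = defaultdict(float)
tok_count = defaultdict(int)
for date, text in documents:
    year = date[:4]
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    n_predicted = enc["input_ids"].numel() - 1  # loss is averaged over shifted tokens
    nll_sum[year] += out.loss.item() * n_predicted
    tok_count[year] += n_predicted

# Rising perplexity in buckets after the training cutoff indicates a
# temporal generalization gap, especially on knowledge-intensive text.
for year in sorted(nll_sum):
    print(year, math.exp(nll_sum[year] / tok_count[year]))
```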

Today we publish two papers and a new benchmark that further advance research on this topic. In StreamingQA: A Benchmark for Adaptation to New Knowledge over Time in Question Answering Models, we study the downstream question-answering task on our newly proposed benchmark, StreamingQA: we want to understand how parametric and retrieval-augmented, semiparametric question-answering models adapt to new information in order to answer questions about new events. In Internet-Augmented Language Models through Few-Shot Prompting for Open-Domain Question Answering, we explore the power of combining a few-shot large language model with Google Search as the retrieval component. In doing so, we aim to improve the model’s factuality while ensuring it has access to up-to-date information for answering a diverse set of questions.

StreamingQA: A benchmark for adaptation to new knowledge over time in question answering models

Knowledge and language understanding of question-answering (QA) models have been commonly studied on static snapshots of knowledge, such as Wikipedia. To study how semiparametric QA models and their underlying parametric LMs adapt to evolving knowledge, we construct a new large-scale benchmark, StreamingQA, with human-written and automatically generated questions asked on a given date, to be answered from 14 years of time-stamped news articles (see Figure 2). We show that parametric models can be updated without full retraining, while avoiding catastrophic forgetting. For semiparametric models, adding new articles to the search space enables rapid adaptation, but models with an outdated underlying LM underperform those with a retrained LM.
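
To illustrate the semiparametric setup StreamingQA probes, the sketch below pairs a frozen LM with a searchable, time-stamped article index: new articles are simply appended to the index, so the retrieval side adapts immediately even when the parametric LM is outdated. The TF-IDF retriever, toy corpus, and `lm_generate` callable are assumptions for illustration, not the benchmark's actual models.

```python
# A minimal sketch of a semiparametric QA system over a time-stamped news
# corpus (assumptions: TF-IDF retrieval, a toy corpus, any text-generation
# function as the LM).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Each article carries a publication date; adding new articles only
# requires re-indexing, not retraining the underlying LM.
corpus = [
    {"date": "2019-03-12", "text": "Placeholder news article about event A."},
    {"date": "2020-07-01", "text": "Placeholder news article about event B."},
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(doc["text"] for doc in corpus)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k articles most similar to the question."""
    scores = cosine_similarity(vectorizer.transform([question]), doc_matrix)[0]
    return [corpus[i]["text"] for i in scores.argsort()[::-1][:k]]

def answer(question: str, lm_generate) -> str:
    """Condition a (possibly outdated) parametric LM on retrieved evidence."""
    evidence = "\n".join(retrieve(question))
    prompt = f"Evidence:\n{evidence}\n\nQuestion: {question}\nAnswer:"
    return lm_generate(prompt)  # any text-generation function
```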

Internet-augmented language models through few-shot prompting for open-domain question answering

Our goal is to capitalize on the unique few-shot capabilities of large-scale language models to overcome some of their challenges with respect to grounding in factual and up-to-date information. Motivated by semiparametric LMs, which ground their decisions in externally retrieved evidence, we use few-shot prompting to condition LMs on information returned from the web via Google Search, a broad and constantly updated source of knowledge. Our approach does not involve fine-tuning or learning additional parameters, making it applicable to virtually any language model. Indeed, we find that LMs conditioned on the web outperform closed-book models of similar, or even larger, size on open-domain question answering.
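
As a rough illustration of this retrieve-then-prompt loop, the sketch below conditions a frozen LM on live search results through the prompt alone. Here `web_search` is a hypothetical stand-in for a Google Search client, the single few-shot example is illustrative, and `lm_generate` is any text-generation function; none of these names come from the paper.

```python
# A minimal sketch of internet-augmented few-shot prompting: a frozen LM is
# conditioned on fresh web evidence purely through its prompt, with no
# parameter updates anywhere.

FEW_SHOT = (
    "Evidence: The 2020 Summer Olympics were held in Tokyo in 2021.\n"
    "Question: Where were the 2020 Summer Olympics held?\n"
    "Answer: Tokyo\n\n"
)

def web_search(query: str, k: int = 3) -> list[str]:
    """Hypothetical placeholder: return top-k snippets from a search engine."""
    raise NotImplementedError("plug in a real search client here")

def internet_augmented_answer(question: str, lm_generate) -> str:
    """Answer an open-domain question using up-to-date search snippets."""
    snippets = "\n".join(web_search(question))
    prompt = f"{FEW_SHOT}Evidence: {snippets}\nQuestion: {question}\nAnswer:"
    return lm_generate(prompt)  # works with any few-shot-capable LM
```

Because all conditioning happens in the prompt, swapping in a newer or larger LM requires no retraining, which is what makes this style of approach model-agnostic.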
