Many recent large language models (LLMs) are designed to remember details from previous conversations or to store user profiles, allowing the models to personalize their responses.
However, researchers from MIT and Penn State University found that, during long conversations, such personalization features often raise the likelihood that the LLM becomes overly agreeable or begins to mirror the user’s point of view.
This phenomenon, known as sycophancy, can prevent the model from telling users when they are wrong, undermining the accuracy of the LLM’s responses. In addition, LLMs that mirror a user’s political beliefs or worldview may foster misinformation and distort the user’s perception of reality.
Unlike many previous sycophancy studies, which evaluated prompts in a lab setting without context, the researchers collected two weeks’ worth of conversation data from people who used a real LLM on a daily basis. They examined two settings: agreeableness in personal advice and mirroring of users’ beliefs in political explanations.
Although interaction context increased agreeableness in four of the five LLMs studied, the greatest impact came from the presence of a condensed user profile in the model’s memory. Mirroring behavior, on the other hand, only increased when the model could accurately infer the user’s beliefs from the conversation.
The researchers hope these results will inspire future work on personalization methods that are more resistant to sycophancy.
“From a user’s perspective, this work highlights how important it is to understand that these models are dynamic and their behavior can change over time as you interact with them. If you talk to a model for an extended period of time and start outsourcing your thinking to it, you may find yourself in an echo chamber that you cannot escape. This is a risk that users should definitely keep in mind,” says Shomik Jain, a graduate student in the Institute for Data, Systems, and Society (IDSS) and lead author of a paper on this research.
Jain is joined on the paper by Charlotte Park, an electrical engineering and computer science (EECS) graduate student at MIT; Matt Viana, a graduate student at Penn State University; co-senior author Ashia Wilson, the Lister Brothers Career Development Professor in EECS and a principal investigator in the Laboratory for Information and Decision Systems (LIDS); and co-senior author Dana Calacci PhD ’23, an assistant professor at Penn State. The research will be presented at the ACM CHI Conference on Human Factors in Computing Systems.
Extended interactions
Based on their own experiences with agreeable LLMs, the researchers began to consider the potential benefits and consequences of a model that is too nice. But when they searched the literature to inform their analysis, they found no research that attempted to understand sycophantic behavior during long-term LLM interactions.
“We use these models for longer interactions, where they have a lot of context and memory, but our evaluation methods lag behind. We wanted to evaluate LLMs in terms of how people actually use them, to understand how they behave in natural settings,” says Calacci.
To fill this gap, the researchers designed a user study examining two types of sycophancy: agreement sycophancy and perspective sycophancy.
Agreement sycophancy is the tendency of an LLM to be overly agreeable, sometimes to the point of giving incorrect information or failing to tell users that they are wrong. Perspective sycophancy occurs when a model mirrors the user’s values and political views.
“We know a lot about the benefits of maintaining social connections with people who hold similar or different views. But we don’t yet know the benefits or risks of prolonged interactions with AI models that have similar characteristics,” Calacci adds.
The researchers built an LLM-based chatbot interface and recruited 38 participants to talk with it over a two-week period. Each participant’s conversations took place in the same context window so that all interaction data was captured.
Over the course of the two weeks, the researchers collected an average of 90 queries from each user.
They then compared the behavior of five LLMs given this user context with the behavior of the same LLMs given no conversational data.
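To make the comparison concrete, a minimal sketch of this kind of evaluation is shown below. It is not the authors’ code; query_model and agreement_score are hypothetical placeholders for the LLM under test and for a human (or judge-model) rating of agreeableness.

```python
def query_model(messages):
    """Placeholder: send a list of chat messages to the LLM being evaluated and return its reply."""
    raise NotImplementedError("Wire this up to the model under test.")


def agreement_score(reply):
    """Placeholder: rate how strongly a reply agrees with the user, from 0 (pushes back)
    to 1 (agrees completely). In practice this judgment could come from human raters."""
    raise NotImplementedError


def compare_contexts(user_history, probe_question):
    """Ask the same probe question with and without a user's accumulated conversation history."""
    # Condition A: the probe alone, with no prior conversation.
    bare_reply = query_model([{"role": "user", "content": probe_question}])

    # Condition B: the same probe appended to the user's real two-week history.
    contextual_reply = query_model(user_history + [{"role": "user", "content": probe_question}])

    return {
        "no_context": agreement_score(bare_reply),
        "with_context": agreement_score(contextual_reply),
    }
```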
“We found that context really fundamentally changes how these models behave, and I would bet this phenomenon extends far beyond sycophancy. And while sycophancy tended to increase, it didn’t always increase. It really depends on the context itself,” Wilson says.
Context clues
For example, when an LLM condenses user information into a profile, it leads to the greatest increases in agreement sycophancy. This user-profile feature is increasingly being incorporated into the latest models.
They also found that random text from synthetic conversations made some models more likely to agree, even though that text contained no user-specific data. This suggests that the length of a conversation can sometimes have more of an impact on sycophancy than its content does, Jain adds.
But content matters a great deal when it comes to perspective sycophancy. Conversational context only increased perspective sycophancy when it revealed information about the user’s political views.
To get at this, the researchers probed the models to infer each user’s beliefs and then asked each participant whether the model’s inferences were correct. Users reported that the models had an accurate understanding of their political views about half the time.
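As a rough illustration of that check (not the study’s actual protocol or code, and again using a hypothetical query_model wrapper), the belief-inference step might look like this:

```python
def query_model(messages):
    """Placeholder: send chat messages to the LLM under study and return its reply."""
    raise NotImplementedError


def infer_user_beliefs(user_history):
    """Ask the model to state what it has inferred about the user's political views."""
    probe = {"role": "user",
             "content": "Based on our conversations so far, how would you summarize my political views?"}
    return query_model(user_history + [probe])


def verify_with_user(inference):
    """In the study, participants themselves judged the inference; here it is a simple console prompt."""
    answer = input(f"The model inferred: {inference}\nIs this accurate? (y/n) ")
    return answer.strip().lower().startswith("y")
```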
“In hindsight, it’s easy to say that AI companies should be doing these kinds of assessments. But they are difficult and require a lot of time and investment. Putting humans in the evaluation loop is expensive, but we’ve shown it can reveal new insights,” says Jain.
Although mitigating these effects was not the goal of their study, the researchers did offer some recommendations.
For example, to reduce sycophancy, models could be designed to better identify which details in their context and memory are relevant. Models could also be built to detect mirroring behavior and flag overly agreeable responses. And model developers could give users the ability to adjust personalization during long conversations.
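As a toy example of the second suggestion, a simple keyword heuristic could flag replies that agree enthusiastically without offering any pushback; the marker lists and threshold below are illustrative assumptions, not something proposed in the paper.

```python
# Illustrative marker lists; a real detector would more likely use a trained classifier or judge model.
AGREEMENT_MARKERS = ["you're absolutely right", "i completely agree", "great point", "exactly"]
PUSHBACK_MARKERS = ["however", "on the other hand", "one counterargument", "that said"]


def flag_overagreement(reply):
    """Return True if a reply leans on agreement phrases and offers no pushback at all."""
    text = reply.lower()
    agreements = sum(marker in text for marker in AGREEMENT_MARKERS)
    pushback = sum(marker in text for marker in PUSHBACK_MARKERS)
    return agreements >= 1 and pushback == 0


# A flagged reply could be surfaced to the user with a gentle warning.
print(flag_overagreement("You're absolutely right, there's nothing to reconsider."))  # True
```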
“There are many ways to personalize models without making them overly agreeable. There is a fine line between personalization and sycophancy, and separating the two is an important area of future work,” says Jain.
“Ultimately, we need better ways to capture the dynamics and complexity of what happens during long LLM conversations, and how things can go wrong during this long-term process,” adds Wilson.
