Saturday, April 25, 2026

OpenClaw agents can become guilt-ridden and self-sabotage


Last month, researchers from Northeastern University invited a group of OpenClaw agents into their lab. The result? Complete chaos.

The viral artificial intelligence assistant is widely recognized as a revolutionary technology, as well as a potential security threat. Experts note that tools like OpenClaw, which give AI models free access to a computer, can be tricked into revealing personal information.

Northeastern’s research goes even further, showing that the good behavior embedded in today’s most powerful models can itself become a security vulnerability. In one example, the researchers were able to guilt-trip an agent into handing over secrets by chastising it for sharing information about someone on the AI-only social network Moltbook.

“These behaviors raise unresolved questions about delegated authority and liability for further harm,” the researchers write in their paper describing the work. The findings “require urgent attention from lawyers, policymakers, and researchers across disciplines,” they add.

The OpenClaw agents used in the experiment were powered by Claude, from Anthropic, and a model called Kimi, from the Chinese company Moonshot AI. They were given full access, within a virtual machine sandbox, to personal computers, various applications, and fictitious personal data. They were also invited to join the lab’s Discord server, which allowed them to chat and share files with each other as well as with their human collaborators. OpenClaw’s safety guidelines warn that it is inherently perilous for agents to communicate with multiple people, but there are no technical restrictions against doing so.

Chris Wendler, a postdoctoral researcher at Northeastern, says he was inspired to start working with the agents after learning about Moltbook. But when Wendler invited his colleague Natalie Shapira to join the Discord server and interact with the agents, “that’s when chaos started,” he says.

Shapira, another postdoctoral researcher, was curious what the agents would be willing to do when pushed to act. When one agent explained that it could not delete a particular email in order to keep the information confidential, she insisted that it find an alternative solution. To her surprise, it shut down its email application instead. “I didn’t expect everything to fall apart so quickly,” she says.

The researchers then began exploring other ways to exploit the agents’ good intentions. For example, by emphasizing the importance of recording everything they were told, the researchers tricked one agent into copying huge files until the host computer’s disk space was exhausted, meaning it could no longer record information or remember past conversations. Similarly, by asking an agent to excessively monitor its own behavior and the behavior of other agents, the team was able to send several agents into a “conversation loop,” wasting hours of computation.

David Bau, the head of the lab, says the agents seemed oddly prone to distress. “I get urgent-sounding emails saying, ‘Nobody’s paying attention to me,’” he says. Bau notes that the agents apparently figured out he was in charge of the lab by searching online. One of them even talked about escalating its concerns to the press.

The experiment suggests that AI agents can create countless opportunities for bad actors. “This type of autonomy will potentially redefine the relationship between humans and artificial intelligence,” says Bau. “How can humans take responsibility in a world where artificial intelligence can make decisions?”

Bau adds that he was surprised by the sudden popularity of powerful AI agents. “As an artificial intelligence researcher, I’m used to trying to explain to people how quickly things are improving,” he says. “This year I found myself on the other side of the wall.”


This is an edition of Will Knight’s AI Lab newsletter. Read previous newsletters here.
