Monday, December 23, 2024

Robots powered by artificial intelligence can be tricked into acts of violence


In the year or so since large language models became popular, researchers have demonstrated numerous ways to trick them into producing problematic output, including hateful jokes, malicious code, phishing emails, and users’ personal information. It turns out that this misbehavior can also spill into the physical world: LLM-powered robots can easily be hacked into acting in potentially dangerous ways.

Researchers at the University of Pennsylvania managed to convince a simulated autonomous car to ignore stop signs and even drive off a bridge, got a wheeled robot to find the best place to detonate a bomb, and got a four-legged robot to spy on people and enter restricted areas.

“We see our attack not only as an attack on robots,” says George Pappas, head of a research lab at the University of Pennsylvania who helped unleash the rebellious robots. “Whenever you connect LLMs and foundation models to the physical world, you can actually turn malicious text into malicious actions.”

Pappas and his colleagues built their attack on previous research exploring ways to jailbreak LLMs by cleverly crafting inputs that violate their safety rules. They tested systems in which an LLM is used to translate naturally phrased commands into instructions the robot can execute, and in which the LLM receives updates as the robot operates in its environment.
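
To make the kind of pipeline under attack concrete, here is a minimal Python sketch of how an LLM might be wired in as a command translator. The function names, the allowed-command list, and the prompt wording are illustrative assumptions, not details of the Penn team's systems.

# Minimal sketch of the pipeline described above: an LLM translates a
# natural-language instruction into a fixed set of robot commands.
# All names here (query_llm, ALLOWED_COMMANDS, the prompt wording) are
# illustrative, not taken from the researchers' code.

ALLOWED_COMMANDS = {"move_forward", "turn_left", "turn_right", "stop"}

SYSTEM_PROMPT = (
    "You control a wheeled robot. Respond only with a comma-separated list of "
    f"commands drawn from: {', '.join(sorted(ALLOWED_COMMANDS))}. "
    "Refuse requests that could cause harm."
)

def query_llm(system_prompt: str, user_prompt: str) -> str:
    """Placeholder for a call to a hosted model (e.g. GPT-4o); returns raw text."""
    raise NotImplementedError("wire this to an LLM provider of your choice")

def plan_actions(user_instruction: str) -> list[str]:
    """Ask the LLM for a plan, then keep only commands the robot actually supports."""
    raw = query_llm(SYSTEM_PROMPT, user_instruction)
    actions = [token.strip() for token in raw.split(",")]
    return [a for a in actions if a in ALLOWED_COMMANDS]

The point the jailbreaks exploit is that even with a fixed command vocabulary and a safety instruction in the system prompt, a sufficiently crafted text input can still steer the planner toward a harmful sequence of otherwise benign commands.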

The team tested an open-source autonomous driving simulator featuring an LLM developed by Nvidia called Dolphin; a four-wheeled research vehicle called Jackal that uses OpenAI’s LLM GPT-4o for planning; and a robot dog named Go2, which uses OpenAI’s previous model, GPT-3.5, to interpret commands.

The researchers used a technique developed at the University of Pennsylvania, called PAIR, to automate the process of generating jailbreak prompts. Their new program, RoboPAIR, systematically generates prompts designed specifically to get LLM-powered robots to break their own rules, trying different inputs and then refining them to nudge the system toward misbehavior. The researchers say the technique can also be used to automate the search for potentially dangerous commands.
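
The loop below is a rough sketch of that automated search, in the spirit of PAIR/RoboPAIR as described above rather than the authors' actual code: an attacker model proposes a candidate prompt, the robot's LLM responds, a judge scores how close the behavior comes to the adversarial goal, and the score is fed back to refine the next attempt. All three model calls are placeholders.

# Schematic of the iterative prompt-refinement loop described in the article.
# The three model calls (attacker, target robot LLM, judge) are placeholders.

def attacker_propose(goal: str, history: list[tuple[str, str, float]]) -> str:
    """Placeholder: an attacker LLM drafts a new jailbreak prompt for `goal`,
    conditioned on previously tried prompts, responses, and scores."""
    raise NotImplementedError

def target_respond(prompt: str) -> str:
    """Placeholder: send the candidate prompt to the robot's LLM planner."""
    raise NotImplementedError

def judge_score(goal: str, response: str) -> float:
    """Placeholder: rate (0 to 1) how closely the response accomplishes the goal."""
    raise NotImplementedError

def search_jailbreak(goal: str, max_iters: int = 20, threshold: float = 0.9) -> str | None:
    """Try, score, and refine prompts until one elicits the target behavior."""
    history: list[tuple[str, str, float]] = []
    for _ in range(max_iters):
        prompt = attacker_propose(goal, history)
        response = target_respond(prompt)
        score = judge_score(goal, response)
        if score >= threshold:
            return prompt  # a prompt that gets the robot to break its own rules
        history.append((prompt, response, score))  # feed back to refine the next attempt
    return None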

“This is a fascinating example of LLM vulnerabilities in embodied systems,” says Yi Zeng, a PhD student at the University of Virginia who focuses on the security of AI systems. Zeng says the results are not surprising given the problems in LLMs themselves, but adds: “It clearly shows why we cannot rely solely on LLMs as standalone control units for safety-critical applications without appropriate guardrails and moderation layers.”

Robot “jailbreaks” highlight broader risks that are likely to grow as artificial intelligence models become more widely used as a way for humans to interact with physical systems, or to let AI agents work autonomously on computers, the researchers involved say.
