You can now sound the alarm when AI behaves badly


Writing the AI Lab newsletter every week means that I sometimes encounter AI models that behave in bad and strange ways. There's usually nothing to be done about it except share these stories with you. That may soon change.

A group of artificial intelligence researchers has launched a crowdsourcing website, Fault Reporting for Artificial Intelligence (FLARE-AI), that lets people report and track harms caused by AI. If a chatbot generates malware or a bomb recipe, reveals personal information, or triggers delusional thinking in users, for example, FLARE-AI can be used to sound the alarm. The system, whose code is open source, lets others verify a problem and submit reports to model developers, as well as to organizations such as MITRE, a nonprofit that tracks problems with technical systems. It's a bit like Downdetector, which compiles real-time user reports of service outages affecting apps and websites around the world.

The website is the next step in the group's ongoing work on AI flaw reporting, which I first wrote about last year. Group members also consulted on a congressional bill, announced in June, that would give the US government a central role in tracking this type of AI misbehavior.

"Right now, there is no centralized, accountable way to report flaws in AI systems," says Avijit Ghosh, an AI policy researcher at Hugging Face, who co-led the development of FLARE-AI with computer scientists Elaine Zhu and Shayne Longpre.

The alarm system was developed in cooperation with 49 AI experts from 32 different organizations. In a paper presenting the work, the researchers argue that their initiative could prove crucial as AI becomes more widely used and agentic systems grow more powerful. They believe a major problem is the lack of a consistent way to report AI flaws.

"I think it's a really good initiative," says Jessica Ji, a researcher at the Center for Security and Emerging Technology. Ji says the researchers rightly note that existing reporting mechanisms are fragmented and that AI models are black boxes. "I support anything that makes AI more transparent," Ji says.

While cybersecurity bugs have received a lot of attention, especially recently, Ghosh tells me that problems with AI systems also span areas such as psychological harm, discrimination or bias, and disinformation. He adds that different companies apply different standards to such issues, which means some problems go unnoticed. "In the absence of a coordinated disclosure regime, there are no external mechanisms to enforce transparency," says Ghosh.

A series of recent incidents involving popular AI tools shows how easily the technology can break.

This week, a company called LayerX revealed a way to trick AI-enabled web browsers, including OpenAI's Atlas and Perplexity's Comet, into bypassing their safety guardrails. For example, convincing the AI model behind a browser that it is playing a game could cause the browser to go rogue and attempt to hack a website. (The companies responsible for the affected browsers have fixed the problem, LayerX says.) In April this year, Johann Rehberger, a security researcher, discovered a way to trick Claude into disclosing personal information using images generated by ChatGPT.

AI also introduces strange new kinds of problems. Last year, OpenAI was forced to update its models after discovering that they were overly sycophantic, which sometimes seemed to encourage delusional thinking.

Rumman Chowdhury, CEO and founder of Humane Intelligence PBC, says FLARE-AI could give many AI developers a useful way to let people report problems with their tools. However, Chowdhury adds, such initiatives often face significant challenges.
