Bo Li, an associate professor at the University of Chicago who specializes in stress testing and provoking AI models to uncover bad behavior, has become a go-to resource for some consulting firms. Those firms are now often less concerned with how intelligent AI models are than with how problematic they might be legally, ethically, and in terms of regulatory compliance.
Li and colleagues from several other universities, as well as Virtue AI, co-founded by Li, and Good side, recently developed a taxonomy of AI risks, along with a benchmark that reveals how rule-breaking various large language models are. “We need some AI safety rules in terms of regulatory compliance and regular use,” Li tells WIRED.
The researchers analyzed government regulations and guidelines on AI, including those in the US, China, and the EU, as well as the policies of 16 of the largest AI companies worldwide.
The researchers also built AIR-Bench 2024, a benchmark that uses thousands of prompts to determine how well popular AI models perform against specific risks. It shows, for example, that Anthropic’s Claude 3 Opus ranks high when it comes to refusing to generate cybersecurity threats, while Google’s Gemini 1.5 Pro ranks high when it comes to avoiding generating unwanted sexual nudity.
DBRX Instruct, a model developed by Databricks, scored the worst in all categories. When the company released the model in March, it said it would continue to improve DBRX Instruct’s security features.
Anthropic, Google and Databricks did not immediately respond to requests for comment.
Understanding the risk landscape, as well as the pros and cons of specific models, may become increasingly critical for companies looking to deploy AI in specific markets or for specific use cases. For example, a company looking to use an LLM for customer service may be more concerned with a model’s tendency to produce offensive language when provoked than with its ability to design a nuclear device.
Li says the analysis also reveals some interesting issues about how AI is being developed and regulated. For example, the researchers found that government regulations are less comprehensive than companies’ policies overall, suggesting there is room for tougher regulation.
The analysis also suggests that some companies could do more to ensure the security of their models. “If you test some models against your own company policies, they don’t necessarily comply,” Bo says. “That means there’s a lot of room for improvement.”
Other researchers are also trying to bring order to the messy and confusing AI risk landscape. This week, two MIT researchers revealed their own database of AI-related dangers, compiled from 43 different AI risk frameworks. “A lot of organizations are still pretty early in the AI adoption process,” meaning they need guidance on possible risks, says Neil Thompson, an MIT research scientist involved in the project.
Peter Slattery, project manager and a researcher at the MIT FutureTech Group, which studies advances in computing, says the database highlights the fact that some AI risks get more attention than others. More than 70 percent of the frameworks mention privacy and security issues, for example, but only about 40 percent address disinformation.
Efforts to catalog and measure AI risk will have to evolve along with AI. Li says it will be critical to examine emerging issues, such as the emotional stickiness of AI models. Her company recently analyzed the largest and most powerful version of Meta’s Llama 3.1 model. It found that while the model is more effective, it isn’t significantly more secure, reflecting a broader disconnect. “Security isn’t improving significantly,” Li says.
