The US government wants you — yes, you — to find the flaws in generative AI

At the 2023 Defcon hacker conference in Las Vegas, prominent AI tech companies partnered with algorithmic integrity and transparency groups to set thousands of attendees loose on generative AI platforms and find weaknesses in these critical systems. This “red-teaming” exercise, which also had support from the US government, took a step toward opening these increasingly influential yet opaque systems to scrutiny. Now, the ethical AI and algorithmic assessment nonprofit Humane Intelligence is taking that model a step further. On Wednesday, the group announced a call for participation with the US National Institute of Standards and Technology, inviting any US resident to take part in the qualifying round of a nationwide red-teaming effort to evaluate AI office productivity software.

The qualifier will take place online and is open to both developers and anyone in the general public as part of NIST's AI challenges, known as Assessing Risks and Impacts of AI, or ARIA. Participants who pass the qualifying round will take part in an in-person red-teaming event in late October at the Conference on Applied Machine Learning in Information Security (CAMLIS) in Virginia. The goal is to expand capabilities for conducting rigorous testing of the security, resilience, and ethics of generative AI technologies.

“The average person using one of these models has no way of knowing whether that model is fit for purpose,” says Theo Skeadas, chief of staff at Humane Intelligence. “So we want to democratize the ability to do assessments and make sure that everyone using these models can assess for themselves whether that model is meeting their needs.”

The final event at CAMLIS will split participants into a red team trying to attack the AI systems and a blue team working on defense. Participants will use the AI 600-1 profile of the NIST AI Risk Management Framework as a rubric for measuring whether the red team is able to produce outcomes that violate the systems' expected behavior.

“NIST ARIA relies on structured user feedback to understand real-world applications of AI models,” says Humane Intelligence founder Rumman Chowdhury, who is also an executive in the NIST Office of Emerging Technologies and a member of the US Department of Homeland Security's AI Safety and Security Board. “The ARIA team is primarily comprised of experts in sociotechnical testing and assessment, and [is] using this experience as a way to evolve the field toward rigorous scientific evaluation of generative AI.”

“The community needs to be broader than developers,” Skeadas says. “Policymakers, journalists, civil society, and non-technical people need to be involved in the process of testing and evaluating these systems. We need to make sure that underrepresented groups, such as those who speak minority languages or come from non-majority cultures and perspectives, can participate in this process.”
