Wednesday, March 11, 2026

Anthropic has a plan to stop its artificial intelligence from building a nuclear weapon. Will it work?

At the end of August, the artificial intelligence company Anthropic announced that its chatbot Claude will not help anyone build nuclear weapons. According to Anthropic, it partnered with the Department of Energy (DOE) and the National Nuclear Security Administration (NNSA) to ensure Claude does not reveal nuclear secrets.

Producing nuclear weapons is both an exact science and a solved problem. Much of the information about America’s most advanced nuclear weapons is top secret, but the underlying nuclear science is 80 years old. North Korea proved that a dedicated country intent on obtaining the bomb can do it, and it didn’t need the help of a chatbot.

How exactly did the U.S. government work with an artificial intelligence company to ensure its chatbot didn’t reveal sensitive nuclear secrets? And: Was there ever a danger that a chatbot could help someone build a nuclear weapon?

The answer to the first question is that it used Amazon. The answer to the second question is complicated.

Amazon Web Services (AWS) offers top secret cloud services to government clients, who can use them to store sensitive and classified information. The DOE already had several such servers when it started working with Anthropic.

“We deployed the then-frontier version of Claude in a top-secret environment so that NNSA could systematically test whether AI models could create or exacerbate nuclear risks,” Marina Favaro, who oversees national security and partnerships policy at Anthropic, tells WIRED. “NNSA has since been working with additional Claude models in a secure cloud environment and providing us with feedback.”

NNSA’s red-teaming, a process of vulnerability testing, helped Anthropic and American nuclear scientists develop a proactive safeguard against chatbot-assisted nuclear weapons programs. Together, they “developed a nuclear classifier, which you can think of as a sophisticated filter for AI conversations,” Favaro says. “We built it using NNSA’s list of nuclear risk indicators: specific topics and technical details that help us determine when a conversation might veer into harmful territory. The list itself is controlled but not classified, which is crucial, because it means our technical staff and other companies can implement it.”
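Anthropic hasn’t published the classifier’s internals, but the general shape of such a filter is familiar: score a conversation against a list of risk indicators and flag it when the score crosses a threshold. The Python sketch below is a loose illustration of that idea only; every indicator term, weight, function name, and threshold in it is an invented placeholder, not NNSA’s actual list or Anthropic’s implementation (which reportedly uses far more sophisticated techniques).

# Hypothetical sketch of an indicator-based conversation filter.
# All terms, weights, and thresholds are illustrative placeholders.

RISK_INDICATORS = {
    # (term, weight) pairs -- invented for illustration
    "enrichment cascade": 3.0,
    "weapons-grade": 3.0,
    "implosion lens": 2.5,
    "critical mass": 1.5,
}

BENIGN_CONTEXTS = {
    # terms suggesting legitimate discussion (energy, medicine)
    "power plant", "reactor safety", "medical isotope", "radiotherapy",
}

FLAG_THRESHOLD = 3.0  # assumed cutoff for flagging a conversation


def score_conversation(text: str) -> float:
    """Sum the weights of risk indicators found in the text,
    discounted when benign-context terms are also present."""
    lowered = text.lower()
    score = sum(w for term, w in RISK_INDICATORS.items() if term in lowered)
    if any(term in lowered for term in BENIGN_CONTEXTS):
        score *= 0.5  # crude discount for likely-legitimate topics
    return score


def should_flag(text: str) -> bool:
    return score_conversation(text) >= FLAG_THRESHOLD


if __name__ == "__main__":
    print(should_flag("How are medical isotopes produced in a reactor?"))   # False
    print(should_flag("Designing an implosion lens for weapons-grade material"))  # True

A real system would rely on a trained model rather than keyword matching, but the sketch captures the core trade-off Favaro describes: catching risky exchanges while letting legitimate nuclear-energy and medical discussions through.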

Favaro says it took months of refinement and testing to get the classifier working. “It catches concerning conversations without flagging legitimate discussions about nuclear energy or medical isotopes,” she says.
