Anthropic released two new artificial intelligence models on Tuesday, Claude Fable 5 and Claude Mythos 5, which the company says are more capable than Mythos Preview, the model it made available in April to a narrow group of technology partners. Anthropic says that initial limited release was driven by concerns that bad actors could exploit the model’s capabilities to develop hacking tools that would catch defenders off guard.
Anthropic is currently making Claude Mythos 5 available only to a narrow group of industry partners, many of whom already had access to Mythos Preview, and the company says it is working with the U.S. government on the rollout.
Claude Fable 5, which is being released to the public, uses the same base model as Mythos 5. At launch, the company said, it will have “guardrails” that prevent the model from answering many user questions related to cybersecurity, biology and chemistry. Instead, these requests will be redirected to an older AI model, Claude Opus 4.8. If Anthropic suspects that a user is trying to perform distillation on Claude Fable 5 – training a smaller AI model on the responses of a larger one – those requests will also be redirected to Claude Opus 4.8, the company says.
In an interview with WIRED, Diane Penn, Anthropic’s director of product management, says the company struggled to support Mythos vulnerability detection and other advanced features prior to the April launch, but testing and user input since then have helped refine the strategy.
“We try to make improvements in a positive way, even if we don’t have a perfect solution for each use case,” says Penn. “Of all the different approaches, this has proven to be the most feasible and the best. We simply concluded that it was the best choice for users to get the most value out of Fable 5.”
For now, Penn says, the protection mechanism is built to err on the side of caution, meaning some user queries may be redirected to the less capable AI model even if they are benign. Anthropic hopes to improve its classifiers over time, but Penn says this was the only safe way the company could currently make the model widely available.
The company said Tuesday that in addition to offering Claude Mythos 5 to Project Glasswing partners, it is also providing access to “selected biology researchers.” Anthropic also noted in its blog post on Tuesday’s launch that it is providing unrestricted versions to these small groups of customers “until our Trusted Access program is available,” hinting at plans to expand access further. Since the April launch of Mythos, Anthropic has repeatedly emphasized that its competitors, in both the proprietary and open-model spaces, will eventually offer models with Mythos-level capabilities.
The ability of Claude Mythos and other recent AI models to design hacking tools that can find and exploit vulnerabilities in both new and older software has forced technology companies and governments around the world to harden their software defenses before AI models of this caliber become widely available to attackers. Anthropic first made Mythos available to industry partners through a consortium called Project Glasswing, on the premise that doing so would give members a head start in preparing their own systems and weighing global solutions to the problem before a wider release.
Anthropic wrote in an update on Project Glasswing last week: “We are working as quickly as possible to safely make Mythos-level features available to the general public. To do this, we will need very robust safeguards to prevent misuse of the model’s cyber capabilities – safeguards that we (and, to our knowledge, all other AI developers) have not yet developed.”
