Anthropic CEO Dario Amodei made an urgent push in April for the need to understand how AI models think.
This comes at a crucial moment. As Anthropic battles in the global AI rankings, it is important to note what sets it apart from other top AI labs. Since its founding in 2021, when seven OpenAI employees broke off over concerns about AI safety, Anthropic has built AI models that adhere to a set of human-valued principles, a system it calls Constitutional AI. These principles ensure that models are "helpful, honest and harmless" and generally act in the best interests of society. At the same time, Anthropic's research arm is diving deep to understand how its models think about the world, and why they produce helpful (and sometimes harmful) answers.
Anthropic's flagship model, Claude 3.7 Sonnet, dominated coding benchmarks when it launched in February, proving that AI models can excel at both performance and safety. And the recent release of Claude 4.0 Opus and Sonnet again puts Claude at the top of coding benchmarks. However, in today's rapid and hyper-competitive AI market, Anthropic's rivals like Google's Gemini 2.5 Pro and OpenAI's o3 have their own impressive coding performances, while they already dominate Claude in math, creative writing and overall reasoning across many languages.
If Amodei's thoughts are any indication, Anthropic is planning for the future of AI and its implications in critical fields like medicine, psychology and law, where model safety and human values are imperative. And it shows: Anthropic is the leading AI lab that focuses strictly on developing "interpretable" AI, meaning models that let us understand, to some degree of certainty, what they are thinking and how they arrive at a particular conclusion.
Amazon and Google have already invested billions of dollars in Anthropic even as they build their own AI models, so perhaps Anthropic's competitive advantage is still budding. Interpretable models, as Anthropic suggests, could significantly reduce the long-term operational costs associated with debugging, auditing and mitigating risks in complex AI deployments.
Sayash Kapoor, an AI safety researcher, suggests that while interpretability is valuable, it is just one of many tools for managing AI risk. In his view, "interpretability is neither necessary nor sufficient" to ensure models behave safely; it matters most when combined with filters, verifiers and human-centered design. This more expansive view sees interpretability as part of a larger ecosystem of control strategies, particularly in real-world AI deployments where models are components in broader decision-making systems.
AI interpretability is needed
Until recently, many thought AI was still years away from advancements like those that now help Claude, Gemini and ChatGPT boast exceptional market adoption. While these models are already pushing the frontiers of human knowledge, their widespread use can be attributed to just how good they are at solving a wide range of practical problems that require creative problem-solving or detailed analysis. As models are put to the task on increasingly critical problems, it is important that they produce accurate answers.
Amodei fears that when an AI responds to a prompt, "we have no idea… why it chooses certain words over others, or why it occasionally makes a mistake despite usually being accurate." Such mistakes, whether hallucinations of inaccurate information or responses that do not align with human values, will hold AI models back from reaching their full potential. Indeed, we have seen many examples of AI continuing to struggle with hallucinations and unethical behavior.
For Amodei, the best way to solve these problems is to understand how an AI thinks: "Our inability to understand models' internal mechanisms means that we cannot meaningfully predict such [harmful] behaviors, and therefore struggle to rule them out… If instead it were possible to look inside models, we might be able to systematically block all jailbreaks, and also characterize what dangerous knowledge the models have."
Amodei also sees the opacity of current models as a barrier to deploying AI in "high-stakes financial or safety-critical settings, because we can't fully set the limits on their behavior, and a small number of mistakes could be very harmful." In decision-making that affects humans directly, like medical diagnosis or mortgage assessments, legal regulations require AI to explain its decisions.
Imagine a financial institution using a large language model (LLM) for fraud detection: interpretability could mean explaining a denied loan application to a customer as the law requires. Or picture a manufacturer optimizing supply chains: understanding why an AI suggests a particular supplier could unlock efficiencies and prevent unforeseen bottlenecks.
That is why, Amodei explains, "Anthropic is doubling down on interpretability, and we have a goal of getting to 'interpretability can reliably detect most model problems' by 2027."
To that end, Anthropic recently participated in a $50 million investment in Goodfire, an AI research lab making breakthrough progress on AI "brain scans." Its model inspection platform, Ember, is a model-agnostic tool that identifies learned concepts within models and lets users manipulate them. In a recent demo, the company showed how Ember can recognize individual visual concepts within an image-generation AI and then let users paint these concepts onto a canvas to generate new images that follow the user's design.
Anthropic's investment in Ember hints that developing interpretable models is hard enough that Anthropic does not have the manpower to achieve interpretability on its own. Creating interpretable models requires new toolchains and skilled developers to build them.
Broader context: An AI researcher's perspective
To break down Amodei's perspective and add much-needed context, VentureBeat interviewed Kapoor, an AI safety researcher at Princeton. Kapoor co-authored the book AI Snake Oil, a critical examination of exaggerated claims surrounding the capabilities of leading AI models. He is also a co-author of "AI as Normal Technology," in which he advocates for treating AI as a standard, transformational tool like the internet or electricity, and promotes a realistic perspective on integrating it into everyday systems.
Kapoor doesn't dispute that interpretability is valuable. However, he is skeptical of treating it as the central pillar of AI alignment. "It's not a silver bullet," Kapoor told VentureBeat. Many of the most effective safety techniques, such as post-response filtering, don't require opening up the model at all, he said.
He also warns against what researchers call the "fallacy of inscrutability," the idea that if we don't fully understand a system's internals, we can't use or regulate it responsibly. In practice, full transparency isn't how most technologies are evaluated. What matters is whether a system performs reliably under real conditions.
This isn't the first time Amodei has warned about the risks of AI outpacing our understanding. In his October 2024 post "Machines of Loving Grace," he sketched a vision of increasingly capable models that could take meaningful real-world actions (and maybe double our lifespans).
According to Kapoor, there is an important distinction between a model's capability and its power. Model capabilities are undoubtedly increasing rapidly, and they may soon develop enough intelligence to find solutions for many complex problems challenging humanity today. But a model is only as powerful as the interfaces we provide for it to interact with the real world, including where and how models are deployed.
Amodei has separately argued that the U.S. should maintain its lead in AI development, in part through export controls that limit access to powerful models. The idea is that authoritarian governments might use frontier AI systems irresponsibly, or seize the geopolitical and economic edge that comes with deploying them first.
For Kapoor, "even the biggest proponents of export controls agree that it will give us at most a year or two." He believes we should treat AI as a "normal technology" like electricity or the internet. While revolutionary, both of those technologies took decades to be fully realized throughout society. Kapoor thinks it's the same for AI: the best way to maintain a geopolitical edge is to focus on the "long game" of transforming industries to use AI effectively.
Others criticize Amodea
Kapoor isn't the only one who is critical of Amodei's position. Last week at VivaTech in Paris, Jensen Huang, CEO of Nvidia, declared his disagreement with Amodei's views. Huang questioned whether the authority to develop AI should be limited to a few powerful entities like Anthropic. He said: "If you want things to be done safely and responsibly, you do it in the open… Don't do it in a dark room and tell me it's safe."
In response, Anthropic stated: "Dario has never claimed that 'only Anthropic' can build safe and powerful AI. As the public record will show, Dario has advocated for a national transparency standard for AI developers (including Anthropic) so the public and policymakers are aware of the models' capabilities and risks and can prepare accordingly."
It's also worth noting that Anthropic isn't alone in its pursuit of interpretability: Google DeepMind's interpretability team, led by Neel Nanda, has also made serious contributions to interpretability research.
Ultimately, top AI labs and researchers are providing strong evidence that interpretability could be a key differentiator in the competitive AI market. Enterprises that prioritize interpretability early may gain a significant competitive edge by building more trusted, compliant and adaptable AI systems.
