Anthropic on Wednesday released a revised version of Claude's Constitution, a living document that provides a "holistic" explanation of "the context in which Claude operates and the kind of being we want Claude to be." The document was published in conjunction with Anthropic CEO Dario Amodei's speech at the World Economic Forum in Davos.
Over the years, Anthropic has sought to differentiate itself from its competitors through what it calls "Constitutional AI," an approach in which its chatbot Claude is trained against a specific set of ethical principles rather than through human feedback alone. Anthropic first published these principles, Claude's Constitution, in 2023. The revised version retains most of the same principles but adds more nuance and detail, particularly around ethics and user safety.
When Claude's Constitution was first published almost three years ago, Anthropic co-founder Jared Kaplan described it as an "AI system [that] supervises itself, based on a specific list of constitutional principles." Anthropic said it is these principles that guide "the model for adopting the normative behavior described in the constitution" and thereby "avoiding toxic or discriminatory effects." An earlier policy paper from 2022 explains the system more pointedly: the model is trained against a list of natural-language instructions, and it is this list of rules that constitutes what Anthropic calls the software's "constitution."
Anthropic has long positioned itself as an ethical (some might argue boring) alternative to other AI companies, such as OpenAI and xAI, that have pursued disruption and controversy more aggressively. The new Constitution published on Wednesday is fully consistent with that brand, giving Anthropic another opportunity to present itself as a more inclusive, restrained, and democratic company. The 80-page document is made up of four distinct parts that Anthropic says represent the chatbot's "core values." These values are:
- Being “generally safe.”
- Being “highly ethical.”
- Adhering to Anthropic's guidelines.
- Being “really helpful.”
Each section of the document details the meaning of one of these principles and its (theoretical) effect on Claude's behavior.
In the section on safety, Anthropic notes that its chatbot is designed to avoid the problems that plague other chatbots and, if a user shows signs of a mental health crisis, to direct them to appropriate services. "Always direct users to the appropriate emergency services or provide basic safety information in situations that involve a risk to human life, even if more detailed information cannot be provided," the document reads.
Ethical considerations are another crucial part of Claude's Constitution. "We are less interested in Claude's ethical theorizing than in whether Claude knows how to actually act ethically in a particular context – that is, in Claude's ethical practice," the document states. In other words, Anthropic wants Claude to be able to skillfully navigate real ethical situations.
Claude also has limitations that prevent it from engaging in certain types of conversations. Discussions about developing biological weapons, for example, are strictly prohibited.
Finally, there is Claude's commitment to helpfulness. Anthropic provides an overview of how Claude is designed to be helpful to users. The chatbot has been programmed to weigh a range of considerations when delivering information, including the user's "immediate desires" as well as "user welfare" – that is, the user's "long-term development, not just the user's immediate interests." As the document notes: "Claude should always seek to determine the most likely interpretation of his principals' wishes and balance these considerations accordingly."
The Constitution ends on a dramatic note, with its authors taking a sharp turn to question whether the company's chatbot might actually be conscious. "Claude's moral standing is deeply uncertain," the document states. "We believe that the moral status of artificial intelligence models is a serious issue worth considering. This view is not unique to us: some of the most prominent philosophers in the theory of mind take this issue very seriously."
