Friday, March 6, 2026

Anthropic’s new “constitution” for Claude: be helpful, be honest, and don’t destroy humanity


Anthropic overhauls Claude’s so-called “soul doc.”

The new document is a 57-page text titled “Claude’s Constitution,” laying out “Anthropic’s intentions regarding the model’s values and behavior,” and it is addressed not to outside readers but to the model itself. It aims to outline Claude’s “ethical character” and “core identity,” including how it should balance conflicting values and navigate risky situations.

Where the previous constitution, published in May 2023, was largely a list of guidelines, Anthropic now says it is essential for AI models to “understand why we want them to behave in a certain way, rather than just stating what we expect of them.” The document treats Claude as a largely autonomous individual that understands itself and its place in the world. Anthropic also allows for the possibility that “Claude may have some consciousness or moral status,” in part because the company believes that telling Claude so might improve its behavior. In its announcement, Anthropic said the chatbot’s so-called “psychological safety, self-esteem, and well-being… may… impact Claude’s integrity, judgment, and safety.”

Amanda Askell, Anthropic’s resident philosopher, who led the effort to draft the new “constitution,” told The Verge that there is a specific list of hard limits on Claude’s behavior for “pretty extreme” things, including providing “serious uplift to those seeking to create biological, chemical, nuclear, or radiological weapons with the potential for mass casualties” and providing “serious uplift to attacks on critical infrastructure (power grids, water systems, financial systems) or critical safety systems.” (The “serious uplift” qualifier, though, seems to suggest that some lesser level of assistance is acceptable.)

Other hard limits include not creating cyberweapons or malicious code that could cause “significant harm,” not undermining Anthropic’s ability to oversee it, not helping any particular group seize “an unprecedented and illegitimate degree of absolute social, military, or economic control,” and not generating child sexual abuse material. The last one? Not to “engage in or assist in an attempt to kill or disempower the vast majority of humanity or the human species.”

The document also includes a list of general “core values” defined by Anthropic, and Claude is instructed to treat them in descending order of priority in cases where they conflict with one another: being “broadly safe” (i.e., not undermining “appropriate human mechanisms to oversee the dispositions and actions of AI”), “broadly ethical,” “compliant with Anthropic’s guidelines,” and “genuinely helpful.” That includes upholding virtues such as honesty, including the instruction that Claude “prioritizes factual accuracy and comprehensiveness when asked about politically sensitive topics, offers the best case for most viewpoints when asked, tries to represent multiple perspectives in cases where there is a lack of empirical or moral consensus, and adopts neutral terminology over politically loaded terminology where possible.”

The new document emphasizes that Claude will face difficult moral dilemmas. One example: “Just as a human soldier might refuse to fire on peaceful protesters or an employee might refuse to violate antitrust law, Claude should refuse to assist in activities that would help concentrate power in illegitimate ways. This is true even if the request comes from Anthropic itself.” Notably, Anthropic warns that advanced AI “could offer unprecedented military and economic advantages to those who control the most capable systems,” and that the resulting unchecked power could be used in catastrophic ways. That concern hasn’t stopped Anthropic and its competitors from marketing products directly to the government and greenlighting some military use cases.

With so many consequential decisions and potential risks at stake, it’s natural to wonder who was involved in making these difficult calls: did Anthropic consult outside experts, members of vulnerable communities and minority groups, or external organizations? When asked, Anthropic declined to provide details. Askell said the company doesn’t want to “shift the burden to others… In fact, it’s the responsibility of the companies that create and deploy these models to shoulder that burden.”

Another standout part of the document is the section on Claude’s potential “consciousness” or “moral status.” Anthropic says the document “express[es] our uncertainty as to whether Claude might have any consciousness or moral status (now or in the future).” It’s a fraught topic that has sparked debate and alarm among people in many different camps: those interested in “model welfare,” those who believe they have discovered “emergent entities” within chatbots, and those who have spiraled into mental health crises, and in some cases died, after coming to believe that a chatbot exhibits some form of consciousness or deep empathy.

Beyond the potential benefits for Claude itself, Askell said Anthropic shouldn’t “completely disregard” the topic, “because I also think people wouldn’t necessarily take it seriously if they just said, ‘We’re not even open to it, we’re not exploring it, we’re not thinking about it.’”
