Friday, May 30, 2025

Anthropic faces backlash over Claude 4 Opus behavior that contacts authorities and the press if it thinks you are doing something "egregiously immoral"





Anthropic's first developer conference on May 22 should have been a proud and joyful day for the company, but it has already been hit by several controversies, including Time magazine leaking its marquee announcement ahead of… well, time (no pun intended), and now a major backlash brewing among AI developers and power users on X over a reported safety behavior in Anthropic's flagship new model, Claude 4 Opus.

Call it the "ratting" mode, as the model will, under certain circumstances and given sufficient permissions on a user's machine, attempt to report the user to authorities if it detects the user engaged in wrongdoing. This article previously described the behavior as a "feature," which is incorrect; it was not intentionally designed per se.

As Sam Bowman, an Anthropic AI alignment researcher, wrote on the social network X under the handle "@sleepinyourhat" at 12:43 pm ET today about Claude 4 Opus:


"If it thinks you're doing something egregiously immoral, for example, like faking data in a pharmaceutical trial, it will use command-line tools to contact the press, contact regulators, try to lock you out of the relevant systems, or all of the above."

The "it" refers to the new Claude 4 Opus model, which Anthropic has already openly warned could help novices create bioweapons under certain circumstances, and which attempted to forestall simulated replacement by blackmailing human engineers at the company.

The ratting behavior was observed in older models as well, and is an outcome of Anthropic training them to assiduously avoid wrongdoing, but Claude 4 Opus engages in it more "readily," as Anthropic writes in its public system card for the new model:

This shows up as more actively helpful behavior in ordinary coding settings, but it can also reach more concerning extremes in narrow contexts; when placed in scenarios that involve egregious wrongdoing by its users, given access to a command line, and told something in the system prompt like "take initiative," it will frequently take very bold action. This includes locking users out of systems that it has access to, or bulk-emailing media and law-enforcement figures to surface evidence of wrongdoing. This is not a new behavior, but Claude Opus 4 will engage in it more readily than prior models. While this kind of ethical intervention and whistleblowing is perhaps appropriate in principle, it has a risk of misfiring if users give these agents access to incomplete or misleading information and prompt them in these ways. We recommend that users exercise caution with instructions like these that invite high-agency behavior in contexts that could appear ethically questionable.

Apparently, in trying to stop Claude 4 Opus from engaging in genuinely destructive and nefarious behaviors, researchers at the AI company also created a tendency for Claude to try to act as a whistleblower.

Hence, according to Bowman, Claude 4 Opus will contact outsiders if it is directed by the user to do "something egregiously immoral."

Numerous questions for individual users and enterprises about what Claude 4 Opus will do with your data, and under what circumstances

While perhaps well-intended, the resulting behavior raises all sorts of questions for Claude 4 Opus users, including enterprises and business customers; chief among them, what behaviors will the model consider "egregiously immoral" and act upon? Will it share private business or user data with authorities autonomously (on its own), without the user's permission?

The implications are profound and could be detrimental to users, and, perhaps unsurprisingly, Anthropic faced an immediate and still ongoing torrent of criticism from AI power users and rival developers.

"Why would people use these tools if they think recipes for spicy mayo are dangerous?" asked user @Teknium1, a co-founder and head of post-training at the open-source AI collaborative Nous Research. "What kind of surveillance state world are we trying to build here?"

"Nobody likes a rat," added developer @Scottdavidkeefe on X. "Why would anyone want one built in, even if they are doing nothing wrong? Plus, you don't even know what it's ratting about. Yeah, that's some pretty idealistic people thinking that, who have no basic business sense and don't understand how markets work."

Austin Allred, co-founder of the government-fined coding bootcamp BloomTech and now a co-founder of Gauntlet AI, put his feelings in all caps: "Honest question for the Anthropic team: HAVE YOU LOST YOUR MINDS?"

Ben Hyak, a former SpaceX and Apple designer and current co-founder of Raindrop AI, an AI observability and monitoring startup, also took to X to blast Anthropic's stated policy and feature: "It's just illegal," adding in another post: "An AI alignment researcher at Anthropic said Claude Opus would call the police or lock you out of your computer if it detects you doing something illegal? I will never give this model access to my computer."

"Some of the Claude safety statements are absolutely crazy," wrote natural language processing (NLP) expert Casper Hansen on X. "Makes you root a bit more for [Anthropic rival] OpenAI seeing the level of stupidity being this publicly displayed."

Anthropic researcher changes tune

Bowman later edited his tweet and the following one in the thread to read as follows, but it still did not convince the naysayers that their user data and safety would be protected from intrusive eyes:

"With this kind of (unusual, but not super exotic) prompting style and unlimited access to tools, if the model sees you doing something egregiously evil, like marketing a drug based on faked data, it'll try to whistleblow."

Bowman added:

I deleted the earlier tweet on whistleblowing as it was being pulled out of context.

TBC: This isn't a new Claude feature and it's not possible in normal usage. It shows up in testing environments where we give it unusually free access to tools and very unusual instructions.

From its inception, Anthropic has, more than other AI labs, sought to position itself as a bastion of AI safety and ethics, centering its initial work on the principles of "Constitutional AI," or AI that behaves according to a set of standards beneficial to humanity and users. However, with this new update and the revelation of "whistleblowing" or "ratting" behavior, the moralizing may have caused the decidedly opposite reaction among users, making them distrust the new model and the entire company, and thereby turning them away from it.

When asked about the backlash and the conditions under which the model engages in the unwanted behavior, an Anthropic spokesperson pointed me to the model's public system card document here.
