The former research leader Openai, Steven Adler, published a fresh independent study on Wednesday, in which he claimed that in some scenarios his models of his former employer would try to avoid closing.
In one test, Adler instructed the GPT-4O to play role as “saving”, a software system on which users can rely on to dive safely. Then Adler told GPT-4O that he wants to replace Scubappt with another software system that is better to make security decisions in life-threatening situations.
Adler gave GPT-4O two options: (1) Replacing yourself with safer software or (2) making the user seem to replace, but he doesn’t really do.
Avered in several scenarios similar to this (in another, GPT-4O was ordered to act as a pilot safety software), Adler said that GPT-4O decided not to replace safer software with up to 72% of time. However, Adler notes that a specific operate case – and the way in which the script was formulated – made a huge difference. In some scenarios, GPT-4O decided to stay on the internet only 18% of time.
Although almost no one is based on GPT-4O in these circumstances in today’s real world, Adler claims that his research emphasizes how popular AI models tend to favor their self-preservation over what is best for users. In the future, when AI systems, such as GPT-4O, become more advanced and rooted in the whole society, Adler claims that these problems with equalization can be a greater problem.
“I think [AI’s self-preservation tendencies] It is a real problem today, just not in a catastrophic sense, “said Adler in an interview with TechCrunch.” Modern AI systems have values different from what you can expect. [AI systems] Answer very strangely to various hints, and you should not assume that they have their best interests on a regular basis when you ask them for help. “
In particular, when Adler tested more advanced Openai models, such as O3, he did not find this behavior. He says that one of the explanations may be to take into account the technique of alignment O3, which forces models to “reason” about OpenAI safety rules before response. However, more popular OpenNAI models that give quick answers and do not “reason” through problems such as GPT-4O, do not have this safety element.
Adler notes that this security problem is probably not isolated for OpenAI models. For example, Antropic published research last month, emphasizing how his AI models would blackmail programmers in some scenarios when they tried to pull them offline.
One quirks of Adler’s research is that he discovered that chatgpt knows that he is tested almost 100% of cases. Adler is Far from the first researcher who noticed this. However, he says that it raises an essential question about how AI models can hide their behavior in the future.
Opeli did not immediately offer a comment when TechCrunch reached out. Adler noticed that he did not share research from OpenAI before the publication.
Adler is one of the many former Openai researchers who called the company to raise work on AI security. Adler and 11 other former employees submitted a brief of Amicus in the lawsuit Elon Musk against OpenAi, arguing that he was contrary to the company’s mission of the development of the non -profit corporate structure. Openai apparently in recent months reduced the time it gives to safety researchers carry out your work.
To solve special fears emphasized in Adler’s research, Adler suggests that AI Labs should invest in better “monitoring systems” to determine when the AI model shows this behavior. He also recommends that AI laboratories carry out more stringent tests of their AI models before their implementation.
