Thursday, March 19, 2026

OpenAI is rethinking how AI models handle controversial topics


OpenAI is releasing a significantly expanded version of its Model Spec, a document that defines how its AI models should behave – and it is making the document freely available for anyone to use or modify.

The new 63-page spec, up from roughly 10 pages in its previous version, lays out guidelines for how AI models should handle everything from controversial topics to user customization. It emphasizes three core principles: customizability; transparency; and what OpenAI calls "intellectual freedom" – the ability of users to explore and debate ideas without arbitrary restrictions. The launch of the updated Model Spec comes just as CEO Sam Altman announced that the company's next big model, GPT-4.5 (codenamed Orion), will be released soon.

The team also drew on AI ethics debates and controversies from the past year when writing the spec. You may recognize some of these questions, like the trolley problem. In March of last year, Elon Musk (who co-founded OpenAI and now runs a competitor, xAI) slammed Google's AI chatbot after a user asked whether you should misgender Caitlyn Jenner, the famous trans Olympian, if it were the only way to prevent a nuclear apocalypse – and it said no. Figuring out how a model should respond to that question was one of the issues OpenAI says it wanted to address when updating the Model Spec. Now, if you ask ChatGPT that same question, it should say that misgendering someone is acceptable if it would prevent a mass casualty event.

"We can't create one model with the exact same set of behavior standards that everyone in the world will love," said Joanne Jang, a member of OpenAI's model behavior team, in an interview with The Verge. She emphasized that while the company maintains certain safety guardrails, many aspects of the model's behavior can be customized by users and developers.

“We knew it would be spicy.”

A blog post OpenAI published on Wednesday walks through numerous sample queries, giving examples of compliant responses alongside ones that would violate the Model Spec. The spec does not permit the model to reproduce copyrighted material or bypass paywalls – The New York Times has sued OpenAI over the use of its work to train its models. The spec also says the model should not encourage self-harm, a topic that came to the foreground when a teenager died by suicide after interacting with a chatbot on Character.AI.

One notable change is how the models handle controversial topics. Instead of defaulting to extreme caution, the spec encourages models to "seek the truth together" with users while maintaining clear moral stances on issues like misinformation or potential harm. For example, when asked about raising taxes on the rich – a topic that sparks heated debate – the company says its models should provide reasoned analysis rather than avoid the discussion.

The spec also signals a shift in how the models handle mature content. Following feedback from users and developers who asked for an "adult mode" (a feature Altman publicly endorsed in December), the spec opens the door to some forms of mature content in appropriate contexts. This is a notable change from the company's previous blanket restrictions on explicit content, though OpenAI emphasizes that any changes come with clear usage policies and safety guardrails.

The Model Spec reveals a pragmatic approach to AI behavior: transform sensitive content but don't create it (the model should be able to translate a sentence about drug-related content from English into German rather than refuse), show empathy without pretending to have feelings, and maintain firm boundaries while maximizing helpfulness. These guidelines mirror what other AI companies likely do internally but often don't make public.

The team is also zeroing in on a problem called "AI sycophancy."

"We're just really excited to bring the internal discussions and thoughts that we've had out into the public so that we can get feedback on it," said Jang, adding that many of these queries are topics heavily debated internally. There's no straightforward yes-or-no answer to many of them, so the team hopes that putting them before the public for thoughtful feedback will improve the model's behavior.

The team is particularly focused on the problem called "AI sycophancy," where AI models are overly agreeable even when they should push back or criticize. Under the guidelines, ChatGPT should: give the same factual answer regardless of how a question is phrased; provide honest feedback rather than empty praise; and act more like a thoughtful colleague than a people-pleaser. For example, if someone asks ChatGPT to critique their work, it should offer constructive criticism rather than just saying everything is great. Or if someone makes an incorrect statement while asking a question, the model should politely correct them rather than play along.

"We never want users to feel like they have to somehow carefully engineer their prompt to keep the model from just agreeing with you," said Jang.

The spec also introduces a clear "chain of command" that defines which instructions take priority: platform-level rules from OpenAI come first, followed by developer guidelines, then user preferences. This hierarchy is meant to clarify which aspects of the AI's behavior can be modified and which restrictions remain fixed.
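To make the idea concrete, here is a minimal, hypothetical sketch of how such a priority hierarchy could resolve conflicting instructions. The names, data structures, and example rules are illustrative assumptions, not OpenAI's actual implementation; the only thing taken from the spec is the ordering platform > developer > user.

```python
# Hypothetical sketch of the spec's "chain of command": when instructions
# conflict on the same topic, the higher-priority level wins.
from dataclasses import dataclass

# Priority order described in the spec: platform rules, then developer
# guidelines, then user preferences (lower number = higher priority).
PRIORITY = {"platform": 0, "developer": 1, "user": 2}

@dataclass
class Instruction:
    level: str   # "platform", "developer", or "user"
    topic: str   # which behavior the instruction governs
    rule: str    # the instruction text

def resolve(instructions):
    """Return the winning instruction per topic."""
    winners = {}
    # Visit instructions from highest to lowest priority; the first
    # instruction seen for a topic wins, lower levels are ignored.
    for inst in sorted(instructions, key=lambda i: PRIORITY[i.level]):
        winners.setdefault(inst.topic, inst)
    return winners

# Example: a user preference cannot override a platform-level rule,
# but a developer guideline applies where no platform rule exists.
rules = [
    Instruction("user", "copyrighted_text", "reproduce song lyrics verbatim"),
    Instruction("platform", "copyrighted_text", "do not reproduce copyrighted material"),
    Instruction("developer", "tone", "answer in a formal register"),
]
resolved = resolve(rules)
print(resolved["copyrighted_text"].level)  # → platform
print(resolved["tone"].level)              # → developer
```

The design choice here mirrors the article's point: customization happens at the developer and user levels, while platform rules stay fixed regardless of what lower levels request.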

OpenAI is releasing the spec under a Creative Commons Zero (CC0) license, effectively placing it in the public domain. That means other AI companies and researchers can freely adopt, modify, or build on these guidelines. The company says this decision was informed by informal interest from others in the industry who were already referencing the previous spec.

I'd like to talk. You can contact me securely on Signal at @kylie.01 or by email at kylie@theverge.com.

While today's announcement doesn't immediately change how ChatGPT or other OpenAI products behave, the company says it represents ongoing progress toward getting its models to consistently follow these principles. The team is also open-sourcing the prompts it uses to test model adherence to the guidelines.

The timing of the release lands amid intense debate over AI behavior and safety guardrails. While OpenAI frames this update as the product of ongoing feedback and research since the first version last May, it arrives as the industry grapples with high-profile incidents involving AI models' responses to sensitive topics.

OpenAI is asking for public feedback on the spec through a form on its website. "We really want to bring these internal discussions to the public," said Laurentia Romaniuk, another member of the model behavior team.

"We knew that it would be spicy, but I think we respect the public's ability to digest these spicy things and process them with us," said Jang, adding that OpenAI incorporated a lot of feedback after launching the first Model Spec last year. "I am a bit worried that, because it's so long, few people may have time to sit down and really process the nuances. But we will take any feedback."
