Saturday, April 19, 2025

OpenAI's new reasoning AI models hallucinate more


OpenAI's recently launched o3 and o4-mini models are state-of-the-art in many respects. However, the new models still hallucinate, that is, make things up; in fact, they hallucinate more than several of OpenAI's older models.

Hallucinations have proven to be one of the biggest and most difficult problems to solve in artificial intelligence, affecting even today's most capable systems. Historically, each new model has improved slightly in the hallucination department, hallucinating less than its predecessor. But that does not appear to be the case for o3 and o4-mini.

According to OpenAI's internal tests, o3 and o4-mini, which are so-called reasoning models, hallucinate more often than the company's previous reasoning models (o1, o1-mini, and o3-mini) as well as OpenAI's conventional, non-reasoning models, such as GPT-4o.

Perhaps more concerning, the ChatGPT maker doesn't really know why this is happening.

In its technical report for o3 and o4-mini, OpenAI writes that "further research is needed" to understand why hallucinations get worse as reasoning models are scaled up. o3 and o4-mini perform better in some areas, including coding and math tasks. But because they "make more claims overall," they tend to produce "more accurate claims as well as more inaccurate/hallucinated claims," according to the report.

OpenAI found that o3 hallucinated in response to 33% of questions on PersonQA, the company's in-house benchmark for measuring the accuracy of a model's knowledge about people. That is roughly double the hallucination rate of OpenAI's previous reasoning models, o1 and o3-mini, which scored 16% and 14.8%, respectively. o4-mini did even worse on PersonQA, hallucinating 48% of the time.

Third-party testing by Transluce, a nonprofit AI research lab, also found evidence that o3 tends to make up actions it supposedly took while arriving at an answer. In one example, Transluce observed o3 claiming that it ran code on a 2021 MacBook Pro "outside of ChatGPT" and then copied the numbers into its answer. While o3 has access to some tools, it cannot do that.

Sarah Schwettmann, co-founder of Transluce, added that o3's hallucination rate may make it less useful than it otherwise would be.

Kian Katanforoosh, a Stanford adjunct professor and CEO of the upskilling startup Workera, told TechCrunch that his team is already testing o3 in its coding workflows and has found it to be a step above the competition. However, Katanforoosh says o3 tends to hallucinate broken website links: the model will supply a link that doesn't work when clicked.

Hallucinations may help models arrive at interesting ideas and be creative in their "thinking," but they also make some models a tough sell for businesses in markets where accuracy is paramount. A law firm, for example, likely wouldn't be pleased with a model that inserts lots of factual errors into client contracts.

One promising approach to boosting model accuracy is giving models web search capabilities. OpenAI's GPT-4o with web search achieves 90% accuracy on SimpleQA, another of OpenAI's accuracy benchmarks. Potentially, search could improve reasoning models' hallucination rates as well, at least in cases where users are willing to expose their prompts to a third-party search provider.
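For readers who want to try this themselves, below is a minimal sketch of querying an OpenAI model with web search turned on, assuming the openai Python SDK's Responses API and its "web_search_preview" tool; the exact tool name, model string, and response fields may differ by SDK version, and the question shown is just an illustrative placeholder.

# Minimal sketch: ask a question with web search enabled so the model can
# ground its answer in live search results rather than relying on memory.
# Assumes the openai Python SDK's Responses API; names may vary by version.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-4o",
    tools=[{"type": "web_search_preview"}],  # allow the model to search the web
    input="Who is the current CEO of Workera?",  # placeholder, person-knowledge style question
)

# output_text collects the model's text output, which may cite search results
print(response.output_text)

Note that routing prompts through a search tool means they are shared with the search provider, which is the privacy trade-off mentioned above.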

If scaling up reasoning models does indeed continue to worsen hallucinations, it will make the hunt for a solution all the more urgent.

"Addressing hallucinations across all our models is an ongoing area of research, and we're continually working to improve their accuracy and reliability," said OpenAI spokesperson Niko Felix in an email to TechCrunch.

In the past year, the broader AI industry has pivoted toward reasoning models after techniques for improving conventional AI models began showing diminishing returns. Reasoning improves model performance on a variety of tasks without requiring massive amounts of compute and data during training. Yet it seems reasoning may also lead to more hallucination, and that presents a challenge.
