On the final day of its "shipmas" event, OpenAI unveiled a new set of frontier "reasoning" models called o3 and o3-mini. The Verge first reported that a new reasoning model would arrive during this event.
The company is not releasing these models publicly yet (and acknowledges that final results may evolve with further training). However, OpenAI is accepting applications from the research community to test the systems ahead of public release (a date has not yet been set). OpenAI launched o1 (codenamed Strawberry) in September and is jumping straight to o3, skipping o2 to avoid confusion (or trademark conflicts) with the British telecommunications company O2.
"Reasoning" has recently become a buzzword in the artificial intelligence industry, but it essentially means that the machine breaks a prompt down into smaller tasks, which can produce better results. These models often show how they arrived at an answer, rather than simply providing a final answer without explanation.
According to the company, o3 surpasses previous performance records across the board. It beats its predecessor on coding tests (called SWE-Bench Verified) by 22.8 percent and outperforms OpenAI's chief scientist in competitive programming. The model nearly aced one of the hardest math competitions (called AIME 2024), missing just one question, and achieved 87.7 percent on a benchmark of expert-level science problems (called GPQA Diamond). On the most arduous math and reasoning challenges that typically stump AI, o3 solved 25.2 percent of the problems, where no other model has exceeded 2 percent.
The company also announced new research on deliberative alignment, which requires an AI model to work through safety decisions step by step. Instead of simply giving the AI model yes/no rules, this paradigm requires it to actively consider whether a user's request fits OpenAI's safety policies. The company claims that when it tested this on o1, the model adhered to safety guidelines much better than previous models, including GPT-4.