OpenAI released a new benchmark on Thursday that tests how its AI models perform compared with human professionals across a wide range of industries and jobs. The test, GDPval, is an early attempt to understand how close OpenAI's systems are to outperforming humans at economically valuable work, a key part of the company's founding mission to develop artificial general intelligence, or AGI.
OpenAI says that its GPT-5 model and Anthropic's Claude Opus 4.1 “are already approaching the quality of work produced by industry experts.”
This does not mean that OpenAI's models will start replacing people at work immediately. Despite some CEOs' predictions that AI will take over people's jobs within a few years, OpenAI admits that GDPval today covers only a very limited number of the tasks people perform in their real jobs. However, it is one of the latest ways in which the company is measuring AI's progress toward this milestone.
GDPval is based on the nine industries that contribute the most to America's gross domestic product, including domains such as healthcare, finance, manufacturing, and government. The benchmark tests an AI model's performance in 44 occupations within these industries, ranging from software engineers to nurses to journalists.
For the first version of the test, GDPval-v0, OpenAI asked experienced professionals to compare AI-generated reports with reports produced by other professionals, and then choose the best one. For example, one prompt asked investment bankers to create a competitor landscape for the last-mile delivery industry, and their work was compared with AI-generated reports. OpenAI then averages an AI model's “win rate” against the human reports across all 44 occupations.
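The averaging step described above can be sketched in a few lines of Python. This is only an illustration, not OpenAI's actual scoring code: the function name, the `"win"`/`"tie"`/`"loss"` labels, the sample judgments, and the choice to count ties alongside wins and to weight every occupation equally are all assumptions made for the example.

```python
from collections import defaultdict


def win_or_tie_rate(judgments):
    """Return (overall, per_occupation) rates from (occupation, verdict) pairs.

    Assumption: a verdict is "win", "tie", or "loss" from the AI's point of
    view, and the overall score is the plain mean of per-occupation rates.
    """
    outcomes = defaultdict(list)
    for occupation, verdict in judgments:
        # Count a comparison as favorable if the AI report won or tied.
        outcomes[occupation].append(1.0 if verdict in ("win", "tie") else 0.0)

    per_occupation = {
        occupation: sum(scores) / len(scores)
        for occupation, scores in outcomes.items()
    }
    overall = sum(per_occupation.values()) / len(per_occupation)
    return overall, per_occupation


# Made-up judgments for two occupations, purely for illustration.
sample = [
    ("nurse", "win"),
    ("nurse", "loss"),
    ("journalist", "tie"),
    ("journalist", "loss"),
    ("journalist", "loss"),
    ("journalist", "loss"),
]

overall, per_occupation = win_or_tie_rate(sample)
print(f"overall: {overall:.1%}")
for occupation, rate in per_occupation.items():
    print(f"{occupation}: {rate:.1%}")
```

Averaging per occupation first means an occupation with many graded tasks does not dominate the headline number; pooling all comparisons into a single rate would be an equally plausible reading of the description.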
For GPT-5-high, a souped-up version of GPT-5 with additional computing power, the company says the AI model was ranked as better than or on par with industry experts 40.6% of the time.
OpenAI also tested Anthropic's Claude Opus 4.1 model, which was ranked as better than or on par with industry experts in 49% of tasks. OpenAI says it believes Claude scored so high because of its tendency to produce pleasing graphics, rather than because of sheer performance.
It is worth noting that most working professionals do much more than submit research reports to their boss, which is all that GDPval-v0 tests for. OpenAI acknowledges this and says it plans to create more robust tests in the future that can account for more industries and interactive workflows.
Nevertheless, the company sees the progress on GDPval as notable.
In an interview with TechCrunch, OpenAI's chief economist, Dr. Aaron Chatterji, said that GDPval's results suggest people in these jobs can now use AI models to spend time on more meaningful tasks.
“[Because] the model is getting good at some of these things,” Chatterji says, “people in those jobs can now use the model, increasingly as capabilities get better, to offload some of their work and do potentially higher value things.”
OpenAI's evaluations lead, Tejal Patwardhan, tells TechCrunch that she is encouraged by the rate of progress on GDPval. OpenAI's GPT-4o model, which was released roughly 15 months ago, scored just 13.7% (wins and ties versus humans). GPT-5 now scores nearly triple that, a trend Patwardhan expects to continue.
Silicon Valley has a wide range of benchmarks that it uses to measure the progress of AI models and assess whether a given model is state of the art. Among the most popular are AIME 2025 (a test of competitive math problems) and GPQA Diamond (a test of PhD-level science questions). However, several AI models are nearing saturation on some of these benchmarks, and many AI researchers have cited the need for better tests that can measure AI's proficiency on real-world tasks.
Benchmarks such as GDPval could become increasingly important in that conversation, as OpenAI makes the case that its AI models are valuable for a wide range of industries. But OpenAI may need a more comprehensive version of the test to say definitively that its AI models can outperform humans.
