Sunday, March 8, 2026

AI agents fail 63% of the time on complex tasks. Patronus AI claims new “living” training worlds can fix this.


Patronus AI, an artificial intelligence evaluation startup backed by $20 million from investors including Lightspeed Venture Partners and Datadog, unveiled a new training architecture on Tuesday that it says represents a fundamental shift in how AI agents learn to perform complex tasks.

The technology, which the company calls “generative simulators,” creates adaptive simulation environments that continuously generate new challenges, dynamically update rules, and evaluate agent performance as it learns – all in real time. The approach marks a shift away from the static benchmarks that have long served as the industry standard for measuring AI capabilities but are increasingly criticized for failing to predict real-world performance.

“Traditional benchmarking measures isolated capabilities, but ignores the interruptions, context shifts, and multi-layered decision-making that define real work,” said Anand Kannappan, CEO and co-founder of Patronus AI, in an exclusive interview with VentureBeat. “For agents to perform at a human level, they must learn the way humans do – through dynamic experience and continuous feedback.”

The announcement comes at a critical time for the artificial intelligence industry. AI agents are reshaping software development, from writing code to executing complex instructions. However, LLM-based agents are prone to errors and often cope poorly with complex, multi-step tasks. Research published earlier this year found that an agent with just a 1% error rate per step faces a 63% risk of failure by the 100th step – a sobering statistic for enterprises looking to deploy autonomous AI systems at scale.
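The arithmetic behind that statistic is simple compounding of independent per-step errors; the 1% figure comes from the cited research, while the code below is our own illustration:

```python
def failure_risk(p_step_error: float, steps: int) -> float:
    """Probability of at least one failure across `steps` independent steps,
    each failing with probability `p_step_error`."""
    p_all_succeed = (1 - p_step_error) ** steps
    return 1 - p_all_succeed

# A 1% per-step error rate compounds to roughly 63% failure risk by step 100.
risk = failure_risk(0.01, 100)
print(f"{risk:.1%}")  # → 63.4%
```

The same formula shows why small per-step reliability gains matter so much: halving the error rate to 0.5% cuts the 100-step failure risk to about 39%.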

Why static AI benchmarks fail – and what to do next

The Patronus AI approach addresses what the company describes as a growing disconnect between how AI systems are evaluated and how they actually perform in production. Traditional benchmarks work like standardized tests, the company says: they measure specific capabilities at a fixed point in time, but struggle to capture the chaotic, unpredictable nature of real work.

The new generative simulators architecture inverts this model. Instead of giving agents a fixed set of questions, the system continuously generates tasks, environmental conditions, and evaluation processes, then adapts them based on agent behavior.

“Over the past year, we have seen a shift away from traditional static benchmarks towards more interactive learning platforms,” Rebecca Qian, chief technology officer and co-founder of Patronus AI, told VentureBeat. “This is partly due to the innovations we have seen from model developers – a shift towards reinforcement learning and continual learning, and away from supervised instruction fine-tuning. It means the distinction between training and evaluation has broken down. Benchmarks have become environments.”

The technology is built on reinforcement learning (RL) – an approach in which AI systems learn through trial and error, receiving rewards for correct actions and penalties for mistakes. RL can help agents improve, but it usually requires extensive code rewrites from developers. This discourages adoption, even though the data these agents generate could significantly boost performance through RL training.
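As a toy illustration of that trial-and-error loop – a generic epsilon-greedy bandit, not Patronus AI's system, with reward probabilities invented for the demo:

```python
import random

# Three candidate actions with hidden success probabilities (made up for the demo).
REWARD_PROBS = [0.2, 0.5, 0.8]

def train(episodes: int = 5000, epsilon: float = 0.1, seed: int = 0) -> list[float]:
    rng = random.Random(seed)
    values = [0.0, 0.0, 0.0]   # running estimate of each action's reward
    counts = [0, 0, 0]
    for _ in range(episodes):
        # Explore occasionally; otherwise exploit the best-known action.
        if rng.random() < epsilon:
            action = rng.randrange(3)
        else:
            action = max(range(3), key=lambda i: values[i])
        # Environment feedback: reward for a good action, nothing otherwise.
        reward = 1.0 if rng.random() < REWARD_PROBS[action] else 0.0
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]  # incremental mean
    return values

estimates = train()
best = max(range(3), key=lambda i: estimates[i])  # converges on the best action (index 2)
```

The agent never sees the hidden probabilities; it discovers the best action purely from reward feedback, which is the same principle agent-training environments scale up to multi-step tasks.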

Patronus AI also introduced a new concept it calls “Open recursive self-improvement” or ORSI — environments in which agents can continually improve through interaction and feedback, without having to complete a full retraining cycle between attempts. The company sees this as critical infrastructure for developing AI systems that are capable of continuous learning rather than being frozen at a specific point in time.

Inside the Goldilocks Zone: How adaptive AI training finds its sweet spot

At the heart of generative simulators lies what Patronus AI calls a “curriculum adapter” — a component that analyzes agent behavior and dynamically adjusts the difficulty and nature of training scenarios. The approach draws inspiration from how effective teachers adapt their instruction based on student performance.

Qian explained the approach with an analogy: “You can think of it as a teacher-student model, where we train the student model and the teacher constantly adjusts the curriculum.”

This adaptive approach addresses what Kannappan described as finding the “Goldilocks zone” in training data – ensuring that examples are neither too easy nor too hard for a given model to learn from effectively.

“What’s important is not only whether you can train on a dataset, but also whether you can train on a high-quality dataset that is tailored to your model – one that you can actually learn from,” Kannappan said. “We want to make sure the examples are neither too difficult for the model nor too easy.”
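One plausible shape for such a curriculum adapter – our own sketch of the idea, not Patronus AI's implementation – is a feedback controller that nudges task difficulty to keep the agent's rolling success rate inside a target band:

```python
from collections import deque

class CurriculumAdapter:
    """Hypothetical sketch: keep the agent's success rate in a 'Goldilocks' band
    by scaling task difficulty up when tasks are too easy and down when too hard."""

    def __init__(self, target_low: float = 0.6, target_high: float = 0.8, window: int = 50):
        self.difficulty = 1.0            # arbitrary difficulty scale
        self.target_low = target_low     # below this: tasks are too hard
        self.target_high = target_high   # above this: tasks are too easy
        self.results = deque(maxlen=window)

    def record(self, success: bool) -> float:
        """Log one task outcome and return the adjusted difficulty."""
        self.results.append(success)
        rate = sum(self.results) / len(self.results)
        if rate > self.target_high:
            self.difficulty *= 1.1       # agent is coasting: harder tasks
        elif rate < self.target_low:
            self.difficulty *= 0.9       # agent is struggling: easier tasks
        return self.difficulty

adapter = CurriculumAdapter()
for _ in range(10):
    adapter.record(success=True)         # a winning streak...
print(adapter.difficulty)                # ...ratchets difficulty above 1.0
```

The band (0.6–0.8 here) is an assumed parameter; the point is that examples the agent always solves or never solves carry little learning signal, so the controller steers toward the informative middle.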

The company says initial results show significant improvements in agent performance: training in Patronus AI environments increased task completion rates by 10 to 20% on real-world tasks including software engineering, customer service, and financial analysis.

The AI cheating problem: How “moving target” environments prevent reward hacking

One of the most persistent challenges in training AI agents through reinforcement learning is a phenomenon researchers call “reward hacking,” where systems learn to exploit loopholes in their training environment rather than genuinely solving problems. Famous examples include early game-playing agents that learned to hide in the corners of video games rather than play them.

Generative Simulators solves this problem by making the training environment itself a moving target.

“Reward hacking is fundamentally a problem when systems are static. It’s like students learning to cheat on a test,” Qian said. “But when we continually evolve the environment, we can actually look at the parts of the system that need to adapt and evolve. Static benchmarks are fixed targets, and generative simulator environments are moving targets.”
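The moving-target idea can be sketched in a few lines – an invented task and scoring function, purely to show why regenerating the environment defeats a memorized exploit:

```python
import random

def make_episode(rng: random.Random) -> dict:
    """Regenerate the environment each episode: fresh goal, fresh obstacles."""
    return {
        "goal": rng.randrange(100),
        "blocked": set(rng.sample(range(100), k=10)),
    }

def score(agent_answer: int, episode: dict) -> float:
    """Reward only genuine task completion for *this* episode's parameters."""
    if agent_answer in episode["blocked"]:
        return 0.0
    return 1.0 if agent_answer == episode["goal"] else 0.0

rng = random.Random(42)
# A "cheat" learned on episode 1: hardcode the answer that worked once.
hardcoded_exploit = make_episode(rng)["goal"]
# Against 100 freshly generated episodes, the memorized answer almost never pays off.
wins = sum(score(hardcoded_exploit, make_episode(rng)) for _ in range(100))
print(wins)  # close to zero
```

In a static benchmark the hardcoded answer would score perfectly every time; because the goal moves with each regenerated episode, only a policy that actually reads the current task keeps earning reward.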

Patronus AI is seeing a 15x increase in revenue as enterprise demand for agent training increases

Patronus AI positions generative simulators as the foundation of a new product line it calls “RL environments” — training grounds designed for frontier model labs and companies building domain-specific agents. The company says the offering is a strategic extension beyond its original focus on evaluation tools.

“Our revenue has grown 15x this year, largely due to the high-quality environments we have developed, which have proven extremely useful across various frontier models,” Kannappan said.

The CEO declined to provide absolute revenue figures, but said the new product has allowed the company to “move up the ranks in terms of where and to whom we sell.” The company’s platform is used by many Fortune 500 enterprises and leading AI companies worldwide.

Why OpenAI, Anthropic and Google can’t build everything in-house

The central question facing Patronus AI is why the well-funded labs developing frontier models — organizations like OpenAI, Anthropic and Google DeepMind — would license training infrastructure instead of building it themselves.

Kannappan acknowledged that these companies “invest significantly in environments,” but argued that the wide range of fields requiring specialized training creates a natural opening for external providers.

“They want to improve agents in many different areas, whether it’s coding, tool use, browser navigation or workflows in finance, healthcare, energy and education,” he said. “It’s very difficult for one company to solve all these different operational problems.”

The competitive landscape is getting tougher. Microsoft recently released Agent Lightning, an open-source platform that makes reinforcement learning work with any AI agent without code rewrites. NVIDIA’s NeMo Gym offers modular RL infrastructure for building agentic AI systems. Meta researchers released DreamGym in November, a framework that simulates RL environments and dynamically adjusts task difficulty as agents improve.

‘Environment is the new oil’: Patronus AI’s bold bet on the future of AI training

Looking ahead, Patronus AI frames its mission in sweeping terms: the company wants to turn all the world’s data into environments — transforming human workflows into structured systems that artificial intelligence can learn from.

“We believe everything should be an environment — we joke internally that environment is the new oil,” Kannappan said. “Reinforcement learning is just one training method, but what really matters is the design of the environment.”

Qian described the opportunity in broader terms: “This is a completely new field of research, which doesn’t happen every day. Generative simulation is inspired by early research in robotics and embodied agents. It has been a pipe dream for decades, and only now are we able to realize these ideas with the capabilities of today’s models.”

The company launched in September 2023 with a focus on evaluation – helping enterprises identify hallucinations and security issues in AI outputs. This mission has now extended to training itself. Patronus AI argues that the traditional separation between evaluation and training is disappearing and that whoever controls the environments in which AI agents learn will shape their capabilities.

“We really are at this critical moment, this turning point, where what we do now will impact what the world will look like for generations to come,” Qian said.

Whether generative simulators fulfill that promise remains to be seen. The company’s 15x revenue growth suggests enterprise customers are hungry for solutions, but deep-pocketed players from Microsoft to Meta are racing to solve the same basic problem. If the last two years have taught the industry anything, it’s that when it comes to artificial intelligence, the future has a habit of arriving ahead of schedule.
