Tuesday, March 10, 2026

Anthropic claims to have solved a long-standing AI agent problem with its new multi-session Claude Agent SDK


Agent memory remains a problem that companies want to solve, because agents tend to forget instructions or earlier conversations the longer they work on a task.

Anthropic believes it has solved this problem for its Claude Agent SDK by developing a two-part solution that allows an agent to work across separate context windows.

“The main challenge for long-running agents is that they have to work in separate sessions, and each new session starts with no memory of what came before,” Anthropic wrote in a blog post. “Because context windows are limited and because most complex projects cannot be completed in a single window, agents need a way to bridge the gap between coding sessions.”

Anthropic engineers proposed a two-part approach for the Agent SDK: an initializer agent that sets up the environment, and a coding agent that makes incremental progress in each session and leaves artifacts for the next one.

The agent memory problem

Because agents are built on foundation models, their behavior is constrained by finite, though steadily growing, context windows. For long-running agents, this creates a larger problem, leading to forgotten instructions and erratic behavior while performing a task. Improving agent memory becomes crucial for consistent, reliable operation.

Several methods have emerged over the past year, all attempting to bridge the gap between context windows and agent memory. LangChain’s LangMem SDK, memory databases, and OpenAI’s Swarm are examples of companies offering memory solutions. Research on agentic memory has also recently exploded, with proposed frameworks such as Memp and Google’s nested learning paradigm offering new alternatives for improving memory.

Many current memory frameworks are open source and can be adapted to the various agents powered by large language models (LLMs). Anthropic’s approach builds on its Claude Agent SDK.

How it works

Anthropic stated that while the Claude Agent SDK has context-management capabilities and “should allow the agent to continue useful work for an arbitrary length of time,” this was not sufficient. The company said in its blog post that even a model like Opus 4.5 running the Claude Agent SDK may “not be enough to build a production-quality web application if you only get a high-level prompt, such as ‘create a claude.ai clone.’”

Anthropic found that failure manifested in two patterns. In the first, the agent tried to do too much and ran out of context mid-task; it then had to guess what had happened and could not pass clear instructions to the next agent. The second failure occurred later, once some features had already been built: the agent saw the progress and simply declared the task completed.

Anthropic’s researchers came up with a solution: set up an initial environment that lays the groundwork for functionality, and encourage each agent to make incremental progress toward the goal while leaving a clean state at the end of each session.

This is where Anthropic’s two-part solution comes into play. The initializer agent sets up the environment, recording what the agents have done and which files have been added. The coding agent then prompts the model to make incremental progress and leave structured updates.
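The session structure described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the Claude Agent SDK’s actual API: the `progress.json` artifact file, the agent functions, and the handoff format are all assumptions invented for the example.

```python
import json
from pathlib import Path

# Hypothetical artifact file left between sessions; not an actual SDK feature.
PROGRESS = Path("progress.json")

def initializer_agent(goal: str) -> None:
    """One-time setup: record the goal and an initial task list for later sessions."""
    if not PROGRESS.exists():
        PROGRESS.write_text(json.dumps(
            {"goal": goal, "completed": [], "next": "scaffold project"}))

def coding_agent_session() -> dict:
    """One bounded session: read prior state, make one increment of progress,
    and leave a structured update for the next session."""
    state = json.loads(PROGRESS.read_text())
    task = state["next"]
    # ...in a real agent, an LLM call with a fresh context window would do `task` here...
    state["completed"].append(task)
    state["next"] = f"continue after: {task}"  # structured handoff note
    PROGRESS.write_text(json.dumps(state))
    return state

initializer_agent("build a web app")
for _ in range(3):  # three separate sessions, each starting with no in-context memory
    state = coding_agent_session()
```

The key design point is that no session depends on the model remembering anything: all continuity lives in the on-disk artifacts, so each fresh context window can pick up exactly where the last one left off.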

“The inspiration for these practices came from observing what effective software engineers do every day,” Anthropic said.

The researchers said they added testing tools to the coding agent, improving its ability to identify and fix bugs that weren’t obvious from the code itself.
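A minimal sketch of that feedback loop, assuming the agent can shell out to a verification command (the function name and the `pytest` example are illustrative, not part of the SDK):

```python
import subprocess

def run_check(cmd: list[str]) -> bool:
    """Run a verification command (test suite, linter, end-to-end check) and
    report pass/fail. An agent can treat a nonzero exit code as a signal that
    a bug exists before it declares progress or hands off to the next session."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0

# Any shell-level check works; a real coding agent might run e.g. ["pytest", "-q"].
ok = run_check(["python", "-c", "print('smoke test')"])
```

Wiring a check like this into each session gives the agent an objective signal about bugs that are not visible from reading the code alone.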

Future research

Anthropic noted that its approach is “one possible set of solutions” to the long-running agent problem. It is only a starting point, however, and may grow into a broader research area for many people working in artificial intelligence.

The company said its experiments in extending agents’ long-term memory did not establish whether a single general-purpose coding agent or a multi-agent structure performs best across different contexts.

The demo also focused on building full-stack web applications, so further experiments should focus on generalizing the results to other tasks.

“It is likely that some or all of these conclusions can be applied to the types of long-term agentic tasks required in, for example, scientific research or financial modeling,” Anthropic said.
