Generative AI in software engineering has gone far beyond autocomplete. The emerging frontier is agentic coding: AI systems capable of planning changes, executing them in multiple stages, and iterating on feedback. Yet despite the enthusiasm around “AI agents that code,” most enterprise implementations are underperforming. The limiting factor is no longer the model. It is context: the structure, history, and intent of the code being changed. In other words, enterprises currently face a systems design problem: they have not yet designed the environment in which these agents operate.
From assistance to agency
The past year has seen a rapid evolution from assistive coding tools to agentic workflows. Researchers have begun to explore what agentic behavior means in practice: the ability to reason at the level of design, testing, execution, and validation, rather than generating isolated fragments. Work such as dynamic action resampling shows that allowing agents to branch, reconsider, and revise their own decisions significantly improves performance in large, interdependent codebases. At the platform level, vendors like GitHub are now building dedicated agent orchestration environments, such as Copilot's agent mode and Agent HQ, to support multi-agent collaboration in real enterprise processes.
However, early field results are cautionary. When organizations introduce agentic tools without rethinking the workflow and environment, productivity can decline. A randomized controlled trial this year found that developers who used AI assistance within unchanged workflows completed tasks more slowly, largely due to verification, rework, and confusion about intent. The lesson is simple: autonomy without orchestration rarely delivers efficiency.
Why context engineering is the real unlock
In every failed implementation I’ve seen, the root cause was context. When agents lack a structured understanding of the codebase, particularly the relevant modules, dependency graph, test harness, architectural conventions, and change history, they generate results that seem correct but are disconnected from reality. Too much information overwhelms the agent; too little leaves it guessing. The goal is not to feed the model more tokens. The goal is to define what should be observable to the agent, when, and in what form.
Teams seeing significant benefits treat context as an engineering surface. They build tooling for snapshotting, compacting, and versioning the agent’s working memory: what persists across subsequent rounds, what is discarded, what is summarized, and what is condensed rather than inserted wholesale. They design structured stages of work, not ad hoc prompt sessions. They make the spec a first-class artifact, something you can review, test, and own, rather than a transient chat history. This shift is part of a broader trend that some researchers call “specifications becoming the new source of truth.”
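To make the idea concrete, here is a minimal sketch of working memory treated as a versioned, compactable artifact rather than an ever-growing transcript. All names here (ContextSnapshot, compact, the pinned/working split) are illustrative assumptions, not any vendor's API:

```python
from dataclasses import dataclass

@dataclass
class ContextSnapshot:
    version: int
    pinned: list[str]   # spec, architectural conventions: always persisted
    working: list[str]  # recent tool output and diffs: candidates for compaction

    def compact(self, summarize, budget: int) -> "ContextSnapshot":
        """Summarize older working memory once it exceeds a budget,
        keeping the pinned context (the spec) intact and reviewable."""
        if len(self.working) <= budget:
            return self
        older, recent = self.working[:-budget], self.working[-budget:]
        summary = summarize(older)  # e.g. an LLM-generated digest
        return ContextSnapshot(
            version=self.version + 1,   # versioning makes compaction auditable
            pinned=self.pinned,
            working=[summary] + recent,
        )

snap = ContextSnapshot(1, ["SPEC: retry transient writes with backoff"],
                       ["ran tests: 2 failing", "patched retry.py", "tests green"])
snap2 = snap.compact(lambda items: "digest: " + "; ".join(items), budget=2)
print(snap2.version, snap2.working)
```

The design choice worth noting is that the spec never gets summarized away: it is pinned, while transient tool output is what gets compacted, and every compaction bumps a version so reviewers can see what the agent was looking at in any round.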
The workflow must change with the tooling
But context alone is not enough. Enterprises must redesign workflows around these agents. As McKinsey’s 2025 report, “Year of Agentic Artificial Intelligence,” noted, productivity gains come not from overlaying AI on top of existing processes, but from rethinking the process itself. When teams simply plug an agent into an unchanged workflow, friction arises: engineers spend more time validating AI-written code than they would have spent writing it. Agents can only amplify what is already structured: well-tested, modular codebases with clear ownership and documentation. Without these foundations, autonomy becomes chaos.
Security and governance also require a shift in thinking. AI-generated code introduces fresh forms of risk: untested dependencies, subtle license violations, and undocumented modules that defy peer review. Mature teams are beginning to integrate agent activity directly into their CI/CD pipelines, treating agents as autonomous collaborators whose work must pass the same static analysis, audit logging, and approval gates as any human developer’s. GitHub’s own documentation highlights this trajectory, positioning Copilot agents not as replacements for engineers but as structured participants in secure, reviewable workflows. The goal is not to let the AI “write everything,” but to ensure that when it does act, it does so within clear guardrails.
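One way such a gate could look in practice is sketched below: agent-authored changes flow through the same checks as human ones, with a stricter approval threshold. The PR record format, author names, and thresholds are assumptions for illustration, not any real CI provider's API:

```python
# Hypothetical merge gate: agent-authored PRs pass the same checks as
# human-authored ones, but require an extra human approval.
AGENT_AUTHORS = {"copilot-agent", "refactor-bot"}  # illustrative identities

def merge_gate(pr: dict) -> tuple[bool, str]:
    """Return (allowed, reason) for merging a pull request record."""
    if not pr["checks_passed"]:
        return False, "static analysis or tests failed"
    required = 2 if pr["author"] in AGENT_AUTHORS else 1
    if pr["human_approvals"] < required:
        return False, f"needs {required} human approvals, has {pr['human_approvals']}"
    return True, "ok"

print(merge_gate({"author": "copilot-agent",
                  "checks_passed": True, "human_approvals": 1}))
```

The point of the sketch is the symmetry: the agent is not exempt from review, it simply enters the same pipeline under a policy that can be audited and tuned like any other configuration.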
What should corporate decision-makers focus on now?
For tech leaders, the path forward starts with readiness, not hype. Monoliths with limited testing rarely produce net gains; agents thrive where tests are reliable and can drive iterative improvement, the kind of feedback loop Anthropic describes for effective coding agents. Pilot in narrow domains (test generation, legacy modernization, isolated refactorings); treat each deployment as an experiment with clear metrics (defect escape rate, PR cycle time, change failure rate, security regressions). As your use of agents grows, treat them like data infrastructure: every plan, context snapshot, activity log, and test run is data that contributes to a searchable memory of engineering intent and a lasting competitive advantage.
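Those pilot metrics can be computed from ordinary change records. A minimal sketch, assuming a simple log format of one dict per merged change (the field names are illustrative):

```python
from datetime import datetime

# Hypothetical change log from an agent pilot; each record is one merged PR.
changes = [
    {"opened": "2025-03-01", "merged": "2025-03-02",
     "failed_in_prod": False, "defect_escaped": False},
    {"opened": "2025-03-01", "merged": "2025-03-05",
     "failed_in_prod": True, "defect_escaped": True},
]

def days_between(a: str, b: str) -> int:
    return (datetime.fromisoformat(b) - datetime.fromisoformat(a)).days

# PR cycle time: mean days from opening to merge.
pr_cycle_time = sum(days_between(c["opened"], c["merged"]) for c in changes) / len(changes)
# Change failure rate: share of changes that caused a production failure.
change_failure_rate = sum(c["failed_in_prod"] for c in changes) / len(changes)
# Defect escape rate: share of changes whose defect slipped past review and tests.
defect_escape_rate = sum(c["defect_escaped"] for c in changes) / len(changes)

print(pr_cycle_time, change_failure_rate, defect_escape_rate)  # 2.5 0.5 0.5
```

Comparing these numbers between agent-assisted and baseline changes is what turns each deployment into an experiment rather than an act of faith.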
In fact, agentic coding is not a tooling problem but a data problem. Each context snapshot, test iteration, and code revision becomes a form of structured data that must be stored, indexed, and reused. As these agents proliferate, enterprises will need to manage an entirely new layer of data: one that records not only what was built, but how it was justified. This shift transforms engineering logs into a knowledge graph of intent, decision-making, and validation. Over time, organizations that can query and retrieve this contextual memory will outpace those that still treat code as static text.
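What "querying contextual memory" could mean in its simplest form is sketched below: each change is logged with its intent and validation evidence, then retrieved by topic. The record schema and the naive keyword match are assumptions; a production system would more plausibly use a vector or graph index:

```python
# Hypothetical intent log: each entry records what changed, why, and
# what evidence validated it.
records: list[dict] = []

def log_decision(change_id: str, intent: str, evidence: list[str]) -> None:
    """Persist not just the diff, but the reasoning and validation behind it."""
    records.append({"change": change_id, "intent": intent, "evidence": evidence})

def query_intent(term: str) -> list[str]:
    """Retrieve change IDs whose recorded intent mentions a term."""
    return [r["change"] for r in records if term.lower() in r["intent"].lower()]

log_decision("PR-101", "Retry transient S3 failures with backoff",
             ["test_retry.py passed", "load test run"])
log_decision("PR-102", "Split billing module to reduce coupling",
             ["architecture review approved"])

print(query_intent("retry"))  # → ['PR-101']
```

Even this toy version shows the payoff: a new engineer (or a new agent) can ask why a change was made and get the original intent and evidence back, instead of reverse-engineering it from a diff.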
The coming year will likely determine whether agentic coding becomes a cornerstone of enterprise growth or another overpromise. The difference will come down to context engineering: how intelligently teams design the information substrate their agents rely on. The winners will be those who see autonomy not as magic but as an extension of disciplined systems design: crystal-clear workflows, measurable feedback, and rigorous governance.
Conclusion
Platforms are converging on orchestration and guardrails, and research continues to improve context control at inference time. The winners over the next 12 to 24 months won’t be the teams with the flashiest models; they will be the ones who treat context as an engineered resource and workflow as a product. Do this, and autonomy compounds. Skip it, and the review queue compounds instead.
Context + agent = leverage. Skip the first half and the rest falls apart.
Dhyey Mavani is accelerating the development of generative artificial intelligence at LinkedIn.
Read more from our guest writers. Or, consider submitting a post of your own! See our guidelines here.
