How Agentic Coding Is Changing Software Delivery

Author: Dusan Salovic, Data Scientist

8.6.2026

Writing code is no longer the bottleneck; trust is. AI can generate, test and ship faster than humans can review. The winners will not be those who create the most code, but those who turn AI output into trusted, observable software at scale. That requires governance and judgment, not just speed.

A development team uses agentic coding to automate and optimise modern software development in a collaborative working environment.

Something quietly shifted in software development over the past few years, and most organisations are still catching up with what it means.

Writing code used to be the bottleneck. Not anymore. A capable AI agent can now inspect a codebase, propose a plan, change multiple files, run tests, and iterate, often before a human engineer has worked through the full requirements. The constraint that shaped how software teams were structured, how projects were scoped and how delivery timelines were estimated has fundamentally weakened. The next bottleneck in software development is not speed, tooling, or even talent. It is trust.

The hardest problem is no longer producing software-shaped output. What remains genuinely difficult is turning intent into correct, safe, observable behaviour in systems that actually matter.

Why the trust gap will not close on its own

The market signal is clear enough that this can no longer be dismissed as hype. Stack Overflow’s 2025 Developer Survey found that 84% of respondents are already using or planning to use AI tools in their development process and 51% of professional developers report daily use. Yet trust is lagging badly: more developers actively distrust the accuracy of AI output (46%) than trust it (33%).

That gap is not a temporary growing pain. It is structural.

AI systems generate code by predicting statistically likely continuations based on large amounts of training data. They are extremely good at reproducing familiar patterns: standard APIs, common business logic, template-based components. Problems appear when deep understanding of context and consequences is required: distributed systems with complex fault-tolerance needs, business logic with subtle constraints, edge cases that only surface under real load.

An AI agent has no concept of what an hour of downtime costs in terms of revenue, reputation, or contractual obligations. The gap between code that satisfies the conditions it was tested against and behaviour that actually serves the system’s purpose is exactly where the trust issue lies.

Consider a loyalty discount feature built by an agent. The spec said “20% off for loyalty members.” The agent implemented exactly that. What the spec never defined was whether the discount should stack with active promotional codes. In production, during a high-traffic campaign, the two discounts did stack. The tests had passed. The business logic had not.
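
The loyalty discount case can be made concrete with a minimal sketch. Everything beyond the 20% rate is an assumption for illustration: the function name final_price, the promo_rate parameter and the allow_stacking flag are hypothetical, and "apply only the larger discount" is just one possible non-stacking policy. The point is that the stacking decision is stated and tested rather than left to whoever implements the spec.

```python
from decimal import Decimal

LOYALTY_RATE = Decimal("0.20")  # from the spec: 20% off for loyalty members


def final_price(base: Decimal, is_loyalty_member: bool,
                promo_rate: Decimal = Decimal("0"),
                allow_stacking: bool = False) -> Decimal:
    """Return the price after discounts, rounded to cents.

    allow_stacking is a hypothetical policy flag. The original spec never
    said whether loyalty and promo discounts combine, so the choice is an
    explicit parameter here instead of an accident of implementation.
    """
    loyalty_rate = LOYALTY_RATE if is_loyalty_member else Decimal("0")
    if allow_stacking:
        # Apply both discounts one after the other.
        price = base * (1 - loyalty_rate) * (1 - promo_rate)
    else:
        # Assumed policy: apply only the larger of the two discounts.
        price = base * (1 - max(loyalty_rate, promo_rate))
    return price.quantize(Decimal("0.01"))


# The check the original spec implied; it passes under either policy.
assert final_price(Decimal("100"), True) == Decimal("80.00")

# The checks the spec never asked for; they pin the stacking decision down.
assert final_price(Decimal("100"), True, Decimal("0.30")) == Decimal("70.00")
assert final_price(Decimal("100"), True, Decimal("0.30"),
                   allow_stacking=True) == Decimal("56.00")
```

With only the first assertion in place, both behaviours are "correct", which is how a feature can pass its tests and still fail the business.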

Understanding this distinction is the first job of leadership. AI does not need to be perfect to be useful. But it does need to be governed.

The bigger shift: implementation becomes orchestration

The classic software delivery model was built around a simple scarcity: human implementation time. Requirements moved to design, design moved to engineering, engineering moved to QA, QA eventually moved to operations. Every handoff created delay, but output was bounded by how much code humans could write and review.

When agentic systems enter the workflow, that constraint weakens. Generation scales far faster than organisational confidence.

Think of it as two distinct loops. One is the loop of figuring out what to build and what good looks like: the goals, the outcomes, the decisions about direction. That loop stays human. The other is the loop of actually building it: creating the code, tests, tools and infrastructure that make the outcome real. That second loop is increasingly what agents handle. The gap between those loops is where most failures occur: intent is stated loosely, the agent implements it literally, and the result technically works but does not serve the purpose.

The engineer’s job shifts from working inside the build loop to working on it: designing the conditions under which the agent operates well, rather than executing each step personally. The scarce skill is no longer typing code. The scarce skill is converting messy human intent into explicit instructions, constraints, checks and rollout conditions. Requirements, in other words, can no longer afford to be approximate.

This has real implications for how leaders think about talent and team structure. As lower and mid-level implementation tasks become more automated, demand shifts toward engineers who can review generated output and reason about system-level consequences. Ticket ownership becomes system ownership. Coding skill expands into planning skill.

Governance is now a competitiveness issue

Many organisations still frame AI adoption as a policy question: allow it or ban it. That framing is too shallow. The real question is whether AI use will happen in a governed way or in the shadows.

Shadow AI is now widely recognised. It refers to the unsanctioned use of AI tools without formal IT approval or oversight. A 2025 WalkMe survey of 1,000 working adults found that 78% of employees admit to using AI tools not approved by their employer. A separate UpGuard report found that over 80% of workers, including nearly 90% of security professionals, use unapproved AI tools in their jobs.

Hard bans often do not eliminate use. They simply remove visibility and governance from something the workforce already finds useful.

Governed enablement is usually the stronger approach

Approve a sanctioned stack. Define where AI may and may not be used. Restrict access to sensitive repositories and data flows. Specify review obligations. Log actions. Train teams on safe patterns. Competitiveness and compliance increasingly point to the same move: controlled adoption with clear rails.
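
As a rough illustration, the rails listed above (a sanctioned stack, restricted repositories, review obligations, logging) could be expressed as a policy check that runs before an agent-made change is accepted. This is a sketch under stated assumptions, not a description of any real tool: AiUsagePolicy, AgentChange, check_change and the example tool and repository names in the usage note are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AiUsagePolicy:
    sanctioned_tools: frozenset[str]   # the approved stack
    restricted_repos: frozenset[str]   # repositories agents may not touch
    required_human_reviews: int = 1    # review obligation per change


@dataclass(frozen=True)
class AgentChange:
    tool: str
    repo: str
    human_reviews: int


def check_change(policy: AiUsagePolicy, change: AgentChange,
                 audit_log: list[str]) -> list[str]:
    """Return the policy violations for a change and log the decision."""
    violations = []
    if change.tool not in policy.sanctioned_tools:
        violations.append(f"tool '{change.tool}' is not on the sanctioned list")
    if change.repo in policy.restricted_repos:
        violations.append(f"repo '{change.repo}' is off limits for AI agents")
    if change.human_reviews < policy.required_human_reviews:
        violations.append("not enough human reviews")
    # Every decision is logged, allowed or not, so usage stays visible.
    outcome = "blocked" if violations else "allowed"
    audit_log.append(f"{change.tool} -> {change.repo}: {outcome}")
    return violations
```

A change made with an approved tool in an unrestricted repository and with one human review returns an empty list; anything else returns the reasons it was blocked, and both outcomes land in the audit log.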

The business case is direct: ungoverned AI code that reaches production creates revenue exposure through incidents, compliance violations, and eroded customer trust. Governed adoption, by contrast, shortens time-to-market by removing the friction of ad hoc review, while keeping operational stability intact.

Governance is not a brake on innovation. It is what keeps innovation from turning into untraceable operational risk.

The SDLC is becoming an intent-to-release system

In the classic software life cycle, requirements, implementation, QA and operations were separated both organisationally and in time. Teams could afford that separation because humans were able to keep the whole picture in mind across handoffs. In an agentic life cycle, those separations become liabilities.

The new loop looks like this: clarify the outcome, state the non-goals, define constraints, turn acceptance criteria into executable checks, let the agent generate within those boundaries, validate aggressively, release behind flags, observe behaviour and feed learning back. Written out that way, it sounds straightforward. But what changes underneath is significant.

Requirements become executable

In the classic model, a product manager could write “the user should be able to filter by date range” and leave it to an engineer to figure out what edge cases that implies. An agent will implement exactly what the spec says and nothing else. If the spec does not say what happens when the start date is after the end date, the agent will make a choice, probably the wrong one. Acceptance criteria that a human engineer might interpret charitably become literal contracts. The PM who writes vague requirements is no longer slowing down an engineer. They are misconfiguring a system.
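
A minimal sketch of what "executable" can mean for the date range example: the acceptance criteria are written as checks that run, including the reversed-range case. The function name filter_by_date_range, the record shape, the inclusive bounds and the decision to reject a reversed range with an error are all assumptions made for illustration; a real spec might choose differently, as long as it chooses.

```python
from datetime import date


def filter_by_date_range(records, start: date, end: date):
    """Return the (day, payload) records with start <= day <= end.

    Assumed criteria: both bounds are inclusive, and a start date after
    the end date is rejected rather than silently returning nothing.
    """
    if start > end:
        raise ValueError("start date must not be after end date")
    return [record for record in records if start <= record[0] <= end]


records = [
    (date(2025, 1, 1), "a"),
    (date(2025, 1, 15), "b"),
    (date(2025, 2, 1), "c"),
]

# Criterion 1: both boundaries are included.
assert filter_by_date_range(records, date(2025, 1, 1), date(2025, 1, 15)) == records[:2]

# Criterion 2: a range that matches nothing yields an empty list.
assert filter_by_date_range(records, date(2024, 1, 1), date(2024, 12, 31)) == []

# Criterion 3: a reversed range is an error, not a guess.
try:
    filter_by_date_range(records, date(2025, 2, 1), date(2025, 1, 1))
    raise AssertionError("expected ValueError for reversed range")
except ValueError:
    pass
```

Each criterion here is a decision the one-line requirement left open. Written as checks, they constrain the agent before it generates anything.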

QA moves earlier and becomes continuous

Waiting until code reaches a CI pipeline to check whether it behaves correctly is no longer viable when an agent can generate five pull requests before lunch. Test coverage, static analysis and behavioural validation need to run inside the development loop, not downstream from it. The QA engineer’s role does not disappear; it shifts from executing test plans to designing the automated checks that run constantly.

Architecture becomes documentation

Agents do not accumulate institutional knowledge across sessions. Every time an agent starts a new task, it reads your code the way a new hire would on day one, but without the benefit of asking questions or sitting in on meetings. If your system boundaries are implicit, your naming is inconsistent, or your data flows require tribal knowledge to trace, the agent will get it wrong. Teams that have invested in clear architecture, honest interfaces and explicit conventions get better output, not because the model is smarter, but because the agent has a more accurate surface to reason over.

The product manager role shifts most

When implementation was slow and expensive, PMs could afford to iterate requirements based on what engineers built. In an agentic workflow, the cost of generating an incorrect implementation is nearly zero, but the cost of reviewing and correcting it is not. The bottleneck moves from “getting engineers to build it” to “specifying it correctly the first time.” That is a different skill set and not every product organisation has made that transition yet.

Engineering leadership, in turn, moves from counting outputs to designing the system that produces trusted change in the first place.

What leaders should measure now

Agentic coding makes traditional output metrics even less useful than before. Code volume inflates too easily. Developers report that 42% of the code they commit is already AI-assisted, a share expected to rise by more than half by 2027. Measuring how much code the team shipped tells you almost nothing about whether that code is safe or correct.

Better metrics reflect trust, flow and business outcome:

Time-to-verify: Track this separately from time-to-generate. Measure from PR open to the point where all human review gates have passed; if that number is not falling as generation rises, verification is your bottleneck. Without this separation, verification debt accumulates invisibly.

AI-assisted incident rates: Tag incidents in post-mortems to show whether AI-assisted code was involved; even a rough heuristic helps build the feedback loop. Without this, teams have no signal that their review process is actually working.

Rollback frequency: Deployment tooling already captures this. Segment AI-assisted deployments as a separate cohort to see whether the pattern differs. Without that segmentation, the cost of shallow reviews stays hidden until it becomes painful.

Time from spec to safe release: Measure from the moment acceptance criteria are written and frozen to the moment the feature ships behind a flag in production. This is the true measure of whether agentic tooling is accelerating the team or just moving the bottleneck. It is also the number that translates most directly into time-to-market and competitive positioning.
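
Two of the metrics above, time-to-verify and rollback frequency by cohort, can be computed from data most teams already have. The sketch below assumes a simplified, hypothetical data model (PullRequest, Deployment and an ai_assisted tag applied by whatever heuristic the team uses); the field names are illustrative and do not belong to any particular tool's API.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class PullRequest:
    opened_at: datetime
    verified_at: datetime  # when the last human review gate passed
    ai_assisted: bool


@dataclass
class Deployment:
    ai_assisted: bool
    rolled_back: bool


def median_hours_to_verify(prs) -> float:
    """Median time from PR open to passing all human review gates, in hours."""
    return median(
        (pr.verified_at - pr.opened_at).total_seconds() / 3600 for pr in prs
    )


def rollback_rate_by_cohort(deployments) -> dict[str, float]:
    """Share of deployments rolled back, split into AI-assisted and other."""
    rates = {}
    for label, flag in (("ai_assisted", True), ("other", False)):
        cohort = [d for d in deployments if d.ai_assisted is flag]
        if cohort:  # skip empty cohorts instead of dividing by zero
            rates[label] = sum(d.rolled_back for d in cohort) / len(cohort)
    return rates
```

Tracking median_hours_to_verify alongside the number of PRs opened per week shows whether review capacity is keeping pace with generation, and comparing the two rollback rates shows whether AI-assisted changes behave differently once deployed.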

If AI makes software production easier, competitive advantage shifts toward organisations that can convert more ideas into trusted releases without losing control. In that world, verification capability, architecture quality, governance maturity and security discipline become revenue-relevant capabilities, not just engineering concerns.

The most useful conclusion

The future of software development is not “AI writes code while humans watch.” It is “humans design systems that make machine-generated change trustworthy.”

The organisations that benefit most will not be the ones running the highest volume of generated code. They will be the ones with the strongest translation layer between idea and production reality.

Code is getting cheaper. Judgment is not. Verification is not. Safe release is not.

Getting ahead of that shift is a strategic decision, not a technical one.

At IBM iX, that decision is already made. We are embedding agentic coding into how we deliver, using IBM Bob as part of our tooling stack. A dedicated working group ensures that adoption stays governed, measurable and repeatable across teams.

The goal is to deliver faster without trading away the trust and quality that clients depend on.

Is your organisation ready for the next generation of software development?
