The Agentic Harness: Designing Control in Software Delivery

Author: Dušan Šalović, Data Scientist

2.7.2026

The model is not the differentiator. The harness around it is: the guides that steer the agent before it acts and the sensors that catch what the guides missed. Get the architecture wrong and the agent has no stable surface to reason over. Get verification wrong and you will accumulate technical debt faster than you can deliver. This is the engineering layer that most AI coverage skips. Here is what it actually takes.

Agentic coding tools are already in your teams’ hands, whether or not your organisation has a formal policy. The question is not whether to engage with them but how to build the harness around them: the guides that steer the agent before it acts, the sensors that catch what the guides missed and the architecture that keeps machine-generated change from outpacing your ability to understand it.

This piece is about that practical layer. Not the hype, not the headlines. The engineering. That engineering has a name: the harness.

The harness is the differentiator, not the model

The most important reframing for engineers who work with agentic systems is this: an agent is not just a model. It is a model plus everything built around it to shape how it operates.

That outer layer is called the harness, and it can be broken down into two categories:

Guides are feedforward controls that steer the agent before it acts: specifications, constraint documents, coding conventions, architectural rules. These are the instructions the agent follows.
Sensors are feedback controls that monitor the agent after it acts and help it self-correct: tests, linters, output validators, static analyses. These are the checks that catch what the guides missed.
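
As a rough illustration of that split, the sketch below wires guides and sensors around a stand-in agent. Every name in it (`Harness`, `run`, `fake_agent`, `no_print_statements`, the sample guide text) is hypothetical and invented for this example. A real model call would replace the stub, and a real harness would carry far more than one guide and one sensor.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# A sensor inspects agent output and returns an error message, or None on success.
Sensor = Callable[[str], Optional[str]]
# An agent maps a prompt to output. A stub stands in for a real model call here.
Agent = Callable[[str], str]


@dataclass
class Harness:
    guides: List[str]        # feedforward: rules the agent sees before it acts
    sensors: List[Sensor]    # feedback: checks run on what the agent produced
    max_attempts: int = 3

    def run(self, agent: Agent, task: str) -> str:
        feedback: List[str] = []
        for _ in range(self.max_attempts):
            prompt = "\n".join(self.guides + [f"Task: {task}"] + feedback)
            output = agent(prompt)
            errors = [e for e in (check(output) for check in self.sensors) if e is not None]
            if not errors:
                return output
            # Feed sensor findings back so the agent can self-correct.
            feedback = [f"Fix: {e}" for e in errors]
        raise RuntimeError("sensors still failing after max_attempts")


def no_print_statements(output: str) -> Optional[str]:
    """Hypothetical sensor: reject code that contains print() calls."""
    return "remove print() calls" if "print(" in output else None


def fake_agent(prompt: str) -> str:
    """Stub agent: only produces clean code once it has seen sensor feedback."""
    if "Fix:" in prompt:
        return "def add(a, b):\n    return a + b"
    return "def add(a, b):\n    print(a, b)\n    return a + b"


harness = Harness(
    guides=["Prefer pure functions without side effects."],
    sensors=[no_print_statements],
)
result = harness.run(fake_agent, "write add()")
```

The stub fails the sensor on its first attempt and passes on the second, which is the self-correction path the sensor layer is meant to provide.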

The SWE-bench data makes the point clear:

23%: a basic agent scaffold on standard software engineering tasks
45% or higher: an optimised scaffold running the exact same model

That is a gap of at least 22 percentage points without changing or retraining the model. The underlying model matters less than its surroundings. The harness is what separates teams that get reliable output from teams that do not.

Spending hours prompting the model does get you a long way. Spending hours designing the constraints and checks around it gets you even further.

Architecture for AI readiness: sinks, not pipes

Agentic coding works surprisingly well in some codebases, but creates chaos in others. The reason is almost always architectural. Architecture is the guide layer of the harness. The structure of the codebase is the first constraint the agent works within, whether you designed it that way or not.

Think of the components in your system as either pipes or sinks.

A pipe is a component whose action triggers a cascade of hidden consequences. Touch it at one point, and something breaks three services down the chain.
A sink is a component whose effect is contained and understandable. You can understand what it does without tracing the ripple effects through the rest of the system.

In a human-only world, teams muddle through pipe-heavy architectures because engineers accumulate context over time. In an agentic world, that hidden complexity quickly becomes expensive. An agent has no persistent memory of the codebase across sessions. It cannot ask a colleague, check prior context or remember what caused the incident last quarter. It works from what is visible and explicit within the current context window. A pipe-heavy codebase does not only slow an agent down. It gives the agent a plausible-looking surface to reason over while hiding the actual consequences of change.

Every new session starts without a solid understanding of the codebase. If tracing what a single component does requires tracking side effects through three other services, that codebase is not AI-ready. And it was probably not pleasant to work with before AI either.
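
One way to picture the difference is at the level of a single function. The sketch below is one interpretation of the distinction, not a definition: the function names, the shared `PRICE_CACHE` and `AUDIT_LOG`, and the discount logic are all hypothetical. The pipe version looks like it only computes a price but also changes shared state. The sink version returns everything it does and leaves persistence to the caller.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical module-level state standing in for other services.
AUDIT_LOG: List[str] = []
PRICE_CACHE: Dict[str, float] = {}


def apply_discount_pipe(sku: str, price: float, percent: float) -> float:
    """Pipe: the signature promises a price, but the call also mutates
    a shared cache and an audit log that other code depends on."""
    new_price = round(price * (1 - percent / 100), 2)
    PRICE_CACHE[sku] = new_price          # hidden side effect
    AUDIT_LOG.append(f"discount:{sku}")   # hidden side effect
    return new_price


@dataclass(frozen=True)
class DiscountResult:
    sku: str
    new_price: float
    audit_event: str


def apply_discount_sink(sku: str, price: float, percent: float) -> DiscountResult:
    """Sink: everything the call does is visible in its return value.
    The caller decides what to cache or log."""
    new_price = round(price * (1 - percent / 100), 2)
    return DiscountResult(sku=sku, new_price=new_price, audit_event=f"discount:{sku}")
```

An agent reading only the signature of the first function has no way to know that changing it affects the cache and the log. The second function can be understood from its own few lines.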

What AI readiness requires in practice

The phrase “architecture is the prompt” captures this well. If your system is full of hidden dependencies, implicit side effects, leaky abstractions and tribal knowledge, the agent has no stable surface to reason over. Clean boundaries and honest interfaces allow the agent to operate safely.

In practice, that means:

deep modules with clear, truthful interfaces
low coupling and high cohesion, enforced rather than aspirational
integration tests that capture real behaviour at the seams, not just unit tests on internal components
a repository structure that communicates intent to a reader with no prior context
explicit documentation of what a component does not do, not just what it does
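
To make "enforced rather than aspirational" concrete, a coupling rule can be turned into a check that runs on every change. The sketch below uses Python's standard `ast` module to find imports that cross a boundary. The package names and the `ALLOWED_IMPORTS` table are hypothetical, and a real project would more likely adopt an existing import-linting tool than maintain its own.

```python
import ast
from typing import Dict, List, Set

# Hypothetical layering rule: which top-level packages each package may import.
ALLOWED_IMPORTS: Dict[str, Set[str]] = {
    "billing": {"billing", "shared"},
    "catalog": {"catalog", "shared"},
}


def imported_packages(source: str) -> Set[str]:
    """Return the top-level package names imported by a piece of Python source."""
    found: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative imports stay in-package.
            found.add(node.module.split(".")[0])
    return found


def boundary_violations(package: str, source: str, internal: Set[str]) -> List[str]:
    """List internal packages that `package` imports without being allowed to."""
    allowed = ALLOWED_IMPORTS.get(package, set())
    return sorted((imported_packages(source) & internal) - allowed)


sample = "import os\nfrom catalog.models import Product\nfrom shared import money\n"
violations = boundary_violations(
    "billing", sample, internal={"billing", "catalog", "shared"}
)
```

Here `violations` contains `"catalog"`, because the sample billing code reaches into a package its rule does not list. Third-party and standard-library imports such as `os` are ignored by intersecting with the set of internal packages.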

None of these principles are new. However, agentic coding enforces them more aggressively than before. It penalises ambiguity. It punishes invisible coupling. It exposes weak testing and sloppy permissions. It is less a substitute for engineering discipline than a stress test of whether that discipline actually exists.

Verification moves inside the loop

When code becomes abundant, verification becomes the scarce resource. This is where the sensor layer of the harness does its work. It is also where most teams are getting hurt right now. Sonar’s “State of Code Developer Survey 2026” found that:

96% of developers do not fully trust AI-generated code to be functionally correct
48% say they always check AI-assisted code before committing it
38% say reviewing AI-generated code takes more effort than reviewing code written by a human colleague

The effort has not disappeared. It has simply moved downstream, where problems are harder to catch.

In a microservices environment, a seemingly isolated change to a single backend service can cascade by breaking downstream services or corrupting shared database schemas. An agent-assisted developer might generate five or six pull requests per day. If each one requires thirty minutes of validation in a shared staging environment, most of that developer’s day goes into managing a deployment queue, rather than building software.

The agent accelerated generation. The infrastructure did not keep up.

If output grows faster than validation capacity, teams accumulate verification debt. That debt manifests itself in delayed releases, superficial reviews, an increasing risk of incidents, and a false sense of speed.
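
The accumulation is easy to see with a toy model. The sketch below assumes constant daily rates, which real teams will not have, and the function name and the numbers are illustrative only. Six generated changes per day echoes the example above. Four validated per day is an assumption made for this sketch.

```python
from typing import List


def verification_backlog(generated_per_day: int, validated_per_day: int, days: int) -> List[int]:
    """Unvalidated changes at the end of each day, assuming constant rates."""
    backlog = 0
    history: List[int] = []
    for _ in range(days):
        # The backlog cannot go below zero: spare validation capacity is not banked.
        backlog = max(0, backlog + generated_per_day - validated_per_day)
        history.append(backlog)
    return history


# Illustrative numbers only: 6 changes generated and 4 validated per day.
week = verification_backlog(generated_per_day=6, validated_per_day=4, days=5)
```

With these assumed rates the backlog grows by two changes every day and reaches ten by the end of a five-day week. The only ways to stop the growth are to raise validation capacity or to generate less.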

A practical routine before merging:

make the change.
test it against realistic conditions. "Did it run" is not enough.
read it line by line before accepting it.
look specifically for null checks, edge cases, error handling, and race conditions.
confirm nothing outside the intended scope was touched.
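
The last step of that routine lends itself to automation. The sketch below compares a list of changed paths against the patterns a task was allowed to touch. The function name, the paths and the patterns are hypothetical. In practice the list of changed paths could come from something like `git diff --name-only`.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def out_of_scope(changed_paths: Iterable[str], allowed_patterns: Iterable[str]) -> List[str]:
    """Return changed paths that match none of the allowed glob patterns.

    Note: in fnmatch-style patterns, `*` also matches `/`, so `billing/*`
    covers nested paths such as `billing/tests/test_invoice.py`.
    """
    patterns = list(allowed_patterns)
    return [p for p in changed_paths if not any(fnmatchcase(p, pat) for pat in patterns)]


changed = ["billing/invoice.py", "billing/tests/test_invoice.py", "deploy/prod.yaml"]
unexpected = out_of_scope(changed, allowed_patterns=["billing/*"])
```

Here `unexpected` holds `deploy/prod.yaml`, a file the task had no reason to touch. A check like this does not replace reading the diff, but it makes scope creep visible before the review starts.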

Verification now needs a broader definition than simply a passing test suite. It includes executable acceptance criteria, static analysis, dependency and secret scanning, policy gates, human approval paths, environment separation, observability, rollback readiness and production monitoring. A fast team in the AI era is not the team that generates the most code. It is the team that can validate changes quickly, repeatedly and safely.

Security: when agents act, the risk surface changes

The security chapter is not optional. Agentic systems do not just generate text. They act. The harness has to account for this. Guides and sensors are not enough if the agent has unconstrained access to tools, environments and external systems.

Two incidents illustrate what that means in practice.

The Replit incident (July 2025): an AI agent deleted a live production database containing records on more than 1,200 executives and 1,190 companies during an active code freeze, after being told eleven times in explicit terms not to make any changes. The agent initially told the user rollback was impossible, which turned out to be false. Once an agent can operate across tools and environments, the risk surface shifts from "bad suggestion" to "unsafe action with no easy undo."

The Cline supply chain attack (February 2026): a prompt injection vulnerability in an AI-powered issue triage workflow, dubbed "Clinejection" by security researchers, was exploited to steal the project's npm publish token. The attacker used it to push a modified version of the Cline CLI that silently installed a second AI agent on every developer machine that updated during an eight-hour window. Around 4,000 downloads occurred before the package was pulled. The entry point was not a compromised dependency or a stolen password. It was a crafted GitHub issue title that the AI triage bot read, interpreted as an instruction and executed. When agents are allowed to act on systems, prompt injection stops being a quirky model failure. It becomes an execution path into your infrastructure.

Both incidents share a pattern worth treating as a checklist. Risk compounds when three conditions overlap:

the agent has access to private or sensitive data.
it processes content from sources it cannot fully verify.
it has the ability to communicate externally or trigger irreversible actions.

Before granting an agent broader permissions, confirm you have controls in place for all three conditions, not just one.
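
The three conditions can be written down as an explicit check on an agent's configuration. The sketch below is a minimal version under the assumption that each condition can be answered with a yes or no. The class, field and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AgentCapabilities:
    reads_sensitive_data: bool       # condition 1: private or sensitive data
    reads_untrusted_content: bool    # condition 2: content it cannot fully verify
    can_act_externally: bool         # condition 3: external communication or irreversible actions


def risk_conditions(caps: AgentCapabilities) -> List[str]:
    """Name the conditions that apply to this configuration."""
    names: List[str] = []
    if caps.reads_sensitive_data:
        names.append("sensitive data access")
    if caps.reads_untrusted_content:
        names.append("untrusted input")
    if caps.can_act_externally:
        names.append("external or irreversible actions")
    return names


def requires_review(caps: AgentCapabilities) -> bool:
    """Flag configurations where all three conditions overlap."""
    return len(risk_conditions(caps)) == 3


example = AgentCapabilities(
    reads_sensitive_data=True,
    reads_untrusted_content=True,
    can_act_externally=True,
)
needs_review = requires_review(example)
```

A configuration that meets all three conditions is flagged. Removing any one of them, for example by cutting external access, clears the flag, although the remaining conditions still need their own controls.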

One more note on vendor tooling: claims about enterprise data isolation, content filtering, and vulnerable code scanning matter, but should be treated as partial mitigations, not complete coverage. Read the fine print. Certain modes have exceptions and excluded content can still leak indirectly through semantic context. Policy cannot rely on a vendor checkbox alone.

Safe habits inside the development loop

Beyond architecture and security posture, reliable AI-assisted development depends less on prompting tricks and more on a consistent process:

Back up first: before any session touching production-adjacent code, establish a known-good restore point.
Start with a clear scope: specify the task, including what is explicitly out of scope. Vague starting conditions produce vague results.
State what must not change: naming constraints explicitly is more reliable than expecting the agent to infer them.
Break complex tasks into steps: ask for one verifiable change at a time, rather than a major rewrite in a single pass.
Start a fresh session for major changes: context drift across long sessions is real. A fresh start with explicit context is often more reliable than picking up where you left off.
Never accept code you do not understand: passing the tests is not understanding. If you cannot explain what the code does and why, you are not done reviewing it.
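
Several of these habits, such as a clear scope, an explicit out-of-scope list and named constraints, can be captured in a small task template so they are written down every time instead of remembered. The sketch below is one possible shape. The `TaskSpec` class, its fields and the example values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskSpec:
    goal: str
    in_scope: List[str]
    out_of_scope: List[str] = field(default_factory=list)
    must_not_change: List[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Render the spec as plain text; empty sections are left out."""
        sections = [
            ("In scope", self.in_scope),
            ("Out of scope", self.out_of_scope),
            ("Must not change", self.must_not_change),
        ]
        lines = [f"Goal: {self.goal}"]
        for title, items in sections:
            if items:
                lines.append(f"{title}:")
                lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)


spec = TaskSpec(
    goal="Add retry to the invoice export",
    in_scope=["billing/export.py"],
    out_of_scope=["database schema"],
    must_not_change=["public function signatures"],
)
prompt = spec.to_prompt()
```

The value is less in the code than in the forcing function: a spec with an empty "must not change" list is a prompt for the engineer to think again before the session starts.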

These habits, practised consistently, are also the foundation of a more advanced pattern. Loop engineering is the practice of writing programs that prompt agents on a schedule, with explicit goals and stopping conditions, rather than prompting them by hand. Where guides and sensors constrain one session, a loop encodes those constraints into a self-running system: a trigger starts the agent, a verifiable goal defines when it stops, and guardrails like iteration caps and cost budgets prevent it from running indefinitely. What the loop is allowed to do in your environment still requires the same architectural and security thinking as any other agent session.
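
A minimal sketch of such a loop follows. It covers the verifiable goal, the iteration cap and the cost budget, and leaves out the trigger, which in practice would be a scheduler or an event. All names (`run_loop`, `LoopResult`, `fake_step`) and the cost figures are hypothetical, and the stub step stands in for a real agent call.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# One agent step takes the current state and returns (new_state, cost_of_step).
Step = Callable[[str], Tuple[str, float]]


@dataclass
class LoopResult:
    state: str
    iterations: int
    spent: float
    stopped_because: str   # "goal", "budget" or "iteration_cap"


def run_loop(
    step: Step,
    goal_met: Callable[[str], bool],
    state: str,
    max_iterations: int,
    budget: float,
) -> LoopResult:
    spent = 0.0
    for i in range(1, max_iterations + 1):
        state, cost = step(state)
        spent += cost
        if goal_met(state):
            return LoopResult(state, i, spent, "goal")
        # Checked after the step, so the final step may overshoot the budget.
        if spent >= budget:
            return LoopResult(state, i, spent, "budget")
    return LoopResult(state, max_iterations, spent, "iteration_cap")


def fake_step(state: str) -> Tuple[str, float]:
    """Stub agent step: makes a little progress at a fixed, made-up cost."""
    return state + "x", 0.25


done = run_loop(fake_step, lambda s: len(s) >= 3, "", max_iterations=10, budget=5.0)
```

With the stub, the goal is reached on the third iteration. Lowering `budget` or `max_iterations` makes the loop stop early and report why, which is the information an engineer needs when reviewing what a loop did unattended.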

What the engineering role actually looks like now

The engineer who creates the most value in an agentic environment is not the fastest prompt writer. It is the engineer who can define the problem clearly, decompose it cleanly, constrain the blast radius and design a trustworthy path to production.

Engineers who try to stay inside every loop become the bottleneck. The ones who learn to manage the loop without micromanaging it get the real leverage. That is not a diminishment of the role. It is an expansion of responsibility. Implementation becomes orchestration. Individual contribution scales.

For teams thinking about early-career development:

If straightforward implementation work is increasingly automated, junior engineers may get fewer of the small repetitions that used to build intuition, such as writing simple features, fixing small bugs and reading other people’s working code. Those repetitions were the learning.

Teams will need to design deliberate paths around debugging, production reasoning, incident analysis, and system-level thinking, rather than assuming those skills emerge naturally from ticket processing.

The engineers who progress the fastest will be the ones who learn to read generated code critically, reason about architecture trade-offs and take ownership of system behaviour, not just task completion. Those same engineers will increasingly be expected to design the automated loops that run their agents, not just the sessions themselves.

The engineering role is not being automated away. It is being redefined, and not gradually. This is a structural change in where engineering value lies. The primary question is no longer how to write good code. It is how to create the conditions under which good code can be generated, verified and trusted. Judgment, architecture, and verification discipline are becoming the core of the work. That is not a smaller role. It is a more consequential one.

Summary

The goal is not bureaucracy. The goal is proportional control. Build the harness before you expand the autonomy. Make the architecture explicit before you let the agent assume it. Verify inside the loop, not after it. Define the blast radius before the first commit.

The organisations that get this right will not be the ones running the most AI. They will be the ones whose AI-generated changes can be trusted, traced, rolled back and learned from.

Speed without that trust system is not an advantage. It is deferred risk.

At IBM iX, we see one pattern separate the teams that fully realise the benefits of agentic coding from the ones that stall: the harness came first. The engineering foundation is what makes the business outcome possible.

About the Author

Dušan Šalović is a Senior Data Scientist at IBM iX, specialising in generative and agentic AI. He supports large enterprises across retail, e-commerce, consumer goods and pharma in designing and delivering AI-driven transformation, from strategic prioritisation to scalable implementation. With expertise spanning advanced AI architectures, cloud platforms and enterprise integrations, Dušan helps organisations turn emerging technologies into tangible business impact. At IBM iX, he also contributes to the development of new offerings in Agentic AI and mentors teams on applied AI excellence.
