AI Agent Governance: 10 Takeaways from Engineering Leaders on Agentic Development
Discover 10 key insights from engineering leaders on AI agent governance and agentic development. Learn how to scale AI tools effectively.

Guy Podjarny

Agentic development starts as a productivity story, but at scale it quickly becomes a governance problem.
At AI Native DevCon London, we hosted a set of Chatham House roundtables with senior engineering leaders from a range of organizations. I won’t attribute comments to individuals or companies, but the patterns were strikingly consistent: agentic development is moving from an individual tooling conversation into an enterprise operating model question.
The first wave was familiar enough: devs tried GitHub Copilot, Cursor, Claude Code, Codex, Devin and similar tools, and many found obvious value. They wrote code faster, produced tests faster, explored ideas faster, and in some cases revived work that had been sitting in the backlog because it was too costly to attempt.
The interesting question is what happens once agents stop being a personal accelerator and start touching the way an engineering organization works. At that point, the problem shifts from “does the tool help?” to “can we make this safe, repeatable, measurable, and economically sane?”
That shift is why I think the most useful frame is AI agent governance. It means the systems that let teams move faster without losing control, including identity, permissions, context, evals, model routing, cost visibility, policy, ownership, and feedback loops.
On a side note, you can hear my talk “skills are the new code”, where I share my personal framework for agent governance and a proposed approach to enterprise agent enablement.
Let’s now look at the 10 main takeaways from our roundtable.
1. Agent adoption starts with enthusiasm, but scaling it requires deliberate rollout
Most organizations seem to start the same way: give developers access to AI coding tools and let the motivated teams run.
This is the right instinct at the start, because the space is moving too quickly for a purely top-down programme to discover all the useful patterns. Bottom-up energy creates learning quickly. It also surfaces where agents are genuinely useful, rather than where a transformation deck hoped they might be.
But it also creates fragmentation.
Different teams adopt different tools, build different prompts, store skills in different repos, and develop different assumptions about what is safe enough to automate. One group may use agents for test generation, another for code review, another for product specs, another for deployment automation. Before long, the organization can have dozens of useful experiments that don’t yet add up to a system.
The trick is not to kill the experimentation but to create a path from local learning to shared practice.
The first wave of adoption was mostly about individual productivity. The next wave has to be about repeatable, governed team workflows. That means rollout phases, clear ownership, a view of which tools are approved for which classes of work, and a way to convert the best local experiments into standards others can reuse.
This is a familiar pattern from cloud and DevOps: the early adopters prove what is possible, then the platform forms around them. The difference this time is that the cycle is much faster, and the unit being governed is not just infrastructure or code, but the agentic workflow itself.
2. The strongest ROI case is not productivity. It is increased ambition
A lot of the public conversation around AI in software development is still framed around productivity.
- Can engineers do the same work faster?
- Can teams ship more with the same number of people?
- Can the business do the same with less?
Many business leaders will look for savings, and it would be naive to pretend otherwise. It is also worth acknowledging that some of this is hard to say openly in a group setting, however intimate. In practice, some leaders will seek to capitalize on productivity by doing the same work with fewer people, reducing costs, or slowing future hiring.
But the roundtables reinforced a concern I have had for a while: if we hype AI productivity too aggressively, we may slow adoption by making people fear what adoption means.
If the internal narrative is mostly about headcount reduction, people will defend themselves. They may hide the real gains, avoid showing how much faster a workflow became, or keep their best agent patterns private because sharing them feels like making the case for fewer people.
That is not a cultural foundation for transformation. A better frame is ambition.
Agents make prototypes cheaper. They let senior engineers explore ideas that have been trapped behind calendar time. They change the build-versus-buy equation, because a capability that once required an RFP and a vendor project may now be plausible for a small internal team to try.
This is the version of the story that leaders should emphasize publicly and internally. The question should be “what can we now attempt that we previously would not have attempted?”
That framing does not deny the economics, but it does point them in a healthier direction. The long-term narrative should not be about lowering the floor, but about raising the ceiling. If AI is understood as a way to increase ambition rather than quietly reduce capacity, more people will lean in, and the organization is more likely to discover the compounding benefits.
3. Why context engineering is becoming a first-class engineering asset
Agents are only as useful as the context they can apply.
That context includes specs, tests, policies, architecture guidance, product requirements, runbooks, coding conventions, incident patterns, security rules, and domain language. Most organizations already have some of this knowledge, but it is rarely as clean or discoverable as the agentic era requires. Some of it lives in docs, some in Slack, some in tickets, some in code comments, and a great deal of it lives in people’s heads.
In the pre-agent world, weak documentation was annoying but survivable. A dev could ask the person who knew the system, or learn the convention through review comments. In the agentic world, missing context becomes a direct limit on what the agent can do.
This is why skills matter.
Skills turn tacit engineering knowledge into reusable context that agents can apply. They are not just prompts with nicer packaging; they are a way to encode how an organization wants work done, from API usage to security checks to writing style to deployment workflow.
This is also where Tessl’s view of agentic development comes in. If agents are going to participate across the SDLC, organizations need a way to collaboratively develop, discover, evaluate, and improve the context those agents rely on. Skills and evals are two sides of that problem: skills package the knowledge agents need, while evals show whether that knowledge actually improved the outcome.
Once you see context this way, and move the mental framework from SDLC → CDLC (Context Development Lifecycle, illustrated above), documentation stops being a hygiene task and becomes infrastructure. The teams that write down how they work, keep that knowledge current, and make it available to agents will have a structural advantage over teams that treat context as tribal knowledge.
4. Cost matters, but the wrong framing leads to the wrong decisions
Model costs are becoming real.
In the earliest adoption phase, many teams did not feel the cost directly. Usage was limited, pilots were small, and in some cases vendor pricing or subsidies made the economics look less material than they would eventually become. But that phase is ending…
As agents become part of daily development, cost shows up in more places: large context windows, repeated attempts, long-running tasks, model upgrades, autonomous workflows, and agents that call other tools in loops.
A prompt that is cheap as a one-off experiment can become expensive when it runs across hundreds of devs every day, each with a large repo context, multiple retries, and a frontier model selected by default.
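To make that scaling effect concrete, here is a back-of-the-envelope sketch. Every number in it is a hypothetical placeholder (token counts, retry rate, team size, and per-million-token prices are all assumptions, not figures from the roundtables or any vendor's price list), and `WorkflowProfile` and `daily_cost_usd` are names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowProfile:
    """Hypothetical usage profile for one agent workflow."""
    input_tokens: int            # context sent to the model per attempt
    output_tokens: int           # tokens generated per attempt
    attempts_per_run: float      # average attempts per run, retries included
    runs_per_dev_per_day: int
    developers: int


def daily_cost_usd(profile, usd_per_million_input, usd_per_million_output):
    """Estimate daily spend: cost per attempt x attempts x runs x developers."""
    cost_per_attempt = (
        profile.input_tokens * usd_per_million_input
        + profile.output_tokens * usd_per_million_output
    ) / 1_000_000
    return (
        cost_per_attempt
        * profile.attempts_per_run
        * profile.runs_per_dev_per_day
        * profile.developers
    )


# Assumed prices in USD per million tokens; purely illustrative.
PRICE_IN, PRICE_OUT = 3.0, 15.0

one_off = WorkflowProfile(100_000, 2_000, attempts_per_run=1,
                          runs_per_dev_per_day=1, developers=1)
at_scale = WorkflowProfile(100_000, 2_000, attempts_per_run=3,
                           runs_per_dev_per_day=10, developers=300)

print(f"one-off:  ${daily_cost_usd(one_off, PRICE_IN, PRICE_OUT):,.2f}")
print(f"at scale: ${daily_cost_usd(at_scale, PRICE_IN, PRICE_OUT):,.2f} per day")
```

Under these made-up inputs, the same prompt goes from roughly 33 cents as an experiment to roughly $2,970 per day across the team. The point is the multiplication, not the specific figures.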
This is why AI FinOps needs to become a real discipline!
The cloud analogy is useful (but only up to a point). In cloud, cost followed infrastructure usage. In AI, cost follows cognition-like work: reasoning, context, retries, tool calls, evals, and orchestration. That makes it harder to map spend to value, because the bill may be attached to a workflow that saved a week of engineering time, avoided a security incident, accelerated a customer feature, or simply produced three bad attempts before a human rewrote it.
Even in the few weeks since these roundtables took place, awareness of AI costs has increased substantially. That will continue as agent adoption broadens. Leaders will need visibility into where spend goes, which models are used for which tasks, where context is being wasted, and which workflows justify their cost because they improve delivery, quality, risk, or ambition.
The wrong answer is to suppress usage blindly. The better answer is to manage it deliberately: model routing, caching, context discipline, budgets, observability, and evals that help teams know whether cheaper options are good enough.
5. Model routing will be part of AI agent governance
There was broad agreement that not every task should use the largest or most expensive frontier model. A good example is how we’ve recently switched Tessl’s default eval model from Sonnet 4.6 to GLM 5.1. The principle is easy to accept, but the operational question is harder: how does an organization know which model is good enough for which job?
The answer will not be one model; it will be routing.
Frontier models will remain valuable for ambiguous reasoning, complex planning, and tasks where the cost of a poor answer is high. Smaller models may be better for bounded, repeatable work where the task is well specified and the output can be validated. Open models have become capable enough that, for many narrow tasks, they may be more than sufficient and much cheaper. Local or private deployments may make sense when data sensitivity, latency, or control matters more than raw capability.
The risk is that every team solves this independently. One team standardises on Claude Code, another on Cursor, another on Codex, another experiments with open models, and the organization ends up with duplicated eval work and no shared view of quality, cost, or risk.
This is why model routing belongs inside AI agent governance. The decision should depend on the task, the data, the quality bar, the blast radius, the cost, and the validation available. The real capability is not choosing a favorite model; it is building the measurement and routing layer that lets teams use the right model for the right task.
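As a rough illustration of what a routing layer decides, here is a minimal rule-based sketch. The tiers, task attributes, and rule order are assumptions made for this example; a real routing layer would likely draw on eval results and cost data rather than hand-written rules.

```python
from dataclasses import dataclass
from enum import Enum


class ModelTier(Enum):
    """Hypothetical model tiers; the names are illustrative only."""
    FRONTIER = "frontier"   # ambiguous reasoning, high cost of a poor answer
    SMALL = "small"         # bounded, repeatable, validated work
    PRIVATE = "private"     # local or private deployment for sensitive data


@dataclass(frozen=True)
class Task:
    name: str
    sensitive_data: bool      # does the task touch data that must stay in-house?
    ambiguous: bool           # is the task loosely specified?
    high_blast_radius: bool   # is a poor answer expensive?
    output_validated: bool    # can tests or checks verify the output?


def route(task):
    """Pick a tier from task attributes; the rule order encodes priority."""
    if task.sensitive_data:
        return ModelTier.PRIVATE
    if task.ambiguous or task.high_blast_radius:
        return ModelTier.FRONTIER
    if task.output_validated:
        return ModelTier.SMALL
    # Without validation we cannot tell whether a cheaper model is good enough.
    return ModelTier.FRONTIER


test_generation = Task("generate unit tests", sensitive_data=False,
                       ambiguous=False, high_blast_radius=False,
                       output_validated=True)
print(route(test_generation).value)
```

The design choice worth noting is the fallback: when output cannot be validated, the sketch defaults to the more capable tier, because there is no evidence that a cheaper model meets the bar.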
The important test is not whether a smaller model works once. It is whether it meets the quality bar repeatedly under realistic inputs, with the context and constraints the workflow will actually have in production.
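One way to express "repeatedly, not once" is a pass-rate gate over many eval runs. The sketch below assumes each eval run yields a simple pass/fail outcome; the threshold, minimum run count, and function name are hypothetical choices for illustration.

```python
def meets_quality_bar(outcomes, required_pass_rate=0.9, min_runs=20):
    """Return True only if enough runs exist and enough of them passed.

    `outcomes` is a list of booleans, one per eval run on realistic inputs.
    """
    if len(outcomes) < min_runs:
        # A single success (or a handful) is not evidence of reliability.
        return False
    return sum(outcomes) / len(outcomes) >= required_pass_rate


single_success = [True]
mostly_passing = [True] * 24 + [False]   # 24 of 25 runs passed
flaky = [True] * 15 + [False] * 10       # 15 of 25 runs passed

print(meets_quality_bar(single_success))
print(meets_quality_bar(mostly_passing))
print(meets_quality_bar(flaky))
```

A gate like this only means something if the runs use the context and constraints the workflow will actually have in production.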
6. Why AI agent governance is becoming the enterprise security bottleneck
Cost is rising, but security is still the concern most likely to limit enterprise adoption.
The risks are easy to understand once you stop thinking about agents as chatbots and start thinking about them as actors inside the development environment. A coding agent running with a developer’s credentials may be able to access internal repositories, package registries, logs, deployment systems, tickets, customer data, and production-adjacent systems. If that agent can browse the web, install packages, execute scripts, or move data between systems, the blast radius changes materially.
This does not mean the right answer is to block agents. It means the trust model has to mature.
One useful mental model from the roundtables was to treat agents like new employees or interns. You would not give an intern every credential and full production access on day one. You would start with a defined scope, observe their work, review their decisions, and expand trust over time. Agents need a version of the same path.
That path includes identity, entitlements, sandboxing, audit trails, tool restrictions, policy enforcement, and incident response. It also includes a decision about whether the agent acts as the human, as a separate identity, or as a constrained delegated identity. Without that, security teams are left with a choice between approving risky autonomy or blocking usage entirely.
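To sketch what the "intern" model could look like in practice, the example below gives an agent its own delegated identity, a trust level, and a deny-by-default entitlement check that records every decision. The trust levels, action names, and required-trust mapping are all invented for illustration, not a reference design.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class TrustLevel(IntEnum):
    """Hypothetical trust ladder; an agent moves up as its work is reviewed."""
    SANDBOX = 0
    SUPERVISED = 1
    TRUSTED = 2


# Illustrative entitlements: the minimum trust each action requires.
REQUIRED_TRUST = {
    "read_repo": TrustLevel.SANDBOX,
    "run_tests": TrustLevel.SANDBOX,
    "open_pull_request": TrustLevel.SUPERVISED,
    "install_package": TrustLevel.SUPERVISED,
    "deploy_staging": TrustLevel.TRUSTED,
}


@dataclass
class AgentIdentity:
    agent_id: str
    acting_for: str                     # the human who delegated the work
    trust: TrustLevel
    audit_log: list = field(default_factory=list)


def authorize(agent, action):
    """Deny by default, allow only known actions at sufficient trust, log all."""
    required = REQUIRED_TRUST.get(action)
    allowed = required is not None and agent.trust >= required
    agent.audit_log.append((action, allowed))
    return allowed


new_agent = AgentIdentity("agent-001", acting_for="dev@example.com",
                          trust=TrustLevel.SANDBOX)
print(authorize(new_agent, "run_tests"))
print(authorize(new_agent, "deploy_staging"))
print(new_agent.audit_log)
```

Unknown actions are refused rather than allowed, and the audit trail is written whether or not the request succeeds, which gives security teams something to review before they expand trust.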
There is also an important cost dynamic here. In many enterprises, security constraints currently limit usage, which means they also shield the organization from the full cost curve. If only a small number of teams can use agents in limited ways, the token bill remains constrained. Once identity, permissions, sandboxing, and audit controls mature, adoption will expand, and costs that were previously hidden by limited rollout will become much more visible.
So security may be the immediate bottleneck, but cost is waiting behind it.
7. As coding gets cheaper, alignment becomes the bottleneck
Agents reduce the cost of implementation, but that does not mean the organization automatically moves faster. It means the bottleneck moves.
If code becomes cheaper to produce, the relative cost of everything around code increases: product clarity, architecture decisions, security approvals, change management, compliance, release coordination, and cross-team alignment. Several leaders described a version of the same pattern, where teams can now build faster than the organization can decide, approve, or absorb.
This changes the economics of software delivery.
For years, engineering organizations optimised heavily against duplication. Build the shared capability once, coordinate across teams, extract commonality, and reuse the platform. That instinct still matters, but the trade-off changes when implementation becomes cheaper and coordination remains expensive. In some cases, duplicating a capability inside a clear domain boundary may be more effective than forcing multiple teams through a shared dependency.
This is not an argument against architecture. It is an argument for architecture that recognises where the bottleneck has moved.
Agentic development works best when work has clear ownership, limited dependencies, strong tests, and a constrained blast radius. It struggles when success depends on many teams agreeing before anything can move. The practical leadership question is therefore not just “how do we make developers faster?” but “what will become the constraint once they are?”
8. Enterprise AI agent governance needs explicit, automated controls
Most organizations already have controls for software delivery: code review, change management, access approval, security review, compliance checks, deployment gates, incident response, and audit logging.
The problem is that many of those controls were designed for humans.
They rely on judgement, institutional memory, informal interpretation, or manual process. People know what the policy really means. Reviewers know when something feels risky. Security teams know which exceptions matter. Auditors accept a workflow because they recognise the human pattern behind it.
Agents force these assumptions into the open.
If a policy is ambiguous, an agent cannot reliably follow it. If a control depends on a human noticing something subtle, it may not scale. If a process is only documented in training material, it is not agent-ready. If an approval exists mainly so another team can find out what is happening, it may need to be redesigned.
This is governance debt, and agentic development exposes it.
The answer is not to invent an entirely new governance model from scratch. It is to make existing controls explicit, automated, and measurable. That means clearer policies, better identity systems, structured workflows, automated checks, traceability across agent actions, and evals that test whether the agent is actually following the standards it was given.
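As one hedged example of turning an implicit control into an explicit one, the sketch below encodes a few review rules as code that returns named violations. The rules, path prefixes, and field names are hypothetical; the point is that each rule is unambiguous enough for an agent to follow and for a pipeline to check.

```python
from dataclasses import dataclass

# Assumed sensitive areas of a repository; purely illustrative.
SENSITIVE_PREFIXES = ("infra/", "auth/", "billing/")


@dataclass(frozen=True)
class Change:
    files: tuple
    has_tests: bool
    human_approvals: int
    authored_by_agent: bool


def check_change(change):
    """Return the list of violated rules; an empty list means the change passes."""
    violations = []
    if not change.has_tests:
        violations.append("missing-tests")
    if change.authored_by_agent and change.human_approvals < 1:
        violations.append("agent-change-needs-human-approval")
    touches_sensitive = any(f.startswith(SENSITIVE_PREFIXES) for f in change.files)
    if touches_sensitive and change.human_approvals < 2:
        violations.append("sensitive-path-needs-two-approvals")
    return violations


agent_change = Change(files=("auth/session.py",), has_tests=True,
                      human_approvals=1, authored_by_agent=True)
print(check_change(agent_change))
```

Because the result is a list of named violations rather than a reviewer's gut feeling, the same check can feed both the traceability record and an eval of whether the agent followed the standard it was given.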
You cannot govern what you cannot see, and you cannot improve what you cannot evaluate. That is why skills, observability, and evals belong in the same conversation as security.
9. Standardization matters, but premature standardization can kill learning
Every organization adopting agents faces the same tension: how much freedom should teams have?
Too little standardization creates chaos. Too much standardization too early kills discovery.
The roundtables surfaced many examples of parallel experimentation: multiple teams creating skills, multiple repositories collecting prompts, different approaches to code review, different rules for test generation, different ideas about how much autonomy is acceptable. Some duplication happened because teams wanted control. Some happened because they did not know someone else had already solved the problem.
Early duplication is not always bad. It can be how teams learn. It can reveal which patterns work across different environments, and it can create local champions who are credible because they solved a real problem rather than followed a mandate.
But local learning only becomes organizational advantage if it becomes visible.
The healthiest pattern is to let teams experiment, make the work discoverable, then converge deliberately. That requires communities of practice, internal demos, shared repos, skill registries, lightweight review processes, and a platform team that sees its job as amplifying the good patterns rather than suppressing all variation.
The question is not whether to standardise. The question is when. Experimentation should be broad while the organization is learning. Production patterns should become intentional once that learning starts to repeat.
10. The talent model is shifting from writing code to directing, verifying, and integrating work
Agentic development changes what great engineering looks like.
It does not remove the need for engineering skill. If anything, judgement becomes more important. But the work shifts from producing every line of code to defining the task, supplying the context, delegating to agents, verifying the output, integrating the result, and knowing when something is subtly wrong.
Some engineers will thrive in that environment. They are comfortable with ambiguity, orchestration, and context switching. They can hold the goal in their head while inspecting partial outputs. They know how to specify, review, and correct without needing to manually produce every detail.
Others may struggle, especially if their identity is tied primarily to deep, single-threaded implementation or writing every line by hand. That style of work will not disappear, but it will become part of a larger system in which humans increasingly design and supervise the machinery of software creation.
One analogy that came up in the discussions was the shift from building the furniture to building or operating the factory that builds the furniture. Another is management: working with agents can feel like defining work, delegating it, reviewing the output, and intervening when needed.
That does not mean every engineer becomes a people manager. It means more engineers will need management-like skills for systems of agents: specification, delegation, verification, feedback, and accountability.
The emerging role is less “the person who writes all the code” and more “the person who ensures the right system gets built.”
Closing thoughts: What are the main blockers for enterprise agent adoption?
| Blocker | What leaders are seeing | Why it matters |
|---|---|---|
| Security | Agents inherit human permissions, touch sensitive systems, browse the web, or act without enough containment. | It limits rollout today, but also defines the trust model for everything that follows. |
| Cost | Usage grows through larger context windows, repeated runs, frontier models, and always-on workflows. | AI FinOps becomes a durable discipline, not a one-off optimisation project. |
| Model deployment | Frontier models are powerful, but many enterprise tasks may be better served by smaller, open, or specialised models. | The capability to route work across models becomes more strategic than picking a single model. |
| Context | Agents need specs, policies, tests, docs, runbooks, examples, and domain language to do useful work reliably. | Context becomes infrastructure, and weak documentation becomes an adoption blocker. |
| Alignment | Implementation gets cheaper, while decisions, approvals, architecture, and cross-team coordination still move at human speed. | The bottleneck moves from writing code to agreeing what should be built and how it should fit. |
Most of the roundtable discussion reinforced what enterprise leaders already feel: agentic development is useful, the tools are improving quickly, and adoption is uneven.
From my perspective, three novel points stood out:
- Hyping AI productivity can hinder adoption. If the story inside a company is mostly about doing the same work with fewer people, employees will quite reasonably hear a threat. A better transformation narrative is ambition: agents let teams attempt more, build more, explore more, and pursue work that previously looked out of reach. This shift turns the questions around and focuses on nurturing an enterprise culture directed at empowering devs (not scaring them!).
- We need AI FinOps! Managing AI costs is not a short-lived problem that disappears once models get cheaper. As agents become embedded in development workflows, usage expands, model choice diversifies, and context-heavy workflows become normal. Cost needs to be observed, managed, and tied to value.
- In the enterprise, the security bottleneck currently shields organizations from the full cost curve. Many companies are not yet seeing the true cost of broad agent adoption because security constraints are limiting usage. Once the controls mature, adoption will expand, and the cost question will become much sharper.
The next generation of engineering teams won’t be defined by how many agents they use, but by how well they govern them.
At Tessl, this is the approach we’re building towards: agent governance rooted in context, evaluations, and security. A practical place to start is to point your coding agent at the Tessl CLI and ask it to evaluate your context. It is a simple way to assess the quality of your context, understand where the gaps are, and think about what governance will need to cover next.

Guy Podjarny
Founder of Tessl and Snyk, angel investor, ex-Akamai CTO, and co-host of the AI Native Dev podcast.