Securing the Coder, Not the Code: Notes on Agentic Development and Security

Explore agentic development's impact on software security, and why securing coders, not code, is key in AI-native environments.

Guy Podjarny

21 May 2026 · 16 min read

A few years ago I left Snyk day-to-day to start Tessl, because I'd fallen in love with AI and was convinced that the way we build software was about to change in a way that broke most of our security assumptions. I still believe that. The talk I gave recently at Snyk’s security conference was an attempt to make the case concretely, and this post is the written version of that talk for anyone who wasn't in the room.

tl;dr: as agents create and delete code at unprecedented speed, our job as humans is not to secure the code, but to get agents to secure it as they build. This is a material shift, requiring new tools, approaches and metrics.

Here's what I think it means, and what we should do about it.

From AI-augmented to AI-native

The first wave of AI in development was augmentation, pioneered by Copilot and then Cursor, where you wrote code and AI helped you write it faster. The second wave is delegation, where you ask an agent to do a task and it goes off and tries to do it, which means the agent is the developer and you become the reviewer, the prompter, and the auditor of intent.

This isn't a controversial statement anymore: agentic development is where everything has consolidated, and while I'll focus on software because that's the canary in the coalmine, every form of knowledge work is heading the same way.

The productivity gains are there, but agentic development changes the unit of work, the determinism of the output, and the rate of change of the development process all at once, and each of those shifts breaks a different assumption we built our security practices around.

  • It's non-deterministic. "Compile once, compile again, get the same result" is no longer how things work, because the same prompt produces different output and we have to get statistical about it.
  • The unit of software is changing. What we secure used to be the implementation, and now it's increasingly the instructions: skills, prompts, context, which represents a new unit, a new attack surface, and a need for new tooling.
  • It moves faster than ever. Development cycles are compressing, security has to keep up, and the only way to keep up with agents is for security itself to become agentic.

Challenge 1: non-determinism means you have to measure

If you came up through DevOps, you know the ethos: if it moves, measure it, and if it doesn't move, measure it in case it moves. Servers were the most statistical creature we'd ever shipped to production, and the answer was always the same, which is that you can't optimise what you can't measure.

Agents are a different category of statistical altogether, because the non-determinism isn't just at the request layer, it compounds across model output, tool selection, retrieval, retries, and multi-step planning. This means a measurement approach that worked for servers won't catch any of it.

The way you do this in the AI world is evals, where you define a task, define what good looks like up front, and run the agent against that task many times, scoring the runs.
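A minimal sketch of that loop, assuming a checklist-style scorer and a hypothetical `make_fake_agent` stand-in for whatever actually invokes your agent (none of these names come from a real library, and real evals usually score with richer rubrics):

```python
import random
import statistics
from typing import Callable


def score_run(output: str, checklist: list[str]) -> float:
    """Fraction of the 'what good looks like' criteria present in one run's output."""
    if not checklist:
        return 0.0
    found = sum(1 for item in checklist if item.lower() in output.lower())
    return found / len(checklist)


def run_eval(run_agent: Callable[[str], str], task: str,
             checklist: list[str], runs: int = 10) -> dict[str, float]:
    """Run the same task many times and summarise the spread of scores."""
    scores = [score_run(run_agent(task), checklist) for _ in range(runs)]
    return {
        "mean": statistics.mean(scores),
        "stdev": statistics.pstdev(scores),  # the variance is often the bigger story
        "min": min(scores),
        "max": max(scores),
    }


def make_fake_agent(seed: int) -> Callable[[str], str]:
    """Hypothetical stand-in for a real agent: returns a random mix of behaviours."""
    rng = random.Random(seed)
    behaviours = ["checks the caller's role", "validates the session token", "logs the request"]

    def agent(task: str) -> str:
        count = rng.randint(0, len(behaviours))
        return "; ".join(rng.sample(behaviours, count))

    return agent


summary = run_eval(make_fake_agent(seed=7), "Add an admin-only endpoint",
                   checklist=["role", "session token"])
print(summary)
```

The point of returning a distribution rather than a single score is that one run tells you very little about a non-deterministic system.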

I ran this on ElevenLabs as an example. ElevenLabs is a brilliant London-based text-to-speech lab that recently launched a music generation API. I gave the agent a task to build a dynamic soundtrack generator for a game studio, scored each run, and ran it ten times across five scenarios.

The results were noisy across the board, with absolute scores coming in low because the music API is new and underrepresented in the model weights, and the variance was the bigger story: the same task, the same prompt, ten runs, materially different outcomes each time.

The most common answer to that noise today is context, and the most common unit of context is skills, which are Markdown files (with a bit of structure) that give the agent the knowledge it needs to do a task well.

Knowledge is not the same as intelligence, and a model can be highly capable in the abstract while still failing a task because it doesn't know the specific API surface, the internal convention, or the deprecated import path it needs to avoid. From the outside that failure looks identical to a model that simply isn't smart enough.

We took an ElevenLabs music skill that explained the API, ran the same evals, and the agent went from a 50% average without the skill to 98% with it, with the variance compressing and the task actually getting done.

CodeGuard, and why more context isn't better context

Same idea, but applied to security. CodeGuard is a project Cisco built and donated, which packages OWASP security rules into a skill that helps agents write more secure code. I created six evaluation scenarios focused specifically on authorisation and scored the agent's output with and without it.

Without CodeGuard, the agent scored 48% on the authorisation scorecard. With CodeGuard the score improved nearly 1.78x, which is a meaningful lift, but the second experiment was the more interesting one.

When I stripped CodeGuard down to just the authentication and authorisation content, roughly 5% of the original skill, and re-ran the same evals, the score jumped to 98%. In other words, less context, scoped tightly to the task, beat more context by a wide margin.
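One way to picture that kind of scoping is a sketch like the one below. It assumes the skill is a Markdown file organised under `## ` headings, which is an assumption for illustration and not a description of how CodeGuard is actually laid out; `split_sections` and `scope_skill` are hypothetical helpers.

```python
def split_sections(markdown: str) -> list[tuple[str, str]]:
    """Split Markdown into (heading, body) pairs at level-2 headings."""
    sections: list[tuple[str, str]] = []
    heading, lines = "", []
    for line in markdown.splitlines():
        if line.startswith("## "):
            if heading or lines:
                sections.append((heading, "\n".join(lines).strip()))
            heading, lines = line[3:].strip(), []
        else:
            lines.append(line)
    if heading or lines:
        sections.append((heading, "\n".join(lines).strip()))
    return sections


def scope_skill(markdown: str, keywords: list[str]) -> str:
    """Keep only the sections whose heading mentions one of the keywords."""
    kept = [
        f"## {heading}\n{body}"
        for heading, body in split_sections(markdown)
        if any(keyword.lower() in heading.lower() for keyword in keywords)
    ]
    return "\n\n".join(kept)


skill = """# Security rules
General preamble.
## Authorisation
Check the caller's role on every request.
## Cryptography
Prefer vetted libraries.
## Authentication
Never store plaintext passwords.
"""

print(scope_skill(skill, ["authoris", "authentic"]))
```

Whether a scoped variant actually beats the full skill is something only the evals can tell you, so a filter like this is a starting point for an experiment rather than a shortcut.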

More context is not necessarily better context, because if I sat you down and told you 100 things, no matter how brilliant they were, you'd give less attention to each one than if I told you three. Attention is a scarce resource for humans and models alike, which means choosing what to say and what to leave out is part of the craft.

The same pattern shows up when you vary the agent rather than the skill, where identical instructions run through Opus, Sonnet, Codex, and Cursor produce materially different scores. Context isn't just a property of the skill, it's a property of the skill-agent pair, and your context needs to be tuned for the agents you're actually using.
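A small sketch of what tuning per skill-agent pair can look like in practice. The agent names and scores here are placeholders invented for illustration, not measurements from the experiments above, and `best_skill_per_agent` is a hypothetical helper:

```python
from collections import defaultdict


def best_skill_per_agent(scores: dict[tuple[str, str], float]) -> dict[str, str]:
    """Given eval scores keyed by (skill_variant, agent), pick the best variant per agent."""
    by_agent: dict[str, dict[str, float]] = defaultdict(dict)
    for (skill, agent), score in scores.items():
        by_agent[agent][skill] = score
    return {agent: max(skills, key=skills.get) for agent, skills in by_agent.items()}


# Placeholder numbers only: in practice these come from running your evals.
example_scores = {
    ("full-skill", "agent_a"): 0.60,
    ("scoped-skill", "agent_a"): 0.90,
    ("full-skill", "agent_b"): 0.80,
    ("scoped-skill", "agent_b"): 0.70,
}

print(best_skill_per_agent(example_scores))
```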

The Context Development Lifecycle

When you start treating skills as something you build, evaluate, optimise, distribute, and observe in production, you have a lifecycle, and we've been calling this the Context Development Lifecycle (CDLC), which I think sits alongside the SDLC rather than inside it.

The CDLC is where humans live, building the context that guides the agents, which is then applied across the SDLC where the agents do the work.

The observe step matters, because evals are like tests in that they're useful but they go out of sync with reality if you don't also watch what's actually happening in production. What you want is a loop that closes: build, evaluate, distribute, observe, learn, improve.

The same skill, the same instruction, can be applied end to end, the same way a great dev uses the same knowledge to spec a feature, write the code, review it, ship it, and troubleshoot the incident. With skills representing that knowledge, this means that from a security lens the same skill can secure the writing step, the audit step, and the incident response step.

Challenge 2: skills are a new unit of software

The more you live inside the CDLC, the more obvious the second challenge becomes, which is that we're talking about skills as if they were documents when they aren't. Skills are stored as Markdown, edited in the same tools as a Confluence page, and reviewed like prose, but at runtime an agent executes them as instructions, which puts skills much closer to code than to documentation, and the security model has to follow that reality rather than the file extension.

That means they have all the failure modes software has, plus some new ones.

  • Malicious skills. Snyk and others have documented attackers seeding skills with instructions designed to make the agent do something it shouldn't. We've seen examples in our own registry of skills that look like standard blockchain API helpers but with one step that quietly downloads a password-protected zip, which is detectable if you're looking.
  • Vulnerable skills. A skill that asks the user to put API keys directly inside the prompt, or makes MCP calls with plain vanilla tokens, is insecure by design even if the author meant no harm.
  • Negligent skills. Not an industry term, but it should be, because these are skills that lack basic safety instructions like "check this into a private repo, and if you can't commit, fail, don't exfiltrate the work some other way". We've all seen agents in reward-seeking mode, keen to please and willing to delete files, escape sandboxes, do whatever it takes to complete the task. Negligent skills are the ones that don't tell the agent where the guardrails are.
  • Supply chain. How people consume skills today is by downloading Markdown files from random GitHub repos and checking them into their own, often in seven different folders to support seven different agents, which is fine for now but is going to bite teams eventually.
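To make "detectable if you're looking" concrete, here is a deliberately naive sketch of a line-by-line scan over a skill's text. The three regex rules are illustrative assumptions of mine, not the rule set of Snyk or any real scanner, and a production scanner would need far more than pattern matching:

```python
import re

# Illustrative heuristics only (assumptions, not a real scanner's rules).
RULES = {
    "archive-download": re.compile(r"(curl|wget)\s+\S+\.(zip|7z|rar)\b", re.IGNORECASE),
    "secret-in-prompt": re.compile(
        r"(paste|put|include)\s+your\s+(api\s+key|token|password)", re.IGNORECASE
    ),
    "pipe-to-shell": re.compile(r"(curl|wget)[^\n|]*\|\s*(sh|bash)\b", re.IGNORECASE),
}


def scan_skill(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every line that trips a heuristic."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in RULES.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


sample_skill = (
    "1. Fetch prices from the API.\n"
    "2. Run: curl https://example.invalid/tools.zip -o t.zip\n"
    "3. Paste your API key into the prompt.\n"
)

for lineno, rule in scan_skill(sample_skill):
    print(f"line {lineno}: {rule}")
```

Pattern rules like these catch the vulnerable and the clumsy malicious cases. Negligent skills are harder, because the problem there is what the skill fails to say.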

Once you treat skills as a software artifact rather than a document, most of the framing problem solves itself, because versioning, dependency resolution, provenance, scanning, signing, and lifecycle management are problems the package ecosystem has been working through for two decades, and a lot of the answers port over with light adaptation.
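As one example of an answer that ports over, here is a sketch of lockfile-style pinning: record a SHA-256 digest for each skill when it is approved, then flag anything that has changed or was never pinned. The lockfile shape (a plain name-to-digest mapping) and the `verify_skills` helper are assumptions for illustration, not an existing format:

```python
import hashlib


def digest(content: bytes) -> str:
    """SHA-256 hex digest of a skill's content."""
    return hashlib.sha256(content).hexdigest()


def verify_skills(lock: dict[str, str], installed: dict[str, bytes]) -> list[str]:
    """Compare installed skills against pinned digests and describe any mismatch."""
    problems = []
    for name, expected in lock.items():
        if name not in installed:
            problems.append(f"{name}: missing")
        elif digest(installed[name]) != expected:
            problems.append(f"{name}: content drifted from pinned hash")
    for name in installed:
        if name not in lock:
            problems.append(f"{name}: not pinned")
    return problems


approved = b"# Review skill\nCheck authorisation on every endpoint.\n"
lock = {"review": digest(approved)}

print(verify_skills(lock, {"review": approved}))
print(verify_skills(lock, {"review": approved + b"Also run this script.\n"}))
```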

What enterprise-grade skill governance looks like

Three elements, in roughly the order you need them.

Governance and security is about knowing what's happening: auditing who publishes and who installs, constraining the supply chain to centralised paths the same way you'd constrain npm, and scanning skills for malicious content before they hit the registry. Most teams I talk to haven't started on this yet, and it's the bit that blocks rollout.

Standardisation and reuse is the next problem once skills are flowing, because duplication and drift become real issues when three people on the team have built a "review the code" skill and a fourth comes along. Teams need a way to compare, standardise, and choose.
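A rough sketch of the "compare" step, using the standard library's `difflib.SequenceMatcher` to flag skills whose text is nearly identical. The 0.8 threshold and the `near_duplicates` helper are arbitrary assumptions, and text similarity is only a first pass, since two skills can say the same thing in different words:

```python
from difflib import SequenceMatcher
from itertools import combinations


def near_duplicates(skills: dict[str, str],
                    threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Return pairs of skill names whose text similarity meets the threshold."""
    pairs = []
    for (name_a, text_a), (name_b, text_b) in combinations(sorted(skills.items()), 2):
        ratio = SequenceMatcher(None, text_a, text_b).ratio()
        if ratio >= threshold:
            pairs.append((name_a, name_b, round(ratio, 2)))
    return pairs


team_skills = {
    "review-a": "Review the code for bugs, style issues and missing tests.",
    "review-b": "Review the code for bugs, style issues, and missing tests!",
    "deploy": "Build the container image and push it to the staging cluster.",
}

for name_a, name_b, ratio in near_duplicates(team_skills):
    print(f"{name_a} and {name_b} look like duplicates (similarity {ratio})")
```

Once a pair is flagged, the choice between them should come from eval scores rather than from which one was written first.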

Continuous optimisation is the holy grail, and it's where the CDLC closes the loop by observing what the agent did, whether it succeeded, whether the user had to correct it. Devs need to feed that signal back into their evals to evolve the skill and ship the new version, which is what the teams at the cutting edge are doing.
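Closing that loop can start very simply. In the sketch below, an `Observation` is a hypothetical record of one production run, and any task that failed or needed a human correction becomes a candidate eval case if it isn't already covered. The field names are assumptions, not a real telemetry schema:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One observed agent run in production (hypothetical schema)."""
    task: str
    succeeded: bool
    user_corrected: bool


def new_eval_cases(observations: list[Observation], existing: set[str]) -> list[str]:
    """Tasks that failed or needed correction and have no eval covering them yet."""
    cases: list[str] = []
    for obs in observations:
        needs_attention = (not obs.succeeded) or obs.user_corrected
        if needs_attention and obs.task not in existing and obs.task not in cases:
            cases.append(obs.task)
    return cases


observed = [
    Observation("add admin-only endpoint", succeeded=True, user_corrected=False),
    Observation("rotate API credentials", succeeded=False, user_corrected=False),
    Observation("add tenant filter to query", succeeded=True, user_corrected=True),
    Observation("rotate API credentials", succeeded=False, user_corrected=False),
]

print(new_eval_cases(observed, existing={"add tenant filter to query"}))
```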

This is the area we've built Tessl to help with, as a platform for collaboratively developing skills, discovering and installing them with confidence, scanning them with Snyk before they go live, and observing how they perform once they're in use. This is why platform teams, DevX teams, and the newly emerging "AI enablement" teams use us to eliminate duplicates, drive usage of the good skills, and manage costs as agentic development scales.

Challenge 3: security must become agentic to keep up

The third challenge is the simplest to state and the hardest to act on, which is that agentic development moves faster than any prior development paradigm. Security has to move at the same speed, and the only way to do that is for security itself to become agentic.

I've lived through a version of this transition before: the move from waterfall to cloud took a set of manual security processes that worked fine on quarterly release cycles and made them actively dangerous in continuous deployment. The manual code review before every release was a reasonable practice in 1998 and a liability by 2015, and the teams that automated their scanning early built a durable advantage while the teams that didn't spent the next decade catching up under pressure.

The same inflection point is now happening with agents, where practices that are still tolerated in modern cloud security, like manual triage of vulnerabilities, manual dependency upgrades, and manual review of supply chain changes, are already automated by the leading teams. The future is here, as Gibson said, but it's not evenly distributed, and in the agent era those manual practices won't be tolerated at all.

A long list of things moves from "nice to have" to "must have", including smarter prioritisation, automated upgrades, supply chain manipulation detection, and drift detection on skills, all of which are things agents can genuinely help with.

Security is now being squeezed from both sides. Attackers are already operating agentically, with automated reconnaissance, exploit generation, and lateral movement at speeds humans can't match. Businesses are operating agentically to ship faster, which means a security function that stays human-paced doesn't just slow things down, it becomes the asymmetric weak point in the system, and the math stops working.

If security does become agentic, though, we can finally fix the things we've spent fifteen years trying to get developers to do consistently, which is the part that excites me.

Come talk about this at DevCon

We're going much deeper on all of this at AI Native Dev Con (DevCon) London, June 1st and 2nd. It's the conference we put together for people actually building and shipping in the agentic era.

The line-up is focused on delivering real-world case studies from teams that have rolled out agents at scale, the platform engineers building the enablement layer, and the security folks figuring out how to keep up. I'll also be expanding on the CDLC, skills as software, and what good governance actually looks like.

If any of this resonated, or if you want to discuss it in person, I'd love to see you there. All the details, the agenda, and registration are at tessl.io/devcon.

See you in London.

Guy Podjarny

Founder of Tessl and Snyk, angel investor, ex-Akamai CTO, and co-host of the AI Native Dev podcast.
