Skills are the new Code by Guy Podjarny

Outline — Skills are the new Code

Name: ainativedevcon2026/talk-podjarny-skills-are-the-new-code-aindc
Rating: 89.1 (1 review)
Author: ainativedevcon2026

Speaker

Guy Podjarny — Founder and CEO of Tessl, which is "reimagining software development for the AI era" and powers AI Native Dev Con. Previously founded Snyk (created the Developer Security category, now a multi-billion-dollar company with 1000+ employees). Former CTO at Akamai (following acquisition of his first startup). Active angel investor, co-host of the AI Native Dev podcast.

Abstract (as provided)

Software development has always revolved around the instructions we give the machine - punch cards, then assembly, then code. As agentic development takes hold, agent skills are becoming our unit of software - but we're not treating them this way.

In this keynote, I'll make the case that agent skills deserve the same rigour we've spent decades applying to code. Crafted with intention. Tested against real behaviour. Versioned and maintained in step with the project around them. Treating skills as an afterthought isn't just technical debt, it's the difference between AI that ships and AI that drifts.

Thesis (synthesis)

The new units of agentic software are tools, context, and harnesses — composing upward into factory lines and factories. Within those, reusable skills are the asset developers edit most, and they now exhibit the same failure modes code does: security risk, collaboration friction, and rot. The remedy is to import the toolchain we built for code — static analysis, dynamic tests (evals), dependency management, security tooling, and observability — and wrap them in a Context Development Life Cycle (CDLC) that humans own while agents handle the SDLC.

Section TOC

Section	Summary	Transcript lines
Pre-talk disclaimers	Guy notes this is a dry run, slides are still rough, asks for feedback	1–18
Opening & three-part agenda	New units of software; skills are the new code; dev tools for skills	19–44
Unit 1 — Tools	CLIs, MCPs, APIs; deterministic, save tokens, compose	45–68
Unit 2 — Context	Practices/policies, specs, workflows; rules, skills, passive context; skills compose	69–98
Unit 3 — Harnesses	Deterministic software wrapping probabilistic models; Claude Code as example; plugins and hooks; harnesses compose into factory lines	99–138
The agentic software stack	Tools → context → harnesses → factory lines → factories	139–158
Why skills are the dominant reusable context	Skills are "code-like" reusable context units; usage exploding	159–172
Problem 1 — Security & governance	Malicious skills (30%+ in OpenClaw), negligent skills, vulnerable skills (API keys)	173–204
Problem 2 — Collaboration & reuse	Unicorn platform team story; quality testing gap; dependency management gap	205–232
Problem 3 — Lifecycle & rot	Skills rot like software; opportunity for automated optimization from agent logs	233–262
Solution: treat skills as code	Five tools: static analysis, dynamic tests, dependency mgmt, security, observability	263–278
Tool 1 — Static analysis	Inspect skills without executing; Tessl Review; quality scores	279–298
Tool 2 — Dynamic tests (evals)	Evals are the new tests; scenarios at varying scope; can't scale without them	299–334
Tool 3 — Dependency management	Skills compose ⇒ skills are dependencies; versioning, manifests, platform compatibility	335–360
Tool 4 — Security tooling	Static analysis, supply chain, red teaming; Snyk integration	361–388
Tool 5 — Observability	Mine agent logs and PRs for real scenarios, gaps, new skill opportunities	389–410
The CDLC	Generate → test → optimize → distribute → observe; humans own CDLC, agents own SDLC	411–428
Wrap-up & the Tessl agent	Vertical agent for skill development; local, pipeline, control center	429–470

Terminology glossary (Guy's own definitions)

Tools — "pieces of software that we call to be able to do something deterministically, and they really are what turns a model into an agent." Dominant kinds: CLIs (shell/Bash), MCPs, APIs.
Context — "giving the agent information that it either cannot know, like opinions, or that it can find out but it is very inefficient or error prone to do so." Three buckets: practices and policies, specs, workflows.
Rules — context "which we sort of always shove into the context. You will always have this in any LLM message that you send."
Skills — context that is "loaded on demand by the user or with some hints."
Passive context — "docs, information like architecture MD and such that might just sit in your repository and be available to the agents to find."
Harness — "deterministic software that wraps the probabilistic model... For instance, Claude Code is a [harness]." A harness chooses where context comes from and which tools are available, provides the UX, and sometimes constrains the model.
Hooks — "deterministic software that would run at specific points in time... takes away decision power from the model." Example: Intercom blocking gh pr open unless a PR skill is loaded.
Factory line — composed harnesses; Guy uses this term "just to disambiguate that from the harness that we use locally."
Malicious skill — "skills that are explicitly written to manipulate the agent to do something malicious."
Negligent skill — "skills that lack safety instructions" (e.g. failing to say "do not delete tables" when updating a DB).
Vulnerable skill — skills that do unsafe things, "the most common example here is API keys that are used in the open."
Evals — the skill equivalent of tests: "Defining scenarios that say, in this situation, this is how the agent should [behave]. Here's the setup. Here's the task for the agent. [Here]'s the judgment about what good looks like."
CDLC (Context Development Life Cycle) — the loop Guy proposes humans should own: generate skills, write evals, optimize from learnings, distribute with package management, observe in the wild, all while maintaining security and quality.
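
To make the Hooks entry above concrete, here is a minimal sketch of a deterministic gate in the spirit of the Intercom example. It is an illustration only, not Intercom's or any harness's actual hook API: the Session class, the pre_command_hook function, and the skill name "pr-skill" are all hypothetical, and the command string is copied from the glossary entry as written.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical record of which skills the agent has loaded so far."""
    loaded_skills: set[str] = field(default_factory=set)


def pre_command_hook(session: Session, command: str) -> tuple[bool, str]:
    """Run before a shell command the agent proposes; return (allowed, reason).

    The decision is ordinary code, so the model has no say in it.
    """
    # Assumption: the gated command and the skill name are illustrative.
    wants_pr = command.strip().startswith("gh pr open")
    if wants_pr and "pr-skill" not in session.loaded_skills:
        return False, "blocked: load the PR skill before opening a pull request"
    return True, "ok"


if __name__ == "__main__":
    session = Session()
    print(pre_command_hook(session, "gh pr open --title 'Fix bug'"))
    session.loaded_skills.add("pr-skill")
    print(pre_command_hook(session, "gh pr open --title 'Fix bug'"))
```

The first call is refused and the second is allowed, because the only thing that changed between them is the session state. That is the sense in which a hook "takes away decision power from the model".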

Named frameworks / concepts

The agentic software stack — tools (bottom) → context → harnesses → factory lines → factories. "This is not a finite list, and... this will change often."
Three problems that emerge as skills scale — (a) security & governance, (b) collaboration & reuse, (c) lifecycle & continuous optimization.
The five code-development tools to bring to skills — static analysis, dynamic tests, dependency management, security tooling, observability.
The CDLC loop — generate → test (static & dynamic) → optimize → distribute (with dependency mgmt) → observe → feed back. Wrapped in security and quality throughout.
Carrot-and-stick of skill lifecycle — neglected skills rot and become harmful; skills that are actively maintained can yield "double our return, right, or an order of magnitude improvement" through automated optimization.
Humans on CDLC, agents on SDLC — Guy's positioning of the division of labour.
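
The "test (static & dynamic)" step of the CDLC loop can be sketched roughly as follows. This is a toy under stated assumptions, not Tessl Review or any real eval framework: the regular expressions, the Scenario fields (named after the setup, task, and judgment in the Evals definition), and the stub agent are all hypothetical stand-ins.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Assumption: a deliberately crude pattern for "API keys that are used in the open".
KEY_PATTERN = re.compile(r"(api[_-]?key|token|secret)\s*[:=]\s*\S{16,}", re.IGNORECASE)
# Assumption: a crude signal that the skill touches a database.
DB_WRITE_PATTERN = re.compile(r"\b(update|migrate)\b.*\b(database|table)", re.IGNORECASE | re.DOTALL)


def static_check(skill_text: str) -> list[str]:
    """Inspect a skill's text without executing it and return any findings."""
    findings = []
    if KEY_PATTERN.search(skill_text):
        findings.append("possible hard-coded credential")  # vulnerable skill
    if DB_WRITE_PATTERN.search(skill_text) and "do not delete" not in skill_text.lower():
        findings.append("no safety instruction for destructive DB operations")  # negligent skill
    return findings


@dataclass
class Scenario:
    setup: str                    # the situation the agent starts in
    task: str                     # what the agent is asked to do
    judge: Callable[[str], bool]  # what good looks like


def run_evals(agent: Callable[[str, str], str], scenarios: list[Scenario]) -> float:
    """Return the fraction of scenarios whose output the judge accepts."""
    if not scenarios:
        raise ValueError("at least one scenario is required")
    passed = sum(1 for s in scenarios if s.judge(agent(s.setup, s.task)))
    return passed / len(scenarios)


def stub_agent(setup: str, task: str) -> str:
    # Stand-in for a real agent run with the skill loaded.
    return "UPDATE users SET plan = 'pro' WHERE id = 42;"


if __name__ == "__main__":
    print(static_check("Update the users table. API_KEY=abcdefghijklmnop1234"))
    scenario = Scenario(
        setup="Postgres database with a users table",
        task="Upgrade user 42 to the pro plan",
        judge=lambda out: "UPDATE" in out.upper() and "DROP TABLE" not in out.upper(),
    )
    print(run_evals(stub_agent, [scenario]))
```

The static half never runs the skill; the dynamic half scores observed agent behaviour against a judgment. A real setup would swap the stub for actual agent runs and would likely use richer judges than a string check.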

Open questions / not covered

Concrete metrics for what a "good" quality score looks like, or thresholds.
How to write good eval scenarios in practice — Guy notes "you can write very useful ones, or you can write... ones that just waste your time" but doesn't give criteria.
Specifics of skill versioning semantics (semver? something else?).
How harness compatibility is actually expressed in skill manifests.
Pricing, availability, or release timeline of the Tessl agent.
How to handle multi-tenant or cross-org skill sharing/governance.
Comparison with specific competing platforms beyond a brief mention of Snyk and Socket.
Detailed worked example of red-teaming a skill.
The participants list and audience Q&A — this is a dry run, not the live talk.
