
The End of Fragmented Agent Context

Chapters

Trailer
[00:00:00]
Introduction
[00:01:18]
The Rise of Skills Standards
[00:04:39]
Context Engineering and Skills
[00:07:58]
Deep Dive into Evals
[00:17:20]
Importance of Scenarios in Skill Evaluation
[00:24:45]
The Role of Package Managers in Skill Distribution
[00:32:32]
Lifecycle and Professional Development of Skills
[00:39:30]
The Future of Skill Development
[00:41:52]
Engaging with the Tessl Registry
[00:43:12]

In this episode

One skill took coding success from 28% to 71%. Another made things worse.


Guy Podjarny and Simon Maple tested 1000+ agent skills and reveal which ones actually work, which hurt performance, and why anecdotal evidence isn't enough anymore.


Tessl Skills Registry is the first package manager for agent skills with built-in evaluations, versioning, and lifecycle management. Explore tested skills and see real performance data: [https://tessl.io/registry](https://tessl.io/registry)


On the docket:

  • Claude roasted Anthropic's own skill with a 27% score ("monolithic wall of text")
  • Why some popular skills actually decrease agent performance
  • How Tessl is bringing package managers and evals to agent skills

Whether you're creating your first skill or managing them across your dev team, this is your roadmap for making agent skills actually work.

In this episode of the AI Native Dev podcast, co-hosts Simon Maple and Guy Podjarny (CEO of Tessl) dive deep into the world of Agent Skills. As AI agents move from experimental toys to core development teammates, the industry is racing to standardize how we "teach" them new tricks. The duo explores the shift from "it worked once" anecdotes to professional-grade Context Engineering, the emergence of the Anthropic skill standard, and why your skills need a package manager just as much as your JavaScript does. Along the way, they share eye-opening eval data on Anthropic’s own skills and introduce Tessl’s new registry built to professionalize the AI-native lifecycle.


Beyond the Markdown: Defining the "Skill" Standard

For a long time, guiding an agent meant hacking together "system prompts" or messy .cursorrules. Podjarny defines a Skill as a standard unit of context that teaches an agent a specific competency. While the concept isn't brand new, it has reached a "Goldilocks" moment because of the Anthropic Agent Skills standard.

Similar to how the Model Context Protocol (MCP) standardized tool-calling, this format provides a structured SKILL.md file and metadata that allow different agents—from Cursor and Windsurf to Gemini and Claude—to consume the same instruction set.

The Insight: Intelligence (the LLM's reasoning) is different from Knowledge (the context you provide). Skills bridge that gap, using "progressive disclosure" to feed the agent only the information it needs, when it needs it, without bloating the context window.
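As a rough sketch of what this looks like in practice: a skill in the Anthropic format is a folder whose SKILL.md carries YAML frontmatter with the name and description the agent uses to decide when to load it; the body and any linked files are read only on demand (the progressive disclosure described above). The skill below is invented for illustration — its name, steps, and referenced file are not from the episode:

```markdown
---
name: security-pr-review
description: Review pull requests for common security issues (injection, secrets, authz).
---

# Security PR Review

1. Scan the diff for hard-coded credentials and secrets.
2. Flag unsanitized user input reaching SQL, shell, or HTML sinks.
3. For framework-specific checks, read `references/frameworks.md` (loaded only when needed).
```

Note that only the frontmatter sits in the context window by default — the numbered steps cost nothing until the agent actually invokes the skill.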


The Three Buckets of Context Engineering

Not all context is created equal. Podjarny breaks down the "Context Engineering" hierarchy into three distinct categories:

| Category | Forcefulness | Context Cost | Use Case |
| --- | --- | --- | --- |
| Rules | High (mandatory) | High | Essential constraints, like always using TypeScript |
| Skills | Medium (invoked) | Medium | Specific workflows, like how to review security PRs |
| Docs | Low (searchable) | Zero until loaded | Vast libraries of reference info (API docs, READMEs) |

The challenge is Activation. Just like an MCP tool, a skill is only as good as its name and description. If the agent doesn't know when to pull the lever, the skill is useless.


Evals: Why "It Feels Better" Isn't Good Enough

One of the most provocative segments of the episode covers Evaluations (Evals). Most developers currently treat skills as "vibes-based" software. Simon and Guy argue that if skills are part of your production workflow, they must be tested like software.

Tessl’s new research into review evals revealed some surprising results:

  • The "Prose" Problem: Even Anthropic's own "Canvas Design" skill was flagged by Claude for being "extremely verbose" and containing "philosophical padding."
  • The Impact Gap: In task-based evals, one Vercel skill boosted agent success from 28% to 71%, while another similar skill actually had a negative impact.

The takeaway? You need systematic evals to know if your context is actually helping or just adding noise that confuses the model.
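The "with vs. without" comparison can be sketched as a simple pass-rate delta over repeated task runs. Everything here — the function names, the run counts — is hypothetical scaffolding for illustration, not Tessl's actual eval harness:

```python
def pass_rate(results: list[bool]) -> float:
    """Fraction of task attempts that succeeded."""
    return sum(results) / len(results)

def skill_delta(baseline_runs: list[bool], with_skill_runs: list[bool]) -> float:
    """Percentage-point change in success rate when the skill is loaded.

    Each argument is one boolean per task attempt: True = agent passed the task.
    """
    return round((pass_rate(with_skill_runs) - pass_rate(baseline_runs)) * 100, 1)

# Illustrative numbers echoing the episode's Vercel example: 28% -> 71%.
baseline   = [True] * 28 + [False] * 72   # 100 runs without the skill
with_skill = [True] * 71 + [False] * 29   # 100 runs with the skill loaded

print(skill_delta(baseline, with_skill))  # 43.0 — a skill worth keeping
```

The same arithmetic surfaces the negative case: if the delta comes back below zero, the skill is adding noise, and per the episode's advice, it should be deleted.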


The "NPM for Skills": Solving the Copy-Paste Crisis

We’ve seen this movie before. In the early days of Java and JavaScript, developers copied libraries manually. Today, "Agent Skills" are in that same messy "copy-paste" phase. People are duplicating .skills folders across repos, leading to version drift and unmaintained instructions.


Podjarny argues that we need a Package Manager for Context:

  1. Version Control: Using SemVer for skills so teams know what has changed.
  2. Manifest Files: A single source of truth (like package.json) to track what skills a repo relies on.
  3. Cross-Agent Distribution: A tool that can install a skill into .claude, .cursor, and .windsurf simultaneously.
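Putting the three pieces together, a skill manifest might look something like the sketch below — the field names are invented here in the spirit of package.json, not Tessl's actual schema:

```json
{
  "skills": {
    "security-pr-review": "^2.1.0",
    "canvas-design": "1.4.2"
  },
  "targets": [".claude", ".cursor", ".windsurf"]
}
```

SemVer ranges make updates auditable, the single file is the source of truth for what a repo depends on, and the targets list lets one install step fan out across every agent directory.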

Tessl has launched a Skill Registry to act as this central hub, allowing developers to `tessl skill install` proven competencies rather than reinventing the wheel.

Key Takeaways

  • Skills are Software: Stop treating them as static docs. They need versioning, lifecycles, and ownership.
  • Standardize or Die: The Anthropic standard is winning; lean into it to ensure your instructions work across Cursor, Windsurf, and Claude.
  • Ruthless Conciseness: LLMs hate "monolithic walls of text." Use progressive disclosure to keep the context window lean.
  • Measure the Delta: Use Task Evals to compare agent performance "with vs. without" a skill. If it doesn’t move the needle, delete it.
  • Stop Vendoring: Don't manually copy-paste skills into your repos. Use a registry and manifest file to manage dependencies.
