I Invented a Three-Tier Stack for AI Agents (And I'm Not Apologizing)

24 Mar 2026 · 19 minute read

Baruch Sadogursky

Baruch Sadogursky is a Developer Advocate who helps developers move from vibecoding to spec-driven development, with deep experience from JFrog and now at Tessl.

TLDR

  • Every software domain has a three-tier diagram. Databases, networking, web apps — three boxes, arrows, done. Except, somehow, agentic applications. We're building those as one undifferentiated blob of prompt text.
  • I built a flight search library, then a skill to teach agents how to use it, then realized a second person would need completely different rules. Three layers, each obvious in hindsight.
  • The bottom two layers are shared infrastructure. The top layer, your personal Policy, is what makes the same agentic app produce completely different results for different people.
  • When a policy skill fails, you don't know — the agent sounds equally confident whether it applied your rules or quietly ignored them. Evals are the only way to find out before production does.

I'm in Mexico City, staring at a cutaway model of the Templo Mayor, the great Aztec temple. Seven construction phases, each new ruler building a bigger temple over the previous one. The innermost structure is the foundation. It does the structural work. Each successive layer wraps it with new rituals, new capabilities. The outermost layer, painted shrines on top, reflects whoever's in charge: their gods, their rules, their policy.

Layered architecture with separation of concerns. Six hundred years ago. On an island in the middle of a lake.

Turns out the brain likes threes. Cognitive scientists call it chunking: three is the number of things humans can hold, relate, and reason about before the model in our heads starts dropping pieces. Every domain in software figured this out independently. Databases: physical, logical, conceptual. Networking: access, distribution, core. Web apps — you know the one. Three boxes, arrows between them, done. It's the architecture the brain converges on, which is why every field gets there eventually.

Every field, apparently, except agentic applications. We're building those without an architecture diagram, without layers, without any separation of concerns.

The Aztecs built better architecture than your agentic app.

What are we, savages? So, I came up with one:

 ┌─────────────────────────────┐
 │  Policy    (who/how)        │
 │  personal rules             │
 └──────────────┬──────────────┘
 ┌─────────────────────────────┐
 │  ASI       (what)           │
 │  agent skill interface      │
 └──────────────┬──────────────┘
 ┌─────────────────────────────┐
 │  Library   (engine)         │
 │  code                       │
 └─────────────────────────────┘

That's the whole diagram.

I've been telling everyone who'll listen that this is the missing piece. Every agentic app you've built without those three boxes has been harder than it needed to be.

The Three Layers

Every agentic application has three concerns, whether you've named them or not.

The Library is the engine. It's code: a Python package, an npm module, a CLI tool, a REST API wrapper. It does the actual work — fetches data, transforms it, writes to databases, talks to external services. The Library has no idea that an AI agent exists. You'd write this same code for a traditional application, with documentation, tests, and a version number.

The ASI is the Agentic Skill Interface, the layer that teaches an agent how to use the Library. I'm calling it ASI because I enjoy the double meaning, and because this one you can actually build. The ASI knows the API signatures, the error types, and the workflow patterns. It tells the agent: "When the user asks for X, call Y with these parameters, handle these errors this way, and present results in this format."

Here's the thing: your coding agent re-discovers how to use your libraries every single session. It reads the docs (if you remember to point it there), takes its best guess at the API, and invents a workflow that's slightly different from what it invented yesterday. That knowledge is encodable. You write it once as a skill, and every agent that installs it gets the benefit. Library knowledge in, agentic interface out.

The Policy is the layer that makes an agentic app yours. It encodes personal rules, preferences, and constraints. It knows nothing about how the Library works, and it doesn't need to. All it has to say is: "When making decisions, here are MY rules." Things like your airline loyalty program, your company's coding standards, your dietary restrictions, and your risk tolerance for production deploys.

The bottom layer does the work. The middle one teaches the agent how to drive it. The top one is yours alone, and it's the reason two people running the same agentic app get completely different results.

How I Got Here: Fifty Tabs of Fares

As a Developer Advocate, I travel a lot, and I am unreasonably particular about finding good flights. The process used to involve opening approximately fifty browser tabs, comparing fares across dates and routes, cross-referencing loyalty programs, and slowly losing the will to live.

So I built a library. A Python package that wraps a flight search engine, runs headless browser automation, and returns typed dataclasses. You give it an origin, a destination, and a date, and it hands back a list of fares. It's an engine.
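In sketch form, the Library's public surface is just typed data in, typed data out. All names here are illustrative stand-ins, not the real package's API:

```python
from dataclasses import dataclass


class BlockedError(Exception):
    """Raised when the upstream search engine blocks the session.

    The contract: do not retry. Ever.
    """


@dataclass
class Fare:
    origin: str       # IATA airport code, e.g. "MEX"
    destination: str
    date: str         # ISO date, e.g. "2026-05-01"
    carrier: str      # IATA carrier code, e.g. "KL"
    price: float
    currency: str


def search_fares(origin: str, destination: str, date: str) -> list[Fare]:
    """Run a headless-browser search and return typed fares.

    The real implementation drives browser automation and takes
    15-45 seconds; it's stubbed out in this sketch.
    """
    raise NotImplementedError("the engine lives in the real library")
```

Note what's absent: nothing in this surface mentions agents, prompts, or skills. It's an engine, same as it would be for a traditional app.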

Then I wanted my coding agent to use it, and I hit the wall every agentic developer hits: the agent had no idea how to drive my Library. It didn't know that the search takes 15-45 seconds. It had no concept that a BlockedError means stop, don't retry, don't even think about retrying. It treated round-trip legs as independent queries instead of the sequential selection flow the Library expects. The agent needed more than the docs; it needed a workflow. It had, and I cannot believe I get to say this unironically, a skill issue.

So I wrote one. Parse intent, handle errors by type, present one leg at a time, set the timeout to two minutes. No flight search code in the skill itself — pure orchestration.
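A skill is just a markdown playbook the agent loads. A sketch of mine — the structure and headings are my own framing, not a mandated format:

```markdown
# Flight Search Skill (sketch)

## When to use
The user asks to find or compare flights.

## Workflow
1. Parse intent: origin, destination, dates, one-way vs round trip.
2. Call the library's search. Expect 15-45 seconds; set the timeout
   to two minutes.
3. Present ONE leg at a time. Wait for a selection before searching
   the next leg — round trips are a sequential selection flow, not
   independent queries.

## Error handling
- `BlockedError`: stop immediately. Never retry, never switch routes.
- Timeout: report it. Do not fall back to querying the provider
  directly.
```

No flight search code anywhere in it. Pure orchestration.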

And then the third layer appeared. Because I care about SkyTeam alliance carriers. I have specific opinions about layover airports (no overnight connections in airports where the benches are too short to sleep on, because I have standards). None of that belongs in the Library or the skill. Those are MY rules, and a colleague using the exact same Library and skill would have completely different preferences.

Three layers. They emerged one at a time, and once I saw them, I couldn't unsee them. The oldest software architecture pattern in the world, wearing a new hat (top hat?).

What Happens When You Don't Separate

When you vibecode an agentic application, all three layers live in a single conversation, tangled together in a single long prompt.

And you can't share any of it. If someone else wants the same flight search capability, they can't install "your agentic app" because it doesn't exist as an app — it's a conversation history. The library knowledge, the agent workflow, and your personal rules are knotted into one undifferentiated mess, with no way to say, "Here, take the library and the skill, they're public — but the policy is mine."

The Power of Policy

The policy layer is where this gets personal. Literally.

The bottom two layers, Library and ASI, are shared infrastructure. You build them once, and a whole community of users benefits. The policy layer is yours alone.

Think about what happens when you detach Policy from the rest of the stack. I install the flight search library and the ASI skill. My travel policy says SkyTeam carriers and extra legroom on anything over four hours. A colleague installs the same two tiles but writes a policy that says "cheapest fare, always direct, I don't care about loyalty programs." We're running the same agentic app, and we get completely different flight recommendations, because the only thing that changed is the Policy.

This works in every domain. A code review ASI backed by a linting library could enforce wildly different standards depending on whose Policy is loaded. A startup might want fast, pragmatic reviews. An enterprise client with SOC2 compliance needs something very different. The Library and the ASI didn't change; only the Policy did.

The policy layer is also the most fun to write, because it's just your opinions, codified. My travel policy starts with "always prefer SkyTeam alliance carriers" and includes gems like "no overnight connections in airports where the benches are too short to sleep on." Yours might start with "cheapest option, period" and include "window seat or I'm not going." These are personal rules. They're easy to articulate. Turning them into a skill tile is about as complicated as writing a list of preferences in plain English.
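Here's an illustrative excerpt of mine — not the full tile, but this is genuinely the level of formality required:

```markdown
# Travel Policy (excerpt)

- Always prefer SkyTeam alliance carriers.
- Extra legroom on anything over four hours.
- No overnight connections in airports where the benches are too
  short to sleep on.
- Hard disqualify: short connections on separate tickets.
```

That's it. Plain English, bulleted opinions. The ASI layer handles translating these into search parameters and filters.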

That separation is the whole point.

Making It Real with Tessl

Tessl's tiles (think: plugins) map to this architecture like it was designed for it (because, honestly, it kind of was).

The Library layer is a tile with docs and steering rules. You take your Python package, write up the API docs, and put it on the registry. When someone installs it, their agent gets docs that match the version they're using, plus guardrails that are always loaded.

For the ASI, the tile has a skill and steering rules. The skill walks the agent through the workflow, and the steering rules are there for when the agent decides to get creative between steps.

Your Policy is its own tile — again, a skill and steering rules. Your preferences go in the skill. The steering rules are the ones the agent can't quietly ignore, even on decisions it doesn't think are important.

In my case, the three installs look like this:

tessl install jbaruch/fifty-tabs-of-fares      # library docs
tessl install jbaruch/fifty-tabs-of-fares-asi  # agent skill
tessl install jbaruch/jbaruch-travel-policy    # YOUR rules

Three tessl install commands, and you have a layered agentic application. The library docs and the ASI skill are publicly available on the registry — you build them once, and the whole community benefits. The policy tile stays private, shared only within your workspace, because your rules are nobody else's business. Each layer is versioned and replaceable independently. Update the Library, and the ASI and Policy still work. Your Policy can change every week without the other two layers caring. If someone writes a better ASI for the same Library, swap it in.

You don't need Tessl to implement this pattern. You can separate these concerns with CLAUDE.md files, cursor rules, or whatever your agent supports for injecting context. But Tessl gives you versioning, a registry, evaluation scores, and the ability to share each layer independently. The three-tier diagram stops being a napkin sketch and starts being something you can actually install.

Belt AND Suspenders: Steering Rules

Remember the steering rules I mentioned in each tile? Here's why you want them alongside the skill. Agents are enthusiastic like a golden retriever that has just discovered the garbage can. A skill teaches the agent a workflow, but a workflow doesn't stop the agent from doing something creative in between the steps.

Every Tessl tile can include a steering section — always-on rules that the agent loads eagerly and follows, whether or not the skill is active. The skill fires when the agent is doing flight search work. The steering rules are watching even when the agent thinks it's doing something else.

In tile.json, it looks like this:

"steering": {
  "no-direct-upstream": { "rules": "rules/no-direct-upstream.md" },
  "blocked-means-stop": { "rules": "rules/blocked-means-stop.md" },
  "use-library-browser": { "rules": "rules/use-library-browser.md" }
}

Each rule is a short markdown file with a single guardrail and the reason behind it. The Library tile has usage-constraints — never retry after a BlockedError, treat upstream access as scarce, always close your browser sessions. The ASI tile has three: never access the flight search provider directly (the Library handles stealth), stop everything on a block (don't try a different route), and always use the Library's browser factory (it configures fingerprint randomization you won't get from raw Playwright). The Policy tile has travel-rules — SkyTeam carriers only, hard disqualify on short connections with separate tickets, search ALL gateways, not just the obvious ones.
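Each referenced file is tiny. A sketch of what `rules/blocked-means-stop.md` might contain — the wording is illustrative, not copied from my tile:

```markdown
# blocked-means-stop

When any library call raises `BlockedError`, stop the entire task.
Do not retry. Do not try a different route, date, or gateway. Do not
open the provider directly. Report the block to the user and wait.

Why: upstream access is scarce. Retrying a blocked session burns it
for everyone.
```

One guardrail, one reason, always loaded.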

The skill already covers all of this in the workflow. The steering rules say it again, louder, as standalone guardrails — because the agent might "creatively" decide to skip a step nobody explicitly told it was non-negotiable. Is it redundant? Absolutely. You don't wear both a belt and suspenders because you think one of them will fail. You wear both because you REALLY don't want your pants to fall down.

Trust, but Eval

Here's the uncomfortable truth about policy skills: when they fail, you don't know. If a code-generation skill writes bad code, you run the tests, and they blow up. But a policy skill that silently lets the agent skip most of the possible routes and show you "the best" from the five it checked instead of fifty? You book the flight. You never find out you just missed the option at half the price.

I built almost a dozen eval scenarios for my travel policy tile. Baseline without the tile: 62%. With it: 95-97%. But the numbers aren't the story; the bugs and the fixes are. The agent was only searching the obvious routes while presenting results as if it had searched everything — the fix went into the steering rules: "Search ALL routes. Do not stop after Tier 1." The agent was eyeballing connection times instead of applying minimums — the fix went into the scripts in the skill, because LLMs are still terrible at math. ITA Airways left SkyTeam in 2025, but the agent's training data didn't know — the fix went into references in the skill and an up-to-date partners list that the steering rule points to. Every fix landed in a different part of a tile. The eval doesn't care where — it just checks whether the agent gets the answer right.
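Tessl generates the scenarios and rubrics for you, but the underlying idea of a policy check is simple. A hand-rolled sketch of the alliance rule — illustrative only, not the Tessl eval format, with a deliberately partial carrier list:

```python
# Illustrative policy check, not the Tessl eval format.
# Partial SkyTeam list for demonstration only.
SKYTEAM = {"AF", "KL", "DL", "KE", "AM"}


def violates_alliance_rule(carriers: list[str]) -> bool:
    """True if any leg of the itinerary is on a non-SkyTeam carrier."""
    return any(carrier not in SKYTEAM for carrier in carriers)


# AZ (ITA Airways) left SkyTeam in 2025 — exactly the kind of stale
# training-data fact an eval catches and a confident agent doesn't.
assert violates_alliance_rule(["AF", "AZ"])
assert not violates_alliance_rule(["KL", "DL"])
```

The point isn't this ten-liner; it's that the check runs against the agent's *output*, so it catches silent policy violations regardless of which layer caused them.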

Convinced evals are the next best thing after the invention of skills themselves? Great. Now try building one. If you're a data scientist, this is Tuesday — scenarios, grading criteria, weights, and rubrics are your daily work. For the rest of us, building an eval framework from scratch is not "write a unit test" hard. It's "what does a rubric even look like for a recommendation that's subjectively wrong" hard. tessl scenario generate creates the scenarios and grading criteria for you — from your tile content, in the cloud. tessl scenario run runs them, also in the cloud, and gives you a side-by-side comparison: scores with the tile loaded vs. without, including the token cost of the additional context. Tiles add context, context costs tokens, and you should know whether that context is paying for itself.

Your Move

Pick a library you already use. Something you find yourself explaining to your agent over and over: "No, the API changed in v3," "No, you need to handle that error differently," "No, that's not how you call that function."

Ask your AI agent to write a skill for it — this is one of the things they're genuinely good at. Add steering rules for the guardrails, the things you keep telling the agent NOT to do: library knowledge in, agentic interface out. Your agent never has to re-discover that Library again. Skills follow the Agent Skills open standard, which is supported by Claude Code, Gemini CLI, Codex, Cursor, GitHub Copilot, Junie, and thirty-something other agents at last count.

Then write YOUR Policy, with steering rules for the ones the agent must never forget. The things you'd tell a new colleague on their first day: "Here's how WE do it." Share it as a private tile visible only within your team's workspace — same Library, same skill, same Policy, everyone's agent behaves the way your org expects. Another team, another company, another person with different standards writes their own policy tile. The Library and the skill didn't change. And the policy tile is the easiest one and the most fun to write because it's literally just your standards in plain English.

Then review and eval it: tessl skill review for a quality score against best practices; tessl scenario generate and tessl scenario run for the horror stories above.

When you're ready to share, tessl-labs/tile-creator packages your skill into a tile and puts it on the Tessl registry. Library docs and ASI skills go public; policy tiles stay in your workspace, because your opinions are nobody else's business.

You just built a three-tier agentic application with actual architecture. The pattern is absurdly simple, which is exactly why it works. Three layers is what the brain wants, and every other domain in software figured this out decades ago. We just forgot to bring it with us into the agentic era.

Stop building agentic apps as a single undifferentiated blob.

We're not savages.