
Why Context Beats Every Prompt You'll Ever Write

with Guy Podjarny and Simon Maple

Chapters

Trailer
[00:00:00]
Introduction to the episode
[00:01:20]
Is agentic development a paradigm shift?
[00:01:46]
Three primary challenges of agentic development
[00:02:32]
Additional challenges with LLMs and agents
[00:04:51]
Context as the solution
[00:07:12]
Managing teams through communication
[00:08:40]
Writing down what you want agents to do
[00:09:57]
Evaluating how well agents listen
[00:11:19]
The DevOps loop for context
[00:12:24]
Different types of evaluations
[00:14:58]
Three types of context use cases
[00:17:17]
Documenting your internal platform
[00:19:46]
Application and in-repo context
[00:21:28]
The importance of continuous evaluation
[00:24:44]
Introducing the agent enablement platform
[00:26:43]
Wrap-up and where to learn more
[00:29:20]

In this episode

Most teams think agentic dev is about writing better prompts. It's not.


Guy Podjarny and Simon Maple explain why managing context, not crafting prompts, is what separates teams that scale with agents from teams that don't. They walk through a practical framework for building, evaluating, and distributing the context your agents actually need.


In this episode:

  • Why agents fail without structured context about your internal platform
  • The 3 context layers: policies, platform docs, and application context
  • How to build regression evals and torture tests for your agents
  • The Context Development Lifecycle (CDLC): a new loop for agentic dev

Your agents are only as good as the context you give them.

Context Engineering Is the Core Competency of Agentic Development

Agentic development is not just about coding faster with AI. It represents a fundamental shift in how software gets built, and the teams succeeding with it are those who have recognised that context engineering sits at the centre of everything. In a recent episode of the AI Native Dev podcast, Guy Podjarny and Simon Maple explored what this means in practice, mapping out the workflows, evaluation strategies, and types of context that development organisations need to master.

The conversation surfaced a framework that appears increasingly relevant as teams scale their use of coding agents: a context development lifecycle that mirrors the DevOps loop developers already know.

Why Context Management Defines Agentic Success

LLMs are stateless machines. They receive a bundle of context, calculate weights, and predict the next tokens. This architectural reality means that managing what information goes in, and how it is structured, becomes the primary lever for influencing agent behaviour. As Guy explained during the conversation, "The core competency in agentic development is context management."

The analogy to human teams proves useful here. When managing a team of developers, the tools available are fundamentally about communication: how information is conveyed, what incentives shape behaviour, and how alignment is achieved across different perspectives. Context serves the same function for agents. Rules push explicit constraints. Skills hint at capabilities the agent can pull down when needed. Docs provide reference material for just-in-time retrieval. Each serves a distinct purpose in the overall architecture of [AI agent context management](/blog/ai-agent-context-management).
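The push/pull distinction above can be made concrete with a small manifest. This is a minimal sketch; the schema, field names, and example items are invented for illustration and do not come from any particular agent framework.

```python
# Illustrative shape of a context manifest distinguishing how each
# context type reaches the agent. All names and items are hypothetical.

CONTEXT_MANIFEST = {
    "rules":  {"mode": "push",      # always injected into the agent's context
               "items": ["Never log PII", "Use the internal HTTP client"]},
    "skills": {"mode": "pull",      # advertised; the agent loads them on demand
               "items": ["deploy-to-staging", "generate-migration"]},
    "docs":   {"mode": "retrieve",  # indexed for just-in-time lookup
               "items": ["billing-platform.md", "infra-runbook.md"]},
}

def push_context() -> list[str]:
    """Return the context that is unconditionally sent with every task."""
    return CONTEXT_MANIFEST["rules"]["items"]
```

Separating the three modes makes the cost model visible: rules consume tokens on every request, while skills and docs only cost when they are actually needed.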

This framing suggests that developers working with agents need to think less about prompting and more about designing information systems. The question shifts from "How do I ask this better?" to "What information architecture does this agent need to succeed?"

The Context Development Lifecycle

The conversation introduced a workflow that maps naturally onto the DevOps infinity loop. On the development side, teams analyse their current situation, generate documentation and specifications, then evaluate how well agents respond to that context. This evaluation step proves critical, as different models interpret the same instructions differently. Some respond better to concise bullet points. Others handle longer, more nuanced documentation. The only way to know is to test systematically.

"You have to build a competency to evaluate," Guy noted. "The best analogy here is to think about monitoring runtime systems. The closest analogy to having non-deterministic systems is servers as they run. And we understand that you have to instrument a system; you have to observe them."

On the operations side, context gets distributed to agents and then observed in real-world usage. This creates a feedback loop: observations inform updates to context, which get evaluated, refined, and redistributed. The parallels to DevOps practices are intentional. Just as runtime systems require monitoring because their behaviour cannot be fully predicted, agents require observation because their responses are non-deterministic.

The evaluation strategy itself appears to follow a familiar pattern. Lightweight, fast evaluations run frequently, catching obvious regressions whenever context changes. More comprehensive "torture tests," as one early AI Native Dev podcast guest termed them, run less often but cover edge cases that matter. Intercom's Des Traynor described using this tiered approach for their support agents, running regression tests on prompt changes but reserving comprehensive evaluations for model upgrades.
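The tiered approach described above can be sketched as a small harness: fast "regression" cases run on every context change, heavier "torture" cases run on bigger events such as model upgrades. Everything here is illustrative; `fake_agent` stands in for a real agent invocation, and the case names and checks are invented.

```python
# Sketch of a tiered evaluation harness for context changes.
# `fake_agent` is a stand-in for calling a real coding agent.

from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # asserts something about the agent's output
    tier: str                     # "regression" (fast) or "torture" (thorough)

def fake_agent(prompt: str) -> str:
    # Hypothetical canned response so the sketch runs without an LLM.
    return "uses parameterized queries; follows the internal billing API"

CASES = [
    EvalCase("sql-safety", "Add a user lookup endpoint",
             lambda out: "parameterized" in out, "regression"),
    EvalCase("billing-api", "Charge a customer for overage",
             lambda out: "billing API" in out, "torture"),
]

def run_evals(tier: str) -> dict[str, bool]:
    """Run only the cases in the requested tier and report pass/fail."""
    return {c.name: c.check(fake_agent(c.prompt))
            for c in CASES if c.tier == tier}
```

Wiring `run_evals("regression")` into CI on any change to context files, and reserving the torture tier for scheduled runs, mirrors the cadence described in the episode.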

Three Types of Context Every Organisation Needs

The podcast surfaced three distinct categories of context that development organisations are deploying, each with its own workflow and maintenance requirements.

Policy and best practice context captures organisational decisions: security requirements, architectural constraints, budget optimisation preferences. These tend to be hierarchical, with company-wide policies that business units or teams can augment or override. The challenge here is evaluation. Telling an agent to "write secure code" is too broad. Teams need specific evaluations that define what "secure" means in their environment, then optimise context to meet those standards.
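One way to make "secure" testable is to replace the vague instruction with concrete, checkable assertions about agent output. The check below is a minimal sketch of a single such assertion, flagging string-interpolated SQL; a real policy suite would hold many of these and run them against agent-generated diffs.

```python
import re

# Hypothetical concrete check that pins down one meaning of "secure"
# for a team: no string-interpolated SQL in generated code.

SQL_INTERPOLATION = re.compile(r'execute\(\s*f["\']')

def violates_sql_policy(generated_code: str) -> bool:
    """Flag f-string SQL passed to execute(), which the policy forbids."""
    return bool(SQL_INTERPOLATION.search(generated_code))

good = 'cur.execute("SELECT * FROM users WHERE id = %s", (uid,))'
bad  = 'cur.execute(f"SELECT * FROM users WHERE id = {uid}")'
```

Once a policy is expressed as checks like this, context can be tuned against measurable pass rates rather than intuition.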

Platform documentation addresses the knowledge gap around internal systems. Agents have no inherent knowledge of internal billing systems, custom cloud infrastructure, or proprietary libraries. While agents can theoretically explore codebases to discover this information, that approach proves error-prone and expensive. Centralised, maintained documentation for internal platforms gives agents reliable knowledge they will need repeatedly. The key requirement is maintenance: platforms evolve, and context must evolve with them.
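The maintenance requirement can be partially automated with a staleness check that compares the platform version a doc was written against with what is currently deployed. This is a sketch under invented assumptions: the version tags and doc names are hypothetical, and a real system would track this metadata in the docs themselves.

```python
# Sketch: flag platform docs written against an older platform version.
# The mapping of doc name -> recorded version is assumed to exist.

def stale_docs(docs: dict[str, str], deployed_version: str) -> list[str]:
    """Return the docs whose recorded version lags the deployed one."""
    return [name for name, ver in docs.items() if ver != deployed_version]

DOCS = {"billing-platform.md": "v2.3", "auth-service.md": "v2.4"}
```

A check like this in CI turns "keep the docs fresh" from a habit into a failing build.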

Application context captures the definition of what a specific codebase does and how it should behave. Without this, agents making changes have no reference for what "correct" looks like. Guy suggested starting with evaluation here: extract representative commits from repository history, turn them into test scenarios, and measure how well agents can replicate that work. The failures reveal what context is missing. The successes reveal what context might be unnecessary.
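The commit-mining approach Guy suggests can be sketched as two steps: pull recent commits from history, then turn each into a scenario where the commit subject is the task and the real diff is the reference answer. The parsing is separated from the git call so it can run without a repository; the scenario field names are invented for this sketch.

```python
import subprocess

# Sketch: turn recent commits into eval scenarios for an agent.

def recent_commits(n: int = 20) -> str:
    """Fetch the last n commits as '<sha>\\t<subject>' lines."""
    return subprocess.run(
        ["git", "log", f"-{n}", "--format=%H\t%s"],
        capture_output=True, text=True, check=True,
    ).stdout

def to_scenarios(log_output: str) -> list[dict]:
    scenarios = []
    for line in log_output.strip().splitlines():
        sha, subject = line.split("\t", 1)
        # The subject becomes the task prompt; the commit's actual diff
        # (looked up by sha) becomes the reference the agent's attempt
        # is compared against.
        scenarios.append({"task": subject, "reference_sha": sha})
    return scenarios
```

Running agents against these scenarios and diffing their output against the real commits is what reveals which context is missing and which is dead weight.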

Toward a Context-First Development Practice

The framework presented suggests that context will become a first-class asset in software organisations. It integrates with the SDLC at multiple points: local development, code review, incident analysis. The same unit of context might inform an agent helping with implementation, another reviewing the pull request, and a third investigating a production issue.

This has implications for how teams invest their time. Context, like software, can rot. Documents that once reflected how a system works may drift from reality. Evaluations that once covered typical scenarios may miss new patterns. The discipline required appears similar to test maintenance: ongoing effort to keep assertions aligned with evolving systems.

For developers getting started, the practical path seems clear. Begin by evaluating how well agents handle your existing codebases. Identify the failure patterns. Create context that addresses those failures. Build evaluations that let you know whether changes help or hurt. Observe what happens when agents use that context in real work. Iterate.

The full conversation offers additional depth on each of these themes. Worth a listen for teams working to make agentic development reliable at scale.
