Simon Maple — Head of Developer Relations at Tessl; AI Native Dev co-host. Previously Field CTO and VP DevRel at Snyk, ZeroTurnaround, and IBM. Java Champion (2014), JavaOne Rockstar speaker (2014, 2017), Duke's Choice award winner, Virtual JUG founder, London Java Community co-leader. Claims to have "invented the term" harness engineering.

⚠️ Note: the bio identifies Simon Maple, but in the transcript the speaker refers to a personal project, "artichoke" (a Ruby interpreter written in Rust). That project is publicly associated with Ryan Lopopolo, not Simon Maple. Keep the speaker attribution given in the provided metadata, but flag this inconsistency to the user if it matters for their question.

Abstract

[inferred] A talk introducing "harness engineering" — the practice of making the non-functional requirements of high-quality software legible to coding agents and surfacing that context just-in-time across an agent's run, so every PR a team accepts adheres to a consistent golden thread of quality.

Thesis (synthesis)

The constraint on software production has flipped: code generation is cheap, but human time, model attention, and context window remain scarce. To scale a team of humans + agents, you must (1) write down what "doing a good job" means, (2) structure that text as a map (agents.md) plus curated review-persona guardrails, (3) just-in-time inject those guardrails via tool-call outputs, lints, tests, and LLM-as-judge reviewers, and (4) shift interventions rightward in the pipeline (not leftward) to minimize synchronous human engagement, treating agents like teammates whose PRs must convince you to merge.

Section TOC

Section	Summary	Transcript lines
Intro & "invented the term"	Speaker frames harness engineering as new, near to his heart	L1–L8
Origin story: trying to automate his own job	June last year, Claude CLI + o3, presenting himself as a tool to the model	L9–L18
The pace of disruption	Singularity in Dec with o1/GPT-4.5; need to retool every point release	L19–L34
New constraints after code-production constraint dies	Human time, model attention, model context window	L35–L52
Writing down what "good job" means	Must articulate; agents lack osmosis, presence, durable memory	L53–L72
The React/suspense onboarding analogy	Why point-in-time fixes don't work; make mistakes statically impossible	L73–L88
Defining harness engineering	Making "good job" context legible and just-in-time surfaced	L89–L96
Shift right, not left	Counterintuitive: put interventions late to minimize synchronous time	L97–L114
Pruning latent space	Agents know how to write good code; team must tell them which choices	L115–L130
Code-as-prompt; unify on patterns	OTel example; six observability stacks vs one	L131–L142
Three phases of context delivery	Grounding → messy middle → review & merge	L143–L150
Phase 1: Grounding (agents.md)	Numbered steps: docs, ADRs, critical user journeys	L151–L160
Phase 2: Messy middle (just-in-time injection)	Tests/lints with descriptive errors → runbooks; retry/timeout example	L161–L182
Phase 3: Review & merge	Static guardrails + LLM-as-judge agents collaborating on PR thread	L183–L200
agents.md as a map, not a rulebook	Point to curated review personas; avoid chopping latent space	L201–L215
Slack-thread-to-PR loop	Cheap continual refinement of guardrails	L216–L225
Coarse tools in the messy middle	File line counts, snapshot tests, banning `any`/`unknown`	L226–L245
Treating agents as teammates at merge	Don't shoulder-surf; require reproduction videos via Claude + ffmpeg	L246–L265
Systematizing feedback capture	Slurp all interrupts/failures, distill nightly with sub-agents	L266–L280
Vibe coding's role	Lets you operate at group-tech-lead level focused on invariants	L281–L292
Q&A: shift-right vs lint rules	Auto-discovery of guardrails reduces need to shift left	L293–L310
Q&A: practical implementations	OSS work on `artichoke/rand_mt`; Claude automations	L311–L323
Close & post-talk chatter	Applause; informal post-session audio	L324–end

Terminology glossary (speaker's own definitions)

Harness engineering — "Harness engineering is making context around what it means to do a good job legible. And then just in time surface to the agent over the course of its trajectories in order to steer and refine its output to make sure that every PR we get adheres to the golden thread of what we consider to be acceptable, high quality aligned software."
Shift right (vs shift left) — "I try and put my interventions as far right in the process as I can. In order to minimize my own synchronous time having to engage with these issues."
Just-in-time prompt injection — exploiting tool-call outputs (tests, lints) to inject corrective context mid-run; "we exploit the fact that these agents are going to call a bunch of tools, run a bunch of tests in order to use them to just in time prompt inject the agent to steer its output back to baseline."
agents.md — a map of where context lives, with "a numbered set of steps that we expect the model to go through over every rollout that we do, over every session". Should not jam in rules — points to curated review persona files instead.
Review personas — "a curated set of review personas that are essentially bolded lists of guardrails" that agents.md points to.
LLM-as-judge / reviewer agents — "a matrix CI job that points out a bunch of markdown files. To judge this thing" and collaborate with the implementation agent on the PR thread.
Auto-compaction — the cycle of context being "obliterated and rebuilt"; tool-call outputs get less weight during compaction, which is why just-in-time injection survives.
Pruning latent space — telling the agent which of the many permutations of "good code" the team has chosen.
Vibe coding — local-only gross code; "This code can be gross, but it brings into possibility. This idea that I don't need to care about some parts of the software production function."
The golden thread — the team's standard every PR must adhere to.
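
To make the just-in-time prompt injection entry above concrete, here is a minimal sketch of a lint whose error text carries the corrective context. It assumes the retry/timeout example from the talk applies to calls made through Python's `requests` library. The function name `lint_missing_timeouts`, the runbook path, and the wording of the message are all hypothetical and do not come from the transcript.

```python
import ast

# Hypothetical runbook location; the talk does not name a path.
RUNBOOK = "docs/runbooks/http-timeouts.md"


def lint_missing_timeouts(source: str, filename: str = "<agent-edit>") -> list[str]:
    """Return one descriptive error per requests.<method>() call without a timeout."""
    errors = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match calls of the form requests.get(...), requests.post(...), etc.
        is_requests_call = (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id == "requests"
        )
        has_timeout = any(kw.arg == "timeout" for kw in node.keywords)
        if is_requests_call and not has_timeout:
            # The message is the prompt: it says what is wrong, how to fix it,
            # and where the longer guidance lives.
            errors.append(
                f"{filename}:{node.lineno}: requests.{func.attr}() has no timeout. "
                f"Pass an explicit timeout=<seconds> on every outbound call. "
                f"See {RUNBOOK} for the team's retry and timeout policy."
            )
    return errors
```

When an agent runs this check as part of its normal tool calls, the failure output lands in its context at the moment the mistake is made, so the guidance does not have to live in the up-front prompt. The check is deliberately coarse: it ignores a timeout passed through `**kwargs`, for example.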

Named frameworks / concepts introduced

Three foundational constraints that remain after code-production is cheap:
- Human time (scarce, synchronous)
- Model attention (thrashing degrades performance)
- Model context window (auto-compacted; protect it)
Three phases of an agent rollout:
- Grounding (read docs, ADRs, tickets, critical user journeys)
- Messy middle (write code, run tests; just-in-time inject via tool calls)
- Review & merge (static guardrails + LLM-as-judge collaborators)
The agents.md map pattern — numbered grounding steps + pointers to curated review-persona guardrail files.
Shift-right doctrine — counter to traditional DevOps shift-left; minimize synchronous human time.
Write-it-down ratchet — every review comment should become a static guardrail; "I never want to give the same review feedback twice."
Code-as-prompt — code in the repo is itself a prompt; unify patterns (e.g. one OTel stack, not six) to reduce attention cost.
Coarse-hammer guardrails — file line counts, snapshot tests with 100% branch coverage, banning any/unknown types to force good decomposition.
Feedback distillation loop — capture every interrupt, failed build, prod exception; "slurp all this data up and dream over it every night" with sub-agents.
Agents-as-teammates at merge — require reproduction videos, screenshots, staging deploys; benefit of the doubt, biased toward merge.
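
One of the coarse-hammer guardrails listed above, the file line count, could be sketched roughly as follows. The 400-line budget, the function name `oversized_files`, and the remediation wording are assumptions made for illustration; the talk gives no threshold or implementation.

```python
from pathlib import Path

# Hypothetical budget; the talk does not state a number.
MAX_LINES = 400


def oversized_files(root: str, max_lines: int = MAX_LINES, pattern: str = "*.py") -> list[str]:
    """Return one actionable message per file under root that exceeds max_lines."""
    messages = []
    for path in sorted(Path(root).rglob(pattern)):
        with path.open(encoding="utf-8", errors="replace") as handle:
            count = sum(1 for _ in handle)
        if count > max_lines:
            # Phrase the failure as an instruction the agent can act on.
            messages.append(
                f"{path}: {count} lines exceeds the {max_lines}-line budget. "
                "Split it into smaller modules with one responsibility each."
            )
    return messages
```

Wired into CI or a pre-commit step, a non-empty result would fail the run and hand the agent the message. The limit itself matters less than the pressure it applies toward decomposition.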

Decisions / Open threads / Disagreements / Next steps

(N/A — this is a talk, not a meeting. Q&A items captured in the TOC.)

Open questions / not covered

Specific tooling/vendors beyond Claude (CLI, app, computer use) and a passing mention of GPT series and o-series models — no comparative benchmarks.
Concrete examples of the agents.md file — described as a map but no full file shown in the transcript.
Team-size or org-structure thresholds for when harness engineering becomes worthwhile — not addressed.
Cost/economics of running many parallel agents and LLM-as-judge reviewers — not addressed.
Security / supply-chain implications of agent-authored code — not addressed.
How review personas are versioned or governed across many teams — not addressed.
Concrete metrics for "is harness engineering working" — Maple alludes to using review feedback patterns as signal but doesn't define KPIs.
End-to-end reference implementation — explicitly acknowledged as not yet built: "I haven't quite gotten to putting those review agents in place yet, but it's coming."


Outline — "Welcome to AI Native DevCon" (Harness Engineering)

Speaker