
ainativedev/latest-aidevcon-speakers-london-2026

AI Native DevCon 2026 London — all conference sessions as interactive skills

talk-maple-harness-engineering/outline.md

Outline — "Welcome to AI Native DevCon" (Harness Engineering)

Speaker

Simon Maple — Head of Developer Relations at Tessl; AI Native Dev co-host. Previously Field CTO and VP DevRel at Snyk, ZeroTurnaround, and IBM. Java Champion (2014), JavaOne Rockstar speaker (2014, 2017), Duke's Choice award winner, Virtual JUG founder, London Java Community co-leader. Claims to have "invented the term" harness engineering.

⚠️ Note: the bio identifies Simon Maple, but in the transcript the speaker refers to a personal project "artichoke" (a Ruby interpreter in Rust). This is publicly associated with Ryan Lopopolo, not Simon Maple. Treat the speaker attribution as per the metadata provided, but flag this inconsistency to the user if it matters for their question.

Abstract

[inferred] A talk introducing "harness engineering" — the practice of making the non-functional requirements of high-quality software legible to coding agents and surfacing that context just-in-time across an agent's run, so every PR a team accepts adheres to a consistent golden thread of quality.

Thesis (synthesis)

The constraint on software production has flipped: code generation is cheap, but human time, model attention, and context window remain scarce. To scale a team of humans + agents, you must (1) write down what "doing a good job" means, (2) structure that text as a map (agents.md) plus curated review-persona guardrails, (3) just-in-time inject those guardrails via tool-call outputs, lints, tests, and LLM-as-judge reviewers, and (4) shift interventions rightward in the pipeline (not leftward) to minimize synchronous human engagement, treating agents like teammates whose PRs must convince you to merge.
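Step (3) of the thesis, injecting guardrails through tool-call output, can be illustrated with a small lint whose error text is written for the agent rather than for a human. This is a minimal sketch under stated assumptions, not the speaker's tooling: the rule (HTTP calls must set a timeout), the function name `find_calls_missing_timeout`, and the runbook path are all hypothetical, chosen only because the talk mentions a retry/timeout example.

```python
import ast

# Hypothetical runbook location; the talk does not name specific files.
RUNBOOK = "docs/runbooks/http-timeouts.md"


def find_calls_missing_timeout(source: str, filename: str = "<string>") -> list[str]:
    """Return one agent-readable message per requests.get/post call without timeout=."""
    messages = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Only match direct calls on the name `requests`, e.g. requests.get(...).
        is_requests_call = (
            isinstance(func, ast.Attribute)
            and func.attr in {"get", "post"}
            and isinstance(func.value, ast.Name)
            and func.value.id == "requests"
        )
        if is_requests_call and not any(kw.arg == "timeout" for kw in node.keywords):
            # The message says what is wrong, what the rule is, and where to read more.
            messages.append(
                f"{filename}:{node.lineno}: requests.{func.attr}() has no timeout. "
                f"Network calls in this repo must set timeout=. See {RUNBOOK}."
            )
    return messages


if __name__ == "__main__":
    sample = "import requests\nrequests.get('https://example.com')\n"
    for message in find_calls_missing_timeout(sample, "client.py"):
        print(message)
```

When the agent runs the lint as a tool call, the failure text itself carries the corrective context, so the rule does not need to sit in the initial prompt.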

Section TOC

| Section | Summary | Transcript lines |
| --- | --- | --- |
| Intro & "invented the term" | Speaker frames harness engineering as new, near to his heart | L1–L8 |
| Origin story: trying to automate his own job | June last year, Claude CLI + o3, presenting himself as a tool to the model | L9–L18 |
| The pace of disruption | Singularity in Dec with o1/GPT-4.5; need to retool every point release | L19–L34 |
| New constraints after code-production constraint dies | Human time, model attention, model context window | L35–L52 |
| Writing down what "good job" means | Must articulate; agents lack osmosis, presence, durable memory | L53–L72 |
| The React/suspense onboarding analogy | Why point-in-time fixes don't work; make mistakes statically impossible | L73–L88 |
| Defining harness engineering | Making "good job" context legible and just-in-time surfaced | L89–L96 |
| Shift right, not left | Counterintuitive: put interventions late to minimize synchronous time | L97–L114 |
| Pruning latent space | Agents know how to write good code; team must tell them which choices | L115–L130 |
| Code-as-prompt; unify on patterns | OTel example; six observability stacks vs one | L131–L142 |
| Three phases of context delivery | Grounding → messy middle → review & merge | L143–L150 |
| Phase 1: Grounding (agents.md) | Numbered steps: docs, ADRs, critical user journeys | L151–L160 |
| Phase 2: Messy middle (just-in-time injection) | Tests/lints with descriptive errors → runbooks; retry/timeout example | L161–L182 |
| Phase 3: Review & merge | Static guardrails + LLM-as-judge agents collaborating on PR thread | L183–L200 |
| agents.md as a map, not a rulebook | Point to curated review personas; avoid chopping latent space | L201–L215 |
| Slack-thread-to-PR loop | Cheap continual refinement of guardrails | L216–L225 |
| Coarse tools in the messy middle | File line counts, snapshot tests, banning any/unknown | L226–L245 |
| Treating agents as teammates at merge | Don't shoulder-surf; require reproduction videos via Claude + ffmpeg | L246–L265 |
| Systematizing feedback capture | Slurp all interrupts/failures, distill nightly with sub-agents | L266–L280 |
| Vibe coding's role | Lets you operate at group-tech-lead level focused on invariants | L281–L292 |
| Q&A: shift-right vs lint rules | Auto-discovery of guardrails reduces need to shift left | L293–L310 |
| Q&A: practical implementations | OSS work on artichoke/rand_mt; Claude automations | L311–L323 |
| Close & post-talk chatter | Applause; informal post-session audio | L324–end |

Terminology glossary (speaker's own definitions)

  • Harness engineering: "Harness engineering is making context around what it means to do a good job legible. And then just in time surface to the agent over the course of its trajectories in order to steer and refine its output to make sure that every PR we get adheres to the golden thread of what we consider to be acceptable, high quality aligned software."
  • Shift right (vs shift left): "I try and put my interventions as far right in the process as I can. In order to minimize my own synchronous time having to engage with these issues."
  • Just-in-time prompt injection: exploiting tool-call outputs (tests, lints) to inject corrective context mid-run; "we exploit the fact that these agents are going to call a bunch of tools, run a bunch of tests in order to use them to just in time prompt inject the agent to steer its output back to baseline."
  • agents.md: a map of where context lives, with "a numbered set of steps that we expect the model to go through over every rollout that we do, over every session". It should not be crammed with rules; it points to curated review-persona files instead.
  • Review personas: "a curated set of review personas that are essentially bolded lists of guardrails" that agents.md points to.
  • LLM-as-judge / reviewer agents: "a matrix CI job that points out a bunch of markdown files. To judge this thing"; the reviewers collaborate with the implementation agent on the PR thread.
  • Auto-compaction: the cycle of context being "obliterated and rebuilt"; tool-call outputs get less weight during compaction, which is why just-in-time injection survives.
  • Pruning latent space: telling the agent which of the many permutations of "good code" the team has chosen.
  • Vibe coding: local-only gross code; "This code can be gross, but it brings into possibility. This idea that I don't need to care about some parts of the software production function."
  • The golden thread: the team's standard every PR must adhere to.
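The review-persona and reviewer-agent entries above can be sketched as a job builder that fans one diff out across a directory of persona markdown files, one review job per file. This is a hypothetical illustration only: the transcript shows no file layout or prompt wording, so the directory structure, the `build_review_jobs` name, and the prompt text are assumptions, and the call to an actual model is deliberately left out.

```python
from pathlib import Path


def build_review_jobs(persona_dir: Path, diff: str) -> list[dict[str, str]]:
    """Build one review job per persona markdown file (the CI 'matrix')."""
    jobs = []
    # Sort so the matrix order is stable from run to run.
    for persona_file in sorted(persona_dir.glob("*.md")):
        guardrails = persona_file.read_text(encoding="utf-8")
        # Assumed prompt shape: persona guardrails first, then the diff to judge.
        prompt = (
            "You are reviewing a pull request as the persona described below.\n"
            "Flag only violations of the listed guardrails.\n\n"
            f"## Persona guardrails\n{guardrails}\n\n"
            f"## Diff under review\n{diff}\n"
        )
        jobs.append({"persona": persona_file.stem, "prompt": prompt})
    return jobs
```

Each returned job would be handed to whichever model client the team uses. Keeping the guardrails in plain markdown files means a new review comment can become a new bullet in a persona file without touching the CI code.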

Named frameworks / concepts introduced

  1. Three foundational constraints that remain after code-production is cheap:
    • Human time (scarce, synchronous)
    • Model attention (thrashing degrades performance)
    • Model context window (auto-compacted; protect it)
  2. Three phases of an agent rollout:
    • Grounding (read docs, ADRs, tickets, critical user journeys)
    • Messy middle (write code, run tests; just-in-time inject via tool calls)
    • Review & merge (static guardrails + LLM-as-judge collaborators)
  3. The agents.md map pattern — numbered grounding steps + pointers to curated review-persona guardrail files.
  4. Shift-right doctrine — counter to traditional DevOps shift-left; minimize synchronous human time.
  5. Write-it-down ratchet — every review comment should become a static guardrail; "I never want to give the same review feedback twice."
  6. Code-as-prompt — code in the repo is itself a prompt; unify patterns (e.g. one OTel stack, not six) to reduce attention cost.
  7. Coarse-hammer guardrails — file line counts, snapshot tests with 100% branch coverage, banning any/unknown types to force good decomposition.
  8. Feedback distillation loop — capture every interrupt, failed build, prod exception; "slurp all this data up and dream over it every night" with sub-agents.
  9. Agents-as-teammates at merge — require reproduction videos, screenshots, staging deploys; benefit of the doubt, biased toward merge.
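The coarse-hammer guardrails in item 7 are simple enough to sketch. The example below checks file line counts and returns failure messages that tell the agent what to do next. It is an illustration under assumptions: the 400-line threshold and the `check_line_counts` name are hypothetical, since the talk gives no number or implementation.

```python
from pathlib import Path

MAX_LINES = 400  # Hypothetical threshold; the talk gives no number.


def check_line_counts(paths: list[Path], max_lines: int = MAX_LINES) -> list[str]:
    """Return one failure message per file that exceeds max_lines."""
    failures = []
    for path in paths:
        with path.open(encoding="utf-8") as handle:
            line_count = sum(1 for _ in handle)
        if line_count > max_lines:
            # Phrase the failure as an instruction, since an agent will read it.
            failures.append(
                f"{path}: {line_count} lines exceeds the {max_lines}-line limit. "
                "Split this file into smaller modules with one responsibility each."
            )
    return failures
```

In CI this would typically run over the changed files and exit non-zero when the returned list is non-empty, so the agent sees the message as a failed tool call.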

Decisions / Open threads / Disagreements / Next steps

(N/A — this is a talk, not a meeting. Q&A items captured in the TOC.)

Open questions / not covered

  • Specific tooling/vendors beyond Claude (CLI, app, computer use) and a passing mention of GPT series and o-series models — no comparative benchmarks.
  • Concrete examples of the agents.md file — described as a map but no full file shown in the transcript.
  • Team-size or org-structure thresholds for when harness engineering becomes worthwhile — not addressed.
  • Cost/economics of running many parallel agents and LLM-as-judge reviewers — not addressed.
  • Security / supply-chain implications of agent-authored code — not addressed.
  • How review personas are versioned or governed across many teams — not addressed.
  • Concrete metrics for "is harness engineering working" — Maple alludes to using review feedback patterns as signal but doesn't define KPIs.
  • End-to-end reference implementation — explicitly acknowledged as not yet built: "I haven't quite gotten to putting those review agents in place yet, but it's coming."
