Shachar — Product Manager at Buzz, based in Tel Aviv. Self-described as a "product manager [veteran]" with ~a decade in product management. Mission at Buzz: "building the best and the most precise AI code review agent in the market." Planning to relocate and establish a new site for Buzz in San Francisco.

Host / event framing: Simon Maple (Head of Developer Relations at Tessl, AI Native Dev co-host) — named in the event metadata as the host who introduces the session. The transcript body itself is delivered by Shachar; Simon's words are not separately labelled here.

Audience Q&A: three unnamed audience questioners at the end.

Abstract

Not provided by user. [inferred] A case study from Buzz on building "Spec Reviewer" — an agent that verifies whether implemented features actually match the spec — and the context-engineering lessons learned along the way (planner/verifier split, sub-agent delegation, base-branch grounding, ephemeral sandboxing).

Thesis (synthesis)

Coding agents in 2026 are good at generating code but bad at verifying features against their specs, which is why human code review is still a bottleneck. Solving this is a context-engineering problem, not a model-capability problem: you must (a) split planning from verification, (b) delegate per-requirement verification to parallel sub-agents, (c) ground the verifier in the base branch rather than the diff to avoid solution bias and hallucinated requirements, (d) scope each sub-agent to the relevant layer (frontend vs backend), and (e) sandbox browser sessions when the agent must visit customer URLs. The strategic takeaway: the big coding-agent vendors overlook cracks like this, and that's where small teams can win.

Section TOC

| Section | Summary | Approx. lines |
|---|---|---|
| 1. Opening & self-intro | Shachar introduces himself, Buzz, Tel Aviv, and his plan to move to SF | 1–25 |
| 2. The problem: code review is the bottleneck | Discovery-call findings: PRs piling up, a trust deficit, agents ignoring parts of the spec | 25–55 |
| 3. Motivating example: the "continue button" overlap bug | A spec he wrote himself, including a recording, was still implemented wrong | 55–90 |
| 4. Why coding agents fail at verification | They focus on generating, not verifying; comparing code against code isn't the answer; verification needs a deployed feature in staging | 90–120 |
| 5. Spec Reviewer architecture v1 (single agent) | The agent reads tickets/designs/staging, extracts requirements, and validates each one; Claude warns it's a hard task, and the context window explodes | 120–155 |
| 6. Split 1: Planner + Verifier | Two agents: the planner extracts requirements, the verifier checks them. Sessions stop crashing, but requirements get skipped and quality is inconsistent | 155–185 |
| 7. Split 2: Per-requirement sub-agents | Parallel sub-agents, one per requirement, with an orchestrator collecting verdicts | 185–210 |
| 8. Hallucinated requirements & base-branch grounding | The agent invents requirements (e.g. "new responder command", "backward compatibility"); fix: give it the base branch (not the diff) and scope it to the relevant layer | 210–250 |
| 9. Sandboxing untrusted URLs | Customer integrations require visiting arbitrary URLs; use a third-party sandbox (AWS Agent Core), with an ephemeral sandbox per requirement | 250–285 |
| 10. The dream made real | Spec Reviewer running on Buzz's own code; agent sessions navigating dashboards, integrations, Stripe subscription, Google/Tessl onboarding | 285–310 |
| 11. Three key takeaways | (1) Context engineering is still hard in 2026; (2) specs + code is a gold mine; (3) use proven third-party tools in high-risk areas; bonus: find the cracks the big labs overlook | 310–345 |
| 12. Q&A 1: Regression tests | Yes; Spec Reviewer also runs critical-flow regression checks on every PR (e.g. the subscription flow) | 345–375 |
| 13. Q&A 2: Why not generate tests instead? | Generated tests cover imaginary scenarios; specs reflect real-life intent; byproduct: teams write better specs | 375–410 |
| 14. Q&A 3: Lighter models for sub-agents? Exploratory testing | Heavy models for extraction, small agents for verification; focus on the 1–5% that matters | 410–460 |
| 15. Q&A 4: How to verify the spec was implemented as written; spaghetti-code concern | Cut off by time; not substantively answered | 460–end |

Terminology glossary (speaker's own definitions)

Spec Reviewer — "an agent that is called spec reviewer. It's going to have access to specs… designs… and… the feature that is deployed in the staging environment or in a preview environment. And verify it."
Planner agent — "Planner's role is to extract the requirements from the spec and understand what are going to be the failure cases that I'm going to verify through the verification process. Only one task to extract requirements."
Verification agent — "going to navigate through different files through the UI through the design and understand if the specific specs that were provided by the planner were met in this feature."
Sub-agent delegation — "instead of one agent that is checking 10 or 12 or 15 requirements sequentially, I'm going to have 15 or 12 agents that are running in parallel. Each of them is reaching to a specific verdict and in the end there's an orchestrator that collects all the verdicts."
Base branch vs diff — "if we give it the diff… it's biased to the specific solution that the engineer choose to implement. But if we give it the base branch before the change, the agent is open-minded to different kinds of approach and is more critical about the solution that was chosen."
Scoping — "if I'm reviewing a front end feature, there's no reason to be concerned about backend issues because I'm just going to create noise that are irrelevant for this specific feature."
Ephemeral sandbox — "when the agent needs to validate a specific requirement, there will be a sandbox that would be running for that specific requirement with a browser session checking the specific feature."
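The "Base branch vs diff" and "Scoping" entries above can be illustrated as a context builder for a single verifier sub-agent. This is a minimal sketch under stated assumptions, not Buzz's implementation: `Requirement`, `build_verifier_context`, and the `LAYER_PREFIXES` path mapping are hypothetical names, and the base-branch snapshot is assumed to be a plain dict from file path to file text.

```python
from dataclasses import dataclass

# Hypothetical mapping from layer to path prefixes; a real repo layout would differ.
LAYER_PREFIXES: dict[str, tuple[str, ...]] = {
    "frontend": ("frontend/", "web/"),
    "backend": ("backend/", "api/"),
}


@dataclass(frozen=True)
class Requirement:
    text: str   # one requirement, as extracted by the planner
    layer: str  # which layer it concerns: "frontend" or "backend"


def build_verifier_context(requirement: Requirement,
                           base_branch_files: dict[str, str]) -> str:
    """Build the context for one verifier sub-agent.

    Grounding: only base-branch files (the code before the change) are used.
    The PR diff is intentionally not a parameter, so the verifier is not
    anchored to the solution the engineer chose.
    Scoping: only files under the requirement's layer are included, so a
    frontend requirement does not pull in backend noise.
    """
    prefixes = LAYER_PREFIXES[requirement.layer]
    parts = [f"Requirement to verify: {requirement.text}"]
    for path in sorted(base_branch_files):
        if path.startswith(prefixes):  # str.startswith accepts a tuple of prefixes
            parts.append(f"--- {path} (base branch) ---\n{base_branch_files[path]}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    base = {
        "frontend/Checkout.tsx": "export function Checkout() {}",
        "backend/billing.py": "def charge(customer): pass",
    }
    req = Requirement("Continue button must not overlap other page elements", "frontend")
    ctx = build_verifier_context(req, base)
    print("frontend/Checkout.tsx" in ctx, "backend/billing.py" in ctx)  # prints: True False
```

Leaving the diff out of the function signature entirely is one simple way to enforce the grounding rule: the verifier cannot be biased by a change it never sees.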

Named frameworks / concepts introduced

Spec Reviewer architecture — Planner → parallel per-requirement Verifier sub-agents → Orchestrator collecting verdicts, with per-requirement ephemeral sandboxes for browser navigation.
The two context-engineering moves: (a) "dividing between planning and execution"; (b) "delegating agentic tasks between multiple sub-agents".
Grounding rule: specs are "a snapshot and what we're trying to achieve" but "the code is how we ground that agent to reality." Give it base branch + scope to the relevant layer.
Risk heuristic: "If you identify… high risk areas… prefer using third party proven tools that you can use instead of exposing yourself to security issues." Concrete instance: AWS Agent Core (Amazon Bedrock AgentCore) for sandboxed browser sessions.
Strategic heuristic for builders: "Look for the gaps that the big coding agents are not able to fill and build the product there."
Byproduct effect ("renewable energy") — when teams know specs will be used by a verification agent, they write better specs.
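The Spec Reviewer architecture (Planner → parallel per-requirement Verifier sub-agents → Orchestrator, with an ephemeral sandbox per requirement) can be sketched with `asyncio`, assuming every model call is stubbed out. `plan`, `verify`, `review`, `Verdict`, and the model-tier constants are hypothetical names. The "sandbox" is a `tempfile.TemporaryDirectory` that only mimics the per-requirement lifecycle of an isolated browser session (which the talk delegates to a third-party provider), not its isolation guarantees, and the pass/fail check is a toy set lookup standing in for an agent's judgement.

```python
import asyncio
import tempfile
from dataclasses import dataclass

# Hypothetical model tiers, following "heavy model for extraction, small agents
# for verification"; these are placeholders, not real model names.
PLANNER_MODEL = "heavy-model"
VERIFIER_MODEL = "small-model"


@dataclass(frozen=True)
class Verdict:
    requirement: str
    passed: bool
    sandbox: str  # path of the throwaway sandbox this verdict came from


def plan(spec: str) -> list[str]:
    """Planner stub (would call PLANNER_MODEL). Its only task is extracting
    requirements; here a requirement is any spec line starting with "- "."""
    return [line[2:].strip() for line in spec.splitlines() if line.startswith("- ")]


async def verify(requirement: str, observed: set[str]) -> Verdict:
    """Verifier stub (would call VERIFIER_MODEL). Checks ONE requirement inside
    its own sandbox; the directory is deleted when the with-block exits."""
    with tempfile.TemporaryDirectory(prefix="sandbox-") as sandbox:
        await asyncio.sleep(0)  # stand-in for driving a browser session
        passed = requirement in observed  # toy stand-in for the agent's verdict
    return Verdict(requirement, passed, sandbox)


async def review(spec: str, observed: set[str]) -> list[Verdict]:
    """Orchestrator: one verifier per requirement, run concurrently,
    then collect every verdict."""
    requirements = plan(spec)
    return list(await asyncio.gather(*(verify(r, observed) for r in requirements)))


if __name__ == "__main__":
    spec = "Checkout\n- continue button visible\n- continue button does not overlap footer\n"
    observed = {"continue button visible"}  # hypothetical staging observations
    for v in asyncio.run(review(spec, observed)):
        print("PASS" if v.passed else "FAIL", v.requirement)
```

`asyncio.gather` returns results in input order, so the orchestrator's report lines up with the planner's requirement list even though the verifiers run concurrently.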

Open questions / not covered

The talk does not give a quantitative evaluation of Spec Reviewer (accuracy, false positive/negative rates, cost per PR) beyond anecdote.
The talk does not cover prompt-level details of the planner or verifier (no prompts shown).
The talk does not specify which models are used where, beyond "heavy model" for extraction and "small agents" for verification, and a generic mention of "openai and anthropic ones".
The final audience question — "how did you make sure that the spec that you gave the agent were actually implemented the way the spec has been written?" and the spaghetti-code follow-up — was cut off by time and not substantively answered ("So I have three seconds. And it's okay.").
The talk does not address how Spec Reviewer handles ambiguous or contradictory specs beyond noting that "human beings aren't consistent about how the writer expects."
No discussion of how the orchestrator resolves conflicting verdicts between sub-agents.
No discussion of how designs (visual assets) are actually ingested or compared.

Speech-to-text artifacts worth knowing

"Tessla" → almost certainly "Tessl"
"vehic" → likely "veteran"
"Asian sessions" → "agent sessions"
"Father. Doc" → garbled audience-member intro
"twist" → likely "Twitch" or similar
"platinum" → "planner" (in the planner/verifier split section)
"fiber" → an unrecognised ticketing-system name (possibly "Fibery")
"Tel Aviv… slide" → likely "flight"

ainativedev/latest-aidevcon-speakers-london-2026


Outline — Welcome to AI Native DevCon (Spec Reviewer talk)

Speaker