AI Native DevCon 2026 London — all conference sessions as interactive skills
Shachar: Product Manager at Buzz, based in Tel Aviv. Self-described "product manager [veteran]" with roughly a decade in product management. Mission at Buzz: "building the best and the most precise AI code review agent in the market." Plans to relocate to San Francisco and establish a new Buzz site there.
Host / event framing: Simon Maple (Head of Developer Relations at Tessl, AI Native Dev co-host) — named in the event metadata as the host who introduces the session. The transcript body itself is delivered by Shachar; Simon's words are not separately labelled here.
Audience Q&A: three unnamed audience questioners at the end.
Not provided by user. [inferred] A case study from Buzz on building "Spec Reviewer" — an agent that verifies whether implemented features actually match the spec — and the context-engineering lessons learned along the way (planner/verifier split, sub-agent delegation, base-branch grounding, ephemeral sandboxing).
Coding agents in 2026 are good at generating code but bad at verifying features against their specs, which is why human code review is still a bottleneck. Solving this is a context-engineering problem, not a model-capability problem: you must (a) split planning from verification, (b) delegate per-requirement verification to parallel sub-agents, (c) ground the verifier in the base branch rather than the diff to avoid solution bias and hallucinated requirements, (d) scope each sub-agent to the relevant layer (frontend vs backend), and (e) sandbox browser sessions when the agent must visit customer URLs. The strategic takeaway: the big coding-agent vendors overlook cracks like this, and that's where small teams can win.
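Points (a) and (b) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Buzz's implementation: the `Requirement` and `Verdict` types, the `[frontend]` tag convention, and the stub bodies of `plan` and `verify` are all hypothetical stand-ins for real model calls.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Requirement:
    id: str
    text: str
    layer: str  # "frontend" or "backend"


@dataclass
class Verdict:
    requirement_id: str
    passed: bool
    evidence: str


async def plan(spec: str) -> list[Requirement]:
    """Planner stand-in: treats each non-empty spec line as one requirement.

    A real planner would call a model; the "[frontend]" tag is a made-up convention.
    """
    lines = [line.strip() for line in spec.splitlines() if line.strip()]
    requirements = []
    for index, line in enumerate(lines, start=1):
        layer = "frontend" if line.startswith("[frontend]") else "backend"
        requirements.append(Requirement(f"R{index}", line, layer))
    return requirements


async def verify(requirement: Requirement) -> Verdict:
    """Verifier sub-agent stand-in: sees one requirement and nothing else."""
    await asyncio.sleep(0)  # placeholder for a model or browser call
    passed = "TODO" not in requirement.text  # hypothetical check, for illustration only
    return Verdict(requirement.id, passed, f"checked {requirement.layer} layer")


async def review(spec: str) -> list[Verdict]:
    """Orchestrator: plan once, then fan out one verifier per requirement."""
    requirements = await plan(spec)
    # gather runs the sub-agents concurrently and returns verdicts in input order.
    return await asyncio.gather(*(verify(r) for r in requirements))
```

Keeping each verifier limited to a single requirement is what stops one long session from accumulating the whole spec in its context; the orchestrator only collects the verdicts.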
| Section | Summary | Approx. lines |
|---|---|---|
| 1. Opening & self-intro | Shachar introduces himself, Buzz, Tel Aviv, plan to move to SF | 1–25 |
| 2. The problem: code review is the bottleneck | Discovery-call findings: PRs piling up, trust deficit, agents ignore spec parts | 25–55 |
| 3. Motivating example: the "continue button" overlap bug | A spec he wrote himself, with a recording, was still implemented wrong | 55–90 |
| 4. Why coding agents fail at verification | They focus on generating, not verifying; code-vs-code isn't the answer; need a deployed feature in staging | 90–120 |
| 5. Spec Reviewer architecture v1 (single agent) | Agent reads tickets/designs/staging, extracts requirements, validates each — and Claude warns it's a hard task; context window explodes | 120–155 |
| 6. Split 1: Planner + Verifier | Two agents — planner extracts requirements, verifier checks them. Sessions stop crashing but requirements get skipped and quality is inconsistent | 155–185 |
| 7. Split 2: Per-requirement sub-agents | Parallel sub-agents, one per requirement, with an orchestrator collecting verdicts | 185–210 |
| 8. Hallucinated requirements & base-branch grounding | Agent invents requirements (e.g. "new responder command", "backward compatibility"); fix: give it the base branch (not the diff) + scope it to the relevant layer | 210–250 |
| 9. Sandboxing untrusted URLs | Customer integrations require visiting arbitrary URLs; use a third-party sandbox (Amazon Bedrock AgentCore), with one ephemeral sandbox per requirement | 250–285 |
| 10. The dream made real | Spec Reviewer running on Buzz's own code; agent sessions navigating dashboards, integrations, Stripe subscription, Google/Tessl onboarding | 285–310 |
| 11. Three key takeaways | (1) Context engineering is still hard in 2026; (2) specs + code is a gold mine; (3) use proven third-party tools in high-risk areas; bonus: find the cracks the big labs overlook | 310–345 |
| 12. Q&A 1: Regression tests | Yes, Spec Reviewer also runs critical-flow regression checks on every PR (e.g. subscription) | 345–375 |
| 13. Q&A 2: Why not generate tests instead? | Generated tests cover imaginary scenarios; specs reflect real-life intent; byproduct: teams write better specs | 375–410 |
| 14. Q&A 3: Lighter models for sub-agents? & exploratory testing | Heavy models for extraction, small agents for verification; focus on the 1–5% that matters | 410–460 |
| 15. Q&A 4: How to verify the spec was implemented as written + spaghetti code concern | Cut off by time; question only partially answered | 460–end |
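
The grounding and sandboxing ideas from sections 8 and 9 might look something like the sketch below. Everything here is an assumption made for illustration: the `LAYER_PREFIXES` mapping, the `build_context` prompt layout, and the `ephemeral_sandbox` helper are hypothetical, and a temporary directory only demonstrates the create-use-destroy lifecycle. It provides none of the isolation that a hosted third-party sandbox would.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path

# Hypothetical mapping from layer name to repository path prefixes.
LAYER_PREFIXES = {"frontend": ("web/",), "backend": ("api/",)}


def build_context(requirement: str, layer: str, base_files: dict[str, str]) -> str:
    """Assemble a verifier prompt from base-branch files in one layer only.

    The diff is deliberately left out so the verifier cannot treat the
    implementation as the source of truth for what was required.
    """
    prefixes = LAYER_PREFIXES[layer]
    sections = [f"Requirement: {requirement}", f"Layer: {layer}"]
    for path, source in sorted(base_files.items()):
        if path.startswith(prefixes):  # str.startswith accepts a tuple of prefixes
            sections.append(f"--- {path} (base branch)\n{source}")
    return "\n".join(sections)


@contextmanager
def ephemeral_sandbox():
    """Local stand-in for a hosted sandbox: a scratch directory that is always deleted."""
    workdir = Path(tempfile.mkdtemp(prefix="verify-"))
    try:
        yield workdir
    finally:
        shutil.rmtree(workdir, ignore_errors=True)
```

A per-requirement run would then open one `ephemeral_sandbox()` block, pass the `build_context(...)` output to its sub-agent inside it, and let the block's exit discard any state left behind by the visited URL.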
.tessl-plugin
talk-batey-building-product-teams-age-of-ai
talk-birgitta-closing-keynote
talk-debois-agent-enablement
talk-douglas-training-ai-on-your-own-code
talk-dubnov-merge-rate-ai-adoption
talk-farley-vibe-coding-best-we-can-do
talk-firtman-web-mcp-agentic-web
talk-foxwell-reinvention-dev-team
talk-graziano-spec-driven-development
talk-groetzinger-skills-everywhere
talk-jones-odevo-ai-native-transformation
talk-jourdan-pipelines-to-prompts
talk-katsioloudes-code-security-ai
talk-lamis-context-engineering-dreaming
talk-lawson-agent-experience
talk-luebken-embedding-pi-coding-agent
talk-maleix-collective-intelligence
talk-maple-ai-native-devcon-welcome-slick
talk-maple-ai-native-devcon-welcome-spec-reviewer
talk-maple-aind-devcon-welcome
talk-maple-context-engineering-skills
talk-maple-continuous-ai-github-workflows
talk-maple-harness-engineering
talk-maple-tldraw-ai-canvas-experiments
talk-marsden-agent-desktops
talk-martinelli-spec-driven-development
talk-moss-skills-team-workflow
talk-overweg-one-brain-no-filtering
talk-podjarny-skills-are-the-new-code
talk-roberts-ai-native-brownfield
talk-roberts-brownfield-ai-native
talk-scheire-artificial-intelligence
talk-selajev-docker-sandboxes-agents
talk-sloan-harness-engineering-beyond-code
talk-stack-humans-architect-ai-writes-code
talk-stoneham-product-brain
talk-tal-skills-security
talk-thomas-ai-native-engineering
talk-walter-runtime-intelligence-agents
talk-wilson-cq-stack-overflow-for-agents
talk-wotherspoon-humans-vs-slop