ainativedev/latest-aidevcon-speakers-london-2026

AI Native DevCon 2026 London — all conference sessions as interactive skills

Outline — The Humans Architect the System, the AI Writes the Code

Speaker

Paul Stack — Infrastructure coder; previously at HashiCorp (Terraform) and Pulumi; currently at System Initiative (and a newly-spun-up sister company he calls "Eldest One Club" — likely a transcription artifact). Self-described "ops person, a developer", "incredibly opinionated", from Northern Ireland. Online as stack72 (stack72.dev). Long-time speaker on CI/CD and operational practice.

Abstract (as provided by user, verbatim)

At System Initiative, we don't write code. Not because we can't — because we decided not to.

Every line of code in our new project, swamp, is generated by an LLM operating within strict design guidelines we've crafted and maintained. We don't accept pull requests but we happily accept contributions. If you want to contribute, you open an issue, we discuss the problem, refine the design together and let the AI build it. This keeps the supply chain intact and trustworthy. That's not a gimmick, it's the thesis made real. (...) The future of software isn't humans writing less code, it's humans getting better at expressing what they want.

Thesis (synthesised)

Stop writing code; build the machine that writes the code. The engineering bottleneck has moved from typing to expressing intent, so the human job is to author and maintain sharp, executable constraints (CLAUDE.md, design guidelines, adversarial reviewers, UAT-as-source-of-truth) inside which agents reliably produce working software. Refusing all human-authored PRs — even from internal staff — is what keeps the supply chain trustworthy in an era of AI-generated supply-chain attacks.

Section TOC

| # | Section | Summary | Approx. line range in transcript.md |
|---|---------|---------|-------------------------------------|
| 1 | Cold open & framing | "Last line of code I wrote? End of January." Build the machine, don't write the code. | 1–25 |
| 2 | What changed at System Initiative | Six years of beautifully-architected Rust thrown away end of January; five people stayed; restarted on a Miro board on Jan 25. | 26–55 |
| 3 | Why Paul fell out of love with code | Two years ago, a Zoom call about NATS queue naming conventions. | 56–75 |
| 4 | What he does now: architecture, not syntax | Owns constraints, invariants, system coherence; "vibes don't scale" blog post. | 76–100 |
| 5 | The no-PR policy | Even internal PRs from humans get deleted. Contributions = issues. Supply-chain integrity. | 101–135 |
| 6 | End-to-end flow: triage → plan ↔ adversarial → human arbiter → UAT → release | The pipeline as a state machine inside a skill, backed by a CLI; 5-iteration cap before human steps in. | 136–175 |
| 7 | CLAUDE.md as executable contract | TypeScript strict, no anys, named exports, AGPL header, no fire-and-forget promises, imports from mod, never leak internals; trailing "record non-obvious problems" rule. | 176–205 |
| 8 | Test strategy & merge gates | Unit/integration/contract/property/architectural tests + separate-repo UAT + adversarial tests; five merge gates including a skill-check gate. | 206–245 |
| 9 | Self-merging PR demo & numbers | A PR merged itself just before the talk. 30-day stats: 295 issues opened, 217 shipped, 81 closed; median triage 4.6h, triage-to-ship 1.6h; ~$3k/month total cost for 5 people. | 246–285 |
| 10 | Self-debugging agent / removing bus factor | Swamp reproduces its own errors, checks if already fixed, files structured issues. Fresh context every time; no /clear in two months. | 286–305 |
| 11 | How to get started small | Turn conventions into constraints; encode the one thing only one person knows; run one loop end-to-end; find the next constraint. | 306–325 |
| 12 | Closing: "intent is the new architecture" | AI gives you back the work you got into the industry to do. References "context is the new code" from a prior session. | 326–345 |
| 13 | Live triage demo of issue #518 | Walks through the skill loading, fetching issue, summarising findings, classifying, structured plan + adversarial warnings. | 346–370 |
| 14 | Q&A: bottleneck | Bottleneck is deciding what to build, not compute. | 371–390 |
| 15 | Q&A: how humans arbitrate | Plan + adversarial outputs are structured data, queryable; if uncertain, restart for ~10 min cost. | 391–410 |
| 16 | Q&A: open-source / commercial model | AGPLv3; revenue from commercial software licenses (Red Hat-style); customers pay for maintained, trustworthy supply chain. | 411–430 |
| 17 | Q&A: does this scale to large orgs? | Doesn't need hundreds of engineers; needs people with architectural vision. Juniors become more important because they can learn architecture without syntax. | 431–460 |
| 18 | Q&A: is this the honeymoon period? | "Absolutely. 100." Expects to redesign the system four more times; maybe rewrite in Rust. | 461–475 |

Terminology glossary (Paul's own definitions, verbatim where possible)

  • Build the machine that writes the code: the distinction Paul opens with. "The distinction here is between write the code. And building the machine that writes the code."
  • Executable constraints: "this is not for us documentation. This is actually executable constraints." The CLAUDE.md rules are enforced/used, not aspirational.
  • Vibes don't scale: Paul's blog post title; TL;DR "If you're via coda [vibe-coding] and there's no idea of what you're trying to do, can you give it a single line prompt? You're going to get somewhere. Is it going to be the most secure, the most useful tool over time? Probably not."
  • Adversarial review: "a very grumpy adversarial review that says you don't trust any code in the world. You have to prove that it's secure. You have to prove that it has no injection attacks. You have to prove that it's architecturally complete as to the guidelines."
  • Five-loop cap: "We get to five. Okay. We'll only ever allow five loops. Before it steps. (...) human has to be the arbiter of what's going on."
  • UAT as source of truth: "in our UAT repo. We have a line that says that tests are the source of Truth. Don't change the test. Always feel [fix] the right if it's a regression."
  • Intent is the new architecture: Paul's recent blog title; framed alongside another speaker's "context is the new code."
  • Swamp: the AI-native automation CLI for ops teams the team is building; also the host of the triage skill.
  • CLAUDE.md: "It is an executable contract. It is literally the center of gravity in our entire system."
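
The five-loop cap defined above can be sketched in TypeScript. This is a minimal illustration, not swamp's implementation: the names `planWithReview`, `Review`, and `Outcome`, and the idea of passing the planner and reviewer in as plain functions, are all assumptions made for the example. The only detail taken from the talk is that the plan/adversarial-review loop runs at most five times before a human has to arbitrate.

```typescript
// Hypothetical types: none of these names come from swamp itself.
type Review = { approved: boolean; objections: string[] };

type Outcome =
  | { kind: "approved"; plan: string; iterations: number }
  | { kind: "needs-human"; plan: string; objections: string[]; iterations: number };

// The cap quoted in the talk: "We'll only ever allow five loops."
const MAX_LOOPS = 5;

// Alternate between drafting a plan and reviewing it adversarially.
// Stop as soon as the reviewer approves, or hand over to a human
// once the loop budget is spent.
function planWithReview(
  issue: string,
  draftPlan: (issue: string, objections: string[]) => string,
  review: (plan: string) => Review,
  maxLoops: number = MAX_LOOPS,
): Outcome {
  let objections: string[] = [];
  let plan = "";
  for (let i = 1; i <= maxLoops; i++) {
    // Each new draft sees the objections raised against the previous one.
    plan = draftPlan(issue, objections);
    const verdict = review(plan);
    if (verdict.approved) {
      return { kind: "approved", plan, iterations: i };
    }
    objections = verdict.objections;
  }
  // Budget exhausted: the human becomes the arbiter.
  return { kind: "needs-human", plan, objections, iterations: maxLoops };
}

// A reviewer that never approves forces escalation after five rounds.
const stubborn = planWithReview(
  "example issue",
  (issue, objections) => `plan for ${issue} (open objections: ${objections.length})`,
  () => ({ approved: false, objections: ["prove it is secure"] }),
);
console.log(stubborn.kind); // "needs-human"
```

Returning the last plan together with the unresolved objections is a design choice for this sketch: it gives the human arbiter structured data to inspect, in the spirit of the Q&A remark that plan and adversarial outputs are queryable.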

Named frameworks / concepts introduced

  1. The end-to-end pipeline (state machine in a skill): start → triage → classify → loop{plan ↔ adversarial review} (max 5) → human arbiter → implement → reproduce/verify → open PR → 5 merge gates → self-merge → UAT in separate repo → release → notify requester.
  2. The five merge gates — (a) code review, (b) adversarial review, (c) UX review (CLI consistency, non-regressing output, verb structure), (d) CI security review, (e) skill check (content/format/triggers).
  3. CLAUDE.md constraint vocabulary — TypeScript strict / no any / named exports / no default exports / AGPL header on every file / no fire-and-forget promises / long-adjacent endpoints / imports from mod / never leak internals to consumers / trailing "if you hit a non-obvious problem, record it and propose an update" rule.
  4. Adversarial UAT — running the built binary in a different repo as a user would, plus tests that kill the process mid-flight, corrupt data, etc. "The user is your best user. Your best QA of your entire system."
  5. Self-debugging agent — swamp reproduces its own errors at the right version, checks whether the issue is fixed in a later version, and files a well-formed issue.
  6. "Start small" adoption path — turn conventions into constraints; encode the one thing only one person in the org knows; run that loop once end-to-end; find the next break; iterate.
  7. Commercial model — AGPLv3 OSS + paid commercial software licenses ("a Red Hat model") where customers pay for maintained supply chain.
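
The five merge gates in item 2 amount to an all-must-pass check before a PR may merge itself. The sketch below shows one way to express that in TypeScript. The gate names follow the outline, but `evaluateGates`, the `Gate` shape, and the rule that a missing gate blocks the merge are assumptions for illustration; the talk does not describe how swamp wires its gates together.

```typescript
// Gate names follow the outline; everything else here is hypothetical.
type GateName =
  | "code-review"
  | "adversarial-review"
  | "ux-review"
  | "ci-security-review"
  | "skill-check";

type GateResult = { gate: GateName; passed: boolean; notes: string };
type Gate = { name: GateName; run: (diff: string) => GateResult };

const GATE_ORDER: readonly GateName[] = [
  "code-review",
  "adversarial-review",
  "ux-review",
  "ci-security-review",
  "skill-check",
];

// A PR is mergeable only if every expected gate ran and every gate passed.
// Treating an absent gate as a failure is an assumption of this sketch.
function evaluateGates(
  diff: string,
  gates: readonly Gate[],
): { mergeable: boolean; results: GateResult[]; missing: GateName[] } {
  const results = gates.map((g) => g.run(diff));
  const missing = GATE_ORDER.filter(
    (name) => !gates.some((g) => g.name === name),
  );
  const mergeable = missing.length === 0 && results.every((r) => r.passed);
  return { mergeable, results, missing };
}

// Stand-in gate that always passes, used only to exercise the check.
const passing = (name: GateName): Gate => ({
  name,
  run: () => ({ gate: name, passed: true, notes: "ok" }),
});

const allGreen = GATE_ORDER.map(passing);
console.log(evaluateGates("example diff", allGreen).mergeable); // true
```

Keeping each gate's result as structured data, rather than collapsing everything to a single boolean, leaves a record of which reviewer objected and why.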

Open questions / not covered

  • Specific tooling beyond Claude Code — Paul says agents are swappable but only names Claude Code (Max Pro $200/seat) and references "QR code users" (likely Codex / similar — transcription unclear).
  • Concrete adversarial-reviewer prompt text — described in spirit ("grumpy", "prove it's secure") but not shown verbatim.
  • Performance / cost of LLM inference per loop — only aggregate ($1,500–$2,000/month on CI review across 5 people) given.
  • Languages other than TypeScript — CLAUDE.md examples are TS-specific; Rust possibility is mentioned only as a future joke.
  • Regulated / compliance environments — not addressed.
  • What happens to product management / design roles — only engineering and architecture framing given.
  • How the team handles offline / disconnected work — not discussed.
  • Long-term maintenance of CLAUDE.md itself — only the auto-append rule is mentioned; no discussion of guideline drift, conflict, or review.
  • Metrics on agent error rate / false-merge rate — not given.
  • How "contributions via issues" feels from an external contributor's side — only sketched.
