AI Native DevCon 2026 London — all conference sessions as interactive skills
Brian Douglas ("bdougie" / "bw" on the internet). Founder of Paper Compute ("distributed systems primitives for AI agents"). Previously founded Open Sauced, which joined the Linux Foundation in 2024. Background in finance ("My background is finance. I went to school for it"). Self-describes as an "elder millennial" and a front-end developer by trade. Previously worked at a company called Continue, where the team trained a model called "next edit" using supervised fine-tuning (SFT); this is where his fine-tuning experience comes from.
Every AI agent call generates training data. Most teams throw it away. Tapes is an open source telemetry proxy that intercepts LLM API calls and builds a content-addressable Merkle DAG of every conversation turn, with zero instrumentation required. […] We built this pipeline by running parallel agents speedrunning Gameboy games. […] The result: a closed loop where your agents generate the data that trains the next version of the model they run on. No external training data. No third-party model outputs. Your code, your agent traces, your model.
The talk as delivered is a looser, more narrative version of this abstract. Brian explicitly says: "This is not a workshop, so don't expect to, like, step one, step two, step three. I'm going to just share my journey of how I learned this."
Every agent session you run produces telemetry that is currently being thrown away (Claude Code stores sessions for 30 days then deletes them). If you capture those sessions with a tool like tapes, you can (1) feed past context back into future runs to build a self-healing loop, (2) extract reusable skills from the traces, and (3) optionally fine-tune small local models on the cleaned data. Brian validated this by speedrunning Pokémon Red with 10 parallel agents, then applied the same machinery to his own codebases ("super agents" / "sweeper agent"). His strong recommendation is the first two steps; the fine-tuning step (especially DPO) is expensive and only worth it for bespoke cases.
| # | Section | Summary | Approx. location in transcript.md |
|---|---|---|---|
| 0 | MC intro | The MC welcomes the audience, asks people to scoot in, mentions sessions have moved rooms, introduces Brian. | Lines 1–18 |
| 1 | Brian's intro & disclaimer | Self-intro as bdougie/bw, mentions Paper Compute, says he has "nothing to sell" — this is open source. | Lines 19–34 |
| 2 | Pokémon case study setup | Why Pokémon Red on Game Boy: validating tapes at scale via 1000-turn sessions, speedrunning to get the first Pokémon. | Lines 35–58 |
| 3 | The Pokémon agent setup | pygame-boy + Claude Code as harness, headless emulation, screenshots every 10 turns, 10 parallel agents. | Lines 59–86 |
| 4 | The "politely hallucinating" failure | Agent wouldn't progress because it never learned to talk to NPCs (Mom). Self-imposed rule: no internet lookup. | Lines 87–104 |
| 5 | observation.md & observer-state.json | Markdown observations written by the agent at end of session ("journal" analogy); JSON for game-state things like the 7-second door cooldown. | Lines 105–138 |
| 6 | Kafka + anomaly detection on the Pokémon loop | 10 sims publishing to Kafka, anomaly detection catching things like "not going through doors" and HP/berry battle nuance. | Lines 139–158 |
| 7 | From Pokémon to codebases | The same nuance-discovery applies to 5–20 year old codebases; we currently "shoot from the hip" every 5-hour session without learning. | Lines 159–178 |
| 8 | Super Agent & Sweeper Agent | Applying the same setup to his own code: 10 parallel agents on separate VMs (steros) fixing lint, writing docs, generating context. | Lines 179–210 |
| 9 | The data-value argument | Anthropic/Cursor are paying $10M+ for training data deals. "You should be extracting value." Use cheaper models (Haiku) for bespoke at-scale work. | Lines 211–240 |
| 10 | Aside: auto-research / Qwen 3.6 / unbanned from Claude | He got blocked by Anthropic for running 10 parallel super-agents; got unblocked in ~12 hours after a blog post. Same week, Qwen released auto-research. | Lines 241–270 |
| 11 | Tapes architecture: Merkle DAG of sessions/turns | Every commit/session/turn is content-addressed. A session runs from starting `claude` until `/clear` or closing it. Built by his co-founder. | Lines 271–305 |
| 12 | "Check the tapes" skill | Use stored sessions to reconstruct why something was done 6 months ago; he used it to recover prompts for designer wireframes. | Lines 306–340 |
| 13 | TapeDeck UI | CLI + visualization of session probability ("tape deck" — "tape being the most durable form medium"). Shows tool calls and skill invocations. | Lines 341–365 |
| 14 | Generating skills from tapes | Use a small/cheap model (he uses GPT-4o) to draft skills from filtered tape sessions; recommends human review. | Lines 366–385 |
| 15 | The book analogy for SFT | Model = book; SFT = writing skills in the margins. Used Qwen 4B, embedded his + 3 teammates' skills. Works for bespoke, not daily-driver. | Lines 386–425 |
| 16 | DPO: the Cliff Notes / Matthew McConaughey aside | DPO = "alright alright alright" — picks the best every time. Very expensive. Don't bother unless you're a researcher at Meta. | Lines 426–470 |
| 17 | Hardware results | 4070 RTX 24GB worked for SFT; needed 32GB so upgraded to 5090; DPO needed H100s (borrowed via an Nvidia friend); DPO on 4B = "not even worth it", on 7B = "go for it" but expensive. | Lines 471–495 |
| 18 | Wrap-up: the three steps | (1) Capture sessions, (2) Knowledge transfer via skills (multiplayer coding), (3) Harness/model freedom. Anthropic IPO mention. | Lines 496–525 |
| 19 | Q&A | One question on Codex support: works with Claude Code, Conductor, Ollama today; happy to add Codex if someone asks. | Lines 526–end |
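The content-addressed structure described in section 11 can be illustrated with a minimal sketch. This is not the tapes implementation: the `TurnStore` class, the `node_id` helper, and the `role`/`content`/`parent` fields are all hypothetical names chosen for illustration. The only idea taken from the talk is that each turn is addressed by a hash of its content, with turns linked into a DAG.

```python
import hashlib
import json
from typing import Dict, List, Optional


def node_id(role: str, content: str, parent: Optional[str]) -> str:
    """Hash a turn together with its parent's hash (hypothetical schema)."""
    payload = json.dumps(
        {"role": role, "content": content, "parent": parent},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class TurnStore:
    """In-memory content-addressed store; identical turns are stored once."""

    def __init__(self) -> None:
        self.nodes: Dict[str, dict] = {}

    def append(self, role: str, content: str, parent: Optional[str] = None) -> str:
        key = node_id(role, content, parent)
        # Content addressing: adding the same turn again is a no-op.
        self.nodes.setdefault(
            key, {"role": role, "content": content, "parent": parent}
        )
        return key

    def history(self, key: str) -> List[str]:
        """Walk parent links from a turn back to the root of its session."""
        out: List[str] = []
        cur: Optional[str] = key
        while cur is not None:
            node = self.nodes[cur]
            out.append(node["content"])
            cur = node["parent"]
        return list(reversed(out))


store = TurnStore()
root = store.append("user", "Fix the lint errors")
a = store.append("assistant", "Running the linter", parent=root)
b = store.append("assistant", "Reading the config first", parent=root)  # a branch
again = store.append("user", "Fix the lint errors")  # same content, same id
```

Because a node's id covers its parent's id, changing any earlier turn changes every id downstream of it, which is what makes a stored history tamper-evident and lets two sessions that share a prefix share storage.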
Definitions are Brian's own framing, paraphrased only when no clean verbatim exists.
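The three-step wrap-up pipeline: (1) capture sessions, (2) transfer knowledge via skills ("multiplayer coding"), (3) keep harness/model freedom.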
The Pokémon validation loop: 10 parallel agents × 1000 turns × screenshots every 10 turns → recorded to tapes → observations → observer-state → self-healing loop. Validated speedrunning to get the first Pokémon from Professor Oak.
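To make the scale of that loop concrete, here is a small sketch of its shape. It calls no emulator and no model, and it runs the agents sequentially rather than in parallel. The function names and the `sessions_seen` and `started_with` keys are hypothetical; the agent count, turn count, screenshot interval, and the 7-second door cooldown are the figures quoted from the talk.

```python
from typing import Dict, List, Tuple

AGENTS = 10               # parallel agents, per the talk
TURNS_PER_SESSION = 1000  # turns per session, per the talk
SCREENSHOT_EVERY = 10     # one screenshot every 10 turns, per the talk


def run_session(agent_id: int, observer_state: Dict) -> Dict:
    """Stand-in for one agent session; no emulator or model is called."""
    screenshots = 0
    for turn in range(1, TURNS_PER_SESSION + 1):
        if turn % SCREENSHOT_EVERY == 0:
            screenshots += 1  # a real harness would capture the screen here
    return {
        "agent": agent_id,
        "screenshots": screenshots,
        # What this session started out knowing, carried over from earlier runs.
        "started_with": sorted(observer_state),
    }


def run_generation(observer_state: Dict) -> Tuple[List[Dict], Dict]:
    """Run every agent once, then fold the results into the next state."""
    results = [run_session(i, observer_state) for i in range(AGENTS)]
    next_state = dict(observer_state)
    next_state["sessions_seen"] = observer_state.get("sessions_seen", 0) + len(results)
    return results, next_state


state = {"door_cooldown_seconds": 7}  # example fact from the talk
results, state = run_generation(state)
total_turns = AGENTS * TURNS_PER_SESSION
total_screenshots = sum(r["screenshots"] for r in results)
print(total_turns, total_screenshots)
```

One generation is therefore 10,000 turns and 1,000 screenshots. The point of the returned state is that the next generation starts with what the previous one learned instead of rediscovering it.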
The book analogy for fine-tuning: Model = book containing co-located ideas. SFT = margin notes (your skills embedded). DPO = throwing out the book and reading Cliff Notes instead.
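The practical difference between the two methods shows up in the shape of the training data, sketched below. SFT learns from single demonstrations, while DPO learns from preference pairs, so every DPO prompt needs both a preferred and a rejected answer. The class names, field names, and sample strings are illustrative assumptions, not a format taken from the talk or from any particular training library.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Union


@dataclass
class SFTExample:
    """One demonstration: the model is trained to reproduce `completion`."""
    prompt: str
    completion: str


@dataclass
class DPOExample:
    """One preference pair: the model is trained to prefer `chosen` over `rejected`."""
    prompt: str
    chosen: str
    rejected: str


def to_jsonl(examples: List[Union[SFTExample, DPOExample]]) -> str:
    """Serialize examples as JSON Lines, one record per line."""
    return "\n".join(json.dumps(asdict(e)) for e in examples)


# Made-up sample records for illustration only.
sft = [
    SFTExample(
        prompt="Fix the unused-import lint error",
        completion="Removed the unused import and re-ran the linter.",
    )
]
dpo = [
    DPOExample(
        prompt="Fix the unused-import lint error",
        chosen="Removed the unused import and re-ran the linter.",
        rejected="Disabled the lint rule for the whole file.",
    )
]

sft_jsonl = to_jsonl(sft)
dpo_jsonl = to_jsonl(dpo)
```

Curating a rejected answer for every prompt is extra work on top of collecting demonstrations, and standard DPO also keeps a frozen reference copy of the model alongside the one being trained, which is consistent with the talk's point that DPO is the expensive step.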
The cost/value argument: Claude Code stores sessions for 30 days then deletes them. Anthropic/Cursor sell training-data access for millions ("Cursor currently is in a deal for $10 million at minimum with SpaceX"). Therefore capture and own your own session data.
The speech-to-text frequently garbles key terms. When quoting verbatim, preserve the artifact and clarify in brackets. Common substitutions seen: