Speaker
Matthias Lübken — AI engineer and founder specializing in AI agents for business workflow automation. 20+ years of experience building cloud-based systems; helps organizations adopt and build agentic systems, from leveraging existing platforms to designing new agent-based architectures from scratch. Runs a small agency with co-founder Ivan, working with clients in the UK and Germany.
Abstract (as provided)
When people use OpenClaw, they're amazed. It auto-discovers new capabilities, explores available data sources, stitches components together, and dynamically builds new solutions. It feels like the system is learning. It feels magical.
At its core, OpenClaw is powered by pi.dev: a deliberately simple coding agent built on a small set of powerful primitives. Pi's "radical extensibility" turns out to be a strong architectural fit for the kinds of composable, evolving use cases OpenClaw is designed to support.
In this talk, we'll take a closer look at what's actually happening under the hood. We will look at the different components and how builders can reuse them in their products.
Thesis (synthesis)
The "magic" of Codex-style coding agents — tools-in-a-loop plus a bash runtime that lets the agent self-discover capabilities — can be deliberately reproduced inside ordinary business software. The recipe is four primitives (agent setup, tools, extensions/hooks, sessions) composed into patterns (predefined workflow, embedded power-user chat, malleable user-extensible software). The key design move is treating tool definition as the real system architecture and using lifecycle hooks (not prompt-only instructions) to enforce guardrails while leaving the agent's flow open.
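The "tools in a loop" idea from the thesis can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the types, `runAgent`, the scripted model, and the `file_type` stub are hypothetical and are not the API of Pi or any real SDK. A real coding agent would call an LLM and expose bash instead of a stub.

```typescript
// Hypothetical types for illustration; not the API of Pi or any real SDK.
type ToolCall = { tool: string; args: Record<string, string> };
type ModelTurn = { kind: "tool_call"; call: ToolCall } | { kind: "final"; text: string };
type Message = { role: "user" | "assistant" | "tool"; content: string };
type Model = (history: Message[]) => ModelTurn;
type Tool = (args: Record<string, string>) => string;

// The loop: ask the model, run the tool it picks, feed the result back, repeat.
function runAgent(model: Model, tools: Record<string, Tool>, prompt: string, maxSteps = 10): Message[] {
  const history: Message[] = [{ role: "user", content: prompt }];
  for (let step = 0; step < maxSteps; step++) {
    const turn = model(history);
    if (turn.kind === "final") {
      history.push({ role: "assistant", content: turn.text });
      return history;
    }
    const tool = tools[turn.call.tool];
    // Unknown tools produce an error message the model can react to, not a crash.
    const result = tool ? tool(turn.call.args) : `error: unknown tool "${turn.call.tool}"`;
    history.push({ role: "tool", content: result });
  }
  history.push({ role: "assistant", content: "stopped: step limit reached" });
  return history;
}

// Scripted stand-in for an LLM: inspects the file first, then answers.
const scriptedModel: Model = (history) => {
  const last = history[history.length - 1];
  if (!last || last.role === "user") {
    return { kind: "tool_call", call: { tool: "file_type", args: { path: "note.wav" } } };
  }
  return { kind: "final", text: `The file is: ${last.content}` };
};

const demoTools: Record<string, Tool> = {
  // Stub standing in for a shell command; a real coding agent would have bash here.
  file_type: (args) => ((args.path ?? "").endsWith(".wav") ? "WAVE audio" : "unknown"),
};

const transcript = runAgent(scriptedModel, demoTools, "What is this attachment?");
console.log(transcript.map((m) => `${m.role}: ${m.content}`).join("\n"));
```

The step limit and the error string for unknown tools are design choices of this sketch, not something the talk prescribes.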
Section TOC
- Intro & motivation (lines ~1–40) — MC intro; Lübken's framing question: "how do we design systems which are able to deliver the same magic as OpenAI Codex?"
- The "Peter / voice message" anecdote (~40–70) — Codex agent receives a .wav, inspects it with Unix tools, tries Whisper, finds an API key, transcribes. Used to motivate the "tools in a loop + bash" definition.
- Definition of coding agents (~70–85) — "tools that run in a loop" + bash + a runtime/sandbox.
- What Pi is (by what it isn't) (~85–110) — No MCP servers, no subagents, no plan mode, no built-in todos. But Pi can be told to create these.
- Pi extension worked example (~110–135) — "Create a Pi extension that asks for permission when I want to push the main branch to remote" — Pi generates TypeScript + markdown summary.
- OpenClaw / after-sales prototype (~135–230) — Client use case: email inbox → analyze customer requests → check CRM/ERP → draft replies. One agent per customer, one session per case.
- Primitive 1 — Agent setup (~230–270) — SDK level chosen (Codex agent SDK); model, tools, reusable agents.md fragments, skills.
- Primitive 2 — Tools (~270–330) — Three tools: case-state (CRM), parts lookup (ERP), draft-email. Design principle: "don't make your agent guess" — intent-revealing, scoped, can change on the fly.
- Primitive 3 — Extensions / lifecycle hooks (~330–400) — Tool-call and tool-result hooks. Worked examples: validating draft-email domain matches customer; injecting context information mid-flow.
- Primitive 4 — Sessions (~400–450) — JSON event-log tree; supports retries / branching; sessions can be mined to auto-generate skills.
- Pattern 1 — Workflow (~450–470) — The application as shown.
- Pattern 2 — Embedded chat for power users (~470–500) — Reuse the same tools/context in a co-pilot chat surface.
- Pattern 3 — Malleable software (~500–545) — Ink & Switch essay; users adapt their own tools; "what if the software can change itself?"
- Q&A — security of user-authored extensions (~545–585) — Boundaries via tool design (e.g. only draft, never send); extensions add guardrails not powers.
- Q&A — Pi vs a Gmail MCP server (~585–615) — Defining tools/MCP servers is the first step; the talk's point is what you embed them into.
Terminology glossary (speaker's own definitions)
- Coding agent — "these agents who have these, these tools that run and loop ... coding agents additionally have bash ... so they have any kind of Linux, Unix tools at their disposal and some runtime."
- Pi (pi.dev) — "very, very minimal Codex agent ... The minimalism is its feature. So the way to define it, to define Pi is what's not Pi. There's no MCP servers. There's no subagents. There's no solution [permission] pop-ups, there's no plan mode. There's no built-in to dos. There's no background bash thing." But "you can actually tell Pi to create it."
- Pi extension — A TypeScript file (plus a markdown summary) the agent generates or a user writes that hooks into Pi's event lifecycle. Example: a pre-push permission gate created by asking Pi to make one.
- Tool — A capability exposed to the LLM via the agent harness; "the large language model decides when to call these tools."
- "Don't make your agent guess" — Tool design principle: "Try to be precise about the tool definition, make it the intent revealing, make it scoped to the specific task."
- Lifecycle hooks (tool-call / tool-result) — "before we do a tool call or after we've done a tool result ... we cannot control that the LLM is calling it ... but when it does, we can actually ingest and filter out things or do something with the result."
- Session — "tree structure of an event log ... JSON structure ... with messages, model changes, different types of things. And you can ingest your owns ... custom messages which are sent to the LLM and those which are not sent to the LLM."
- Malleable software — From an Ink & Switch essay: "software system should not be these predefined systems, but more of an ecosystem where anyone can adapt their tools to their needs with minimal fiction [friction]." Knife vs. specialized slicer analogy.
- OpenClaw — The product that originally embedded Pi ("we have one claw, we have Pi embedded"). Note: speaker mentions "that's not true since like two weeks, so I need to change the title ... they have just ripped out Pi."
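The "Session" definition above (a tree-structured event log in which some custom entries are not sent to the LLM) can be sketched as follows. The entry shape, field names, and sample entries are invented for illustration and are not Pi's actual session format.

```typescript
// Hypothetical entry shape for illustration; not Pi's actual session file format.
type SessionEntry = {
  id: string;
  parentId: string | null; // null marks the root of the tree
  type: "message" | "model_change" | "custom";
  sentToLlm: boolean; // custom entries may be kept for audit only
  content: string;
};

// Walk from a leaf back to the root to recover one branch in chronological order.
function branchTo(entries: SessionEntry[], leafId: string): SessionEntry[] {
  const byId = new Map<string, SessionEntry>();
  for (const e of entries) byId.set(e.id, e);
  const path: SessionEntry[] = [];
  let current = byId.get(leafId);
  while (current) {
    path.unshift(current);
    current = current.parentId === null ? undefined : byId.get(current.parentId);
  }
  return path;
}

// Only entries flagged as sent to the LLM become model context.
function llmContext(branch: SessionEntry[]): string[] {
  return branch.filter((e) => e.sentToLlm).map((e) => e.content);
}

// Made-up sample log for the after-sales scenario.
const log: SessionEntry[] = [
  { id: "1", parentId: null, type: "message", sentToLlm: true, content: "Customer asks for a spare part" },
  { id: "2", parentId: "1", type: "custom", sentToLlm: false, content: "audit: CRM lookup retried" },
  { id: "3", parentId: "2", type: "message", sentToLlm: true, content: "Draft A: part is in stock" },
  // A retry branches from entry 2 instead of overwriting entry 3.
  { id: "4", parentId: "2", type: "message", sentToLlm: true, content: "Draft B: part ships next week" },
];

console.log(llmContext(branchTo(log, "4")));
```

Because a retry adds a sibling instead of rewriting history, both drafts stay available for audit, which is the property that would make sessions mineable later.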
Named frameworks / concepts
The four primitives (building blocks for reasoning about embedding a coding agent in your product):
- Agent setup — choose SDK level, model, reusable instruction fragments (agents.md-style), skills.
- Tools — the real architectural surface; intent-revealing, scoped; LLM self-discovers via --help and error messages.
- Extensions (lifecycle hooks) — pre-tool-call and post-tool-result hooks for validation, guardrails, dynamic context injection.
- Sessions — branchable JSON event-log trees; auditable, mineable (e.g. auto-generate skills from session logs).
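As a rough illustration of the extensions primitive, the sketch below wraps a tool call in a pre-call hook (the recipient-domain check for draft emails) and a post-result hook (context injection). The hook types, `callWithHooks`, the tool names, and the injected note are assumptions made for this example; they are not the extension API of Pi or any real harness.

```typescript
// Hypothetical hook types for illustration; not the extension API of Pi or any real harness.
type ToolCall = { tool: string; args: Record<string, string> };
type CallDecision = { allow: true } | { allow: false; reason: string };
type ToolCallHook = (call: ToolCall) => CallDecision;
type ToolResultHook = (call: ToolCall, result: string) => string;

// Guardrail: a draft may only be addressed to the customer's own domain.
function sameDomainOnly(customerDomain: string): ToolCallHook {
  return (call) => {
    if (call.tool !== "draft_email") return { allow: true };
    const to = call.args.to ?? "";
    const domain = to.slice(to.lastIndexOf("@") + 1).toLowerCase();
    return domain === customerDomain.toLowerCase()
      ? { allow: true }
      : { allow: false, reason: `recipient domain "${domain}" does not match customer domain "${customerDomain}"` };
  };
}

// Context injection: append a note to a tool result before the model sees it.
const addWarrantyNote: ToolResultHook = (call, result) =>
  call.tool === "case_state" ? `${result}\nnote: check the warranty status before quoting a price` : result;

// The harness runs hooks around every call; the model still decides which tool to call.
function callWithHooks(
  call: ToolCall,
  execute: (call: ToolCall) => string,
  before: ToolCallHook[],
  after: ToolResultHook[],
): string {
  for (const hook of before) {
    const decision = hook(call);
    if (!decision.allow) return `blocked: ${decision.reason}`; // handed back to the model as the tool result
  }
  return after.reduce((result, hook) => hook(call, result), execute(call));
}

// Stub executor standing in for the real CRM and email tools.
const execute = (call: ToolCall): string => `${call.tool} ok`;
const before = [sameDomainOnly("acme.example")];
const after = [addWarrantyNote];

console.log(callWithHooks({ tool: "draft_email", args: { to: "eve@other.example" } }, execute, before, after));
console.log(callWithHooks({ tool: "case_state", args: { caseId: "42" } }, execute, before, after));
```

The check runs in code, so it holds even when the model ignores a prompt instruction. The hooks can block or reshape a call but cannot force one.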
The three patterns (how primitives compose into product shapes):
- Workflow — streamlined automation (the after-sales email prototype).
- Embedded chat — full Codex-agent chat for power users, reusing the same tools and context.
- Malleable software — users themselves extend the system (e.g. write extensions in natural language).
"Don't make your agent guess" — tool-design heuristic.
"Don't hand out the tools the agent should use" — corollary: if you give the agent a tool, assume it will use it; instructions alone won't stop it.
Open questions / not covered
- No benchmark numbers, latency figures, or cost figures for the OpenClaw prototype.
- No detail on the evals being run against auto-generated skills ("we're going to do evals around it, etc.") — only mentioned as future work.
- No discussion of multi-tenant isolation between the per-customer agent containers beyond an architectural sketch.
- Tree-structured sessions: speaker explicitly admits "we have not explored that too much."
- No comparison of Pi vs. specific alternatives (Claude Code, Cursor, Aider) beyond noting "you can do this with other agents."
- No concrete UI/UX details for user-authored extensions ("the user able to articulate that").
- No discussion of model choice, prompt-engineering specifics, or token economics.
- Security/permissions for malleable extensions only discussed at a high level in Q&A (boundaries via tool design, not a full model).