Outline — Welcome to AI Native DevCon (Simon Maple & Baruch Sadogursky)
Speakers
- Simon Maple — Head of Developer Relations at Tessl; AI Native Dev co-host. Previously Field CTO and VP DevRel at Snyk, ZeroTurnaround, IBM. Java Champion (2014), JavaOne Rockstar speaker (2014, 2017), Duke's Choice winner, Virtual JUG founder, London Java Community co-leader.
- Baruch Sadogursky: Tessl colleague co-presenting. The transcript jokes that one of them is "Ushi" (the Russian-speaking cat) and the other "Fail" (the Tessl cat). Baruch self-identifies as the Russian speaker and as the one who has been at JFrog for "12 years".
- Audience — interactive workshop format, multiple audience members ask questions and contribute steps to the example workflow.
⚠️ The source transcript has no per-speaker labels — it is one continuous block. Attributions in this outline are inferred from context (e.g. "hand over to Baruch", "This is Baruch", self-references). When in doubt, attribute to "one of the presenters."
Abstract
Not provided by the user. [inferred]: A live unscripted workshop demonstrating how to refactor a "mega-prompt" that drives an agent from GitHub issue to merged PR into proper context primitives — skills, rules, scripts — then bundle them as a distributable Tessl plugin with evals, and ultimately treat the resulting policy as a context artifact enforceable in CI.
Thesis (synthesis)
Treat context as real software. The same software principles that broke up "god components" apply to LLM prompts: decompose mega-prompts into single-responsibility skills (conditionally activated via YAML description), extract anything deterministic into scripts (zero tokens, predictable), put always-fire behaviors into rules (short, mandatory), bundle the lot as a versioned plugin artifact (not committed ad-hoc to GitHub), and validate the whole thing with evals using LLM-as-judge. Then package the policy itself as a context artifact and enforce it in CI.
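The decomposition in the thesis can be read as a small decision procedure. The following is a minimal Python sketch of that reading, not something shown in the talk: the Fragment structure, its two flags, and the choose_primitive name are all hypothetical, and the ordering (determinism first, then always-needed, otherwise a skill) is an assumption drawn from the thesis wording.

```python
from dataclasses import dataclass
from enum import Enum


class Primitive(Enum):
    SCRIPT = "script"  # deterministic code, costs no model tokens
    RULE = "rule"      # short, loaded into every conversation
    SKILL = "skill"    # loaded only when its description matches the task


@dataclass
class Fragment:
    """One piece of a mega-prompt (hypothetical structure, for illustration)."""
    text: str
    deterministic: bool  # same input must always give the same output
    always_needed: bool  # must apply to every conversation


def choose_primitive(fragment: Fragment) -> Primitive:
    # Assumed ordering: determinism wins, because a script is predictable.
    if fragment.deterministic:
        return Primitive.SCRIPT
    if fragment.always_needed:
        return Primitive.RULE
    return Primitive.SKILL


# Illustrative fragments; the flag values are judgment calls, not talk content.
branch = Fragment("Create a branch for the issue", True, False)
no_review = Fragment("Don't summon GitHub Copilot review for docs", False, True)
docs_fix = Fragment("Work through a documentation ticket", False, False)
```

Under these assumptions, choose_primitive(branch) gives Primitive.SCRIPT, choose_primitive(no_review) gives Primitive.RULE, and choose_primitive(docs_fix) gives Primitive.SKILL.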
Section-by-section TOC
- Cold open & dark factory framing — Tessl is trying to turn its SDLC into a "dark factory"; today's focus is the issue→merged-PR orchestrator. (transcript lines ~1–40)
- Audience-sourced workflow steps — planning, requirements, tests/CI, branch, write code, PR, review, deploy. (~40–80)
- Generating the mega-prompt with Claude — live demo asking Claude for a "mega prompt" to turn a GitHub issue into a merged PR. (~80–150)
- Mega-prompt critique — "you are a seasoned world-class engineer" theater, all-caps begging, conditionals that get loaded regardless because "LLMs don't do branching". (~150–220)
- Introducing Baruch & cat-themed slides — social handles, QR code, housekeeping (Claude Code + Tessl account workspace). (~220–280)
- First end-to-end run of the mega-prompt — creates hello-world Python repo, opens issue, runs the mega-prompt; ~88k tokens, PR opens, Copilot review, race condition with auto-push. Works but expensive and brittle. (~280–420)
- Skills 101 (what they are and why): YAML frontmatter (name, description); descriptions are preloaded, bodies are conditionally activated; Anthropic invented the paradigm in October 2025; quickly adopted across harnesses. (~420–520)
- Extracting skills from the mega-prompt: Claude itself extracts documentation-ticket-workflow and code-ticket-workflow skills into .claude/skills/. (~520–600)
- Skill description quality — Tessl's skill-review scores descriptions (76 / 82 / 68); criticism that steps are "prose rather than commands"; Claude Code is "actually one of the worst activators of skills". (~600–700)
- Testing the skills end-to-end: clear context, fix documentation issue; observes "successful skill activation"; conditional activation = fewer tokens than mega-prompt. (~700–780)
- Deterministic vs non-deterministic — extract scripts — "every time we want something predictable we walk away from an LLM and convert it to script"; warning about agents over-scripting things that need reasoning (regex on text is the classic anti-example). (~780–900)
- Cat demo & dumb models — Ushi the cat presses a button; segue to "dumb models" — Haiku probably does 90% of this; OpenRouter and LiteLLM for cheap inference; "the bill we're going to rack up for this demo is absolutely mind-blowing". (~900–1000)
- Rules (always-on, must-fire context): short, mandatory tokens, sit at system-prompt level; example rule: "don't summon GitHub Copilot review for docs"; Claude doesn't natively support rules, so Tessl hacks .claude/rules/. (~1000–1100)
- Hooks — coming soon, harness-dependent — concept exists in some harnesses; Tessl's CLI will plug them into each harness; not yet a Tessl primitive at time of talk. (~1100–1180)
- Plugins & distribution (context as artifact): "sources go to GitHub, packaged artifacts go to registry"; never commit jar files to GitHub; Tessl plugin = plugin.json with name, version, description, list of skills; published to Tessl registry; install via tessl install <plugin>. (~1180–1320)
- Evals & scenarios — LLM-as-judge — context is non-deterministic so use evals not unit tests; Tessl can generate scenarios; example: AI-banned repo, with-plugin run discovers the ban and refuses to open issue → 100% lift. (~1320–1450)
- Closing meta-point — package the policy itself — write a skill that analyses prompts and enforces "should be a rule / should be a script"; install it in CI as an agentic reviewer. (~1450–1500)
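As an illustration of the "Deterministic vs non-deterministic" item above, checking whether a PR is open, closed, or merged is a lookup rather than a reasoning task, so it is a natural candidate for a script. The sketch below is an assumption, not the script used in the demo: the function names are hypothetical, and it assumes the GitHub CLI is installed and authenticated, and that `gh pr view <number> --json state` prints JSON shaped like {"state": "OPEN"}.

```python
import json
import subprocess

KNOWN_STATES = {"OPEN", "CLOSED", "MERGED"}


def parse_pr_state(gh_json: str) -> str:
    """Return the PR state from JSON shaped like {"state": "OPEN"}."""
    state = json.loads(gh_json)["state"]
    if state not in KNOWN_STATES:
        raise ValueError(f"unexpected PR state: {state!r}")
    return state


def fetch_pr_state(pr_number: int) -> str:
    """Ask the GitHub CLI for the state (assumes `gh` is installed and logged in)."""
    result = subprocess.run(
        ["gh", "pr", "view", str(pr_number), "--json", "state"],
        capture_output=True,
        text=True,
        check=True,  # fail loudly instead of letting an agent guess
    )
    return parse_pr_state(result.stdout)
```

Keeping the parsing separate from the subprocess call means the deterministic part can be tested without network access; the agent only has to invoke the script and read one word of output.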
Terminology glossary (definitions as actually given)
- Dark factory — Tessl's name for a fully automated SDLC. "At Tessl, we've been trying to turn our own software development lifecycle into full dark factory." (Note: transcript also renders this as "dev factory" once — likely STT artifact.)
- Orchestrator — what they call the issue-to-merged-PR portion of the pipeline. "You might call this like an orchestrator."
- Mega-prompt — a giant single prompt covering an entire workflow. "A giant mega prompt is one way that we could achieve that." Smells: theater openings, all-caps begging, conditionals.
- Skill — atomic, conditionally-activated prompt with YAML frontmatter. Defined functionally: "a tiny bit on top of this prompt which is called the front matter ... has two items. The first is name and the second is description." The description is what the agent reasons over to decide activation.
- Description (of a skill) — "absolutely crucial. This is already quite a differentiator from like the big prompt we had." Two competing constraints: detailed enough to trigger, small enough that loading all descriptions doesn't blow context.
- Rule — "tiny prompts that are going to be there at every time a conversation is sent to the model. No matter what's going on." Used when 100% activation is required.
- Script — deterministic code the agent invokes. "Scripts are direct invocations zero tokens zero money."
- Hook — harness-dependent automation primitive; "hooks are very harness dependent and they will be completely different for each and every one of them."
- Plugin (Tessl): bundled context artifact of skills + rules + scripts + references, described by a plugin.json (name, workspace, version, description, list of skills). Scripts aren't top-level; they are referenced from skills/rules.
- Eval / scenario — non-deterministic test using LLM-as-judge. "We run the same question without any additional context. And we run the same question then prompt basically with our [plugin] ... if the results are better with context that means that yay we did a good context."
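- LLM-as-judge: "we ask ... are those sounds better than no sweethearts and it will tell us yes it's better." (STT-garbled; the sense, per the eval definition above, is asking a model whether the with-context result is better than the no-context one.)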
- Conditional activation — only skills (not rules, not mega-prompts) are conditionally loaded. "This conditional activation for effectively like little reusable atomic prompts was kind of a game changer."
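The skill, description, and conditional-activation entries above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not how any harness actually implements it: split_skill and preload_index are hypothetical names, the parser handles only flat key: value pairs between two --- lines (a real implementation would use a proper YAML parser), and the example description and body are invented. Only the skill name comes from the talk.

```python
def split_skill(text: str) -> tuple[dict[str, str], str]:
    """Split a skill file into (frontmatter, body).

    Handles only flat `key: value` pairs between two `---` lines.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("skill must start with a '---' frontmatter fence")
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        raise ValueError("frontmatter fence is never closed") from None

    meta: dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a key: value pair: {line!r}")
        meta[key.strip()] = value.strip()

    for required in ("name", "description"):
        if not meta.get(required):
            raise ValueError(f"missing frontmatter field: {required}")
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


def preload_index(skills: list[str]) -> dict[str, str]:
    """What gets loaded up front: name -> description, never the bodies."""
    index: dict[str, str] = {}
    for text in skills:
        meta, _body = split_skill(text)
        index[meta["name"]] = meta["description"]
    return index


# Hypothetical skill file; only the name is taken from the talk.
EXAMPLE_SKILL = """---
name: documentation-ticket-workflow
description: Use when a ticket only changes documentation.
---
1. Create a branch for the ticket.
2. Edit the docs and open a PR.
"""
```

The split makes the token trade-off visible: every description in the index is paid for on every conversation, while a body is only paid for once its skill is activated.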
Named frameworks / concepts introduced
- Issue → merged PR orchestrator — the linear-issue-to-merged-PR portion of the dark factory.
- Mega-prompt → skills → scripts → rules → hooks decomposition — the central refactoring narrative of the talk.
- Deterministic vs non-deterministic split — "whatever is deterministic should be deterministic"; LLM for reasoning, scripts for predictable steps. Anti-example: agents using regex for text classification.
- YAML frontmatter pattern for skills: name + description; description is the activation discriminator.
- Tessl skill-review — scores skill descriptions on activation quality, surfaces issues like "no explicit use when", "steps are prose rather than commands", token-cost vs trigger-fidelity tension.
- Context-as-artifact / plugin distribution model — sources in Git, packaged artifacts in registry; install via Tessl CLI; private vs public publishing.
- Eval lift with LLM-as-judge — measure whether installed plugin improves agent behavior on a held-out scenario; example showed 0% → 100% lift on the AI-ban scenario.
- Meta-policy as plugin — package the lessons of the workshop (e.g. "this should be a rule / this should be a script") as a Tessl plugin and install it in CI as an agentic code reviewer.
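The "eval lift" idea above reduces to simple arithmetic once the judge has produced pass/fail verdicts. The sketch below is an assumption about how lift could be computed, not Tessl's implementation: the ScenarioResult structure and function names are hypothetical, lift is taken to mean the percentage-point difference between with-plugin and baseline pass rates, and the LLM-as-judge call itself is left out (the booleans stand in for its verdicts).

```python
from dataclasses import dataclass


@dataclass
class ScenarioResult:
    scenario: str
    baseline_pass: bool     # judge verdict without the plugin
    with_plugin_pass: bool  # judge verdict with the plugin installed


def pass_rate(verdicts: list[bool]) -> float:
    """Fraction of passing verdicts; 0.0 for an empty list."""
    return sum(verdicts) / len(verdicts) if verdicts else 0.0


def lift(results: list[ScenarioResult]) -> float:
    """Percentage-point difference between with-plugin and baseline pass rates."""
    baseline = pass_rate([r.baseline_pass for r in results])
    treated = pass_rate([r.with_plugin_pass for r in results])
    return (treated - baseline) * 100


# The single scenario from the talk: baseline fails, with-plugin run passes.
ai_ban = [ScenarioResult("AI-banned repo", baseline_pass=False, with_plugin_pass=True)]
print(lift(ai_ban))  # prints 100.0
```

With one scenario the number can only be -100, 0, or 100, which is why a wider scenario set matters before drawing conclusions from a lift figure.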
Open questions / not covered
- Specific hook syntax for any particular harness — the talk says hooks are coming to Tessl and harness-dependent, but does not specify implementation.
- How exactly references inside skills work / a documentation page for them — explicitly flagged by an audience member; presenters agreed they should write docs.
- Quantitative skill-activation benchmarks per harness — the talk asserts Claude Code is "one of the worst activators" but no numbers are shown.
- How to write good evals/scenarios beyond Tessl auto-generation — only the AI-ban example is shown end-to-end.
- Symphonia in depth — mentioned once as "60 million tokens to check if a PR was open or closed" but not analysed.
- Detailed cost figures — repeated jokes about token cost but no actual numbers besides one "88,000 tokens" mention.
- Multi-agent / agent-to-agent coordination — out of scope.
- Security of the context artifacts themselves — touched on briefly via Snyk scanner integration but not detailed.
Speech-to-text artifacts to be aware of
The transcript contains numerous STT artifacts. When quoting verbatim, preserve them and flag the likely intended word:
- "Macey" / "Macey maker" → likely Maple (Simon Maple)
- "Sadly mansource" → likely "shadows open source" or similar
- "dark factory" vs "dev factory" — used interchangeably; "dark factory" is the consistent term
- "URA something" → likely "you are a..." (the seasoned-engineer prompt opener)
- "linear" early on refers to the Linear product (issue tracker), not "linear" as adjective
- Numerous broken cat-themed jokes around slide patching
- The final ~15% of the transcript is post-talk hallway audio (lunch logistics, mic setup, next speaker intro for the Anthropic managed-agents workshop) — not part of this talk and should generally be ignored when answering questions about the Maple/Sadogursky session.