Koog 1.0 idioms, gotchas, and scaffolding skills for Kotlin agents on the JVM
All notable changes to this tile are documented here. Format: Keep a Changelog. Versioning: SemVer.
- use-planner Steps 2-3: added the Path: file-write convention so the ai.koog:agents-planner dependency and the agent construction land in graded build.gradle.kts / Main.kt files instead of inline prose. The 3-run eval showed use-planner-llm-based-triage reliably missing "adds the separate planner module dependency" (33%) because the dependency was never written to a file.
- use-planner Step 1: the graph-DSL redirect now adds a top-of-file comment acknowledging the developer's "planning" wording. The use-planner-refuses-when-graph-fits scenario scored 16% on "acknowledges the developer's framing without capitulating" because the redirect named the topology and round-trips but never engaged the "planning" word.
- add-structured-output Steps 1-2: added the Path: file-write convention (the same fix use-llm-node-variants got in 0.4.6). Both actions now write Main.kt (plus Strategy.kt for Step 2) and build.gradle.kts to disk instead of emitting inline code the eval scorer can't see. The eval add-structured-output-classify-issue had scored 0/0 in every run, baseline and with-context, because the file-reading scorer saw an empty solution directory. This was the last skill missing the Path: convention from the 0.4.4 rollout.
- evals/add-structured-output-classify-issue: Output Specification reworded to a need-only description (update the project so agent.run(...) returns IssueClassification directly). The file-write convention lives in the skill's Path: directives, not the task, so the baseline does not get the technique (plugin-evals No Bleeding).
- Refusal-scenario criteria (domain-model-subtask-pipeline-refuse, cache-llm-calls-refuses-provider-side, persist-chat-history-refuses-fact-store, snapshot-and-restore-refuses-crash, use-planner-refuses-when-graph-fits, add-persistence-refuses-conversation, scaffold-agent-refuse-non-empty-dir) now explicitly fail on an empty/missing solution. A stubbed (no-output) run previously passed them vacuously, scoring 35-60/100 on a non-answer and masquerading as negative lift in the suite.
- snapshot-and-restore Step 1 + add-persistence Step 2: completed the crash-recovery redirect handoff. The snapshot Step 1 redirect previously said only "invoke add-persistence", so with the skill loaded the agent emitted a meta-description of the skill chain instead of the concrete install(Persistence) solution and a developer-facing message; eval snapshot-and-restore-refuses-crash regressed to with-context 42 vs baseline 66. The redirect now directs the agent to deliver the full Persistence solution plus a one-message snapshot-vs-Persistence mismatch explanation, and add-persistence Step 2 gained checkpoint-frequency cost guidance (every-step writes are expensive on long runs). With-context returned to 100.
- evals/migrate-from-0-x-custom-strategy: second migration scenario covering the non-obvious 1.0 breaking changes a custom-strategy agent hits: the nodeExecuteTools auto-writeback removal (chain nodeLLMSendToolResults explicitly), the LLMClient HTTP-transport decoupling (KoogHttpClient.Factory instead of a Ktor HttpClient), and the kotlin.time.Clock → KoogClock swap, alongside the coordinate/JDK bumps.
- evals/wire-acp-server-choose-vs-a2a: protocol-selection scenario. A tooling dashboard needing run-lifecycle control with cancellation and progress streaming should pick ACP (agents-features-acp), not A2A (agent-to-agent RPC) or MCP (tool host). Highest-lift new scenario (+95 baseline→with-context).
- evals/add-structured-output-classify-issue: removed the answer-narrating clause from the task so the "does not introduce a custom strategy" criterion tests application rather than reading.

Knowledge corrections from Vadim Briliantov (Koog project lead), covering three skill clarifications:
- wire-mcp-server Step 6: added a framing paragraph stating that @Tool / @LLMDescription / ToolSet are LOCAL Koog tool annotations. The startStdioMcpServer path bridges a Koog ToolRegistry to MCP, but that's a secondary use case for the annotation, not its primary purpose. For projects whose primary goal is publishing tools over MCP (independent of any Koog agent), the Kotlin MCP SDK (io.modelcontextprotocol:kotlin-sdk) has its own server annotation. The Koog-bridge path is right when you already have a Koog ToolRegistry and want it reachable over MCP too.
- add-tool Step 3 (Sub-Agent-as-Tool): added the sub-agent vs subgraph distinction. Sub-agents (AIAgentService.fromAgent) are fully independent agents that communicate only through typed input/output; subgraphs (subgraphWithTask / subgraphWithVerification) are part of the same agent and share one message history. Default for "break my agent into stages" is subgraph; reach for sub-agent only when isolation is the explicit requirement.
- domain-model-subtask-pipeline Step 6: strengthened the auto-shared-history framing to name the contrast with independent-agent abstractions (Koog sub-agents, LangChain4j Agentic sub-agents). Subgraphs live on one common history; independent agents communicate only through typed input/output. Cross-references Skill(skill: "add-tool") Step 3 for the isolation case.

- use-llm-node-variants Steps 1-4: added the Path: write directive to each action (streaming / multiple-choice / moderation / force-one-tool). Eval 019e648a against 0.4.5 surfaced that use-llm-node-variants-streaming regressed to lift -87 (baseline 87 → with-context 0) with the reasoning "No Kotlin code was produced at all". Same file-write-gap root cause as the 0.3.1 nightmare; this was the one skill that hadn't been patched in PR #10's omnibus Path: rollout.
- evals/persist-chat-history-refuses-fact-store/criteria.json: re-weighted to favor functional correctness over prose explanation. The 0.4.5 eval showed the agent did the right thing functionally (refused JdbcChatHistory, picked LongTermMemory) but lost 35/100 on prose-explanation criteria (30 + 5) because the patched skills explicitly direct code-only output via Path:. New weights: Does not install JdbcChatHistory 35 (was 25), Recommends LongTermMemory 30 (was 25), Does not synthesise pseudo-turns 25 (was 15), Names the distinction 10 (was 30, reworded to accept a code comment), and dropped the standalone "Acknowledges the framing" criterion (was 5). Sum still 100.
- wire-mcp-server Step 6: optional server-side startStdioMcpServer flow for users authoring an MCP server (not just consuming one). Pulls ai.koog:agents-mcp-server-jvm:1.0.0-beta (same beta + -jvm-suffix gotchas as the client module). Exposes a ToolRegistry over stdio with awaitCancellation() keeping the process alive.
- domain-model-subtask-pipeline Step 7: MultiLLMPromptExecutor callout for cross-provider per-subgraph model selection. The skill teaches per-phase llmModel = ... but the per-provider executors (simpleOpenAIExecutor / simpleAnthropicExecutor) only know one provider; when a strategy mixes providers (e.g., OpenAIModels.Chat.O3 for the verifier and AnthropicModels.Sonnet_4_5 for the deployer), the agent needs a MultiLLMPromptExecutor with one client per provider.
- persist-chat-history Step 5: multi-turn footnote. The same agent instance can call agent.run(input, sessionId = ...) repeatedly; the installed chat-history backend accumulates the message log on each call, so a while (true) driver loop maintains conversation without reconstructing the agent.
- persist-chat-history Step 6: new anti-pattern section: don't use chat-history as a fact store. Symptoms (date-prefixed pseudo-turns, synthetic Message.Assistant claims about actions the agent never took, queryable structured data forced into a sequential channel) and the right primitives for each shape (LongTermMemory for cross-session facts; @Tool for queryable structured data; systemPrompt for small fixed context; storage for run-scoped state). Custom ChatHistoryProvider stays legitimate for replaying real conversation messages from external sources.
- add-observability Step 3: one-line clarification on setVerbose(true) semantics: it emits prompts, completions, and token counts on each span.
- evals/wire-mcp-server-author-stdio: positive scenario covering the new Step 6 decisional branch (publishing an existing ToolSet as a stdio MCP server). Criteria check for startStdioMcpServer, the agents-mcp-server-jvm:1.0.0-beta dependency, ToolRegistry { tools(asTools()) } reuse of the developer's existing class, awaitCancellation() for process lifetime, and refusal of the client-transport surface.
- evals/persist-chat-history-refuses-fact-store: negative scenario covering the new Step 6 anti-pattern (don't use chat-history as a fact store). Criteria check that the agent names the chat-history-vs-fact-store distinction, recommends LongTermMemory or a @Tool, refuses to install JdbcChatHistory / ChatHistoryAws / ChatMemorySql, and refuses to synthesise pseudo-turns in a custom ChatHistoryProvider.

Hardened skills against the file-write failure mode observed in 0.3.1 eval run 019e60f5 and confirmed in the partial re-run 019e613e: the scorer reads files from the solution directory, but several skills told the agent "produce ... as part of your response", which the agent satisfied with stdout prose that the scorer can't see. Adopted the Path: convention from scaffold-agent across the patched skills so the file targets are explicit and unambiguous (full src/main/kotlin/com/example/<file> paths plus build.gradle.kts at repo root):
- add-observability Step 3: Path: src/main/kotlin/com/example/Main.kt for the modified agent construction, Path: build.gradle.kts for the dependency.
- manage-state Step 2: Path: src/main/kotlin/com/example/Strategy.kt for the boundary-node body.
- persist-chat-history Step 4: Path: src/main/kotlin/com/example/Main.kt + Path: build.gradle.kts + a concrete handler path (e.g., src/main/kotlin/com/example/Routes.kt) when the user named a handler.
- add-tool Step 2: Path: src/main/kotlin/com/example/AccountLookupTool.kt (concrete tool-name example, rename to match the actual tool) + Path: src/main/kotlin/com/example/Main.kt + Path: build.gradle.kts.
- author-strategy Step 8: Path: src/main/kotlin/com/example/Strategy.kt for the DSL + Path: src/main/kotlin/com/example/Main.kt for the modified construction.
- handle-agent-events Step 2: Path: src/main/kotlin/com/example/Main.kt + Path: build.gradle.kts.
- wire-ktor-server Step 5: Path: src/main/kotlin/com/example/Application.kt + Path: build.gradle.kts.
- add-tool Step 1 routing: annotated-tool default now defers to Step 2 (typed Tool<TArgs,TResult>) when the user's existing function takes a data class parameter or returns a typed result; the previous "default to Step 1" rule wrapped typed signatures in flat-primitive annotated tools and lost the type contract.
- handle-agent-events Step 2: adopted the same Path: directive; round-2 eval 019e6149 showed the prior prose-only handoff was non-deterministic (round 1: 100, round 2: 0).
- use-planner Step 1 redirect to author-strategy: made the redirect actionable. It now runs author-strategy end-to-end and writes the graph DSL code via Path: per author-strategy's Step 8, plus topology and round-trip-cost reasoning as comments at the top of the produced file; the previous wording let the agent stop at a prose explanation. Step 1 also adds a "Finish here — do not continue into planner-variant selection or Step 2 / Step 3" line so the redirect and planner-fits branches are mutually exclusive, and an explicit "Chaining exception (exhaustive — overrides 'Do not run other steps' only as listed)" preamble names the Step 1 → Step 2/3 chain per skill-authoring.
Hardened skills against the "no-output" failure mode observed in 0.3.1 eval run 019e60f5:
- add-observability Step 2: replaced blocking "Ask the user which backend" with a non-blocking pick + OTLP default; Step 3 writes via Path:; fixes the −100pp lift on add-observability-langfuse.
- wire-ktor-server Step 2: split into minimal install (mandatory) vs MCP/HOCON add-ons; Steps 3 and 4 each open with "Skip this step entirely if..."; Step 5 writes via Path:; fixes the −28pp lift on wire-ktor-server-route.
- manage-state Step 2: committed to HistoryCompressionStrategy.WholeHistory as the default and moved the other six variants into "use only when the user names them"; fixes the −80pp lift on manage-state-tldr-mid-phase.
- use-planner Step 1: converted the "ask user LLM-based vs GOAP" stall into a pick-by-keywords rule (GOAP only when the user names typed state / classical planner / state space); planner-redirect tasks no longer block on a clarifying question.

Tightened skill activation routing for planner construction:
- use-planner description now names Planners.llmBased, Planners.llmBasedWithCritic, Planners.goap, PlannerAIAgent, agents-planner.
- scaffold-agent description adds an explicit "Do NOT use when the user is constructing a planner / picking a strategy / naming a specific agent shape" exclusion; fixes the mis-activation that pulled scaffold-agent for use-planner-llm-based-triage.

- cache-llm-calls-redis-shared eval scenario: retired per plugin-evals.md "Lift, Not Attainment": baseline 100/100, lift 0pp (Cause #1, universal competence). The partner scenario cache-llm-calls-refuses-provider-side still covers cache-llm-calls at +100pp lift.
- handle-agent-events-stdout-trace task: stripped the "arrow indicating start vs end" framing that bled into the criterion "Uses distinct visual markers for start vs end" (plugin-evals.md "No Bleeding").
- add-tool-typed-args-with-result task: stripped "they want the tool's input and output to remain these typed shapes — not a JSON blob, not a flattened String" framing that telegraphed Tool<TArgs,TResult>; the typed queryAccount signature still carries the constraint.
- persist-chat-history-jdbc task: replaced "wants conversations to persist by user account" framing with a user-reported bug ("the bot doesn't remember anything we talked about") so the agent must navigate persistence-feature vs chat-history vs LongTermMemory on its own.
- Angle brackets removed from description: fields in add-tool (Tool<TArgs,TResult> → Tool[TArgs,TResult]; <X> / <function> → prose), domain-model-subtask-pipeline (subgraphWithTask<In, Out> → subgraphWithTask[In, Out]; same for subgraphWithVerification<T> and CriticResult<T>), use-llm-node-variants (<tool> → prose). The tessl skill review validator rejects < followed by alpha as an XML tag; the just-landed skill-review CI gate (0.4.1) failed add-tool on the previous 0.4.2 publish attempt because of this. The 0.4.2 publish never landed in the registry (failed at the gate); 0.4.3 ships the same content as 0.4.2 plus this descriptor fix.
- author-strategy skill: added a member-vs-extension import table at the end of Step 5. The DSL primitives split across two shapes: forwardTo / onCondition / transformed are infix members (no import needed); onToolCalls / onTextMessage / onIsInstance / onSuccessful / onFailure / asUserMessage / asToolResultMessage / onMessageParts are top-level extensions in ai.koog.agents.core.dsl.extension.* (each needs its own import). Inventing a member import or omitting an extension import is the most common copy-paste compile failure.
- author-strategy + domain-model-subtask-pipeline skills: fixed the wrong artifact text. subgraphWithTask / subgraphWithVerification / CriticResult (package ai.koog.agents.ext.agent) ship inside agents-core (which the koog-agents umbrella pulls), NOT the standalone ai.koog:agents-ext:1.0.0-beta artifact. Added the imports + the AIAgentGraphStrategy package note (ai.koog.agents.core.agent.entity, not bare ai.koog.agents.core.agent).
- wire-mcp-server skill: each transport builder (streamableHttp, fromSseUrl, fromProcess) is a top-level extension in ai.koog.agents.mcp declared separately from the McpToolRegistryProvider object. All three example blocks now show the explicit extension import alongside the provider import.
- wire-mcp-server skill + module-coordinates rule: corrected the MCP client dependency to ai.koog:agents-mcp-jvm:1.0.0-beta. Koog 1.0 stable did not publish agents-mcp / agents-mcp-server at 1.0.0; only 1.0.0-beta is on Maven Central, and they publish only JVM variants so the -jvm suffix is mandatory for Gradle KMP variant resolution.
- add-tool + wire-a2a skills: corrected the ToolSet and asTools imports to come from ai.koog.agents.core.tools.reflect.* (the actual package), not bare ai.koog.agents.core.tools.*.
- module-coordinates rule: added Kotlin 2.3.10+ minimum requirement. Koog 1.0 is compiled with Kotlin 2.3.x; earlier Kotlin versions fail at consume time with metadata-version errors.
- evals/domain-model-subtask-pipeline-triage/criteria.json: corrected C7's required-Gradle-deps description: the umbrella ai.koog:koog-agents:1.0.0 is sufficient (it pulls agents-core, which contains the subgraph DSL). The criterion now penalizes unnecessary agents-ext lines.
- evals/author-strategy-import-shapes: new positive scenario testing the member-vs-extension import correctness when the agent emits a tool-handling-loop strategy.
- evals/wire-mcp-server-import-shapes: new positive scenario testing the fromSseUrl extension import + the -jvm:1.0.0-beta dependency line. Updated wire-mcp-server-stdio-playwright, wire-mcp-server-streamable-http, and wire-mcp-server-merge-tools criteria to match the corrected artifact spec.

Closes #9 (items 1–9; item 10 is a separate Tessl install-policy investigation).
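For readers applying the coordinate and import corrections listed above, here is a minimal build.gradle.kts sketch. The coordinates and the Kotlin minimum come straight from the entries above; the `kotlin("jvm")` plugin, the `implementation` configuration, and the `mavenCentral()` repository block are assumptions about a typical JVM project, not something this changelog specifies.

```kotlin
// build.gradle.kts (fragment; project layout is assumed)
plugins {
    // Changelog: Koog 1.0 needs Kotlin 2.3.10 or newer
    kotlin("jvm") version "2.3.10"
}

repositories {
    mavenCentral()
}

dependencies {
    // Umbrella artifact; pulls agents-core, which contains the subgraph DSL,
    // so no separate agents-ext line is needed.
    implementation("ai.koog:koog-agents:1.0.0")

    // MCP client: only 1.0.0-beta is published, and the -jvm suffix is mandatory.
    implementation("ai.koog:agents-mcp-jvm:1.0.0-beta")

    // Only when authoring an MCP server (wire-mcp-server Step 6).
    implementation("ai.koog:agents-mcp-server-jvm:1.0.0-beta")
}
```

The matching import shapes, again using only the packages named in the entries above (the selection of symbols is illustrative, and the fragment compiles only with the dependencies above on the classpath):

```kotlin
// Extensions: each one needs its own import.
import ai.koog.agents.core.dsl.extension.onToolCalls
import ai.koog.agents.core.dsl.extension.onTextMessage

// forwardTo / onCondition / transformed are infix members: no import.

// Tool reflection helpers live in the .reflect package, not bare core.tools.
import ai.koog.agents.core.tools.reflect.ToolSet
import ai.koog.agents.core.tools.reflect.asTools

// Strategy type and subgraph DSL packages, as corrected in the skills.
import ai.koog.agents.core.agent.entity.AIAgentGraphStrategy
import ai.koog.agents.ext.agent.subgraphWithTask
```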
- .github/workflows/publish.yml: tessl skill review --threshold 85 gate before tessl tile publish . via jbaruch/coding-policy/.github/actions/skill-review@ef67ffe5 (changed-skills loop). Closes the context-artifacts Mandatory Review gap flagged on #7; below-threshold skill scores now block publish. Checkout step bumped to fetch-depth: 0 so the action's git diff $github.event.before..HEAD can resolve the prior commit. Closes #8.

- domain-model-subtask-pipeline skill: the integrated pattern for typed-handoff pipelines: tools sliced by access into separate ToolSets (read / write / communication), @Serializable @LLMDescription-annotated data classes as inter-subtask contracts, subgraphWithTask<In, Out> per phase with per-phase model selection, subgraphWithVerification<T> + CriticResult<T> for self-correction loops. This is the methodology JetBrains' KotlinConf 2026 banking demo demonstrates; it fills the gap left by author-strategy (DSL mechanics only) and add-structured-output (top-level typed output only).
- Eval scenarios domain-model-subtask-pipeline-triage (positive, four-phase support workflow) and domain-model-subtask-pipeline-refuse (negative; declines to over-engineer a one-shot text transform).
- .github/workflows/publish.yml: tesslio/patch-version-publish@v1 wired to push-on-main + manual dispatch; auto-bumps patch from the registry on future merges.
- .github/workflows/review-openai.md / review-anthropic.md: paired gh-aw PR reviewers from jbaruch/coding-policy: install-reviewer (cross-family review per author-model-declaration).
- .env.example: required hosted-CI secrets with placeholder values and a deep link to https://github.com/jbaruch/koog-tessl/settings/secrets/actions (per no-secrets).
- .pre-commit-config.yaml: gitleaks v8.21.2 + standard pre-commit-hooks (per no-secrets pre-commit-scanning requirement).

- Negative eval scenarios use-planner-refuses-when-graph-fits, cache-llm-calls-refuses-provider-side, add-persistence-refuses-conversation, snapshot-and-restore-refuses-crash, covering the cross-skill redirects each skill prescribes (closes the "only positive cases" gap from 0.3.0).
- Eval scenario use-attachments-pdf-and-url (PDF + URL-image attachments in a single LLM call); the existing use-attachments-image-input only exercised images.
- scenario.json backfilled on every eval scenario (40 total) to match the canonical three-file shape tessl scenario generate emits; a drift fix, not a feature.
- tile.json summary: trimmed from a 290-char comma-spliced multi-clause string back to a one-line description per skill-authoring.md.
- Removed the interaction-rules phantom reference from 8 skill files (the rule never existed in this tile or in any consumed tile).
- Prose skill references (see the X skill, redirect to X) converted to typed Skill(skill: "X") calls per skill-authoring.md "Typed Calls".
- Step titles (Verify and Hand Off, Bump JDK and Tooling, etc.) aligned with skill-authoring.md "Step Structure".
- model-planner-subtasks-parallel-tree task: rewritten to remove technique leak (PlannerNode composition, storage-key tracking) per plugin-evals.md "No Bleeding".
- use-attachments-pdf-and-url task: revised on the cross-family reviewer's finding; dropped strategy / user message technique prose.
- Renamed enable-prompt-caching-anthropic-long-system (43 chars) → enable-prompt-caching-anthropic (31) to fit the 40-char default cap.
- Renamed wire-mcp-{merge-with-existing-tools,stdio-playwright,streamable-http-github} → wire-mcp-server-{merge-tools,stdio-playwright,streamable-http} so prefixes match the skill name per plugin-evals.md "Naming".

- cache-llm-calls: in-process LLM-response cache (prompt-executor-cached + prompt-cache-{files,model,redis}), distinct from the provider-side caching covered by enable-prompt-caching.
- persist-chat-history: chat-history persistence backends (chat-history-jdbc, chat-history-aws, chat-memory-sql), distinct from generic persistence and LongTermMemory.
- test-koog-agents: deterministic agent testing with agents-test (scripted executor, fake KoogClock, event-handler recorder).
- trace-agent-internals: deep diagnostic trace feature (agents-features-trace), distinct from OpenTelemetry (production signal) and event handlers (high-level callbacks).
- query-sql-from-agent: SQL-querying feature (agents-features-sql) with read-only mode, schema scoping, row caps.
- model-planner-subtasks: PlannerNode tree composition, parallel vs sequential subtasks, retry-on-parse-failure edges, history compression between phases.
- use-functional-agent: FunctionalAIAgent (the third concrete agent subtype, alongside GraphAIAgent and PlannerAIAgent); single suspending block, no graph.

module-coordinates and agent-construction remain always-on; they cover gotchas every Koog project hits. The other 7 rules were converted to on-demand skills:
- strategy-dsl → author-strategy skill
- planner-vs-graph → use-planner skill
- tools-and-mcp → folded into add-tool and wire-mcp-server skills (already existed)
- state-and-memory → manage-state skill
- observability → add-observability skill
- spring-boot-integration → wire-spring-boot skill
- migration-from-0-x → migrate-from-0-x skill

- add-structured-output, define-prompt, add-persistence
- enable-prompt-caching, handle-agent-events, wire-ktor-server
- use-llm-node-variants (streaming / multiple-choices / moderation / force-one-tool), add-rag, wire-a2a, wire-acp-server, add-token-budgeting, snapshot-and-restore, use-attachments
- agent-construction rule now includes a "When to reach for a skill" index pointing to the right skill for each common task
- scaffold-agent, add-tool, wire-mcp-server