
jbaruch/koog

Koog 1.0 idioms, gotchas, and scaffolding skills for Kotlin agents on the JVM



CHANGELOG.md

Changelog

All notable changes to this tile are documented here. Format: Keep a Changelog. Versioning: SemVer.

[0.4.8] — 2026-05-29

Fixed

  • snapshot-and-restore Step 1 + add-persistence Step 2 — completed the crash-recovery redirect handoff. The snapshot Step 1 redirect previously said only "invoke add-persistence", so with the skill loaded the agent emitted a meta-description of the skill chain instead of the concrete install(Persistence) solution and a developer-facing message — eval snapshot-and-restore-refuses-crash regressed to with-context 42 vs baseline 66. The redirect now directs the agent to deliver the full Persistence solution plus a one-message snapshot-vs-Persistence mismatch explanation, and add-persistence Step 2 gained checkpoint-frequency cost guidance (every-step writes are expensive on long runs). With-context returned to 100

Added

  • evals/migrate-from-0-x-custom-strategy — second migration scenario covering the non-obvious 1.0 breaking changes a custom-strategy agent hits: the nodeExecuteTools auto-writeback removal (chain nodeLLMSendToolResults explicitly), the LLMClient HTTP-transport decoupling (KoogHttpClient.Factory instead of a Ktor HttpClient), and the kotlin.time.Clock → KoogClock swap, alongside the coordinate/JDK bumps
  • evals/wire-acp-server-choose-vs-a2a — protocol-selection scenario: a tooling dashboard needing run-lifecycle control with cancellation and progress streaming should pick ACP (agents-features-acp), not A2A (agent-to-agent RPC) or MCP (tool host). Highest-lift new scenario (+95 baseline→with-context)

Changed

  • evals/add-structured-output-classify-issue — removed the answer-narrating clause from the task so the "does not introduce a custom strategy" criterion tests application rather than reading

[0.4.7] — 2026-05-29

Changed

Knowledge corrections from Vadim Briliantov (Koog project lead) — three skill clarifications:

  • wire-mcp-server Step 6 — added a framing paragraph stating that @Tool / @LLMDescription / ToolSet are LOCAL Koog tool annotations. The startStdioMcpServer path bridges a Koog ToolRegistry to MCP, but that's a secondary use case for the annotation, not its primary purpose. For projects whose primary goal is publishing tools over MCP (independent of any Koog agent), the Kotlin MCP SDK (io.modelcontextprotocol:kotlin-sdk) has its own server annotation. The Koog-bridge path is right when you already have a Koog ToolRegistry and want it reachable over MCP too
  • add-tool Step 3 (Sub-Agent-as-Tool) — added the sub-agent vs subgraph distinction. Sub-agents (AIAgentService.fromAgent) are fully independent agents that communicate only through typed input/output; subgraphs (subgraphWithTask / subgraphWithVerification) are part of the same agent and share one message history. Default for "break my agent into stages" is subgraph; reach for sub-agent only when isolation is the explicit requirement
  • domain-model-subtask-pipeline Step 6 — strengthened the auto-shared-history framing to name the contrast with independent-agent abstractions (Koog sub-agents, LangChain4j Agentic sub-agents). Subgraphs live on one common history; independent agents communicate only through typed input/output. Cross-references Skill(skill: "add-tool") Step 3 for the isolation case

[0.4.6] — 2026-05-28

Fixed

  • use-llm-node-variants Steps 1-4 — added the Path: write directive to each action (streaming / multiple-choice / moderation / force-one-tool). Eval 019e648a against 0.4.5 surfaced that use-llm-node-variants-streaming regressed to lift -87 (baseline 87 → with-context 0) with the reasoning "No Kotlin code was produced at all". Same file-write-gap root cause as the 0.3.1 nightmare — this was the one skill that hadn't been patched in PR #10's omnibus Path: rollout
  • evals/persist-chat-history-refuses-fact-store/criteria.json — re-weighted to favor functional correctness over prose explanation. The 0.4.5 eval showed the agent did the right thing functionally (refused JdbcChatHistory, picked LongTermMemory) but lost 35/100 on prose-explanation criteria (30 + 5) because the patched skills explicitly direct code-only output via Path:. New weights: Does not install JdbcChatHistory 35 (was 25), Recommends LongTermMemory 30 (was 25), Does not synthesise pseudo-turns 25 (was 15), Names the distinction 10 (was 30, reworded to accept a code comment), and dropped the standalone "Acknowledges the framing" criterion (was 5). Sum still 100

[0.4.5] — 2026-05-26

Added

  • wire-mcp-server Step 6 — optional server-side startStdioMcpServer flow for users authoring an MCP server (not just consuming one). Pulls ai.koog:agents-mcp-server-jvm:1.0.0-beta (same beta + -jvm-suffix gotchas as the client module). Exposes a ToolRegistry over stdio with awaitCancellation() keeping the process alive
  • domain-model-subtask-pipeline Step 7 — MultiLLMPromptExecutor callout for cross-provider per-subgraph model selection. The skill teaches per-phase llmModel = ... but the per-provider executors (simpleOpenAIExecutor / simpleAnthropicExecutor) only know one provider; when a strategy mixes (e.g., OpenAIModels.Chat.O3 for the verifier and AnthropicModels.Sonnet_4_5 for the deployer), the agent needs a MultiLLMPromptExecutor with one client per provider
  • persist-chat-history Step 5 — multi-turn footnote: the same agent instance can call agent.run(input, sessionId = ...) repeatedly; the installed chat-history backend accumulates the message log on each call, so a while (true) driver loop maintains conversation without reconstructing the agent
  • persist-chat-history Step 6 — new anti-pattern section: don't use chat-history as a fact store. Symptoms (date-prefixed pseudo-turns, synthetic Message.Assistant claims about actions the agent never took, queryable structured data forced into a sequential channel) and the right primitives for each shape (LongTermMemory for cross-session facts; @Tool for queryable structured data; systemPrompt for small fixed context; storage for run-scoped state). Custom ChatHistoryProvider stays legitimate for replaying real conversation messages from external sources
  • add-observability Step 3 — one-line clarification on setVerbose(true) semantics: it emits prompts, completions, and token counts on each span
  • evals/wire-mcp-server-author-stdio — positive scenario covering the new Step 6 decisional branch (publishing an existing ToolSet as a stdio MCP server). Criteria check for startStdioMcpServer, the agents-mcp-server-jvm:1.0.0-beta dependency, ToolRegistry { tools(asTools()) } reuse of the developer's existing class, awaitCancellation() for process lifetime, and refusal of the client-transport surface
  • evals/persist-chat-history-refuses-fact-store — negative scenario covering the new Step 6 anti-pattern (don't use chat-history as a fact store). Criteria check that the agent names the chat-history-vs-fact-store distinction, recommends LongTermMemory or a @Tool, refuses to install JdbcChatHistory / ChatHistoryAws / ChatMemorySql, and refuses to synthesise pseudo-turns in a custom ChatHistoryProvider

[0.4.4] — 2026-05-26

Changed

  • Hardened skills against the file-write failure mode observed in 0.3.1 eval run 019e60f5 and confirmed in the partial re-run 019e613e: the scorer reads files from the solution directory, but several skills told the agent "produce ... as part of your response" — which the agent satisfied with stdout prose that the scorer can't see. Adopted the Path: convention from scaffold-agent across the patched skills so the file targets are explicit and unambiguous (full src/main/kotlin/com/example/<file> paths plus build.gradle.kts at repo root):

    • add-observability Step 3 — Path: src/main/kotlin/com/example/Main.kt for the modified agent construction, Path: build.gradle.kts for the dependency
    • manage-state Step 2 — Path: src/main/kotlin/com/example/Strategy.kt for the boundary-node body
    • persist-chat-history Step 4 — Path: src/main/kotlin/com/example/Main.kt + Path: build.gradle.kts + a concrete handler path (e.g., src/main/kotlin/com/example/Routes.kt) when the user named a handler
    • add-tool Step 2 — Path: src/main/kotlin/com/example/AccountLookupTool.kt (concrete tool-name example, rename to match the actual tool) + Path: src/main/kotlin/com/example/Main.kt + Path: build.gradle.kts
    • author-strategy Step 8 — Path: src/main/kotlin/com/example/Strategy.kt for the DSL + Path: src/main/kotlin/com/example/Main.kt for the modified construction
    • handle-agent-events Step 2 — Path: src/main/kotlin/com/example/Main.kt + Path: build.gradle.kts
    • wire-ktor-server Step 5 — Path: src/main/kotlin/com/example/Application.kt + Path: build.gradle.kts
  • add-tool Step 1 routing — annotated-tool default now defers to Step 2 (typed Tool<TArgs,TResult>) when the user's existing function takes a data class parameter or returns a typed result; the previous "default to Step 1" rule wrapped typed signatures in flat-primitive annotated tools and lost the type contract

  • handle-agent-events Step 2 — adopted the same Path: directive; round-2 eval 019e6149 showed the prior prose-only handoff was non-deterministic (round 1: 100, round 2: 0)

  • use-planner Step 1 redirect to author-strategy — made the redirect actionable: it now runs author-strategy end-to-end and writes the graph DSL code via Path: per author-strategy's Step 8, plus topology and round-trip-cost reasoning as comments at the top of the produced file; the previous wording let the agent stop at a prose explanation. Step 1 also adds a "Finish here — do not continue into planner-variant selection or Step 2 / Step 3" line so the redirect and planner-fits branches are mutually exclusive, and an explicit "Chaining exception (exhaustive — overrides 'Do not run other steps' only as listed)" preamble names the Step 1 → Step 2/3 chain per skill-authoring

  • Hardened skills against the "no-output" failure mode observed in 0.3.1 eval run 019e60f5:

    • add-observability Step 2 — replaced blocking "Ask the user which backend" with a non-blocking pick + OTLP default; Step 3 writes via Path:; fixes the −100pp lift on add-observability-langfuse
    • wire-ktor-server Step 2 — split into minimal install (mandatory) vs MCP/HOCON add-ons; Steps 3 and 4 each open with "Skip this step entirely if..."; Step 5 writes via Path:; fixes the −28pp lift on wire-ktor-server-route
    • manage-state Step 2 — committed to HistoryCompressionStrategy.WholeHistory as the default and moved the other six variants into "use only when the user names them"; fixes the −80pp lift on manage-state-tldr-mid-phase
    • use-planner Step 1 — converted the "ask user LLM-based vs GOAP" stall into a pick-by-keywords rule (GOAP only when the user names typed state / classical planner / state space); planner-redirect tasks no longer block on a clarifying question
  • Tightened skill activation routing for planner construction:

    • use-planner description now names Planners.llmBased, Planners.llmBasedWithCritic, Planners.goap, PlannerAIAgent, agents-planner
    • scaffold-agent description adds an explicit "Do NOT use when the user is constructing a planner / picking a strategy / naming a specific agent shape" exclusion; fixes the mis-activation that pulled scaffold-agent for use-planner-llm-based-triage

Removed

  • cache-llm-calls-redis-shared eval scenario — retired per plugin-evals.md "Lift, Not Attainment": baseline 100/100, lift 0pp (Cause #1, universal competence). The partner scenario cache-llm-calls-refuses-provider-side still covers cache-llm-calls at +100pp lift

Fixed

  • handle-agent-events-stdout-trace task — stripped the "arrow indicating start vs end" framing that bled into the criterion "Uses distinct visual markers for start vs end" (plugin-evals.md "No Bleeding")
  • add-tool-typed-args-with-result task — stripped "they want the tool's input and output to remain these typed shapes — not a JSON blob, not a flattened String" framing that telegraphed Tool<TArgs,TResult>; the typed queryAccount signature still carries the constraint
  • persist-chat-history-jdbc task — replaced "wants conversations to persist by user account" framing with a user-reported bug ("the bot doesn't remember anything we talked about") so the agent must navigate persistence-feature vs chat-history vs LongTermMemory on its own

[0.4.3] — 2026-05-26

Fixed

  • Stripped XML-tag-looking syntax from skill description: fields in add-tool (Tool<TArgs,TResult> → Tool[TArgs,TResult]; <X> / <function> → prose), domain-model-subtask-pipeline (subgraphWithTask<In, Out> → subgraphWithTask[In, Out]; same for subgraphWithVerification<T> and CriticResult<T>), use-llm-node-variants (<tool> → prose). The tessl skill review validator rejects < followed by alpha as an XML tag — the just-landed skill-review CI gate (0.4.1) failed add-tool on the previous 0.4.2 publish attempt because of this. The 0.4.2 publish never landed in the registry (failed at the gate); 0.4.3 ships the same content as 0.4.2 plus this descriptor fix

[0.4.2] — 2026-05-26 (never published)

Fixed

  • author-strategy skill — added a member-vs-extension import table at the end of Step 5. The DSL primitives split across two shapes: forwardTo / onCondition / transformed are infix members (no import needed); onToolCalls / onTextMessage / onIsInstance / onSuccessful / onFailure / asUserMessage / asToolResultMessage / onMessageParts are top-level extensions in ai.koog.agents.core.dsl.extension.* (each needs its own import). Inventing a member import or omitting an extension import is the most common copy-paste compile failure
  • author-strategy + domain-model-subtask-pipeline skills — fixed the wrong artifact text. subgraphWithTask / subgraphWithVerification / CriticResult (package ai.koog.agents.ext.agent) ship inside agents-core (which koog-agents umbrella pulls), NOT the standalone ai.koog:agents-ext:1.0.0-beta artifact. Added the imports + the AIAgentGraphStrategy package note (ai.koog.agents.core.agent.entity, not bare ai.koog.agents.core.agent)
  • wire-mcp-server skill — each transport builder (streamableHttp, fromSseUrl, fromProcess) is a top-level extension in ai.koog.agents.mcp declared separately from the McpToolRegistryProvider object. All three example blocks now show the explicit extension import alongside the provider import
  • wire-mcp-server skill + module-coordinates rule — corrected the MCP client dependency to ai.koog:agents-mcp-jvm:1.0.0-beta. Koog 1.0 stable did not publish agents-mcp / agents-mcp-server at 1.0.0; only 1.0.0-beta is on Maven Central, and they publish only JVM variants so the -jvm suffix is mandatory for Gradle KMP variant resolution
  • add-tool + wire-a2a skills — corrected the ToolSet and asTools imports to come from ai.koog.agents.core.tools.reflect.* (the actual package), not bare ai.koog.agents.core.tools.*
  • module-coordinates rule — added Kotlin 2.3.10+ minimum requirement. Koog 1.0 is compiled with Kotlin 2.3.x; earlier Kotlin versions fail at consume time with metadata-version errors
  • evals/domain-model-subtask-pipeline-triage/criteria.json — corrected C7's required-Gradle-deps description: the umbrella ai.koog:koog-agents:1.0.0 is sufficient (it pulls agents-core, which contains the subgraph DSL). Penalize unnecessary agents-ext lines
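
The coordinates corrected in the entries above can be collected into one build file. The following build.gradle.kts is a minimal sketch, assuming a plain Kotlin/JVM Gradle project: the plugin declaration and the repositories block are assumptions for illustration, while the artifact coordinates and the Kotlin minimum are taken from the entries above.

```kotlin
// build.gradle.kts (sketch; project layout is assumed, coordinates come from the entries above)
plugins {
    // Koog 1.0 needs Kotlin 2.3.10 or newer; older compilers fail with metadata-version errors
    kotlin("jvm") version "2.3.10"
}

repositories {
    mavenCentral()
}

dependencies {
    // Umbrella artifact: pulls agents-core, which contains the subgraph DSL
    implementation("ai.koog:koog-agents:1.0.0")

    // MCP client: only 1.0.0-beta is published, and only as a JVM variant,
    // so the -jvm suffix is required
    implementation("ai.koog:agents-mcp-jvm:1.0.0-beta")

    // No separate ai.koog:agents-ext line is needed for
    // subgraphWithTask / subgraphWithVerification / CriticResult
}
```

The MCP line is only relevant to projects that consume an MCP server; a project without MCP would likely keep just the umbrella dependency.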

Added

  • evals/author-strategy-import-shapes — new positive scenario testing the member-vs-extension import correctness when the agent emits a tool-handling-loop strategy
  • evals/wire-mcp-server-import-shapes — new positive scenario testing the fromSseUrl extension import + the -jvm:1.0.0-beta dependency line. Updated wire-mcp-server-stdio-playwright, wire-mcp-server-streamable-http, and wire-mcp-server-merge-tools criteria to match the corrected artifact spec
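
For orientation, the import shapes these scenarios check might look like the sketch below. It uses only the packages and names given in the 0.4.2 entries; the selection of symbols is illustrative, and a real file would import only what it uses.

```kotlin
// Import sketch based on the 0.4.2 entries above; the choice of symbols is illustrative.

// Strategy DSL: top-level extensions in ai.koog.agents.core.dsl.extension, one import each.
import ai.koog.agents.core.dsl.extension.onToolCalls
import ai.koog.agents.core.dsl.extension.onTextMessage
// forwardTo / onCondition / transformed are infix members, so they need no import.

// The strategy type lives in the entity subpackage, not bare ai.koog.agents.core.agent.
import ai.koog.agents.core.agent.entity.AIAgentGraphStrategy

// Subgraph helpers ship inside agents-core under the ext.agent package.
import ai.koog.agents.ext.agent.subgraphWithTask

// MCP transport builders are top-level extensions in ai.koog.agents.mcp,
// imported alongside the provider import.
import ai.koog.agents.mcp.fromSseUrl

// Annotation-based tools come from the reflect package.
import ai.koog.agents.core.tools.reflect.ToolSet
import ai.koog.agents.core.tools.reflect.asTools
```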

Closes #9 (items 1–9; item 10 is a separate Tessl install-policy investigation).

[0.4.1] — 2026-05-26

Added

  • .github/workflows/publish.yml — tessl skill review --threshold 85 gate before tessl tile publish . via jbaruch/coding-policy/.github/actions/skill-review@ef67ffe5 (changed-skills loop). Closes the context-artifacts Mandatory Review gap flagged on #7; below-threshold skill scores now block publish. Checkout step bumped to fetch-depth: 0 so the action's git diff $github.event.before..HEAD can resolve the prior commit. Closes #8

[0.4.0] — 2026-05-25

Added

  • domain-model-subtask-pipeline skill — the integrated pattern for typed-handoff pipelines: tools sliced by access into separate ToolSets (read / write / communication), @Serializable @LLMDescription-annotated data classes as inter-subtask contracts, subgraphWithTask<In, Out> per phase with per-phase model selection, subgraphWithVerification<T> + CriticResult<T> for self-correction loops. The methodology JetBrains' KotlinConf 2026 banking demo demonstrates — fills the gap left by author-strategy (DSL mechanics only) and add-structured-output (top-level typed output only)
  • 2 eval scenarios — domain-model-subtask-pipeline-triage (positive, four-phase support workflow) and domain-model-subtask-pipeline-refuse (negative — declines to over-engineer a one-shot text transform)

[0.3.1] — 2026-05-25

Added

  • .github/workflows/publish.yml — tesslio/patch-version-publish@v1 wired to push-on-main + manual dispatch; auto-bumps patch from the registry on future merges
  • .github/workflows/review-openai.md / review-anthropic.md — paired gh-aw PR reviewers from jbaruch/coding-policy: install-reviewer (cross-family review per author-model-declaration)
  • .env.example — required hosted-CI secrets with placeholder values and a deep link to https://github.com/jbaruch/koog-tessl/settings/secrets/actions (per no-secrets)
  • .pre-commit-config.yaml — gitleaks v8.21.2 + standard pre-commit-hooks (per no-secrets pre-commit-scanning requirement)
  • 4 negative eval scenarios — use-planner-refuses-when-graph-fits, cache-llm-calls-refuses-provider-side, add-persistence-refuses-conversation, snapshot-and-restore-refuses-crash — covering the cross-skill redirects each skill prescribes (closes the "only positive cases" gap from 0.3.0)
  • 1 generator-produced eval scenario — use-attachments-pdf-and-url (PDF + URL-image attachments in a single LLM call); the existing use-attachments-image-input only exercised images
  • scenario.json backfilled on every eval scenario (40 total) to match the canonical three-file shape tessl scenario generate emits — drift fix, not a feature

Changed

  • Slimmed tile.json summary from a 290-char comma-spliced multi-clause string back to a one-line description per skill-authoring.md
  • Stripped interaction-rules phantom reference from 8 skill files (the rule never existed in this tile or in any consumed tile)
  • Converted 7 prose skill cross-references (see the X skill, redirect to X) to typed Skill(skill: "X") calls per skill-authoring.md "Typed Calls"
  • Split 6 step titles that combined verbs with "and" (Verify and Hand Off, Bump JDK and Tooling, etc.) per skill-authoring.md "Step Structure"
  • Reworded model-planner-subtasks-parallel-tree task to remove technique leak (PlannerNode composition, storage-key tracking) per plugin-evals.md "No Bleeding"
  • Reworded use-attachments-pdf-and-url task on the cross-family reviewer's finding — dropped strategy / user message technique prose
  • Renamed enable-prompt-caching-anthropic-long-system (43 chars) → enable-prompt-caching-anthropic (31) to fit the 40-char default cap
  • Renamed wire-mcp-{merge-with-existing-tools,stdio-playwright,streamable-http-github} → wire-mcp-server-{merge-tools,stdio-playwright,streamable-http} so prefixes match the skill name per plugin-evals.md "Naming"

[0.3.0] — 2026-05-25

Added

  • 7 additional skills covering modules and API surfaces missed in 0.2.0:
    • cache-llm-calls — in-process LLM-response cache (prompt-executor-cached + prompt-cache-{files,model,redis}), distinct from the provider-side caching covered by enable-prompt-caching
    • persist-chat-history — chat-history persistence backends (chat-history-jdbc, chat-history-aws, chat-memory-sql), distinct from generic persistence and LongTermMemory
    • test-koog-agents — deterministic agent testing with agents-test (scripted executor, fake KoogClock, event-handler recorder)
    • trace-agent-internals — deep diagnostic trace feature (agents-features-trace), distinct from OpenTelemetry (production signal) and event handlers (high-level callbacks)
    • query-sql-from-agent — SQL-querying feature (agents-features-sql) with read-only mode, schema scoping, row caps
    • model-planner-subtasks — PlannerNode tree composition, parallel vs sequential subtasks, retry-on-parse-failure edges, history compression between phases
    • use-functional-agent — FunctionalAIAgent (the third concrete agent subtype, alongside GraphAIAgent and PlannerAIAgent) — single suspending block, no graph
  • 7 new eval scenarios — 1 per new skill, weighted-checklist with non-uniform weights summing to 100
  • Scope statement in README clarified: Kotlin/JVM only; Kotlin/JS, Kotlin/Native, Compose Multiplatform explicitly out of scope

[0.2.0] — 2026-05-25

Changed (breaking)

  • Slimmed rules from 9 to 2. Only module-coordinates and agent-construction remain always-on — they cover gotchas every Koog project hits. The other 7 rules were converted to on-demand skills:
    • strategy-dsl → author-strategy skill
    • planner-vs-graph → use-planner skill
    • tools-and-mcp → folded into add-tool and wire-mcp-server skills (already existed)
    • state-and-memory → manage-state skill
    • observability → add-observability skill
    • spring-boot-integration → wire-spring-boot skill
    • migration-from-0-x → migrate-from-0-x skill
  • Front-loaded token cost dropped from ~6.9k to ~1.3k

Added

  • 19 new skills filling Koog 1.0 surface gaps not covered in 0.1.0:
    • High priority: add-structured-output, define-prompt, add-persistence
    • Medium priority: enable-prompt-caching, handle-agent-events, wire-ktor-server
    • Lower priority but real coverage: use-llm-node-variants (streaming / multiple-choices / moderation / force-one-tool), add-rag, wire-a2a, wire-acp-server, add-token-budgeting, snapshot-and-restore, use-attachments
  • 19 new eval scenarios — 1 per new skill, all weighted-checklist with non-uniform weights summing to 100
  • agent-construction rule now includes a "When to reach for a skill" index pointing to the right skill for each common task

[0.1.0] — 2026-05-25

Added

  • Initial tile targeting Koog 1.0.0 (released 2026-05-21)
  • 9 always-apply rules covering module coordinates, agent construction, the strategy DSL, planner vs graph, tools & MCP, state & memory, observability, Spring Boot integration, and the 0.x → 1.0 migration surface
  • 3 skills: scaffold-agent, add-tool, wire-mcp-server
  • 9 eval scenarios (3 per skill)
  • Kotlin-only scope; Java-interop surface deferred to a future sibling tile
