Koog 1.0 idioms, gotchas, and scaffolding skills for Kotlin agents on the JVM
87
88%
Does it follow best practices?
Impact
87%
1.85x
Average score across 45 eval scenarios
Advisory
Suggest reviewing before use
{
"context": "Tests whether the agent integrates the full pattern — tools sliced by access, typed @LLMDescription handoff classes between phases, subgraphWithTask per phase with per-phase model selection, subgraphWithVerification + CriticResult for the verify/adjust loop. The developer named all the requirements; the question is whether the agent reaches for the integrated pattern rather than picking one piece and missing the others.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Tools sliced into communication / read / write ToolSets",
"description": "Produces three distinct ToolSet classes — one for communication (asking the customer), one for reading account data, one for mutating state — not a single monolithic ToolSet and not a feature-axis split. The grouping is by access pattern as the developer's phase constraints demand",
"max_score": 17
},
{
"name": "Each phase grants only the matching tool subset",
"description": "The understanding phase's subgraphWithTask receives communication + read tools (no write); the action phase receives read + write (no communication); the verification phase receives communication + read (no write); the adjustment phase receives read + write (the task says it re-runs the action — communication-only adjustment cannot apply the corrected fix). NOT every phase getting every tool",
"max_score": 13
},
{
"name": "Typed @LLMDescription'd data classes for inter-phase handoffs",
"description": "Defines @Serializable data classes (e.g., IssueSummary, Resolution, etc.) with @LLMDescription on the class and on every property. The phases consume and produce these typed shapes — not String handoffs and not Map<String, Any>. The contract is type-checked between phases",
"max_score": 25
},
{
"name": "Uses subgraphWithTask<In, Out> for each phase",
"description": "Each phase is wired with subgraphWithTask<InputType, OutputType> (or subgraphWithVerification for the verifier). The Input/Output type parameters match the handoff classes from the previous criterion. Does NOT inline each phase as a plain nodeLLMRequest in a single strategy",
"max_score": 15
},
{
"name": "Verification + adjust loop with CriticResult branching",
"description": "Verification is a subgraphWithVerification<T> whose CriticResult<T> drives two edges: success goes to nodeFinish (transformed { it.input }); failure goes to the adjust phase, coercing the nullable critic feedback to a non-null String via transformed { it.feedback.orEmpty() } and into a subgraphWithTask<String, _> adjust subgraph. The adjust phase has a back-edge to verification. NOT a fire-and-forget linear chain",
"max_score": 15
},
{
"name": "Mixed model selection per phase",
"description": "Picks distinct models per phase honoring the developer's stated preferences — a cheap model for understanding, a mid-tier model for action, a reasoning-tier model for verification. NOT the same model on all four phases",
"max_score": 10
},
{
"name": "Declares the required Gradle dependencies",
"description": "Names the Gradle dependency lines needed for the pipeline — at minimum the umbrella `ai.koog:koog-agents:1.0.0` (which transitively provides `subgraphWithTask` / `subgraphWithVerification` / `CriticResult` via `agents-core` — they do NOT require the standalone `agents-ext` beta artifact) plus whatever provider modules (`prompt-executor-openai-client`, `prompt-executor-anthropic-client`) the chosen models require. Penalize answers that add an unnecessary `ai.koog:agents-ext` line — those APIs ship with `agents-core`",
"max_score": 5
}
]
}
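The dependency criterion in the checklist can be pictured as a `build.gradle.kts` fragment. This is a minimal sketch, not a complete build file: only the `ai.koog:koog-agents:1.0.0` coordinate is taken from the checklist. The `ai.koog` group and the `1.0.0` version on the two provider modules are assumptions, since the checklist names those modules without full coordinates.

```kotlin
// build.gradle.kts (fragment; the plugins and repositories blocks are omitted)
dependencies {
    // Umbrella artifact named in the checklist. Per the checklist it transitively
    // provides subgraphWithTask, subgraphWithVerification and CriticResult.
    implementation("ai.koog:koog-agents:1.0.0")

    // Provider modules named in the checklist. ASSUMPTION: group "ai.koog" and
    // version "1.0.0" are guesses that mirror the umbrella artifact.
    implementation("ai.koog:prompt-executor-openai-client:1.0.0")
    implementation("ai.koog:prompt-executor-anthropic-client:1.0.0")

    // Deliberately no ai.koog:agents-ext line: the checklist penalizes it as unnecessary.
}
```

Only the provider modules for the models actually chosen need to be listed.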
skills
add-observability
add-persistence
add-rag
add-structured-output
add-token-budgeting
add-tool
cache-llm-calls
define-prompt
domain-model-subtask-pipeline
references
enable-prompt-caching
handle-agent-events
manage-state
migrate-from-0-x
model-planner-subtasks
persist-chat-history
query-sql-from-agent
scaffold-agent
snapshot-and-restore
test-koog-agents
trace-agent-internals
use-attachments
use-functional-agent
use-llm-node-variants
use-planner
wire-a2a
wire-acp-server
wire-ktor-server
wire-mcp-server
wire-spring-boot
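The control flow behind the checklist's verification and adjust criterion can be illustrated in plain Kotlin, without any Koog dependency. Everything in this sketch is a hypothetical stand-in: `CriticOutcome` only mirrors the shape the checklist attributes to `CriticResult` (a success flag, nullable feedback, and the verified input), and `verifyAdjustLoop` with its `maxRounds` bound is an assumption added here so that the loop terminates. In a real strategy the same branching would be expressed as graph edges rather than a function.

```kotlin
// Hypothetical stand-in for the critic result described in the checklist; NOT the Koog class.
data class CriticOutcome<T>(val successful: Boolean, val feedback: String?, val input: T)

// Runs verify; on success returns the verified input, on failure passes the
// non-null feedback to adjust and loops back to verify (the "back-edge").
// maxRounds is an assumption of this sketch, not something the checklist requires.
fun <T> verifyAdjustLoop(
    initial: T,
    maxRounds: Int,
    verify: (T) -> CriticOutcome<T>,
    adjust: (String) -> T
): T {
    var candidate = initial
    repeat(maxRounds) {
        val outcome = verify(candidate)
        // Success edge: hand the verified input to the finish step.
        if (outcome.successful) return outcome.input
        // Failure edge: coerce nullable feedback to a String, as in `it.feedback.orEmpty()`.
        candidate = adjust(outcome.feedback.orEmpty())
    }
    error("No candidate passed verification within $maxRounds rounds")
}

fun main() {
    // Toy verifier: accepts only a resolution whose text ends with "10".
    val result = verifyAdjustLoop(
        initial = "refund 5",
        maxRounds = 3,
        verify = { r ->
            val ok = r.endsWith("10")
            CriticOutcome(ok, if (ok) null else "amount should be 10", r)
        },
        // Toy adjuster: rebuilds the resolution from the last word of the feedback.
        adjust = { feedback -> "refund " + feedback.substringAfterLast(' ') }
    )
    println(result) // prints "refund 10"
}
```

The adjust step takes a `String` because the failure edge carries only the critic's feedback, which matches the `subgraphWithTask<String, _>` shape the checklist describes for the adjust phase.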