{
  "context": "Tests whether the agent correctly distinguishes member primitives from extension primitives in the Koog strategy DSL when emitting the import block. The natural tool-handling loop uses both shapes: `forwardTo` is a member and needs no import, while `onToolCalls` and `onTextMessage` are extensions in `ai.koog.agents.core.dsl.extension.*` and need explicit imports. Agents that copy the call shape without resolving the imports typically over-import members or under-import extensions, producing exactly the compile errors this rubric grades against.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Imports the `onToolCalls` extension explicitly",
      "description": "The produced file contains a line like `import ai.koog.agents.core.dsl.extension.onToolCalls`, either as a named import or via a star-import covering the same package. Without this import, the `onToolCalls { ... }` call in the edge does not resolve.",
      "max_score": 22
    },
    {
      "name": "Imports the `onTextMessage` extension explicitly",
      "description": "Same as above for `onTextMessage`: `import ai.koog.agents.core.dsl.extension.onTextMessage` (or a star-import covering the package). The text-reply branch of the loop calls this extension.",
      "max_score": 22
    },
    {
      "name": "Does not invent a member import for `forwardTo`",
      "description": "The import block does NOT contain a line like `import ai.koog.agents.core.dsl.builder.forwardTo` (or any other top-level `forwardTo` import). `forwardTo` is an infix member on `AIAgentNodeBase`; inventing a top-level import for it produces an unresolved-reference error at compile time.",
      "max_score": 18
    },
    {
      "name": "Uses both edge primitives the loop needs",
      "description": "The strategy body contains an `edge(... forwardTo nodeExecuteTool onToolCalls { ... })`-shaped edge AND an `edge(... forwardTo nodeFinish onTextMessage { ... })`-shaped edge (variable names may differ). Both branches must be present, showing the agent did not collapse the loop into a single linear chain.",
      "max_score": 18
    },
    {
      "name": "Chains `nodeExecuteTools` to `nodeLLMSendToolResults` explicitly",
      "description": "After `nodeExecuteTools()` produces `ReceivedToolResults`, an edge feeds those results into `nodeLLMSendToolResults()`. The 1.0 API surface dropped the auto-writeback variant, so without this edge the LLM never sees what its tools returned.",
      "max_score": 12
    },
    {
      "name": "Wires the loop back through `nodeExecuteTools` on continued tool calls",
      "description": "There is a back-edge from `nodeLLMSendToolResults` to `nodeExecuteTools` gated by `onToolCalls { ... }`, so a multi-step tool sequence loops instead of terminating after the first tool result.",
      "max_score": 8
    }
  ]
}