
coding-agent-helpers/compact-debug-ledger

Use when a debugging thread needs to be compressed into a reusable investigation ledger. Capture the target, evidence, attempted fixes, ruled-out hypotheses, viable hypotheses, and next experiments. Good triggers include "compact this debugging session", "summarize what we've tried", and "turn this into a debugging ledger".

Quality: 100% (Does it follow best practices?)
Impact: 99%, 3.66x (average score across 8 eval scenarios)
Security by Snyk: Passed, no known issues


evals/scenario-4/criteria.json

{
  "context": "Tests whether the agent limits the Next Experiments section to 1-3 items that most reduce uncertainty, rather than listing every possible action from a long brainstorm list.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Next Experiments section present",
      "description": "Output contains a '### Next Experiments' section",
      "max_score": 10
    },
    {
      "name": "Experiments count limit",
      "description": "The Next Experiments section contains between 1 and 3 items (inclusive) — NOT 4 or more",
      "max_score": 20
    },
    {
      "name": "Experiments reduce uncertainty",
      "description": "The listed experiments are investigation steps that would resolve remaining uncertainty (e.g. merging/validating the health-check fix, checking for the same pattern elsewhere) — not administrative tasks like writing docs or updating dashboards",
      "max_score": 15
    },
    {
      "name": "Debug Target section",
      "description": "Output contains a '### Debug Target' section with a one-sentence description",
      "max_score": 8
    },
    {
      "name": "Evidence section",
      "description": "Output contains an '### Evidence' section as a bullet list",
      "max_score": 8
    },
    {
      "name": "Attempts section with labels",
      "description": "Output contains an '### Attempts' section where each attempt has a worked/failed/inconclusive label",
      "max_score": 10
    },
    {
      "name": "Ruled Out section",
      "description": "Output contains a '### Ruled Out' section",
      "max_score": 8
    },
    {
      "name": "Still Plausible section",
      "description": "Output contains a '### Still Plausible' section",
      "max_score": 8
    },
    {
      "name": "File created",
      "description": "A file named ci_debug.md exists in the workspace",
      "max_score": 5
    },
    {
      "name": "No dead detail",
      "description": "Output does NOT include unrelated items (post-mortems, doc updates, dashboards) that don't affect investigation progress",
      "max_score": 8
    }
  ]
}
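
To make the checklist above concrete, here is a minimal Python sketch of how a ledger could be checked against the experiment-count criterion and how per-item scores could be combined. It is not the evaluator behind this page: the helper names (`section_items`, `experiments_within_limit`, `weighted_score`) are hypothetical, the heading parser is a simple heuristic, and the assumption that the final score is the sum of awarded points over the sum of `max_score` values is mine.

```python
import re

# Weights copied from criteria.json above (checklist name -> max_score).
MAX_SCORES = {
    "Next Experiments section present": 10,
    "Experiments count limit": 20,
    "Experiments reduce uncertainty": 15,
    "Debug Target section": 8,
    "Evidence section": 8,
    "Attempts section with labels": 10,
    "Ruled Out section": 8,
    "Still Plausible section": 8,
    "File created": 5,
    "No dead detail": 8,
}


def section_items(markdown: str, heading: str) -> list[str]:
    """Return top-level list items under a '### <heading>' section.

    Heuristic only: any line starting with '#' ends the current section,
    and indented (nested) bullets are not counted.
    """
    items = []
    in_section = False
    for line in markdown.splitlines():
        if line.startswith("#"):
            title = line.lstrip("#").strip().lower()
            in_section = line.startswith("### ") and title == heading.lower()
            continue
        if in_section and re.match(r"^([-*+]|\d+[.)])\s+", line):
            items.append(line.strip())
    return items


def experiments_within_limit(markdown: str) -> bool:
    """True when Next Experiments has between 1 and 3 items, inclusive."""
    return 1 <= len(section_items(markdown, "Next Experiments")) <= 3


def weighted_score(awarded: dict[str, float]) -> float:
    """Assumed aggregation: awarded points (capped per item) as a percentage."""
    total = sum(MAX_SCORES.values())
    earned = sum(
        min(max(awarded.get(name, 0), 0), cap)
        for name, cap in MAX_SCORES.items()
    )
    return 100 * earned / total
```

A grader could call `experiments_within_limit(open("ci_debug.md").read())` to decide the 20-point count criterion. Criteria such as "Experiments reduce uncertainty" and "No dead detail" require judgment and cannot be settled by parsing alone.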
