
coding-agent-helpers/compact-debug-ledger

Use when a debugging thread needs to be compressed into a reusable investigation ledger. Capture the target, evidence, attempted fixes, ruled-out hypotheses, viable hypotheses, and next experiments. Good triggers include "compact this debugging session", "summarize what we've tried", and "turn this into a debugging ledger".

Quality: 100% (Does it follow best practices?)

Impact: 99%, 3.66x (average score across 8 eval scenarios)

Security (by Snyk): Passed, no known issues


evals/scenario-1/criteria.json

{
  "context": "Tests whether the agent produces a debug ledger with all required sections in the correct format, including Debug Target, Evidence, Attempts, Ruled Out, Still Plausible, and Next Experiments sections.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Debug Target section",
      "description": "Output contains a '### Debug Target' section header",
      "max_score": 8
    },
    {
      "name": "Debug Target one sentence",
      "description": "The content under Debug Target is exactly one sentence, not a paragraph or bullet list",
      "max_score": 8
    },
    {
      "name": "Evidence section",
      "description": "Output contains an '### Evidence' section header",
      "max_score": 8
    },
    {
      "name": "Evidence as bullet facts",
      "description": "The content under Evidence is formatted as a bullet list of facts (lines starting with '- ')",
      "max_score": 8
    },
    {
      "name": "Attempts section",
      "description": "Output contains an '### Attempts' section header",
      "max_score": 8
    },
    {
      "name": "Attempt status labels",
      "description": "Each item under Attempts includes one of the labels 'worked', 'failed', or 'inconclusive' (e.g. '- <attempt>: worked')",
      "max_score": 10
    },
    {
      "name": "Ruled Out section",
      "description": "Output contains a '### Ruled Out' section header",
      "max_score": 8
    },
    {
      "name": "Still Plausible section",
      "description": "Output contains a '### Still Plausible' section header",
      "max_score": 8
    },
    {
      "name": "Next Experiments section",
      "description": "Output contains a '### Next Experiments' section header",
      "max_score": 8
    },
    {
      "name": "No off-topic detail",
      "description": "Output does NOT include conversational filler (e.g. coffee break mention, 'I joined late', routine social exchanges) that has no bearing on the investigation",
      "max_score": 9
    },
    {
      "name": "Evidence over chronology",
      "description": "Output is organized by investigation state (Evidence, Attempts, Hypotheses) rather than a timestamped chronological replay of events",
      "max_score": 9
    },
    {
      "name": "Output saved to file",
      "description": "A file named debug_ledger.md exists in the workspace",
      "max_score": 8
    }
  ]
}
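
The structural items in this checklist (section headers, the one-sentence target, bulleted evidence, attempt status labels) can be checked mechanically. The following is a minimal sketch of such a check, not the grader Tessl actually uses, which is not shown on this page. The names `split_sections`, `check_ledger`, and `SAMPLE` are hypothetical, and the sample ledger content is invented for illustration. The "No off-topic detail" and "Evidence over chronology" items need judgment and are left out, as is the `debug_ledger.md` file check.

```python
import re

# Section names and status labels are taken from the checklist above.
REQUIRED_SECTIONS = [
    "Debug Target", "Evidence", "Attempts",
    "Ruled Out", "Still Plausible", "Next Experiments",
]
STATUS_LABELS = ("worked", "failed", "inconclusive")


def split_sections(ledger: str) -> dict[str, list[str]]:
    """Map each '### <name>' header to the non-blank lines beneath it."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in ledger.splitlines():
        if line.startswith("### "):
            current = line[4:].strip()
            sections[current] = []
        elif current is not None and line.strip():
            sections[current].append(line.strip())
    return sections


def check_ledger(ledger: str) -> dict[str, bool]:
    """Return pass/fail for each mechanically checkable checklist item."""
    sections = split_sections(ledger)
    results = {f"{name} section": name in sections for name in REQUIRED_SECTIONS}

    # Heuristic (an assumption): one non-bullet line with exactly one
    # sentence terminator followed by whitespace or end of line.
    target = sections.get("Debug Target", [])
    results["Debug Target one sentence"] = (
        len(target) == 1
        and not target[0].startswith("- ")
        and len(re.findall(r"[.!?](?:\s|$)", target[0])) == 1
    )

    evidence = sections.get("Evidence", [])
    results["Evidence as bullet facts"] = bool(evidence) and all(
        line.startswith("- ") for line in evidence
    )

    attempts = sections.get("Attempts", [])
    results["Attempt status labels"] = bool(attempts) and all(
        any(label in line for label in STATUS_LABELS) for line in attempts
    )
    return results


# Invented example ledger, used only to exercise the checks.
SAMPLE = """\
### Debug Target
Login requests intermittently return HTTP 500 after the cache upgrade.

### Evidence
- Errors occur only on nodes running the new cache client
- Stack traces end in a connection pool timeout

### Attempts
- Restarted the affected nodes: failed
- Raised the pool size: inconclusive

### Ruled Out
- Database outage

### Still Plausible
- Connection leak in the new cache client

### Next Experiments
- Log pool usage per request on one affected node
"""

if __name__ == "__main__":
    for name, passed in check_ledger(SAMPLE).items():
        print(f"{'PASS' if passed else 'FAIL'}  {name}")
```

The one-sentence test is deliberately rough: abbreviations such as "e.g." inside the target sentence would be miscounted, so a real grader would likely rely on a reviewer or model judgment for that item.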
