Name: coding-agent-helpers/compact-debug-ledger
Rating: 99.3 (1 review)
Author: coding-agent-helpers

Use when a debugging thread needs to be compressed into a reusable investigation ledger. Capture the target, evidence, attempted fixes, ruled-out hypotheses, viable hypotheses, and next experiments. Good triggers include "compact this debugging session", "summarize what we've tried", and "turn this into a debugging ledger".

Quality: 100% (Does it follow best practices?)
Impact: 99% (3.66x). Average score across 8 eval scenarios
Security: Passed (no known issues)

{
  "context": "Tests whether the agent correctly labels each attempt in the Attempts section with 'worked', 'failed', or 'inconclusive', as required by the skill's output format.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Attempts section present",
      "description": "Output contains an '### Attempts' section",
      "max_score": 8
    },
    {
      "name": "All attempts labeled",
      "description": "Every item listed under Attempts includes one of the outcome labels: 'worked', 'failed', or 'inconclusive' (not just described narratively)",
      "max_score": 15
    },
    {
      "name": "Correct worked labels",
      "description": "Attempts that had a positive effect (LRU cache fix, heap snapshot collection, per-endpoint tracking) are labeled 'worked' or 'inconclusive' — NOT 'failed'",
      "max_score": 12
    },
    {
      "name": "Correct failed labels",
      "description": "Attempts that had no effect (npm audit scan, log buffer flush) are labeled 'failed'",
      "max_score": 12
    },
    {
      "name": "Ruled Out section",
      "description": "Output contains a '### Ruled Out' section that includes the disproved hypotheses (session middleware, elastic-client connection leak)",
      "max_score": 10
    },
    {
      "name": "Still Plausible section",
      "description": "Output contains a '### Still Plausible' section that includes the remaining viable hypotheses (EventEmitter listener leak not yet fully confirmed in prod)",
      "max_score": 10
    },
    {
      "name": "Next Experiments section",
      "description": "Output contains a '### Next Experiments' section",
      "max_score": 8
    },
    {
      "name": "Next experiments count",
      "description": "Between 1 and 3 next experiments are listed (not 0, not more than 3)",
      "max_score": 10
    },
    {
      "name": "Evidence section",
      "description": "Output contains an '### Evidence' section with bullet-list facts from the investigation",
      "max_score": 7
    },
    {
      "name": "Debug Target section",
      "description": "Output contains a '### Debug Target' section with a single-sentence description of the bug",
      "max_score": 8
    }
  ]
}
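
The structural items in the checklist above (section headings, outcome labels on every attempt, one to three next experiments) can be checked mechanically. The following is a minimal sketch, not the evaluator this listing actually uses: the function names `split_sections` and `check_ledger` are hypothetical, the sample ledger is a placeholder that only borrows item names from the checklist, and the content-level items (whether a label is the correct one, whether the right hypotheses were ruled out) are left to a reviewer.

```python
import re

# Section headings named in the checklist above.
REQUIRED_SECTIONS = [
    "Debug Target", "Evidence", "Attempts",
    "Ruled Out", "Still Plausible", "Next Experiments",
]

HEADING = re.compile(r"^###\s+(.+?)\s*$")
# Assumption: list items are written as "-", "*", or "1." style bullets.
ITEM = re.compile(r"^\s*(?:[-*]|\d+[.)])\s+(.*)$")
LABEL = re.compile(r"\b(worked|failed|inconclusive)\b", re.IGNORECASE)


def split_sections(ledger: str) -> dict[str, list[str]]:
    """Map each '### Heading' to the list items found under it."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in ledger.splitlines():
        heading = HEADING.match(line)
        if heading:
            current = heading.group(1)
            sections[current] = []
            continue
        item = ITEM.match(line)
        if item and current is not None:
            sections[current].append(item.group(1))
    return sections


def check_ledger(ledger: str) -> dict[str, bool]:
    """Return pass/fail for the purely structural checklist items."""
    sections = split_sections(ledger)
    results = {f"{name} section present": name in sections
               for name in REQUIRED_SECTIONS}
    attempts = sections.get("Attempts", [])
    results["All attempts labeled"] = bool(attempts) and all(
        LABEL.search(item) is not None for item in attempts
    )
    experiments = sections.get("Next Experiments", [])
    results["Next experiments count"] = 1 <= len(experiments) <= 3
    return results


# Placeholder ledger: bracketed text stands in for real content.
SAMPLE = """\
### Debug Target
(one-sentence description of the bug)

### Evidence
- (fact from the investigation)

### Attempts
- LRU cache fix: worked
- npm audit scan: failed
- heap snapshot collection: inconclusive

### Ruled Out
- session middleware

### Still Plausible
- EventEmitter listener leak

### Next Experiments
1. (next experiment)
"""

if __name__ == "__main__":
    for name, passed in check_ledger(SAMPLE).items():
        print(f"{'PASS' if passed else 'FAIL'}  {name}")
```

The `max_score` values in the checklist sum to 100, so a weighted total built on top of these pass/fail results would read directly as a percentage.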

evals/
  scenario-1/
  scenario-2/
    criteria.json
    task.md
  scenario-3/
  scenario-4/
  scenario-5/
  scenario-6/
  scenario-7/
  scenario-8/
skills/
tile.json

evals/scenario-2/criteria.json