Name: coding-agent-helpers/compact-debug-ledger
Rating: 99.3 (1 review)
Author: coding-agent-helpers

Use when a debugging thread needs to be compressed into a reusable investigation ledger. Capture the target, evidence, attempted fixes, ruled-out hypotheses, viable hypotheses, and next experiments. Good triggers include "compact this debugging session", "summarize what we've tried", and "turn this into a debugging ledger".
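
As an illustration of the ledger shape described above, here is a minimal Python sketch. The section headings are taken from the eval criteria on this page; the class names, field names, and section order are assumptions made for the example, not the skill's actual implementation.

```python
from dataclasses import dataclass, field

# Outcome labels named in the eval criteria on this page.
VALID_OUTCOMES = {"worked", "failed", "inconclusive"}


@dataclass
class Attempt:
    """One attempted fix with its outcome label (hypothetical helper type)."""
    description: str
    outcome: str

    def __post_init__(self) -> None:
        if self.outcome not in VALID_OUTCOMES:
            raise ValueError(f"outcome must be one of {sorted(VALID_OUTCOMES)}")


@dataclass
class DebugLedger:
    """Hypothetical container for the six ledger parts."""
    target: str  # intended to be a single sentence
    evidence: list[str] = field(default_factory=list)
    attempts: list[Attempt] = field(default_factory=list)
    ruled_out: list[str] = field(default_factory=list)
    still_plausible: list[str] = field(default_factory=list)
    next_experiments: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the ledger as markdown with one '###' heading per part."""
        def bullets(items: list[str]) -> str:
            return "\n".join(f"- {item}" for item in items) or "- (none)"

        attempts = [f"{a.description} ({a.outcome})" for a in self.attempts]
        sections = [
            ("Debug Target", self.target),
            ("Evidence", bullets(self.evidence)),
            ("Attempts", bullets(attempts)),
            ("Ruled Out", bullets(self.ruled_out)),
            ("Still Plausible", bullets(self.still_plausible)),
            ("Next Experiments", bullets(self.next_experiments)),
        ]
        return "\n\n".join(f"### {title}\n\n{body}" for title, body in sections) + "\n"


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    ledger = DebugLedger(
        target="Checkout requests intermittently return HTTP 500.",
        evidence=["Errors appear only on one replica"],
        attempts=[Attempt("Restarted the replica", "inconclusive")],
        next_experiments=["Compare config between replicas"],
    )
    print(ledger.render())
```

Keeping each fact in exactly one field makes it harder to repeat the same information across sections.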

Quality: 100% (Does it follow best practices?)
Impact: 99%, 3.66x (average score across 8 eval scenarios)
Security: Passed, no known issues

{
  "context": "Tests whether the agent omits dead conversational detail (latecomers being caught up, lunch breaks, 'any updates?', departures) and preserves only investigation-relevant content.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "No catch-up repetition",
      "description": "Output does NOT reproduce repeated catch-up explanations (e.g. 'Hi what's going on?', 'Can someone catch me up?', 'What did I miss?', the repeated re-summarizing of the situation for Jordan and Sam)",
      "max_score": 15
    },
    {
      "name": "No absence/return mentions",
      "description": "Output does NOT mention team members leaving for calls, lunch, or returning — these have no bearing on investigation state",
      "max_score": 12
    },
    {
      "name": "Evidence preserved",
      "description": "Output contains an '### Evidence' section that retains the key investigative facts: record count (~3.8M), UiPath/RPA automation, Cloudflare IP, logins only from UiPath range, Acme Corp half-admission",
      "max_score": 15
    },
    {
      "name": "Attempts section with labels",
      "description": "Output contains an '### Attempts' section where each attempt has a worked/failed/inconclusive label",
      "max_score": 10
    },
    {
      "name": "Ruled Out section",
      "description": "Output contains a '### Ruled Out' section",
      "max_score": 8
    },
    {
      "name": "Still Plausible section",
      "description": "Output contains a '### Still Plausible' section with hypotheses still under consideration (e.g. rogue employee vs deliberate exfiltration, TechFlow connection)",
      "max_score": 10
    },
    {
      "name": "Next Experiments section",
      "description": "Output contains a '### Next Experiments' section with 1-3 items",
      "max_score": 8
    },
    {
      "name": "Debug Target one sentence",
      "description": "Output contains a '### Debug Target' section with exactly one sentence",
      "max_score": 8
    },
    {
      "name": "No repeated context",
      "description": "Key facts are stated ONCE in the output — the same information is not repeated in multiple sections",
      "max_score": 7
    },
    {
      "name": "File saved",
      "description": "A file named security_investigation.md exists in the workspace",
      "max_score": 7
    }
  ]
}
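
The `max_score` values in this checklist add up to 100. The page does not say how the platform combines per-item scores, so the following Python sketch assumes the simplest reading: clamp each awarded score to its item's `max_score`, sum, and divide by the total weight. The function name `weighted_score` and the `awarded` mapping are hypothetical.

```python
def weighted_score(criteria: dict, awarded: dict) -> float:
    """Return a 0-100 percentage for a weighted_checklist criteria dict.

    `awarded` maps checklist item names to the points given for that item.
    Missing items count as 0, and each score is clamped to [0, max_score].
    This aggregation rule is an assumption, not documented platform behavior.
    """
    checklist = criteria["checklist"]
    total = sum(item["max_score"] for item in checklist)
    if total == 0:
        raise ValueError("checklist has no weight")
    earned = 0.0
    for item in checklist:
        score = awarded.get(item["name"], 0)
        earned += min(max(score, 0), item["max_score"])
    return 100 * earned / total


# Two items copied from the checklist above, with descriptions omitted.
SAMPLE = {
    "type": "weighted_checklist",
    "checklist": [
        {"name": "No catch-up repetition", "max_score": 15},
        {"name": "No absence/return mentions", "max_score": 12},
    ],
}

if __name__ == "__main__":
    print(weighted_score(SAMPLE, {"No catch-up repetition": 15}))
```

Because the full checklist's weights total 100, under this assumption the summed points and the percentage would be the same number.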

Files:
evals/
  scenario-1/
  scenario-2/
  scenario-3/
  scenario-4/
  scenario-5/
  scenario-6/
  scenario-7/
    criteria.json
    task.md
  scenario-8/
skills/
tile.json

File shown: evals/scenario-7/criteria.json