
coding-agent-helpers/compact-debug-ledger

Use when a debugging thread needs to be compressed into a reusable investigation ledger. Capture the target, evidence, attempted fixes, ruled-out hypotheses, viable hypotheses, and next experiments. Good triggers include "compact this debugging session", "summarize what we've tried", and "turn this into a debugging ledger".
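The ledger fields listed above correspond to the `###` sections that the scenario-3 checklist below scores. This is a minimal sketch, assuming the ledger is assembled in Python before being written out as Markdown. `DebugLedger`, `Attempt`, and `OUTCOMES` are hypothetical names, not part of the tile. The sketch enforces the checklist's structural rules: each attempt carries a worked/failed/inconclusive label, no hypothesis appears in both Ruled Out and Still Plausible, and there are 1-3 next experiments.

```python
from dataclasses import dataclass, field

# Outcome labels that the "Attempts" checklist item asks for.
OUTCOMES = {"worked", "failed", "inconclusive"}


@dataclass
class Attempt:
    description: str
    outcome: str  # must be one of OUTCOMES


@dataclass
class DebugLedger:
    target: str  # one-sentence description of the problem
    evidence: list[str] = field(default_factory=list)
    attempts: list[Attempt] = field(default_factory=list)
    ruled_out: list[str] = field(default_factory=list)
    still_plausible: list[str] = field(default_factory=list)
    next_experiments: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the ledger using the '###' headings the checklist looks for."""
        # Exact-string overlap check: a hypothesis may not sit in both lists.
        overlap = set(self.ruled_out) & set(self.still_plausible)
        if overlap:
            raise ValueError(f"listed as both ruled out and plausible: {sorted(overlap)}")
        for attempt in self.attempts:
            if attempt.outcome not in OUTCOMES:
                raise ValueError(f"unknown outcome: {attempt.outcome!r}")
        if not 1 <= len(self.next_experiments) <= 3:
            raise ValueError("expected 1-3 next experiments")

        def bullets(items: list[str]) -> str:
            return "\n".join(f"- {item}" for item in items)

        sections = [
            ("Debug Target", self.target),
            ("Evidence", bullets(self.evidence)),
            ("Attempts", bullets([f"{a.description} ({a.outcome})" for a in self.attempts])),
            ("Ruled Out", bullets(self.ruled_out)),
            ("Still Plausible", bullets(self.still_plausible)),
            ("Next Experiments", bullets(self.next_experiments)),
        ]
        return "\n\n".join(f"### {title}\n{body}" for title, body in sections) + "\n"
```

To satisfy the "Output saved to file" item, the rendered text could then be written with `pathlib.Path("handoff.md").write_text(ledger.to_markdown())`. Failing loudly on a mixed hypothesis, rather than silently dropping it, keeps the Ruled Out / Still Plausible distinction explicit.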

Score: 99 (3.66x)
Quality: 100% (Does it follow best practices?)
Impact: 99%, 3.66x (average score across 8 eval scenarios)
Security by Snyk: Passed, no known issues


evals/scenario-3/criteria.json

{
  "context": "Tests whether the agent correctly separates disproved hypotheses into a 'Ruled Out' section and still-viable hypotheses into a 'Still Plausible' section, rather than mixing them or losing the distinction.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Ruled Out section present",
      "description": "Output contains a '### Ruled Out' section",
      "max_score": 10
    },
    {
      "name": "Ruled Out correct content",
      "description": "The Ruled Out section contains all three disproved hypotheses: clock skew (H1), Redis TTL (H3), and DB connection pool interleaving (H6)",
      "max_score": 15
    },
    {
      "name": "Still Plausible section present",
      "description": "Output contains a '### Still Plausible' section",
      "max_score": 10
    },
    {
      "name": "Still Plausible correct content",
      "description": "The Still Plausible section includes at least 2 of: Kafka consumer rebalancing (H2), redlock race condition (H4), network partition (H5) — hypotheses not yet ruled out",
      "max_score": 15
    },
    {
      "name": "No hypothesis mixing",
      "description": "Hypotheses that are ruled out do NOT appear under Still Plausible, and vice versa — the two sections contain different hypotheses",
      "max_score": 10
    },
    {
      "name": "Debug Target section",
      "description": "Output contains a '### Debug Target' section with a one-sentence description of the problem",
      "max_score": 8
    },
    {
      "name": "Evidence section",
      "description": "Output contains an '### Evidence' section as a bullet list including key facts (e.g. duplicate rate, debit-only pattern, rebalancing correlation)",
      "max_score": 8
    },
    {
      "name": "Attempts section",
      "description": "Output contains an '### Attempts' section where each attempt has a worked/failed/inconclusive label",
      "max_score": 8
    },
    {
      "name": "Next Experiments present",
      "description": "Output contains a '### Next Experiments' section with 1-3 items",
      "max_score": 8
    },
    {
      "name": "Output saved to file",
      "description": "A file named handoff.md exists in the workspace",
      "max_score": 8
    }
  ]
}
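The `max_score` values above sum to 10+15+10+15+10+8+8+8+8+8 = 100, so a scenario score reads directly as a percentage. Several of the structural items can be checked mechanically. This is a minimal sketch, not Tessl's grader, assuming the output is Markdown with `###` headings and that hypotheses carry `H<n>` labels as in the checklist descriptions. `split_sections`, `hypothesis_ids`, and `score_structure` are hypothetical helpers. The sketch awards only the six section-presence items plus "No hypothesis mixing", which together are worth 62 of the 100 points. The two "correct content" items (30 points) and the handoff.md file check (8 points) are left out.

```python
import re

# Section-presence items and their max_score values, copied from the checklist above.
SECTION_WEIGHTS = {
    "Ruled Out": 10,
    "Still Plausible": 10,
    "Debug Target": 8,
    "Evidence": 8,
    "Attempts": 8,
    "Next Experiments": 8,
}
MIXING_WEIGHT = 10  # "No hypothesis mixing"


def split_sections(markdown: str) -> dict[str, str]:
    """Map each '### Heading' to the text beneath it, up to the next '### ' heading."""
    sections: dict[str, str] = {}
    current = None
    for line in markdown.splitlines():
        match = re.match(r"^###\s+(.+?)\s*$", line)
        if match:
            current = match.group(1)
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return sections


def hypothesis_ids(text: str) -> set[str]:
    """Collect hypothesis labels such as 'H1' or 'H6' mentioned in a section."""
    return set(re.findall(r"\bH\d+\b", text))


def score_structure(markdown: str) -> tuple[int, int]:
    """Return (points earned, points possible) for the mechanically checkable items."""
    sections = split_sections(markdown)
    score = sum(w for name, w in SECTION_WEIGHTS.items() if name in sections)
    possible = sum(SECTION_WEIGHTS.values()) + MIXING_WEIGHT
    ruled_out = hypothesis_ids(sections.get("Ruled Out", ""))
    plausible = hypothesis_ids(sections.get("Still Plausible", ""))
    # Conservative choice: both sections must name at least one hypothesis,
    # and no label may appear in both.
    if ruled_out and plausible and not (ruled_out & plausible):
        score += MIXING_WEIGHT
    return score, possible
```

Matching on `H<n>` labels rather than free-text hypothesis names keeps the disjointness check simple. It assumes the ledger preserves those labels, and a real grader would still need to judge the content items.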
