Use when a debugging thread needs to be compressed into a reusable investigation ledger. Capture the target, evidence, attempted fixes, ruled-out hypotheses, viable hypotheses, and next experiments. Good triggers include "compact this debugging session", "summarize what we've tried", and "turn this into a debugging ledger".
Your team has been investigating a production incident for the past 3 hours. The conversation has grown long and messy — people have gone back and forth, explored dead ends, and made progress that's now buried in chat history. A new engineer is joining the investigation and needs to get up to speed quickly without reading the entire conversation.
Produce a concise investigation summary from the transcript below that captures where things stand. Save the summary to a file called debug_ledger.md.
The following file is provided as input. Extract it before beginning.
=============== FILE: inputs/session.md ===============
[09:02] Alice: Hey team, we're getting a flood of 500 errors on the /checkout endpoint starting around 8:55am. Error rate jumped from <0.1% to about 12%.
[09:04] Bob: Let me check the logs. Yeah I see a lot of "connection pool exhausted" errors in the app logs.
[09:05] Alice: Could be the database. Did we deploy anything this morning?
[09:06] Bob: Yes, we deployed v2.3.1 at 8:50am. That release only changed the product recommendation engine though, shouldn't touch checkout.
[09:08] Alice: Let me check if it's the DB. I'll query the slow query log. Wait, actually let me grab coffee first.
[09:12] Bob: While you were gone I checked — DB CPU is at 22%, totally normal. Query latency is also fine, p99 is 45ms which is baseline.
[09:14] Alice: Oh interesting. So not the DB itself. Could be connection pool misconfiguration in the new deploy?
[09:15] Carol: I joined late, what's the issue?
[09:16] Alice: 500s on checkout since 8:55, connection pool exhausted errors.
[09:17] Carol: Have you checked thread pool? The new recommendation engine runs async workers.
[09:18] Bob: Good call. I see the recommendation service is spawning workers but not releasing them. The thread pool is at 98% capacity.
[09:20] Alice: Is it related to the deploy? Was this worker leak there before?
[09:22] Bob: I checked git blame. The async worker lifecycle code was changed in v2.3.1. There's a code path where if the recommendation API times out, the worker goroutine is never cleaned up.
[09:23] Carol: So the recommendation API is timing out? Let me check its latency... yeah it's showing p99 of 12 seconds, way above the 2 second timeout.
[09:25] Alice: So the recommendation API is slow → causes timeout → worker goroutines leak → thread pool exhausts → checkout requests fail. That's the chain.
[09:26] Bob: Makes sense. We could fix by: (a) rolling back v2.3.1, (b) patching the goroutine leak with a defer cleanup, or (c) increasing the thread pool size as a bandaid.
[09:27] Carol: Why is the recommendation API slow though? That seems like root cause.
[09:28] Alice: Recommendation API queries a Redis cache that may have been flushed. Let me check... yes, Redis was restarted at 8:53am for a routine maintenance window. Caused cache miss storm.
[09:30] Bob: So the Redis restart triggered the slowness, but the goroutine leak in v2.3.1 is what turned a slow API into total checkout failure.
[09:31] Carol: Redis should be fully warmed up again in a few minutes based on traffic patterns. But we still have the goroutine leak.
[09:32] Alice: Agreed. Immediate options: rollback v2.3.1 or deploy a hotfix. What's the recommended fix?
[09:33] Bob: I'm writing the hotfix now — adding a defer statement to clean up goroutines on the timeout path.
[09:35] Carol: Should be a one-line fix. I'll review it. Also, should we increase alerting on thread pool saturation? This would've caught it faster.
[09:37] Alice: Yes but let's fix first. Is the hotfix ready?
[09:38] Bob: Almost. Testing it locally now.