
coding-agent-helpers/compact-debug-ledger

Use when a debugging thread needs to be compressed into a reusable investigation ledger. Capture the target, evidence, attempted fixes, ruled-out hypotheses, viable hypotheses, and next experiments. Good triggers include "compact this debugging session", "summarize what we've tried", and "turn this into a debugging ledger".

Quality: 100% — does it follow best practices?

Impact: 99% (3.66x) — average score across 8 eval scenarios

Security (by Snyk): Passed — no known issues


task.md · evals/scenario-2/

Compressing a Node.js Memory Leak Investigation

Problem Description

A backend team has been chasing a memory leak in a Node.js service for two days. The Slack thread is now 200+ messages long and unmanageable. You've been asked to produce a compact version of the investigation so that the on-call engineer coming on shift tonight can pick up exactly where the team left off.

Read the session transcript below and produce a compact investigation record. Save it as investigation_summary.md.

Input Files

The following file is provided as input. Extract it before beginning.

=============== FILE: inputs/slack_thread.md ===============

Memory Leak Investigation - Slack Thread (condensed)

Day 1, 10:00 — Issue filed: Node.js API service memory grows from ~200MB to ~1.4GB over 6 hours, then OOM-kills. Restarts every night around 3am.

Day 1, 10:45 — Checked if it was a known library issue. Scanned npm audit. No relevant advisories found. Dead end.

Day 1, 11:30 — Added heap snapshot collection every 30 min via clinic.js. Heap snapshots confirmed: string objects accumulating in a cache layer.

Day 1, 12:15 — Hypothesis: Express session middleware not expiring sessions. Checked session TTL config — TTL is set to 1 hour and working correctly. Hypothesis disproved.

Day 1, 14:00 — Hypothesis: Log buffer not being flushed. Added flush call after each request. Memory growth rate unchanged after 1 hour observation. This fix had no effect.

Day 1, 15:30 — Added per-endpoint memory tracking via custom middleware. Isolated the leak to the /search endpoint.
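
The thread says "custom middleware" without showing it; one plausible shape for per-endpoint heap tracking (all names here are illustrative, not from the codebase) is:

```javascript
// Hypothetical sketch of per-endpoint memory tracking middleware.
// Records the heap-usage delta between request start and response finish,
// aggregated by request path. Deltas are noisy per-request (GC timing),
// but averaged over many requests a leaking endpoint stands out.
function memoryTracker(stats = new Map()) {
  const middleware = (req, res, next) => {
    const before = process.memoryUsage().heapUsed;
    res.on('finish', () => {
      const delta = process.memoryUsage().heapUsed - before;
      const prev = stats.get(req.path) ?? { count: 0, totalDelta: 0 };
      stats.set(req.path, {
        count: prev.count + 1,
        totalDelta: prev.totalDelta + delta,
      });
    });
    next();
  };
  middleware.stats = stats; // expose for inspection / a debug endpoint
  return middleware;
}
```

Mounted early in an Express app (`app.use(memoryTracker())`), `stats` lets you compare average growth per endpoint, which is how the leak could be pinned to `/search`.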

Day 1, 17:00 — Examined /search handler code. Found an in-process cache (searchCache) that stores raw query strings and result sets, keyed by query. Cache has no eviction policy and no size limit.
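
The leaky pattern described above looks roughly like this (a sketch; only `searchCache` and its keying-by-query come from the thread, the handler shape is assumed):

```javascript
// Unbounded in-process cache: every distinct query string and its result
// set stays in memory forever. No eviction policy, no size limit.
const searchCache = new Map();

function cachedSearch(query, runQuery) {
  if (searchCache.has(query)) return searchCache.get(query);
  const results = runQuery(query); // hits the search backend
  searchCache.set(query, results); // grows without bound
  return results;
}
```

With high-cardinality user queries as keys, this grows linearly with distinct traffic, matching the steady 200MB → 1.4GB climb.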

Day 1, 17:45 — Added LRU cache with max 500 entries to replace unbounded cache. Deployed to staging. Memory stable for 2 hours on staging. Looks promising.
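
A minimal hand-rolled sketch of the fix, for illustration only (the team may well have used an npm package such as `lru-cache` instead; it relies on JS `Map` iterating in insertion order):

```javascript
// Least-recently-used cache with a hard size cap, replacing the
// unbounded Map. At most `max` entries are ever resident.
class LRU {
  constructor(max) {
    this.max = max;
    this.map = new Map();
  }
  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    this.map.delete(key); // re-insert to mark as most recently used
    this.map.set(key, value);
    return value;
  }
  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.max) {
      // Map iterates in insertion order, so the first key is the LRU entry
      this.map.delete(this.map.keys().next().value);
    }
  }
}
```

With `new LRU(500)` in place of the unbounded `Map`, memory for the cache is capped at 500 result sets, consistent with the much slower residual growth seen afterwards.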

Day 2, 09:00 — Deployed LRU fix to production. Monitoring overnight: memory still grew but much more slowly — from 200MB to ~450MB over 6 hours (was 1.4GB before).

Day 2, 10:30 — Heap snapshots show a second, smaller accumulation: event listeners on the EventEmitter instance inside the search handler are not being removed after request completion. Classic listener leak.

Day 2, 11:15 — Added emitter.removeAllListeners() in the request cleanup path. Deployed to staging. Memory stayed flat for 3 hours on staging.

Day 2, 12:00 — Hypothesis: Third-party elastic-client library leaks connections. Checked connection pool metrics — pool is healthy, no leaked connections detected. Hypothesis disproved.

Day 2, 13:30 — Listener fix not yet deployed to production. Need to verify it handles the edge case where a request is aborted mid-flight before merging.

Day 2, 14:00 — Still need to check: does the EventEmitter leak also exist in the /autocomplete endpoint which uses similar code?
