
coding-agent-helpers/compact-debug-ledger

Use when a debugging thread needs to be compressed into a reusable investigation ledger. Capture the target, evidence, attempted fixes, ruled-out hypotheses, viable hypotheses, and next experiments. Good triggers include "compact this debugging session", "summarize what we've tried", and "turn this into a debugging ledger".

Quality: 100%
Impact: 99% (3.66x)
Average score across 8 eval scenarios

Security (by Snyk): Passed, no known issues


Files: task.md, evals/scenario-4/

Active Bug Hunt: Flaky CI Pipeline

Problem Description

Your team's CI pipeline has been producing flaky test failures for two weeks: the same tests sometimes pass and sometimes fail on identical code in CI, and the failures cannot be reproduced locally. Multiple engineers have investigated, but progress has stalled because their notes are scattered. You need to consolidate the current investigation state into a single compact reference document that focuses on what to try next.

Produce a compact investigation record from the notes below and save it as ci_debug.md.

Input Files

The following file is provided as input. Extract it before beginning.

=============== FILE: inputs/ci_notes.md ===============

CI Flakiness Investigation Notes

The Problem

Integration tests in the payments module fail in approximately 15-20% of CI runs but always pass locally. Failures are non-deterministic: same commit, same code, yet pass/fail alternates with no apparent pattern.

What We Know (Evidence)

  • Failures first appeared after upgrading GitHub Actions runner from ubuntu-20.04 to ubuntu-22.04
  • Affected tests all involve the PaymentProcessor class and external HTTP calls
  • Test logs show "connection refused" on port 8080 — a mock server that should be started by test setup
  • The mock server sometimes starts before the test and sometimes starts after (race condition in test setup)
  • Test setup uses setTimeout(startServer, 100) — this was added as a workaround for a different issue last year
  • Server startup time on ubuntu-22.04 is faster than ubuntu-20.04 due to system optimization

Attempted Fixes and Results

  1. Pinned runner back to ubuntu-20.04 → tests pass consistently → confirms runner version is the trigger, but not a permanent fix since ubuntu-20.04 is deprecated
  2. Increased setTimeout delay from 100ms to 500ms → reduced failure rate from 20% to 5% but not eliminated → didn't fully solve it
  3. Added retry logic in test client for connection refused → tests pass but mask the underlying race → not a real fix
  4. Replaced setTimeout with a health-check poll (poll port 8080 every 50ms until ready, timeout 5s) → deployed to feature branch CI → zero failures in 20 runs → looks very promising, not yet merged
  5. Attempted to use jest's globalSetup instead of per-test setup → failed — server lifecycle doesn't integrate well with jest's module isolation

Possible Next Actions (team brainstorm, not prioritized)

  • Merge the health-check poll fix to main
  • Add a regression test that deliberately starts server late to prevent future flakiness
  • Investigate whether other test modules have the same setTimeout anti-pattern
  • Check if the ubuntu-22.04 runner change affected any other test suites
  • Write a post-mortem documenting the root cause
  • Update the test infrastructure docs
  • Review all CI configs for similar patterns
  • Set up flakiness alerting dashboard
  • Consider migrating to testcontainers for more reliable service lifecycle management
  • Add a lint rule to catch setTimeout in test setup files
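The last bullet could be prototyped with ESLint's built-in no-restricted-syntax rule. This is a sketch only; the file globs are assumptions about the repo's layout, not known paths.

```javascript
// Sketch of an ESLint config (.eslintrc.js) flagging setTimeout in test
// setup files. The "files" globs are hypothetical; adjust to the repo.
module.exports = {
  overrides: [
    {
      files: ["**/*.setup.js", "**/test-setup/**"],
      rules: {
        "no-restricted-syntax": [
          "error",
          {
            selector: "CallExpression[callee.name='setTimeout']",
            message:
              "Don't use setTimeout in test setup; poll for readiness instead.",
          },
        ],
      },
    },
  ],
};
```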
