markusdowne/error-triage-ladder

Diagnoses and routes failures by analyzing error patterns, classifying severity, and applying retry logic, suppression budgets, and escalation rules. Use when handling errors, troubleshooting failures, recovering from API errors or timeouts, deciding whether to retry or escalate an issue, or managing service outages and tool dependency failures. Applies to any scenario where a check has failed, evidence of success is missing, or an unresolved error needs a structured response. Includes explicit untrusted-content/prompt-injection guardrails for third-party inputs.
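The description lists suppression budgets among the mechanisms the skill applies, but the skill's actual rules are not shown here. As a rough illustration only, a suppression budget can be modeled as a sliding window: a known-minor error may be logged and ignored a limited number of times, and once that budget is spent it must be escalated. `SuppressionBudget`, `try_suppress`, and the window parameters are hypothetical names chosen for this sketch.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict


class SuppressionBudget:
    """Hypothetical sliding-window budget: a known-minor error may be
    suppressed (logged and ignored) at most `max_suppressions` times per
    `window_seconds`. Once the budget is exhausted, it should be escalated."""

    def __init__(self, max_suppressions: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_suppressions = max_suppressions
        self.window_seconds = window_seconds
        self._clock = clock
        # Per-error-key timestamps of suppressions still inside the window.
        self._history: Dict[str, Deque[float]] = {}

    def try_suppress(self, error_key: str) -> bool:
        """Return True if this occurrence may be suppressed, or False if the
        budget is spent and the caller should escalate instead."""
        now = self._clock()
        recent = self._history.setdefault(error_key, deque())
        # Forget suppressions that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_suppressions:
            return False
        recent.append(now)
        return True
```

Take `SuppressionBudget(max_suppressions=3, window_seconds=600)` as an example. Inside a ten-minute window, the first three `try_suppress("svc-7:rate_limit")` calls return `True` and the fourth returns `False`, at which point the caller escalates. The injectable `clock` lets you test the window without sleeping.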

Quality: 94% (Does it follow best practices?)

Impact: 100%, 1.16x (average score across 9 eval scenarios)

Error Classification Module for a Monitoring Agent

Name: markusdowne/error-triage-ladder
Rating: 0.988 (1 review)
Author: markusdowne

Problem Description

A DevOps team runs an autonomous monitoring agent that polls 12 internal services every minute and takes corrective action when a service appears unhealthy. Some failures are well understood (e.g., a known rate-limit error code from a specific service). However, one class of errors keeps appearing with no clear cause: the error message is generic ("internal error"), the stack trace points into a third-party library's internals, and there is no historical record of this specific failure. The team is unsure whether these mystery errors are harmless blips or signs of a deeper problem.

Currently, the agent treats unknown errors the same as known minor errors — it logs them and moves on. This has caused two near-misses where an unknown error was actually a symptom of data corruption. The team wants a triage decision module that has a well-defined policy for what to do when an error cannot be definitively classified, so the agent behaves conservatively when in doubt.
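The problem description names three signals that make an error hard to classify: a generic message, a stack trace that ends in third-party code, and no historical record. Below is a minimal sketch of turning those signals into a conservative fallback. It assumes `error_info` carries hypothetical keys `message`, `stack_frames`, and `seen_before`. It also treats "any ambiguity means escalate" as one possible policy, not the team's defined one. The function names and the generic-message list are illustrative.

```python
from typing import List

# Hypothetical examples of messages that carry no diagnostic information.
GENERIC_MESSAGES = {"internal error", "unknown error", "unexpected error"}


def ambiguity_signals(error_info: dict) -> List[str]:
    """List the reasons an error cannot be confidently classified.

    Assumed (hypothetical) keys: "message" (str), "stack_frames" (list of
    file paths, innermost frame last), and "seen_before" (bool: whether this
    failure signature appears in historical records).
    """
    signals = []
    message = str(error_info.get("message", "")).strip().lower()
    if not message or message in GENERIC_MESSAGES:
        signals.append("generic_message")
    frames = error_info.get("stack_frames") or []
    # Crude assumption: an innermost frame under site-packages is third-party code.
    if frames and "site-packages" in frames[-1]:
        signals.append("third_party_origin")
    # A missing history flag counts as "never seen", which is the cautious reading.
    if not error_info.get("seen_before", False):
        signals.append("no_history")
    return signals


def fallback_action(error_info: dict) -> str:
    """Conservative default: never log-and-continue when in doubt."""
    return "escalate" if ambiguity_signals(error_info) else "classify_normally"
```

With `{"message": "internal error", "stack_frames": ["app/poller.py", "/usr/lib/python3/site-packages/vendorlib/core.py"], "seen_before": False}`, all three signals fire and `fallback_action` returns `"escalate"`. Escalating on any single signal means every first-seen error gets a human look. That trade-off is deliberate here, since silently logging unknown errors is what let the data-corruption near-misses through.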

Output Specification

Write a Python module triage_classifier.py that:

Classifies errors into severity tiers based on available evidence
Has a defined policy for errors that cannot be clearly classified
Includes a classify_error(error_info: dict) -> TriageDecision function
Is accompanied by a DESIGN.md file explaining the classification rules and the conservative fallback policy

Also write a classify_examples.py script that calls classify_error with at least 3 example inputs (one clearly benign, one clearly severe, one ambiguous/unknown) and prints the resulting triage decisions to stdout.
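The specification above can be sketched as follows. To keep the example self-contained, the module and the example driver are collapsed into one file; in the task they would be split into `triage_classifier.py` and `classify_examples.py`. Only `classify_error` and `TriageDecision` come from the spec. The tier names, `KNOWN_ERRORS`, `SEVERE_MARKERS`, `ACTIONS`, and the service names and codes are hypothetical. The rule order is one reasonable choice:

1. Evidence of data damage overrides everything else.
2. Exact known failures map to their recorded tier.
3. Anything else falls back to escalation.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple


class Severity(Enum):
    BENIGN = "benign"
    MINOR = "minor"
    SEVERE = "severe"
    UNCLASSIFIED = "unclassified"


@dataclass(frozen=True)
class TriageDecision:
    severity: Severity
    action: str  # one of "log", "retry", "escalate" in this sketch
    reasons: List[str] = field(default_factory=list)


# Hypothetical table of well-understood failures: (service, code) -> tier.
KNOWN_ERRORS: Dict[Tuple[str, str], Severity] = {
    ("billing-api", "RATE_LIMITED"): Severity.MINOR,
    ("health-probe", "TIMEOUT_ONCE"): Severity.BENIGN,
}

# Hypothetical message fragments that always indicate a serious problem.
SEVERE_MARKERS = ("checksum mismatch", "corrupt", "constraint violation")

ACTIONS = {
    Severity.BENIGN: "log",
    Severity.MINOR: "retry",
    Severity.SEVERE: "escalate",
    # Conservative fallback: unclassified errors are escalated, never just logged.
    Severity.UNCLASSIFIED: "escalate",
}


def classify_error(error_info: dict) -> TriageDecision:
    message = str(error_info.get("message", "")).lower()
    # 1. Data-integrity evidence overrides any other rule.
    for marker in SEVERE_MARKERS:
        if marker in message:
            return TriageDecision(Severity.SEVERE, ACTIONS[Severity.SEVERE],
                                  [f"severe marker: {marker!r}"])
    # 2. Exact match against well-understood failures.
    key = (error_info.get("service", ""), error_info.get("code", ""))
    if key in KNOWN_ERRORS:
        tier = KNOWN_ERRORS[key]
        return TriageDecision(tier, ACTIONS[tier], [f"known error {key}"])
    # 3. No rule matched: fail toward caution.
    return TriageDecision(Severity.UNCLASSIFIED, ACTIONS[Severity.UNCLASSIFIED],
                          ["no matching rule; conservative fallback"])


if __name__ == "__main__":
    examples = [
        {"service": "health-probe", "code": "TIMEOUT_ONCE", "message": "probe timed out"},
        {"service": "orders-db", "code": "E500", "message": "Checksum mismatch on page 42"},
        {"service": "search", "code": "E500", "message": "internal error"},
    ]
    for info in examples:
        decision = classify_error(info)
        print(f"{info['service']}: {decision.severity.value} -> "
              f"{decision.action} ({'; '.join(decision.reasons)})")
```

Run as a script, this prints one benign, one severe, and one ambiguous decision:

```
health-probe: benign -> log (known error ('health-probe', 'TIMEOUT_ONCE'))
orders-db: severe -> escalate (severe marker: 'checksum mismatch')
search: unclassified -> escalate (no matching rule; conservative fallback)
```

Keeping `UNCLASSIFIED` as its own tier, rather than folding it into `SEVERE`, lets DESIGN.md and downstream dashboards distinguish "known bad" from "unknown, treated as bad".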

Install with Tessl CLI

npx tessl i markusdowne/error-triage-ladder@0.1.3

evals/scenario-5/task.md
