Diagnose failed checks, flaky runs, API/tool errors, missing evidence, false-success risk, and unresolved incidents by classifying failure severity, applying bounded retries or suppression budgets, and deciding when to escalate. Use when troubleshooting jobs, pipelines, automations, integrations, agent/tool failures, timeouts, rate limits, stale read-backs, or any situation where you need a clear retry-vs-escalate decision instead of ad hoc recovery.
94%
Does it follow best practices?
Impact
Pending
No eval scenarios have been run
Passed
No known issues
Quality
Discovery
92%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong description that excels in specificity and completeness, clearly articulating both what the skill does and when to use it with an explicit 'Use when...' clause. The trigger terms are natural and comprehensive. The main weakness is the very broad scope, which could cause overlap with more specialized troubleshooting or debugging skills in a large skill library.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: 'Diagnose failed checks, flaky runs, API/tool errors, missing evidence, false-success risk, and unresolved incidents' along with specific methods like 'classifying failure severity, applying bounded retries or suppression budgets, and deciding when to escalate.' | 3 / 3 |
| Completeness | Clearly answers both 'what' (diagnose failures, classify severity, apply retries/suppression, decide escalation) and 'when' with an explicit 'Use when...' clause listing specific trigger scenarios like troubleshooting jobs, pipelines, automations, timeouts, rate limits, and retry-vs-escalate decisions. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural terms users would say: 'failed checks', 'flaky runs', 'API errors', 'tool errors', 'timeouts', 'rate limits', 'pipelines', 'automations', 'integrations', 'agent/tool failures', 'retry-vs-escalate'. These are terms users would naturally use when encountering these problems. | 3 / 3 |
| Distinctiveness Conflict Risk | While the description is detailed, the extremely broad scope ('jobs, pipelines, automations, integrations, agent/tool failures, timeouts, rate limits... any situation') could overlap with more specific pipeline, CI/CD, or API debugging skills. The phrase 'any situation where you need a clear retry-vs-escalate decision' is quite expansive and could conflict with narrower troubleshooting skills. | 2 / 3 |
| Total | | 11 / 12 Passed |
Implementation
92%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a high-quality skill that provides a clear, actionable decision framework for error triage. Its strengths are the well-defined tier system with concrete default actions, the suppression budget pattern with pseudocode, and strong guardrails including untrusted-content handling. The only weakness is that all content is inline, and at this length that is a minor concern.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The content is lean and efficient throughout. It avoids explaining what errors, APIs, or retries are (concepts Claude already knows). Every section earns its place: the workflow steps, tier definitions, suppression budget pattern, and examples are all additive information that Claude wouldn't have by default. | 3 / 3 |
| Actionability | The skill provides concrete, executable guidance: specific tier classifications with clear default actions (retry counts, escalation rules), a pseudocode suppression budget tracker, concrete tier examples with specific signals and actions, and a structured output format template. The guidance is specific enough to be directly applied. | 3 / 3 |
| Workflow Clarity | The 5-step workflow is clearly sequenced with explicit validation checkpoints. The guardrails section enforces ordering ('Do not advance from classification to action until the tier is explicit'). The suppression budget pattern includes a clear feedback loop (check thresholds → escalate or suppress → save state). Unknown/contradictory states have explicit escalation rules, preventing silent failures. | 3 / 3 |
| Progressive Disclosure | The content is well-organized with clear sections (workflow, trigger patterns, suppression budget, examples, output format, guardrails), but it's all inline in a single file. At ~100 lines this is borderline: the content doesn't desperately need splitting, but the concrete tier examples and suppression budget pattern could be referenced out to keep the main skill leaner. No external references are provided. | 2 / 3 |
| Total | | 11 / 12 Passed |
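For readers unfamiliar with the pattern, the feedback loop credited under Workflow Clarity (check thresholds, then escalate or suppress, then save state) might look roughly like the Python sketch below. This is an illustration only: the skill's own pseudocode is not reproduced on this page, and the class name `SuppressionBudget`, the `max_suppressed` threshold, and the `"suppress"`/`"escalate"` return values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SuppressionBudget:
    """Hypothetical tracker: tolerate a bounded number of repeats, then escalate."""

    max_suppressed: int = 3  # assumed threshold, not taken from the skill
    counts: Dict[str, int] = field(default_factory=dict)

    def record(self, failure_key: str) -> str:
        # Count this occurrence and save state so later calls see it.
        seen = self.counts.get(failure_key, 0) + 1
        self.counts[failure_key] = seen
        # Check the threshold: once the budget is spent, escalate.
        if seen > self.max_suppressed:
            return "escalate"
        return "suppress"


budget = SuppressionBudget(max_suppressed=2)
decisions = [budget.record("timeout:api") for _ in range(3)]
print(decisions)  # ['suppress', 'suppress', 'escalate']
```

Keeping the counter keyed by failure type means one noisy failure cannot exhaust the budget for unrelated ones; a real tracker would likely also persist the counts and expire them over time.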
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.