Explains test failures and provides actionable debugging guidance. Use when tests fail (unit, integration, E2E), builds fail, or code throws errors. Analyzes error messages, stack traces, and test output to identify root causes and suggest concrete fixes. Handles pytest, jest, junit, mocha, vitest, selenium, cypress, playwright, and other testing frameworks across Python, JavaScript/TypeScript, Java, Go, and other languages.
Overall score: 92%
Quality (does it follow best practices?): 88%
Impact: 1.25x (average score across 3 eval scenarios)
Status: Passed, no known issues
Discovery
100%. Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an excellent skill description that hits all the marks. It provides specific concrete actions, includes a comprehensive 'Use when...' clause with natural trigger terms, and clearly carves out a distinct niche around test failure debugging. The enumeration of specific frameworks and languages adds precision without being overly verbose.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple concrete actions: 'Explains test failures', 'provides actionable debugging guidance', 'Analyzes error messages, stack traces, and test output', 'identify root causes', 'suggest concrete fixes'. Also enumerates specific frameworks and languages. | 3 / 3 |
| Completeness | Clearly answers both what ('Explains test failures and provides actionable debugging guidance', 'Analyzes error messages...') AND when ('Use when tests fail (unit, integration, E2E), builds fail, or code throws errors') with explicit trigger guidance. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural terms users would say: 'tests fail', 'unit, integration, E2E', 'builds fail', 'errors', 'error messages', 'stack traces', plus specific framework names (pytest, jest, cypress, playwright) that users would naturally mention. | 3 / 3 |
| Distinctiveness / Conflict Risk | Clear niche focused specifically on test/build failures and debugging. The explicit mention of testing frameworks, test types, and error analysis creates distinct triggers that wouldn't overlap with general coding or documentation skills. | 3 / 3 |
| Total | | 12 / 12 (Passed) |
Implementation
85%. Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured skill with excellent actionability and workflow clarity. The multiple concrete examples across different frameworks (pytest, Jest, Cypress) with complete before/after code are particularly strong. The main weakness is some verbosity in introductory sections and generic best practices that don't add value for Claude.
Suggestions
- Remove or significantly condense the 'Core Capabilities' section: it describes rather than instructs, and Claude doesn't need this meta-explanation.
- Trim the 'Best Practices' section, which contains generic debugging advice Claude already knows (e.g., 'Read the full error', 'Verify the fix').
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is moderately verbose with some unnecessary explanations (e.g., listing 'Core Capabilities' that describe what the skill does rather than instructing). The framework-specific notes and quick reference table add value, but sections like 'Best Practices' contain generic advice Claude already knows. | 2 / 3 |
| Actionability | Provides fully executable code examples across multiple languages and frameworks. The error analysis examples show complete before/after code fixes, specific commands to run, and concrete debugging steps that are copy-paste ready. | 3 / 3 |
| Workflow Clarity | Clear 6-step workflow with explicit sequencing from gathering context through suggesting next steps. Each step has clear actions, and the verification steps are explicitly called out in examples. The workflow handles different scenarios (clear fix vs. needs more info vs. environmental). | 3 / 3 |
| Progressive Disclosure | Well-structured with clear sections progressing from workflow to examples to framework-specific notes. References to external files (error_patterns.md, debugging_strategies.md) are clearly signaled and one level deep. Content is appropriately organized for discovery. | 3 / 3 |
| Total | | 11 / 12 (Passed) |
Validation
90%. Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| skill_md_line_count | SKILL.md is long (606 lines); consider splitting into references/ and linking | Warning |
| Total | 10 / 11 Passed | |