Run e2e tests, fix flake and outdated tests, identify bugs against spec. Use when running e2e tests, debugging test failures, or fixing flaky tests. Never changes source code logic or API without spec backing.
92
96%
Does it follow best practices?
Impact
75%
1.22x. Average score across 3 eval scenarios
Passed
No known issues
Quality
Discovery
100%. Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong skill description that concisely covers specific actions, includes natural trigger terms, explicitly states both what and when, and carves out a distinct niche around e2e testing. The added behavioral constraint ('Never changes source code logic or API without spec backing') further clarifies scope and reduces conflict risk with general coding skills.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: 'Run e2e tests', 'fix flake and outdated tests', 'identify bugs against spec'. Also includes a behavioral constraint ('Never changes source code logic or API without spec backing'). | 3 / 3 |
| Completeness | Clearly answers both what ('Run e2e tests, fix flake and outdated tests, identify bugs against spec') and when ('Use when running e2e tests, debugging test failures, or fixing flaky tests'). Includes an explicit 'Use when...' clause. | 3 / 3 |
| Trigger Term Quality | Includes natural keywords users would say: 'e2e tests', 'test failures', 'flaky tests', 'debugging'. These are terms developers commonly use when dealing with end-to-end testing issues. | 3 / 3 |
| Distinctiveness / Conflict Risk | Clearly scoped to e2e testing specifically, with the constraint about not changing source code logic further distinguishing it from general coding or unit testing skills. The 'against spec' framing and 'flaky tests' focus create a distinct niche. | 3 / 3 |
| Total | | 12 / 12 (Passed) |
Implementation
92%. Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a high-quality skill that provides a clear, actionable framework for e2e test management. The failure taxonomy with explicit fix rules per category is excellent and genuinely adds knowledge Claude wouldn't have. The workflow is well-sequenced with appropriate validation, and the output template ensures consistent reporting. The one minor improvement would be better progressive disclosure of the detailed fix rules.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Every section earns its place. The failure taxonomy is dense but necessary: Claude wouldn't inherently know this project's classification system or fix rules. No padding, no explaining what e2e tests are, no unnecessary context. | 3 / 3 |
| Actionability | Provides concrete bash commands for running tests, specific patterns to use (getByRole, getByLabel, getByTestId), specific anti-patterns to avoid (waitForTimeout, retry loops), and a concrete output template. The fix rules are prescriptive and copy-paste applicable. | 3 / 3 |
| Workflow Clarity | Clear 5-step workflow with explicit sequencing (fix flaky first, then outdated, then bugs). Includes validation via re-run in Step 5, a structured failure table for categorization, and the bug fix path has a unit test gate before completion. The feedback loop is implicit but clear: fix → re-run → report. | 3 / 3 |
| Progressive Disclosure | The content is well-structured with clear sections (Principles vs Workflow), but everything is inline in a single file. The fix rules by category section is fairly lengthy and could benefit from being split out or referenced. However, for a skill of this size (~100 lines), it's borderline acceptable. | 2 / 3 |
| Total | | 11 / 12 (Passed) |
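
The section percentages appear to follow directly from the per-dimension scores: 12 / 12 rounds to 100% and 11 / 12 rounds to 92%. The sketch below reproduces that arithmetic. The function and variable names are illustrative, and the final step (averaging the two section percentages to get the 96% Quality figure) is an assumption that happens to match the numbers shown; the page does not state how that figure is computed.

```python
# Per-dimension scores copied from the two review tables; each is out of 3.
MAX_PER_DIMENSION = 3

discovery = {
    "Specificity": 3,
    "Completeness": 3,
    "Trigger Term Quality": 3,
    "Distinctiveness / Conflict Risk": 3,
}
implementation = {
    "Conciseness": 3,
    "Actionability": 3,
    "Workflow Clarity": 3,
    "Progressive Disclosure": 2,
}


def section_percent(scores: dict[str, int]) -> int:
    """Return the section total as a whole-number percentage of the maximum."""
    total = sum(scores.values())
    maximum = MAX_PER_DIMENSION * len(scores)
    return round(100 * total / maximum)


discovery_pct = section_percent(discovery)            # 12 / 12
implementation_pct = section_percent(implementation)  # 11 / 12

# Assumption: Quality is the mean of the two section percentages.
quality_pct = round((discovery_pct + implementation_pct) / 2)

print(discovery_pct, implementation_pct, quality_pct)
```

Under this reading, raising Progressive Disclosure from 2 to 3 would be the only change needed to bring the Implementation section to 12 / 12.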
Validation
100%. Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.