Parses and analyzes test output to identify failing tests, flaky tests, coverage gaps, defect patterns, and release readiness. Use when the user shares test results, CI/CD pipeline output, pytest/JUnit/Mocha logs, test coverage reports, or asks about test failures, pass rates, or quality metrics. Produces structured reports with root cause analysis, risk assessment, and prioritized recommendations for development teams and stakeholders.
**Overall score: 87**

- Quality: 85% (Does it follow best practices?)
- Impact: Pending (No eval scenarios have been run)
- Status: Passed (No known issues)
## Quality

### Discovery

**Score: 100%**

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an excellent skill description that hits all the marks. It provides specific concrete actions, includes a comprehensive 'Use when...' clause with natural trigger terms spanning multiple testing frameworks, and clearly distinguishes itself from related skills like code review or debugging. The description is well-structured, uses third person voice throughout, and balances detail with conciseness.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: parses test output; identifies failing tests, flaky tests, coverage gaps, and defect patterns; assesses release readiness; produces structured reports with root cause analysis, risk assessment, and prioritized recommendations. | 3 / 3 |
| Completeness | Clearly answers both "what" (parses test output, identifies failing/flaky tests and coverage gaps, produces structured reports with root cause analysis) and "when" (an explicit "Use when..." clause listing specific trigger scenarios such as sharing test results, CI/CD output, pytest/JUnit/Mocha logs, etc.). | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural trigger terms users would say: "test results", "CI/CD pipeline output", "pytest/JUnit/Mocha logs", "test coverage reports", "test failures", "pass rates", "quality metrics". These span multiple testing frameworks and common user phrasings. | 3 / 3 |
| Distinctiveness / Conflict Risk | Occupies a clear niche around test output analysis, with distinct triggers such as pytest/JUnit/Mocha logs, CI/CD pipeline output, test coverage reports, and quality metrics. Unlikely to conflict with general code review or debugging skills, given its specific focus on test result parsing and analysis. | 3 / 3 |
| **Total** | | **12 / 12 (Passed)** |
### Implementation

**Score: 70%**

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured skill with excellent workflow clarity and progressive disclosure. The release readiness thresholds table and incomplete data handling section are particularly strong, providing concrete decision criteria. The main weaknesses are that the code example delegates to external functions rather than being self-contained, and some sections could be slightly more concise.
#### Suggestions

- Include the actual implementation of `parse_junit_xml` inline (or at least a minimal working version) rather than only showing invocation of a function defined elsewhere, to improve actionability.
- Tighten the introductory sentence and the Step 4 bullets: the skill description already covers the purpose, and "Include all findings, assumptions, and data limitations" is generic guidance Claude already follows.
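The first suggestion could be satisfied with a short, self-contained sketch along these lines. Note this is an illustrative reconstruction, not the skill's actual code from EXAMPLES.md: the function name `parse_junit_xml` comes from the review, but the summary shape and attribute handling are assumptions.

```python
import xml.etree.ElementTree as ET

def parse_junit_xml(xml_text: str) -> dict:
    """Summarize pass/fail counts from JUnit-style XML.

    Minimal sketch: real JUnit reports may nest <testsuite> elements
    under a <testsuites> root, and attribute names can vary by producer.
    """
    root = ET.fromstring(xml_text)
    suites = [root] if root.tag == "testsuite" else root.iter("testsuite")
    summary = {"total": 0, "failures": 0, "errors": 0,
               "skipped": 0, "failed_tests": []}
    for suite in suites:
        for case in suite.iter("testcase"):
            summary["total"] += 1
            if case.find("failure") is not None:
                summary["failures"] += 1
                summary["failed_tests"].append(
                    f"{case.get('classname')}.{case.get('name')}")
            elif case.find("error") is not None:
                summary["errors"] += 1
            elif case.find("skipped") is not None:
                summary["skipped"] += 1
    summary["passed"] = (summary["total"] - summary["failures"]
                         - summary["errors"] - summary["skipped"])
    return summary
```

Even a version this small would make the skill's code example executable as-is rather than dependent on a file the agent may not have loaded.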
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Generally efficient, but includes some unnecessary elaboration. Phrases like "Parses raw test data into actionable insights" in the intro partially repeat the workflow. The statistical methods section (p-values, standard deviations) adds useful domain-specific guidance Claude wouldn't inherently know, but some bullet points could be tightened. | 2 / 3 |
| Actionability | The workflow steps are concrete and the thresholds table is highly actionable. However, the code example is not self-contained: it references functions defined in EXAMPLES.md rather than providing executable code. The guidance is more procedural than copy-paste ready, sitting between pseudocode and fully executable code. | 2 / 3 |
| Workflow Clarity | The four-step workflow is clearly sequenced with explicit validation checkpoints: Step 1 requires confirming data completeness before proceeding, Step 2 includes significance thresholds and explicit handling of insufficient data, and Step 3 produces a go/no-go decision with confidence levels. The "Handling Incomplete or Ambiguous Data" section provides clear feedback loops for edge cases. | 3 / 3 |
| Progressive Disclosure | Content is well structured with clear sections. References to TEMPLATES.md and EXAMPLES.md are one level deep and clearly signaled. The main skill file serves as an effective overview, keeping the right level of detail inline (thresholds, workflow) while deferring implementation details and report templates to separate files. | 3 / 3 |
| **Total** | | **10 / 12 (Passed)** |
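The flaky-test step praised above (significance thresholds plus explicit handling of insufficient data) can be illustrated with a minimal sketch. The function name, the thresholds, and the input shape here are assumptions for illustration, not values taken from the skill itself:

```python
from collections import defaultdict

def find_flaky_tests(runs, min_runs=5, flake_threshold=0.1):
    """Flag tests whose outcome varies across reruns of unchanged code.

    `runs` is a list of (test_name, passed: bool) observations from
    repeated executions of the same revision. A test is flagged when it
    has enough observations and its failure rate is at least
    `flake_threshold` but below 1.0, i.e. it both passes and fails.
    Thresholds are illustrative, not the skill's documented values.
    """
    history = defaultdict(list)
    for name, passed in runs:
        history[name].append(passed)
    flaky = {}
    for name, results in history.items():
        if len(results) < min_runs:
            continue  # insufficient data: treat as inconclusive, not flaky
        fail_rate = results.count(False) / len(results)
        if flake_threshold <= fail_rate < 1.0:
            flaky[name] = fail_rate
    return flaky
```

The `min_runs` guard mirrors the skill's "insufficient data" handling: a test seen only once or twice is reported as inconclusive rather than misclassified, while a test that always fails is a genuine failure, not a flake.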
### Validation

**Score: 90%**

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

**Validation: 10 / 11 Passed**

#### Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing them or moving them to `metadata` | Warning |
| **Total** | 10 / 11 Passed | |