Parses and analyzes test output to identify failing tests, flaky tests, coverage gaps, defect patterns, and release readiness. Use when the user shares test results, CI/CD pipeline output, pytest/JUnit/Mocha logs, test coverage reports, or asks about test failures, pass rates, or quality metrics. Produces structured reports with root cause analysis, risk assessment, and prioritized recommendations for development teams and stakeholders.
**Overall score: 87**

- Quality: 85% (Does it follow best practices?)
- Impact: Pending (No eval scenarios have been run)
- Status: Passed (No known issues)
## Quality

### Discovery

**Score: 100%**

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an excellent skill description that hits all the marks. It provides specific concrete actions, includes a comprehensive 'Use when...' clause with natural trigger terms spanning multiple testing frameworks, and clearly distinguishes itself from related skills like code review or debugging. The description is well-structured, uses third person voice throughout, and balances detail with conciseness.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: parses test output; identifies failing tests, flaky tests, coverage gaps, and defect patterns; assesses release readiness; produces structured reports with root cause analysis, risk assessment, and prioritized recommendations. | 3 / 3 |
| Completeness | Clearly answers both "what" (parses test output, identifies failing/flaky tests and coverage gaps, produces structured reports with root cause analysis) and "when" (an explicit "Use when..." clause listing specific trigger scenarios such as sharing test results, CI/CD output, pytest/JUnit/Mocha logs, etc.). | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural trigger terms users would say: "test results", "CI/CD pipeline output", "pytest/JUnit/Mocha logs", "test coverage reports", "test failures", "pass rates", "quality metrics". These span multiple testing frameworks and common user phrasings. | 3 / 3 |
| Distinctiveness / Conflict Risk | Occupies a clear niche around test output analysis, with distinct triggers such as pytest/JUnit/Mocha logs, CI/CD pipeline output, test coverage reports, and quality metrics. Unlikely to conflict with general code review or debugging skills, given its specific focus on test result parsing and analysis. | 3 / 3 |
| **Total** | | **12 / 12 (Passed)** |
### Implementation

**Score: 70%**

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured skill with excellent workflow clarity and progressive disclosure. The release readiness thresholds table and incomplete data handling section are particularly strong, providing concrete decision criteria. The main weaknesses are that the code example delegates to external functions rather than being self-contained, and some sections could be slightly more concise.
#### Suggestions

- Include the actual implementation of `parse_junit_xml` inline (or at least a minimal working version) rather than only showing invocation of a function defined elsewhere, to improve actionability.
- Tighten the introductory sentence and the Step 4 bullets: the skill description already covers the purpose, and "Include all findings, assumptions, and data limitations" is generic guidance Claude already follows.
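The first suggestion could be satisfied with a short, self-contained sketch along these lines. Note this is an illustrative reconstruction, not the skill's actual code from EXAMPLES.md: the function name `parse_junit_xml` comes from the review, but the summary shape and attribute handling are assumptions.

```python
import xml.etree.ElementTree as ET

def parse_junit_xml(xml_text: str) -> dict:
    """Summarize pass/fail counts from JUnit-style XML.

    Minimal sketch: real JUnit reports may nest <testsuite> elements
    under a <testsuites> root, and attribute names can vary by producer.
    """
    root = ET.fromstring(xml_text)
    suites = [root] if root.tag == "testsuite" else root.iter("testsuite")
    summary = {"total": 0, "failures": 0, "errors": 0,
               "skipped": 0, "failed_tests": []}
    for suite in suites:
        for case in suite.iter("testcase"):
            summary["total"] += 1
            if case.find("failure") is not None:
                summary["failures"] += 1
                summary["failed_tests"].append(
                    f"{case.get('classname')}.{case.get('name')}")
            elif case.find("error") is not None:
                summary["errors"] += 1
            elif case.find("skipped") is not None:
                summary["skipped"] += 1
    summary["passed"] = (summary["total"] - summary["failures"]
                         - summary["errors"] - summary["skipped"])
    return summary
```

Even a version this small would make the skill's code example executable as-is rather than dependent on a file the agent may not have loaded.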
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Generally efficient, but includes some unnecessary elaboration. Phrases like "Parses raw test data into actionable insights" in the intro partially repeat the workflow. The statistical methods section (p-values, standard deviations) adds useful domain-specific guidance Claude wouldn't inherently know, but some bullet points could be tightened. | 2 / 3 |
| Actionability | The workflow steps are concrete and the thresholds table is highly actionable. However, the code example is not self-contained: it references functions defined in EXAMPLES.md rather than providing executable code. The guidance is more procedural than copy-paste ready, sitting between pseudocode and fully executable code. | 2 / 3 |
| Workflow Clarity | The four-step workflow is clearly sequenced with explicit validation checkpoints: Step 1 requires confirming data completeness before proceeding, Step 2 includes significance thresholds and explicit handling of insufficient data, and Step 3 produces a go/no-go decision with confidence levels. The "Handling Incomplete or Ambiguous Data" section provides clear feedback loops for edge cases. | 3 / 3 |
| Progressive Disclosure | Content is well structured with clear sections. References to TEMPLATES.md and EXAMPLES.md are one level deep and clearly signaled. The main skill file serves as an effective overview, keeping the right level of detail inline (thresholds, workflow) while deferring implementation details and report templates to separate files. | 3 / 3 |
| **Total** | | **10 / 12 (Passed)** |
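The flaky-test step praised above (significance thresholds plus explicit handling of insufficient data) can be illustrated with a minimal sketch. The function name, the thresholds, and the input shape here are assumptions for illustration, not values taken from the skill itself:

```python
from collections import defaultdict

def find_flaky_tests(runs, min_runs=5, flake_threshold=0.1):
    """Flag tests whose outcome varies across reruns of unchanged code.

    `runs` is a list of (test_name, passed: bool) observations from
    repeated executions of the same revision. A test is flagged when it
    has enough observations and its failure rate is at least
    `flake_threshold` but below 1.0, i.e. it both passes and fails.
    Thresholds are illustrative, not the skill's documented values.
    """
    history = defaultdict(list)
    for name, passed in runs:
        history[name].append(passed)
    flaky = {}
    for name, results in history.items():
        if len(results) < min_runs:
            continue  # insufficient data: treat as inconclusive, not flaky
        fail_rate = results.count(False) / len(results)
        if flake_threshold <= fail_rate < 1.0:
            flaky[name] = fail_rate
    return flaky
```

The `min_runs` guard mirrors the skill's "insufficient data" handling: a test seen only once or twice is reported as inconclusive rather than misclassified, while a test that always fails is a genuine failure, not a flake.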
### Validation

**Score: 90%**

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

**Validation: 10 / 11 Passed**

#### Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing them or moving them to `metadata` | Warning |
| **Total** | 10 / 11 Passed | |