
test-results-analyzer

Parses and analyzes test output to identify failing tests, flaky tests, coverage gaps, defect patterns, and release readiness. Use when the user shares test results, CI/CD pipeline output, pytest/JUnit/Mocha logs, test coverage reports, or asks about test failures, pass rates, or quality metrics. Produces structured reports with root cause analysis, risk assessment, and prioritized recommendations for development teams and stakeholders.
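To illustrate the kind of parsing the description promises, a minimal sketch of extracting counts from a pytest summary line might look like this (a hypothetical example, not the skill's actual code):

```python
import re

def parse_pytest_summary(line: str) -> dict:
    """Extract result counts from a pytest summary line, e.g.
    '== 2 failed, 10 passed, 1 skipped in 0.34s =='."""
    counts = {
        status: int(n)
        for n, status in re.findall(
            r"(\d+) (passed|failed|skipped|errors?|xfailed|xpassed)", line
        )
    }
    total = sum(counts.values())
    pass_rate = counts.get("passed", 0) / total if total else 0.0
    return {"counts": counts, "total": total, "pass_rate": pass_rate}

result = parse_pytest_summary("== 2 failed, 10 passed, 1 skipped in 0.34s ==")
```

A real implementation would also handle JUnit XML and Mocha reporters, but the shape of the output (counts plus a derived pass rate) is the same idea.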

Overall score: 87

- Quality: 85% (Does it follow best practices?)
- Impact: Pending (No eval scenarios have been run)
- Security (by Snyk): Passed (No known issues)


Quality

Discovery

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is an excellent skill description that hits all the marks. It provides specific concrete actions, includes a comprehensive 'Use when...' clause with natural trigger terms spanning multiple testing frameworks, and clearly distinguishes itself from related skills like code review or debugging. The description is well-structured, uses third person voice throughout, and balances detail with conciseness.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Lists multiple specific concrete actions: parses test output, identifies failing tests, flaky tests, coverage gaps, defect patterns, release readiness assessment, produces structured reports with root cause analysis, risk assessment, and prioritized recommendations. | 3 / 3 |
| Completeness | Clearly answers both 'what' (parses test output, identifies failing/flaky tests, coverage gaps, produces structured reports with root cause analysis) and 'when' (explicit 'Use when...' clause listing specific trigger scenarios like sharing test results, CI/CD output, pytest/JUnit/Mocha logs, etc.). | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural trigger terms users would say: 'test results', 'CI/CD pipeline output', 'pytest/JUnit/Mocha logs', 'test coverage reports', 'test failures', 'pass rates', 'quality metrics'. These span multiple testing frameworks and common user phrasings. | 3 / 3 |
| Distinctiveness / Conflict Risk | Occupies a clear niche around test output analysis with distinct triggers like pytest/JUnit/Mocha logs, CI/CD pipeline output, test coverage reports, and quality metrics. Unlikely to conflict with general code review or debugging skills due to the specific focus on test result parsing and analysis. | 3 / 3 |
| Total | | 12 / 12 |

Passed

Implementation

70%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-structured skill with excellent workflow clarity and progressive disclosure. The release readiness thresholds table and incomplete data handling section are particularly strong, providing concrete decision criteria. The main weaknesses are that the code example delegates to external functions rather than being self-contained, and some sections could be slightly more concise.
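The release readiness thresholds table itself is not reproduced on this page, but a decision rule of the kind it encodes might look like the following sketch (the cutoff values and tier names here are invented for illustration, not the skill's actual criteria):

```python
# Illustrative go/no-go rule; all threshold values are hypothetical.
THRESHOLDS = {"pass_rate": 0.98, "coverage": 0.80, "max_new_flaky": 2}

def release_readiness(pass_rate: float, coverage: float, new_flaky: int) -> str:
    """Map test metrics to a go/no-go decision with a conditional middle tier."""
    if (pass_rate >= THRESHOLDS["pass_rate"]
            and coverage >= THRESHOLDS["coverage"]
            and new_flaky <= THRESHOLDS["max_new_flaky"]):
        return "go"
    if pass_rate >= 0.95 and coverage >= 0.70:
        return "go-with-risk"  # ship, but document the residual risk
    return "no-go"
```

The value of putting such criteria in a table, as the review notes, is that the agent's recommendation becomes reproducible rather than a judgment call.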

Suggestions

Include the actual implementation of parse_junit_xml inline (or at least a minimal working version) rather than only showing invocation of a function defined elsewhere, to improve actionability.

Tighten the introductory sentence and Step 4 bullets—the skill description already covers the purpose, and 'Include all findings, assumptions, and data limitations' is generic guidance Claude already follows.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | Generally efficient but includes some unnecessary elaboration. Phrases like 'Parses raw test data into actionable insights' in the intro partially repeat the workflow. The statistical methods section (p-values, standard deviations) adds useful domain-specific guidance Claude wouldn't inherently know, but some bullet points could be tightened. | 2 / 3 |
| Actionability | The workflow steps are concrete and the thresholds table is highly actionable. However, the code example is not self-contained; it references functions defined in EXAMPLES.md rather than providing executable code. The guidance is more procedural than copy-paste ready, sitting between pseudocode and fully executable. | 2 / 3 |
| Workflow Clarity | The four-step workflow is clearly sequenced with explicit validation checkpoints: Step 1 requires confirming data completeness before proceeding, Step 2 includes significance thresholds and explicit handling of insufficient data, and Step 3 produces a go/no-go with confidence levels. The 'Handling Incomplete or Ambiguous Data' section provides clear feedback loops for edge cases. | 3 / 3 |
| Progressive Disclosure | Content is well-structured with clear sections. References to TEMPLATES.md and EXAMPLES.md are one level deep and clearly signaled. The main skill file serves as an effective overview with the right level of detail inline (thresholds, workflow) while deferring implementation details and report templates to separate files. | 3 / 3 |
| Total | | 10 / 12 |

Passed
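The statistical guidance the Conciseness row credits (p-values and standard deviations for separating genuinely flaky tests from noise) could be sketched along these lines; this is an illustrative heuristic using the standard error of a proportion, not the skill's actual method:

```python
import math

def is_flaky(outcomes: list[bool], min_runs: int = 5) -> bool:
    """Flag a test as flaky when, across reruns of the same code, its pass
    proportion is confidently neither 0 nor 1 (roughly a 2-standard-error
    band away from both extremes)."""
    n = len(outcomes)
    if n < min_runs:
        return False  # insufficient data: mirror the skill's explicit handling
    p = sum(outcomes) / n  # observed pass proportion
    if p in (0.0, 1.0):
        return False  # consistently passing or failing is not flaky
    se = math.sqrt(p * (1 - p) / n)  # standard error of the proportion
    return p - 2 * se > 0 and p + 2 * se < 1
```

Note that under this rule a single failure in ten runs is not flagged, since it is not statistically distinguishable from a consistent pass; that conservatism is exactly why the review calls this guidance something "Claude wouldn't inherently know".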

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 Passed

Validation for skill structure

| Criteria | Description | Result |
| --- | --- | --- |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 10 / 11 |

Passed

Repository: OpenRoster-ai/awesome-agents (Reviewed)
