flaky-test-detector

Identifies non-deterministic or unreliable tests through static code analysis and test result analysis. Use when Claude needs to find flaky tests, analyze test reliability, or investigate intermittent test failures. Supports Python (pytest, unittest) and Java (JUnit, TestNG) test frameworks. Trigger when users mention "flaky tests", "intermittent failures", "non-deterministic tests", "unreliable tests", or ask to "find flaky tests", "analyze test stability", or "why tests fail randomly".
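The "test result analysis" half of the description can be illustrated with a minimal sketch. It assumes run history is available as (test name, outcome) pairs gathered from repeated runs of the same commit. The function name `find_flaky_tests` and the record shape are hypothetical and are not taken from the skill itself.

```python
from collections import defaultdict


def find_flaky_tests(runs):
    """Return names of tests that both passed and failed across runs.

    `runs` is a hypothetical list of (test_name, outcome) pairs collected
    from repeated runs of the same code; outcome is "passed" or "failed".
    """
    outcomes = defaultdict(set)
    for name, outcome in runs:
        outcomes[name].add(outcome)
    # A test with mixed outcomes on unchanged code is a flakiness candidate.
    return sorted(
        name for name, seen in outcomes.items() if {"passed", "failed"} <= seen
    )


runs = [
    ("test_login", "passed"),
    ("test_login", "failed"),
    ("test_math", "passed"),
    ("test_math", "passed"),
    ("test_db", "failed"),
    ("test_db", "failed"),
]
print(find_flaky_tests(runs))  # ['test_login']
```

A test that fails every time (`test_db` here) is consistently broken rather than flaky, so it is deliberately excluded.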

Quality

92%

Does it follow best practices?

Impact

97%

1.04x

Average score across 3 eval scenarios

Security

Passed

No known issues

Quality

Discovery

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is an excellent skill description that hits all the marks. It provides specific capabilities (static analysis, test result analysis), explicitly names supported frameworks, includes a comprehensive 'Use when...' clause, and lists natural trigger terms that developers would actually use. The description is well-structured, uses third person voice correctly, and creates a clear, distinct niche.

Dimension	Reasoning	Score
Specificity	Lists multiple specific concrete actions: 'static code analysis', 'test result analysis', 'find flaky tests', 'analyze test reliability', 'investigate intermittent test failures'. Also specifies supported frameworks (pytest, unittest, JUnit, TestNG).	3 / 3
Completeness	Clearly answers both what (identifies non-deterministic tests through static code analysis and test result analysis) AND when (explicit 'Use when...' clause plus 'Trigger when...' with specific phrases). Both sections are explicit and detailed.	3 / 3
Trigger Term Quality	Excellent coverage of natural terms users would say: 'flaky tests', 'intermittent failures', 'non-deterministic tests', 'unreliable tests', 'find flaky tests', 'analyze test stability', 'why tests fail randomly'. These are exactly what developers would naturally say.	3 / 3
Distinctiveness Conflict Risk	Very clear niche focused specifically on flaky/unreliable test detection. The combination of 'flaky tests', specific test frameworks, and 'intermittent failures' creates a distinct trigger profile unlikely to conflict with general testing or code analysis skills.	3 / 3
	Total	12 / 12 Passed

Implementation

85%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-structured skill with excellent actionability and workflow clarity. The content provides concrete detection methods, specific patterns to look for, and clear remediation guidance. The main weakness is verbosity in explaining concepts Claude already understands, particularly in the 'What Makes Tests Flaky' section, which could be condensed to a simple reference list.

Suggestions

Condense the 'What Makes Tests Flaky' section to a brief bullet list without explanations, since Claude already understands these concepts.

Reduce the framework-specific guidance sections to just the unique patterns and fixes, removing general best practices Claude already knows.

Dimension	Reasoning	Score
Conciseness	The content is reasonably efficient but includes some explanatory sections that Claude would already know (e.g., 'What Makes Tests Flaky' lists concepts Claude understands). The framework-specific guidance sections could be more condensed.	2 / 3
Actionability	Provides concrete, executable guidance including specific script usage with command examples, JSON input format, specific patterns to search for with actual code snippets (time.sleep(), requests.get()), and structured report format examples.	3 / 3
Workflow Clarity	Clear 5-step workflow with explicit decision points (choose detection method based on context), structured process for each method, and clear output format. The workflow includes validation through reporting and remediation steps.	3 / 3
Progressive Disclosure	Well-organized with clear sections, appropriate references to external files (flaky-patterns.md, remediation-strategies.md) that are one level deep and clearly signaled. Quick Start provides immediate guidance while detailed sections follow.	3 / 3
	Total	11 / 12 Passed

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Repository: ArabelaTso/Skills-4-SE
Commit: 0f00a4f

Reviewed: about 2 months ago
