Identifies non-deterministic or unreliable tests through static code analysis and test result analysis. Use when Claude needs to find flaky tests, analyze test reliability, or investigate intermittent test failures. Supports Python (pytest, unittest) and Java (JUnit, TestNG) test frameworks. Trigger when users mention "flaky tests", "intermittent failures", "non-deterministic tests", "unreliable tests", or ask to "find flaky tests", "analyze test stability", or "why tests fail randomly".
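
The "test result analysis" half of that description can be illustrated with a minimal sketch. It assumes a hypothetical input of `(test_id, outcome)` pairs collected from repeated runs of the same code revision; the skill's real input format is not shown on this page, and `find_flaky` is an illustrative name, not part of the skill.

```python
from collections import defaultdict


def find_flaky(runs):
    """Return {test_id: failure_rate} for tests with mixed outcomes.

    `runs` is an assumed format: an iterable of (test_id, outcome) pairs,
    where outcome is "pass" or "fail" and all runs use the same code revision.
    """
    outcomes = defaultdict(list)
    for test_id, outcome in runs:
        outcomes[test_id].append(outcome)

    flaky = {}
    for test_id, results in outcomes.items():
        # A test that both passed and failed on unchanged code is suspect.
        if len(set(results)) > 1:
            flaky[test_id] = results.count("fail") / len(results)
    return flaky
```

A test that always fails is treated here as broken rather than flaky, so only tests with mixed outcomes are reported.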
Install with Tessl CLI
npx tessl i github:ArabelaTso/Skills-4-SE --skill flaky-test-detector
94
Does it follow best practices?
Validation for skill structure
Discovery
100%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an excellent skill description that hits all the marks. It provides specific capabilities (static analysis, test result analysis), explicitly names supported frameworks, includes a comprehensive 'Use when...' clause, and lists natural trigger terms that developers would actually use. The description is well-structured, uses third person voice correctly, and creates a clear, distinct niche.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: 'static code analysis', 'test result analysis', 'find flaky tests', 'analyze test reliability', 'investigate intermittent test failures'. Also specifies supported frameworks (pytest, unittest, JUnit, TestNG). | 3 / 3 |
| Completeness | Clearly answers both what (identifies non-deterministic tests through static code analysis and test result analysis) AND when (explicit 'Use when...' clause plus 'Trigger when...' with specific phrases). Both sections are explicit and detailed. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural terms users would say: 'flaky tests', 'intermittent failures', 'non-deterministic tests', 'unreliable tests', 'find flaky tests', 'analyze test stability', 'why tests fail randomly'. These are exactly what developers would naturally say. | 3 / 3 |
| Distinctiveness / Conflict Risk | Very clear niche focused specifically on flaky/unreliable test detection. The combination of 'flaky tests', specific test frameworks, and 'intermittent failures' creates a distinct trigger profile unlikely to conflict with general testing or code analysis skills. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
85%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured skill with excellent actionability and workflow clarity. The content provides concrete detection methods, specific patterns to look for, and clear remediation guidance. The main weakness is some verbosity in explaining concepts Claude already understands, particularly in the 'What Makes Tests Flaky' section which could be condensed to a simple reference list.
Suggestions
Condense the 'What Makes Tests Flaky' section to a brief bullet list without explanations, since Claude already understands these concepts.
Reduce the framework-specific guidance sections to the unique patterns and fixes, removing general best practices Claude already knows.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The content is reasonably efficient but includes some explanatory sections that Claude would already know (e.g., 'What Makes Tests Flaky' lists concepts Claude understands). The framework-specific guidance sections could be more condensed. | 2 / 3 |
| Actionability | Provides concrete, executable guidance including specific script usage with command examples, JSON input format, specific patterns to search for with actual code snippets (time.sleep(), requests.get()), and structured report format examples. | 3 / 3 |
| Workflow Clarity | Clear 5-step workflow with explicit decision points (choose detection method based on context), structured process for each method, and clear output format. The workflow includes validation through reporting and remediation steps. | 3 / 3 |
| Progressive Disclosure | Well-organized with clear sections, appropriate references to external files (flaky-patterns.md, remediation-strategies.md) that are one level deep and clearly signaled. Quick Start provides immediate guidance while detailed sections follow. | 3 / 3 |
| Total | | 11 / 12 Passed |
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
No warnings or errors.