Flaky Test Detector - Auto-activating skill for Test Automation. Triggers on: flaky test detector, flaky test detector Part of the Test Automation skill category.
Does it follow best practices? 3%
Impact: 91% (0.98x average score across 3 eval scenarios)
Passed: no known issues.
Optimize this skill with Tessl:
npx tessl skill review --optimize ./planned-skills/generated/09-test-automation/flaky-test-detector/SKILL.md
Quality
Discovery
Score: 7%. Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This description is essentially a label with no substantive content. It names the skill and its category but provides zero information about what concrete actions it performs, what inputs it expects, or when Claude should select it. The trigger terms are redundant and miss the many natural ways users might describe flaky test issues.
Suggestions
Add concrete action verbs describing what the skill does, e.g., 'Detects intermittently failing tests by analyzing test run history, identifies root causes of non-deterministic behavior, and suggests fixes or quarantine strategies.'
Add an explicit 'Use when...' clause with diverse trigger scenarios, e.g., 'Use when the user mentions flaky tests, intermittent failures, non-deterministic test results, randomly failing CI builds, or unreliable test suites.'
Remove the duplicated trigger term and expand with natural keyword variations users would actually say, such as 'test flakiness', 'unstable tests', 'tests passing sometimes', 'CI failures', or 'randomly broken tests'.
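The first suggestion mentions analyzing test run history. As a minimal sketch, assuming CI results are available as (test id, commit, pass/fail) records, one common heuristic flags a test as flaky when it both passed and failed on the same commit, since no code change can explain the difference. The `RunRecord` type and `find_flaky_tests` function are hypothetical illustrations, not part of the skill or of Tessl.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    """One recorded outcome of a single test in a CI run (hypothetical record shape)."""
    test_id: str
    commit: str
    passed: bool


def find_flaky_tests(history: list[RunRecord]) -> dict[str, list[str]]:
    """Map each flaky test id to the commits on which it both passed and failed."""
    # Collect the set of outcomes seen for every (test, commit) pair.
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for run in history:
        outcomes[(run.test_id, run.commit)].add(run.passed)

    flaky: dict[str, list[str]] = defaultdict(list)
    for (test_id, commit), seen in outcomes.items():
        if len(seen) == 2:  # both True and False observed on the same commit
            flaky[test_id].append(commit)
    return {test_id: sorted(commits) for test_id, commits in flaky.items()}


history = [
    RunRecord("test_login", "abc123", True),
    RunRecord("test_login", "abc123", False),   # same commit, different outcome
    RunRecord("test_checkout", "abc123", False),
    RunRecord("test_checkout", "def456", True),  # different commit: not evidence of flakiness
]
print(find_flaky_tests(history))  # {'test_login': ['abc123']}
```

Grouping by commit keeps a genuine fix (fail on one commit, pass on the next) from being misreported as flakiness.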
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description names the domain ('Test Automation') and the concept ('Flaky Test Detector') but provides no concrete actions. There is no indication of what the skill actually does: no verbs describing capabilities like 'identifies', 'reruns', 'analyzes', or 'reports'. | 1 / 3 |
| Completeness | The description fails to answer 'what does this do' beyond naming itself, and the 'when' clause is essentially absent: there is no explicit 'Use when...' guidance, only a redundant trigger phrase. Both dimensions are very weak. | 1 / 3 |
| Trigger Term Quality | The only trigger term is 'flaky test detector', listed twice. There are no natural variations a user might say, such as 'intermittent test failures', 'unreliable tests', 'test flakiness', 'non-deterministic tests', or 'randomly failing tests'. | 1 / 3 |
| Distinctiveness Conflict Risk | The term 'flaky test' is somewhat niche and unlikely to conflict with many other skills, giving it some distinctiveness. However, the lack of specific actions or clear scope means it could still overlap with general test analysis or debugging skills. | 2 / 3 |
| Total | | 5 / 12 Passed |
Implementation
Score: 0%. Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is essentially a placeholder with no substantive content. It repeatedly references 'flaky test detector' without ever explaining what flaky tests are, how to detect them, or what to do about them. There is no actionable guidance, no code, no workflow, and no references to deeper materials—it is entirely meta-description.
Suggestions
Add concrete, executable code examples showing how to detect flaky tests (e.g., running a test suite N times, parsing results for inconsistent pass/fail patterns, using pytest-repeat's --count option or repeated Jest runs).
Define a clear multi-step workflow: identify flaky tests → diagnose root causes (timing, shared state, external dependencies) → apply fixes → validate stability, with explicit validation checkpoints.
Remove all meta-description sections (Purpose, When to Use, Capabilities, Example Triggers) and replace with actual technical content—specific patterns, commands, and strategies for flaky test detection and remediation.
Add references to detailed guides for specific frameworks (e.g., a Jest flaky test guide, a pytest flaky test guide) to provide progressive disclosure.
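The first suggestion describes running a suite N times and parsing the results for inconsistent pass/fail patterns. A minimal sketch, assuming each run wrote a JUnit-style XML report (pytest can produce one with --junitxml=PATH): it reads every report and returns the tests that passed in some runs and failed in others. The function names are hypothetical, and skipped tests are deliberately ignored.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from collections import defaultdict


def outcomes_from_junit(xml_text: str) -> dict[str, bool]:
    """Map 'classname::name' to True (passed) or False (failure/error) for one report."""
    root = ET.fromstring(xml_text)
    results: dict[str, bool] = {}
    for case in root.iter("testcase"):
        if case.find("skipped") is not None:
            continue  # a skipped test says nothing about stability
        test_id = f"{case.get('classname', '')}::{case.get('name', '')}"
        failed = case.find("failure") is not None or case.find("error") is not None
        results[test_id] = not failed
    return results


def flaky_across_runs(reports: list[str]) -> list[str]:
    """Return sorted ids of tests whose outcome differed between runs."""
    seen: dict[str, set[bool]] = defaultdict(set)
    for xml_text in reports:
        for test_id, passed in outcomes_from_junit(xml_text).items():
            seen[test_id].add(passed)
    return sorted(test_id for test_id, outcomes in seen.items() if len(outcomes) == 2)


run1 = """<testsuites><testsuite>
  <testcase classname="tests.test_api" name="test_get"/>
  <testcase classname="tests.test_api" name="test_post"/>
</testsuite></testsuites>"""
run2 = """<testsuites><testsuite>
  <testcase classname="tests.test_api" name="test_get"/>
  <testcase classname="tests.test_api" name="test_post"><failure message="timeout"/></testcase>
</testsuite></testsuites>"""
print(flaky_across_runs([run1, run2]))  # ['tests.test_api::test_post']
```

The loop that produces the reports (for example, invoking pytest once per run with a distinct --junitxml path) is left out to keep the sketch self-contained.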
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The content is entirely filler and meta-description. It explains what the skill does in abstract terms without providing any actual technical content. Every section restates the same vague idea, that this skill helps with 'flaky test detector', without adding substance. | 1 / 3 |
| Actionability | There is zero concrete guidance: no code, no commands, no specific techniques for detecting or handling flaky tests. The content describes rather than instructs, offering only vague promises like 'provides step-by-step guidance' without actually delivering any. | 1 / 3 |
| Workflow Clarity | No workflow, steps, or process is defined. There are no sequences, no validation checkpoints, and no actual procedure for detecting or resolving flaky tests. | 1 / 3 |
| Progressive Disclosure | The content is a monolithic block of meta-description with no references to detailed materials, no links to examples or advanced guides, and no meaningful structural organization of actual content. | 1 / 3 |
| Total | | 4 / 12 Passed |
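The Workflow Clarity row notes the absence of validation checkpoints. One way to make a "validate stability" step concrete (an assumption, not something the skill or this review prescribes) is to decide how many consecutive passing reruns to require after a fix. If a test fails independently with probability p per run, it passes N runs in a row with probability (1 - p)^N, so catching it with confidence c needs N >= ln(1 - c) / ln(1 - p). The function below is a hypothetical helper that computes that bound.

```python
import math


def reruns_needed(failure_rate: float, confidence: float = 0.95) -> int:
    """Smallest N such that a test failing independently with probability
    `failure_rate` per run fails at least once in N runs with probability
    at least `confidence`, i.e. 1 - (1 - p)**N >= c.
    """
    if not 0 < failure_rate < 1:
        raise ValueError("failure_rate must be in (0, 1)")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be in (0, 1)")
    return math.ceil(math.log(1 - confidence) / math.log(1 - failure_rate))


print(reruns_needed(0.05))  # 59
print(reruns_needed(0.01))  # 299
```

So a fix for a test that used to fail about 5% of the time would need roughly 59 clean reruns before the checkpoint passes at 95% confidence. The independence assumption is the weak point: failures caused by shared state or test ordering are often correlated across runs, so reruns complement root-cause diagnosis rather than replace it.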
Validation
Score: 81%. Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation: 9 / 11 passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 9 / 11 Passed | |