Flaky Test Detector

Skill description under review: "Auto-activating skill for Test Automation. Triggers on: flaky test detector, flaky test detector. Part of the Test Automation skill category."
Impact: 91% (0.98x average score across 3 eval scenarios). Passed; no known issues.
Optimize this skill with Tessl:

```
npx tessl skill review --optimize ./planned-skills/generated/09-test-automation/flaky-test-detector/SKILL.md
```

Quality
Discovery: 7%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This description is essentially a label with no substance. It names the skill and its category but provides zero information about what concrete actions it performs, what inputs it expects, or when Claude should select it. The duplicated trigger term and absence of natural keyword variations make it very weak for skill selection.
Suggestions
- Add concrete actions describing what the skill does, e.g., 'Detects flaky tests by analyzing test run history, identifies non-deterministic failures, and suggests fixes for intermittent test issues.'
- Add an explicit 'Use when...' clause with trigger scenarios, e.g., 'Use when the user mentions flaky tests, intermittent failures, non-deterministic test results, or unreliable test suites.'
- Expand trigger terms to include natural variations users would say: 'flaky tests', 'intermittent test failures', 'unreliable tests', 'test instability', 'non-deterministic tests', 'randomly failing tests'.
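Putting these suggestions together, a revised description could look like the frontmatter sketch below. This is illustrative only: the field names follow the common SKILL.md convention, and the exact schema for this skill is an assumption.

```yaml
# Hypothetical SKILL.md frontmatter sketch (field names assumed, adjust to the actual spec)
name: flaky-test-detector
description: >
  Detects flaky tests by analyzing test run history, identifies
  non-deterministic failures, and suggests fixes for intermittent test
  issues. Use when the user mentions flaky tests, intermittent failures,
  randomly failing tests, non-deterministic test results, test instability,
  or unreliable test suites.
```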
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description names the domain ('Test Automation') and the concept ('Flaky Test Detector') but provides no concrete actions. There is no indication of what the skill actually does: no verbs describing capabilities like 'identifies', 'reruns', 'analyzes', or 'reports'. | 1 / 3 |
| Completeness | The description fails to answer both 'what does this do' and 'when should Claude use it'. There is no explanation of capabilities and no explicit 'Use when...' clause with meaningful trigger guidance. | 1 / 3 |
| Trigger Term Quality | The trigger terms listed are just 'flaky test detector' repeated twice. There are no natural variations a user might say, such as 'flaky tests', 'intermittent test failures', 'unreliable tests', 'test instability', or 'non-deterministic tests'. | 1 / 3 |
| Distinctiveness / Conflict Risk | The term 'flaky test' is somewhat specific to a niche area, which provides some distinctiveness. However, the lack of concrete actions or clear scope means it could overlap with other test-related skills. | 2 / 3 |
| Total | | 5 / 12 (Passed) |
Implementation: 0%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is essentially a placeholder with no substantive content. It repeatedly references 'flaky test detector' without ever defining what flaky tests are, how to detect them, or what to do about them. There is no actionable guidance, no code, no examples, and no workflow—just self-referential boilerplate.
Suggestions
- Add concrete, executable code examples for detecting flaky tests (e.g., running a test suite N times with the pytest-repeat plugin's --count option or a loop around the Jest CLI, then parsing results for inconsistencies).
- Define a clear workflow: 1) Identify candidate flaky tests, 2) Run repeated executions, 3) Analyze results for non-determinism, 4) Classify root causes (timing, shared state, external dependencies), 5) Apply targeted fixes.
- Remove all meta-description sections ('Purpose', 'When to Use', 'Capabilities', 'Example Triggers') and replace them with actual technical content: patterns for common flaky test causes and specific remediation strategies.
- Add concrete examples of flaky test patterns (e.g., race conditions, time-dependent assertions, order-dependent tests) with before/after code showing how to fix each.
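The repeated-execution approach these suggestions describe reduces to a small analysis step: collect per-test pass/fail outcomes across several runs (however the runs were produced, e.g., a loop around the test runner) and flag any test with mixed results. The function and the sample data below are an illustrative sketch, not part of the skill under review.

```python
from collections import defaultdict

def find_flaky_tests(runs):
    """Return the names of tests whose outcome differs across runs.

    `runs` is a list of dicts mapping test name -> bool (True = passed),
    one dict per repeated execution of the suite.
    """
    outcomes = defaultdict(set)
    for run in runs:
        for test, passed in run.items():
            outcomes[test].add(passed)
    # A test is flaky if it was observed both passing and failing.
    return sorted(name for name, seen in outcomes.items() if len(seen) > 1)

# Hypothetical results from three repeated runs of the same suite:
runs = [
    {"test_login": True,  "test_checkout": True},
    {"test_login": False, "test_checkout": True},
    {"test_login": True,  "test_checkout": True},
]
print(find_flaky_tests(runs))  # ['test_login']
```

In practice the `runs` data would be parsed from the test runner's machine-readable output (e.g., JUnit XML or a JSON report) rather than hand-written.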
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The content is entirely filler and meta-description. It explains what the skill does in abstract terms without providing any actual technical content. Every section restates the same vague idea, that this skill helps with 'flaky test detector', without adding substance. | 1 / 3 |
| Actionability | There is zero concrete guidance: no code, no commands, no specific techniques for detecting or handling flaky tests. The content describes rather than instructs, offering only vague promises like 'provides step-by-step guidance' without actually providing any. | 1 / 3 |
| Workflow Clarity | No workflow, steps, or process is defined. There are no sequences, no validation checkpoints, and no actual instructions for detecting or resolving flaky tests. | 1 / 3 |
| Progressive Disclosure | The content is a monolithic block of meta-description with no references to detailed materials, no links to examples or advanced guides, and no meaningful structural organization of actual content. | 1 / 3 |
| Total | | 4 / 12 (Passed) |
Validation: 81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation for skill structure: 9 / 11 Passed
| Criteria | Description | Result |
|---|---|---|
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 9 / 11 Passed | |
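Both warnings concern frontmatter hygiene. The sketch below shows the general shape of the fix implied by the checks; the key and tool names are hypothetical examples, not taken from the actual skill file.

```yaml
# Hypothetical before state:
#   category: test-automation       <- unknown key; move under metadata or drop
#   allowed-tools: Bash, MagicTool  <- 'MagicTool' is not a recognized tool name
#
# Hypothetical after state:
allowed-tools: Bash, Read, Grep
metadata:
  category: test-automation
```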