Flaky Test Detector

Skill description under review: "Auto-activating skill for Test Automation. Triggers on: flaky test detector, flaky test detector. Part of the Test Automation skill category."
Impact: 91% (0.98x average score across 3 eval scenarios). Passed; no known issues.
Optimize this skill with Tessl:

```
npx tessl skill review --optimize ./planned-skills/generated/09-test-automation/flaky-test-detector/SKILL.md
```

Quality
Discovery: 7%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This description is essentially a label with no substance. It names the skill and its category but provides zero information about what concrete actions it performs, what inputs it expects, or when Claude should select it. The duplicated trigger term and absence of natural keyword variations make it very weak for skill selection.
Suggestions
- Add concrete actions describing what the skill does, e.g., 'Detects flaky tests by analyzing test run history, identifies non-deterministic failures, and suggests fixes for intermittent test issues.'
- Add an explicit 'Use when...' clause with trigger scenarios, e.g., 'Use when the user mentions flaky tests, intermittent failures, non-deterministic test results, or unreliable test suites.'
- Expand trigger terms to include natural variations users would say: 'flaky tests', 'intermittent test failures', 'unreliable tests', 'test instability', 'non-deterministic tests', 'randomly failing tests'.
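Putting these suggestions together, a revised description could look like the frontmatter sketch below. This is illustrative only: the field names follow the common SKILL.md convention, and the exact schema for this skill is an assumption.

```yaml
# Hypothetical SKILL.md frontmatter sketch (field names assumed, adjust to the actual spec)
name: flaky-test-detector
description: >
  Detects flaky tests by analyzing test run history, identifies
  non-deterministic failures, and suggests fixes for intermittent test
  issues. Use when the user mentions flaky tests, intermittent failures,
  randomly failing tests, non-deterministic test results, test instability,
  or unreliable test suites.
```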
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description names the domain ('Test Automation') and the concept ('Flaky Test Detector') but provides no concrete actions. There is no indication of what the skill actually does: no verbs describing capabilities like 'identifies', 'reruns', 'analyzes', or 'reports'. | 1 / 3 |
| Completeness | The description fails to answer both 'what does this do' and 'when should Claude use it'. There is no explanation of capabilities and no explicit 'Use when...' clause with meaningful trigger guidance. | 1 / 3 |
| Trigger Term Quality | The trigger terms listed are just 'flaky test detector' repeated twice. There are no natural variations a user might say, such as 'flaky tests', 'intermittent test failures', 'unreliable tests', 'test instability', or 'non-deterministic tests'. | 1 / 3 |
| Distinctiveness / Conflict Risk | The term 'flaky test' is somewhat specific to a niche area, which provides some distinctiveness. However, the lack of concrete actions or clear scope means it could overlap with other test-related skills. | 2 / 3 |
| Total | | 5 / 12 (Passed) |
Implementation: 0%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is essentially a placeholder with no substantive content. It repeatedly references 'flaky test detector' without ever defining what flaky tests are, how to detect them, or what to do about them. There is no actionable guidance, no code, no examples, and no workflow—just self-referential boilerplate.
Suggestions
- Add concrete, executable code examples for detecting flaky tests (e.g., running a test suite N times with the pytest-repeat plugin's --count option or a loop around the Jest CLI, then parsing results for inconsistencies).
- Define a clear workflow: 1) Identify candidate flaky tests, 2) Run repeated executions, 3) Analyze results for non-determinism, 4) Classify root causes (timing, shared state, external dependencies), 5) Apply targeted fixes.
- Remove all meta-description sections ('Purpose', 'When to Use', 'Capabilities', 'Example Triggers') and replace them with actual technical content: patterns for common flaky test causes and specific remediation strategies.
- Add concrete examples of flaky test patterns (e.g., race conditions, time-dependent assertions, order-dependent tests) with before/after code showing how to fix each.
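The repeated-execution approach these suggestions describe reduces to a small analysis step: collect per-test pass/fail outcomes across several runs (however the runs were produced, e.g., a loop around the test runner) and flag any test with mixed results. The function and the sample data below are an illustrative sketch, not part of the skill under review.

```python
from collections import defaultdict

def find_flaky_tests(runs):
    """Return the names of tests whose outcome differs across runs.

    `runs` is a list of dicts mapping test name -> bool (True = passed),
    one dict per repeated execution of the suite.
    """
    outcomes = defaultdict(set)
    for run in runs:
        for test, passed in run.items():
            outcomes[test].add(passed)
    # A test is flaky if it was observed both passing and failing.
    return sorted(name for name, seen in outcomes.items() if len(seen) > 1)

# Hypothetical results from three repeated runs of the same suite:
runs = [
    {"test_login": True,  "test_checkout": True},
    {"test_login": False, "test_checkout": True},
    {"test_login": True,  "test_checkout": True},
]
print(find_flaky_tests(runs))  # ['test_login']
```

In practice the `runs` data would be parsed from the test runner's machine-readable output (e.g., JUnit XML or a JSON report) rather than hand-written.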
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The content is entirely filler and meta-description. It explains what the skill does in abstract terms without providing any actual technical content. Every section restates the same vague idea, that this skill helps with 'flaky test detector', without adding substance. | 1 / 3 |
| Actionability | There is zero concrete guidance: no code, no commands, no specific techniques for detecting or handling flaky tests. The content describes rather than instructs, offering only vague promises like 'provides step-by-step guidance' without actually providing any. | 1 / 3 |
| Workflow Clarity | No workflow, steps, or process is defined. There are no sequences, no validation checkpoints, and no actual instructions for detecting or resolving flaky tests. | 1 / 3 |
| Progressive Disclosure | The content is a monolithic block of meta-description with no references to detailed materials, no links to examples or advanced guides, and no meaningful structural organization of actual content. | 1 / 3 |
| Total | | 4 / 12 (Passed) |
Validation: 81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation for skill structure: 9 / 11 Passed
| Criteria | Description | Result |
|---|---|---|
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 9 / 11 Passed | |
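Both warnings concern frontmatter hygiene. The sketch below shows the general shape of the fix implied by the checks; the key and tool names are hypothetical examples, not taken from the actual skill file.

```yaml
# Hypothetical before state:
#   category: test-automation       <- unknown key; move under metadata or drop
#   allowed-tools: Bash, MagicTool  <- 'MagicTool' is not a recognized tool name
#
# Hypothetical after state:
allowed-tools: Bash, Read, Grep
metadata:
  category: test-automation
```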