Structured guide for setting up A/B tests with mandatory gates for hypothesis, metrics, and execution readiness.
66
47%
Does it follow best practices?
Impact
100%
1.01x
Average score across 3 eval scenarios
Passed
No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./docs/v19.7/configuration/agent/skills_external/antigravity-awesome-skills-main/skills/ab-test-setup/SKILL.md
Quality
Discovery
32%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description identifies a clear domain (A/B testing) and hints at a structured process with mandatory checkpoints, but lacks explicit trigger guidance and comprehensive action details. The absence of a 'Use when...' clause significantly limits Claude's ability to know when to select this skill, and the description would benefit from more natural user-facing keywords.
Suggestions
Add a 'Use when...' clause with trigger terms like 'A/B test', 'split test', 'experiment setup', 'hypothesis validation', or 'test metrics'
List specific concrete actions such as 'define hypothesis, select metrics, calculate sample size, set success criteria, document test plan'
Include common variations users might say: 'split testing', 'experimentation', 'variant testing', 'conversion experiment'
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (A/B tests) and mentions some actions (setting up, gates for hypothesis, metrics, execution readiness), but doesn't list multiple concrete actions like 'define control groups, calculate sample sizes, track conversion rates'. | 2 / 3 |
| Completeness | Describes what it does (structured guide for A/B test setup with gates) but completely lacks a 'Use when...' clause or any explicit trigger guidance for when Claude should select this skill. | 1 / 3 |
| Trigger Term Quality | Includes 'A/B tests' which is a natural term users would say, but misses common variations like 'split testing', 'experiment', 'variant testing', or 'conversion optimization'. | 2 / 3 |
| Distinctiveness / Conflict Risk | A/B testing is a reasonably specific niche, but 'structured guide' and 'gates' are vague enough that it could overlap with other process/workflow skills or general experimentation guides. | 2 / 3 |
| Total | | 7 / 12 Passed |
Implementation
62%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill provides a well-structured, gate-based workflow for A/B test setup with strong validation checkpoints and clear sequencing. However, it lacks concrete examples (sample hypotheses, actual sample size calculations) and could be more concise by removing philosophical statements. The actionability would improve significantly with specific examples of good vs. bad hypotheses and concrete calculation methods.
Suggestions
Add a concrete example of a well-formed hypothesis with all required components (observation, change, expectation, audience, success criteria)
Include a sample size calculation example with actual numbers or reference a specific calculator/formula
Remove or condense the 'Final Reminder' section and philosophical statements that don't add actionable guidance
Consider splitting detailed sections (metrics definition, analysis discipline) into separate referenced files for better progressive disclosure
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The content is reasonably efficient but includes some unnecessary philosophical statements ('A/B testing is not about proving ideas right') and explanatory text that Claude would already understand. The checklists and tables are appropriately concise, but sections like 'Final Reminder' add little actionable value. | 2 / 3 |
| Actionability | Provides clear checklists and decision criteria, but lacks concrete examples of what a good hypothesis looks like, specific formulas for sample size calculation, or executable code/commands for tracking verification. The guidance is structured but remains somewhat abstract. | 2 / 3 |
| Workflow Clarity | Excellent multi-step workflow with explicit hard gates at hypothesis lock and execution readiness. Clear sequencing from prerequisites through analysis, with explicit validation checkpoints ('Do NOT proceed until confirmed', 'Do NOT proceed without a realistic sample size estimate'). The numbered sections create unambiguous ordering. | 3 / 3 |
| Progressive Disclosure | Content is well-organized with clear section headers and logical grouping, but everything is in a single monolithic file. For a skill of this length (~150 lines), some content like the detailed metrics definitions or analysis discipline could be split into referenced files for better navigation. | 2 / 3 |
| Total | | 9 / 12 Passed |
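The suggestion to include a sample size calculation with actual numbers could be met with something like the sketch below. It is not taken from the skill itself: it assumes a two-sided two-proportion z-test with equal allocation, and the function name `sample_size_per_variant`, the 5% baseline rate, and the 20% relative lift are all illustrative choices.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate users needed per variant for a two-sided two-proportion z-test.

    Hypothetical helper for illustration; the skill under review does not define it.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)  # expected rate in the variant
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("rates must lie in (0, 1) and the lift must be non-zero")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    p_bar = (p1 + p2) / 2                          # pooled rate under the null

    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


if __name__ == "__main__":
    # Assumed inputs: 5% baseline conversion, 20% relative lift (5% -> 6%).
    n = sample_size_per_variant(0.05, 0.20)
    print(f"Users needed per variant: {n}")
```

With these assumed inputs the estimate comes out on the order of eight thousand users per variant, which shows why the skill's "Do NOT proceed without a realistic sample size estimate" gate matters: small lifts on low baseline rates need substantial traffic.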
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
20ba150