Structured guide for setting up A/B tests with mandatory gates for hypothesis, metrics, and execution readiness.
Quality: 51% (does it follow best practices?)
Impact: 100% (1.09x average score across 3 eval scenarios)
Status: Passed, no known issues
Optimize this skill with Tessl:

`npx tessl skill review --optimize ./skills/antigravity-ab-test-setup/SKILL.md`

Quality
Discovery
40%. Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description identifies a clear niche (A/B test setup with structured gates) and is distinctive, but it lacks explicit trigger guidance ('Use when...') and could be more specific about the concrete actions it performs. Adding natural user trigger terms and a 'when to use' clause would significantly improve skill selection accuracy.
Suggestions
- Add an explicit 'Use when...' clause, e.g., 'Use when the user wants to set up an A/B test, design an experiment, or plan a split test.'
- Include more natural trigger term variations such as 'experiment,' 'split test,' 'variant testing,' 'conversion optimization,' or 'test design.'
- List specific concrete actions, e.g., 'Guides users through defining hypotheses, selecting success metrics, calculating sample sizes, and validating execution readiness before launching A/B tests.'
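Taken together, the suggestions above could be folded into a revised frontmatter description. The following is a sketch, not the skill's actual frontmatter (the `name` value is assumed from the file path):

```yaml
---
name: antigravity-ab-test-setup
description: >
  Guides users through defining hypotheses, selecting success metrics,
  calculating sample sizes, and validating execution readiness before
  launching A/B tests. Use when the user wants to set up an A/B test,
  design an experiment, or plan a split test, variant test, or
  conversion-optimization experiment.
---
```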
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (A/B testing) and describes it as a 'structured guide' with 'mandatory gates for hypothesis, metrics, and execution readiness,' but doesn't list multiple concrete actions like creating hypotheses, defining metrics, validating sample sizes, etc. | 2 / 3 |
| Completeness | Describes what it does (structured guide for A/B test setup with gates) but has no explicit 'Use when...' clause or equivalent trigger guidance, which per the rubric should cap completeness at 2, and the 'what' is also somewhat vague, bringing it closer to 1. | 1 / 3 |
| Trigger Term Quality | Includes 'A/B tests,' 'hypothesis,' and 'metrics,' which are relevant keywords, but misses common variations users might say like 'experiment,' 'split test,' 'variant testing,' 'conversion,' or 'statistical significance.' | 2 / 3 |
| Distinctiveness / Conflict Risk | A/B testing with mandatory gates for hypothesis, metrics, and execution readiness is a clearly defined niche that is unlikely to conflict with other skills; the combination of A/B testing and structured gates makes it quite distinctive. | 3 / 3 |
| Total |  | 8 / 12 (Passed) |
Implementation
62%. Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured process skill with strong workflow clarity and explicit gates/checkpoints, which is its primary strength. However, it suffers from a lack of concrete examples (no sample hypothesis, no calculation tools/formulas) and includes some unnecessary philosophical content. The monolithic structure could benefit from splitting detailed sections into referenced files.
Suggestions
- Add a concrete example of a well-formed hypothesis with all required components (observation, change, direction, audience, MDE) to make the hypothesis gate actionable.
- Include a specific sample size calculation example or reference a tool/formula (e.g., a Python snippet using statsmodels or a specific online calculator command).
- Remove the 'Final Reminder' section and trim the 'Purpose & Scope' to a single line—these are motivational rather than actionable.
- Consider splitting 'Analyzing Results' and 'Documentation & Learning' into separate referenced files to keep the main skill focused on the setup workflow.
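The sample-size suggestion above can be made concrete with a small stdlib-only sketch of the standard two-proportion power calculation (via Cohen's h and the normal approximation). The baseline rate and MDE below are illustrative assumptions, not values taken from the skill:

```python
from math import asin, sqrt, ceil
from statistics import NormalDist

def sample_size_per_variant(baseline, mde, alpha=0.05, power=0.8):
    """Visitors needed per variant for a two-sided two-proportion test.

    Uses Cohen's h as the effect size and the normal approximation:
    n = ((z_{alpha/2} + z_{power}) / h)^2
    """
    h = 2 * (asin(sqrt(baseline + mde)) - asin(sqrt(baseline)))
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)
    return ceil(((z_alpha + z_power) / h) ** 2)

# Example: 10% baseline conversion, detect a 2-point absolute lift
print(sample_size_per_variant(0.10, 0.02))  # → 1918 visitors per variant
```

The same numbers can be reproduced with `statsmodels` (`proportion_effectsize` plus `NormalIndPower.solve_power`) if a dependency is acceptable.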
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is reasonably structured but includes motivational/philosophical content ('A/B testing is not about proving ideas right...') and explains concepts Claude already knows (what A/B, A/B/n, MVT, and Split URL tests are). The 'Final Reminder' and 'Purpose & Scope' sections add little actionable value. | 2 / 3 |
| Actionability | The skill provides structured checklists and clear gates, which is good for a process-oriented skill. However, it lacks concrete examples—no sample hypothesis, no sample size calculation formula or tool command, no example metric definition. It describes what to do but rarely shows how with specifics. | 2 / 3 |
| Workflow Clarity | The multi-step process is clearly sequenced with numbered stages, explicit hard gates ('Do NOT proceed until confirmed,' 'Execution Readiness Gate'), and clear validation checkpoints. The refusal conditions and guardrail failure handling provide good feedback loops for error recovery. | 3 / 3 |
| Progressive Disclosure | The content is well-organized with clear headers and sections, but it's a long monolithic document (~150 lines of substantive content) with no references to external files. The metrics definition, analysis, and documentation sections could be split into separate reference files to keep the main skill lean. | 2 / 3 |
| Total |  | 9 / 12 (Passed) |
Validation
90%. Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total |  | 10 / 11 Passed |
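The `frontmatter_unknown_keys` warning can typically be cleared by nesting the extra keys under `metadata`, as the check itself suggests. The key names below are hypothetical, since the report does not say which keys were flagged:

```yaml
# Before: unrecognized top-level key triggers the warning
---
name: antigravity-ab-test-setup
description: ...
category: experimentation   # hypothetical unknown key
---

# After: moved under metadata
---
name: antigravity-ab-test-setup
description: ...
metadata:
  category: experimentation
---
```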