Structured guide for setting up A/B tests with mandatory gates for hypothesis, metrics, and execution readiness.
51%
Does it follow best practices?
Impact: not measured (no eval scenarios have been run)
Passed
No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./skills/antigravity-ab-test-setup/SKILL.md
Quality
Discovery
40%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description identifies a clear niche (A/B test setup with structured gates) which makes it distinctive, but it lacks explicit trigger guidance ('Use when...') and could be more specific about the concrete actions it performs. The trigger terms cover the core concept but miss common synonyms and variations.
Suggestions
Add an explicit 'Use when...' clause, e.g., 'Use when the user wants to set up an A/B test, split test, or experiment and needs structured validation before launch.'
Include common synonyms and variations as trigger terms: 'split test', 'experiment design', 'variant testing', 'conversion experiment'.
List more concrete actions, e.g., 'Guides users through defining a hypothesis, selecting success metrics, calculating sample size, and validating execution readiness before launching an A/B test.'
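The first two suggestions above can be checked mechanically. The following is a minimal sketch, not the reviewer's actual scoring logic: the function name `review_description` is hypothetical, and the term list is simply the four synonyms quoted in the suggestions. It reports whether a description contains a "Use when" clause and which suggested trigger terms are still missing.

```python
import re

# Trigger terms taken from the suggestions above; extend as needed.
SUGGESTED_TERMS = (
    "split test",
    "experiment design",
    "variant testing",
    "conversion experiment",
)


def review_description(description):
    """Return a small report on trigger guidance in a skill description.

    Hypothetical helper for illustration only; it does not reproduce
    how the review tool computes its Discovery score.
    """
    text = description.lower()
    return {
        # True if the description contains an explicit "Use when" clause.
        "has_use_when": bool(re.search(r"\buse when\b", text)),
        # Suggested synonyms that do not appear anywhere in the description.
        "missing_terms": [term for term in SUGGESTED_TERMS if term not in text],
    }


current = (
    "Structured guide for setting up A/B tests with mandatory gates "
    "for hypothesis, metrics, and execution readiness."
)
report = review_description(current)
print(report["has_use_when"], report["missing_terms"])
```

Run against the current description, the report shows no "Use when" clause and all four suggested terms missing, which matches the reviewer's reasoning for the Completeness and Trigger Term Quality scores.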
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (A/B testing) and mentions some specific elements (hypothesis, metrics, execution readiness), but doesn't list concrete actions beyond 'setting up'. It describes structure rather than specific capabilities like 'define hypotheses, configure metrics, validate execution readiness'. | 2 / 3 |
| Completeness | Describes what the skill does (structured guide for A/B test setup with gates) but has no explicit 'Use when...' clause or equivalent trigger guidance. Per the rubric, a missing 'Use when...' clause caps completeness at 2, and the 'what' itself is only moderately clear, so this scores at 1. | 1 / 3 |
| Trigger Term Quality | Includes 'A/B tests' which is a natural keyword, plus 'hypothesis' and 'metrics' which are relevant. However, it misses common variations like 'split test', 'experiment', 'variant testing', 'conversion testing', or 'feature flag'. | 2 / 3 |
| Distinctiveness Conflict Risk | A/B testing with mandatory gates for hypothesis, metrics, and execution readiness is a fairly distinct niche. It's unlikely to conflict with other skills due to the specific combination of A/B testing and structured gate-based workflow. | 3 / 3 |
| Total | | 8 / 12 Passed |
Implementation
62%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured process skill with strong workflow clarity through explicit gates and refusal conditions. Its main weaknesses are a lack of concrete worked examples (e.g., a sample hypothesis, a sample size calculation) and some verbosity in motivational/explanatory content that Claude doesn't need. The monolithic structure is acceptable but could benefit from splitting detailed reference sections into separate files.
Suggestions
Add a concrete worked example: a sample hypothesis statement, sample metric definition, and sample size calculation to make the skill more actionable and copy-paste ready.
Remove or drastically shorten the motivational closing ('Final Reminder') and the generic 'When to Use' / 'Limitations' boilerplate—these consume tokens without adding actionable guidance.
Include a specific sample size calculation formula or reference to a tool/command (e.g., a Python snippet using statsmodels or an online calculator URL) to make Section 7 executable rather than descriptive.
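As an illustration of the kind of snippet the last suggestion asks for, here is a minimal sketch of a per-variant sample size calculation. It assumes a two-sided two-proportion z-test with equal allocation and uses the standard normal-approximation formula; it relies only on the Python standard library rather than statsmodels, and the function and parameter names are hypothetical, not taken from the skill.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, absolute_lift, alpha=0.05, power=0.80):
    """Approximate users needed per variant to detect an absolute lift.

    Assumes a two-sided two-proportion z-test with equal group sizes
    (normal approximation). Illustrative sketch only.
    """
    p1 = baseline_rate
    p2 = baseline_rate + absolute_lift
    if not (0 < p1 < 1 and 0 < p2 < 1) or absolute_lift == 0:
        raise ValueError("rates must lie in (0, 1) and the lift must be non-zero")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    p_bar = (p1 + p2) / 2                          # pooled rate under the null

    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


# Example: 10% baseline conversion, detect a 2-point absolute lift.
print(sample_size_per_variant(0.10, 0.02))
```

With these example inputs the result is a little under 4,000 users per variant. Halving the detectable lift roughly quadruples the requirement, which is the kind of trade-off a worked example in Section 7 could make explicit.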
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is reasonably structured but includes some unnecessary padding: motivational closing paragraphs ('A/B testing is not about proving ideas right...'), emoji section numbering, and the 'When to Use' / 'Limitations' boilerplate add little value. Several sections explain concepts Claude already understands (e.g., what A/B vs A/B/n vs MVT tests are). Could be tightened by ~30%. | 2 / 3 |
| Actionability | The skill provides clear checklists and gate conditions, which are actionable for a process-oriented skill. However, it lacks concrete examples: no sample hypothesis, no sample size calculation formula or tool command, no example metric definition. It describes what to do but rarely shows a worked example, keeping it at the 'some concrete guidance but incomplete' level. | 2 / 3 |
| Workflow Clarity | The multi-step process is clearly sequenced with explicit hard gates (Hypothesis Lock, Execution Readiness Gate) that block progression. Refusal conditions serve as validation checkpoints, and the 'During the Test' DO/DO NOT lists provide guardrails. The feedback loop of 'if assumptions are weak → warn → recommend delaying' is present. This is a well-structured gated workflow. | 3 / 3 |
| Progressive Disclosure | The content is a single monolithic file with no references to supporting documents. For a skill of this length (~150 lines of substantive content), sections like metrics definition, test type selection, and analysis discipline could be split into referenced files. However, the internal section structure is clear and navigable, preventing a score of 1. | 2 / 3 |
| Total | | 9 / 12 Passed |
Validation
90%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 10 / 11 Passed | |
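To locate the keys behind the `frontmatter_unknown_keys` warning, a rough local check might look like the sketch below. The allowed-key set is an assumption made for illustration (the review only implies that `metadata` is accepted), the function name is hypothetical, and the parser handles only simple top-level `key: value` lines rather than full YAML.

```python
# ASSUMPTION: this allowed set is illustrative, not the actual skill spec.
ASSUMED_ALLOWED_KEYS = {"name", "description", "metadata"}


def unknown_frontmatter_keys(skill_text, allowed=ASSUMED_ALLOWED_KEYS):
    """List top-level frontmatter keys that are not in the allowed set.

    Simplified parser: expects the file to open with a '---' line and
    reads unindented 'key: value' lines until the closing '---'.
    """
    lines = skill_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []  # no frontmatter block at the top of the file

    keys = []
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the frontmatter block
        is_top_level = bool(line) and not line[0].isspace()
        if is_top_level and not line.startswith(("#", "-")) and ":" in line:
            keys.append(line.split(":", 1)[0].strip())
    return [key for key in keys if key not in allowed]


# Hypothetical SKILL.md content; 'risk' stands in for an unknown key.
example = """---
name: ab-test-setup
description: Structured guide for setting up A/B tests.
risk: unknown
metadata:
  owner: example
---
# Body
"""
print(unknown_frontmatter_keys(example))
```

Any key the check reports can then be removed or nested under `metadata`, as the warning recommends.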
95574f3