When the user wants to plan, design, or implement an A/B test or experiment. Also use when the user mentions "A/B test," "split test," "experiment," "test this change," "variant copy," "multivariate test," or "hypothesis." For tracking implementation, see analytics-tracking.
78
Quality
73%
Does it follow best practices?
Impact
Pending
No eval scenarios have been run
Passed
No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./config/claude/skills/ab-test-setup/SKILL.md
Quality
Discovery
62%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This description excels at trigger term coverage and distinctiveness, making it easy for Claude to know when to select it. However, it fails to explain what the skill actually does - the 'what' is almost entirely missing, replaced by vague verbs like 'plan, design, or implement.' Users and Claude won't know what concrete capabilities this skill provides.
Suggestions
Add specific concrete actions the skill performs, e.g., 'Creates experiment designs with control/variant groups, calculates required sample sizes, generates hypothesis statements, and produces variant copy alternatives.'
Replace vague verbs 'plan, design, or implement' with specific outputs like 'generates test plans, writes variant copy, defines success metrics, and structures experiment documentation.'
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description uses vague language like 'plan, design, or implement' without listing concrete actions. It doesn't specify what the skill actually does - no mention of specific capabilities like creating test variants, calculating sample sizes, or analyzing results. | 1 / 3 |
| Completeness | The 'when' is explicitly and thoroughly covered with trigger terms and use cases. However, the 'what' is weak - it only says 'plan, design, or implement' without explaining what concrete actions or outputs the skill provides. | 2 / 3 |
| Trigger Term Quality | Excellent coverage of natural trigger terms users would say: 'A/B test,' 'split test,' 'experiment,' 'test this change,' 'variant copy,' 'multivariate test,' 'hypothesis.' These are all terms users would naturally use when needing this skill. | 3 / 3 |
| Distinctiveness Conflict Risk | Clear niche with distinct triggers specific to A/B testing and experimentation. The explicit mention of trigger terms and the cross-reference to analytics-tracking for related but different functionality helps avoid conflicts. | 3 / 3 |
| Total | 9 / 12 Passed | |
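As a quick check of the arithmetic in the table above, here is a minimal sketch that tallies the four dimension scores. The `tally` helper and the variable names are illustrative assumptions, not part of Tessl's tooling. The 62% shown for Discovery is not this raw ratio (9 / 12 is 75%), so the sketch assumes nothing about how that percentage is weighted.

```python
# Dimension scores copied from the Discovery table; each is out of 3.
discovery_scores = {
    "Specificity": 1,
    "Completeness": 2,
    "Trigger Term Quality": 3,
    "Distinctiveness Conflict Risk": 3,
}
MAX_PER_DIMENSION = 3  # every dimension in the table is scored "n / 3"


def tally(scores, max_per_dimension=MAX_PER_DIMENSION):
    """Return (points earned, points possible) for a rubric table."""
    earned = sum(scores.values())
    possible = max_per_dimension * len(scores)
    return earned, possible


earned, possible = tally(discovery_scores)
print(f"Total: {earned} / {possible}")  # prints "Total: 9 / 12"
```

The same helper applies to any other "n / 3" table by swapping in its scores.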
Implementation
85%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured, comprehensive A/B testing skill with strong actionability and clear workflows. The main weakness is some verbosity—explaining concepts Claude already understands (statistical significance definitions, the peeking problem explanation) and unnecessary persona framing. The tables, checklists, and hypothesis framework provide excellent concrete guidance.
Suggestions
Remove the opening persona statement ('You are an expert...') and basic concept explanations (what statistical significance means, why peeking is bad) that Claude already knows.
Tighten the 'Core Principles' section: these are standard testing principles that could be reduced to a brief reminder list rather than explained.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The content is generally efficient but includes some unnecessary framing ('You are an expert...') and explanatory text that Claude already knows (e.g., explaining what statistical significance means, what peeking is). The tables and frameworks are well-structured but could be tighter. | 2 / 3 |
| Actionability | Provides concrete frameworks (hypothesis structure with fill-in template), specific sample size tables, clear checklists, and actionable decision matrices. The examples distinguish weak vs strong hypotheses effectively, and the implementation section names specific tools. | 3 / 3 |
| Workflow Clarity | Clear sequential workflow from initial assessment through hypothesis creation, test design, implementation, running, and analysis. Includes explicit pre-launch checklist with validation steps, DO/DON'T guidance during execution, and a structured analysis checklist. | 3 / 3 |
| Progressive Disclosure | Well-organized with clear sections, appropriate use of tables for quick reference, and explicit references to deeper materials (sample-size-guide.md, test-templates.md). Content is appropriately split between overview and detailed references. | 3 / 3 |
| Total | 11 / 12 Passed | |
Validation
90%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 10 / 11 Passed | |
355d067