
ab-test-setup

When the user wants to plan, design, or implement an A/B test or experiment, or build a growth experimentation program. Also use when the user mentions "A/B test," "split test," "experiment," "test this change," "variant copy," "multivariate test," "hypothesis," "should I test this," "which version is better," "test two versions," "statistical significance," "how long should I run this test," "growth experiments," "experiment velocity," "experiment backlog," "ICE score," "experimentation program," or "experiment playbook." Use this whenever someone is comparing two approaches and wants to measure which performs better, or when they want to build a systematic experimentation practice. For tracking implementation, see analytics-tracking. For page-level conversion optimization, see page-cro.

90

Quality

87%

Does it follow best practices?

Impact

Pending

No eval scenarios have been run

Security by Snyk

Passed

No known issues

Quality

Discovery

89%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a strong skill description with excellent trigger term coverage and clear completeness, explicitly addressing both what the skill does and when to use it. The cross-references to related skills are a notable strength for reducing conflict. The main weakness is that the 'what' portion could be more specific about concrete actions the skill performs beyond planning/designing/implementing.

Suggestions

Add more specific concrete actions to the capability description, e.g., 'create test hypotheses, calculate required sample sizes, design experiment structures, prioritize experiment backlogs using ICE scoring, analyze statistical significance of results.'

Dimension | Reasoning | Score

Specificity

The description mentions planning, designing, and implementing A/B tests, building a growth experimentation program, and comparing approaches to measure performance. However, it doesn't list multiple concrete specific actions like 'create hypothesis documents, calculate sample sizes, analyze test results, prioritize experiment backlogs' — it stays at a moderate level of specificity.

2 / 3

Completeness

The description clearly answers both 'what' (plan, design, implement A/B tests, build experimentation programs, compare approaches) and 'when' with extensive explicit trigger terms and use-case guidance. It also includes helpful cross-references to related skills (analytics-tracking, page-cro).

3 / 3

Trigger Term Quality

Excellent coverage of natural trigger terms including 'A/B test,' 'split test,' 'experiment,' 'variant copy,' 'multivariate test,' 'hypothesis,' 'statistical significance,' 'ICE score,' 'experiment backlog,' and many more variations a user would naturally say.

3 / 3

Distinctiveness Conflict Risk

The description carves out a clear niche around A/B testing and experimentation, and explicitly differentiates itself from related skills by referencing 'analytics-tracking' for tracking implementation and 'page-cro' for page-level conversion optimization, reducing conflict risk.

3 / 3

Total

11 / 12

Passed

Implementation

85%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a strong, well-structured skill that provides highly actionable guidance for A/B testing and experimentation programs. Its main strength is the concrete frameworks, templates, and reference tables that make it immediately executable. The primary weakness is moderate verbosity: some sections explain concepts Claude already knows (the basics of statistical significance, the core principles of testing) and could be trimmed to save tokens.

Suggestions

Trim the 'Core Principles' section and 'The Peeking Problem' explanation—Claude already understands these concepts. Reduce to brief reminders rather than explanations.

Condense the 'Statistical Significance' explanation (95% confidence, p-values) to a single line since Claude knows statistics well.

Dimension | Reasoning | Score

Conciseness

The skill is fairly comprehensive but includes some unnecessary explanations Claude already knows (e.g., explaining what statistical significance means, what A/B tests are, the peeking problem). The tables and frameworks are efficient, but sections like 'Core Principles' explain basics that Claude would already understand. The Growth Experimentation Program section adds substantial length but provides genuinely useful structured frameworks.

2 / 3

Actionability

The skill provides highly concrete, actionable guidance: specific hypothesis templates with fill-in-the-blank structure, sample size reference tables with exact numbers, ICE scoring methodology, pre-launch checklists, analysis checklists, experiment playbook templates with specific fields, and clear metrics targets. The guidance is specific enough to execute immediately.

3 / 3

Workflow Clarity

Multi-step processes are clearly sequenced with explicit validation checkpoints: the pre-launch checklist includes tracking verification and QA, the experiment loop is numbered and cyclical, the analysis checklist has ordered steps with decision gates (e.g., 'Reach sample size? If not, result is preliminary'), and the cadence section provides clear feedback loops at weekly/bi-weekly/monthly/quarterly intervals. Guardrail metrics serve as explicit stop conditions.

3 / 3
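
The sample-size decision gate quoted under Workflow Clarity ('Reach sample size? If not, result is preliminary') can be illustrated with a short sketch. This is not taken from the skill itself: the function names, the 80% power default, and the two-sided normal-approximation formula for comparing two conversion rates are all assumptions made for illustration, and the skill's own reference tables may use different inputs.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(baseline, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant (hypothetical helper).

    Uses the standard normal-approximation formula for a two-sided
    test of two proportions. `baseline` is the control conversion rate
    and `relative_lift` is the minimum detectable effect, e.g. 0.20
    for a 20% relative improvement.
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def result_status(observed_per_variant, required_per_variant):
    """Decision gate: results below the planned sample size stay preliminary."""
    if observed_per_variant >= required_per_variant:
        return "final"
    return "preliminary"


if __name__ == "__main__":
    needed = required_sample_size(baseline=0.05, relative_lift=0.20)
    print(needed, result_status(observed_per_variant=3000, required_per_variant=needed))
```

Computing the requirement before launch and checking it again at analysis time keeps the gate tied to a fixed target rather than to whatever the interim numbers happen to show.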

Progressive Disclosure

The skill is well-structured with clear sections and appropriate references to external files (references/sample-size-guide.md, references/test-templates.md) that are one level deep and clearly signaled. It also references related skills (page-cro, analytics-tracking, copywriting) and checks for existing context files. The main content serves as a comprehensive overview without burying critical information in nested references.

3 / 3

Total

11 / 12

Passed

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 passed

Validation for skill structure

No warnings or errors.

Repository
coreyhaines31/marketingskills
Reviewed
