
ab-test-setup

When the user wants to plan, design, or implement an A/B test or experiment, or build a growth experimentation program. Also use when the user mentions "A/B test," "split test," "experiment," "test this change," "variant copy," "multivariate test," "hypothesis," "should I test this," "which version is better," "test two versions," "statistical significance," "how long should I run this test," "growth experiments," "experiment velocity," "experiment backlog," "ICE score," "experimentation program," or "experiment playbook." Use this whenever someone is comparing two approaches and wants to measure which performs better, or when they want to build a systematic experimentation practice. For tracking implementation, see analytics-tracking. For page-level conversion optimization, see page-cro.

90

Quality

87%

Does it follow best practices?

Impact

Pending

No eval scenarios have been run

Security by Snyk

Passed

No known issues


Quality

Discovery

89%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a strong skill description with excellent trigger term coverage and clear completeness, explicitly addressing both what the skill does and when to use it. The cross-references to related skills (analytics-tracking, page-cro) are a notable strength for reducing conflict. The main weakness is that the 'what' portion could be more specific about concrete actions the skill performs beyond the general 'plan, design, or implement.'

Suggestions

Add more specific concrete actions to the 'what' portion, e.g., 'Generates hypotheses, calculates required sample sizes, designs test variants, prioritizes experiment backlogs using ICE scoring, and determines test duration.'

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | The description mentions planning, designing, and implementing A/B tests, building a growth experimentation program, and comparing approaches to measure performance. However, it doesn't list multiple concrete specific actions like 'create hypothesis documents, calculate sample sizes, design test variants, prioritize experiment backlogs'; it stays at a moderate level of specificity. | 2 / 3 |
| Completeness | Clearly answers both 'what' (plan, design, implement A/B tests, build experimentation programs, compare approaches) and 'when' with extensive explicit trigger terms and use-case guidance. Also includes helpful cross-references to related skills (analytics-tracking, page-cro). | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural trigger terms including 'A/B test,' 'split test,' 'experiment,' 'variant copy,' 'multivariate test,' 'hypothesis,' 'statistical significance,' 'ICE score,' 'experiment backlog,' and many more variations that users would naturally say. | 3 / 3 |
| Distinctiveness Conflict Risk | The description carves out a clear niche around A/B testing and experimentation, and explicitly differentiates itself from related skills by referencing analytics-tracking for implementation and page-cro for page-level conversion optimization, reducing conflict risk. | 3 / 3 |
| Total | | 11 / 12 |

Passed

Implementation

85%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a strong, comprehensive skill that provides highly actionable guidance for A/B testing and experimentation programs. Its main weakness is moderate verbosity — some sections explain concepts Claude already understands (statistical significance definitions, basic test type descriptions) and could be trimmed. The workflow clarity and progressive disclosure are excellent, with proper checklists, validation steps, and external references.

Suggestions

Trim explanations of concepts Claude already knows, such as the definition of statistical significance, what A/B vs MVT tests are, and the general description of client-side vs server-side testing — keep only the actionable tool recommendations and decision criteria.

Condense the 'Common Mistakes' section into the relevant workflow sections (e.g., merge design mistakes into the hypothesis/design sections) rather than repeating guidance in a separate section.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The skill is reasonably well-structured but includes some unnecessary explanation Claude already knows (e.g., explaining what statistical significance means, what A/B vs MVT tests are, the peeking problem). The Growth Experimentation Program section adds significant length. Some tables restate common knowledge rather than providing novel, project-specific guidance. | 2 / 3 |
| Actionability | The skill provides concrete, actionable frameworks: a fill-in-the-blank hypothesis template, specific sample size tables, checklists with checkboxes, ICE scoring methodology, an experiment playbook template with specific fields, and clear cadence recommendations. The guidance is specific enough to execute immediately. | 3 / 3 |
| Workflow Clarity | Multi-step processes are clearly sequenced with explicit validation checkpoints: the pre-launch checklist includes tracking verification and QA, the analysis checklist has ordered steps, the experiment loop is numbered, and the cadence section provides clear review cycles. The 'During the Test' DO/Avoid lists serve as guardrails. | 3 / 3 |
| Progressive Disclosure | The skill provides a clear overview with well-signaled one-level-deep references to external files (references/sample-size-guide.md, references/test-templates.md) and related skills (page-cro, analytics-tracking, copywriting). Content is appropriately structured with headers and tables for scanning, and detailed reference material is correctly deferred. | 3 / 3 |
| Total | | 11 / 12 |

Passed

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Repository
coreyhaines31/marketingskills
Reviewed
