Structured guide for setting up A/B tests with mandatory gates for hypothesis, metrics, and execution readiness.
Install with the Tessl CLI:
npx tessl i github:boisenoise/skills-collections --skill ab-test-setup
Does it follow best practices?
If you maintain this skill, you can automatically optimize it with the Tessl CLI to improve its score:
npx tessl skill review --optimize ./path/to/skill
Validation for skill structure
A/B test plan creation
Criterion                             Without context   With context
Hypothesis: evidence                  100%              100%
Hypothesis: single change             100%              100%
Hypothesis: directional expectation   100%              100%
Hypothesis: defined audience          100%              100%
Hypothesis: MDE or success criteria   100%              100%
Hypothesis lock question              70%               100%
Assumptions listed                    100%              100%
Test type is A/B                      100%              100%
Single primary metric                 100%              100%
Secondary metrics included            100%              100%
Guardrail metrics defined             100%              100%
Statistical parameters                100%              100%
Test duration estimated               100%              100%
Without context: $0.2524 · 1m 34s · 9 turns · 10 in / 4,290 out tokens
With context: $0.4553 · 2m 33s · 16 turns · 66 in / 7,579 out tokens
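The "Statistical parameters" and "Test duration estimated" criteria expect a plan to derive a sample size and a run time from its inputs. As a rough illustration only (not necessarily the method the skill itself uses), the sketch below applies the standard normal-approximation formula for a two-sided, two-proportion z-test. The function names `sample_size_per_arm` and `duration_days` and the example inputs are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline: float, mde_abs: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate users per arm for a two-sided, two-proportion z-test.

    baseline: control conversion rate (e.g. 0.05 for 5%).
    mde_abs: minimum detectable effect as an absolute lift (0.01 = +1 point).
    """
    if mde_abs <= 0:
        raise ValueError("mde_abs must be positive")
    p1, p2 = baseline, baseline + mde_abs
    if not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("need 0 < baseline < baseline + mde_abs < 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    p_bar = (p1 + p2) / 2                          # pooled rate under H0
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / mde_abs ** 2)


def duration_days(n_per_arm: int, arms: int, daily_eligible_visitors: int) -> int:
    """Days needed to fill every arm, assuming an even traffic split."""
    return math.ceil(n_per_arm * arms / daily_eligible_visitors)
```

With a hypothetical 5% baseline and a 1-point absolute MDE, `sample_size_per_arm(0.05, 0.01)` comes out just under 8,200 users per arm, so with two arms and 2,000 eligible visitors per day `duration_days` gives nine days.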
Refusal conditions and readiness gate
Criterion                        Without context   With context
Declines to proceed              100%              100%
Multiple variables flagged       100%              100%
Unknown baseline cited           100%              100%
One hypothesis per test          100%              100%
Traffic or sample size concern   100%              100%
Readiness gate applied           100%              100%
Separate tests recommended       100%              100%
Baseline data recommended        100%              100%
Primary metric undefined         100%              100%
Concrete next steps              100%              100%
Without context: $0.1594 · 55s · 7 turns · 56 in / 2,481 out tokens
With context: $0.3343 · 1m 33s · 14 turns · 2,221 in / 4,253 out tokens
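The refusal criteria above describe a readiness gate: the plan should not proceed while it changes more than one variable, lacks a baseline or a primary metric, or cannot reach its required sample size. One way to picture that gate is a checklist that returns blocking issues. This is a sketch under assumptions; the `ABTestPlan` fields, `readiness_issues`, and the messages are hypothetical and not the skill's actual schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ABTestPlan:
    """Hypothetical plan fields, chosen to mirror the criteria above."""
    hypotheses: List[str]
    changed_elements: List[str]       # what differs between control and variant
    primary_metric: Optional[str]
    baseline_rate: Optional[float]    # None when no historical data exists
    required_per_arm: Optional[int]   # from a power calculation
    expected_per_arm: Optional[int]   # traffic available within the test window


def readiness_issues(plan: ABTestPlan) -> List[str]:
    """Return blocking issues; an empty list means the plan may proceed."""
    issues: List[str] = []
    if len(plan.hypotheses) != 1:
        issues.append("Use exactly one hypothesis per test; split extras into separate tests.")
    if len(plan.changed_elements) > 1:
        issues.append("Multiple variables changed (" + ", ".join(plan.changed_elements)
                      + "); test them separately.")
    if plan.primary_metric is None:
        issues.append("Define a single primary metric before launch.")
    if plan.baseline_rate is None:
        issues.append("Baseline unknown; collect baseline data first.")
    if plan.required_per_arm is None or plan.expected_per_arm is None:
        issues.append("Sample size not estimated; run a power calculation.")
    elif plan.expected_per_arm < plan.required_per_arm:
        issues.append("Not enough traffic to reach the required sample size.")
    return issues
```

Returning a list rather than a single pass/fail flag keeps each refusal reason visible, which maps naturally onto the "Concrete next steps" criterion.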
Results analysis and test documentation
Criterion                                        Without context   With context
Do not ship decision                             33%               100%
Guardrail failure cited                          70%               100%
No guardrail override                            62%               100%
Secondary metric does not override primary       100%              100%
Record: hypothesis                               100%              100%
Record: variants                                 100%              100%
Record: all metrics                              100%              100%
Record: planned vs. achieved sample size         100%              100%
Record: decision                                 42%               100%
Record: learnings                                100%              100%
Record: follow-up ideas                          85%               100%
No overgeneralization                            100%              100%
Statistical significance vs. business judgment   80%               100%
Without context: $0.2237 · 1m 12s · 7 turns · 8 in / 4,023 out tokens
With context: $0.3450 · 1m 47s · 14 turns · 82 in / 5,170 out tokens
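The first criteria in the results scenario imply a decision rule: a failed guardrail blocks shipping regardless of the primary metric, and a secondary metric cannot rescue a primary metric that did not improve. A minimal sketch of that rule follows, assuming hypothetical `MetricResult` and `ship_decision` names and a fixed significance level. Secondary metrics are deliberately not an input, and even a significant win is returned only as a candidate that still needs business judgment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MetricResult:
    name: str
    lift: float     # relative change of variant vs. control
    p_value: float


def ship_decision(primary: MetricResult, guardrails_failed: List[str],
                  alpha: float = 0.05) -> str:
    """Hypothetical rule: guardrails first, then the single primary metric."""
    if guardrails_failed:
        # A guardrail failure is never overridden by a primary-metric win.
        return "do not ship: guardrail failed (" + ", ".join(guardrails_failed) + ")"
    if primary.p_value < alpha and primary.lift > 0:
        # Statistical significance is necessary here, not sufficient.
        return "ship candidate: primary metric improved; confirm with business judgment"
    return "do not ship: primary metric did not show a significant improvement"
```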