ab-test-setup

Structured guide for setting up A/B tests with mandatory gates for hypothesis, metrics, and execution readiness.

Install with Tessl CLI

npx tessl i github:boisenoise/skills-collections --skill ab-test-setup

56

Does it follow best practices?

Validation for skill structure

Evaluation results

100%

3%

Checkout Conversion Test Plan

A/B test plan creation

Criteria                            | Without context | With context
Hypothesis: evidence                | 100%            | 100%
Hypothesis: single change           | 100%            | 100%
Hypothesis: directional expectation | 100%            | 100%
Hypothesis: defined audience        | 100%            | 100%
Hypothesis: MDE or success criteria | 100%            | 100%
Hypothesis lock question            | 70%             | 100%
Assumptions listed                  | 100%            | 100%
Test type is A/B                    | 100%            | 100%
Single primary metric               | 100%            | 100%
Secondary metrics included          | 100%            | 100%
Guardrail metrics defined           | 100%            | 100%
Statistical parameters              | 100%            | 100%
Test duration estimated             | 100%            | 100%

Without context: $0.2524 · 1m 34s · 9 turns · 10 in / 4,290 out tokens

With context: $0.4553 · 2m 33s · 16 turns · 66 in / 7,579 out tokens

100%

Test Proposal Review: Rebrand Landing Page

Refusal conditions and readiness gate

Criteria                       | Without context | With context
Declines to proceed            | 100%            | 100%
Multiple variables flagged     | 100%            | 100%
Unknown baseline cited         | 100%            | 100%
One hypothesis per test        | 100%            | 100%
Traffic or sample size concern | 100%            | 100%
Readiness gate applied         | 100%            | 100%
Separate tests recommended     | 100%            | 100%
Baseline data recommended      | 100%            | 100%
Primary metric undefined       | 100%            | 100%
Concrete next steps            | 100%            | 100%

Without context: $0.1594 · 55s · 7 turns · 56 in / 2,481 out tokens

With context: $0.3343 · 1m 33s · 14 turns · 2,221 in / 4,253 out tokens

100%

22%

Analyze and Document Completed A/B Test

Results analysis and test documentation

Criteria                                   | Without context | With context
Do not ship decision                       | 33%             | 100%
Guardrail failure cited                    | 70%             | 100%
No guardrail override                      | 62%             | 100%
Secondary metric does not override primary | 100%            | 100%
Record: hypothesis                         | 100%            | 100%
Record: variants                           | 100%            | 100%
Record: all metrics                        | 100%            | 100%
Record: sample size vs achieved            | 100%            | 100%
Record: decision                           | 42%             | 100%
Record: learnings                          | 100%            | 100%
Record: follow-up ideas                    | 85%             | 100%
No overgeneralization                      | 100%            | 100%
Stat significance vs business judgment     | 80%             | 100%

Without context: $0.2237 · 1m 12s · 7 turns · 8 in / 4,023 out tokens

With context: $0.3450 · 1m 47s · 14 turns · 82 in / 5,170 out tokens

Evaluated agent: Claude Code
