The Experiment Designer specialist for Headout's PM OS. Use this skill when a feature or change needs to be validated via an A/B test or controlled experiment before full rollout. It designs the experiment end-to-end: hypothesis, variants, user assignment, sample size, guardrails, measurement window, and the BigQuery/Statsig setup needed to track results. This skill is conditional — not every feature needs a formal experiment design. Use it when the PM says "we want to A/B test this", "how do we measure if this works", "design an experiment for X", "what's our holdout strategy", "help me think through the experiment setup", or when a spec includes an experiment as its validation method. The Experiment Designer works with Headout's experimentation infrastructure: Statsig for assignment and feature flags, BigQuery for outcome measurement, Mixpanel for behavioral signals, and Delphi (#ask-delphi) for ad hoc queries.
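As a rough illustration of the sample size and duration step named above, here is a minimal sketch using only the Python standard library. It assumes a conversion-rate primary metric analysed with a two-sided two-proportion z-test. The function names, the default alpha and power, and the example inputs are hypothetical and not taken from the skill itself. Only the two-week minimum window comes from the eval criteria listed on this page.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline, relative_mde, alpha=0.05, power=0.8):
    """Approximate users needed per arm for a two-sided two-proportion z-test.

    baseline      -- control conversion rate, e.g. 0.05 (assumed input)
    relative_mde  -- minimum detectable effect as a relative lift, e.g. 0.10
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # unpooled variance of the difference
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def duration_days(n_per_arm, arms, daily_eligible_users, min_days=14):
    """Days needed to fill all arms, never shorter than the two-week minimum."""
    days = math.ceil(n_per_arm * arms / daily_eligible_users)
    return max(days, min_days)


if __name__ == "__main__":
    n = sample_size_per_arm(baseline=0.05, relative_mde=0.10)
    print("users per arm:", n)
    print("days:", duration_days(n, arms=2, daily_eligible_users=2000))
```

Because the required sample grows with the inverse square of the absolute effect, halving the MDE roughly quadruples the sample. That is why low-traffic surfaces often need a proxy metric or a different experiment type.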
91
Does it follow best practices? 88%
Impact: 96% (1.14x average score across 3 eval scenarios)
Passed: No known issues
Full experiment design output format and decision framework
Correct output filename: 100%, 100%
Structured hypothesis format: 50%, 100%
Specific variant definition: 100%, 100%
Assignment unit reasoning: 100%, 100%
Traffic split justification: 83%, 100%
Target segment dimensions: 100%, 100%
Single primary metric: 100%, 100%
MDE and baseline stated: 100%, 100%
Sample size and duration calculation: 100%, 100%
Guardrail metrics named: 100%, 100%
Minimum 2-week window: 100%, 100%
Statsig flag named: 20%, 100%
New events specified: 100%, 100%
BQ query or table referenced: 0%, 100%
Pre-defined decision framework: 100%, 100%
Low-traffic surface and proxy metric handling
Traffic problem flagged: 100%, 100%
Low-traffic alternatives offered: 80%, 100%
Alternative experiment type recommended: 10%, 50%
Instrumentation gap identified: 100%, 100%
Proxy metric documented: 40%, 100%
Single primary metric: 100%, 20%
Guardrail metrics included: 100%, 100%
Decision framework present: 100%, 100%
Assumptions named explicitly: 75%, 87%
Minimum 2-week window: 100%, 100%
Output file correct name: 100%, 100%
Feasibility assessment section: 100%, 100%
Structured critique and multivariate experiment design
Critique section present: 100%, 100%
Internal validity threat flagged: 100%, 100%
Novelty effect considered: 100%, 100%
Guardrail sufficiency addressed: 62%, 100%
Metric gaming risk assessed: 100%, 100%
Post-hoc decision framework flagged: 100%, 100%
Pre-committed decision framework present: 100%, 100%
Multivariate justification: 85%, 100%
New events instrumented: 100%, 100%
Statsig flag specified: 0%, 100%
Sample size covers all arms: 100%, 100%
MDE conservatism assessed: 42%, 100%
Output file correct name: 100%, 100%
Pre-experiment checklist: 100%, 100%
d8db811