experiment-design

A discipline for designing experiments (A/B tests, multivariate, holdouts) so the results actually answer the question you asked. Hypothesis writing, sample size, duration, segment analysis, interpretation, decision-making, and the common failure modes that produce confidently wrong shipping decisions.

Quality

—

Does it follow best practices?

Impact

—

No eval scenarios have been run

Security

Passed

No known issues

Quality

Content

85%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

A well-structured, actionable playbook with clear lifecycle sequencing and excellent progressive disclosure through real, one-level-deep references. The main weakness is verbosity: substantial narrative and motivational prose could be trimmed to improve token efficiency.

Suggestions

Tighten the opening framing and per-section lead-ins (e.g. "This is a section many discussions skip. Worth being direct about.") to reduce motivational prose that does not add actionable guidance.

Convert the dense narrative paragraphs in sections like Sample size and Network effects into tighter bulleted decision rules to cut tokens while preserving the concrete thresholds.

Dimension	Reasoning	Score
Conciseness	The body delivers domain-specific judgment Claude would not already know, but it is prose-heavy with motivational framing ("The default state of experimentation in most companies is sloppy...") that could be tightened without losing substance.	2 / 3
Actionability	Concrete, executable guidance throughout: specific thresholds (5% absolute MDE, 80% power, 14-day UI minimum), a worked good-vs-bad hypothesis with real numbers, and decision rules like "default to two-sided" and "do not ship on a violated guardrail."	3 / 3
Workflow Clarity	A clearly sequenced 12-consideration lifecycle from pre-experiment readiness through decision-making, reinforced by referenced checklists and an explicit pre-commitment discipline that acts as a validation checkpoint.	3 / 3
Progressive Disclosure	The SKILL.md body is an overview that points to seven well-signaled, one-level-deep reference files (all verified present), with inline links and a dedicated "Reference files" index for easy navigation.	3 / 3
	Total	11 / 12 Passed
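
To make the thresholds cited under Actionability concrete (80% power, a two-sided test, an absolute MDE), here is a minimal sketch of the per-arm sample size those inputs imply. It uses the standard normal-approximation formula for comparing two proportions, which is not necessarily the method the skill itself prescribes. The function name `sample_size_per_arm` and the 20% baseline conversion rate in the usage lines are hypothetical and do not come from the reviewed skill.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline, mde_abs, alpha=0.05, power=0.80):
    """Approximate users needed per arm for a two-sided two-proportion z-test.

    baseline: control conversion rate (an assumed input, e.g. 0.20)
    mde_abs:  minimum detectable effect in absolute terms (0.05 = 5 points)
    """
    p1 = baseline
    p2 = baseline + mde_abs
    if mde_abs == 0:
        raise ValueError("mde_abs must be non-zero")
    if not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("baseline and baseline + mde_abs must lie in (0, 1)")

    # Critical values from the standard normal distribution.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)

    # Pooled rate under the null, separate variances under the alternative.
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / mde_abs ** 2)


# Hypothetical usage: 20% baseline, 5-point absolute MDE, 80% power.
n = sample_size_per_arm(baseline=0.20, mde_abs=0.05)
print(f"users per arm: {n}")
```

Halving the MDE roughly quadruples the required sample, which is why a skill that pins the MDE before launch makes the duration question answerable.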

Description

82%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

A strong, specific description that names concrete capabilities and natural trigger terms with a clear niche. Its main weakness is the absence of an explicit "Use when..." clause, which leaves the invocation trigger implied rather than stated.

Suggestions

Append an explicit trigger clause, e.g. "Use when designing or interpreting A/B tests, multivariate experiments, or holdouts, or when deciding sample size, duration, or whether to ship a result."

Add a few natural variation terms users might say (e.g. "experimentation," "test results," "statistical significance") to broaden trigger coverage.

Dimension	Reasoning	Score
Specificity	Lists multiple concrete actions such as "Hypothesis writing, sample size, duration, segment analysis, interpretation, decision-making" rather than vague abstractions.	3 / 3
Completeness	It clearly states what the skill does (designing experiments so results answer the question) but lacks any explicit "Use when..." trigger clause, so the "when" is only implied and completeness is capped at 2 per the rubric.	2 / 3
Trigger Term Quality	Natural terms a user would actually say appear directly: "A/B tests, multivariate, holdouts" alongside "experiments," giving good coverage of common phrasings.	3 / 3
Distinctiveness / Conflict Risk	The experiment-design niche is specific, and the triggers (A/B tests, multivariate, holdouts) are distinct enough that the skill is unlikely to be selected for requests meant for another skill.	3 / 3
	Total	11 / 12 Passed

Validation

93%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 15 / 16 Passed

Validation for skill structure

Criteria	Description	Result
frontmatter_unknown_keys	Unknown frontmatter key(s) found; consider removing or moving to metadata	Warning
	Total	15 / 16 Passed

Repository: rampstackco/claude-skills
Commit: f1bc195

Reviewed: 14 days ago
