Write a hypothesis, define success metrics, and plan a holdout strategy. Use when designing A/B tests or experiment plans.
Optimize this skill with Tessl: `npx tessl skill review --optimize ./product-skills/skills/craft-experiment-design/SKILL.md`

Quality
Discovery — 85%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a solid skill description that clearly communicates specific capabilities and includes an explicit 'Use when' trigger clause. Its main weakness is limited trigger term coverage—users might describe their need using terms like 'split test', 'experimentation framework', or 'sample size planning' that aren't captured. Overall it's well-structured and concise.
Suggestions
- Expand trigger terms to include common variations like 'split test', 'experimentation', 'control group', 'sample size', or 'statistical testing'.
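Concretely, the expanded description might read as follows. This is a sketch only: the field names follow the common SKILL.md frontmatter convention, and the exact schema should be checked against the Tessl spec.

```yaml
---
name: craft-experiment-design
description: >
  Write a hypothesis, define success metrics, and plan a holdout strategy.
  Use when designing A/B tests, split tests, experiment plans, control
  groups, sample-size planning, or statistical testing.
---
```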
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists three concrete actions: 'Write a hypothesis', 'define success metrics', and 'plan a holdout strategy'. These are clear, actionable capabilities. | 3 / 3 |
| Completeness | Clearly answers both 'what' (write hypothesis, define success metrics, plan holdout strategy) and 'when' (explicit 'Use when designing A/B tests or experiment plans'). | 3 / 3 |
| Trigger Term Quality | Includes good terms like 'A/B tests' and 'experiment plans', but misses common variations users might say, such as 'split test', 'experimentation', 'control group', 'statistical significance', 'sample size', or 'test design'. | 2 / 3 |
| Distinctiveness / Conflict Risk | The combination of A/B testing, hypothesis writing, success metrics, and holdout strategy creates a clear niche that is unlikely to conflict with other skills. The domain is well-defined and specific. | 3 / 3 |
| Total | | 11 / 12 — Passed |
Implementation — 29%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill reads more like a prompt template to copy-paste than actionable skill guidance for Claude. It over-explains experimentation concepts Claude already knows, lacks concrete examples of good experiment designs, and provides no workflow for iterating on or validating the plan. The structure is clean but the content doesn't earn its token cost.
Suggestions
- Remove the explanatory framing and instead provide a concrete example: show a sample input (e.g., 'Test a new checkout flow') and the expected output experiment plan, so Claude knows the exact format and depth expected.
- Add a brief workflow: e.g., 1) Draft hypothesis, 2) Identify metrics, 3) Estimate sample size using provided traffic numbers, 4) Review assumptions—with explicit checkpoints like 'Confirm primary metric is measurable before proceeding.'
- Strip the prompt template format and rewrite as direct instructions to Claude—Claude doesn't need to be told 'You are an experienced product manager'; instead, specify the output structure and quality bar directly.
- Add a concrete example of a guardrail metric selection or duration calculation to make the guidance actionable rather than descriptive.
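The sample-size and duration arithmetic these suggestions call for fits in a few lines. A minimal sketch, assuming a two-proportion z-test with an even traffic split; the function names and defaults are illustrative, not part of the skill under review:

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline: float, mde: float,
                        alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate per-arm sample size for a two-proportion z-test.

    baseline: current conversion rate (e.g. 0.10)
    mde: minimum detectable effect, absolute (e.g. 0.02)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    p_bar = baseline + mde / 2  # average rate across arms under the alternative
    n = 2 * (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) / mde ** 2
    return ceil(n)


def duration_days(n_per_arm: int, daily_traffic: int, arms: int = 2) -> int:
    """Days needed to fill every arm, given total eligible daily traffic."""
    return ceil(arms * n_per_arm / daily_traffic)
```

For example, detecting a 2-point absolute lift on a 10% baseline needs roughly 3,800 visitors per arm, so a site with 1,000 eligible visitors a day would run the two-arm test for about eight days.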
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is verbose and explains concepts Claude already knows well: what an A/B test is, what a hypothesis looks like, what guardrail metrics are. The introductory paragraph ('You want to run an A/B test but need to get the plan straight first') adds no value, and the prompt template re-explains experimentation fundamentals Claude inherently understands. | 1 / 3 |
| Actionability | The prompt template provides a structured numbered list of what to produce, which gives some concrete guidance. However, it is essentially a prompt to paste rather than executable steps, and there are no example inputs/outputs showing what a good experiment plan looks like. | 2 / 3 |
| Workflow Clarity | There is no clear workflow or sequence of steps. The skill is a single prompt template with no validation checkpoints, no iterative refinement process, and no guidance on what to do after generating the plan (e.g., review with stakeholders, validate assumptions with data). For experiment design, which involves multiple decision points, this lack of sequencing is a significant gap. | 1 / 3 |
| Progressive Disclosure | This is a simple, single-purpose skill under 50 lines with no need for external references. The content is organized into clear sections (prompt template, tips), which is appropriate for its scope. | 3 / 3 |
| Total | | 7 / 12 — Passed |
Validation — 90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

10 / 11 checks passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing them or moving them to `metadata` | Warning |
| Total | 10 / 11 — Passed | |
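One way to clear the `frontmatter_unknown_keys` warning, sketched with a hypothetical `owner` key — the report does not show which keys actually triggered it:

```yaml
# Before: a top-level key the spec does not recognize
owner: product-team

# After: unrecognized keys nested under the metadata block
metadata:
  owner: product-team
```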