Write a hypothesis, define success metrics, and plan a holdout strategy. Use when designing A/B tests or experiment plans.
Optimize this skill with Tessl: `npx tessl skill review --optimize ./product-skills/skills/craft-experiment-design/SKILL.md`

Quality
Discovery — 85%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a solid skill description that clearly communicates specific capabilities and includes an explicit 'Use when' trigger clause. Its main weakness is limited trigger term coverage—users might describe their need using terms like 'split test', 'experimentation framework', or 'sample size planning' that aren't captured. Overall it's well-structured and concise.
Suggestions
- Expand trigger terms to include common variations like 'split test', 'experimentation', 'control group', 'sample size', or 'statistical testing'.
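Concretely, the expanded description might read as follows. This is a sketch only: the field names follow the common SKILL.md frontmatter convention, and the exact schema should be checked against the Tessl spec.

```yaml
---
name: craft-experiment-design
description: >
  Write a hypothesis, define success metrics, and plan a holdout strategy.
  Use when designing A/B tests, split tests, experiment plans, control
  groups, sample-size planning, or statistical testing.
---
```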
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists three concrete actions: 'Write a hypothesis', 'define success metrics', and 'plan a holdout strategy'. These are clear, actionable capabilities. | 3 / 3 |
| Completeness | Clearly answers both 'what' (write hypothesis, define success metrics, plan holdout strategy) and 'when' (explicit 'Use when designing A/B tests or experiment plans'). | 3 / 3 |
| Trigger Term Quality | Includes good terms like 'A/B tests' and 'experiment plans', but misses common variations users might say, such as 'split test', 'experimentation', 'control group', 'statistical significance', 'sample size', or 'test design'. | 2 / 3 |
| Distinctiveness / Conflict Risk | The combination of A/B testing, hypothesis writing, success metrics, and holdout strategy creates a clear niche that is unlikely to conflict with other skills. The domain is well-defined and specific. | 3 / 3 |
| Total | | 11 / 12 — Passed |
Implementation — 29%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill reads more like a prompt template to copy-paste than actionable skill guidance for Claude. It over-explains experimentation concepts Claude already knows, lacks concrete examples of good experiment designs, and provides no workflow for iterating on or validating the plan. The structure is clean but the content doesn't earn its token cost.
Suggestions
- Remove the explanatory framing and instead provide a concrete example: show a sample input (e.g., 'Test a new checkout flow') and the expected output experiment plan, so Claude knows the exact format and depth expected.
- Add a brief workflow: e.g., 1) Draft hypothesis, 2) Identify metrics, 3) Estimate sample size using provided traffic numbers, 4) Review assumptions—with explicit checkpoints like 'Confirm primary metric is measurable before proceeding.'
- Strip the prompt template format and rewrite as direct instructions to Claude—Claude doesn't need to be told 'You are an experienced product manager'; instead, specify the output structure and quality bar directly.
- Add a concrete example of a guardrail metric selection or duration calculation to make the guidance actionable rather than descriptive.
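The sample-size and duration arithmetic these suggestions call for fits in a few lines. A minimal sketch, assuming a two-proportion z-test with an even traffic split; the function names and defaults are illustrative, not part of the skill under review:

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline: float, mde: float,
                        alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate per-arm sample size for a two-proportion z-test.

    baseline: current conversion rate (e.g. 0.10)
    mde: minimum detectable effect, absolute (e.g. 0.02)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    p_bar = baseline + mde / 2  # average rate across arms under the alternative
    n = 2 * (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) / mde ** 2
    return ceil(n)


def duration_days(n_per_arm: int, daily_traffic: int, arms: int = 2) -> int:
    """Days needed to fill every arm, given total eligible daily traffic."""
    return ceil(arms * n_per_arm / daily_traffic)
```

For example, detecting a 2-point absolute lift on a 10% baseline needs roughly 3,800 visitors per arm, so a site with 1,000 eligible visitors a day would run the two-arm test for about eight days.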
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is verbose and explains concepts Claude already knows well: what an A/B test is, what a hypothesis looks like, what guardrail metrics are. The introductory paragraph ('You want to run an A/B test but need to get the plan straight first') adds no value, and the prompt template re-explains experimentation fundamentals Claude inherently understands. | 1 / 3 |
| Actionability | The prompt template provides a structured numbered list of what to produce, which gives some concrete guidance. However, it is essentially a prompt to paste rather than executable steps, and there are no example inputs/outputs showing what a good experiment plan looks like. | 2 / 3 |
| Workflow Clarity | There is no clear workflow or sequence of steps. The skill is a single prompt template with no validation checkpoints, no iterative refinement process, and no guidance on what to do after generating the plan (e.g., review with stakeholders, validate assumptions with data). For experiment design, which involves multiple decision points, this lack of sequencing is a significant gap. | 1 / 3 |
| Progressive Disclosure | This is a simple, single-purpose skill under 50 lines with no need for external references. The content is organized into clear sections (prompt template, tips), which is appropriate for its scope. | 3 / 3 |
| Total | | 7 / 12 — Passed |
Validation — 90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

10 / 11 checks passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing them or moving them to `metadata` | Warning |
| Total | 10 / 11 — Passed | |
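One way to clear the `frontmatter_unknown_keys` warning, sketched with a hypothetical `owner` key — the report does not show which keys actually triggered it:

```yaml
# Before: a top-level key the spec does not recognize
owner: product-team

# After: unrecognized keys nested under the metadata block
metadata:
  owner: product-team
```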