Content
77%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a strong, highly actionable experiment design skill with excellent workflow clarity and a well-structured multi-step process including validation gates and a pre-finalization critique. Its main weaknesses are moderate verbosity (repeating key points across sections, explaining concepts Claude already knows) and a monolithic structure that could benefit from splitting reference material into separate files. The worked example and decision framework template are particularly valuable.
Suggestions
Trim repeated points — the 'define decision framework before results' message appears in three places (Step 3k, Structured Critique, Common Issues); consolidate to one authoritative statement.
Remove explanatory asides that Claude already knows (e.g., what novelty effects are, what p-hacking means) and replace with just the actionable instruction (e.g., 'Check for novelty effect risk' rather than explaining what it is).
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is thorough but includes some unnecessary verbosity: philosophical framing about poorly designed experiments, explanations of concepts Claude already knows (what p-hacking is, what novelty effects are), and repeated emphasis on the same points (e.g., 'define decision framework before, not after' appears in Step 3k, the critique section, and Common Issues). The content could be tightened by ~30% without losing actionability. | 2 / 3 |
| Actionability | The skill provides highly concrete, executable guidance: specific hypothesis templates, exact decision tree logic, named tools (Statsig, BigQuery, Mixpanel), specific metrics (S2O, C2O, CVR, GBV, ARPU, CM1), sample size calculation parameters, a complete output template with checklist, and a worked example with the scarcity booster. Every step tells Claude exactly what to produce. | 3 / 3 |
| Workflow Clarity | The workflow is clearly sequenced (Steps 1 → 2 → 2.5 → 3 → Critique → Output) with explicit validation checkpoints: Step 2 validates experiment readiness before design, Step 2.5 uses AskUserQuestion to surface blind spots with clear completion criteria, and the Structured Critique section serves as a review gate before finalizing. The decision framework itself includes feedback loops for inconclusive results. | 3 / 3 |
| Progressive Disclosure | The skill references `${CLAUDE_PLUGIN_ROOT}/CLAUDE.md` for context loading, which is appropriate. However, the content is monolithic: all experiment design details, the critique framework, common issues, and the example are inline in a single long file. The structured critique section and common issues could reasonably be split into separate reference files, and no bundle files exist to support this. | 2 / 3 |
| Total | | 10 / 12 Passed |