Design, run, and learn from experiments that test your riskiest assumptions. Handles the full experiment lifecycle — from designing the test to recording results to propagating what you learned back into the opportunity space.
51%: Does it follow best practices?
Impact: not scored (no eval scenarios have been run)
Passed: no known issues
Optimize this skill with Tessl:
npx tessl skill review --optimize ./discovery/skills/experiment/SKILL.md
Quality
Discovery: 32%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description communicates a reasonable high-level concept around experiment design and learning but lacks the concrete specificity and explicit trigger guidance needed for reliable skill selection. It uses appropriate third-person voice but relies on abstract product-management terminology without grounding in natural user language or clear 'Use when' triggers.
Suggestions
Add an explicit 'Use when...' clause with natural trigger terms, e.g., 'Use when the user wants to test a hypothesis, validate an assumption, design an experiment, or run a prototype test.'
Include more concrete actions and outputs, e.g., 'Creates experiment briefs, defines success metrics, documents results in structured templates, and updates the opportunity solution tree with findings.'
Add natural keyword variations users might say, such as 'hypothesis', 'validate', 'A/B test', 'test plan', 'experiment results', to improve trigger term coverage.
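The suggestions above can be turned into a quick self-check before re-running the review. The following is a minimal sketch, not Tessl's actual scoring logic: the function name `check_description` and the term list are assumptions drawn from the keywords the review itself proposes.

```python
import re
from typing import Dict, List, Union

# Trigger terms copied from the review's suggestion; the list is illustrative,
# not the rubric the reviewer actually applies.
TRIGGER_TERMS: List[str] = [
    "hypothesis", "validate", "a/b test", "test plan", "experiment results",
]


def check_description(description: str) -> Dict[str, Union[bool, List[str]]]:
    """Report whether a description has a 'Use when' clause and which trigger terms it contains."""
    text = description.lower()
    return {
        "has_use_when": re.search(r"\buse when\b", text) is not None,
        "found_terms": [t for t in TRIGGER_TERMS if t in text],
        "missing_terms": [t for t in TRIGGER_TERMS if t not in text],
    }


# First sentence of the current description, as quoted at the top of this page.
current = "Design, run, and learn from experiments that test your riskiest assumptions."

# The same sentence with the 'Use when' clause the review suggests appended.
revised = current + (
    " Use when the user wants to test a hypothesis, validate an assumption,"
    " design an experiment, or run a prototype test."
)

print(check_description(current))
print(check_description(revised))
```

A plain substring match is deliberately crude. It only flags obvious gaps and says nothing about how the real discovery score is computed.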
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description names a domain (experiments/assumption testing) and mentions some actions (design, run, learn, recording results, propagating learnings), but the actions are fairly high-level and lack concrete specifics about what formats, tools, or outputs are involved. | 2 / 3 |
| Completeness | The description covers 'what' (design/run/learn from experiments) but has no explicit 'Use when...' clause or equivalent trigger guidance, which per the rubric should cap completeness at 2. Additionally, the 'when' is entirely missing (not even clearly implied), so it scores a 1. | 1 / 3 |
| Trigger Term Quality | Terms like 'experiments', 'assumptions', 'opportunity space' are somewhat relevant but lean toward product management jargon. Missing common natural variations users might say like 'hypothesis testing', 'validate idea', 'A/B test', 'prototype test', or 'user research'. | 2 / 3 |
| Distinctiveness Conflict Risk | The experiment lifecycle focus provides some distinctiveness, but terms like 'riskiest assumptions' and 'opportunity space' are broad enough to overlap with strategy, product discovery, or research skills. Without explicit trigger boundaries, conflict risk remains moderate. | 2 / 3 |
| Total | | 7 / 12 Passed |
Implementation: 70%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured instruction-only skill that clearly defines the experiment lifecycle workflow, transitions to other skills, and boundaries of responsibility. Its main weakness is moderate verbosity in framing sections and reliance on external references for concrete execution details, making the skill itself more of a philosophical guide than a hands-on playbook. The progressive disclosure and workflow clarity are strong points.
Suggestions
Trim the opening paragraph and 'Your Stance' section — Claude doesn't need motivational framing like 'You are the empirical engine' or 'This is where you earn your keep.'
Add a concrete example of a complete experiment record (even abbreviated) inline, showing the flow from assumption → experiment design → result → assumption update, rather than deferring all concrete details to referenced files.
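To illustrate what such an inline example might look like, here is a minimal sketch of one experiment record moving from assumption to result to assumption update. The class names, field names, and status values are hypothetical; the skill's real schemas live in experiment-record.md and assumption.md, which this review does not reproduce. The success criterion is the one quoted in the review; the assumption text and the observed result are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Assumption:
    statement: str
    status: str = "untested"  # hypothetical statuses: untested, supported, refuted


@dataclass
class ExperimentRecord:
    assumption: Assumption
    design: str
    successes_needed: int  # success threshold, fixed before the experiment runs
    sample_size: int
    action_if_pass: str    # pre-committed next action if the criterion is met
    action_if_fail: str    # pre-committed next action if it is not
    successes_observed: Optional[int] = None

    def record_result(self, successes_observed: int) -> str:
        """Store the result, update the assumption, and return the pre-committed action."""
        self.successes_observed = successes_observed
        passed = successes_observed >= self.successes_needed
        self.assumption.status = "supported" if passed else "refuted"
        return self.action_if_pass if passed else self.action_if_fail


# Illustrative data only: the assumption and the observed count are made up.
assumption = Assumption("Users can complete the task without guidance")
record = ExperimentRecord(
    assumption=assumption,
    design="Unmoderated prototype test",
    successes_needed=7,  # "7 of 10 users complete the task without asking for help"
    sample_size=10,
    action_if_pass="Proceed with the current design",
    action_if_fail="Revisit the parent idea before building further",
)

next_action = record.record_result(successes_observed=5)
print(assumption.status, "->", next_action)
```

Writing the threshold and both follow-up actions into the record before the result exists is what makes the pre-commitment checkable afterwards.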
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The content is mostly efficient but includes some unnecessary philosophical framing ('You are the empirical engine of the discovery system') and explanatory prose that Claude doesn't need. The 'Your Stance' section, while useful, could be tighter. The outcome propagation and transitions sections are well-structured but slightly verbose. | 2 / 3 |
| Actionability | The skill provides concrete guidance on success criteria and pre-commitment with good examples ('7 of 10 users complete the task without asking for help'), and clear instructions for outcome propagation. However, it lacks executable code/commands and relies heavily on references for the actual how-to (design, lifecycle tracking). The skill itself is more directional than copy-paste actionable. | 2 / 3 |
| Workflow Clarity | The experiment lifecycle is clearly sequenced: design with success criteria → pre-commit actions → run → record results → update assumption → review parent ideas → check shared assumptions → suggest next actions. The outcome propagation section provides an explicit multi-step process with validation-like checkpoints (review impact, check shared assumptions). Transitions to other skills are clearly defined with trigger conditions. | 3 / 3 |
| Progressive Disclosure | The skill provides a clear overview with well-signaled one-level-deep references: design principles in references/design-experiment.md, lifecycle tracking in references/experiment-lifecycle.md, schemas in experiment-record.md and assumption.md, and the artifacts skill for writing guidance. Content is appropriately split between the overview and referenced files. | 3 / 3 |
| Total | | 10 / 12 Passed |
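The lifecycle sequence described under Workflow Clarity can be written down as an ordered list with a helper that returns the next outstanding step. This is only a sketch under the assumption that the steps are strictly sequential; the step labels are taken from the review's wording, while `LIFECYCLE` and `next_step` are hypothetical names that do not come from the skill itself.

```python
from typing import List, Optional

# Steps in the order the review lists them.
LIFECYCLE: List[str] = [
    "design with success criteria",
    "pre-commit actions",
    "run",
    "record results",
    "update assumption",
    "review parent ideas",
    "check shared assumptions",
    "suggest next actions",
]


def next_step(completed: List[str]) -> Optional[str]:
    """Return the first lifecycle step not yet completed, or None when all are done."""
    for step in LIFECYCLE:
        if step not in completed:
            return step
    return None


print(next_step(["design with success criteria", "pre-commit actions", "run"]))
```

Because the helper always returns the earliest missing step, a skipped pre-commitment resurfaces before any later step is suggested.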
Validation: 81%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 9 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 9 / 11 Passed | |
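The two warnings can be approximated locally before re-running the validator. The sketch below is an assumption-heavy illustration: the real validator's list of accepted keys and tool names is not shown in this report, so `KNOWN_KEYS` and `KNOWN_TOOLS` are placeholders, the sample frontmatter is invented, and the parser only handles flat `key: value` lines rather than full YAML.

```python
import re
from typing import Dict, List, Tuple

# Placeholder allow-lists: assumptions for illustration, not the validator's real lists.
KNOWN_KEYS = {"name", "description", "license", "metadata", "allowed-tools"}
KNOWN_TOOLS = {"Read", "Write", "Edit", "Bash", "Grep", "Glob"}


def parse_frontmatter(skill_md: str) -> Dict[str, str]:
    """Collect top-level 'key: value' pairs between the first two '---' lines."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    pairs: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        # Skip indented (nested) lines; only top-level keys are checked.
        if line and not line[0].isspace() and ":" in line:
            key, value = line.split(":", 1)
            pairs[key.strip()] = value.strip()
    return pairs


def frontmatter_warnings(skill_md: str) -> List[Tuple[str, List[str]]]:
    """Return (check_name, offending_items) pairs mirroring the two warnings above."""
    fm = parse_frontmatter(skill_md)
    warnings: List[Tuple[str, List[str]]] = []
    unknown = sorted(k for k in fm if k not in KNOWN_KEYS)
    if unknown:
        warnings.append(("frontmatter_unknown_keys", unknown))
    tools = [t for t in re.split(r"[,\s]+", fm.get("allowed-tools", "")) if t]
    unusual = [t for t in tools if t not in KNOWN_TOOLS]
    if unusual:
        warnings.append(("allowed_tools_field", unusual))
    return warnings


# Invented sample; the actual SKILL.md frontmatter is not reproduced in this report.
sample = """---
name: experiment
description: Design, run, and learn from experiments.
allowed-tools: Read, Write, Teleport
stance: empirical
---
# Experiment
"""

print(frontmatter_warnings(sample))
```

For a real check, a proper YAML parser and the validator's published key list would replace the placeholders.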
632c389