Summarize experiment results, call a winner, and draft a stakeholder-ready recommendation. Use when an A/B test is complete and you need to communicate results.
67
60%
Does it follow best practices?
Impact
Pending
No eval scenarios have been run
Passed
No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./product-skills/skills/craft-experiment-readout/SKILL.md
Quality
Discovery
85%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a well-structured description that clearly communicates specific capabilities and includes an explicit 'Use when' trigger clause. Its main weakness is moderate trigger term coverage—it could benefit from additional natural keywords users might use when referring to A/B testing or experiment analysis. The description is concise and avoids fluff while remaining informative.
Suggestions
Add more natural trigger terms like 'split test', 'variant analysis', 'statistical significance', 'conversion rate', or 'experiment report' to improve keyword coverage.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: 'summarize experiment results', 'call a winner', and 'draft a stakeholder-ready recommendation'. These are clear, actionable capabilities. | 3 / 3 |
| Completeness | Clearly answers both what ('summarize experiment results, call a winner, draft a stakeholder-ready recommendation') and when ('when an A/B test is complete and you need to communicate results') with an explicit 'Use when' clause. | 3 / 3 |
| Trigger Term Quality | Includes 'A/B test', 'experiment results', and 'communicate results' which are relevant, but misses common variations like 'split test', 'variant', 'conversion rate', 'statistical significance', or 'test analysis'. | 2 / 3 |
| Distinctiveness / Conflict Risk | The combination of A/B testing, winner calling, and stakeholder recommendation drafting creates a clear niche that is unlikely to conflict with general data analysis or reporting skills. | 3 / 3 |
| Total | | 11 / 12 Passed |
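The trigger-term gap noted in the table can be illustrated with a small coverage check. This is a minimal sketch, assuming plain case-insensitive substring matching; the function name `missing_trigger_terms` is hypothetical and this is not how the Tessl reviewer computes its score. The candidate terms are the ones listed in the suggestion above, plus the existing 'A/B test'.

```python
def missing_trigger_terms(description, candidate_terms):
    """Return candidate terms that do not appear in the description.

    Assumption: simple case-insensitive substring matching is a good
    enough proxy for "a user might type this phrase".
    """
    text = description.lower()
    return [term for term in candidate_terms if term.lower() not in text]


# The skill description as it appears on this page.
description = (
    "Summarize experiment results, call a winner, and draft a "
    "stakeholder-ready recommendation. Use when an A/B test is complete "
    "and you need to communicate results."
)

# Terms taken from the reviewer's suggestion, plus one already covered.
candidates = [
    "A/B test",
    "split test",
    "variant analysis",
    "statistical significance",
    "conversion rate",
    "experiment report",
]

print(missing_trigger_terms(description, candidates))
# -> ['split test', 'variant analysis', 'statistical significance',
#     'conversion rate', 'experiment report']
```

Any term the check reports as missing is a candidate for the description, provided it still reads naturally once added.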
Implementation
35%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is essentially a prompt template wrapped in markdown, which adds limited value beyond what Claude can already do when asked to summarize an A/B test. It lacks concrete examples of input data and expected output, contains unnecessary motivational/explanatory text, and provides no validation steps for statistical claims. The structured output format is helpful but the skill would benefit significantly from an example readout and tighter writing.
Suggestions
Remove filler text like 'No stats degree required' and 'The experiment is done and the data is in' — these waste tokens and patronize Claude.
Add a concrete example showing sample input data (metrics, sample sizes, p-values) and the expected readout output, so Claude has a clear reference for quality and format.
Add a validation step: after generating the readout, verify that statistical significance claims are consistent with the provided confidence intervals and sample sizes.
Trim the prompt template to focus only on what Claude wouldn't already know — the specific output structure and audience-matching requirements — rather than explaining what an experiment readout is.
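The validation step suggested above could look roughly like the following. This is a sketch under stated assumptions, not part of the reviewed skill: it assumes a conversion-rate metric with two arms, uses a two-sided two-proportion z-test, and the function names and sample numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_check(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Recompute the p-value and confidence interval for rate(B) - rate(A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a

    # Pooled standard error for the hypothesis test (H0: no difference).
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se_pooled == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error for the interval around the observed difference.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)
    return {"diff": diff, "p_value": p_value, "ci": ci}


def claim_is_consistent(claimed_significant, result, alpha=0.05):
    """Check a readout's 'significant' claim against the recomputed numbers.

    Conservative by design: the claim is only treated as supported when
    p < alpha AND the confidence interval excludes zero.
    """
    low, high = result["ci"]
    ci_excludes_zero = low > 0 or high < 0
    supported = result["p_value"] < alpha and ci_excludes_zero
    return claimed_significant == supported


# Hypothetical data: 500/10,000 conversions in control, 580/10,000 in variant.
result = two_proportion_check(500, 10_000, 580, 10_000)
print(result)
print(claim_is_consistent(True, result))
```

If `claim_is_consistent` returns `False`, the readout's wording and the data disagree, and the draft should be revised before it reaches stakeholders.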
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The content is verbose with unnecessary hand-holding ('No stats degree required', 'The experiment is done and the data is in'). It explains concepts Claude already knows and includes filler phrases. The prompt template itself is largely a description of what a good experiment readout looks like, something Claude already knows how to produce. | 1 / 3 |
| Actionability | The prompt template provides a structured numbered list of sections to produce, which gives some concrete guidance. However, it's essentially a prompt-within-a-prompt with no executable code, no example input/output, and no concrete example of what a good readout looks like. The placeholders are vague. | 2 / 3 |
| Workflow Clarity | The 10-step numbered list provides a clear sequence for the output structure, but there are no validation checkpoints: no step to verify statistical calculations, no feedback loop for checking if confidence intervals are correctly interpreted, and no guidance on what to do if data is incomplete or malformed. | 2 / 3 |
| Progressive Disclosure | The content is organized into sections (Prompt Template, Tips) which provides some structure. However, the entire skill is a single monolithic file with no references to supporting materials. The tips section could be better integrated, and there's no separation between the template and example outputs or advanced customization. | 2 / 3 |
| Total | | 7 / 12 Passed |
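The Workflow Clarity row notes that the skill gives no guidance for incomplete or malformed data. One possible pre-flight check is sketched below. The input shape (a dict of arm name to `users` and `conversions` counts) and the function name `find_input_problems` are assumptions for illustration; the skill itself does not define an input schema.

```python
def find_input_problems(arms):
    """Return a list of human-readable problems; an empty list means OK.

    Assumed input shape (hypothetical):
        {"control": {"users": 10000, "conversions": 500}, ...}
    """
    problems = []
    if len(arms) < 2:
        problems.append("need at least two arms (control and one variant)")

    for name, arm in arms.items():
        users = arm.get("users")
        conversions = arm.get("conversions")

        # Missing fields: stop checking this arm, nothing else is meaningful.
        if users is None or conversions is None:
            problems.append(f"{name}: missing 'users' or 'conversions'")
            continue

        if users <= 0:
            problems.append(f"{name}: 'users' must be positive")
        if conversions < 0 or conversions > users:
            problems.append(f"{name}: 'conversions' must be between 0 and 'users'")

    return problems


# Hypothetical malformed input: the variant reports more conversions than users.
data = {
    "control": {"users": 10_000, "conversions": 500},
    "variant": {"users": 10_000, "conversions": 12_000},
}
for problem in find_input_problems(data):
    print(problem)
```

A workflow could run a check like this first and ask the user for corrected data whenever the list is non-empty, rather than drafting a readout from numbers that cannot be right.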
Validation
90%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 10 / 11 Passed | |
221ffaa