Structured guide for setting up A/B tests with mandatory gates for hypothesis, metrics, and execution readiness.
Install with Tessl CLI
npx tessl i github:haniakrim21/everything-claude-code --skill ab-test-setup
60
Does it follow best practices?
If you maintain this skill, you can automatically optimize it using the tessl CLI to improve its score:
npx tessl skill review --optimize ./path/to/skill
Discovery
32%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description identifies a clear domain (A/B testing) and hints at a structured approach with gates, but lacks explicit trigger guidance for when to use the skill. It would benefit from more specific action verbs and natural keyword variations to improve discoverability and reduce ambiguity.
Suggestions
Add a 'Use when...' clause with trigger terms like 'A/B test', 'split test', 'experiment setup', 'test hypothesis', or 'conversion experiment'.
Include common keyword variations such as 'split testing', 'experiment', 'variant testing', 'multivariate' to improve trigger term coverage.
List more specific concrete actions like 'define hypotheses', 'select metrics', 'calculate sample size', 'validate experiment readiness' to clarify capabilities.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (A/B tests) and mentions some actions (setting up, gates for hypothesis, metrics, execution readiness), but doesn't list comprehensive concrete actions like 'define control groups', 'calculate sample sizes', or 'analyze results'. | 2 / 3 |
| Completeness | Describes what it does (structured guide for A/B test setup with gates) but completely lacks a 'Use when...' clause or any explicit trigger guidance for when Claude should select this skill. | 1 / 3 |
| Trigger Term Quality | Includes 'A/B tests' which is a natural term users would say, but misses common variations like 'split testing', 'experiment', 'variant testing', 'conversion testing', or 'multivariate test'. | 2 / 3 |
| Distinctiveness Conflict Risk | The focus on A/B tests with mandatory gates provides some distinctiveness, but 'hypothesis' and 'metrics' are generic terms that could overlap with general experiment planning or analytics skills. | 2 / 3 |
| Total | | 7 / 12 Passed |
Implementation
62%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill excels at workflow clarity with explicit gates and validation checkpoints that prevent common A/B testing mistakes. However, it lacks concrete examples (sample hypotheses, calculation formulas, tool-specific commands) that would make it immediately actionable. The content is moderately concise but includes some motivational filler that doesn't add operational value.
Suggestions
Add a concrete example of a well-formed hypothesis with all required components (observation, change, expectation, audience, success criteria).
Include a sample size calculation example, either with a specific tool or as a statistical formula.
Consider splitting detailed sections (Metrics Definition, Analysis Discipline) into separate reference files to improve progressive disclosure.
Remove or condense philosophical statements, such as the 'Final Reminder' section, that don't provide actionable guidance.
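As an illustration of the sample size suggestion, the following is a minimal sketch of the kind of example the skill could include. It is not taken from the reviewed skill. It assumes a two-sided two-proportion z-test with the usual normal approximation, and the function name `sample_size_per_variant`, the 10% baseline rate, and the 2-percentage-point minimum detectable effect are all hypothetical values chosen for illustration.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, min_detectable_effect,
                            alpha=0.05, power=0.80):
    """Approximate users needed per variant for a two-sided two-proportion z-test.

    baseline_rate: control conversion rate, e.g. 0.10 (hypothetical input).
    min_detectable_effect: absolute lift to detect, e.g. 0.02 for 10% -> 12%.
    """
    p1 = baseline_rate
    p2 = baseline_rate + min_detectable_effect
    if not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("conversion rates must lie strictly between 0 and 1")
    if min_detectable_effect == 0:
        raise ValueError("min_detectable_effect must be non-zero")

    # Critical values of the standard normal distribution.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)

    # Pooled rate under the null hypothesis, separate variances under the alternative.
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


if __name__ == "__main__":
    # Hypothetical scenario: 10% baseline, detect an absolute lift of 2 points.
    print(sample_size_per_variant(0.10, 0.02))
```

A copy-paste helper like this would let an agent turn the skill's readiness gate into a concrete number (required users per variant) before a test is launched. A dedicated calculator or statistics library would serve equally well if the skill prefers to reference a tool.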
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The content is reasonably efficient but includes some unnecessary philosophical statements ('A/B testing is not about proving ideas right') and motivational reminders that Claude doesn't need. The checklists and tables are appropriately dense, but sections like 'Final Reminder' add little actionable value. | 2 / 3 |
| Actionability | Provides clear checklists and decision criteria, but lacks concrete examples of what a good hypothesis looks like, sample size calculations, or specific tool commands. The guidance is procedural but abstract: no executable code or copy-paste templates for tracking setup or analysis. | 2 / 3 |
| Workflow Clarity | Excellent sequential structure with explicit hard gates ('Do NOT proceed until confirmed', 'Execution Readiness Gate'). Clear validation checkpoints before each phase, explicit refusal conditions, and a well-defined feedback loop for assumption violations. | 3 / 3 |
| Progressive Disclosure | Content is well-organized with clear sections and headers, but everything is in a single monolithic file. For a skill of this length (~200 lines), detailed sections like 'Metrics Definition' or 'Analysis Discipline' could be split into referenced files for better navigation. | 2 / 3 |
| Total | | 9 / 12 Passed |
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation: 11 / 11 Passed
Validation for skill structure
No warnings or errors.