Structured guide for setting up A/B tests with mandatory gates for hypothesis, metrics, and execution readiness.
Install with Tessl CLI
npx tessl i github:sickn33/antigravity-awesome-skills --skill ab-test-setup
Quality — 47%
Does it follow best practices?
Impact — 100%
1.09x average score across 3 eval scenarios
Optimize this skill with Tessl
npx tessl skill review --optimize ./skills/ab-test-setup/SKILL.md
Discovery — 32%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description identifies a clear domain (A/B testing) and hints at a structured process with checkpoints, but lacks explicit trigger guidance and comprehensive action details. It would benefit from a 'Use when...' clause and more natural keyword variations to help Claude reliably select this skill when users discuss experimentation.
Suggestions
Add a 'Use when...' clause with trigger terms like 'A/B test', 'split test', 'experiment setup', 'hypothesis validation', 'test metrics'
Expand specific actions beyond 'setting up' to include concrete steps like 'define hypothesis, select metrics, determine sample size, validate statistical significance'
Include common keyword variations users might say: 'experiment', 'split testing', 'variant testing', 'conversion optimization'
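For instance, a revised frontmatter description incorporating these suggestions might look like this (wording illustrative, not the skill's actual frontmatter):

```yaml
name: ab-test-setup
description: >
  Structured guide for setting up A/B tests with mandatory gates for
  hypothesis, metrics, and execution readiness. Use when the user wants
  to run an A/B test, split test, variant test, or experiment: define a
  hypothesis, select metrics, determine sample size, and validate
  execution readiness before launch.
```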
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (A/B tests) and mentions some actions (setting up, gates for hypothesis, metrics, execution readiness), but doesn't list multiple concrete specific actions like 'define control groups, calculate sample sizes, track conversion rates'. | 2 / 3 |
| Completeness | Describes what it does (structured guide for A/B test setup with gates) but completely lacks a 'Use when...' clause or any explicit trigger guidance for when Claude should select this skill. | 1 / 3 |
| Trigger Term Quality | Includes 'A/B tests' which is a natural term users would say, but misses common variations like 'split testing', 'experiment', 'variant testing', 'feature flags', or 'conversion testing'. | 2 / 3 |
| Distinctiveness / Conflict Risk | The A/B testing focus provides some distinctiveness, but 'structured guide' and 'mandatory gates' are generic patterns that could overlap with other process/workflow skills. The specific mention of hypothesis and metrics helps somewhat. | 2 / 3 |
| Total | | 7 / 12 — Passed |
Implementation — 62%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill provides a well-structured A/B testing workflow with strong gating mechanisms and clear sequencing. However, it lacks concrete examples (sample hypotheses, calculation formulas, specific tools) and includes some unnecessary motivational content. The workflow clarity is excellent, but actionability suffers from abstract guidance rather than executable specifics.
Suggestions
Add a concrete example of a well-formed hypothesis with all required components (observation, change, expectation, audience, success criteria)
Include a sample size calculation formula or reference to a specific calculator tool with example inputs/outputs
Remove philosophical statements like 'A/B testing is not about proving ideas right' and 'Final Reminder' section - Claude doesn't need motivation
Consider splitting detailed sections (Metrics Definition, Analysis Discipline) into separate reference files to improve progressive disclosure
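As a sketch of the kind of concrete sample-size guidance the second suggestion asks for, here is a minimal two-proportion z-test calculation using only the Python standard library (the function name and default parameters are illustrative, not part of the skill):

```python
import math
from statistics import NormalDist

def sample_size_per_variant(baseline_rate, mde, alpha=0.05, power=0.8):
    """Approximate per-variant sample size for a two-proportion z-test.

    baseline_rate: control conversion rate (e.g. 0.10 for 10%)
    mde: minimum detectable effect, absolute (0.02 means detect 12% vs 10%)
    """
    treated_rate = baseline_rate + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    pooled_var = (baseline_rate * (1 - baseline_rate)
                  + treated_rate * (1 - treated_rate))
    return math.ceil((z_alpha + z_beta) ** 2 * pooled_var / mde ** 2)

# Detecting a 2-point lift over a 10% baseline:
print(sample_size_per_variant(0.10, 0.02))
```

Embedding a worked example like this (inputs and expected order of magnitude) would make the skill's sample-size gate checkable rather than abstract.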
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The content is reasonably efficient but includes some unnecessary philosophical statements ('A/B testing is not about proving ideas right') and motivational reminders that Claude doesn't need. The checklists and tables are well-structured, but some sections could be tightened. | 2 / 3 |
| Actionability | Provides clear checklists and decision criteria, but lacks concrete examples of what a good hypothesis looks like, no sample size calculation formulas or tools, and no specific code/commands for tracking verification. The guidance is structured but abstract. | 2 / 3 |
| Workflow Clarity | Excellent multi-step workflow with explicit hard gates ('Hypothesis Lock', 'Execution Readiness Gate'), clear sequencing from hypothesis through analysis, and explicit validation checkpoints. The 'Do NOT proceed' statements create clear feedback loops. | 3 / 3 |
| Progressive Disclosure | Content is well-organized with clear sections and headers, but everything is in a single monolithic file. For a skill of this length (~200 lines), detailed sections like 'Metrics Definition' or 'Analysis Discipline' could be split into separate reference files. | 2 / 3 |
| Total | | 9 / 12 — Passed |
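For illustration, a well-formed hypothesis covering the components the review lists (observation, change, expectation, audience, success criteria) might read as follows; every value here is hypothetical:

```text
Observation: Mobile checkout drop-off is 68%, versus 42% on desktop.
Change: Replace the three-step checkout with a single-page checkout on mobile.
Expectation: Checkout completion increases by at least 2 percentage points.
Audience: Mobile web users, excluding sessions already mid-checkout at launch.
Success criteria: p < 0.05 on checkout completion rate after reaching the
pre-computed sample size; no guardrail regression in average order value.
```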
Validation — 90%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 10 / 11 — Passed |