ab-test-setup

When the user wants to plan, design, or implement an A/B test or experiment. Also use when the user mentions "A/B test," "split test," "experiment," "test this change," "variant copy," "multivariate test," or "hypothesis." For tracking implementation, see analytics-tracking.

Quality

62%

Does it follow best practices?

Impact

—

No eval scenarios have been run

Securityby

Passed

No known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./config/claude/skills/ab-test-setup/SKILL.md

Quality

Discovery

62%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description excels at trigger term coverage and distinctiveness, including a helpful cross-reference to a related skill. However, it is notably weak on specificity—it reads more like a routing rule than a skill description, telling Claude when to use it but barely explaining what concrete actions or outputs the skill provides. Adding specific capabilities would significantly improve it.

Suggestions

Add concrete actions the skill performs, e.g., 'Generates experiment hypotheses, calculates required sample sizes, creates variant copy, designs test plans, and analyzes statistical significance of results.'

Restructure to lead with specific capabilities before the trigger clause, following the pattern: '[What it does]. Use when [triggers].'

Dimension	Reasoning	Score
Specificity	The description mentions 'plan, design, or implement an A/B test or experiment' but these are very high-level actions without concrete specifics. It doesn't list what the skill actually does—no specific deliverables like 'create hypothesis documents,' 'calculate sample sizes,' 'generate variant copy,' or 'analyze test results.'	1 / 3
Completeness	The 'when' is explicitly and thoroughly covered with a 'Use when' clause and trigger terms. However, the 'what' is weak—it says 'plan, design, or implement' but doesn't describe what concrete outputs or actions the skill performs. The description is almost entirely trigger-focused with minimal capability description.	2 / 3
Trigger Term Quality	Excellent coverage of natural trigger terms: 'A/B test,' 'split test,' 'experiment,' 'test this change,' 'variant copy,' 'multivariate test,' 'hypothesis.' These are terms users would naturally use when requesting help with experimentation.	3 / 3
Distinctiveness Conflict Risk	The description carves out a clear niche around A/B testing and experimentation, and even explicitly delineates boundaries by directing tracking implementation to 'analytics-tracking.' The trigger terms are specific to this domain and unlikely to conflict with other skills.	3 / 3
	Total	9 / 12 Passed

Implementation

62%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a comprehensive A/B testing guide with strong workflow structure and useful reference tables, but it's more of a knowledge document than an actionable skill for Claude. It explains concepts Claude likely already understands (statistical significance, p-values) while lacking concrete implementation code. The progressive disclosure structure is partially implemented but undermined by missing bundle files and too much inline content.

Suggestions

Remove explanations of concepts Claude already knows (statistical significance definitions, what p-values mean, why you shouldn't peek) and replace with terse reminders or decision rules.

Add concrete, executable code examples for at least one testing tool (e.g., PostHog feature flag setup, LaunchDarkly variant configuration) to increase actionability.

Move the sample size quick reference table and common mistakes into referenced bundle files to reduce the main skill's length and improve progressive disclosure.

Provide the referenced bundle files (references/sample-size-guide.md, references/test-templates.md) or remove the references to avoid broken links.

Dimension	Reasoning	Score
Conciseness	The skill is reasonably well-organized but includes some unnecessary explanations Claude already knows (e.g., explaining what statistical significance means, what p-values are, the peeking problem). Some sections like 'Core Principles' state obvious experimentation concepts. The tables and quick references are efficient, but overall it could be tightened by ~30%.	2 / 3
Actionability	The skill provides structured frameworks (hypothesis template, checklists, tables) which are useful, but lacks executable code or concrete implementation examples. There are no code snippets for setting up tracking, configuring tools, or implementing variants. The guidance is more conceptual/procedural than copy-paste ready, though the hypothesis example and metrics examples add some concreteness.	2 / 3
Workflow Clarity	The skill presents a clear sequential workflow from initial assessment through hypothesis formation, test design, implementation, running, and analysis. The pre-launch checklist with checkboxes, DO/DON'T lists during testing, and the analysis checklist provide explicit validation checkpoints. The flow is logical and well-sequenced with clear decision points.	3 / 3
Progressive Disclosure	The skill references two external files (references/sample-size-guide.md and references/test-templates.md) and related skills, which is good structure. However, no bundle files are provided, so these references are broken. Additionally, the main file is quite long (~200+ lines) with content like the full sample size table and common mistakes that could be split into reference files to keep the overview leaner.	2 / 3
	Total	9 / 12 Passed

Validation

90%

Warnings & errors only

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 10 / 11 Passed

Validation for skill structure

Criteria	Description	Result
frontmatter_unknown_keys	Unknown frontmatter key(s) found; consider removing or moving to metadata	Warning

	Total	10 / 11 Passed

Repository: freekmurze/dotfiles
Commit: 3974caa

Reviewed: 20 days ago

Table of Contents

Discovery Implementation Validation

Is this your skill?

If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.