
ab-test-setup

When the user wants to plan, design, or implement an A/B test or experiment. Also use when the user mentions "A/B test," "split test," "experiment," "test this change," "variant copy," "multivariate test," or "hypothesis." For tracking implementation, see analytics-tracking.

78

Quality: 73% (Does it follow best practices?)

Impact: Pending (no eval scenarios have been run)

Security (by Snyk): Passed (no known issues)

Optimize this skill with Tessl

npx tessl skill review --optimize ./config/claude/skills/ab-test-setup/SKILL.md

Quality

Discovery

62%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This description excels at trigger term coverage and distinctiveness, making it easy for Claude to know when to select it. However, it fails to explain what the skill actually does: the 'what' is almost entirely missing, replaced by vague verbs like 'plan, design, or implement.' Users and Claude won't know what concrete capabilities this skill provides.

Suggestions

Add specific concrete actions the skill performs, e.g., 'Creates experiment designs with control/variant groups, calculates required sample sizes, generates hypothesis statements, and produces variant copy alternatives.'

Replace vague verbs 'plan, design, or implement' with specific outputs like 'generates test plans, writes variant copy, defines success metrics, and structures experiment documentation.'

Dimension, reasoning, and score:

Specificity (1 / 3)

The description uses vague language like 'plan, design, or implement' without listing concrete actions. It doesn't specify what the skill actually does: there is no mention of specific capabilities like creating test variants, calculating sample sizes, or analyzing results.

Completeness (2 / 3)

The 'when' is explicitly and thoroughly covered with trigger terms and use cases. However, the 'what' is weak: it only says 'plan, design, or implement' without explaining what concrete actions or outputs the skill provides.

Trigger Term Quality (3 / 3)

Excellent coverage of natural trigger terms users would say: 'A/B test,' 'split test,' 'experiment,' 'test this change,' 'variant copy,' 'multivariate test,' 'hypothesis.' These are all terms users would naturally use when needing this skill.

Distinctiveness / Conflict Risk (3 / 3)

Clear niche with distinct triggers specific to A/B testing and experimentation. The explicit mention of trigger terms and the cross-reference to analytics-tracking for related but different functionality help avoid conflicts.

Total: 9 / 12 (Passed)

Implementation

85%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-structured, comprehensive A/B testing skill with strong actionability and clear workflows. The main weakness is some verbosity: it explains concepts Claude already understands (the definition of statistical significance, the peeking problem) and includes unnecessary persona framing. The tables, checklists, and hypothesis framework provide excellent concrete guidance.

Suggestions

Remove the opening persona statement ('You are an expert...') and basic concept explanations (what statistical significance means, why peeking is bad) that Claude already knows.

Tighten the 'Core Principles' section: these are standard testing principles that could be reduced to a brief reminder list rather than explained in full.

Dimension, reasoning, and score:

Conciseness (2 / 3)

The content is generally efficient but includes some unnecessary framing ('You are an expert...') and explanatory text that Claude already knows (e.g., what statistical significance means and what peeking is). The tables and frameworks are well structured but could be tighter.

Actionability (3 / 3)

Provides concrete frameworks (a hypothesis structure with a fill-in template), specific sample size tables, clear checklists, and actionable decision matrices. The examples distinguish weak vs. strong hypotheses effectively, and the implementation section names specific tools.

Workflow Clarity (3 / 3)

Clear sequential workflow from initial assessment through hypothesis creation, test design, implementation, running, and analysis. Includes an explicit pre-launch checklist with validation steps, DO/DON'T guidance during execution, and a structured analysis checklist.

Progressive Disclosure (3 / 3)

Well organized with clear sections, appropriate use of tables for quick reference, and explicit references to deeper materials (sample-size-guide.md, test-templates.md). Content is appropriately split between overview and detailed references.

Total: 11 / 12 (Passed)

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 passed

Validation for skill structure (criterion, description, result):

frontmatter_unknown_keys (Warning)

Unknown frontmatter key(s) found; consider removing them or moving them to metadata.

Total: 10 / 11 (Passed)

Repository: freekmurze/dotfiles (Reviewed)
