
ab-test-setup

Structured guide for setting up A/B tests with mandatory gates for hypothesis, metrics, and execution readiness.

Overall: 68

Quality: 51%. Does it follow best practices?

Impact: 100% (1.09x). Average score across 3 eval scenarios.

Security (by Snyk): Passed, no known issues.

Optimize this skill with Tessl

npx tessl skill review --optimize ./skills/antigravity-ab-test-setup/SKILL.md

Quality

Discovery: 40%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description identifies a clear niche (A/B test setup with structured gates) and is distinctive, but it lacks explicit trigger guidance ('Use when...') and could be more specific about the concrete actions it performs. Adding natural user trigger terms and a 'when to use' clause would significantly improve skill selection accuracy.

Suggestions

Add an explicit 'Use when...' clause, e.g., 'Use when the user wants to set up an A/B test, design an experiment, or plan a split test.'

Include more natural trigger term variations such as 'experiment,' 'split test,' 'variant testing,' 'conversion optimization,' or 'test design.'

List specific concrete actions, e.g., 'Guides users through defining hypotheses, selecting success metrics, calculating sample sizes, and validating execution readiness before launching A/B tests.'
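Taken together, these suggestions could be folded into a revised frontmatter description. The sketch below is illustrative only (the wording is not the maintainer's, and it assumes the standard SKILL.md `name`/`description` frontmatter fields):

```yaml
---
name: ab-test-setup
description: >
  Structured guide for setting up A/B tests, with mandatory gates for
  hypothesis, metrics, and execution readiness. Guides users through
  defining hypotheses, selecting success metrics, calculating sample
  sizes, and validating launch readiness. Use when the user wants to
  set up an A/B test, design an experiment, or plan a split test,
  variant test, or conversion-optimization experiment.
---
```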

Dimension scores

Specificity: 2 / 3
Names the domain (A/B testing) and describes it as a 'structured guide' with 'mandatory gates for hypothesis, metrics, and execution readiness,' but doesn't list multiple concrete actions like creating hypotheses, defining metrics, validating sample sizes, etc.

Completeness: 1 / 3
Describes what it does (structured guide for A/B test setup with gates) but has no explicit 'Use when...' clause or equivalent trigger guidance, which per the rubric should cap completeness at 2; the 'what' is also somewhat vague, bringing it closer to 1.

Trigger Term Quality: 2 / 3
Includes 'A/B tests,' 'hypothesis,' and 'metrics,' which are relevant keywords, but misses common variations users might say, such as 'experiment,' 'split test,' 'variant testing,' 'conversion,' or 'statistical significance.'

Distinctiveness / Conflict Risk: 3 / 3
A/B testing with mandatory gates for hypothesis, metrics, and execution readiness is a clearly defined niche that is unlikely to conflict with other skills; the combination of A/B testing and structured gates makes it quite distinctive.

Total: 8 / 12 (Passed)

Implementation: 62%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-structured process skill with strong workflow clarity and explicit gates/checkpoints, which is its primary strength. However, it suffers from a lack of concrete examples (no sample hypothesis, no calculation tools/formulas) and includes some unnecessary philosophical content. The monolithic structure could benefit from splitting detailed sections into referenced files.

Suggestions

Add a concrete example of a well-formed hypothesis with all required components (observation, change, direction, audience, MDE) to make the hypothesis gate actionable.

Include a specific sample size calculation example or reference a tool/formula (e.g., a Python snippet using statsmodels or a specific online calculator command).

Remove the 'Final Reminder' section and trim the 'Purpose & Scope' to a single line—these are motivational rather than actionable.

Consider splitting 'Analyzing Results' and 'Documentation & Learning' into separate referenced files to keep the main skill focused on the setup workflow.
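The sample-size suggestion above can be made concrete with a small worked example. This is a sketch using only the Python standard library rather than the statsmodels call mentioned above (either works); the function name `sample_size_per_variant` and its defaults are illustrative, not part of the skill:

```python
from statistics import NormalDist

def sample_size_per_variant(baseline, mde, alpha=0.05, power=0.80):
    """Approximate per-variant sample size for a two-proportion z-test.

    baseline: control conversion rate (e.g. 0.10 for 10%)
    mde: minimum detectable effect as an absolute lift (e.g. 0.02)
    """
    p1, p2 = baseline, baseline + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    # Classic two-proportion formula: pooled variance under H0,
    # unpooled variance under the alternative.
    numerator = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return int(numerator / mde ** 2) + 1  # round up

# Detecting a lift from 10% to 12% at alpha=0.05, power=0.80
n = sample_size_per_variant(0.10, 0.02)
print(n)  # on the order of a few thousand users per variant
```

Embedding even a short reference like this would let the skill's execution-readiness gate check a stated sample size against the hypothesis's baseline and MDE instead of taking it on faith.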

Dimension scores

Conciseness: 2 / 3
The skill is reasonably structured but includes motivational/philosophical content ('A/B testing is not about proving ideas right...') and explains concepts Claude already knows (what A/B, A/B/n, MVT, and Split URL tests are). The 'Final Reminder' and 'Purpose & Scope' sections add little actionable value.

Actionability: 2 / 3
The skill provides structured checklists and clear gates, which is good for a process-oriented skill. However, it lacks concrete examples: no sample hypothesis, no sample size calculation formula or tool command, no example metric definition. It describes what to do but rarely shows how with specifics.

Workflow Clarity: 3 / 3
The multi-step process is clearly sequenced with numbered stages, explicit hard gates ('Do NOT proceed until confirmed,' 'Execution Readiness Gate'), and clear validation checkpoints. The refusal conditions and guardrail failure handling provide good feedback loops for error recovery.

Progressive Disclosure: 2 / 3
The content is well-organized with clear headers and sections, but it's a long monolithic document (~150 lines of substantive content) with no references to external files. The metrics definition, analysis, and documentation sections could be split into separate reference files to keep the main skill lean.

Total: 9 / 12 (Passed)

Validation: 90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 Passed

Validation for skill structure

Criteria results

frontmatter_unknown_keys: Warning
Unknown frontmatter key(s) found; consider removing them or moving them to metadata.
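The fix the warning suggests can be sketched as follows. The key `owner` here is invented purely for illustration (the actual offending key depends on this skill's frontmatter), but the pattern of nesting nonstandard keys under `metadata` follows the warning's own advice:

```yaml
---
name: ab-test-setup
description: Structured guide for setting up A/B tests with mandatory gates.
# Before: 'owner' sat at the top level and triggered the validator warning.
metadata:
  owner: boisenoise   # unknown keys live under metadata instead
---
```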

Total: 10 / 11 (Passed)

Repository: boisenoise/skills-collections (Reviewed)

