ab-test-setup

Structured guide for setting up A/B tests with mandatory gates for hypothesis, metrics, and execution readiness.

Quality: 47% (Does it follow best practices?)

Impact: Pending (No eval scenarios have been run)

Security (by Snyk): Passed (No known issues)

Optimize this skill with Tessl

npx tessl skill review --optimize ./skills/antigravity-ab-test-setup/SKILL.md

Quality

Discovery: 32%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description identifies a clear domain (A/B testing) and hints at a structured process with gates, but lacks explicit trigger guidance ('Use when...') and concrete action verbs. It would benefit from listing specific actions and including natural user trigger terms to help Claude select it appropriately from a large skill set.

Suggestions

- Add an explicit 'Use when...' clause, e.g., 'Use when the user wants to set up an A/B test, split test, or experiment and needs structured validation before launch.'
- Include common trigger term variations such as 'split test', 'experiment', 'variant testing', 'experimentation framework'.
- Replace the passive 'structured guide for setting up' with concrete action verbs, e.g., 'Guides users through defining hypotheses, selecting success metrics, configuring test parameters, and validating execution readiness for A/B tests.'
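Taken together, these suggestions could yield a revised frontmatter description along these lines (a sketch only; the exact SKILL.md frontmatter schema is an assumption, not confirmed by this review):

```yaml
---
name: ab-test-setup
description: >
  Guides users through defining hypotheses, selecting success metrics,
  configuring test parameters, and validating execution readiness for A/B
  tests. Use when the user wants to set up an A/B test, split test,
  experiment, or variant test and needs structured validation before launch.
---
```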

Dimension scores:

Specificity (2/3): Names the domain (A/B testing) and mentions some specific elements (hypothesis, metrics, execution readiness), but doesn't list concrete actions beyond 'setting up'. It describes structure rather than specific capabilities like 'define hypotheses, configure metrics, validate execution readiness'.

Completeness (1/3): Describes what the skill does (structured guide for A/B test setup with gates) but has no explicit 'Use when...' clause or equivalent trigger guidance. Per the rubric, a missing 'Use when...' clause caps completeness at 2, and the 'what' itself is only moderately clear, so this scores at 1.

Trigger Term Quality (2/3): Includes 'A/B tests', which is a natural keyword, plus 'hypothesis' and 'metrics', which are relevant. However, it misses common variations like 'split test', 'experiment', 'variant testing', 'conversion testing', or 'experimentation'.

Distinctiveness / Conflict Risk (2/3): A/B testing is a reasonably specific niche, and the mention of 'mandatory gates' adds some distinctiveness. However, it could overlap with general experiment design or testing skills, and without explicit trigger terms it's not fully distinct.

Total: 7 / 12 (Passed)

Implementation: 62%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-structured process skill with strong workflow clarity through explicit gates and refusal conditions, making it effective as a procedural guide. Its main weaknesses are the lack of concrete worked examples (e.g., a sample hypothesis, a sample size calculation) and some verbosity in motivational/explanatory content that Claude doesn't need. The monolithic structure would benefit from splitting detailed sections into referenced files.

Suggestions

- Add a concrete worked example: a sample hypothesis statement, sample metric definition, and sample size calculation to make the skill more actionable and copy-paste ready.
- Remove or drastically shorten the motivational closing paragraphs ('Final Reminder') and the generic 'When to Use' / 'Limitations' boilerplate; these consume tokens without adding actionable value.
- Split detailed reference content (test type descriptions, metrics taxonomy, analysis interpretation table) into separate referenced files to improve progressive disclosure.
- Include a specific sample size calculation formula or reference to a concrete tool/command (e.g., a Python snippet using statsmodels) rather than just listing the required inputs.
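The last suggestion can be sketched without external dependencies: the standard two-sided, two-proportion z-test formula implemented with the standard library (a statsmodels `NormalIndPower` call would give a comparable per-variant count). The 5% baseline and 6% target conversion rates, and the hypothesis wording, are illustrative assumptions, not values from the skill under review:

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_variant(p1: float, p2: float,
                            alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-variant sample size for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, ~1.96 at alpha=0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 at 80% power
    p_bar = (p1 + p2) / 2                          # pooled rate under H0
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# Hypothetical worked example: "Moving the CTA above the fold lifts signup
# conversion from 5% to 6%." Required traffic per variant:
print(sample_size_per_variant(0.05, 0.06))
```

A skill could embed something like this so an agent can compute and report the required traffic at the Execution Readiness Gate instead of merely listing the inputs.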

Dimension scores:

Conciseness (2/3): The skill is reasonably structured but includes some unnecessary padding: motivational closing paragraphs ('A/B testing is not about proving ideas right...'), emoji section numbering, and the 'When to Use' / 'Limitations' boilerplate add little value. Several sections explain concepts Claude already understands (e.g., what A/B vs A/B/n vs MVT tests are). Could be tightened by ~30%.

Actionability (2/3): The skill provides clear checklists and decision tables, which are actionable for a process-oriented skill. However, it lacks concrete examples: no sample hypothesis, no sample size calculation formula or tool command, no example metric definition. It describes what to do but rarely shows a worked example, keeping it at the 'incomplete guidance' level.

Workflow Clarity (3/3): The multi-step process is clearly sequenced with explicit hard gates (Hypothesis Lock, Execution Readiness Gate) that block progression. Refusal conditions serve as validation checkpoints, and the 'During the Test' DO/DO NOT lists provide guardrails. The feedback loop of 'if assumptions are weak → warn → delay or redesign' is present. This is a well-structured gated workflow.

Progressive Disclosure (2/3): All content is inline in a single monolithic file with no references to supporting documents. For a skill of this length (~150+ lines), sections like 'Test Type Selection,' 'Metrics Definition,' or 'Analysis Discipline' could be split into referenced files. The organization within the file is decent, with clear headings, but there's no layered navigation.

Total: 9 / 12 (Passed)

Validation: 90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation checks: 10 / 11 passed

Validation for skill structure:

frontmatter_unknown_keys (Warning): Unknown frontmatter key(s) found; consider removing or moving to metadata.

Total: 10 / 11 (Passed)

Repository: boisenoise/skills-collections (Reviewed)
