ab-test-setup

Structured guide for setting up A/B tests with mandatory gates for hypothesis, metrics, and execution readiness.

45

Quality

47%

Does it follow best practices?

Impact

No eval scenarios have been run

Security (by Snyk)

Advisory

Suggest reviewing before use

Optimize this skill with Tessl

npx tessl skill review --optimize ./plugins/antigravity-awesome-skills-claude/skills/ab-test-setup/SKILL.md

Quality

Content

55%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill provides a well-structured procedural workflow with strong gating mechanisms for A/B test setup, which is its primary strength. However, it lacks concrete executable examples (no code, no calculator snippets, no templates) and explains several concepts Claude already understands. Because the skill is one lengthy, monolithic document with no supporting files, all of its content competes for context-window space.

Suggestions

Add concrete, executable examples: a sample size calculation snippet (e.g., Python using statsmodels), a hypothesis template with filled-in example, and a tracking verification checklist with specific tool commands.

Split detailed reference content (test type selection criteria, analysis interpretation table, documentation template) into separate referenced files to reduce the main skill's token footprint.

Remove explanatory content Claude already knows—e.g., definitions of A/B vs A/B/n vs MVT, what guardrail metrics are—and replace with terse decision criteria only.

Cut the motivational closing paragraphs and the generic 'When to Use' / 'Limitations' boilerplate, which add no actionable information.
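To illustrate the first suggestion, here is a minimal sketch of a per-variant sample size calculation for a conversion-rate A/B test. It assumes a two-sided, two-proportion z-test with a 50/50 traffic split and uses the standard normal-approximation formula with only the Python standard library; the function name `sample_size_per_variant` and its parameters are hypothetical and not part of the reviewed skill.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(
    baseline_rate: float,
    min_detectable_effect: float,
    alpha: float = 0.05,
    power: float = 0.80,
) -> int:
    """Users needed in EACH variant for a two-sided two-proportion z-test.

    Hypothetical helper for illustration only.
    baseline_rate: control conversion rate, e.g. 0.10 for 10%.
    min_detectable_effect: absolute lift to detect, e.g. 0.02 (10% -> 12%).
    Assumes a 50/50 traffic split and the normal approximation.
    """
    if min_detectable_effect == 0:
        raise ValueError("min_detectable_effect must be non-zero")
    p1 = baseline_rate
    p2 = baseline_rate + min_detectable_effect
    if not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("both conversion rates must lie strictly between 0 and 1")

    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_beta = std_normal.inv_cdf(power)           # standard normal quantile for the target power

    p_bar = (p1 + p2) / 2                          # pooled rate under the null hypothesis
    null_sd = sqrt(2 * p_bar * (1 - p_bar))        # spread of the difference if there is no effect
    alt_sd = sqrt(p1 * (1 - p1) + p2 * (1 - p2))   # spread of the difference under the alternative

    n = ((z_alpha * null_sd + z_beta * alt_sd) / (p2 - p1)) ** 2
    return ceil(n)  # round up: a fractional user does not reach the target power
```

For example, `sample_size_per_variant(0.10, 0.02)` (10% baseline, detect a lift to 12% at alpha = 0.05 and 80% power) returns about 3,840 users per variant. statsmodels offers a comparable calculation through `NormalIndPower().solve_power(...)` combined with `proportion_effectsize(...)`; that route uses Cohen's h (an arcsine transform), so its result differs slightly from this formula.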

Dimension | Reasoning | Score

Conciseness

The skill is moderately efficient but includes some unnecessary padding—motivational closing paragraphs ('A/B testing is not about proving ideas right...'), emoji section numbering, and the 'When to Use' / 'Limitations' boilerplate at the end add little value. Several sections explain concepts Claude already knows (what an A/B test is, what guardrail metrics are).

2 / 3

Actionability

The skill provides structured checklists and clear gates, which is good procedural guidance. However, it lacks any concrete code, commands, or executable examples—no sample size calculator snippet, no analytics query template, no tracking verification script. It describes what to do but not how to do it concretely.

2 / 3

Workflow Clarity

The multi-step process is clearly sequenced with explicit hard gates (Hypothesis Lock, Execution Readiness Gate) that block progression. Refusal conditions and guardrail failure handling provide feedback loops. The numbered sections create an unambiguous sequence with validation checkpoints at critical junctures.

3 / 3

Progressive Disclosure

The entire skill is a monolithic document with no references to supporting files. All content—hypothesis templates, metrics definitions, analysis guidance, documentation templates—is inline. For a skill of this length (~180 lines), content like the analysis discipline section, documentation template, and test type selection could be split into referenced files.

1 / 3

Total: 8 / 12 (Passed)

Description

40%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description identifies a clear niche (A/B test setup with structured gates) which makes it distinctive, but it lacks explicit trigger guidance ('Use when...') and misses common user-facing synonyms. The actions described are limited to 'setting up' without enumerating specific capabilities.

Suggestions

Add an explicit 'Use when...' clause, e.g., 'Use when the user wants to plan, design, or set up an A/B test, split test, or experiment.'

Include common trigger term variations such as 'split test', 'experiment', 'variant testing', 'experimentation framework'.

List more specific concrete actions, e.g., 'Guides users through defining hypotheses, selecting success metrics, calculating sample sizes, and validating execution readiness before launching A/B tests.'

Dimension | Reasoning | Score

Specificity

Names the domain (A/B tests) and mentions some specific elements (hypothesis, metrics, execution readiness), but doesn't list concrete actions beyond 'setting up'. It describes structure rather than specific capabilities like 'define hypotheses, configure metrics, validate sample sizes'.

2 / 3

Completeness

Describes what it does (structured guide for A/B test setup with gates) but has no explicit 'Use when...' clause or equivalent trigger guidance. Per the rubric, a missing 'Use when...' clause caps completeness at 2, and the 'what' is also only moderately clear, placing this at 1.

1 / 3

Trigger Term Quality

Includes 'A/B tests' which is a natural keyword, plus 'hypothesis' and 'metrics' which are relevant. However, it misses common variations like 'split test', 'experiment', 'variant testing', 'conversion testing', or 'experimentation'.

2 / 3

Distinctiveness Conflict Risk

The combination of 'A/B tests' with 'mandatory gates for hypothesis, metrics, and execution readiness' creates a clear, specific niche that is unlikely to conflict with other skills. This is a well-defined domain.

3 / 3

Total: 8 / 12 (Passed)

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 passed

Validation for skill structure

Criteria | Description | Result

frontmatter_unknown_keys

Unknown frontmatter key(s) found; consider removing or moving to metadata

Warning

Total: 10 / 11 (Passed)

Repository
sickn33/antigravity-awesome-skills
Reviewed
