
ab-test-setup

Structured guide for setting up A/B tests with mandatory gates for hypothesis, metrics, and execution readiness.

48

Quality

51%

Does it follow best practices?

Impact

No eval scenarios have been run

Security by Snyk

Passed

No known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./skills/antigravity-ab-test-setup/SKILL.md

Quality

Discovery

40%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description identifies a clear niche (A/B test setup with structured gates) which makes it distinctive, but it lacks explicit trigger guidance ('Use when...') and could be more specific about the concrete actions it performs. The trigger terms cover the core concept but miss common synonyms and variations.

Suggestions

Add an explicit 'Use when...' clause, e.g., 'Use when the user wants to set up an A/B test, split test, or experiment and needs structured validation before launch.'

Include common synonyms and variations as trigger terms: 'split test', 'experiment design', 'variant testing', 'conversion experiment'.

List more concrete actions, e.g., 'Guides users through defining a hypothesis, selecting success metrics, calculating sample size, and validating execution readiness before launching an A/B test.'
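The first two suggestions above can be checked mechanically before a description is published. The sketch below is an illustration only and is not Tessl's scoring logic: the function name `check_description`, the `TRIGGER_TERMS` tuple, and the plain substring matching are assumptions; the terms themselves are the ones listed in the suggestions.

```python
# Hypothetical helper for illustration; not part of Tessl or the reviewed skill.
TRIGGER_TERMS = (
    "split test",
    "experiment design",
    "variant testing",
    "conversion experiment",
)


def check_description(description: str) -> dict:
    """Report whether a description has a 'Use when' clause and which trigger terms it lacks."""
    text = description.lower()
    return {
        "has_use_when": "use when" in text,
        "missing_terms": [term for term in TRIGGER_TERMS if term not in text],
    }


# The description under review, copied from the top of this report.
current = (
    "Structured guide for setting up A/B tests with mandatory gates "
    "for hypothesis, metrics, and execution readiness."
)
report = check_description(current)
print(report["has_use_when"])   # False: there is no 'Use when' clause
print(report["missing_terms"])  # none of the suggested synonyms appear
```

Substring matching is deliberately crude; it only flags the gaps the review names and says nothing about how an agent would actually rank the skill.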

Dimension | Reasoning | Score

Specificity

Names the domain (A/B testing) and mentions some specific elements (hypothesis, metrics, execution readiness), but doesn't list concrete actions beyond 'setting up'. It describes structure rather than specific capabilities like 'define hypotheses, configure metrics, validate execution readiness'.

2 / 3

Completeness

Describes what the skill does (a structured guide for A/B test setup with gates) but has no explicit 'Use when...' clause or equivalent trigger guidance. Per the rubric, a missing 'Use when...' clause caps completeness at 2; because the 'what' itself is only moderately clear, this dimension scores 1.

1 / 3

Trigger Term Quality

Includes 'A/B tests' which is a natural keyword, plus 'hypothesis' and 'metrics' which are relevant. However, it misses common variations like 'split test', 'experiment', 'variant testing', 'conversion testing', or 'feature flag'.

2 / 3

Distinctiveness Conflict Risk

A/B testing with mandatory gates for hypothesis, metrics, and execution readiness is a fairly distinct niche. It's unlikely to conflict with other skills due to the specific combination of A/B testing and structured gate-based workflow.

3 / 3

Total

8 / 12

Passed

Implementation

62%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-structured process skill with strong workflow clarity through explicit gates and refusal conditions. Its main weaknesses are a lack of concrete worked examples (e.g., a sample hypothesis, a sample size calculation) and some verbosity in motivational/explanatory content that Claude doesn't need. The monolithic structure is acceptable but could benefit from splitting detailed reference sections into separate files.

Suggestions

Add a concrete worked example: a sample hypothesis statement, sample metric definition, and sample size calculation to make the skill more actionable and copy-paste ready.

Remove or drastically shorten the motivational closing ('Final Reminder') and the generic 'When to Use' / 'Limitations' boilerplate—these consume tokens without adding actionable guidance.

Include a specific sample size calculation formula or reference to a tool/command (e.g., a Python snippet using statsmodels or an online calculator URL) to make Section 7 executable rather than descriptive.
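As one possible shape for the worked example these suggestions ask for, the sketch below computes an approximate per-variant sample size for a two-sided two-proportion z-test using only the Python standard library. It is not taken from the skill itself. The function name `sample_size_per_variant`, the default `alpha` and `power`, and the example rates (a 10% baseline conversion rate with a hoped-for 12%) are assumptions chosen for illustration; the formula is the standard normal-approximation one, n = (z_(1-alpha/2) + z_power)^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, expected_rate, alpha=0.05, power=0.80):
    """Approximate users needed per variant for a two-sided two-proportion z-test."""
    if baseline_rate == expected_rate:
        raise ValueError("baseline_rate and expected_rate must differ")
    # Critical values of the standard normal distribution.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    # Sum of the Bernoulli variances of the two arms.
    variance = baseline_rate * (1 - baseline_rate) + expected_rate * (1 - expected_rate)
    effect = expected_rate - baseline_rate
    return ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


# Hypothetical hypothesis: "Shortening the signup form raises conversion from 10% to 12%."
print(sample_size_per_variant(0.10, 0.12))
```

A smaller expected lift or a higher power target increases the required sample, which is why the minimum detectable effect should be fixed before the test starts. A library such as statsmodels offers comparable power calculations if a dependency is acceptable.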

Dimension | Reasoning | Score

Conciseness

The skill is reasonably structured but includes some unnecessary padding—motivational closing paragraphs ('A/B testing is not about proving ideas right...'), emoji section numbering, and the 'When to Use' / 'Limitations' boilerplate add little value. Several sections explain concepts Claude already understands (e.g., what A/B vs A/B/n vs MVT tests are). Could be tightened by ~30%.

2 / 3

Actionability

The skill provides clear checklists and gate conditions, which are actionable for a process-oriented skill. However, it lacks concrete examples—no sample hypothesis, no sample size calculation formula or tool command, no example metric definition. It describes what to do but rarely shows a worked example, keeping it at the 'some concrete guidance but incomplete' level.

2 / 3

Workflow Clarity

The multi-step process is clearly sequenced with explicit hard gates (Hypothesis Lock, Execution Readiness Gate) that block progression. Refusal conditions serve as validation checkpoints, and the 'During the Test' DO/DO NOT lists provide guardrails. The feedback loop of 'if assumptions are weak → warn → recommend delaying' is present. This is a well-structured gated workflow.

3 / 3

Progressive Disclosure

The content is a single monolithic file with no references to supporting documents. For a skill of this length (~150 lines of substantive content), sections like metrics definition, test type selection, and analysis discipline could be split into referenced files. However, the internal section structure is clear and navigable, preventing a score of 1.

2 / 3

Total

9 / 12

Passed

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 Passed

Validation for skill structure

Criteria | Description | Result

frontmatter_unknown_keys

Unknown frontmatter key(s) found; consider removing or moving to metadata

Warning

Total

10 / 11

Passed
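The `frontmatter_unknown_keys` warning suggests removing unrecognized keys or moving them under metadata. A minimal sketch of the second option follows. The report does not say which keys triggered the warning or which keys the spec recognizes, so the `KNOWN_KEYS` set, the `move_unknown_keys` helper, and the `risk` key in the example are all hypothetical.

```python
# Assumed set of recognized top-level keys; consult the skill spec for the real list.
KNOWN_KEYS = {"name", "description", "metadata"}


def move_unknown_keys(frontmatter: dict) -> dict:
    """Return a copy with unrecognized top-level keys nested under 'metadata'."""
    cleaned = {k: v for k, v in frontmatter.items() if k in KNOWN_KEYS}
    unknown = {k: v for k, v in frontmatter.items() if k not in KNOWN_KEYS}
    if unknown:
        # Preserve any existing metadata entries and add the relocated keys.
        cleaned["metadata"] = {**frontmatter.get("metadata", {}), **unknown}
    return cleaned


# 'risk' is a made-up key standing in for whatever the validator flagged.
example = {
    "name": "ab-test-setup",
    "description": "Structured guide for setting up A/B tests.",
    "risk": "unknown",
}
print(move_unknown_keys(example))
```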

Repository: boisenoise/skills-collections
Reviewed
