ab-test-setup

Structured guide for setting up A/B tests with mandatory gates for hypothesis, metrics, and execution readiness.

Quality

37%

Does it follow best practices?

Security

Passed

No findings from the security scan

Fix and improve this skill with Tessl

tessl review fix ./.agent/skills/ab-test-setup/SKILL.md

Quality

Content

35%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill is essentially a table of contents with minimal actionable content in the body itself. While the structure hints at a rigorous gated workflow for A/B test setup, the SKILL.md provides no concrete guidance, examples, templates, or executable steps — everything is deferred to sub-skill files that aren't available. The top-level file needs substantive overview content to be useful on its own.

Suggestions

Add a concise quick-start workflow summary in the body (e.g., a numbered checklist with 1-2 sentence descriptions of each gate and what 'passing' looks like) so the skill is usable without opening every sub-file.

Include at least one concrete example — e.g., a sample hypothesis statement that passes the quality checklist, or a filled-in test record template — to make the skill actionable at the top level.

Explicitly describe the gate/checkpoint logic: what blocks progression, what happens on failure, and how to iterate (feedback loops), rather than just labeling steps as 'Hard Gate' or 'Hard Stop'.

Trim the 'Purpose & Scope' bullets (e.g., 'Prevents peeking') or expand them into actionable constraints — as written they are too vague to guide behavior.
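
The gate logic requested in the third suggestion could be sketched roughly as follows. This is only an illustration of "what blocks progression" and "how to iterate": the `Gate` class, the `run_gates` helper, and both check functions are hypothetical names, and the pass criteria are placeholder assumptions rather than the skill's actual checklist.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Gate:
    """One checkpoint; `check` returns a list of problems (empty means pass)."""
    name: str
    check: Callable[[Dict], List[str]]


def run_gates(gates: List[Gate], record: Dict) -> Dict:
    """Run gates in order and stop at the first one that reports problems."""
    for gate in gates:
        problems = gate.check(record)
        if problems:
            # A failing gate blocks every later gate until the record is fixed.
            return {"passed": False, "blocked_at": gate.name, "problems": problems}
    return {"passed": True, "blocked_at": None, "problems": []}


def check_hypothesis(record: Dict) -> List[str]:
    # Assumed criterion: the hypothesis must state a rationale.
    if "because" not in record.get("hypothesis", "").lower():
        return ["hypothesis should state a rationale ('because ...')"]
    return []


def check_metrics(record: Dict) -> List[str]:
    # Assumed criterion: exactly one primary metric must be named.
    return [] if record.get("primary_metric") else ["no primary metric defined"]


GATES = [Gate("hypothesis", check_hypothesis), Gate("metrics", check_metrics)]

# First attempt: blocked at the hypothesis gate.
draft = {"hypothesis": "A shorter form will raise signups"}
first = run_gates(GATES, draft)

# Feedback loop: fix the reported problems, then re-run from the start.
draft["hypothesis"] += " because fewer fields reduce friction"
draft["primary_metric"] = "signup_conversion_rate"
second = run_gates(GATES, draft)
```

Returning the blocking gate together with its problems makes the failure path explicit: the author revises the test record and re-runs the sequence, which is the kind of feedback loop the review finds missing.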

Dimension	Reasoning	Score
Conciseness	Reasonably brief at the top level, but the 'Purpose & Scope' section includes some filler (e.g., bullet points like 'Prevents peeking' without context are vague rather than efficient). The emoji-heavy headers add visual noise without adding information.	2 / 3
Actionability	The SKILL.md contains no concrete code, commands, examples, or executable guidance. It is essentially a table of contents pointing to sub-skills, with no actionable content in the body itself — no templates, no checklists, no sample hypotheses or metric definitions.	1 / 3
Workflow Clarity	The numbered sub-skill references imply a sequence (hypothesis → assumptions → test type → metrics → sample size → execution gate), and the 'Hard Gate' / 'Hard Stop' labels suggest validation checkpoints. However, the workflow is not explicitly described in the body — there's no explanation of what happens at gates, no feedback loops, and no description of how to proceed or recover if a gate fails.	2 / 3
Progressive Disclosure	The skill does attempt progressive disclosure by linking to 11 sub-skill files, which is a reasonable structure. However, no bundle files were provided, so we cannot verify the references resolve. More importantly, the top-level SKILL.md provides almost no usable overview content — it's nearly 100% links with no summary of what each sub-skill covers or quick-start guidance, making it hard to use without clicking through every link.	2 / 3
	Total	7 / 12 Passed

Description

40%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description identifies a clear niche (A/B test setup with structured gates) which makes it distinctive, but it lacks explicit trigger guidance ('Use when...') and misses common user-facing synonyms. The actions described are limited to 'setting up' without enumerating specific capabilities.

Suggestions

Add an explicit 'Use when...' clause, e.g., 'Use when the user wants to plan, design, or set up an A/B test, split test, or experiment.'

Include common trigger term variations such as 'split test', 'experiment', 'variant testing', 'experimentation framework'.

List more specific concrete actions, e.g., 'Guides users through defining hypotheses, selecting success metrics, calculating sample sizes, and validating execution readiness before launching A/B tests.'
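
To make these suggestions concrete, here is a rough sketch that checks a description for a 'Use when' clause and for some of the trigger terms named above. It is not Tessl's scoring rubric: `review_description` and `TRIGGER_TERMS` are hypothetical, and the plain substring matching is a simplifying assumption.

```python
from typing import Dict

# Trigger terms taken from the suggestions above; the list is illustrative.
TRIGGER_TERMS = ["a/b test", "split test", "experiment", "variant testing"]


def review_description(description: str) -> Dict:
    """Report whether a 'Use when' clause exists and which terms are absent."""
    text = description.lower()
    return {
        "has_use_when": "use when" in text,
        "missing_terms": [term for term in TRIGGER_TERMS if term not in text],
    }


# The description as reviewed.
current = (
    "Structured guide for setting up A/B tests with mandatory gates for "
    "hypothesis, metrics, and execution readiness."
)

# A revision assembled from the example wording in the suggestions.
revised = (
    "Guides users through defining hypotheses, selecting success metrics, "
    "calculating sample sizes, and validating execution readiness before "
    "launching A/B tests. Use when the user wants to plan, design, or set up "
    "an A/B test, split test, or experiment."
)

before = review_description(current)
after = review_description(revised)
```

Under these assumptions the current description has no 'Use when' clause and lacks three of the four terms, while the revised one lacks only 'variant testing'.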

Dimension	Reasoning	Score
Specificity	Names the domain (A/B tests) and mentions some specific elements (hypothesis, metrics, execution readiness), but doesn't list concrete actions beyond 'setting up'. It describes structure rather than specific capabilities like 'define hypotheses, configure metrics, validate sample sizes'.	2 / 3
Completeness	Describes what it does (structured guide for A/B test setup with gates) but has no explicit 'Use when...' clause or equivalent trigger guidance. Per the rubric, a missing 'Use when...' clause caps completeness at 2, and the 'what' is also only moderately clear, placing this at 1.	1 / 3
Trigger Term Quality	Includes 'A/B tests' which is a natural keyword, plus 'hypothesis' and 'metrics' which are relevant. However, it misses common variations like 'split test', 'experiment', 'variant testing', 'conversion testing', or 'experimentation'.	2 / 3
Distinctiveness / Conflict Risk	The combination of 'A/B tests' with 'mandatory gates for hypothesis, metrics, and execution readiness' creates a clear, specific niche that is unlikely to conflict with other skills. This is a well-defined domain.	3 / 3
	Total	8 / 12 Passed

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 10 / 11 Passed

Validation for skill structure

Criteria	Description	Result
frontmatter_unknown_keys	Unknown frontmatter key(s) found; consider removing or moving to metadata	Warning

	Total	10 / 11 Passed

Repository: Dokhacgiakhoa/antigravity-ide
Path: .agent/skills/ab-test-setup/SKILL.md
Commit: c4a059b

Reviewed: 2 days ago
