ab-test-setup

Structured guide for setting up A/B tests with mandatory gates for hypothesis, metrics, and execution readiness.

Quality

33%

Does it follow best practices?

Impact

—

No eval scenarios have been run

Security

Passed

No known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./.agent/skills/ab-test-setup/SKILL.md

Quality

Content

35%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This SKILL.md functions almost entirely as a table of contents with no substantive content in the body itself. It lacks any actionable guidance, concrete examples, or executable steps—everything is deferred to 11 sub-skill files that aren't provided. While the sequential structure hints at a reasonable workflow, the absence of inline summaries, validation criteria, or any concrete instructions makes the skill body insufficient as a standalone reference.

Suggestions

Add a concise end-to-end workflow summary in the body (e.g., numbered steps with the key action and gate criteria for each phase) so the skill is useful even without opening sub-files.

Include at least one concrete, actionable example—such as a sample hypothesis statement that passes the quality checklist, or a sample size calculation command/formula.

Add brief 1-sentence descriptions next to each sub-skill link explaining what it covers and when to use it, rather than bare links.

Define the 'hard gate' criteria inline (e.g., what specifically must be true to pass the Hypothesis Lock or Execution Readiness Gate) so Claude can act on them without navigating to sub-files.
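As an illustration of the second suggestion, a sample size calculation that the skill could include inline might look like the sketch below. It is not taken from the reviewed skill: the function name `sample_size_per_variant`, its parameters, and the default values for `alpha` and `power` are assumptions chosen for the example. It uses the standard normal-approximation formula for a two-sided test comparing two conversion rates.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, min_detectable_effect,
                            alpha=0.05, power=0.8):
    """Approximate users needed per variant for a two-proportion test.

    Hypothetical helper for illustration only.
    baseline_rate: control conversion rate, e.g. 0.10 for 10%.
    min_detectable_effect: absolute lift to detect, e.g. 0.02 for +2 points.
    alpha: two-sided significance level (assumed default 0.05).
    power: desired statistical power (assumed default 0.8).
    """
    p1 = baseline_rate
    p2 = baseline_rate + min_detectable_effect
    if not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("both conversion rates must lie strictly between 0 and 1")
    if min_detectable_effect == 0:
        raise ValueError("min_detectable_effect must be non-zero")

    # Critical values of the standard normal distribution.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)

    # Pooled rate under the null hypothesis of no difference.
    p_bar = (p1 + p2) / 2

    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


if __name__ == "__main__":
    # Example: 10% baseline, detect an absolute lift of 2 percentage points.
    print(sample_size_per_variant(0.10, 0.02))
```

With a 10% baseline and a 2-point absolute lift, this comes out at a little under 4,000 users per variant. Halving the detectable effect roughly quadruples the requirement, which is the kind of trade-off a sample size gate is meant to surface before launch.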

Dimension	Reasoning	Score
Conciseness	Reasonably brief at the top level, but includes some unnecessary framing ('Ensure every A/B test is valid, rigorous, and safe before a single line of code is written') and emoji decoration that adds no value. The bullet points under Purpose & Scope are vague rather than instructive.	2 / 3
Actionability	The SKILL.md contains no concrete guidance, no executable steps, no code, no commands, and no examples. It is entirely a table of contents pointing to sub-skill files, with no actionable content in the body itself.	1 / 3
Workflow Clarity	The numbered sub-skill references imply a sequence (hypothesis → assumptions → test type → metrics → sample size → execution gate → during test → analysis → interpretation → record), and the 'hard gate' labels suggest validation checkpoints. However, no explicit workflow steps, validation criteria, or feedback loops are described in the body—everything is deferred to sub-files.	2 / 3
Progressive Disclosure	The skill does attempt progressive disclosure by linking to 11 sub-skill files, which is a reasonable structure. However, the links have no descriptions or summaries explaining what each module covers, making navigation harder. Additionally, no bundle files were provided, so we cannot verify the references resolve, and the sheer number of sub-files (11) without any inline summary content makes the top-level file nearly useless on its own.	2 / 3
	Total	7 / 12 Passed

Description

32%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description identifies a clear domain (A/B testing) and hints at a structured process with gates, but lacks explicit trigger guidance ('Use when...') and concrete action verbs. It would benefit from listing specific actions and including natural user trigger terms to help Claude select it appropriately from a large skill set.

Suggestions

Add an explicit 'Use when...' clause, e.g., 'Use when the user wants to set up an A/B test, split test, or experiment and needs structured validation before launch.'

Include common trigger term variations such as 'split test', 'experiment', 'variant testing', 'experimentation framework'.

Replace the abstract 'structured guide' phrasing with concrete actions, e.g., 'Defines hypotheses, selects success metrics, validates sample size, and checks execution readiness for A/B tests.'

Dimension	Reasoning	Score
Specificity	Names the domain (A/B testing) and mentions some specific elements (hypothesis, metrics, execution readiness), but doesn't list concrete actions beyond 'setting up'. It describes structure rather than specific capabilities like 'define hypotheses, configure metrics, validate execution readiness'.	2 / 3
Completeness	Describes what the skill does (structured guide for A/B test setup with gates) but has no explicit 'Use when...' clause or equivalent trigger guidance. Per the rubric, a missing 'Use when...' clause caps completeness at 2, and because the 'what' itself is only moderately clear, this scores 1.	1 / 3
Trigger Term Quality	Includes 'A/B tests' which is a natural keyword, plus 'hypothesis' and 'metrics' which are relevant. However, it misses common variations like 'split test', 'experiment', 'variant testing', 'conversion testing', or 'experimentation'.	2 / 3
Distinctiveness Conflict Risk	A/B testing is a reasonably specific niche, and the mention of 'mandatory gates' adds some distinctiveness. However, it could overlap with general experiment design skills or metrics/analytics skills without clearer trigger boundaries.	2 / 3
	Total	7 / 12 Passed

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 10 / 11 Passed

Validation for skill structure

Criteria	Description	Result
frontmatter_unknown_keys	Unknown frontmatter key(s) found; consider removing or moving to metadata	Warning
	Total	10 / 11 Passed

Repository: Dokhacgiakhoa/antigravity-ide
Commit: 061c4c6

Reviewed: 1 day ago
