analyze-experiments

Designs A/B tests with proper metrics and variants, analyzes running or completed experiments, and interprets results with statistical rigor. Use when setting up experiments, checking experiment status, analyzing results, or making ship decisions.

Quality

—

Does it follow best practices?

Impact

—

No eval scenarios have been run

Security

Risky

Do not use without reviewing

Quality

Content

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The body is highly actionable with a clear, validated multi-step workflow, but it is over-long and monolithic, explaining statistical concepts Claude already knows and inlining material better suited to separate reference files.

Suggestions

Trim explanations of concepts Claude already knows (e.g., the SRM definition and the power/precision interpretation tables) to reduce token cost without losing actionable thresholds.

Extract the 7-flag statistical-validity catalog and the full output template into referenced files (e.g., references/VALIDITY_FLAGS.md, references/OUTPUT_TEMPLATE.md) and link them from SKILL.md, improving progressive disclosure.

Consolidate the duplicated recommendation/rationale structure between Step 8 and 'Best Practices' to remove redundancy.

Dimension	Reasoning	Score
Conciseness	The ~527-line body is mostly actionable but padded with explanations Claude already knows (SRM meaning, power-interpretation tables, CI-width buckets), so it is not fully lean. It does not score 1 because the bulk is concrete guidance rather than a concept primer.	2 / 3
Actionability	Concrete, executable guidance throughout: specific MCP tool calls with parameters (e.g., 'Amplitude:query_experiment with metricIds'), explicit numeric thresholds, and copy-paste-ready output/segment table templates.	3 / 3
Workflow Clarity	A clear 8-step sequence with explicit validation checkpoints ('If incomplete, explain what's missing and stop'), a pre-finalization verification checklist, and error-recovery scenarios providing feedback loops.	3 / 3
Progressive Disclosure	It is a monolithic single file with no bundle files present; the inline validity-flag catalog and full output template could be split into referenced files. It does not score 1 because sections are reasonably organized and one external skill is signaled.	2 / 3
	Total	10 / 12 Passed

Description

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description is strong: it states concrete capabilities in third person and pairs them with explicit, natural-language triggers for when to use the skill. No missing 'Use when...' clause or voice issues.

Dimension	Reasoning	Score
Specificity	The description names multiple concrete actions ('Designs A/B tests with proper metrics and variants, analyzes running or completed experiments, and interprets results with statistical rigor') rather than vague language.	3 / 3
Completeness	It explicitly answers both what (designs, analyzes, interprets) and when (an explicit 'Use when...' clause with several triggers), matching the top anchor.	3 / 3
Trigger Term Quality	'Use when setting up experiments, checking experiment status, analyzing results, or making ship decisions' plus the suggest_when triggers ('did this test win', 'should we ship this') cover natural terms users would actually say.	3 / 3
Distinctiveness / Conflict Risk	It occupies a clear niche (Amplitude A/B experiment analysis) with distinct triggers unlikely to fire for unrelated skills.	3 / 3
	Total	12 / 12 Passed

Validation

87%


Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 14 / 16 Passed

Validation for skill structure

Criteria	Description	Result
skill_md_line_count	SKILL.md is long (528 lines); consider splitting into references/ and linking	Warning
frontmatter_unknown_keys	Unknown frontmatter key(s) found; consider removing or moving to metadata	Warning

	Total	14 / 16 Passed
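
The two warnings above can be reproduced locally with a small script. The following is a minimal sketch, not the validator's actual implementation: the 500-line limit and the allowed-key set are assumptions chosen for illustration, the function names are hypothetical, and the frontmatter parsing only looks at unindented top-level keys.

```python
import re

# Assumed values, not taken from this report: the validator's real
# line limit and allowed-key list are not shown here.
MAX_LINES = 500
ALLOWED_KEYS = {"name", "description", "license", "compatibility",
                "metadata", "allowed-tools"}


def split_frontmatter(text):
    """Return (frontmatter_lines, body_lines); frontmatter is a leading '---' block."""
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                return lines[1:i], lines[i + 1:]
    return [], lines


def top_level_keys(frontmatter_lines):
    """Collect unindented 'key:' names; nested YAML is deliberately ignored."""
    keys = []
    for line in frontmatter_lines:
        match = re.match(r"^([A-Za-z0-9_-]+)\s*:", line)
        if match:
            keys.append(match.group(1))
    return keys


def check_skill(text):
    """Return a list of (criterion, message) warnings for a SKILL.md string."""
    warnings = []
    total = len(text.splitlines())
    if total > MAX_LINES:
        warnings.append(("skill_md_line_count",
                         f"SKILL.md is long ({total} lines)"))
    frontmatter, _ = split_frontmatter(text)
    unknown = [k for k in top_level_keys(frontmatter) if k not in ALLOWED_KEYS]
    if unknown:
        warnings.append(("frontmatter_unknown_keys",
                         "Unknown key(s): " + ", ".join(unknown)))
    return warnings


if __name__ == "__main__":
    # 'custom_key' is a made-up key used only to trigger the second warning.
    sample = "---\nname: x\ndescription: y\ncustom_key: z\n---\n" + "line\n" * 523
    for criterion, message in check_skill(sample):
        print(criterion, "->", message)
```

Running a check like this before publishing makes it easy to confirm that moving material into references/ files brings the line count under whatever limit the real validator enforces.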

Repository: amplitude/builder-skills
Commit: 22b0634

Reviewed: 3 days ago
