
analyze-experiments

Designs A/B tests with proper metrics and variants, analyzes running or completed experiments, and interprets results with statistical rigor. Use when setting up experiments, checking experiment status, analyzing results, or making ship decisions.

80

Quality: 77% (Does it follow best practices?)

Impact: Pending (No eval scenarios have been run)

Security by Snyk: Risky (Do not use without reviewing)

Optimize this skill with Tessl

npx tessl skill review --optimize ./analytics-skills/skills/analyze-experiment/SKILL.md

Quality

Discovery

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a strong skill description that clearly articulates specific capabilities around A/B testing and experimentation, includes natural trigger terms users would employ, and explicitly states both what the skill does and when to use it. The description is concise, uses third person voice correctly, and carves out a distinct niche that minimizes conflict risk with other skills.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Lists multiple specific concrete actions: 'Designs A/B tests with proper metrics and variants', 'analyzes running or completed experiments', 'interprets results with statistical rigor'. These are clear, actionable capabilities. | 3 / 3 |
| Completeness | Clearly answers both what ('Designs A/B tests with proper metrics and variants, analyzes running or completed experiments, interprets results with statistical rigor') and when ('Use when setting up experiments, checking experiment status, analyzing results, or making ship decisions') with explicit trigger guidance. | 3 / 3 |
| Trigger Term Quality | Includes strong natural trigger terms users would say: 'A/B tests', 'experiments', 'metrics', 'variants', 'experiment status', 'results', 'ship decisions'. These cover the natural vocabulary of someone working with experimentation. | 3 / 3 |
| Distinctiveness Conflict Risk | The A/B testing and experimentation niche is clearly distinct from other skills. Terms like 'A/B tests', 'variants', 'ship decisions', and 'statistical rigor' create a well-defined domain that is unlikely to conflict with general data analysis or other skills. | 3 / 3 |
| Total | | 12 / 12 |

Passed

Implementation

55%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill excels at actionability and workflow clarity, providing a thorough, well-sequenced experiment analysis framework with specific tool calls, thresholds, and validation checkpoints. However, it is far too long and monolithic: it explains statistical concepts Claude already knows (power analysis, p-values, SRM, Simpson's Paradox), includes massive output templates inline, and repeats information across steps. The content would benefit from being split into a concise overview SKILL.md that references separate files for the output template, the statistical thresholds, and scenario handling.

Suggestions

Extract the Step 8 output template, key scenarios section, and best practices into separate referenced files (e.g., OUTPUT_TEMPLATE.md, SCENARIOS.md) to reduce SKILL.md to a concise overview with clear navigation.

Remove explanations of statistical concepts Claude already knows (what SRM is, what p-values mean, what statistical power represents) and replace with just the thresholds and decision rules.

Consolidate the power/precision analysis that appears in Steps 2, 6, and 8 into a single location to eliminate redundancy.

Add a brief 'Quick Reference' section at the top summarizing the 8 steps in 2-3 lines each, with the detailed instructions in a separate file for when Claude needs the full workflow.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | This skill is extremely verbose at ~400+ lines. It over-explains concepts Claude already understands (what SRM means, what p-values indicate, what statistical power is), provides exhaustive templates and formatting instructions, and repeats information across steps (power analysis appears in Steps 2, 6, and 8). Many sections could be condensed by 50-70% without losing actionable content. | 1 / 3 |
| Actionability | The skill provides highly specific, concrete guidance: exact API calls (Amplitude:query_experiment, Amplitude:get_feedback_insights), specific parameter usage (groupBy, metricIds, entityTypes), precise thresholds (p < 0.05, power > 80%, CI width < 5%), and detailed table formats. Every step maps to specific tool calls with clear parameters. | 3 / 3 |
| Workflow Clarity | The 8-step workflow is clearly sequenced with explicit validation checkpoints (Step 2 data quality checks before analysis, Step 1 validation gate, Step 8 verification checklist). It includes feedback loops (extend duration if underpowered, fix and re-validate), error recovery scenarios (inconclusive results, guardrail regressions), and clear stop conditions ('explain what's missing and stop'). | 3 / 3 |
| Progressive Disclosure | The entire skill is a monolithic wall of text with no bundle files or external references to offload detailed content. The output template (Step 8), segment analysis formatting, key scenarios, and best practices could all be separate reference files. The only external reference is a brief mention of 'setup-experiment-and-flags' skill at the very end. For a skill this long, the lack of content splitting is a significant organizational weakness. | 1 / 3 |
| Total | | 8 / 12 |

Passed
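
The review suggests replacing explanations of statistical concepts with just the thresholds and decision rules. As a rough illustration of how compact that could be, here is a minimal Python sketch built on the three thresholds quoted under Actionability (p < 0.05, power > 80%, CI width < 5%). The `MetricResult` type, the `decide` function, the returned labels, and the way the thresholds are combined are all assumptions made for illustration; the skill's actual decision logic is not shown on this page.

```python
from dataclasses import dataclass

# Thresholds quoted in the review. How the skill combines them is an assumption.
ALPHA = 0.05         # significance threshold: p < 0.05
MIN_POWER = 0.80     # power > 80%
MAX_CI_WIDTH = 0.05  # confidence interval width < 5%


@dataclass
class MetricResult:
    """Hypothetical container for one primary-metric result."""
    p_value: float
    power: float
    ci_low: float   # lower bound of the relative-lift CI, e.g. 0.01 for +1%
    ci_high: float  # upper bound of the relative-lift CI


def decide(result: MetricResult) -> str:
    """Illustrative decision rule, not the skill's actual logic."""
    significant = result.p_value < ALPHA
    powered = result.power > MIN_POWER
    precise = (result.ci_high - result.ci_low) < MAX_CI_WIDTH

    if significant and result.ci_low > 0:
        return "ship"
    if significant and result.ci_high < 0:
        return "do not ship"
    if powered and precise:
        # Enough data and a tight interval, yet no significant effect.
        return "no meaningful effect"
    # Mirrors the feedback loop the review mentions: extend if underpowered.
    return "inconclusive: extend duration"
```

For example, `decide(MetricResult(p_value=0.3, power=0.5, ci_low=-0.05, ci_high=0.08))` falls through to the last branch because the result is neither significant nor adequately powered.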

Validation

81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 9 / 11 Passed

Validation for skill structure

| Criteria | Description | Result |
| --- | --- | --- |
| skill_md_line_count | SKILL.md is long (528 lines); consider splitting into references/ and linking | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 9 / 11 |

Passed
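
The two warnings above can be approximated locally before re-running the review. The following Python sketch is only an illustration of what such checks might look like: `MAX_LINES`, `KNOWN_KEYS`, and `check_skill_md` are hypothetical, since this page reports the warnings but not the validator's actual line limit or its list of recognised frontmatter keys.

```python
import re

# Hypothetical values: the real validator's limit and key list are not stated.
MAX_LINES = 500
KNOWN_KEYS = {"name", "description", "license", "metadata", "allowed-tools"}


def check_skill_md(text: str) -> list[str]:
    """Return warning strings for an over-long file or unknown frontmatter keys."""
    warnings = []
    lines = text.splitlines()

    if len(lines) > MAX_LINES:
        warnings.append(f"skill_md_line_count: {len(lines)} lines")

    # Frontmatter is assumed to be the block between the first two '---' lines.
    if lines and lines[0].strip() == "---":
        end = next(
            (i for i in range(1, len(lines)) if lines[i].strip() == "---"),
            len(lines),
        )
        keys = set()
        for line in lines[1:end]:
            # Only top-level keys: indented (nested) keys do not match.
            match = re.match(r"^([A-Za-z0-9_-]+):", line)
            if match:
                keys.add(match.group(1))
        unknown = sorted(keys - KNOWN_KEYS)
        if unknown:
            warnings.append("frontmatter_unknown_keys: " + ", ".join(unknown))

    return warnings
```

Passing the contents of SKILL.md as a string returns an empty list when neither condition is triggered under these assumed settings.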

Repository: amplitude/builder-skills
Reviewed
