55%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill excels at actionability and workflow clarity. It provides a thorough, well-sequenced experiment analysis framework with specific tool calls, thresholds, and validation checkpoints. However, it is severely overlong and monolithic. It explains statistical concepts Claude already knows (power analysis, p-values, SRM, Simpson's Paradox), includes large output templates inline, and repeats information across steps. The content would benefit greatly from being split into a concise overview SKILL.md that references separate files for the output template, the statistical thresholds, and scenario handling.
Suggestions
- Extract the Step 8 output template, the key scenarios section, and the best practices into separate referenced files (e.g., OUTPUT_TEMPLATE.md, SCENARIOS.md). This reduces SKILL.md to a concise overview with clear navigation.
- Remove explanations of statistical concepts Claude already knows (what SRM is, what p-values mean, what statistical power represents) and keep only the thresholds and decision rules.
- Consolidate the power/precision analysis that appears in Steps 2, 6, and 8 into a single location to eliminate redundancy.
- Add a brief 'Quick Reference' section at the top that summarizes each of the 8 steps in 2-3 lines, and move the detailed instructions into a separate file for when Claude needs the full workflow.
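As an illustration of the consolidation suggested for Steps 2, 6, and 8, the power calculation could be defined once and reused instead of being re-explained in each step. This is a minimal sketch that assumes a two-sided two-proportion z-test with equal arm sizes. The function name and constants are hypothetical and are not taken from the skill; only the 0.05 and 80% thresholds come from this review.

```python
import math
from statistics import NormalDist

# Hypothetical constants; the values are the thresholds quoted in this review.
ALPHA = 0.05       # significance threshold (p < 0.05)
MIN_POWER = 0.80   # power > 80%


def two_proportion_power(p_control: float, p_treatment: float,
                         n_per_arm: int, alpha: float = ALPHA) -> float:
    """Approximate power of a two-sided two-proportion z-test with equal arms."""
    # Unpooled standard error of the difference in proportions.
    se = math.sqrt(p_control * (1 - p_control) / n_per_arm
                   + p_treatment * (1 - p_treatment) / n_per_arm)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    z_effect = abs(p_treatment - p_control) / se
    # Ignores the negligible contribution from the opposite rejection tail.
    return NormalDist().cdf(z_effect - z_crit)


def meets_power_threshold(power: float) -> bool:
    """Hypothetical decision rule: treat the test as adequately powered above MIN_POWER."""
    return power > MIN_POWER
```

For example, `two_proportion_power(0.10, 0.12, 4000)` comes out at roughly 0.82, so `meets_power_threshold` would accept it. Keeping this in one helper means the threshold changes in one place rather than three.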
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | This skill is extremely verbose at ~400+ lines. It over-explains concepts Claude already understands (what SRM means, what p-values indicate, what statistical power is), provides exhaustive templates and formatting instructions, and repeats information across steps (power analysis appears in Steps 2, 6, and 8). Many sections could be condensed by 50-70% without losing actionable content. | 1 / 3 |
| Actionability | The skill provides highly specific, concrete guidance: exact API calls (Amplitude:query_experiment, Amplitude:get_feedback_insights), specific parameter usage (groupBy, metricIds, entityTypes), precise thresholds (p < 0.05, power > 80%, CI width < 5%), and detailed table formats. Every step maps to specific tool calls with clear parameters. | 3 / 3 |
| Workflow Clarity | The 8-step workflow is clearly sequenced with explicit validation checkpoints (the Step 1 validation gate, Step 2 data quality checks before analysis, and the Step 8 verification checklist). It includes feedback loops (extend duration if underpowered, fix and re-validate), error recovery scenarios (inconclusive results, guardrail regressions), and clear stop conditions ('explain what's missing and stop'). | 3 / 3 |
| Progressive Disclosure | The entire skill is a monolithic wall of text with no bundle files or external references to offload detailed content. The output template (Step 8), segment analysis formatting, key scenarios, and best practices could all be separate reference files. The only external reference is a brief mention of the 'setup-experiment-and-flags' skill at the very end. For a skill this long, the lack of content splitting is a significant organizational weakness. | 1 / 3 |
| Total | | 8 / 12 Passed |
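The thresholds quoted in the Actionability and Workflow Clarity rows (p < 0.05, power > 80%, CI width < 5%, extend if underpowered, stop on data quality problems) can be stated as one compact decision gate rather than prose. This is a minimal sketch under several assumptions that do not come from the review. It assumes a 50/50 allocation and a hypothetical `ExperimentSummary` input, which is not an Amplitude API response. It expresses CI width as a fraction (0.05 = 5%) and uses an SRM cutoff of p < 0.001, a common convention that the review does not state. The order in which the checks run is also an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class ExperimentSummary:
    # Hypothetical input shape; not an Amplitude API response.
    control_users: int
    treatment_users: int
    p_value: float    # primary metric p-value
    power: float      # achieved power, 0..1
    ci_width: float   # CI width as a fraction, e.g. 0.04 = 4% (assumed interpretation)


def srm_p_value(control: int, treatment: int, expected_ratio: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value (1 degree of freedom) for sample ratio mismatch."""
    total = control + treatment
    exp_c = total * expected_ratio
    exp_t = total - exp_c
    chi2 = (control - exp_c) ** 2 / exp_c + (treatment - exp_t) ** 2 / exp_t
    # With one degree of freedom, the chi-square survival function equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def decide(s: ExperimentSummary) -> str:
    # Assumed SRM cutoff of p < 0.001: a data quality failure stops the analysis.
    if srm_p_value(s.control_users, s.treatment_users) < 0.001:
        return "stop: sample ratio mismatch"
    # Underpowered or imprecise: extend the experiment rather than call a result.
    if s.power <= 0.80 or s.ci_width >= 0.05:
        return "extend: underpowered or CI too wide"
    if s.p_value < 0.05:
        return "significant"
    return "inconclusive"
```

With 5,000 versus 5,500 users, the SRM check fails before any metric is read. A near-even split with power 0.9, a 3% CI width, and p = 0.01 is reported as significant.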