Guided statistical analysis with test selection and reporting. Use when you need help choosing appropriate tests for your data, assumption checking, power analysis, and APA-formatted results. Best for academic research reporting, test selection guidance. For implementing specific models programmatically use statsmodels.
84
75%
Does it follow best practices?
Impact
91%
1.13x
Average score across 6 eval scenarios
Passed
No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./scientific-skills/statistical-analysis/SKILL.md
Quality
Discovery
100%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong skill description that clearly communicates specific capabilities (test selection, assumption checking, power analysis, APA formatting), includes natural trigger terms researchers would use, and explicitly delineates its boundary with a related skill (statsmodels). The only minor issue is the use of second person ('you need help', 'your data') which the rubric guidelines say should be penalized, but the overall quality is high.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: test selection, assumption checking, power analysis, and APA-formatted results. These are distinct, well-defined statistical analysis tasks. | 3 / 3 |
| Completeness | Clearly answers both what (guided statistical analysis with test selection, assumption checking, power analysis, APA-formatted reporting) and when ('Use when you need help choosing appropriate tests for your data'). Also includes a helpful boundary condition distinguishing it from statsmodels for programmatic implementation. | 3 / 3 |
| Trigger Term Quality | Includes strong natural keywords users would say: 'statistical analysis', 'test selection', 'assumption checking', 'power analysis', 'APA-formatted results', 'academic research reporting'. These are terms researchers and students naturally use. | 3 / 3 |
| Distinctiveness Conflict Risk | Clearly carves out a distinct niche focused on guided statistical test selection and academic reporting, and explicitly differentiates itself from programmatic statistical modeling ('For implementing specific models programmatically use statsmodels'). This boundary-setting reduces conflict risk significantly. | 3 / 3 |
| Total | 12 / 12 Passed | |
Implementation
50%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
The skill is highly actionable with excellent executable code examples and concrete APA reporting templates, which is its primary strength. However, it is significantly too verbose—it explains many concepts Claude already knows, includes best practices and pitfalls lists that are general knowledge, and inlines content that should live in the referenced files. The workflow structure exists but lacks explicit validation-feedback loops that would make the multi-step analysis process more robust.
Suggestions
- Cut the content by 50-60%: remove 'When to Use This Skill' (redundant with frontmatter), 'Best Practices,' 'Common Pitfalls,' 'Key Advantages' of Bayesian methods, textbook recommendations, and online resources; Claude already knows these.
- Move detailed code examples (regression diagnostics, Bayesian t-test, full ANOVA workflow) into reference files and keep only one compact example in the main skill to demonstrate the pattern.
- Add explicit validation checkpoints to the workflow: e.g., 'Run assumption_checks.py → If violations found → apply remediation from table → re-check → only proceed when assumptions satisfied or alternative test selected.'
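
The check, remediate, re-check loop described in the last suggestion could look roughly like the sketch below. It is an illustration only, not the contents of the skill's `assumption_checks.py`: the skewness-based normality check, the `max_abs_skew` threshold, the `REMEDIATIONS` table, and the `plan_analysis` function are all hypothetical names and choices made for this example, and it uses only the Python standard library.

```python
import math
import statistics


def skewness(xs):
    """Moment-based skewness; close to 0 for symmetric data."""
    mean = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    if sd == 0:
        return 0.0
    return sum(((x - mean) / sd) ** 3 for x in xs) / len(xs)


def check_assumptions(xs, max_abs_skew=1.0):
    """Return the names of violated assumptions (empty list = all passed).

    Assumption: a crude skewness cut-off stands in for a real normality test.
    """
    violations = []
    if abs(skewness(xs)) > max_abs_skew:
        violations.append("normality")
    return violations


def log_transform(xs):
    """Log-transform the data, or return None if it is not applicable."""
    if min(xs) <= 0:
        return None
    return [math.log(x) for x in xs]


# Hypothetical remediation table: violation name -> transform to try.
REMEDIATIONS = {"normality": log_transform}


def plan_analysis(xs, max_rounds=2):
    """Check -> remediate -> re-check; proceed only once checks pass."""
    data = list(xs)
    applied = []
    for _ in range(max_rounds):
        violations = check_assumptions(data)
        if not violations:
            # Gate passed: the parametric route is allowed.
            return {"route": "parametric", "applied": applied, "data": data}
        for name in violations:
            fix = REMEDIATIONS.get(name)
            fixed = fix(data) if fix is not None else None
            if fixed is None:
                # No usable remediation: select the alternative test instead.
                return {"route": "nonparametric", "applied": applied, "data": list(xs)}
            data = fixed
            applied.append(name)
    # Final re-check after the last remediation round.
    if not check_assumptions(data):
        return {"route": "parametric", "applied": applied, "data": data}
    return {"route": "nonparametric", "applied": applied, "data": list(xs)}


if __name__ == "__main__":
    print(plan_analysis([1, 2, 4, 8, 16, 32, 64, 128])["route"])
    print(plan_analysis([-1, 0, 0, 0, 0, 0, 0, 50])["route"])
```

The point of the structure is that the parametric route is only returned from a branch where `check_assumptions` has just come back empty, which is the verification gate the suggestion asks for.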
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is extremely verbose at ~500+ lines. It explains concepts Claude already knows (what effect sizes are, what p-values mean, advantages of Bayesian methods, definitions of common pitfalls like p-hacking). Sections like 'When to Use This Skill,' 'Best Practices,' 'Common Pitfalls,' and 'Key Advantages' of Bayesian methods are largely redundant for Claude. The textbook recommendations and online resources at the end add no value. | 1 / 3 |
| Actionability | The skill provides fully executable Python code examples for t-tests, ANOVA, regression, Bayesian analysis, power analysis, and assumption checking. Code is copy-paste ready with specific library calls (pingouin, statsmodels, pymc) and concrete APA report templates that serve as actionable output examples. | 3 / 3 |
| Workflow Clarity | There is a decision tree and a getting-started checklist that outline the workflow sequence, and assumption checking is emphasized before analysis. However, the workflow lacks explicit validation checkpoints with feedback loops: there's no 'if assumption check fails, do X, then re-check' loop built into the main workflow, and the decision tree uses vague section references rather than concrete steps with verification gates. | 2 / 3 |
| Progressive Disclosure | The skill references external files (references/*.md, scripts/*.py) appropriately, but the main SKILL.md itself contains far too much inline content that should be in those reference files. The test selection guide, full code examples for every test type, effect size tables, and APA report templates could all be in referenced documents, keeping the main file as a concise overview. | 2 / 3 |
| Total | 8 / 12 Passed | |
Validation
81%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 9 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| skill_md_line_count | SKILL.md is long (631 lines); consider splitting into references/ and linking | Warning |
| metadata_version | 'metadata.version' is missing | Warning |
| Total | 9 / 11 Passed | |
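
To make the two warnings concrete, here is a rough sketch of how checks like these might be reproduced locally. It is not Tessl's validator. The `validate_skill_md` function, the `max_lines=500` default, and the assumed frontmatter layout (a `metadata:` key with an indented `version:` entry between `---` delimiters) are all assumptions for illustration; the report above only states that 631 lines counts as long and that `metadata.version` is missing.

```python
def validate_skill_md(text, max_lines=500):
    """Return warning strings for a SKILL.md body (hypothetical checks).

    max_lines is an assumed threshold, not one taken from the report.
    """
    warnings = []
    lines = text.splitlines()

    # Check 1: overall file length.
    if len(lines) > max_lines:
        warnings.append(
            f"skill_md_line_count: SKILL.md is long ({len(lines)} lines)"
        )

    # Collect frontmatter lines between the opening and closing '---'.
    front = []
    if lines and lines[0].strip() == "---":
        for line in lines[1:]:
            if line.strip() == "---":
                break
            front.append(line)

    # Check 2: look for an indented 'version:' under a top-level 'metadata:'.
    in_metadata = False
    has_version = False
    for line in front:
        if not line.strip():
            continue
        if line[0] not in " \t":
            in_metadata = line.strip().startswith("metadata:")
        elif in_metadata and line.strip().startswith("version:"):
            has_version = True
    if not has_version:
        warnings.append("metadata_version: 'metadata.version' is missing")

    return warnings


if __name__ == "__main__":
    # The version value below is a made-up placeholder.
    ok = '---\nname: statistical-analysis\nmetadata:\n  version: "1.0"\n---\nbody\n'
    print(validate_skill_md(ok))
```

Running a check like this before submitting a skill would surface both warnings early; splitting inline content into `references/` addresses the first, and adding the version key addresses the second.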
086de41