Advanced sample size and power calculations for complex study designs including survival analysis, clustered designs, and multiple comparisons.
Overall: 31%

- Impact: Pending — no eval scenarios have been run
- Best practices: Passed — no known issues
Optimize this skill with Tessl:

```shell
npx tessl skill review --optimize "./scientific-skills/Academic Writing/sample-size-power-calculator/SKILL.md"
```

Quality
Discovery — 40%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description identifies a clear and distinctive statistical niche but falls short on completeness by lacking any explicit 'Use when...' trigger guidance. The specificity could be improved by listing concrete actions rather than just study design types. Trigger terms are reasonable but miss common user phrasings for this domain.
Suggestions
- Add an explicit 'Use when...' clause, e.g., 'Use when the user asks about sample size, power analysis, how many subjects are needed, or designing clinical trials and experiments.'
- Include additional natural trigger terms users might say: 'power analysis', 'effect size', 'statistical power', 'how many participants', 'clinical trial design', 'RCT sample size'.
- List concrete actions the skill performs, e.g., 'Computes required sample sizes, generates power curves, estimates minimum detectable effects, and adjusts for clustering and multiplicity.'
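Taken together, the suggestions above might produce frontmatter along these lines. This is a sketch only: the skill name is inferred from the path in the optimize command, and the wording is illustrative rather than the skill's actual frontmatter.

```yaml
---
name: sample-size-power-calculator
description: >
  Computes required sample sizes, generates power curves, estimates minimum
  detectable effects, and adjusts for clustering and multiplicity across
  complex designs (survival analysis, clustered designs, multiple
  comparisons). Use when the user asks about sample size, power analysis,
  statistical power, effect size, how many participants are needed, or
  designing clinical trials, RCTs, and experiments.
---
```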
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (sample size and power calculations) and lists some specific design types (survival analysis, clustered designs, multiple comparisons), but doesn't describe concrete actions like 'compute required sample size', 'generate power curves', or 'output effect size tables'. | 2 / 3 |
| Completeness | Describes what the skill does but completely lacks a 'Use when...' clause or any explicit trigger guidance for when Claude should select this skill. Per the rubric, a missing 'Use when...' clause caps completeness at 2, and since the 'what' is also only moderately detailed, this scores at 1. | 1 / 3 |
| Trigger Term Quality | Includes relevant terms like 'sample size', 'power calculations', 'survival analysis', 'clustered designs', and 'multiple comparisons', which are natural keywords. However, it misses common variations users might say such as 'power analysis', 'how many subjects', 'statistical power', 'RCT', 'clinical trial design', or 'effect size'. | 2 / 3 |
| Distinctiveness Conflict Risk | The combination of 'sample size and power calculations' with specific complex study designs (survival analysis, clustered designs, multiple comparisons) creates a clear niche that is unlikely to conflict with other skills. This is a well-defined statistical specialty area. | 3 / 3 |
| Total | | 8 / 12 (Passed) |
Implementation — 22%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is heavily padded with generic template boilerplate (security checklists, risk assessments, lifecycle status, response templates) that adds no value specific to sample size and power calculations. The actual domain-specific content—CLI usage, parameters, and test types—is reasonable but shallow, with no guidance on the statistical concepts, formulas, or decision-making needed for complex designs like survival analysis or clustered studies. The workflow is entirely generic and lacks validation steps specific to statistical calculations.
Suggestions
- Remove or drastically reduce generic boilerplate sections (Risk Assessment, Security Checklist, Lifecycle Status, Evaluation Criteria, Response Template) and replace with domain-specific content such as formulas, decision trees for choosing test types, and interpretation guidance.
- Add concrete examples with expected outputs for each test type (e.g., 'For a two-sample t-test with Cohen's d=0.5, alpha=0.05, power=0.8 → n=64 per group') to make the skill truly actionable.
- Replace the generic workflow with a statistical workflow that includes validation checkpoints (e.g., verify effect size is clinically meaningful, check test assumptions, validate output against known benchmarks).
- Fix circular cross-references ('See ## Prerequisites above' in a section that appears before Prerequisites) and consolidate redundant sections.
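The worked example in the second suggestion (n = 64 per group for Cohen's d = 0.5, alpha = 0.05, power = 0.8) can be sanity-checked with a small stdlib sketch. This uses the normal approximation plus Guenther's z²/4 small-sample correction for the two-sample t-test, not the skill's own `scripts/main.py`, so treat it as an independent benchmark rather than the skill's implementation:

```python
import math
from statistics import NormalDist

def n_per_group(d: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-group n for a two-sided, two-sample t-test with equal groups.

    Normal approximation n0 = 2 * ((z_{1-a/2} + z_{1-b}) / d)^2,
    plus Guenther's correction z_{1-a/2}^2 / 4 for the t-distribution.
    """
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_b = NormalDist().inv_cdf(power)          # power quantile
    n0 = 2 * (z_a + z_b) ** 2 / d ** 2
    return math.ceil(n0 + z_a ** 2 / 4)

print(n_per_group(0.5))  # → 64 per group (matches the suggested example)
print(n_per_group(0.8))  # → 26 per group
```

Such known benchmarks (64 per group at d = 0.5, 26 at d = 0.8) are exactly the validation checkpoints the third suggestion calls for.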
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose and repetitive. Multiple sections reference each other circularly ('See ## Prerequisites above', 'See ## Usage above', 'See ## Workflow above'). The 'When to Use' section repeats the description verbatim. Boilerplate sections like Risk Assessment, Security Checklist, Lifecycle Status, Evaluation Criteria, and Response Template add significant token overhead without providing actionable value for the actual task of sample size calculation. Much content is generic template filler rather than domain-specific guidance. | 1 / 3 |
| Actionability | The Usage section provides concrete CLI commands with specific parameters, and the Parameters table is useful. However, the actual statistical computation logic is entirely delegated to an opaque `scripts/main.py` with no explanation of the underlying calculations, formulas, or how to handle the advertised complex designs (survival analysis, clustered designs, multiple comparisons). The workflow steps are generic process steps, not specific to power analysis. | 2 / 3 |
| Workflow Clarity | The workflow section is entirely generic ('Confirm the user objective', 'Validate that the request matches the documented scope') with no steps specific to sample size calculation. There are no validation checkpoints for statistical correctness (e.g., verifying effect size is reasonable, checking assumptions of the statistical test). The 'Example run plan' is also generic. No feedback loops for verifying calculation results against known benchmarks or sanity checks. | 1 / 3 |
| Progressive Disclosure | There is some structure with sections and a reference to `references/audit-reference.md`, but the document is monolithic with many sections that could be consolidated or removed. The circular cross-references ('See ## Prerequisites above') are confusing rather than helpful. Content is not well-organized for discovery—the actual usage examples and parameters are buried among boilerplate sections. | 2 / 3 |
| Total | | 6 / 12 (Passed) |
Validation — 90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

10 / 11 checks passed.
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 10 / 11 (Passed) |
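The `frontmatter_unknown_keys` warning can typically be resolved by nesting non-standard keys under `metadata`, as the warning message itself suggests. The report does not name the offending key, so `version` below is purely hypothetical:

```yaml
# Before: a top-level key the spec doesn't recognize (hypothetical example)
version: 1.2.0

# After: the same key nested under metadata, which validation accepts
metadata:
  version: 1.2.0
```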