Advanced sample size and power calculations for complex study designs including survival analysis, clustered designs, and multiple comparisons.
30
23%
Does it follow best practices?
Impact: not available (no eval scenarios have been run)
Passed
No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./scientific-skills/Academic Writing/sample-size-power-calculator/SKILL.md
Quality
Discovery
32%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description identifies a clear statistical niche (sample size and power calculations for complex designs) and names specific design types, which is helpful. However, it lacks a 'Use when...' clause with explicit trigger guidance, misses common user-facing keyword variations, and describes capabilities at a category level rather than listing concrete actions the skill performs.
Suggestions
Add a 'Use when...' clause with explicit triggers, e.g., 'Use when the user asks about sample size, power analysis, how many participants are needed, or designing a clinical trial or experiment.'
Include additional natural trigger terms users would say, such as 'power analysis', 'effect size', 'statistical power', 'how many subjects', 'clinical trial design', 'RCT', or 'study planning'.
List more concrete actions the skill performs, e.g., 'Computes required sample sizes, generates power curves, estimates minimum detectable effects, and adjusts for clustering, dropout, and multiple testing corrections.'
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (sample size and power calculations) and lists some specific design types (survival analysis, clustered designs, multiple comparisons), but doesn't describe concrete actions like 'compute required sample size', 'generate power curves', or 'compare design alternatives'. | 2 / 3 |
| Completeness | Describes what the skill does but completely lacks a 'Use when...' clause or any explicit trigger guidance for when Claude should select this skill. Per the rubric, a missing 'Use when...' clause should cap completeness at 2, and since the 'what' is also only moderately detailed, this scores at 1. | 1 / 3 |
| Trigger Term Quality | Includes relevant terms like 'sample size', 'power calculations', 'survival analysis', 'clustered designs', and 'multiple comparisons', which are natural terms a user might use. However, it misses common variations like 'power analysis', 'effect size', 'statistical power', 'RCT', 'clinical trial design', or 'how many subjects do I need'. | 2 / 3 |
| Distinctiveness Conflict Risk | The focus on 'complex study designs' with specific subtypes like survival analysis and clustered designs provides some distinctiveness, but it could overlap with a general statistics skill or a simpler sample size calculator skill. The 'advanced' qualifier helps but is somewhat vague. | 2 / 3 |
| Total | 7 / 12 Passed | |
Implementation
14%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is heavily padded with generic boilerplate that obscures the actual domain-specific content about sample size and power calculations. The useful content (test types, CLI parameters, usage examples) represents perhaps 20% of the document, while the rest is generic project management scaffolding, security checklists, and response templates that Claude doesn't need. The circular cross-references and repetitive sections suggest auto-generation without meaningful curation.
Suggestions
Remove all generic boilerplate sections (Risk Assessment, Security Checklist, Lifecycle Status, Evaluation Criteria, Response Template, Output Requirements) and focus exclusively on statistical calculation guidance.
Add concrete examples showing input parameters and expected output for each test type (e.g., 'For a two-sample t-test with Cohen's d=0.5, alpha=0.05, power=0.8: N=64 per group').
Consolidate the three workflow-related sections into a single clear workflow with domain-specific validation steps (e.g., 'Verify effect size is within plausible range for the domain', 'Check that allocation ratio is feasible').
Fix circular cross-references ('See ## Prerequisites above' appearing before Prerequisites) and organize content in a logical top-down flow: quick usage → parameters → test types with examples → advanced options.
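As an illustration of the kind of worked example suggested above, the following is a minimal sketch of a per-group sample size calculation for a two-sided, two-sample t-test. It is not the skill's own script, whose internals are not shown on this page. The function name `n_per_group_ttest` is hypothetical, and the calculation uses the normal approximation plus a small-sample correction term (z²/4) rather than an exact noncentral-t solution, so it should be read as an approximation under those assumptions.

```python
import math
from statistics import NormalDist


def n_per_group_ttest(d, alpha=0.05, power=0.8):
    """Approximate per-group n for a two-sided, two-sample t-test.

    Hypothetical helper for illustration only. Assumes equal group sizes,
    equal variances, and a standardized effect size d (Cohen's d).
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value, two-sided
    z_beta = std_normal.inv_cdf(power)           # quantile for target power

    # Normal-approximation sample size per group.
    n_normal = 2 * (z_alpha + z_beta) ** 2 / d ** 2

    # Small-sample correction that brings the normal approximation
    # closer to the t-based answer.
    n_corrected = n_normal + z_alpha ** 2 / 4

    return math.ceil(n_corrected)


print(n_per_group_ttest(0.5, alpha=0.05, power=0.8))  # 64
```

For d=0.5, alpha=0.05, power=0.8 this reproduces the N=64 per group quoted in the suggestion. Without the correction term the normal approximation alone would round up to 63.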
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose and repetitive. Contains extensive boilerplate sections (Risk Assessment, Security Checklist, Lifecycle Status, Evaluation Criteria, Response Template) that add no domain-specific value. Multiple sections reference each other circularly ('See ## Prerequisites above', 'See ## Usage above'). The actual statistical content (test types, parameters, usage) is buried under generic project management scaffolding. Much of the content explains things Claude already knows (error handling philosophy, input validation principles). | 1 / 3 |
| Actionability | The Usage section provides concrete CLI commands with specific parameters (--test ttest --effect 0.5 --alpha 0.05 --power 0.8), and the Parameters table is useful. However, there's no executable code showing how the calculations actually work, no example output, and the workflow steps are generic ('Confirm the user objective') rather than specific to sample size calculations. The skill relies entirely on an external script without showing what it does. | 2 / 3 |
| Workflow Clarity | The workflow section is entirely generic and not specific to sample size calculations. Steps like 'Confirm the user objective' and 'Validate that the request matches the documented scope' are boilerplate applicable to any skill. There are no validation checkpoints specific to statistical calculations (e.g., checking effect size ranges, verifying assumptions about distributions). The 'Example run plan' is similarly generic. No feedback loops for catching statistical errors. | 1 / 3 |
| Progressive Disclosure | The document is a monolithic wall of text with many sections that could be consolidated or removed. Cross-references are circular and broken ('See ## Prerequisites above' appears before Prerequisites). The single reference to 'references/audit-reference.md' is generic. No bundle files are provided to support the referenced paths. Content is poorly organized with related information scattered across multiple sections (Usage appears twice in different forms, workflow details are in three separate sections). | 1 / 3 |
| Total | 5 / 12 Passed | |
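The statistics-specific validation checkpoints that the Workflow Clarity row finds missing could look something like the sketch below. The function name `check_power_inputs` and the plausibility threshold of 2.0 for a standardized effect size are assumptions made for illustration; neither comes from the skill itself, and a real threshold would depend on the domain.

```python
def check_power_inputs(effect, alpha, power, ratio=1.0):
    """Return a list of problems found; an empty list means inputs look plausible.

    Hypothetical pre-flight check, not part of the reviewed skill.
    """
    problems = []

    if not 0 < alpha < 1:
        problems.append("alpha must be strictly between 0 and 1")

    if not 0 < power < 1:
        problems.append("power must be strictly between 0 and 1")
    elif 0 < alpha < 1 and power <= alpha:
        problems.append("power should exceed alpha")

    if effect == 0:
        problems.append("effect size must be non-zero")
    elif abs(effect) > 2.0:  # assumed plausibility threshold, domain-dependent
        problems.append("effect size above 2.0 is unusually large; re-check the estimate")

    if ratio <= 0:
        problems.append("allocation ratio must be positive")

    return problems


print(check_power_inputs(0.5, 0.05, 0.8))  # []
```

Returning a list rather than raising on the first failure lets an agent report every questionable input in one pass before any calculation runs.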
Validation
90%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 10 / 11 Passed | |
73f6514