sample-size-power-calculator

Advanced sample size and power calculations for complex study designs including survival analysis, clustered designs, and multiple comparisons.

44

Quality

31%

Does it follow best practices?

Impact

Pending

No eval scenarios have been run

Security by Snyk

Passed

No known issues

Optimize this skill with Tessl

npx tessl skill review --optimize "./scientific-skills/Academic Writing/sample-size-power-calculator/SKILL.md"

Quality

Discovery

40%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description identifies a clear and distinctive statistical niche but falls short on completeness by lacking any explicit 'Use when...' trigger guidance. The specificity could be improved by listing concrete actions rather than just study design types. Trigger terms are reasonable but miss common user phrasings for this domain.

Suggestions

Add an explicit 'Use when...' clause, e.g., 'Use when the user asks about sample size, power analysis, how many subjects are needed, or designing clinical trials and experiments.'

Include additional natural trigger terms users might say: 'power analysis', 'effect size', 'statistical power', 'how many participants', 'clinical trial design', 'RCT sample size'.

List concrete actions the skill performs, e.g., 'Computes required sample sizes, generates power curves, estimates minimum detectable effects, and adjusts for clustering and multiplicity.'
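Taken together, the suggestions above could yield frontmatter like the following. This is a hypothetical sketch: the `name` and `description` field names follow common skill-file conventions and are an assumption, not a verified Tessl spec.

```yaml
---
name: sample-size-power-calculator
description: >
  Computes required sample sizes, generates power curves, and estimates
  minimum detectable effects for complex study designs, including survival
  analysis, clustered designs, and multiple-comparison adjustments.
  Use when the user asks about sample size, power analysis, statistical
  power, effect size, how many subjects or participants are needed, or
  designing clinical trials, RCTs, and experiments.
---
```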

Dimension / Reasoning / Score

Specificity

Names the domain (sample size and power calculations) and lists some specific design types (survival analysis, clustered designs, multiple comparisons), but doesn't describe concrete actions like 'compute required sample size', 'generate power curves', or 'output effect size tables'.

2 / 3

Completeness

Describes what the skill does but completely lacks a 'Use when...' clause or any explicit trigger guidance for when Claude should select this skill. Per the rubric, a missing 'Use when...' clause caps completeness at 2, and since the 'what' is also only moderately detailed, this scores at 1.

1 / 3

Trigger Term Quality

Includes relevant terms like 'sample size', 'power calculations', 'survival analysis', 'clustered designs', and 'multiple comparisons', which are natural keywords. However, it misses common variations users might say such as 'power analysis', 'how many subjects', 'statistical power', 'RCT', 'clinical trial design', or 'effect size'.

2 / 3

Distinctiveness Conflict Risk

The combination of 'sample size and power calculations' with specific complex study designs (survival analysis, clustered designs, multiple comparisons) creates a clear niche that is unlikely to conflict with other skills. This is a well-defined statistical specialty area.

3 / 3

Total: 8 / 12

Passed

Implementation

22%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill is heavily padded with generic template boilerplate (security checklists, risk assessments, lifecycle status, response templates) that adds no value specific to sample size and power calculations. The actual domain-specific content—CLI usage, parameters, and test types—is reasonable but shallow, with no guidance on the statistical concepts, formulas, or decision-making needed for complex designs like survival analysis or clustered studies. The workflow is entirely generic and lacks validation steps specific to statistical calculations.

Suggestions

Remove or drastically reduce generic boilerplate sections (Risk Assessment, Security Checklist, Lifecycle Status, Evaluation Criteria, Response Template) and replace with domain-specific content such as formulas, decision trees for choosing test types, and interpretation guidance.

Add concrete examples with expected outputs for each test type (e.g., 'For a two-sample t-test with Cohen's d=0.5, alpha=0.05, power=0.8 → n=64 per group') to make the skill truly actionable.

Replace the generic workflow with a statistical workflow that includes validation checkpoints (e.g., verify effect size is clinically meaningful, check test assumptions, validate output against known benchmarks).

Fix circular cross-references ('See ## Prerequisites above' in a section that appears before Prerequisites) and consolidate redundant sections.
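The worked example in the second suggestion can be sanity-checked with a short standalone script. This sketch uses the standard normal approximation with Guenther's small-sample correction, not the skill's own (undocumented) `scripts/main.py`:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-group sample size for a two-sided two-sample t-test,
    via the normal approximation plus Guenther's correction."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # critical value for two-sided alpha
    z_beta = z(power)            # quantile for the target power
    n = 2 * ((z_alpha + z_beta) / d) ** 2 + z_alpha ** 2 / 4
    return ceil(n)

print(n_per_group(0.5))  # -> 64 per group, matching the suggested benchmark
```

Checks like `n_per_group(0.5) == 64` against published tables are exactly the kind of validation checkpoint the third suggestion asks for.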

Dimension / Reasoning / Score

Conciseness

Extremely verbose and repetitive. Multiple sections reference each other circularly ('See ## Prerequisites above', 'See ## Usage above', 'See ## Workflow above'). The 'When to Use' section repeats the description verbatim. Boilerplate sections like Risk Assessment, Security Checklist, Lifecycle Status, Evaluation Criteria, and Response Template add significant token overhead without providing actionable value for the actual task of sample size calculation. Much content is generic template filler rather than domain-specific guidance.

1 / 3

Actionability

The Usage section provides concrete CLI commands with specific parameters, and the Parameters table is useful. However, the actual statistical computation logic is entirely delegated to an opaque `scripts/main.py` with no explanation of the underlying calculations, formulas, or how to handle the advertised complex designs (survival analysis, clustered designs, multiple comparisons). The workflow steps are generic process steps, not specific to power analysis.

2 / 3

Workflow Clarity

The workflow section is entirely generic ('Confirm the user objective', 'Validate that the request matches the documented scope') with no steps specific to sample size calculation. There are no validation checkpoints for statistical correctness (e.g., verifying effect size is reasonable, checking assumptions of the statistical test). The 'Example run plan' is also generic. No feedback loops for verifying calculation results against known benchmarks or sanity checks.

1 / 3

Progressive Disclosure

There is some structure with sections and a reference to `references/audit-reference.md`, but the document is monolithic with many sections that could be consolidated or removed. The circular cross-references ('See ## Prerequisites above') are confusing rather than helpful. Content is not well-organized for discovery—the actual usage examples and parameters are buried among boilerplate sections.

2 / 3

Total: 6 / 12

Passed

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 Passed

Validation for skill structure

Criteria / Description / Result

frontmatter_unknown_keys

Unknown frontmatter key(s) found; consider removing or moving to metadata

Warning

Total: 10 / 11

Passed
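The `frontmatter_unknown_keys` warning above suggests moving unrecognized top-level keys under `metadata`. A hypothetical sketch of the fix; the example key and the exact schema are assumptions, as the Tessl spec is not reproduced here:

```yaml
---
name: sample-size-power-calculator
description: Advanced sample size and power calculations for complex study designs.
# Unknown top-level keys trigger the warning; nest them under `metadata` or remove them.
metadata:
  maintainer: aipoch   # hypothetical example of a relocated custom key
---
```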

Repository: aipoch/medical-research-skills (Reviewed)
