case-control-study-quality-assessment-nos

Clinical Research Bias Assessment - Case-Control Study (NOS) v2.3.0. Use when you need to assess the bias of a case-control study using the Newcastle-Ottawa Scale (NOS) criteria, or when evaluating the quality of a medical paper.

Quality

62%

Does it follow best practices?

Impact

—

No eval scenarios have been run

Security

Passed

No known issues

Optimize this skill with Tessl

npx tessl skill review --optimize "./scientific-skills/Data Analysis/Case-control-study-quality-assessment-nos/SKILL.md"

Quality

Discovery

89%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a well-constructed description with strong trigger terms, clear 'when' guidance, and a highly distinctive niche. Its main weakness is that it doesn't enumerate the specific actions or outputs the skill produces (e.g., scoring selection, comparability, and exposure domains), which limits specificity. However, the domain is narrow enough that the description is effective for skill selection.

Suggestions

Add specific concrete actions the skill performs, e.g., 'Scores selection, comparability, and exposure domains; generates an overall quality rating and detailed justification for each criterion.'

Dimension	Reasoning	Score
Specificity	The description names the domain (clinical research bias assessment) and the specific methodology (Newcastle-Ottawa Scale for case-control studies), but does not list the concrete actions performed—e.g., scoring selection criteria, evaluating comparability, assessing exposure ascertainment. It identifies what it is but not the specific steps it takes.	2 / 3
Completeness	The description answers both 'what' (assess bias of a case-control study using NOS criteria) and 'when' (explicitly states 'Use when you need to assess the bias of a case-control study using the Newcastle-Ottawa Scale (NOS) criteria, or when evaluating the quality of a medical paper').	3 / 3
Trigger Term Quality	Includes strong natural keywords a user would say: 'bias assessment', 'case-control study', 'Newcastle-Ottawa Scale', 'NOS', 'quality of a medical paper'. These cover both the formal methodology name and common user phrasing like 'evaluate the quality' of a study.	3 / 3
Distinctiveness Conflict Risk	Highly distinctive—it targets a very specific niche (case-control studies assessed via NOS). The combination of study type and assessment tool makes it unlikely to conflict with other skills, even other research quality assessment skills (e.g., cohort studies, RCTs, ROBINS-I).	3 / 3
	Total	11 / 12 Passed

Implementation

35%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill suffers heavily from boilerplate bloat — roughly 60-70% of the content is generic template text that applies to any skill and wastes tokens without adding NOS-specific value. The actual domain-relevant content (NOS evaluation workflow) is reasonable in structure but lacks concrete examples, expected output schemas, and validation checkpoints. The skill would benefit enormously from stripping all generic sections and replacing them with a concrete NOS assessment example with input/output.

Suggestions

Remove all generic boilerplate sections (When to Use, When Not to Use, Required Inputs, Validation and Safety Rules, Failure Handling, Output Contract, Recommended Workflow) — these describe behaviors Claude already knows and duplicate each other.

Add a concrete example showing a sample NOS assessment: input study description → evaluation reasoning for each domain → final JSON output with the exact schema expected by format_nos_table.py.

Complete the truncated 'PDF Text Extraction' section and provide the actual command with arguments rather than just a header.

Include the JSON schema for the structured output that format_nos_table.py expects, so the intermediate data format is unambiguous.
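As a rough illustration of the last two suggestions, the sketch below shows one possible shape for a structured NOS assessment and a star tally. The domain names and star maxima follow the NOS for case-control studies (Selection up to 4, Comparability up to 2, Exposure up to 3). The key names, the placeholder study, and the `total_stars` helper are assumptions made for this example: the schema that format_nos_table.py actually expects is not shown in the skill.

```python
import json

# Hypothetical layout: the real schema expected by format_nos_table.py is not
# shown in the review, so every key name below is an assumption.
NOS_MAX_STARS = {"selection": 4, "comparability": 2, "exposure": 3}

assessment = {
    "study": "Example case-control study (placeholder)",
    "selection": {
        "case_definition_adequate": 1,
        "representativeness_of_cases": 1,
        "selection_of_controls": 0,
        "definition_of_controls": 1,
    },
    "comparability": {"controls_for_confounders": 2},
    "exposure": {
        "ascertainment_of_exposure": 1,
        "same_method_for_cases_and_controls": 1,
        "non_response_rate": 0,
    },
}


def total_stars(a):
    """Sum the stars awarded across the three NOS domains."""
    return sum(sum(a[domain].values()) for domain in NOS_MAX_STARS)


# Serialized form that could be passed to a formatter as a single argument.
json_string = json.dumps(assessment)
print(total_stars(assessment), "/", sum(NOS_MAX_STARS.values()))  # 7 / 9
```

Keeping one key per NOS item, rather than a single score per domain, would let the justification for each star be traced back to a specific criterion.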

Dimension	Reasoning	Score
Conciseness	Extremely verbose and repetitive. The skill contains multiple redundant sections (e.g., 'When to Use', 'When Not to Use', 'Required Inputs', 'Recommended Workflow', 'Output Contract', 'Validation and Safety Rules', 'Failure Handling') that are generic boilerplate not specific to NOS assessment. The actual NOS-specific content (the Usage and Detailed Workflow sections) is buried among template filler. Sections like 'Key Features' restate the description verbatim, and 'Example Usage' says 'See ## Usage above' circularly. Much of this content explains things Claude already knows (how to handle missing inputs, how to validate, etc.).	1 / 3
Actionability	The Detailed Workflow provides some concrete steps (Selection, Comparability, Exposure evaluation) and references a specific script command (`python scripts/format_nos_table.py '<json_string>'`), but critical details are missing: no example JSON schema for the structured output, no concrete example of what a NOS evaluation looks like, and the PDF extraction section is incomplete (cuts off with just a header). The actual NOS criteria are deferred to a reference file that isn't provided.	2 / 3
Workflow Clarity	The workflow steps (Extract Metadata → Evaluate Selection/Comparability/Exposure → Synthesize → Format) are listed in a logical sequence, but there are no validation checkpoints between steps. There's no feedback loop for when evaluation is ambiguous, no example of what 'pass' vs 'fail' looks like for each criterion, and the 'Recommended Workflow' section is a generic template that duplicates the actual workflow without adding value. The detailed workflow lacks explicit verification steps.	2 / 3
Progressive Disclosure	There is a reference to `references/nos_criteria_prompts.md` which is appropriate one-level-deep disclosure, and script paths are mentioned. However, no bundle files are provided to verify these references exist, the content that should be in the main file (concrete NOS criteria examples, JSON schema) is missing, and the overall organization is poor with redundant generic sections drowning out the domain-specific content.	2 / 3
	Total	7 / 12 Passed
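The missing validation checkpoints noted under Workflow Clarity could look something like the sketch below, run between the evaluation and formatting steps. It only checks that each NOS domain is present and that its star count is within the scale's maximum. The function name `check_assessment` and the input layout are hypothetical and not taken from the skill.

```python
# Domain maxima follow the NOS for case-control studies (4 + 2 + 3 = 9 stars).
# The dict layout and function name are hypothetical, not taken from the skill.
NOS_DOMAIN_MAX = {"selection": 4, "comparability": 2, "exposure": 3}


def check_assessment(stars_by_domain):
    """Return a list of problems; an empty list means the checkpoint passes."""
    problems = []
    for domain, max_stars in NOS_DOMAIN_MAX.items():
        if domain not in stars_by_domain:
            problems.append(f"missing domain: {domain}")
            continue
        stars = stars_by_domain[domain]
        if not isinstance(stars, int) or not 0 <= stars <= max_stars:
            problems.append(f"{domain}: expected 0..{max_stars} stars, got {stars!r}")
    return problems


# A complete, in-range assessment passes; an out-of-range or partial one does not.
print(check_assessment({"selection": 3, "comparability": 2, "exposure": 2}))
print(check_assessment({"selection": 5, "comparability": 2}))
```

A non-empty result could serve as the trigger for the feedback loop the review asks for, sending the ambiguous domain back for re-evaluation before the table is formatted.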

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 10 / 11 Passed

Validation for skill structure

Criteria	Description	Result
frontmatter_unknown_keys	Unknown frontmatter key(s) found; consider removing or moving to metadata	Warning

	Total	10 / 11 Passed

Repository: aipoch/medical-research-skills
Commit: 73f6514

Reviewed: about 14 hours ago
