Evaluate scientific claims and evidence quality. Use for assessing experimental design validity, identifying biases and confounders, applying evidence grading frameworks (GRADE, Cochrane Risk of Bias), or teaching critical analysis. Best for understanding evidence quality, identifying flaws. For formal peer review writing use peer-review.
76
67%
Does it follow best practices?
Impact
96%
1.02x
Average score across 3 eval scenarios
Advisory
Suggest reviewing before use
Optimize this skill with Tessl
npx tessl skill review --optimize ./scientific-skills/scientific-critical-thinking/SKILL.md
Quality
Discovery
100%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong skill description that clearly articulates specific capabilities, includes natural trigger terms from the scientific evidence evaluation domain, and explicitly addresses both what the skill does and when to use it. The inclusion of a boundary condition pointing to the peer-review skill is a nice touch that reduces ambiguity. The description is concise yet comprehensive.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: assessing experimental design validity, identifying biases and confounders, applying evidence grading frameworks (GRADE, Cochrane Risk of Bias), and teaching critical analysis. | 3 / 3 |
| Completeness | Clearly answers both what ('evaluate scientific claims and evidence quality, assess experimental design validity, identify biases...') and when ('Use for assessing experimental design validity... Best for understanding evidence quality, identifying flaws'). Also includes a boundary condition distinguishing it from peer-review skill. | 3 / 3 |
| Trigger Term Quality | Includes strong natural keywords users would say: 'scientific claims', 'evidence quality', 'experimental design', 'biases', 'confounders', 'GRADE', 'Cochrane Risk of Bias', 'critical analysis', 'evidence grading'. These cover a good range of terms a user working in this domain would naturally use. | 3 / 3 |
| Distinctiveness Conflict Risk | Occupies a clear niche around scientific evidence evaluation with distinct triggers like 'GRADE', 'Cochrane Risk of Bias', 'experimental design validity'. The explicit boundary ('For formal peer review writing use peer-review') further reduces conflict risk with adjacent skills. | 3 / 3 |
| Total | 12 / 12 Passed | |
Implementation
35%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is a comprehensive but excessively verbose reference document that explains many concepts Claude already knows (biases, fallacies, statistical principles, scientific methodology). It would benefit enormously from moving detailed taxonomies and checklists into the referenced files and keeping only procedural guidance and novel information in SKILL.md. The tangential 'Visual Enhancement with Scientific Schematics' section adds unnecessary length without contributing to the core skill.
Suggestions
- Move the detailed bias taxonomy, logical fallacy catalog, statistical checklist, and evidence hierarchy content into their respective reference files, keeping only brief summaries and cross-references in SKILL.md.
- Remove or drastically reduce the 'Visual Enhancement with Scientific Schematics' section, which is tangential to the core scientific critical thinking skill.
- Add a concrete worked example showing how to apply the evaluation framework to a real or realistic paper excerpt, with specific input and expected output.
- Add a concise decision flowchart or quick-reference summary at the top showing when to use which capability and in what order, rather than listing seven parallel sections without sequencing guidance.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose at 571 lines. Extensively explains concepts Claude already knows well (what confirmation bias is, what p-values mean, what logical fallacies are, basic scientific methodology). The entire document reads like a textbook chapter rather than a skill that adds novel, actionable information. The 'Visual Enhancement with Scientific Schematics' section is largely tangential padding. | 1 / 3 |
| Actionability | Provides structured checklists and frameworks (GRADE considerations, bias detection steps, claim evaluation process) which give some concrete guidance. However, there are no executable code examples and no concrete input/output examples showing how to apply the skill to an actual paper; guidance remains at the level of 'check for X' rather than demonstrating application with specific examples. | 2 / 3 |
| Workflow Clarity | The skill provides numbered steps within each capability section and a structured feedback template (Summary → Strengths → Concerns → Recommendations → Overall Assessment). However, there are no validation checkpoints or feedback loops for the overall evaluation process, and the seven core capabilities lack clear sequencing guidance on when and how to combine them for a complete evaluation. | 2 / 3 |
| Progressive Disclosure | References to external files (references/scientific_method.md, references/common_biases.md, etc.) are well-signaled and one level deep, which is good. However, the SKILL.md itself is monolithic, with enormous amounts of content that should be in those reference files rather than inline. The bias taxonomy, logical fallacy catalog, and statistical checklist are all detailed enough to belong in the referenced files, not the main skill. | 2 / 3 |
| Total | 7 / 12 Passed | |
Validation
81%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 9 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| skill_md_line_count | SKILL.md is long (571 lines); consider splitting into references/ and linking | Warning |
| metadata_version | 'metadata.version' is missing | Warning |
| Total | 9 / 11 Passed | |
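The two warnings above can be approximated locally before re-running the review. The following is a minimal sketch, not the validator's actual implementation. It assumes that SKILL.md opens with a `---`-delimited frontmatter block and that `metadata.version` means a `version:` key nested under a top-level `metadata:` key. The `MAX_LINES` threshold and all function names are hypothetical, since the report only says that 571 lines is "long".

```python
from pathlib import Path

# Hypothetical threshold: the report flags 571 lines as "long" but does not
# state the limit the real validator uses.
MAX_LINES = 500


def split_frontmatter(text: str) -> tuple[list[str], list[str]]:
    """Return (frontmatter_lines, body_lines) for a '---'-delimited header."""
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                return lines[1:i], lines[i + 1:]
    return [], lines


def has_metadata_version(frontmatter: list[str]) -> bool:
    """Detect an indented 'version:' key under a top-level 'metadata:' key."""
    in_metadata = False
    for line in frontmatter:
        if not line.strip():
            continue  # ignore blank lines
        indented = line[0] in " \t"
        if not indented:
            # A new top-level key starts; remember whether it is 'metadata:'.
            in_metadata = line.strip().startswith("metadata:")
        elif in_metadata and line.strip().startswith("version:"):
            return True
    return False


def check_skill(text: str) -> dict[str, str]:
    """Report the two warned criteria by name as 'Passed' or 'Warning'."""
    frontmatter, _ = split_frontmatter(text)
    total_lines = len(text.splitlines())
    return {
        "skill_md_line_count": "Passed" if total_lines <= MAX_LINES else "Warning",
        "metadata_version": "Passed" if has_metadata_version(frontmatter) else "Warning",
    }


if __name__ == "__main__":
    path = Path("SKILL.md")  # assumed location; adjust to the skill directory
    if path.exists():
        for name, result in check_skill(path.read_text(encoding="utf-8")).items():
            print(f"{name}: {result}")
```

The frontmatter is scanned with plain string checks so that the sketch needs no third-party YAML parser. An inline mapping such as `metadata: {version: 1}` would not be detected, so a real check would likely parse the YAML properly.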
25e1c0f