Evaluate scientific claims and evidence quality. Use for assessing experimental design validity, identifying biases and confounders, applying evidence grading frameworks (GRADE, Cochrane Risk of Bias), or teaching critical analysis. Best for understanding evidence quality, identifying flaws. For formal peer review writing use peer-review.
76
67%
Does it follow best practices?
Impact
96%
1.02x
Average score across 3 eval scenarios
Advisory
Suggest reviewing before use
Optimize this skill with Tessl
npx tessl skill review --optimize ./scientific-skills/scientific-critical-thinking/SKILL.md
Quality
Discovery
100%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong skill description that clearly articulates specific capabilities in scientific evidence evaluation, includes natural trigger terms that domain users would employ, and explicitly addresses both what the skill does and when to use it. The inclusion of specific frameworks (GRADE, Cochrane) and the explicit boundary with the peer-review skill demonstrate thoughtful design for disambiguation in a multi-skill environment.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: 'assessing experimental design validity', 'identifying biases and confounders', 'applying evidence grading frameworks (GRADE, Cochrane Risk of Bias)', and 'teaching critical analysis'. These are well-defined, concrete capabilities. | 3 / 3 |
| Completeness | Clearly answers both 'what' (evaluate scientific claims, assess experimental design, identify biases, apply grading frameworks) and 'when' ('Use for assessing experimental design validity...', 'Best for understanding evidence quality, identifying flaws'). Also includes a boundary condition distinguishing from peer-review skill. | 3 / 3 |
| Trigger Term Quality | Includes strong natural keywords users would say: 'scientific claims', 'evidence quality', 'experimental design', 'biases', 'confounders', 'GRADE', 'Cochrane Risk of Bias', 'critical analysis', 'evidence grading'. Good coverage of domain-specific terms users in this field would naturally use. | 3 / 3 |
| Distinctiveness Conflict Risk | Occupies a clear niche around scientific evidence evaluation with distinct triggers like 'GRADE', 'Cochrane Risk of Bias', 'experimental design validity', and 'confounders'. The explicit boundary ('For formal peer review writing use peer-review') further reduces conflict risk with adjacent skills. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
35%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is a comprehensive but excessively verbose textbook-style reference on scientific critical thinking. It explains many concepts Claude already knows (logical fallacies, bias types, statistical concepts) at length, consuming significant token budget without adding proportional value. The content would benefit enormously from aggressive trimming to focus on the specific procedural guidance, output formats, and decision frameworks that Claude wouldn't already know, while pushing detailed taxonomies entirely into the referenced files.
Suggestions
Cut content by 60-70%: Remove explanations of concepts Claude already knows (what confirmation bias is, what p-values mean, lists of logical fallacies) and keep only the procedural workflow, decision criteria, and output format specifications.
Add a concrete worked example showing input (a scientific claim or paper excerpt) and expected output (a structured critique following the specified format), so Claude knows exactly what good output looks like.
Move the detailed bias taxonomy, fallacy catalog, and statistical pitfalls entirely into the reference files rather than duplicating them inline—this is exactly what progressive disclosure is for.
Remove or drastically reduce the 'Visual Enhancement with Scientific Schematics' section, which is tangential to the core skill of evaluating scientific claims and evidence quality.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose at ~500+ lines. Extensively explains concepts Claude already knows well (what confirmation bias is, what p-values mean, what logical fallacies are, basic study design hierarchy). The entire skill reads like a textbook chapter rather than actionable instructions that add novel knowledge. The 'Visual Enhancement with Scientific Schematics' section is largely irrelevant padding. Nearly every section could be cut by 60-70% without losing actionable value. | 1 / 3 |
| Actionability | Provides structured checklists and frameworks (GRADE considerations, bias review steps, claim evaluation process) which give some concrete guidance. However, there are no executable code examples, no concrete input/output examples showing how to apply these frameworks to actual text, and no templates or specific formats for producing critique output. The guidance remains at the level of 'check for X' rather than showing exactly what a completed evaluation looks like. | 2 / 3 |
| Workflow Clarity | The 'When Providing Critique' section provides a clear output structure (Summary → Strengths → Concerns → Recommendations → Overall Assessment), and the claim evaluation process has numbered steps. However, there are no validation checkpoints, no feedback loops for error recovery, and no clear decision points for when to apply which framework. The seven core capabilities are listed without guidance on sequencing or when to combine them. | 2 / 3 |
| Progressive Disclosure | References six external files in a references/ directory with clear descriptions of what each contains, and provides grep commands for searching them. However, no bundle files were provided, so we cannot verify these references exist. The SKILL.md itself is monolithic: the vast majority of content that should be in those reference files (detailed bias taxonomies, fallacy lists, statistical pitfalls) is duplicated inline, defeating the purpose of the reference structure. | 2 / 3 |
| Total | | 7 / 12 Passed |
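The review asks for a concrete worked example and notes that the skill's critique output follows a Summary → Strengths → Concerns → Recommendations → Overall Assessment structure. A minimal sketch of what such an output template might look like follows. The `Critique` class, its field names, and the sample claim are hypothetical illustrations, not taken from the skill itself.

```python
from dataclasses import dataclass, field
from typing import List

# Section order as described in the review of the skill's critique format.
SECTION_ORDER = [
    "Summary",
    "Strengths",
    "Concerns",
    "Recommendations",
    "Overall Assessment",
]


@dataclass
class Critique:
    """Hypothetical container for a structured critique (names are assumptions)."""
    summary: str
    strengths: List[str] = field(default_factory=list)
    concerns: List[str] = field(default_factory=list)
    recommendations: List[str] = field(default_factory=list)
    overall_assessment: str = ""

    def render(self) -> str:
        """Render the critique as plain text, one heading per section."""
        def bullets(items: List[str]) -> str:
            if not items:
                return "- (none noted)"
            return "\n".join(f"- {item}" for item in items)

        bodies = [
            self.summary,
            bullets(self.strengths),
            bullets(self.concerns),
            bullets(self.recommendations),
            self.overall_assessment,
        ]
        return "\n\n".join(
            f"{title}\n{body}" for title, body in zip(SECTION_ORDER, bodies)
        )


# Invented input purely for illustration; not a real study.
example = Critique(
    summary="A hypothetical claim that supplement X improves memory, "
            "based on one small uncontrolled study.",
    strengths=["The outcome measure was specified in advance."],
    concerns=[
        "No control group, so placebo effects cannot be ruled out.",
        "The sample is small, so the estimate is imprecise.",
    ],
    recommendations=["Look for a randomized controlled trial before accepting the claim."],
    overall_assessment="The evidence is weak; the claim is not yet supported.",
)
print(example.render())
```

Fixing the section order in one constant keeps every critique in the same shape, which is the kind of explicit output format the Actionability row says is missing.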
Validation
81%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 9 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| skill_md_line_count | SKILL.md is long (571 lines); consider splitting into references/ and linking | Warning |
| metadata_version | 'metadata.version' is missing | Warning |
| Total | 9 / 11 Passed | |
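The two warnings in this table can be approximated with a small local check before re-running the review. This is a sketch only: the function name `check_skill_md`, the 500-line threshold, and the assumption that `metadata.version` lives in a YAML frontmatter block delimited by `---` are guesses for illustration, not Tessl's actual validation rules.

```python
import re

MAX_LINES = 500  # assumed threshold; the real validator's limit is not stated


def check_skill_md(text: str, max_lines: int = MAX_LINES) -> list:
    """Return warning names for a SKILL.md file passed in as a string."""
    warnings = []
    lines = text.splitlines()
    if len(lines) > max_lines:
        warnings.append("skill_md_line_count")

    # Collect frontmatter: lines between a leading '---' and the next '---'.
    frontmatter = []
    if lines and lines[0].strip() == "---":
        for line in lines[1:]:
            if line.strip() == "---":
                break
            frontmatter.append(line)

    # Look for an indented 'version:' key under a top-level 'metadata:' key.
    in_metadata = False
    has_version = False
    for line in frontmatter:
        if re.match(r"^metadata:\s*$", line):
            in_metadata = True
        elif in_metadata and re.match(r"^\s+version:\s*\S+", line):
            has_version = True
        elif re.match(r"^\S", line):
            in_metadata = False  # a new top-level key ends the metadata block
    if not has_version:
        warnings.append("metadata_version")
    return warnings


# Synthetic file: frontmatter without a version, followed by 571 body lines.
sample = "---\nname: demo\nmetadata:\n  author: someone\n---\n" + "body\n" * 571
print(check_skill_md(sample))  # → ['skill_md_line_count', 'metadata_version']
```

The frontmatter is scanned with regular expressions rather than a YAML parser so the sketch needs only the standard library; a real check would more likely parse the YAML properly.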
cbcae7b