
scientific-critical-thinking

Evaluate scientific claims and evidence quality. Use for assessing experimental design validity, identifying biases and confounders, applying evidence grading frameworks (GRADE, Cochrane Risk of Bias), or teaching critical analysis. Best for understanding evidence quality, identifying flaws. For formal peer review writing use peer-review.

Score: 76 (1.02x)

Quality: 67% (Does it follow best practices?)

Impact: 96% (1.02x), average score across 3 eval scenarios

Security by Snyk

Advisory: Suggest reviewing before use

Optimize this skill with Tessl

npx tessl skill review --optimize ./scientific-skills/scientific-critical-thinking/SKILL.md

Quality

Discovery

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a strong skill description that clearly articulates specific capabilities in scientific evidence evaluation, includes natural trigger terms that domain users would employ, and explicitly addresses both what the skill does and when to use it. The inclusion of specific frameworks (GRADE, Cochrane) and the explicit boundary with the peer-review skill demonstrate thoughtful design for disambiguation in a multi-skill environment.

Dimension scores:

Specificity (3 / 3): Lists multiple specific, concrete actions: 'assessing experimental design validity', 'identifying biases and confounders', 'applying evidence grading frameworks (GRADE, Cochrane Risk of Bias)', and 'teaching critical analysis'. These are well-defined, concrete capabilities.

Completeness (3 / 3): Clearly answers both 'what' (evaluate scientific claims, assess experimental design, identify biases, apply grading frameworks) and 'when' ('Use for assessing experimental design validity...', 'Best for understanding evidence quality, identifying flaws'). Also includes a boundary condition distinguishing it from the peer-review skill.

Trigger Term Quality (3 / 3): Includes strong natural keywords users would say: 'scientific claims', 'evidence quality', 'experimental design', 'biases', 'confounders', 'GRADE', 'Cochrane Risk of Bias', 'critical analysis', 'evidence grading'. Good coverage of domain-specific terms users in this field would naturally use.

Distinctiveness / Conflict Risk (3 / 3): Occupies a clear niche around scientific evidence evaluation with distinct triggers like 'GRADE', 'Cochrane Risk of Bias', 'experimental design validity', and 'confounders'. The explicit boundary ('For formal peer review writing use peer-review') further reduces conflict risk with adjacent skills.

Total: 12 / 12 (Passed)

Implementation

35%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill is a comprehensive but excessively verbose textbook-style reference on scientific critical thinking. It explains many concepts Claude already knows (logical fallacies, bias types, statistical concepts) at length, consuming significant token budget without adding proportional value. The content would benefit enormously from aggressive trimming to focus on the specific procedural guidance, output formats, and decision frameworks that Claude wouldn't already know, while pushing detailed taxonomies entirely into the referenced files.

Suggestions

Cut content by 60-70%: Remove explanations of concepts Claude already knows (what confirmation bias is, what p-values mean, lists of logical fallacies) and keep only the procedural workflow, decision criteria, and output format specifications.

Add a concrete worked example showing input (a scientific claim or paper excerpt) and expected output (a structured critique following the specified format), so Claude knows exactly what good output looks like. A rough sketch follows these suggestions.

Move the detailed bias taxonomy, fallacy catalog, and statistical pitfalls entirely into the reference files rather than duplicating them inline—this is exactly what progressive disclosure is for.

Remove or drastically reduce the 'Visual Enhancement with Scientific Schematics' section, which is tangential to the core skill of evaluating scientific claims and evidence quality.
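
To make the worked-example suggestion concrete, here is a minimal sketch of what an input/output pair could look like, following the Summary → Strengths → Concerns → Recommendations → Overall Assessment structure the skill already specifies. The claim and findings below are hypothetical illustrations, not taken from the skill:

Input claim: "A new observational study of 120 self-selected volunteers finds that daily supplement X shortens colds by 40%."

Expected output:
Summary: Observational study on a self-selected sample; the headline effect is likely overstated.
Strengths: Clearly defined outcome (cold duration); effect size is reported.
Concerns: Selection bias from volunteer recruitment; no randomization, control group, or blinding; a single unreplicated study; a relative reduction reported without absolute numbers or confidence intervals.
Recommendations: Ask for absolute effects, confidence intervals, and funding sources; look for randomized trials of supplement X before accepting the claim.
Overall Assessment: Low-certainty evidence (roughly GRADE 'low'); insufficient to support the claim as stated.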

Dimension scores:

Conciseness (1 / 3): Extremely verbose at over 500 lines. Extensively explains concepts Claude already knows well (what confirmation bias is, what p-values mean, what logical fallacies are, the basic study design hierarchy). The entire skill reads like a textbook chapter rather than actionable instructions that add novel knowledge. The 'Visual Enhancement with Scientific Schematics' section is largely irrelevant padding. Nearly every section could be cut by 60-70% without losing actionable value.

Actionability (2 / 3): Provides structured checklists and frameworks (GRADE considerations, bias review steps, the claim evaluation process) that give some concrete guidance. However, there are no executable code examples, no concrete input/output examples showing how to apply these frameworks to actual text, and no templates or specific formats for producing critique output. The guidance stays at the level of 'check for X' rather than showing exactly what a completed evaluation looks like.

Workflow Clarity (2 / 3): The 'When Providing Critique' section provides a clear output structure (Summary → Strengths → Concerns → Recommendations → Overall Assessment), and the claim evaluation process has numbered steps. However, there are no validation checkpoints, no feedback loops for error recovery, and no clear decision points for when to apply which framework. The seven core capabilities are listed without guidance on sequencing or when to combine them.

Progressive Disclosure (2 / 3): References six external files in a references/ directory with clear descriptions of what each contains, and provides grep commands for searching them. However, no bundle files were provided, so we cannot verify these references exist. The SKILL.md itself is monolithic: the vast majority of content that should be in those reference files (detailed bias taxonomies, fallacy lists, statistical pitfalls) is duplicated inline, defeating the purpose of the reference structure. A sketch of the trimmed structure follows this table.

Total: 7 / 12 (Passed)
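
As a rough illustration of the trimming and progressive disclosure suggested above, the inline taxonomies could be reduced to short pointers into the existing references/ directory. The file names below are hypothetical, since the actual bundle files were not available to this review:

Biases and confounders: see references/bias-taxonomy.md (e.g. grep -in "selection bias" references/bias-taxonomy.md)
Logical fallacies: see references/fallacy-catalog.md
Statistical pitfalls: see references/statistical-pitfalls.md

Keeping only pointers like these in SKILL.md, alongside the procedural workflow and output format, would go a long way toward addressing both the Conciseness and Progressive Disclosure findings.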

Validation

81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 9 / 11 passed

Validation for skill structure:

skill_md_line_count (Warning): SKILL.md is long (571 lines); consider splitting into references/ and linking.

metadata_version (Warning): 'metadata.version' is missing. A sketch of the fix follows this table.

Total: 9 / 11 (Passed)
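
To clear the metadata_version warning, the SKILL.md frontmatter would need a version field. A minimal sketch, assuming the validator reads a metadata block in the YAML frontmatter (the version number is a placeholder and the description is abbreviated):

---
name: scientific-critical-thinking
description: Evaluate scientific claims and evidence quality. ...
metadata:
  version: 0.1.0
---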

Repository: K-Dense-AI/claude-scientific-skills (Reviewed)


Is this your skill?

If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.