
scholar-evaluation

Systematically evaluate scholarly work using the ScholarEval framework, providing structured assessment across research quality dimensions, including problem formulation, methodology, analysis, and writing, with quantitative scoring and actionable feedback.

Quality: 46% (Does it follow best practices?)
Impact: 92% (1.67x; average score across 3 eval scenarios)

Security (by Snyk): Advisory. Suggest reviewing before use.

Optimize this skill with Tessl

npx tessl skill review --optimize ./scientific-skills/scholar-evaluation/SKILL.md

Quality

Discovery

57%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description identifies a clear and distinctive domain (scholarly work evaluation) and names a specific framework, which helps differentiation. However, it lacks an explicit 'Use when...' clause, misses common natural trigger terms users would employ (e.g., 'paper review', 'peer review', 'thesis'), and the stated actions remain somewhat abstract rather than concretely enumerated.

Suggestions

Add an explicit 'Use when...' clause, e.g., 'Use when the user asks to review, critique, or score an academic paper, thesis, dissertation, or manuscript.'

Include natural trigger terms users would actually say: 'paper review', 'peer review', 'academic paper', 'manuscript', 'thesis', 'dissertation', 'grade my paper'.

Make actions more concrete by listing specific outputs, e.g., 'generates dimension-level scores, highlights methodological weaknesses, and produces a prioritized revision checklist'.
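
For example, a revised description combining these three suggestions might read (wording illustrative, not the skill's current frontmatter):

description: Evaluate scholarly work with the ScholarEval framework. Generates dimension-level scores for problem formulation, methodology, analysis, and writing; flags methodological weaknesses; and produces a prioritized revision checklist. Use when the user asks to review, critique, score, or grade an academic paper, manuscript, thesis, or dissertation, or requests a peer review.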

Dimension / Reasoning / Score

Specificity

Names the domain (scholarly work evaluation) and mentions several dimensions (problem formulation, methodology, analysis, writing, quantitative scoring, actionable feedback), but the actions are somewhat abstract — 'evaluate', 'providing structured assessment' — rather than listing multiple concrete discrete actions like 'score methodology rigor, flag citation gaps, generate revision checklists'.

2 / 3

Completeness

The 'what' is reasonably covered (evaluate scholarly work across multiple dimensions with scoring and feedback), but there is no explicit 'Use when...' clause or equivalent trigger guidance telling Claude when to select this skill, which per the rubric caps completeness at 2.

2 / 3

Trigger Term Quality

Includes some relevant terms like 'scholarly work', 'research quality', 'methodology', 'writing', and 'feedback', but misses common natural user phrases such as 'review my paper', 'grade this essay', 'peer review', 'academic paper', 'manuscript', 'thesis', or 'dissertation'.

2 / 3

Distinctiveness Conflict Risk

The description carves out a clear niche — structured evaluation of scholarly/academic work using a named framework (ScholarEval) with quantitative scoring — which is unlikely to conflict with general writing, editing, or coding skills.

3 / 3

Total: 9 / 12 (Passed)

Implementation

35%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill is significantly over-engineered and verbose for what it delivers. It spends most of its token budget describing concepts Claude already understands (what good feedback looks like, what scholarly dimensions exist, discipline-specific norms) rather than providing novel, actionable evaluation criteria. The irrelevant 'Visual Enhancement with Scientific Schematics' section and extensive 'When to Use' list further dilute the content. The actual differentiating value—the ScholarEval framework's specific criteria—is deferred to an external reference file rather than being included.

Suggestions

Remove the entire 'Visual Enhancement with Scientific Schematics' section as it's unrelated to the evaluation skill and wastes significant tokens.

Remove or drastically reduce the 'When to Use', 'Best Practices', 'Notes', and 'Contextual Considerations' sections—these describe things Claude already knows about academic evaluation.

Inline the key evaluation criteria from 'references/evaluation_framework.md' as a concise rubric table rather than deferring the actual actionable content to an external file while keeping generic descriptions in the main file.

Add a concrete, complete example of an actual evaluation output (even abbreviated) showing the expected format, scoring, and feedback structure rather than just describing the process narratively.
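
An abbreviated output of the kind suggested above might look like this (dimensions, scores, and wording are illustrative):

Evaluation: "Manuscript title"
Problem formulation: 4/5. Clear gap statement; research questions could be more falsifiable.
Methodology: 2/5. No control condition; sample size unjustified.
Analysis: 3/5. Appropriate tests, but no effect sizes reported.
Writing: 4/5. Well structured; the abstract overstates the findings.
Overall: 3.25/5
Priority revisions: 1) justify the sample size; 2) report effect sizes; 3) temper the abstract's claims.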

Dimension / Reasoning / Score

Conciseness

Extremely verbose with extensive content Claude already knows (what types of scholarly work exist, what makes good feedback, discipline-specific norms). The 'Visual Enhancement with Scientific Schematics' section is largely irrelevant promotional content. The 'When to Use This Skill' section lists obvious use cases. Best practices like 'Maintain Objectivity' and 'Be Constructive' are things Claude inherently understands.

1 / 3

Actionability

The evaluation dimensions provide structured criteria, and there are references to scripts and external files, but the core guidance is largely descriptive rather than executable. The actual evaluation criteria are deferred to 'references/evaluation_framework.md' rather than being provided inline. The scoring rubric is generic (5-point scale descriptions Claude already knows). No concrete example of an actual evaluation output is shown.

2 / 3

Workflow Clarity

The 6-step workflow is clearly sequenced and logically ordered, but lacks validation checkpoints. There's no feedback loop for verifying evaluation quality, no checkpoint to confirm the right dimensions were selected, and no mechanism to validate that scores are calibrated or consistent. The workflow reads more like a description of what to do than a precise operational procedure.

2 / 3

Progressive Disclosure

References to 'references/evaluation_framework.md' and 'scripts/calculate_scores.py' show some progressive disclosure, but the main file itself is a monolithic wall of text (~250 lines) with much content that could be split out. The detailed dimension descriptions, contextual considerations, and integration notes could all be separate reference files (one possible layout is sketched below the score table). The search patterns for the reference file are a nice touch but don't compensate for the bloated main file.

2 / 3

Total: 7 / 12 (Passed)
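
One possible split along these lines (only the two files the review already names exist; the others are hypothetical):

scholar-evaluation/
  SKILL.md (trimmed to the workflow plus an inline rubric table)
  references/evaluation_framework.md (detailed dimension criteria)
  references/contextual_considerations.md (hypothetical: discipline-specific norms)
  scripts/calculate_scores.py (score aggregation)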

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 Passed

Validation for skill structure

Criteria | Description | Result
metadata_version | 'metadata.version' is missing | Warning

Total: 10 / 11 (Passed)
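
The single warning above can usually be cleared by adding a version to the SKILL.md frontmatter, for example (a minimal sketch; whether the spec expects a top-level version field or a nested metadata.version key is assumed here, not confirmed against the spec):

---
name: scholar-evaluation
version: 1.0.0
description: ...
---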

Repository: K-Dense-AI/claude-scientific-skills (Reviewed)

