
scholar-evaluation

Systematically evaluate scholarly work using the ScholarEval framework, providing structured assessment across research quality dimensions, including problem formulation, methodology, analysis, and writing, with quantitative scoring and actionable feedback.

Quality: 46% (Does it follow best practices?)
Impact: 92% (1.67x; average score across 3 eval scenarios)

Security (by Snyk): Advisory. Suggest reviewing before use.

Optimize this skill with Tessl

npx tessl skill review --optimize ./scientific-skills/scholar-evaluation/SKILL.md

Quality

Discovery

57%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description identifies a clear and distinctive domain (scholarly work evaluation) and names a specific framework, which helps differentiation. However, it lacks an explicit 'Use when...' clause, misses common natural trigger terms users would employ (e.g., 'paper review', 'peer review', 'thesis'), and the stated actions remain somewhat abstract rather than concretely enumerated.

Suggestions

Add an explicit 'Use when...' clause, e.g., 'Use when the user asks to review, critique, or score an academic paper, thesis, dissertation, or manuscript.'

Include natural trigger terms users would actually say: 'paper review', 'peer review', 'academic paper', 'manuscript', 'thesis', 'dissertation', 'grade my paper'.

Make actions more concrete by listing specific outputs, e.g., 'generates dimension-level scores, highlights methodological weaknesses, and produces a prioritized revision checklist'.
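
For example, a revised description combining these three suggestions might read (wording illustrative, not the skill's current frontmatter):

description: Evaluate scholarly work with the ScholarEval framework. Generates dimension-level scores for problem formulation, methodology, analysis, and writing; flags methodological weaknesses; and produces a prioritized revision checklist. Use when the user asks to review, critique, score, or grade an academic paper, manuscript, thesis, or dissertation, or requests a peer review.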

Dimension / Reasoning / Score

Specificity

Names the domain (scholarly work evaluation) and mentions several dimensions (problem formulation, methodology, analysis, writing, quantitative scoring, actionable feedback), but the actions are somewhat abstract — 'evaluate', 'providing structured assessment' — rather than listing multiple concrete discrete actions like 'score methodology rigor, flag citation gaps, generate revision checklists'.

2 / 3

Completeness

The 'what' is reasonably covered (evaluate scholarly work across multiple dimensions with scoring and feedback), but there is no explicit 'Use when...' clause or equivalent trigger guidance telling Claude when to select this skill, which per the rubric caps completeness at 2.

2 / 3

Trigger Term Quality

Includes some relevant terms like 'scholarly work', 'research quality', 'methodology', 'writing', and 'feedback', but misses common natural user phrases such as 'review my paper', 'grade this essay', 'peer review', 'academic paper', 'manuscript', 'thesis', or 'dissertation'.

2 / 3

Distinctiveness Conflict Risk

The description carves out a clear niche — structured evaluation of scholarly/academic work using a named framework (ScholarEval) with quantitative scoring — which is unlikely to conflict with general writing, editing, or coding skills.

3 / 3

Total: 9 / 12 (Passed)

Implementation

35%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill is significantly over-engineered and verbose for what it delivers. It spends most of its token budget describing concepts Claude already understands (what good feedback looks like, what scholarly dimensions exist, discipline-specific norms) rather than providing novel, actionable evaluation criteria. The irrelevant 'Visual Enhancement with Scientific Schematics' section and extensive 'When to Use' list further dilute the content. The actual differentiating value—the ScholarEval framework's specific criteria—is deferred to an external reference file rather than being included.

Suggestions

Remove the entire 'Visual Enhancement with Scientific Schematics' section as it's unrelated to the evaluation skill and wastes significant tokens.

Remove or drastically reduce the 'When to Use', 'Best Practices', 'Notes', and 'Contextual Considerations' sections—these describe things Claude already knows about academic evaluation.

Inline the key evaluation criteria from 'references/evaluation_framework.md' as a concise rubric table rather than deferring the actual actionable content to an external file while keeping generic descriptions in the main file.

Add a concrete, complete example of an actual evaluation output (even abbreviated) showing the expected format, scoring, and feedback structure rather than just describing the process narratively.
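
An abbreviated output of the kind suggested above might look like this (dimensions, scores, and wording are illustrative):

Evaluation: "Manuscript title"
Problem formulation: 4/5. Clear gap statement; research questions could be more falsifiable.
Methodology: 2/5. No control condition; sample size unjustified.
Analysis: 3/5. Appropriate tests, but no effect sizes reported.
Writing: 4/5. Well structured; the abstract overstates the findings.
Overall: 3.25/5
Priority revisions: 1) justify the sample size; 2) report effect sizes; 3) temper the abstract's claims.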

Dimension / Reasoning / Score

Conciseness

Extremely verbose with extensive content Claude already knows (what types of scholarly work exist, what makes good feedback, discipline-specific norms). The 'Visual Enhancement with Scientific Schematics' section is largely irrelevant promotional content. The 'When to Use This Skill' section lists obvious use cases. Best practices like 'Maintain Objectivity' and 'Be Constructive' are things Claude inherently understands.

1 / 3

Actionability

The evaluation dimensions provide structured criteria, and there are references to scripts and external files, but the core guidance is largely descriptive rather than executable. The actual evaluation criteria are deferred to 'references/evaluation_framework.md' rather than being provided inline. The scoring rubric is generic (5-point scale descriptions Claude already knows). No concrete example of an actual evaluation output is shown.

2 / 3

Workflow Clarity

The 6-step workflow is clearly sequenced and logically ordered, but lacks validation checkpoints. There's no feedback loop for verifying evaluation quality, no checkpoint to confirm the right dimensions were selected, and no mechanism to validate that scores are calibrated or consistent. The workflow reads more like a description of what to do than a precise operational procedure.

2 / 3

Progressive Disclosure

References to 'references/evaluation_framework.md' and 'scripts/calculate_scores.py' show some progressive disclosure, but the main file itself is a monolithic wall of text (~250 lines) with much content that could be split out. The detailed dimension descriptions, contextual considerations, and integration notes could all be separate reference files (one possible layout is sketched below the score table). The search patterns for the reference file are a nice touch but don't compensate for the bloated main file.

2 / 3

Total: 7 / 12 (Passed)
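
One possible split along these lines (only the two files the review already names exist; the others are hypothetical):

scholar-evaluation/
  SKILL.md (trimmed to the workflow plus an inline rubric table)
  references/evaluation_framework.md (detailed dimension criteria)
  references/contextual_considerations.md (hypothetical: discipline-specific norms)
  scripts/calculate_scores.py (score aggregation)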

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 Passed

Validation for skill structure

Criteria | Description | Result
metadata_version | 'metadata.version' is missing | Warning

Total: 10 / 11 (Passed)
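
The single warning above can usually be cleared by adding a version to the SKILL.md frontmatter, for example (a minimal sketch; whether the spec expects a top-level version field or a nested metadata.version key is assumed here, not confirmed against the spec):

---
name: scholar-evaluation
version: 1.0.0
description: ...
---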

Repository: K-Dense-AI/claude-scientific-skills (Reviewed)

