Systematically evaluate scholarly work using the ScholarEval framework, providing structured assessment across research quality dimensions including problem formulation, methodology, analysis, and writing with quantitative scoring and actionable feedback.
Install with Tessl CLI
```
npx tessl i github:K-Dense-AI/claude-scientific-skills --skill scholar-evaluation
```
Does it follow best practices?
If you maintain this skill, you can automatically optimize it using the tessl CLI to improve its score:
```
npx tessl skill review --optimize ./path/to/skill
```

Evaluation — 92%

↑ 1.67x agent success when using this skill
Discovery — 67%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description excels at specificity and distinctiveness by naming concrete evaluation dimensions and a specific framework. However, it lacks explicit trigger guidance ('Use when...') and relies on formal academic terminology that users may not naturally use when requesting paper reviews or academic feedback.
Suggestions
- Add a 'Use when...' clause with trigger scenarios, e.g. 'Use when reviewing academic papers, theses, or dissertations, or when the user asks for paper feedback or peer review' (a frontmatter sketch follows this list)
- Include common user terms alongside the formal language: 'paper review', 'grade paper', 'thesis feedback', 'academic writing review', 'research paper'
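As a sketch of how both suggestions could land in the skill's description — the frontmatter layout and exact wording below are illustrative, not taken from the actual SKILL.md:

```markdown
---
name: scholar-evaluation
description: >
  Systematically evaluate scholarly work using the ScholarEval framework,
  scoring problem formulation, methodology, analysis, and writing with
  quantitative scores and actionable feedback. Use when reviewing academic
  papers, theses, or dissertations, or when the user asks for a paper
  review, peer review, thesis feedback, or to grade a research paper.
---
```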
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific, concrete actions: 'evaluate scholarly work', 'structured assessment across research quality dimensions', and explicitly names dimensions (problem formulation, methodology, analysis, writing) plus outputs (quantitative scoring, actionable feedback). | 3 / 3 |
| Completeness | Clearly answers 'what' (evaluate scholarly work with structured assessment and scoring), but lacks explicit 'when' guidance: no 'Use when...' clause or equivalent trigger to indicate when Claude should select this skill. | 2 / 3 |
| Trigger Term Quality | Includes some relevant terms such as 'scholarly work', 'research quality', and 'methodology', but leans on formal academic language, missing common user terms like 'paper review', 'academic paper', 'thesis', 'dissertation', 'peer review', 'grade paper'. | 2 / 3 |
| Distinctiveness / Conflict Risk | Clear niche focused on scholarly/academic evaluation with the specific 'ScholarEval framework' identifier. Distinct from general writing-feedback or document-analysis skills thanks to the academic focus and named framework. | 3 / 3 |
| Total | | 10 / 12 — Passed |
Implementation — 70%

Reviews the quality of the instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill provides a well-structured evaluation framework with clear workflow progression and appropriate use of external references. However, it suffers from verbosity (explaining obvious concepts, tangential sections on schematics) and lacks concrete executable examples showing actual evaluation output. The actionability would improve significantly with sample evaluation text demonstrating the scoring in practice.
Suggestions
- Remove or drastically shorten the 'Visual Enhancement with Scientific Schematics' section; it is tangential to the core evaluation skill and adds roughly 200 tokens of marginally relevant content
- Add a concrete example showing actual evaluation output for one dimension, e.g. sample text evaluating a methodology section with specific scores and feedback (a sketch follows this list)
- Trim explanatory content about work types and feedback principles; Claude already understands these concepts, so focus on the ScholarEval criteria that are novel
- Replace the abstract dimension descriptions with brief rubric snippets showing what distinguishes a score of 3 from a score of 5 for at least one dimension
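The second suggestion is easiest to see with a worked sample embedded in the skill itself. The dimension, scores, and feedback below are invented solely to show the expected shape of the output:

```markdown
<!-- Hypothetical sample output for one dimension -->
### Methodology — 3 / 5

**Strengths:** The sampling procedure is described in enough detail to
replicate, and the mixed-effects model suits the nested design.

**Weaknesses:** No power analysis justifies the n = 42 sample, and the
exclusion criteria appear only in the appendix.

**Actionable feedback:** Report an a priori power analysis and move the
exclusion criteria into the methods section.
```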
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Contains significant verbosity, including sections explaining concepts Claude already knows (what the different work types are, basic feedback principles). The 'Visual Enhancement' section is tangential to the core evaluation skill and adds unnecessary tokens. | 2 / 3 |
| Actionability | Provides structured workflows and dimension lists but lacks concrete executable examples. The evaluation criteria are described abstractly rather than demonstrated with actual evaluation text. The script usage shown is minimal, and the referenced files may not exist. | 2 / 3 |
| Workflow Clarity | The six-step workflow is clearly sequenced, with logical progression from scope definition through actionable feedback. Each step has clear substeps, and the process includes synthesis and contextual-adjustment phases. | 3 / 3 |
| Progressive Disclosure | Appropriately references external files (references/evaluation_framework.md, scripts/calculate_scores.py) for detailed criteria and tooling, keeping the main skill as an overview. Navigation is clear, with well-signaled references. | 3 / 3 |
| Total | | 10 / 12 — Passed |
Validation — 90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| metadata_version | 'metadata.version' is missing | Warning |
| Total | | 10 / 11 — Passed |
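The lone warning indicates the frontmatter has no version field. Assuming the validator reads a metadata block from the SKILL.md frontmatter (the exact field layout here is an assumption, not confirmed against the spec), clearing it might look like:

```markdown
---
name: scholar-evaluation
metadata:
  version: 1.0.0
---
```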
If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.