Skill description: "Simulates NIH study section peer review for grant proposals. Triggers."
Quality: 22% (Does it follow best practices?)
Impact: Pending (no eval scenarios have been run)
Status: Passed (no known issues)
Optimize this skill with Tessl:
`npx tessl skill review --optimize "./scientific-skills/Academic Writing/grant-mock-reviewer/SKILL.md"`

Quality
Discovery — 22%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description identifies a specific niche (NIH peer review simulation) but is severely underdeveloped. It lacks concrete actions, explicit trigger conditions, and appears to have a truncated or broken 'Triggers' section. The domain specificity provides some distinctiveness, but the description fails to communicate what the skill actually does or when it should be selected.
Suggestions
- Add specific concrete actions such as 'Scores proposals on NIH criteria (significance, investigator, innovation, approach, environment), generates critique summaries, and produces preliminary impact scores'.
- Replace the truncated 'Triggers.' with a complete 'Use when...' clause, e.g., 'Use when the user asks for grant review, NIH critique, study section feedback, R01 review, or mock peer review of a research proposal'.
- Include common user-facing trigger terms like 'grant critique', 'R01', 'specific aims page', 'impact score', 'study section', and 'grant feedback' to improve keyword coverage.
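Taken together, these suggestions point toward frontmatter along the following lines. The wording is illustrative, not taken from the skill, and the `name`/`description` field layout assumes the usual SKILL.md frontmatter shape:

```yaml
name: grant-mock-reviewer
description: >
  Simulates an NIH study section peer review. Scores proposals on the five
  NIH criteria (significance, investigator, innovation, approach, environment),
  generates per-criterion strengths and weaknesses, and produces a preliminary
  impact score with a summary statement. Use when the user asks for a grant
  review, NIH critique, study section feedback, R01 review, specific aims
  feedback, or a mock peer review of a research proposal.
```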
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description names the domain ('NIH study section peer review for grant proposals') but does not list any concrete actions like scoring, critiquing specific aims, evaluating significance, or generating summary statements. 'Simulates' is a single vague verb. | 1 / 3 |
| Completeness | The 'what' is only vaguely stated ('simulates peer review') without specifics, and the 'when' is entirely absent. The word 'Triggers.' appears to be a truncated or incomplete fragment rather than an explicit 'Use when...' clause, which would cap completeness at 2 at best; since even that fragment provides no actual trigger guidance, this scores a 1. | 1 / 3 |
| Trigger Term Quality | Includes some relevant keywords like 'NIH', 'study section', 'peer review', and 'grant proposals' that users might naturally say. However, it misses common variations like 'R01', 'specific aims', 'grant critique', 'significance score', or 'impact score'. | 2 / 3 |
| Distinctiveness / Conflict Risk | The NIH study section context is fairly niche and unlikely to conflict with most other skills. However, it could overlap with general 'grant writing' or 'academic review' skills since it doesn't clearly delineate its specific scope or outputs. | 2 / 3 |
| Total | | 6 / 12 (Passed) |
Implementation — 22%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is heavily padded with generic boilerplate that appears auto-generated and not tailored to the specific task of NIH grant review. While it contains some genuinely useful domain content (NIH scoring rubric, common weaknesses, review output format), this is buried in repetitive, self-referential sections. The workflow is entirely generic rather than describing how to actually conduct a mock peer review, and the skill would benefit enormously from removing boilerplate and focusing on the actual review methodology.
Suggestions
- Remove all generic boilerplate sections (Risk Assessment, Security Checklist, Lifecycle Status, Evaluation Criteria, Response Template, Input Validation, Error Handling) that don't contain NIH-review-specific content — these waste tokens on things Claude already knows how to do.
- Replace the generic 'Workflow' section with a concrete, NIH-review-specific workflow: e.g., 1) Read proposal and identify aims, 2) Score each criterion using the rubric, 3) Identify strengths/weaknesses per criterion, 4) Generate summary statement, 5) Validate score consistency across criteria.
- Move the Common Weaknesses catalog and detailed NIH Scoring System tables to reference files (they're already listed in References) and keep only a concise summary inline.
- Fix the circular/dead references ('See ## Prerequisites above', 'See ## Usage above', 'See ## Workflow above'), which point to sections that appear after the reference, creating confusion.
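As a sketch, the generic Workflow section could be replaced with something like the following. The step wording and the reference-file name are illustrative placeholders, not taken from the skill:

```markdown
## Workflow

1. Read the full proposal and extract the specific aims.
2. Score each NIH criterion (significance, investigator, innovation,
   approach, environment) on the 1-9 scale using the rubric in the
   scoring reference file.
3. For each criterion, list concrete strengths and weaknesses, citing
   the proposal text they refer to.
4. Draft a summary statement and a preliminary overall impact score.
5. Check that the impact score is consistent with the per-criterion
   scores before returning the review.
```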
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose and bloated. Contains massive amounts of boilerplate (Risk Assessment, Security Checklist, Lifecycle Status, Evaluation Criteria, Response Template) that adds no value for Claude. Sections like 'When to Use' repeat the description verbatim, and there are self-referential dead links ('See ## Prerequisites above'). The NIH scoring tables and common weaknesses catalog are useful domain content but could be in reference files rather than inline. | 1 / 3 |
| Actionability | Provides CLI commands and a Python library usage example with specific parameters, which is good. However, the commands reference a script (scripts/main.py) whose actual implementation and behavior are unclear — it is not clear whether these commands work or are aspirational. The 'Common Weaknesses Detected' section provides concrete review criteria, but the core review process lacks executable specifics on how Claude should actually perform the review. | 2 / 3 |
| Workflow Clarity | The 'Workflow' section is entirely generic boilerplate (confirm objective, validate request, use packaged script, return structured result) with no NIH-review-specific steps. There are no validation checkpoints specific to the review process, and the 'Example run plan' is also generic. For a skill involving structured critique generation, there should be clear steps for applying scoring criteria, generating critiques, and validating consistency — none of which are present. | 1 / 3 |
| Progressive Disclosure | References to files in the references/ directory are well-signaled and one level deep, which is good. However, the main SKILL.md is a monolithic wall of text with enormous amounts of content that should be in separate files (the full NIH scoring system, common weaknesses catalog, security checklist, etc.). The inline content is poorly organized, with redundant sections and circular references ('See ## Usage above'). | 2 / 3 |
| Total | | 6 / 12 (Passed) |
Validation — 90% (10 / 11 checks passed)

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 10 / 11 (Passed) |
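The warning can typically be cleared by nesting non-standard keys under `metadata`, assuming the skill spec treats `metadata` as a free-form map. The key names below are hypothetical stand-ins for whatever unknown keys the validator flagged:

```yaml
name: grant-mock-reviewer
description: Simulates NIH study section peer review for grant proposals. Triggers.
metadata:
  category: academic-writing   # previously a top-level (unknown) key
  maintainer: example-team     # previously a top-level (unknown) key
```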