Evaluate Agent Skill design quality against official specifications and best practices. Use when reviewing, auditing, or improving SKILL.md files and skill packages. Provides multi-dimensional scoring and actionable improvement suggestions.
Overall score: 81
Quality: 72% (does it follow best practices?)
Impact: 100% (1.61x average score across 3 eval scenarios)
Passed: no known issues
Optimize this skill with Tessl: `npx tessl skill review --optimize ./skills/skill-judge/SKILL.md`

Quality

Discovery: 89%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a well-structured description with explicit 'Use when' guidance and good trigger term coverage. The main weakness is that the capabilities could be more specific about what concrete evaluation actions are performed (e.g., checking frontmatter, validating structure, assessing description quality). Overall, it effectively communicates its purpose and selection criteria.
Suggestions
Add more specific concrete actions like 'validates YAML frontmatter structure', 'checks description trigger terms', or 'verifies file organization' to improve specificity.
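As a sketch of what that suggestion might look like in practice (the exact wording below is hypothetical, not the skill's actual frontmatter):

```markdown
---
name: skill-judge
description: Evaluate Agent Skill design quality against official specifications
  and best practices. Validates YAML frontmatter structure, checks description
  trigger-term coverage, and verifies file organization. Use when reviewing,
  auditing, or improving SKILL.md files and skill packages.
---
```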
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (Agent Skill design) and some actions (reviewing, auditing, improving, scoring, suggestions), but lacks comprehensive detail on specific concrete actions like 'validates YAML frontmatter' or 'checks trigger term coverage'. | 2 / 3 |
| Completeness | Clearly answers both what ('Evaluate Agent Skill design quality', 'Provides multi-dimensional scoring and actionable improvement suggestions') and when ('Use when reviewing, auditing, or improving SKILL.md files and skill packages'). | 3 / 3 |
| Trigger Term Quality | Includes natural keywords users would say: 'SKILL.md', 'skill packages', 'reviewing', 'auditing', 'improving', 'scoring', and 'improvement suggestions'. Good coverage of terms someone evaluating skills would use. | 3 / 3 |
| Distinctiveness / Conflict Risk | Very specific niche targeting 'SKILL.md files' and 'skill packages' with distinct triggers like 'Agent Skill design quality'; unlikely to conflict with general code review or documentation skills. | 3 / 3 |
| Total | | 11 / 12 (Passed) |
Implementation: 55%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill demonstrates strong actionability and workflow clarity with concrete scoring rubrics and a clear evaluation protocol. However, it severely violates its own principles: it's extremely verbose (~600+ lines), explains concepts Claude already knows (what Skills are, paradigm shifts), and fails to implement the progressive disclosure it advocates for. The irony is palpable—a skill about token efficiency that wastes hundreds of tokens on meta-explanations.
Suggestions
Cut the 'Core Philosophy' section by 80%—Claude doesn't need explanations of what Skills are or the 'paradigm shift' from training to education. Keep only the 'Core Formula' and 'Three Types of Knowledge' as brief reminders.
Move detailed pattern examples, common failure patterns, and the extensive dimension explanations to separate reference files (e.g., patterns.md, failures.md), keeping only the scoring tables and quick reference in SKILL.md.
Delete redundant content like the ASCII art boxes, the 'Self-Evaluation Note' section, and repetitive explanations of the same concepts (knowledge delta is explained 4+ times).
Reduce SKILL.md to <200 lines by keeping only: frontmatter requirements, dimension scoring tables, evaluation protocol steps, and report template. The current length contradicts the skill's own '<500 lines ideal' guidance.
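The four suggestions above amount to a progressive-disclosure restructuring. A minimal sketch of the resulting package layout (the file names patterns.md and failures.md come from the suggestions themselves; the rest is an assumption):

```
skills/skill-judge/
├── SKILL.md          # <200 lines: frontmatter, scoring tables, protocol steps, report template
└── references/
    ├── patterns.md   # detailed pattern examples
    └── failures.md   # common failure patterns
```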
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose at 600+ lines. Contains extensive explanations of concepts Claude already knows (what Skills are, paradigm shifts, basic evaluation principles). The 'Core Philosophy' section alone spends hundreds of tokens explaining meta-concepts rather than providing actionable evaluation criteria. | 1 / 3 |
| Actionability | Provides highly concrete, executable guidance: specific scoring rubrics with point values, detailed evaluation protocol with 5 clear steps, report template with exact format, and quick reference checklist. The dimension scoring criteria are specific and actionable. | 3 / 3 |
| Workflow Clarity | Excellent multi-step workflow with clear sequence: First Pass (Knowledge Delta Scan) → Structure Analysis → Score Each Dimension → Calculate Total & Grade → Generate Report. Each step has explicit checkboxes and validation criteria. The evaluation protocol is unambiguous. | 3 / 3 |
| Progressive Disclosure | Monolithic wall of text with no external references. All content is inline in a single massive file. Content that could be split (e.g., pattern examples, common failure patterns, detailed dimension explanations) is all dumped into SKILL.md. No layered structure despite teaching about progressive disclosure. | 1 / 3 |
| Total | | 8 / 12 (Passed) |
Validation: 90%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation for skill structure

| Criteria | Description | Result |
|---|---|---|
| skill_md_line_count | SKILL.md is long (753 lines); consider splitting into references/ and linking | Warning |
| Total | | 10 / 11 (Passed) |
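The line-count warning above can be reproduced locally. A minimal sketch in shell, assuming the 500-line threshold from the skill's own '<500 lines ideal' guidance (the function name here is hypothetical, not part of the Tessl CLI):

```shell
# check_skill_length LINES: warn when a SKILL.md exceeds the recommended budget
check_skill_length() {
  if [ "$1" -gt 500 ]; then
    echo "Warning: $1 lines; consider splitting into references/ and linking"
  else
    echo "OK: $1 lines"
  fi
}

# In practice: check_skill_length "$(wc -l < SKILL.md)"
check_skill_length 753  # prints the warning for this skill's 753-line file
```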