skill-judge

Evaluate Agent Skill design quality against official specifications and best practices. Use when reviewing, auditing, or improving SKILL.md files and skill packages. Provides multi-dimensional scoring and actionable improvement suggestions.

Overall score: 81

Quality: 72% (Does it follow best practices?)

Impact: 1.61x (100% average score across 3 eval scenarios)

Security by Snyk: Passed (no known issues)

Optimize this skill with Tessl:

```shell
npx tessl skill review --optimize ./skills/skill-judge/SKILL.md
```

Quality

Discovery

89%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a well-structured description with explicit 'Use when' guidance and good trigger term coverage. The main weakness is that it could be more specific about which concrete evaluation actions are performed (e.g., checking frontmatter, validating structure, assessing description quality). Overall, it effectively communicates its purpose and selection criteria.

Suggestions

- Add more specific concrete actions like 'validates YAML frontmatter structure', 'checks description trigger terms', or 'verifies file organization' to improve specificity.
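As a sketch of what that suggestion might look like in practice (the exact wording below is illustrative, not the maintainer's actual frontmatter), the `description` field could name the concrete checks directly:

```yaml
---
name: skill-judge
description: >
  Evaluate Agent Skill design quality against official specifications and
  best practices. Validates YAML frontmatter structure, checks description
  trigger terms, verifies file organization, and scores conciseness and
  progressive disclosure. Use when reviewing, auditing, or improving
  SKILL.md files and skill packages.
---
```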

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Names the domain (Agent Skill design) and some actions (reviewing, auditing, improving, scoring, suggestions), but lacks comprehensive detail on specific concrete actions like 'validates YAML frontmatter' or 'checks trigger term coverage'. | 2 / 3 |
| Completeness | Clearly answers both what ('Evaluate Agent Skill design quality', 'Provides multi-dimensional scoring and actionable improvement suggestions') and when ('Use when reviewing, auditing, or improving SKILL.md files and skill packages'). | 3 / 3 |
| Trigger Term Quality | Includes natural keywords users would say: 'SKILL.md', 'skill packages', 'reviewing', 'auditing', 'improving', 'scoring', and 'improvement suggestions'. Good coverage of terms someone evaluating skills would use. | 3 / 3 |
| Distinctiveness / Conflict Risk | Very specific niche targeting 'SKILL.md files' and 'skill packages' with distinct triggers like 'Agent Skill design quality'; unlikely to conflict with general code review or documentation skills. | 3 / 3 |
| **Total** | | **11 / 12** |

Passed

Implementation

55%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill demonstrates strong actionability and workflow clarity with concrete scoring rubrics and a clear evaluation protocol. However, it severely violates its own principles: it is extremely verbose (~600+ lines), explains concepts Claude already knows (what Skills are, paradigm shifts), and fails to implement the progressive disclosure it advocates. The irony is palpable: a skill about token efficiency that wastes hundreds of tokens on meta-explanations.

Suggestions

- Cut the 'Core Philosophy' section by 80%: Claude doesn't need explanations of what Skills are or the 'paradigm shift' from training to education. Keep only the 'Core Formula' and 'Three Types of Knowledge' as brief reminders.
- Move detailed pattern examples, common failure patterns, and the extensive dimension explanations to separate reference files (e.g., patterns.md, failures.md), keeping only the scoring tables and quick reference in SKILL.md.
- Delete redundant content like the ASCII art boxes, the 'Self-Evaluation Note' section, and repetitive explanations of the same concepts (knowledge delta is explained 4+ times).
- Reduce SKILL.md to <200 lines by keeping only: frontmatter requirements, dimension scoring tables, evaluation protocol steps, and report template. The current length contradicts the skill's own '<500 lines ideal' guidance.
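One possible shape for that split, as a sketch only (the `references/` layout and file names follow the suggestions above; this is not the maintainer's actual structure):

```text
skills/skill-judge/
├── SKILL.md           # frontmatter, dimension scoring tables,
│                      # evaluation protocol steps, report template
└── references/
    ├── patterns.md    # detailed pattern examples
    └── failures.md    # common failure patterns

# In SKILL.md, link out instead of inlining, e.g.:
# "For worked pattern examples, see references/patterns.md."
```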

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | Extremely verbose at ~600+ lines. Contains extensive explanations of concepts Claude already knows (what Skills are, paradigm shifts, basic evaluation principles). The 'Core Philosophy' section alone spends hundreds of tokens explaining meta-concepts rather than providing actionable evaluation criteria. | 1 / 3 |
| Actionability | Provides highly concrete, executable guidance: specific scoring rubrics with point values, detailed evaluation protocol with 5 clear steps, report template with exact format, and quick reference checklist. The dimension scoring criteria are specific and actionable. | 3 / 3 |
| Workflow Clarity | Excellent multi-step workflow with clear sequence: First Pass (Knowledge Delta Scan) → Structure Analysis → Score Each Dimension → Calculate Total & Grade → Generate Report. Each step has explicit checkboxes and validation criteria. The evaluation protocol is unambiguous. | 3 / 3 |
| Progressive Disclosure | Monolithic wall of text with no external references. All content is inline in a single massive file. Content that could be split (e.g., pattern examples, common failure patterns, detailed dimension explanations) is all dumped into SKILL.md. No layered structure despite teaching about progressive disclosure. | 1 / 3 |
| **Total** | | **8 / 12** |

Passed

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 passed

Validation for skill structure:

| Criteria | Description | Result |
| --- | --- | --- |
| skill_md_line_count | SKILL.md is long (753 lines); consider splitting into references/ and linking | Warning |
| **Total** | | **10 / 11** |

Passed
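The `skill_md_line_count` check itself is mechanical; a minimal sketch of how such a check might work (the function name and the 500-line threshold are assumptions drawn from the skill's own '<500 lines ideal' guidance, not Tessl's actual implementation):

```python
def line_count_warning(skill_md_text: str, limit: int = 500):
    """Return a warning string if a SKILL.md body exceeds the line limit,
    or None if it is within bounds. Sketch of a skill_md_line_count check."""
    n = len(skill_md_text.splitlines())
    if n > limit:
        return (
            f"SKILL.md is long ({n} lines); "
            "consider splitting into references/ and linking"
        )
    return None
```

Running it over a 753-line file would reproduce the warning shown in the table above.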

Repository: softaworks/agent-toolkit (Reviewed)
