skill-judge

Evaluate Agent Skill design quality against official specifications and best practices. Use when reviewing, auditing, or improving SKILL.md files and skill packages. Provides multi-dimensional scoring and actionable improvement suggestions.

Overall score: 81

Quality: 72% (Does it follow best practices?)

Impact: 1.61x (100% average score across 3 eval scenarios)

Security by Snyk: Passed (no known issues)

Optimize this skill with Tessl:

```shell
npx tessl skill review --optimize ./skills/skill-judge/SKILL.md
```

Quality

Discovery

89%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a well-structured description with explicit 'Use when' guidance and good trigger term coverage. The main weakness is that it could be more specific about which concrete evaluation actions are performed (e.g., checking frontmatter, validating structure, assessing description quality). Overall, it effectively communicates its purpose and selection criteria.

Suggestions

- Add more specific concrete actions like 'validates YAML frontmatter structure', 'checks description trigger terms', or 'verifies file organization' to improve specificity.
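As a sketch of what that suggestion might look like in practice (the exact wording below is illustrative, not the maintainer's actual frontmatter), the `description` field could name the concrete checks directly:

```yaml
---
name: skill-judge
description: >
  Evaluate Agent Skill design quality against official specifications and
  best practices. Validates YAML frontmatter structure, checks description
  trigger terms, verifies file organization, and scores conciseness and
  progressive disclosure. Use when reviewing, auditing, or improving
  SKILL.md files and skill packages.
---
```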

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Names the domain (Agent Skill design) and some actions (reviewing, auditing, improving, scoring, suggestions), but lacks comprehensive detail on specific concrete actions like 'validates YAML frontmatter' or 'checks trigger term coverage'. | 2 / 3 |
| Completeness | Clearly answers both what ('Evaluate Agent Skill design quality', 'Provides multi-dimensional scoring and actionable improvement suggestions') and when ('Use when reviewing, auditing, or improving SKILL.md files and skill packages'). | 3 / 3 |
| Trigger Term Quality | Includes natural keywords users would say: 'SKILL.md', 'skill packages', 'reviewing', 'auditing', 'improving', 'scoring', and 'improvement suggestions'. Good coverage of terms someone evaluating skills would use. | 3 / 3 |
| Distinctiveness / Conflict Risk | Very specific niche targeting 'SKILL.md files' and 'skill packages' with distinct triggers like 'Agent Skill design quality'; unlikely to conflict with general code review or documentation skills. | 3 / 3 |
| **Total** | | **11 / 12** |

Passed

Implementation

55%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill demonstrates strong actionability and workflow clarity with concrete scoring rubrics and a clear evaluation protocol. However, it severely violates its own principles: it is extremely verbose (~600+ lines), explains concepts Claude already knows (what Skills are, paradigm shifts), and fails to implement the progressive disclosure it advocates. The irony is palpable: a skill about token efficiency that wastes hundreds of tokens on meta-explanations.

Suggestions

- Cut the 'Core Philosophy' section by 80%: Claude doesn't need explanations of what Skills are or the 'paradigm shift' from training to education. Keep only the 'Core Formula' and 'Three Types of Knowledge' as brief reminders.
- Move detailed pattern examples, common failure patterns, and the extensive dimension explanations to separate reference files (e.g., patterns.md, failures.md), keeping only the scoring tables and quick reference in SKILL.md.
- Delete redundant content like the ASCII art boxes, the 'Self-Evaluation Note' section, and repetitive explanations of the same concepts (knowledge delta is explained 4+ times).
- Reduce SKILL.md to <200 lines by keeping only: frontmatter requirements, dimension scoring tables, evaluation protocol steps, and report template. The current length contradicts the skill's own '<500 lines ideal' guidance.
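One possible shape for that split, as a sketch only (the `references/` layout and file names follow the suggestions above; this is not the maintainer's actual structure):

```text
skills/skill-judge/
├── SKILL.md           # frontmatter, dimension scoring tables,
│                      # evaluation protocol steps, report template
└── references/
    ├── patterns.md    # detailed pattern examples
    └── failures.md    # common failure patterns

# In SKILL.md, link out instead of inlining, e.g.:
# "For worked pattern examples, see references/patterns.md."
```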

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | Extremely verbose at ~600+ lines. Contains extensive explanations of concepts Claude already knows (what Skills are, paradigm shifts, basic evaluation principles). The 'Core Philosophy' section alone spends hundreds of tokens explaining meta-concepts rather than providing actionable evaluation criteria. | 1 / 3 |
| Actionability | Provides highly concrete, executable guidance: specific scoring rubrics with point values, detailed evaluation protocol with 5 clear steps, report template with exact format, and quick reference checklist. The dimension scoring criteria are specific and actionable. | 3 / 3 |
| Workflow Clarity | Excellent multi-step workflow with clear sequence: First Pass (Knowledge Delta Scan) → Structure Analysis → Score Each Dimension → Calculate Total & Grade → Generate Report. Each step has explicit checkboxes and validation criteria. The evaluation protocol is unambiguous. | 3 / 3 |
| Progressive Disclosure | Monolithic wall of text with no external references. All content is inline in a single massive file. Content that could be split (e.g., pattern examples, common failure patterns, detailed dimension explanations) is all dumped into SKILL.md. No layered structure despite teaching about progressive disclosure. | 1 / 3 |
| **Total** | | **8 / 12** |

Passed

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 passed

Validation for skill structure:

| Criteria | Description | Result |
| --- | --- | --- |
| skill_md_line_count | SKILL.md is long (753 lines); consider splitting into references/ and linking | Warning |
| **Total** | | **10 / 11** |

Passed
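The `skill_md_line_count` check itself is mechanical; a minimal sketch of how such a check might work (the function name and the 500-line threshold are assumptions drawn from the skill's own '<500 lines ideal' guidance, not Tessl's actual implementation):

```python
def line_count_warning(skill_md_text: str, limit: int = 500):
    """Return a warning string if a SKILL.md body exceeds the line limit,
    or None if it is within bounds. Sketch of a skill_md_line_count check."""
    n = len(skill_md_text.splitlines())
    if n > limit:
        return (
            f"SKILL.md is long ({n} lines); "
            "consider splitting into references/ and linking"
        )
    return None
```

Running it over a 753-line file would reproduce the warning shown in the table above.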

Repository: softaworks/agent-toolkit (Reviewed)
