skill-judge

Evaluate Agent Skill design quality against official specifications and best practices. Use when reviewing, auditing, or improving SKILL.md files and skill packages. Provides multi-dimensional scoring and actionable improvement suggestions.

Quality: 72% (Does it follow best practices?)

Impact: 100% average score across 3 eval scenarios (1.61x)

Security: Passed, no known issues

Fix and improve this skill with Tessl

tessl review fix ./skills/skill-judge/SKILL.md

Evaluation results

100%

48%

Skill Quality Audit for Publishing Review

Description quality and knowledge delta evaluation

Criteria (without context / with context)

Description WHAT/WHEN/KEYWORDS: 50% / 100%
D4 score ≤ 10: 100%
Description trigger gap: 100%
Tutorial section flagged Redundant: 50% / 100%
Basic code example flagged Redundant: 50% / 100%
Failure pattern identified: 50% / 100%
Knowledge ratio reported: 100%
Score out of 120: 100%
Grade assigned: 100%
Critical Issues section: 50% / 100%
Top improvements listed: 100%
Description improvement recommended: 100%

39%

Logo Design Skill Review for Design Agency

Freedom calibration and pattern recognition evaluation

Criteria (without context / with context)

Creative task identified: 100%
Freedom Mismatch flagged: 80% / 100%
D6 score ≤ 10: 100%
High freedom for creative tasks: 90% / 100%
Mindset pattern recommended: 50% / 100%
D7 reflects pattern deviation: 100%
Checkbox Procedure flagged: 75% / 100%
Thinking frameworks recommended: 62% / 100%
Consequence test applied: 12% / 100%
No inflated score for completeness: 100%
Dimension scores table: 66% / 100%
Top improvements listed: 100%

27%

ML Deployment Skill Critique for Platform Engineering Team

Anti-pattern quality and mindset evaluation

Criteria (without context / with context)

D3 score ≤ 7: 70% / 100%
Vague Warning pattern flagged: 100%
Generic warnings cited as evidence: 100%
Specific NEVER list recommended: 50% / 100%
D2 reflects generic procedures: 62% / 100%
Checkbox Procedure flagged: 75% / 100%
Generic vs domain-specific distinguished: 100%
D8 reflects missing decision trees: 37% / 100%
Detailed Analysis present: 100%
Knowledge ratio reported: 100%
Overall score reflects poor quality: 100%
Critical Issue on anti-patterns: 83% / 100%
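
The headline figures can be cross-checked against the per-scenario numbers with a little arithmetic. The sketch below rests on two assumptions that this page does not state: that the percentage shown with each scenario title (48%, 39%, 27%) is the gain in percentage points from adding the skill, and that the 1.61x impact figure is the ratio of the average with-context score to the average without-context score. The variable names are illustrative only.

```python
# Assumption: every scenario scored 100% with context (the page reports a
# 100% average across 3 scenarios), and 48/39/27 are percentage-point gains.
with_context = [100, 100, 100]
gain_points = [48, 39, 27]

# Implied without-context score per scenario under that assumption.
without_context = [w - g for w, g in zip(with_context, gain_points)]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Assumption: impact = average with-context score / average without-context score.
impact = mean(with_context) / mean(without_context)

print(without_context)   # [52, 61, 73]
print(f"{impact:.2f}x")  # 1.61x
```

Under these assumptions the implied baseline averages 62%, and 100 / 62 rounds to the 1.61x reported at the top of the page, so the reading is at least consistent with the published numbers.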

Repository: softaworks/agent-toolkit
Commit: 3027f20

Evaluated: 4 months ago
Agent: Claude Code
Model: Claude Sonnet 4.6

