skill-judge

Evaluate Agent Skill design quality against official specifications and best practices. Use when reviewing, auditing, or improving SKILL.md files and skill packages. Provides multi-dimensional scoring and actionable improvement suggestions.

81

Quality: 72%
Does it follow best practices?

Impact: 100% (1.61x)
Average score across 3 eval scenarios

Security by Snyk: Passed
No known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./skills/skill-judge/SKILL.md

Evaluation results

100%

48%

Skill Quality Audit for Publishing Review

Description quality and knowledge delta evaluation

| Criteria | Without context | With context |
| --- | --- | --- |
| Description WHAT/WHEN/KEYWORDS | 50% | 100% |
| D4 score ≤ 10 | 0% | 100% |
| Description trigger gap | 100% | 100% |
| Tutorial section flagged Redundant | 50% | 100% |
| Basic code example flagged Redundant | 50% | 100% |
| Failure pattern identified | 50% | 100% |
| Knowledge ratio reported | 0% | 100% |
| Score out of 120 | 0% | 100% |
| Grade assigned | 100% | 100% |
| Critical Issues section | 50% | 100% |
| Top improvements listed | 100% | 100% |
| Description improvement recommended | 100% | 100% |

100%

39%

Logo Design Skill Review for Design Agency

Freedom calibration and pattern recognition evaluation

| Criteria | Without context | With context |
| --- | --- | --- |
| Creative task identified | 100% | 100% |
| Freedom Mismatch flagged | 80% | 100% |
| D6 score ≤ 10 | 0% | 100% |
| High freedom for creative tasks | 90% | 100% |
| Mindset pattern recommended | 50% | 100% |
| D7 reflects pattern deviation | 0% | 100% |
| Checkbox Procedure flagged | 75% | 100% |
| Thinking frameworks recommended | 62% | 100% |
| Consequence test applied | 12% | 100% |
| No inflated score for completeness | 100% | 100% |
| Dimension scores table | 66% | 100% |
| Top improvements listed | 100% | 100% |

100%

27%

ML Deployment Skill Critique for Platform Engineering Team

Anti-pattern quality and mindset evaluation

| Criteria | Without context | With context |
| --- | --- | --- |
| D3 score ≤ 7 | 70% | 100% |
| Vague Warning pattern flagged | 100% | 100% |
| Generic warnings cited as evidence | 100% | 100% |
| Specific NEVER list recommended | 50% | 100% |
| D2 reflects generic procedures | 62% | 100% |
| Checkbox Procedure flagged | 75% | 100% |
| Generic vs domain-specific distinguished | 100% | 100% |
| D8 reflects missing decision trees | 37% | 100% |
| Detailed Analysis present | 100% | 100% |
| Knowledge ratio reported | 0% | 100% |
| Overall score reflects poor quality | 100% | 100% |
| Critical Issue on anti-patterns | 83% | 100% |
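
The page does not say how the headline 1.61x impact figure is derived. The sketch below shows one reading that happens to reproduce it from the two percentages printed beside each scenario title. It assumes, without confirmation from the page, that the second percentage (48%, 39%, 27%) is the gain over the no-context baseline and that impact is the ratio of the mean with-context score to the mean baseline. The variable names are illustrative only, and this may not be Tessl's actual formula.

```python
# Figures copied from this page; how they relate is an assumption.
with_context = [100, 100, 100]   # first percentage shown for each scenario
second_figure = [48, 39, 27]     # second percentage shown for each scenario

# Assumption: the second figure is the percentage-point gain over the
# no-context baseline, so baseline = with-context score minus gain.
baselines = [w - g for w, g in zip(with_context, second_figure)]

mean_with = sum(with_context) / len(with_context)
mean_baseline = sum(baselines) / len(baselines)

# Assumption: impact is the ratio of the two means.
impact = mean_with / mean_baseline

print(baselines)          # [52, 61, 73]
print(round(impact, 2))   # 1.61
```

Under these assumptions the mean baseline is 62%, and 100 / 62 rounds to the 1.61x shown in the summary.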

Repository: softaworks/agent-toolkit
Evaluated with: Claude Code (agent), Claude Sonnet 4.6 (model)
