pantheon-ai/skill-quality-auditor

Audit and improve skill collections with an 8-dimension scoring framework, duplication detection, remediation planning, and CI quality gates; use when evaluating skill quality, generating remediation plans, validating report format, or enforcing repository-wide skill artifact conventions.

references/framework-scoring-rubric.md

category: framework
priority: CRITICAL
source: skill-judge evaluation methodology

Skill Scoring Rubric

Detailed scoring methodology for the skill-judge framework. Use this to understand how scores are calculated and ensure consistent evaluation.

Scoring Overview

Total Possible Score: 120 points
Passing Grade: 90 points (75%)
A-Grade Target: 108 points (90%)
Perfect Score: 120 points (100%)

Dimension-by-Dimension Scoring

D1: Knowledge Delta (20 points)

| Score | Criteria | Redundancy Level |
| --- | --- | --- |
| 18-20 | Pure expert knowledge | <5% |
| 15-17 | Mostly expert | 5-15% |
| 12-14 | Acceptable balance | 15-30% |
| 9-11 | Needs improvement | 30-50% |
| 0-8 | Failing | >50% |

Evaluation Method:

  1. Read entire skill content
  2. Identify content AI assistants already know
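  3. Calculate the expert ratio: Expert Content / Total Content. The redundancy level is the remainder (1 minus the expert ratio)
  4. Map the redundancy level to a score band using the table above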
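
The evaluation method above can be sketched in code. This is a minimal illustration, not part of the skill-judge tooling: the function name `d1_score_band` and the idea of counting content in "units" (lines, sentences, or tokens) are assumptions, as is the handling of values that fall exactly on a shared band edge such as 15%. The rubric gives score ranges rather than exact scores, so the sketch returns the range and leaves the final number to the evaluator.

```python
# Hypothetical helper: maps a redundancy level to the D1 score band.
# Integer arithmetic is used so that edge values (e.g. exactly 15%) compare exactly.

# (upper redundancy bound in percent, score band) for the middle bands.
# Assumption: a value exactly on a shared edge goes to the higher-scoring band.
D1_MIDDLE_BANDS = [
    (15, (15, 17)),  # Mostly expert
    (30, (12, 14)),  # Acceptable balance
    (50, (9, 11)),   # Needs improvement
]


def d1_score_band(expert_units: int, total_units: int) -> tuple[int, int]:
    """Return the (low, high) D1 score band for a skill.

    expert_units: content units judged to be expert knowledge.
    total_units:  all content units in the skill.
    """
    if total_units <= 0:
        raise ValueError("total_units must be positive")
    if not 0 <= expert_units <= total_units:
        raise ValueError("expert_units must be between 0 and total_units")

    redundant_units = total_units - expert_units

    # "<5%" in the table is strict, so exactly 5% falls into the next band.
    if redundant_units * 100 < 5 * total_units:
        return (18, 20)  # Pure expert knowledge

    for upper_pct, band in D1_MIDDLE_BANDS:
        if redundant_units * 100 <= upper_pct * total_units:
            return band

    return (0, 8)  # Failing: more than 50% redundant
```

For example, a skill with 85 expert units out of 100 has a redundancy level of 15% and lands in the 15-17 band under the edge assumption stated above.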

D2: Mindset + Procedures (15 points)

| Score | Criteria |
| --- | --- |
| 13-15 | Clear mindset + detailed procedures + when/when-not |
| 10-12 | Has most elements, minor gaps |
| 7-9 | Missing key element |
| 0-6 | Generic or absent |

Component Breakdown:

  • Clear Mindset/Philosophy: 5 points
  • Step-by-Step Procedures: 5 points
  • When/When-Not Guidance: 5 points

D3: Anti-Pattern Quality (15 points)

| Score | Criteria |
| --- | --- |
| 13-15 | NEVER lists + concrete examples + consequences |
| 10-12 | Has most elements |
| 7-9 | Generic warnings |
| 0-6 | Missing or weak |

Component Breakdown:

  • NEVER Lists with WHY: 5 points
  • Concrete Examples: 5 points
  • Consequences Explained: 5 points

D4: Specification Compliance (15 points)

| Score | Criteria |
| --- | --- |
| 13-15 | Perfect spec compliance |
| 10-12 | Minor issues |
| 7-9 | Missing key elements |
| 0-6 | Non-compliant |

Component Breakdown:

  • Description Field Quality: 10 points (most critical)
  • Proper Frontmatter: 3 points
  • Activation Keywords: 2 points
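
Because the D4 components are weighted unevenly, it can help to see how the component points combine. The sketch below is illustrative only: the dictionary keys and the function names `d4_total` and `d4_band_label` are hypothetical, while the point caps and band labels come from the table and breakdown above.

```python
# Per-component caps taken from the D4 component breakdown.
# The key names are assumptions chosen for this sketch.
D4_COMPONENT_MAX = {
    "description_quality": 10,
    "frontmatter": 3,
    "activation_keywords": 2,
}

# (band floor, label) pairs taken from the D4 score table, highest first.
D4_BANDS = [
    (13, "Perfect spec compliance"),
    (10, "Minor issues"),
    (7, "Missing key elements"),
    (0, "Non-compliant"),
]


def d4_total(components: dict[str, int]) -> int:
    """Validate each component against its cap and return the D4 total."""
    if set(components) != set(D4_COMPONENT_MAX):
        raise ValueError("expected exactly the three D4 components")
    for name, points in components.items():
        if not 0 <= points <= D4_COMPONENT_MAX[name]:
            raise ValueError(f"{name} must be between 0 and {D4_COMPONENT_MAX[name]}")
    return sum(components.values())


def d4_band_label(total: int) -> str:
    """Map a D4 total (0-15) to the criteria label of its band."""
    if not 0 <= total <= 15:
        raise ValueError("D4 total must be between 0 and 15")
    for floor, label in D4_BANDS:
        if total >= floor:
            return label
    raise AssertionError("unreachable: the last band floor is 0")
```

One consequence of the weighting: a description scored at 5 of 10 caps D4 at 10 points even with full marks for frontmatter and keywords, which is why the description field is called out as most critical.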

D5: Progressive Disclosure (15 points)

| Score | Criteria |
| --- | --- |
| 13-15 | Navigation hub + references/ + categories |
| 10-12 | Some organization, could improve |
| 7-9 | Everything frontloaded, >300 lines |
| 0-6 | No structure, >500 lines |

Component Breakdown:

  • Navigation Hub Approach: 8 points
  • References Directory: 4 points
  • Category Organization: 3 points

D6: Freedom Calibration (15 points)

| Score | Criteria |
| --- | --- |
| 13-15 | Appropriate for skill type |
| 10-12 | Slightly too rigid or loose |
| 7-9 | Mismatched calibration |
| 0-6 | Completely wrong |

Calibration Types:

  • Rigid (Mindset skills): Strong rules, must follow
  • Balanced (Process skills): Clear steps with flexibility
  • Flexible (Tool skills): Options and trade-offs

D7: Pattern Recognition (10 points)

| Score | Criteria |
| --- | --- |
| 9-10 | Rich keywords, comprehensive triggers |
| 7-8 | Good keywords, could expand |
| 5-6 | Basic keywords |
| 0-4 | Missing or poor |

Evaluation Method:

  • Count domain keywords in description
  • Check trigger scenarios present
  • Verify activation clarity

D8: Practical Usability (15 points)

| Score | Criteria |
| --- | --- |
| 13-15 | Concrete + runnable + clear |
| 10-12 | Most examples good |
| 7-9 | Some weak examples |
| 0-6 | Abstract or missing |

Component Breakdown:

  • Concrete Examples: 5 points
  • Runnable Code: 5 points
  • Clear Structure: 5 points

Grade Assignment

| Grade | Score Range | Interpretation |
| --- | --- | --- |
| A+ | 114-120 | Exceptional quality |
| A | 108-113 | Meets all standards |
| B+ | 102-107 | Strong, minor improvements |
| B | 96-101 | Good, some gaps |
| C+ | 90-95 | Acceptable, needs work |
| C | 84-89 | Below standard |
| D | 78-83 | Significant issues |
| F | 0-77 | Failing |
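
The grade table translates directly into a lookup over band floors. The following is a minimal sketch; the name `assign_grade` is hypothetical and the rubric does not prescribe any particular implementation.

```python
# (lowest total in the band, grade) pairs from the grade table, highest first.
GRADE_FLOORS = [
    (114, "A+"),
    (108, "A"),
    (102, "B+"),
    (96, "B"),
    (90, "C+"),
    (84, "C"),
    (78, "D"),
    (0, "F"),
]


def assign_grade(total: int) -> str:
    """Map a total score (0-120) to its letter grade."""
    if not 0 <= total <= 120:
        raise ValueError("total must be between 0 and 120")
    for floor, grade in GRADE_FLOORS:
        if total >= floor:
            return grade
    raise AssertionError("unreachable: the last band floor is 0")
```

Note that the passing grade of 90 points coincides with the bottom of the C+ band, so C, D, and F are all below passing.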

Scoring Process

Step 1: Read and Understand

Read the entire skill, including all references if present.

Step 2: Score Each Dimension

Apply the rubric to each of the 8 dimensions independently.

Step 3: Calculate Total

Sum all 8 dimension scores for a total out of 120.

Step 4: Assign Grade

Map the total score to a grade using the Grade Assignment table.

Step 5: Identify Improvements

For totals below the A-grade target (108 points), identify the specific improvements needed.
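
Steps 2, 3, and 5 can be sketched together as a small summary function. This is an illustration under stated assumptions: the name `summarize_scores`, the `D1`..`D8` keys, and the rule of ranking dimensions by points lost are all hypothetical. The rubric only asks that improvements be identified; it does not say how to prioritize them. The per-dimension maxima and the 90 and 108 point thresholds come from the sections above.

```python
# Maximum points per dimension, from the dimension headings (sums to 120).
DIMENSION_MAX = {
    "D1": 20, "D2": 15, "D3": 15, "D4": 15,
    "D5": 15, "D6": 15, "D7": 10, "D8": 15,
}
PASSING_TOTAL = 90    # Passing Grade
A_GRADE_TOTAL = 108   # A-Grade Target


def summarize_scores(scores: dict[str, int]) -> dict:
    """Validate per-dimension scores, sum them, and list improvement targets."""
    if set(scores) != set(DIMENSION_MAX):
        raise ValueError("scores must cover exactly D1 through D8")
    for dim, points in scores.items():
        if not 0 <= points <= DIMENSION_MAX[dim]:
            raise ValueError(f"{dim} must be between 0 and {DIMENSION_MAX[dim]}")

    total = sum(scores.values())

    # Assumption: prioritize dimensions by points lost, largest gap first,
    # breaking ties by dimension name.
    improve = []
    if total < A_GRADE_TOTAL:
        lossy = [d for d in scores if scores[d] < DIMENSION_MAX[d]]
        improve = sorted(lossy, key=lambda d: (scores[d] - DIMENSION_MAX[d], d))

    return {
        "total": total,
        "passes": total >= PASSING_TOTAL,
        "a_grade": total >= A_GRADE_TOTAL,
        "improve": improve,
    }
```

Given scores of 18, 13, 12, 14, 11, 13, 8, and 10 for D1 through D8, the total is 99: a pass, but below the A-grade target, with D8, D5, and D3 showing the largest gaps.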

Common Score Patterns

  • High Knowledge Delta, Low Usability (D1: 18, D8: 10): Expert content but lacks examples
  • Low Knowledge Delta, High Usability (D1: 10, D8: 14): Tutorial-heavy, needs expert focus
  • Perfect Spec, Poor Content (D4: 15, content: 8): Great frontmatter, weak body
  • Balanced Scores (12-13 each): Consistent but not exceptional

See Also

  • framework-skill-judge-dimensions.md - Dimension definitions
  • framework-quality-standards.md - A-grade requirements

Install with Tessl CLI

npx tessl i pantheon-ai/skill-quality-auditor@0.1.4
