Name: pantheon-ai/skill-quality-auditor
Rating: 93 (1 review)
Author: pantheon-ai

Audit and improve skill collections with a 9-dimension scoring framework (Knowledge Delta, Mindset, Anti-Patterns, Specification Compliance, Progressive Disclosure, Freedom Calibration, Pattern Recognition, Practical Usability, Eval Validation), duplication detection, remediation planning, baseline comparison, and CI quality gates; use when evaluating skill quality, generating remediation plans, detecting duplicates, validating artifact conventions, or enforcing publication thresholds.

Quality

89%

Does it follow best practices?

Impact

99%

1.26x

Average score across 5 eval scenarios

Security

Passed

No known issues

Quality

Content

72%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-structured skill with strong actionability through concrete CLI examples and excellent progressive disclosure via organized reference tables with conditional usage guidance. The main weaknesses are moderate verbosity in philosophical sections (Mindset, some anti-patterns) and a workflow that could benefit from more explicit validation checkpoints and error-handling branches given the CI-gate and scoring context.

Suggestions

Trim or remove the 'Mindset' section — these are general evaluation principles Claude already understands, not skill-specific operational knowledge.

Expand the workflow section with explicit validation checkpoints, e.g., 'If grade < B: check which dimension scored lowest before proceeding to remediation' with concrete conditional branching.
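The conditional branching suggested above could be sketched roughly as follows. This is a minimal illustration, not the auditor's actual logic: the grade cutoffs, the 0-100 per-dimension scale, and the helper names `grade_for` and `next_step` are all assumptions made for the example.

```python
# Hypothetical sketch of an explicit validation checkpoint.
# ASSUMPTIONS: grade cutoffs and the 0-100 scale are illustrative only;
# the skill's real thresholds are not stated in this review.
GRADE_CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def grade_for(percent: float) -> str:
    """Map an overall percentage to a letter grade using the assumed cutoffs."""
    for cutoff, letter in GRADE_CUTOFFS:
        if percent >= cutoff:
            return letter
    return "F"


def next_step(scores: dict[str, float]) -> str:
    """Decide what to do after an audit, given per-dimension percentages."""
    if not scores:
        raise ValueError("no dimension scores supplied")
    overall = sum(scores.values()) / len(scores)
    grade = grade_for(overall)
    if grade in ("A", "B"):
        return f"grade {grade}: pass the gate"
    # Below B: find the weakest dimension before planning remediation.
    weakest = min(scores, key=scores.get)
    return f"grade {grade}: remediate '{weakest}' first, then re-audit"


print(next_step({"Knowledge Delta": 60, "Mindset": 70, "Anti-Patterns": 80}))
```

Returning a single next action per audit keeps the "if X fails, do Y" branch explicit, which is the gap the suggestion points at.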

Dimension	Reasoning	Score
Conciseness	Generally efficient but includes some sections that could be tightened — the 'Mindset' section states things Claude already knows about evaluation philosophy, and the 'When Not to Use' section is somewhat obvious. The anti-patterns summary is useful but the inline WHY explanations add bulk that could be in the referenced file alone.	2 / 3
Actionability	Provides fully executable bash commands for single skill audits, batch audits, PR-scoped triage, and self-audit. The examples are copy-paste ready with concrete tool invocations, flags, and expected output patterns (e.g., score grades).	3 / 3
Workflow Clarity	The 4-step workflow is present and includes a feedback loop (re-audit after remediation), but validation checkpoints are implicit rather than explicit — there's no clear 'if X fails, do Y' branching beyond step 4's brief mention. For a tool that performs destructive scoring decisions and CI gates, more explicit validation steps would be expected.	2 / 3
Progressive Disclosure	Excellent progressive disclosure with a concise overview in the main file and well-organized reference tables with clear 'When to Use' conditions for each linked document. References are one level deep and clearly signaled with descriptive topic names.	3 / 3
	Total	10 / 12 Passed
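As a quick check, the Total row is consistent with summing the four dimension scores in the table above. The snippet below only tallies the numbers shown; the variable names are illustrative.

```python
# Per-dimension (earned, maximum) scores copied from the table above.
rows = {
    "Conciseness": (2, 3),
    "Actionability": (3, 3),
    "Workflow Clarity": (2, 3),
    "Progressive Disclosure": (3, 3),
}

earned = sum(score for score, _ in rows.values())
possible = sum(maximum for _, maximum in rows.values())
total = f"{earned} / {possible}"
print(total)  # 10 / 12
```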

Description

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a strong, well-crafted description that excels across all dimensions. It provides comprehensive specificity about what the skill does (9-dimension framework, duplication detection, remediation plans, CI gates), clearly states when to use it with explicit trigger scenarios, and includes natural language trigger phrases. The only minor concern is that the description is quite dense and could be slightly more concise, but the information density serves the purpose of disambiguation well.

Dimension	Reasoning	Score
Specificity	Lists multiple specific concrete actions: evaluate/score/remediate skill collections, duplication detection, generates remediation plans with T-shirt sizing, enforces CI quality gates, validates artifact conventions, tracks score trends, and ensures registry compliance. Very detailed.	3 / 3
Completeness	Clearly answers both 'what' (evaluate, score, remediate using 9-dimension framework, duplication detection, remediation plans, CI quality gates, etc.) and 'when' with an explicit 'Use when...' clause listing numerous trigger scenarios, plus a separate 'Triggers:' section with natural language phrases.	3 / 3
Trigger Term Quality	Excellent coverage of natural trigger terms including 'check my skills', 'skill audit', 'improve my SKILL.md', 'quality check', 'remediation plan', 'skill judge'. These are phrases users would naturally say. Also includes domain-specific but appropriate terms like 'SKILL.md', 'A-grade scoring', and 'dimension scoring'.	3 / 3
Distinctiveness / Conflict Risk	Highly distinctive niche focused specifically on agent skill quality evaluation using a named 9-dimension framework, SKILL.md files, and tessl registry compliance. Very unlikely to conflict with other skills due to the specificity of the domain (meta-skill evaluation).	3 / 3
	Total	12 / 12 Passed

Validation

100%

Warnings & errors only

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Reviewed

about 2 months ago
