**Skill description:** Audit and improve skill collections with a 9-dimension scoring framework (Knowledge Delta, Mindset, Anti-Patterns, Specification Compliance, Progressive Disclosure, Freedom Calibration, Pattern Recognition, Practical Usability, Eval Validation), duplication detection, remediation planning, baseline comparison, and CI quality gates; use when evaluating skill quality, generating remediation plans, detecting duplicates, validating artifact conventions, or enforcing publication thresholds.
## Summary

- Overall score: 93
- Quality: 89% (does it follow best practices?)
- Impact: 1.26x, average score across 5 eval scenarios: 99%
- Eval validation: Passed, no known issues
## Discovery: 100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong, well-crafted description that excels across all dimensions. It provides comprehensive specificity about what the skill does (9-dimension framework, duplication detection, remediation plans, CI gates), clearly states when to use it with explicit trigger scenarios, and includes natural language trigger phrases. The only minor concern is that the description is quite dense and could be slightly more concise, but the information density serves the purpose of disambiguation well.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific, concrete actions: evaluating/scoring/remediating skill collections, detecting duplicates, generating remediation plans with T-shirt sizing, enforcing CI quality gates, validating artifact conventions, tracking score trends, and ensuring registry compliance. Very detailed. | 3 / 3 |
| Completeness | Clearly answers both 'what' (evaluate, score, remediate using a 9-dimension framework, duplication detection, remediation plans, CI quality gates, etc.) and 'when', with an explicit 'Use when...' clause listing numerous trigger scenarios, plus a separate 'Triggers:' section with natural-language phrases. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural trigger terms, including 'check my skills', 'skill audit', 'improve my SKILL.md', 'quality check', 'remediation plan', and 'skill judge'. These are phrases users would naturally say. Also includes domain-specific but appropriate terms like 'SKILL.md', 'A-grade scoring', and 'dimension scoring'. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive niche focused specifically on agent-skill quality evaluation using a named 9-dimension framework, SKILL.md files, and tessl registry compliance. Very unlikely to conflict with other skills, given the specificity of the domain (meta-skill evaluation). | 3 / 3 |
| **Total** | | **12 / 12 (Passed)** |
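The dimension tables in this report all follow the same arithmetic: each dimension is scored 0-3, and the section passes or fails on its total. A minimal sketch of that aggregation; the 75% pass threshold is an assumption for illustration, not taken from the audit tool itself:

```python
# Aggregate per-dimension scores (0-3 each) into a section total.
# The 75% pass threshold below is an illustrative assumption, not
# the audit tool's actual cutoff.

DISCOVERY_SCORES = {
    "Specificity": 3,
    "Completeness": 3,
    "Trigger Term Quality": 3,
    "Distinctiveness / Conflict Risk": 3,
}

def section_result(scores, pass_fraction=0.75):
    """Return 'earned / possible' and whether the section passes."""
    earned = sum(scores.values())
    possible = 3 * len(scores)
    return f"{earned} / {possible}", earned >= pass_fraction * possible

total, passed = section_result(DISCOVERY_SCORES)
print(total, "Passed" if passed else "Failed")  # → 12 / 12 Passed
```

The same helper reproduces the Implementation section's 10 / 12 when fed its scores of 2, 3, 2, and 3.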
## Implementation: 72%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured skill with strong actionability through concrete CLI examples and excellent progressive disclosure via organized reference tables with conditional usage guidance. The main weaknesses are moderate verbosity in philosophical sections (Mindset, some anti-patterns) and a workflow that could benefit from more explicit validation checkpoints and error-handling branches given the CI-gate and scoring context.
### Suggestions

- Trim or remove the 'Mindset' section — these are general evaluation principles Claude already understands, not skill-specific operational knowledge.
- Expand the workflow section with explicit validation checkpoints, e.g., 'If grade < B: check which dimension scored lowest before proceeding to remediation', with concrete conditional branching.
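The second suggestion, explicit 'if X fails, do Y' branching, could be sketched roughly as follows; the grade cutoff, step names, and dimension scores here are illustrative assumptions, not the skill's actual workflow:

```python
# Hypothetical post-audit branching: if the overall grade is below B,
# inspect the weakest dimension before generating a remediation plan.
# Grade scale and step names are assumptions for illustration.

GRADE_ORDER = ["F", "D", "C", "B", "A"]

def next_step(grade, dimension_scores):
    """Pick the next workflow step from an audit grade."""
    if GRADE_ORDER.index(grade) >= GRADE_ORDER.index("B"):
        return "publish"  # meets the quality gate, no remediation needed
    weakest = min(dimension_scores, key=dimension_scores.get)
    return f"remediate: start with '{weakest}' (lowest-scoring dimension)"

print(next_step("C", {"Conciseness": 2, "Actionability": 3, "Workflow Clarity": 1}))
# → remediate: start with 'Workflow Clarity' (lowest-scoring dimension)
```

Making the branch explicit in the SKILL.md, rather than implied by step 4, is the kind of checkpoint the suggestion asks for.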
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Generally efficient, but some sections could be tightened — the 'Mindset' section states things Claude already knows about evaluation philosophy, and the 'When Not to Use' section is somewhat obvious. The anti-patterns summary is useful, but the inline WHY explanations add bulk that could live in the referenced file alone. | 2 / 3 |
| Actionability | Provides fully executable bash commands for single-skill audits, batch audits, PR-scoped triage, and self-audit. The examples are copy-paste ready, with concrete tool invocations, flags, and expected output patterns (e.g., score grades). | 3 / 3 |
| Workflow Clarity | The 4-step workflow is present and includes a feedback loop (re-audit after remediation), but validation checkpoints are implicit rather than explicit — there is no clear 'if X fails, do Y' branching beyond step 4's brief mention. For a tool that makes consequential scoring decisions and enforces CI gates, more explicit validation steps would be expected. | 2 / 3 |
| Progressive Disclosure | Excellent progressive disclosure, with a concise overview in the main file and well-organized reference tables with clear 'When to Use' conditions for each linked document. References are one level deep and clearly signaled with descriptive topic names. | 3 / 3 |
| **Total** | | **10 / 12 (Passed)** |
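A CI quality gate of the kind the skill description mentions typically reduces to a threshold check whose exit code fails the build. A sketch, assuming percentage scores and an 80% publication threshold (both illustrative, not the tool's real values):

```python
import sys

# Hypothetical CI gate: return a nonzero exit code when any section
# score falls below the publication threshold. The 80% threshold is
# an assumption for illustration.
THRESHOLD = 80

def gate(scores, threshold=THRESHOLD):
    """Return a CI exit code: 1 if any section misses the threshold, else 0."""
    failures = {name: s for name, s in scores.items() if s < threshold}
    for name, s in failures.items():
        print(f"FAIL {name}: {s}% < {threshold}%", file=sys.stderr)
    return 1 if failures else 0

exit_code = gate({"Discovery": 100, "Implementation": 72, "Validation": 100})
print(exit_code)  # → 1 (Implementation at 72% is below the 80% gate)
```

With the scores in this report, such a gate would block publication on the Implementation section alone.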
## Validation: 100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation for skill structure: 11 / 11 checks passed, with no warnings or errors.
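The individual structural checks are not enumerated in this report. Purely as an illustration, a validator of this kind often asserts the presence and shape of required SKILL.md frontmatter fields; every field name and rule below is an assumption, not the actual spec:

```python
# Illustrative structural checks for SKILL.md frontmatter. The field
# names and rules here are assumptions, not the real specification.

def validate_skill(frontmatter):
    """Return a list of structural errors (empty means valid)."""
    errors = []
    if not frontmatter.get("name"):
        errors.append("missing 'name'")
    desc = frontmatter.get("description", "")
    if not desc:
        errors.append("missing 'description'")
    elif "use when" not in desc.lower():
        errors.append("description lacks a 'use when' clause")
    return errors

print(validate_skill({
    "name": "skill-judge",
    "description": "Audit skill collections; use when evaluating skill quality.",
}))
# → []
```

A clean run, as reported above, would correspond to an empty error list across all checks.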