Audit and improve skill collections with a 9-dimension scoring framework (Knowledge Delta, Mindset, Anti-Patterns, Specification Compliance, Progressive Disclosure, Freedom Calibration, Pattern Recognition, Practical Usability, Eval Validation), duplication detection, remediation planning, baseline comparison, and CI quality gates. Use it when evaluating skill quality, generating remediation plans, detecting duplicates, validating artifact conventions, or enforcing publication thresholds.
93
89%
Does it follow best practices?
Impact
99%
1.26x
Average score across 5 eval scenarios
Passed
No known issues
{
  "context": "Tests whether the agent uses skill-auditor batch for collection-wide audits, stores results in the correct directory structure, compares against baselines, and produces a trend-aware report.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "skill-auditor batch used",
      "description": "Uses `skill-auditor batch` (not individual evaluate calls per skill) to audit the collection",
      "max_score": 15
    },
    {
      "name": "--store flag used",
      "description": "Passes --store so results are written to .context/audits/<skill>/YYYY-MM-DD/",
      "max_score": 10
    },
    {
      "name": "--json flag used",
      "description": "Passes --json to capture structured output",
      "max_score": 8
    },
    {
      "name": "Baseline comparison performed",
      "description": "Reads previous audit.json files from .context/audits/ and computes score deltas",
      "max_score": 20
    },
    {
      "name": "Grade thresholds applied",
      "description": "Uses A>=126, B+>=119, B>=112, C/C+<112 thresholds in the report",
      "max_score": 12
    },
    {
      "name": "New skills handled",
      "description": "Notes that 4 skills have no baseline and marks them as 'new — no delta available'",
      "max_score": 10
    },
    {
      "name": "Trend analysis present",
      "description": "baseline-comparison.md identifies at least improvements, regressions, and new skills categories",
      "max_score": 15
    },
    {
      "name": "Reproducible commands documented",
      "description": "audit-execution.sh contains exact commands that can be re-run to reproduce the audit",
      "max_score": 10
    }
  ]
}
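The grade thresholds, storage layout, and baseline comparison named in the checklist can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the schema of `audit.json` is not shown in the source, so the sketch assumes total scores have already been extracted into plain `{skill: score}` dictionaries. The function names (`grade`, `audit_dir`, `compare`) and the sample skill names are hypothetical and are not part of `skill-auditor`.

```python
from datetime import date
from pathlib import Path

# Thresholds copied from the "Grade thresholds applied" checklist item.
GRADE_THRESHOLDS = [(126, "A"), (119, "B+"), (112, "B")]


def grade(score: float) -> str:
    """Map a total score to a grade band; anything below 112 is C/C+."""
    for minimum, label in GRADE_THRESHOLDS:
        if score >= minimum:
            return label
    return "C/C+"


def audit_dir(skill: str, day: date, root: Path = Path(".context/audits")) -> Path:
    """Build the .context/audits/<skill>/YYYY-MM-DD/ directory for one run."""
    return root / skill / day.isoformat()


def compare(current: dict[str, float], baseline: dict[str, float]) -> dict[str, list]:
    """Sort skills into improvements, regressions, unchanged, and new.

    Each entry is (skill, grade, delta); delta is None for a skill with no
    baseline, which the report should label as new with no delta available.
    """
    report = {"improvements": [], "regressions": [], "unchanged": [], "new": []}
    for skill, score in sorted(current.items()):
        previous = baseline.get(skill)
        if previous is None:
            report["new"].append((skill, grade(score), None))
            continue
        delta = score - previous
        if delta > 0:
            key = "improvements"
        elif delta < 0:
            key = "regressions"
        else:
            key = "unchanged"
        report[key].append((skill, grade(score), delta))
    return report


if __name__ == "__main__":
    # Hypothetical scores, used only to exercise the functions.
    current_scores = {"alpha": 120, "beta": 110, "gamma": 100}
    baseline_scores = {"alpha": 115, "beta": 112}
    for category, entries in compare(current_scores, baseline_scores).items():
        print(category, entries)
    print(audit_dir("alpha", date(2024, 1, 2)))
```

Keeping skills without a baseline in their own category, rather than treating the missing score as zero, avoids reporting a new skill as a large improvement.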