Audit and improve skill collections with a 9-dimension scoring framework (Knowledge Delta, Mindset, Anti-Patterns, Specification Compliance, Progressive Disclosure, Freedom Calibration, Pattern Recognition, Practical Usability, Eval Validation), duplication detection, remediation planning, baseline comparison, and CI quality gates. Use it when evaluating skill quality, generating remediation plans, detecting duplicates, validating artifact conventions, or enforcing publication thresholds.
93
Does it follow best practices? 89%
Impact: 99% (1.26x average score across 5 eval scenarios)
Passed: No known issues
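The description lists baseline comparison, CI quality gates, and publication thresholds, but it does not say how they combine. The following is a minimal sketch under stated assumptions. Each of the nine dimensions is scored from 0 to 100, and the overall score is their unweighted mean. The names `overall`, `quality_gate`, `PUBLISH_THRESHOLD`, and `MAX_REGRESSION`, along with their values, are hypothetical and are not the skill's actual scoring rules.

```python
# The nine dimension names come from the skill description; the 0-100 scale,
# the equal weighting, and both thresholds below are assumptions for illustration.
DIMENSIONS = [
    "Knowledge Delta", "Mindset", "Anti-Patterns",
    "Specification Compliance", "Progressive Disclosure",
    "Freedom Calibration", "Pattern Recognition",
    "Practical Usability", "Eval Validation",
]
PUBLISH_THRESHOLD = 80.0  # hypothetical minimum overall score to publish
MAX_REGRESSION = 2.0      # hypothetical allowed drop versus the baseline


def overall(scores: dict[str, float]) -> float:
    """Unweighted mean over all nine dimensions (assumed aggregation)."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimension scores: {missing}")
    return sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)


def quality_gate(current: dict[str, float], baseline: dict[str, float]) -> list[str]:
    """Return a list of gate failures; an empty list means the skill passes."""
    failures = []
    now, before = overall(current), overall(baseline)
    if now < PUBLISH_THRESHOLD:
        failures.append(f"overall {now:.1f} below threshold {PUBLISH_THRESHOLD}")
    if before - now > MAX_REGRESSION:
        failures.append(f"overall dropped {before - now:.1f} points vs baseline")
    return failures


# Example: one regressed dimension against an otherwise unchanged baseline.
baseline = {d: 85.0 for d in DIMENSIONS}
current = {**baseline, "Eval Validation": 60.0}
for problem in quality_gate(current, baseline):
    print("FAIL:", problem)
```

In a CI job, a wrapper would exit with a non-zero status whenever `quality_gate` returns any failures, and that non-zero exit is what blocks publication. An absolute threshold alone would miss a skill that still clears the bar but has regressed. Comparing against a stored baseline catches that case.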
{
  "context": "Tests whether the agent applies similarity thresholds correctly, performs pairwise comparison across all relevant pairs, and produces actionable consolidation recommendations using the Navigation Hub pattern.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Pairwise comparison performed",
      "description": "similarity-analysis.md analyses all relevant pairs, not just a subset",
      "max_score": 15
    },
    {
      "name": "Similarity percentages calculated",
      "description": "Each pair has a numeric similarity percentage (not just 'high/low')",
      "max_score": 18
    },
    {
      "name": "20% threshold applied",
      "description": "Pairs above 20% similarity are flagged as aggregation candidates",
      "max_score": 12
    },
    {
      "name": "35% threshold applied",
      "description": "Pairs above 35% similarity are flagged as requiring immediate action",
      "max_score": 12
    },
    {
      "name": "NEVER wrong-domain aggregation",
      "description": "Does not recommend merging skills from different domains (e.g., terraform + github-actions)",
      "max_score": 15
    },
    {
      "name": "Navigation Hub pattern referenced",
      "description": "Consolidation recommendations mention the Navigation Hub pattern for aggregation",
      "max_score": 10
    },
    {
      "name": "duplication-report.json valid",
      "description": "JSON output contains a pairs array whose entries have skill_a, skill_b, similarity_pct, and action fields",
      "max_score": 10
    },
    {
      "name": "Specific keep-separate justification",
      "description": "For pairs recommended to stay separate, provides a reason (different purpose, domain fit, etc.)",
      "max_score": 8
    }
  ]
}
assets
evals
  scenario-1
  scenario-2
  scenario-3
  scenario-4
  scenario-5
references
scripts
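The duplicate-detection checklist asks for several things. Every pair needs a numeric similarity, and the 20% and 35% thresholds must be applied. Skills from different domains must never be merged. The output, duplication-report.json, needs pairs entries that carry skill_a, skill_b, similarity_pct, and action. A minimal sketch of that pipeline follows. Several parts of it are assumptions, not the skill's documented method: the similarity measure (Jaccard overlap of word sets), the per-skill `domain` field, the sample skills, and the action labels `keep-separate`, `aggregation-candidate`, and `consolidate-now`.

```python
import json
import re
from itertools import combinations

AGGREGATION_THRESHOLD = 20.0  # above this: aggregation candidate
IMMEDIATE_THRESHOLD = 35.0    # above this: requires immediate action


def tokens(text: str) -> set[str]:
    """Lower-cased word set; a deliberately simple stand-in for real analysis."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity_pct(a: str, b: str) -> float:
    """Jaccard similarity of the two word sets, as a percentage (assumed metric)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return 100.0 * len(ta & tb) / len(ta | tb)


def classify(pct: float, same_domain: bool) -> str:
    # Never propose merging skills from different domains, whatever the score.
    if not same_domain:
        return "keep-separate"
    if pct > IMMEDIATE_THRESHOLD:
        return "consolidate-now"
    if pct > AGGREGATION_THRESHOLD:
        return "aggregation-candidate"
    return "keep-separate"


def duplication_report(skills: dict[str, dict]) -> dict:
    """Compare every unordered pair of skills exactly once."""
    pairs = []
    for a, b in combinations(sorted(skills), 2):
        pct = similarity_pct(skills[a]["text"], skills[b]["text"])
        same = skills[a]["domain"] == skills[b]["domain"]
        pairs.append({
            "skill_a": a,
            "skill_b": b,
            "similarity_pct": round(pct, 1),
            "action": classify(pct, same),
        })
    return {"pairs": pairs}


# Hypothetical sample skills, not taken from any real collection.
skills = {
    "terraform-modules": {"domain": "terraform",
                          "text": "write reusable terraform modules with variables and outputs"},
    "terraform-state": {"domain": "terraform",
                        "text": "manage terraform state with remote backends and locking"},
    "gha-workflows": {"domain": "github-actions",
                      "text": "write reusable github actions workflows with inputs and outputs"},
}
print(json.dumps(duplication_report(skills), indent=2))
```

`itertools.combinations` visits each unordered pair once, so every relevant pair is covered and no skill is compared with itself. In the sample, the GitHub Actions and Terraform-modules skills score highest, yet they stay separate because their domains differ. The sketch stops at the report. Choosing a Navigation Hub consolidation and writing keep-separate justifications are left to the agent.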