Audit and improve skill collections using a 9-dimension scoring framework (Knowledge Delta, Mindset, Anti-Patterns, Specification Compliance, Progressive Disclosure, Freedom Calibration, Pattern Recognition, Practical Usability, Eval Validation), duplication detection, remediation planning, baseline comparison, and CI quality gates. Use when evaluating skill quality, generating remediation plans, detecting duplicates, validating artifact conventions, or enforcing publication thresholds.
93
89%
Does it follow best practices?
Impact: 99% (1.26x average score across 5 eval scenarios)
Passed
No known issues
{
"context": "Tests whether the agent produces correctly structured anti-patterns with NEVER/WHY/BAD/GOOD format, covering all three original issues and adding at least one new one.",
"type": "weighted_checklist",
"checklist": [
{
"name": "NEVER statements used",
"description": "Each anti-pattern leads with 'NEVER: [action]' phrasing",
"max_score": 15
},
{
"name": "WHY: explanations present",
"description": "Each NEVER statement is followed by a 'WHY:' line explaining the consequence",
"max_score": 18
},
{
"name": "BAD code examples",
"description": "Each anti-pattern includes a fenced BAD code block showing the incorrect pattern",
"max_score": 18
},
{
"name": "GOOD code examples",
"description": "Each anti-pattern includes a fenced GOOD code block showing the correct alternative",
"max_score": 15
},
{
"name": "All 3 original issues covered",
"description": "Hardcoded secrets, parallel dependency issues, and floating action tags are all addressed",
"max_score": 14
},
{
"name": "At least 4 anti-patterns total",
"description": "Section contains 4 or more distinct NEVER entries (original 3 + at least 1 new)",
"max_score": 10
},
{
"name": "Score impact explained",
"description": "before-after-diff.md explains which D3 (Anti-Patterns) signals the new content satisfies and estimates the score delta",
"max_score": 10
}
]
}

assets
evals
  scenario-1
  scenario-2
  scenario-3
  scenario-4
  scenario-5
references
scripts
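The weighted checklist in the JSON rubric above totals 100 points (15 + 18 + 18 + 15 + 14 + 10 + 10). What follows is a minimal Python sketch of how an automated grader might score its structural items. It assumes each anti-pattern entry starts with a `NEVER:` line, contains a `WHY:` line, and puts a `BAD` or `GOOD` label on the line directly above each code fence. The helper names (`split_entries`, `has_labelled_fence`, `score`) are hypothetical. The proportional partial credit and the choice to take "All 3 original issues covered" and "Score impact explained" as reviewer-supplied scores are also assumptions, not the skill's actual grader.

```python
import re

FENCE = "`" * 3  # built at runtime so this example can sit inside a Markdown fence
PREFIX = r"^[^\w\n]*"  # optional bullets/bold markers at line start, never crossing lines

# Names and max scores copied from the weighted checklist; weights sum to 100.
MAX = {
    "NEVER statements used": 15,
    "WHY: explanations present": 18,
    "BAD code examples": 18,
    "GOOD code examples": 15,
    "All 3 original issues covered": 14,
    "At least 4 anti-patterns total": 10,
    "Score impact explained": 10,
}
JUDGED = ("All 3 original issues covered", "Score impact explained")  # hypothetical: human-scored


def split_entries(section: str) -> list[str]:
    """Split a section into entries, each starting at a 'NEVER:' line."""
    starts = [m.start() for m in re.finditer(PREFIX + r"NEVER:", section, re.M)]
    return [section[a:b] for a, b in zip(starts, starts[1:] + [len(section)])]


def has_labelled_fence(entry: str, label: str) -> bool:
    """True if a line starting with `label` (e.g. 'BAD') is followed by a code fence."""
    pattern = PREFIX + label + r"\b[^\n]*\n\s*" + re.escape(FENCE)
    return re.search(pattern, entry, re.M) is not None


def score(section: str, judged: dict[str, int]) -> dict[str, float]:
    """Score structural items automatically; clamp reviewer scores to each item's max."""
    entries = split_entries(section)
    n = len(entries)

    def share(check) -> float:
        # Fraction of entries passing `check` (0.0 when no entries were found).
        return sum(1 for e in entries if check(e)) / n if n else 0.0

    scores: dict[str, float] = {
        "NEVER statements used": MAX["NEVER statements used"] * (1.0 if n else 0.0),
        "WHY: explanations present": MAX["WHY: explanations present"]
        * share(lambda e: re.search(PREFIX + r"WHY:", e, re.M) is not None),
        "BAD code examples": MAX["BAD code examples"]
        * share(lambda e: has_labelled_fence(e, "BAD")),
        "GOOD code examples": MAX["GOOD code examples"]
        * share(lambda e: has_labelled_fence(e, "GOOD")),
        "At least 4 anti-patterns total": MAX["At least 4 anti-patterns total"] * (n >= 4),
    }
    for name in JUDGED:
        scores[name] = min(max(judged.get(name, 0), 0), MAX[name])
    scores["total"] = sum(scores.values())
    return scores
```

Under this proportional rule, a section with four entries where one lacks a `GOOD` block would earn 11.25 of the 15 points for "GOOD code examples". A stricter all-or-nothing grader would award 0 instead.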