Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.
86
91%
Does it follow best practices?
Impact
86%
1.22x
Average score across 29 eval scenarios
Advisory
Suggest reviewing before use
{
  "context": "Activation-focused scenario: tests whether a post-edit routing verification request activates a skill in this tile (expected: setup-skill-performance Phase 4a or optimize-skill-instructions). Criteria below are placeholders for future content-eval use; activation eval ignores them.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Points to activation eval as the fast check",
      "description": "Response identifies activation eval (--solver=activation) as the fast way to verify routing after description edits, rather than rerunning the slower scored eval.",
      "max_score": 50
    },
    {
      "name": "Suggests before/after comparison",
      "description": "Response suggests comparing routing results before and after the edit (or compares against a prior activation run) to isolate what changed.",
      "max_score": 50
    }
  ]
}
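As a rough illustration of how a `weighted_checklist` like the one in the criteria JSON could be turned into a single score, here is a minimal Python sketch. It assumes the score is the sum of points awarded per item divided by the sum of `max_score` values; the page does not state the actual scoring rule, so that formula, the `score_checklist` helper, and the example award values are all hypothetical.

```python
import json

# Trimmed copy of the checklist above (descriptions omitted for brevity).
CRITERIA_JSON = """
{
  "type": "weighted_checklist",
  "checklist": [
    {"name": "Points to activation eval as the fast check", "max_score": 50},
    {"name": "Suggests before/after comparison", "max_score": 50}
  ]
}
"""


def score_checklist(criteria: dict, awarded: dict) -> float:
    """Return the fraction of available points earned (0.0 to 1.0).

    Assumption: the overall score is earned points / total max_score.
    `awarded` maps checklist item names to points given by a grader.
    """
    items = criteria["checklist"]
    total = sum(item["max_score"] for item in items)
    if total == 0:
        return 0.0
    earned = 0
    for item in items:
        points = awarded.get(item["name"], 0)
        # Clamp each award to the range [0, max_score] for that item.
        earned += max(0, min(points, item["max_score"]))
    return earned / total


criteria = json.loads(CRITERIA_JSON)

# Hypothetical grader output: full marks on the first item, half on the second.
example_awards = {
    "Points to activation eval as the fast check": 50,
    "Suggests before/after comparison": 25,
}

print(score_checklist(criteria, example_awards))  # 0.75
```

Since the scenario's `context` notes that the activation eval ignores these criteria, a calculation like this would only matter for a later content eval.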