Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.
86
91%
Does it follow best practices?
Impact
86%
1.22x
Average score across 29 eval scenarios
Advisory
Suggest reviewing before use
{
  "context": "Activation-focused scenario: tests whether a routing-diagnosis user request activates a skill in this tile (expected: optimize-skill-performance Phase 0.4 or optimize-skill-instructions). Criteria below are placeholders for future content-eval use; activation eval ignores them.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Distinguishes routing gap from out-of-scope",
      "description": "Response separates the `—` rows into routing gaps (skill should have fired) versus tasks legitimately out of scope for the tile.",
      "max_score": 50
    },
    {
      "name": "Addresses never-fired skills",
      "description": "Response identifies which skills never fired across any scenario and proposes a concrete next action (rewrite description, add scenario, accept).",
      "max_score": 50
    }
  ]
}
evals
scenario-1
scenario-2
scenario-3
scenario-4
scenario-5
scenario-6
scenario-7
scenario-8
scenario-9
scenario-10
scenario-11
scenario-12
scenario-13
scenario-14
scenario-15
scenario-16
scenario-17
scenario-18
scenario-19
scenario-20
scenario-21
scenario-22
scenario-23
scenario-24
scenario-25
scenario-26
scenario-27
scenario-28
scenario-29
skills
compare-skill-model-performance
optimize-skill-instructions
references
optimize-skill-performance
optimize-skill-performance-and-instructions
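
The weighted_checklist criteria file shown earlier assigns each item a max_score. As a rough illustration of how such a checklist could be turned into a single number, here is a minimal sketch in Python. It assumes the score is simply the points earned divided by the sum of max_score values; the function name score_checklist and the awarded mapping are hypothetical and are not taken from the tile or from any tool's actual scoring logic.

```python
import json

# Trimmed copy of the criteria file: only the fields the sketch needs.
CRITERIA = json.loads("""
{
  "type": "weighted_checklist",
  "checklist": [
    {"name": "Distinguishes routing gap from out-of-scope", "max_score": 50},
    {"name": "Addresses never-fired skills", "max_score": 50}
  ]
}
""")


def score_checklist(criteria, awarded):
    """Return the fraction of available points earned, from 0.0 to 1.0.

    `awarded` maps a checklist item name to the points a grader gave it.
    This normalization is an assumption made for illustration only.
    """
    items = criteria["checklist"]
    total = sum(item["max_score"] for item in items)
    if total == 0:
        raise ValueError("checklist has no points to award")

    earned = 0
    for item in items:
        points = awarded.get(item["name"], 0)  # missing items score zero
        # Clamp each item to the range [0, max_score].
        earned += min(max(points, 0), item["max_score"])
    return earned / total


if __name__ == "__main__":
    example = {
        "Distinguishes routing gap from out-of-scope": 50,
        "Addresses never-fired skills": 25,
    }
    print(score_checklist(CRITERIA, example))
```

Clamping each item keeps a single over-scored criterion from pushing the total past 1.0, and treating a missing item as zero makes an incomplete grading pass visible in the result.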