Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.
86
91%
Does it follow best practices?
Impact
86%
1.22x average score across 29 eval scenarios
Advisory
Suggest reviewing before use
{
  "context": "Activation-focused scenario: tests whether a multi-skill collision-diagnosis request activates a skill in this tile (expected: optimize-skill-performance Phase 0.4 or optimize-skill-instructions). Criteria below are placeholders for future content-eval use; activation eval ignores them.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Uses activation eval to surface collisions",
      "description": "Response identifies activation eval as the tool for surfacing per-scenario routing and uses its output to identify competing skill descriptions.",
      "max_score": 50
    },
    {
      "name": "Proposes description disambiguation",
      "description": "Response proposes concrete edits to separate trigger terms between competing skills (e.g., distinct verbs, distinct domain terms).",
      "max_score": 50
    }
  ]
}
evals
scenario-1
scenario-2
scenario-3
scenario-4
scenario-5
scenario-6
scenario-7
scenario-8
scenario-9
scenario-10
scenario-11
scenario-12
scenario-13
scenario-14
scenario-15
scenario-16
scenario-17
scenario-18
scenario-19
scenario-20
scenario-21
scenario-22
scenario-23
scenario-24
scenario-25
scenario-26
scenario-27
scenario-28
scenario-29
skills
compare-skill-model-performance
optimize-skill-instructions
references
optimize-skill-performance
optimize-skill-performance-and-instructions
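
The `weighted_checklist` criteria shown earlier can be read as a simple scoring rubric. The sketch below is only an illustration of one plausible reading, not the tool's actual scoring code: it assumes each criterion is awarded between 0 and its `max_score` points, and that the scenario score is the awarded total as a percentage of the possible total. The function name `score_checklist` and the `awarded` mapping are hypothetical.

```python
import json

# Trimmed copy of the checklist shape shown earlier (descriptions omitted).
CRITERIA = json.loads("""
{
  "type": "weighted_checklist",
  "checklist": [
    {"name": "Uses activation eval to surface collisions", "max_score": 50},
    {"name": "Proposes description disambiguation", "max_score": 50}
  ]
}
""")


def score_checklist(criteria, awarded):
    """Return the awarded points as a percentage of the possible points.

    `awarded` maps criterion name -> points given by a grader (hypothetical
    input; the real grader output format is not shown in the source).
    """
    total = 0
    possible = 0
    for item in criteria["checklist"]:
        max_score = item["max_score"]
        # Missing criteria count as 0; clamp to the valid range [0, max_score].
        points = max(0, min(awarded.get(item["name"], 0), max_score))
        total += points
        possible += max_score
    return 100.0 * total / possible if possible else 0.0


example = score_checklist(
    CRITERIA,
    {
        "Uses activation eval to surface collisions": 50,
        "Proposes description disambiguation": 25,
    },
)
```

Under these assumptions, full marks on the first criterion and half marks on the second give a scenario score of 75%. Per the `context` field, an activation eval ignores these criteria, so this reading only applies to a content eval.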