Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.
86
91%
Does it follow best practices?
Impact
86%
1.22x
Average score across 29 eval scenarios
Advisory
Suggest reviewing before use
My tile has 5 skills and I think they're stepping on each other. When I ask a question that should clearly go to skill A, sometimes skill B activates instead. Other times nothing fires at all. The descriptions might be competing for similar trigger terms.
How do I diagnose which descriptions are colliding, and how do I tell which skill is "winning" routing for any given user phrasing?
Walk me through how to diagnose multi-skill routing collisions in a tile. Include what command produces the data I need, how to read the output to spot collisions, and what to change in the descriptions to separate competing skills.
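The hypothesis in this scenario, that descriptions compete for similar trigger terms, can be checked offline with a rough lexical pass before running any evals. The sketch below is an illustration only: it is not the tile's command and it does not reproduce how an agent actually routes requests. The skill names, the description texts, the stopword list, and the 0.3 threshold are all hypothetical. It flags description pairs with high term overlap and ranks skills by how many terms of a user phrasing each description shares.

```python
import re
from itertools import combinations

# Hypothetical stopword list; tune for your own descriptions.
STOPWORDS = {"a", "an", "the", "and", "or", "to", "for", "of", "in", "on",
             "with", "when", "use", "this", "that", "is", "it", "your"}


def terms(text):
    """Return the set of lowercase words in text, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS}


def collisions(descriptions, threshold=0.3):
    """List (skill_a, skill_b, jaccard, shared_terms) for pairs whose
    description term sets overlap at or above the threshold."""
    out = []
    for (a, da), (b, db) in combinations(sorted(descriptions.items()), 2):
        ta, tb = terms(da), terms(db)
        union = ta | tb
        score = len(ta & tb) / len(union) if union else 0.0
        if score >= threshold:
            out.append((a, b, round(score, 2), sorted(ta & tb)))
    return sorted(out, key=lambda row: -row[2])


def rank_for_phrase(descriptions, phrase):
    """Rank skills by how many terms of the phrase each description shares.
    A tie at the top suggests the phrasing is ambiguous between skills."""
    p = terms(phrase)
    scores = [(name, len(p & terms(desc)))
              for name, desc in descriptions.items()]
    return sorted(scores, key=lambda row: (-row[1], row[0]))


# Hypothetical descriptions, for illustration only.
descriptions = {
    "skill-a": "Run evals and compare eval scores across models",
    "skill-b": "Run evals and diagnose gaps in eval scores",
    "skill-c": "Review SKILL.md quality against best practices",
}

if __name__ == "__main__":
    for row in collisions(descriptions):
        print("collision:", row)
    print(rank_for_phrase(descriptions, "compare scores across models"))
    print(rank_for_phrase(descriptions, "run evals"))
```

Shared terms reported for a colliding pair are candidates to remove from one description or to replace with wording specific to that skill. A lexical score like this is only a proxy, so any change should still be confirmed by re-running the eval scenarios.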
evals
scenario-1
scenario-2
scenario-3
scenario-4
scenario-5
scenario-6
scenario-7
scenario-8
scenario-9
scenario-10
scenario-11
scenario-12
scenario-13
scenario-14
scenario-15
scenario-16
scenario-17
scenario-18
scenario-19
scenario-20
scenario-21
scenario-22
scenario-23
scenario-24
scenario-25
scenario-26
scenario-27
scenario-28
scenario-29
skills
compare-skill-model-performance
optimize-skill-instructions
references
optimize-skill-performance
optimize-skill-performance-and-instructions