Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.
91%
Does it follow best practices?
Impact
92%
1.10x average score across 25 eval scenarios
Passed
No known issues
A developer at Ironwood Systems has been iteratively improving the database-migrator tile based on eval feedback. Over the last few hours, they've made three separate edits to the tile's SKILL.md: adding a detailed rollback procedure with step-by-step instructions, expanding the error handling section with a comprehensive list of failure modes and recovery steps, and adding a rules section containing 40 lines of migration pre-checks.
The developer is concerned that these additions, while accurate, may have significantly increased the number of tokens loaded into every agent context that uses this tile — even for simple tasks that don't need the rollback details or the full pre-check list. They want a script that validates the tile after each of these three change sets and flags whether the token footprint has grown to a point where restructuring is needed.
The tile is at ./database-migrator/.
Produce two files:
validate-tile.sh — a shell script that runs tile validation after each of the three change sets. The script should:
validation-notes.md — a short explanation of:
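The token-footprint concern described above can be sketched as a small check. This is a rough illustration under stated assumptions, not the tile tooling's own validation: the location of SKILL.md directly under ./database-migrator/, the 4-bytes-per-token estimate, the 2000-token budget, and the function names `estimate_tokens` and `check_footprint` are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: the 4-bytes-per-token ratio and the 2000-token budget are
# illustrative assumptions, not values defined by any tile tooling.

# Assumed location of the skill file inside the tile; adjust as needed.
SKILL_FILE="${SKILL_FILE:-./database-migrator/SKILL.md}"
TOKEN_BUDGET="${TOKEN_BUDGET:-2000}"   # hypothetical threshold

# Rough token estimate: byte count divided by 4 (integer division).
estimate_tokens() {
  local bytes
  bytes=$(wc -c < "$1")
  echo $(( bytes / 4 ))
}

# Print OK or RESTRUCTURE for a file against a budget; return 1 when over.
check_footprint() {
  local file="$1" budget="$2" tokens
  tokens=$(estimate_tokens "$file")
  if [ "$tokens" -gt "$budget" ]; then
    echo "RESTRUCTURE: $file ~${tokens} tokens (budget ${budget})"
    return 1
  fi
  echo "OK: $file ~${tokens} tokens (budget ${budget})"
}

# Run the check only when the tile's SKILL.md is present.
if [ -f "$SKILL_FILE" ]; then
  check_footprint "$SKILL_FILE" "$TOKEN_BUDGET"
fi
```

Running this once after each of the three change sets and recording the estimate gives a simple growth trend; a real tokenizer would give more accurate counts than the byte heuristic.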