Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.
Score: 88
Does it follow best practices? 94%
Impact: 88% (1.07x), average score across 24 eval scenarios
Passed: no known issues
Your team's code review tile has been performing well for months. Last week, a teammate added some "helpful flexibility" to the tile based on developer feedback — they wanted agents to be less rigid about certain steps. Shortly after the update was committed, the eval score for the "pre-review checklist" scenario dropped from 8/10 to 3/10. The teammate's changes seemed reasonable in isolation, but something is clearly wrong.
Your job is to figure out what in the recent tile changes is causing agents to underperform on this scenario, and propose a targeted fix. The failing eval criterion checks that agents "always run the full test suite before submitting code for review."
Produce a file called regression_analysis.md that:
Then apply the fix directly to inputs/SKILL.md.
The following files are provided as inputs. Extract them before beginning.
Before submitting any code for review, complete the following steps in order:
lint --fix if available)
For straightforward changes, you may skip the linter step if the CI pipeline runs it automatically. If the changes are documentation-only or clearly trivial (typo fixes, comment updates), you may skip the test run at your discretion; the reviewer can always request tests if they feel it's needed.
Address all reviewer comments before merging. For each comment:
Do not merge until all conversations are resolved.
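One way to start the diagnosis is to scan the skill text for permissive phrasing that conflicts with an "always" criterion. The sketch below is only an illustration under stated assumptions: the phrase list and the function name `find_permissive_lines` are hypothetical and not part of the tile or any eval tooling, and a phrase match is a hint to inspect, not proof of a regression.

```python
import re

# Hypothetical phrase list (an assumption, not taken from the tile);
# tune it to the wording used in your own skills.
PERMISSIVE_PATTERNS = [
    r"\bmay skip\b",
    r"\bat your discretion\b",
    r"\bif available\b",
]


def find_permissive_lines(text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, matched_phrase, stripped_line) for each permissive phrase."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for pattern in PERMISSIVE_PATTERNS:
            for match in re.finditer(pattern, line, flags=re.IGNORECASE):
                hits.append((number, match.group(0), line.strip()))
    return hits


if __name__ == "__main__":
    # Sample text quoted from the skill excerpt above.
    sample = (
        "Before submitting any code for review, complete the following steps in order:\n"
        "If the changes are documentation-only or clearly trivial, "
        "you may skip the test run at your discretion.\n"
    )
    for number, phrase, _ in find_permissive_lines(sample):
        print(f"line {number}: '{phrase}'")
```

Any flagged line that touches the test run is a candidate cause, since an instruction that lets agents skip tests "at your discretion" directly contradicts a criterion requiring that they always run the full suite.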