Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.
91%
Does it follow best practices?
Impact
92%
1.10x
Average score across 25 eval scenarios
Passed
No known issues
The shopify-connector tile already has three evaluation scenarios that the team has been using for several months. These scenarios cover order processing, webhook handling, and authentication flows. The team recently ran a new scenario generation job (run ID: scen-gen-7742) that produced two additional scenarios covering edge cases around rate limiting and bulk import operations. They want to add these new scenarios alongside the existing ones.
A previous intern accidentally ran the download command without specifying any strategy flag and overwrote all the existing scenarios. The team wants to make sure this doesn't happen again — they need a download command that explicitly adds new scenarios without touching the existing three.
After downloading, they want to see a verified list of what's in the evals folder to confirm both old and new scenarios are present.
The tile is located at ./shopify-connector/. The existing scenarios are:
evals/order-processing/
evals/webhook-handling/
evals/auth-flow/
Produce a shell script called download-scenarios.sh that:
Downloads the scenarios from run scen-gen-7742 into the tile's evals directory without removing existing scenarios
The script should be safe to re-run if something goes wrong mid-download.
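One way to meet these requirements is sketched below. The source does not name the download CLI or its strategy flag, so this sketch does not guess at either. Instead it assumes a hypothetical hook, FETCH_CMD: an executable that accepts a run ID and a destination directory and invokes the real download command with its explicit non-destructive strategy. The helper names (fetch_scenarios, merge_scenarios, verify_existing, list_scenarios) are also illustrative. As a second line of defence, the download lands in a staging directory and only scenario folders that are not already present are copied into evals/, so the three existing scenarios are never touched and a re-run simply skips anything already added.

```shell
#!/usr/bin/env bash
# download-scenarios.sh: sketch of a non-destructive, re-runnable scenario download.
set -euo pipefail

TILE_DIR="${TILE_DIR:-./shopify-connector}"
RUN_ID="${RUN_ID:-scen-gen-7742}"
EVALS_DIR="$TILE_DIR/evals"
EXISTING=(order-processing webhook-handling auth-flow)
STAGING=""

# ASSUMPTION: FETCH_CMD is a hypothetical hook, not part of any real CLI.
# It must accept <run-id> <dest-dir> and call the real download command
# with its explicit "add, do not overwrite" strategy.
fetch_scenarios() { "$FETCH_CMD" "$1" "$2"; }

# Copy each staged scenario directory into evals/ unless one with that name exists.
merge_scenarios() {
  local staging="$1" dest="$2" src name tmp
  mkdir -p "$dest"
  for src in "$staging"/*/; do
    [ -d "$src" ] || continue
    src="${src%/}"
    name="$(basename "$src")"
    if [ -e "$dest/$name" ]; then
      echo "skip (already present): $name"
      continue
    fi
    # Copy under a hidden temporary name, then rename, so an interrupted
    # copy never leaves a half-written scenario under its final name.
    tmp="$dest/.$name.partial"
    rm -rf "$tmp"
    cp -R "$src" "$tmp"
    mv "$tmp" "$dest/$name"
    echo "added: $name"
  done
}

# Return non-zero if any of the three original scenarios is missing.
verify_existing() {
  local dest="$1" name missing=0
  for name in "${EXISTING[@]}"; do
    if [ ! -d "$dest/$name" ]; then
      echo "MISSING existing scenario: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Print every scenario directory currently in evals/.
list_scenarios() {
  local dest="$1" d
  for d in "$dest"/*/; do
    [ -d "$d" ] || continue
    echo "evals/$(basename "$d")/"
  done
}

main() {
  STAGING="$(mktemp -d)"
  trap 'rm -rf "$STAGING"' EXIT
  verify_existing "$EVALS_DIR"            # stop early if the baseline is already damaged
  fetch_scenarios "$RUN_ID" "$STAGING"    # download into staging, never into evals/ directly
  merge_scenarios "$STAGING" "$EVALS_DIR"
  verify_existing "$EVALS_DIR"            # confirm the originals survived
  list_scenarios "$EVALS_DIR"             # verified listing of old and new scenarios
}

if [ -n "${FETCH_CMD:-}" ]; then
  main
else
  echo "Set FETCH_CMD to a wrapper around the real download command." >&2
fi
```

If the fetch step fails, the script exits before anything in evals/ changes, and the staging directory is cleaned up on exit. Running it again repeats the download and skips scenarios that were already merged.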