Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.
Score: 88. Best practices: 94%. Impact: 88% (1.07x average score across 24 eval scenarios). Checks: Passed, no known issues.
A developer has spent an afternoon improving their cache-manager skill file based on earlier recommendations. They now have two review outputs — one from before the changes and one from after — and want to document the improvement for their team. This report will be shared in a pull request description to explain why the skill changes are worth merging.
Given the two review outputs below, produce a clear score comparison report that shows what improved and by how much, and explains the significance of each dimension's change. The report should be suitable for developers reviewing the skill changes.
Produce a file score_report.md containing:
The following files are provided as inputs. Extract them before beginning.
=============== FILE: before_review.txt ===============
=== Skill Review: cache-manager ===
Overall Score: 58%

Dimension Scores:
Completeness: 1/3 (33%) - Description missing "Use when" trigger clause
Actionability: 2/3 (66%) - Has some code examples, but they show pseudocode not real commands
Conciseness: 2/3 (66%) - Moderate; some known concepts explained
Robustness: 2/3 (66%) - Missing eviction strategy examples

Validation Issues:
[WARNING] Description could be more specific about trigger conditions
[WARNING] Code examples use placeholder values that may confuse
=============== END FILE ===============
=============== FILE: after_review.txt ===============
=== Skill Review: cache-manager ===
Overall Score: 89%

Dimension Scores:
Completeness: 3/3 (100%) - Clear "Use when" clause with specific trigger conditions
Actionability: 3/3 (100%) - Real executable commands with actual flag values
Conciseness: 2/3 (66%) - Still some minor verbosity
Robustness: 3/3 (100%) - Added eviction strategy and TTL configuration examples

Validation Issues:
(none)
=============== END FILE ===============
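One way to get the "what improved, by how much" numbers is to parse both reviews and diff them per dimension. Below is a minimal sketch, assuming each dimension appears on its own line as `Name: earned/max (pct%) - note`, as in the two sample files. The helper names `parse_review` and `compare` and the `BEFORE`/`AFTER` constants are hypothetical illustrations, not part of any existing review tooling.

```python
import re

# Assumed line format, inferred from the sample reviews:
#   "Completeness: 1/3 (33%) - Description missing ..."
DIMENSION_RE = re.compile(r"^(\w+): (\d+)/(\d+) \((\d+)%\) - (.*)$", re.MULTILINE)
OVERALL_RE = re.compile(r"Overall Score: (\d+)%")

# Hypothetical sample inputs, abridged from before_review.txt / after_review.txt.
BEFORE = """=== Skill Review: cache-manager ===
Overall Score: 58%
Dimension Scores:
Completeness: 1/3 (33%) - Description missing "Use when" trigger clause
Actionability: 2/3 (66%) - Has some code examples, but they show pseudocode not real commands
Conciseness: 2/3 (66%) - Moderate; some known concepts explained
Robustness: 2/3 (66%) - Missing eviction strategy examples
"""

AFTER = """=== Skill Review: cache-manager ===
Overall Score: 89%
Dimension Scores:
Completeness: 3/3 (100%) - Clear "Use when" clause with specific trigger conditions
Actionability: 3/3 (100%) - Real executable commands with actual flag values
Conciseness: 2/3 (66%) - Still some minor verbosity
Robustness: 3/3 (100%) - Added eviction strategy and TTL configuration examples
"""


def parse_review(text):
    """Return (overall_pct, {dimension: (earned, maximum, note)})."""
    overall = int(OVERALL_RE.search(text).group(1))
    dims = {
        name: (int(earned), int(maximum), note.strip())
        for name, earned, maximum, _pct, note in DIMENSION_RE.findall(text)
    }
    return overall, dims


def compare(before_text, after_text):
    """Build a Markdown summary of overall and per-dimension score changes."""
    before_overall, before = parse_review(before_text)
    after_overall, after = parse_review(after_text)
    lines = [
        f"Overall: {before_overall}% -> {after_overall}% "
        f"({after_overall - before_overall:+d} points)",
        "",
        "| Dimension | Before | After | Change | After-review note |",
        "|---|---|---|---|---|",
    ]
    # Dicts keep insertion order, so rows follow the order in the "before" review.
    for name, (b_earned, b_max, _b_note) in before.items():
        a_earned, a_max, a_note = after[name]
        lines.append(
            f"| {name} | {b_earned}/{b_max} | {a_earned}/{a_max} "
            f"| {a_earned - b_earned:+d} | {a_note} |"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(compare(BEFORE, AFTER))
```

The sketch reports the overall change in percentage points (+31) rather than as a relative increase (31/58, about 53%). This avoids ambiguity when the numbers are quoted in a pull request description. The explanation of why each dimension moved still has to be written by hand from the review notes.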