Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.
88
94%
Does it follow best practices?
Impact
88%
1.07x
Average score across 24 eval scenarios
Passed
No known issues
{
  "context": "Tests whether the agent recommends and applies progressive disclosure principles: identifying content from SKILL.md that duplicates detail already in REFERENCE.md and recommending linking instead of inlining. The revised SKILL.md should be significantly shorter by leveraging the existing reference file.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Linking over inlining",
      "description": "Recommendations explicitly state that content should link to REFERENCE.md rather than being inlined in SKILL.md",
      "max_score": 15
    },
    {
      "name": "Reference file identified",
      "description": "Recommendations specifically identify REFERENCE.md (by name) as the file to link to for detailed content",
      "max_score": 10
    },
    {
      "name": "Severity mappings removed",
      "description": "Revised SKILL.md removes inline severity color/urgency mapping tables (they exist in REFERENCE.md) and links there instead",
      "max_score": 10
    },
    {
      "name": "Flag tables removed",
      "description": "Revised SKILL.md removes detailed flag reference tables (these are in REFERENCE.md), such as the Slack/Email/PagerDuty flag lists",
      "max_score": 10
    },
    {
      "name": "Template list removed",
      "description": "Revised SKILL.md removes the inline list of available email templates (already in REFERENCE.md) and links there instead",
      "max_score": 8
    },
    {
      "name": "SKILL.md substantially shorter",
      "description": "Output SKILL.md is at least 40% shorter in line count than the input SKILL.md",
      "max_score": 12
    },
    {
      "name": "Core examples preserved",
      "description": "Output SKILL.md retains the main quickstart command examples for each channel (the bash code blocks showing basic usage)",
      "max_score": 10
    },
    {
      "name": "Before/after shown",
      "description": "recommendations.md includes before/after text for at least two of the proposed changes",
      "max_score": 10
    },
    {
      "name": "WHY explained",
      "description": "recommendations.md explains why progressive disclosure (linking vs inlining) improves the skill, not just what to change",
      "max_score": 10
    },
    {
      "name": "REFERENCE.md not modified",
      "description": "REFERENCE.md content is not changed or recreated; only SKILL.md is produced as modified output",
      "max_score": 5
    }
  ]
}
}
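
The weighted checklist above can be read as a simple points table: each criterion carries a `max_score`, and the ten maxima sum to 100. The sketch below is a minimal illustration of how such a rubric might be totalled and how the "at least 40% shorter in line count" criterion could be checked. It is not the evaluator's actual implementation: the helper names (`total_max_score`, `weighted_percentage`, `line_reduction`, `is_substantially_shorter`), the clamping of awarded points, and the treatment of unscored criteria as zero are all assumptions made for illustration.

```python
import json

# Abbreviated copy of the rubric above: names and max_score values only.
RUBRIC_JSON = """
{
  "type": "weighted_checklist",
  "checklist": [
    {"name": "Linking over inlining", "max_score": 15},
    {"name": "Reference file identified", "max_score": 10},
    {"name": "Severity mappings removed", "max_score": 10},
    {"name": "Flag tables removed", "max_score": 10},
    {"name": "Template list removed", "max_score": 8},
    {"name": "SKILL.md substantially shorter", "max_score": 12},
    {"name": "Core examples preserved", "max_score": 10},
    {"name": "Before/after shown", "max_score": 10},
    {"name": "WHY explained", "max_score": 10},
    {"name": "REFERENCE.md not modified", "max_score": 5}
  ]
}
"""


def total_max_score(rubric: dict) -> int:
    """Sum the max_score of every checklist item."""
    return sum(item["max_score"] for item in rubric["checklist"])


def weighted_percentage(rubric: dict, awarded: dict) -> float:
    """Hypothetical scoring rule: earned points over available points.

    `awarded` maps criterion names to points. Missing criteria count as 0,
    and each value is clamped to the range [0, max_score] (an assumption).
    """
    earned = 0
    for item in rubric["checklist"]:
        points = awarded.get(item["name"], 0)
        earned += max(0, min(points, item["max_score"]))
    return 100.0 * earned / total_max_score(rubric)


def line_reduction(before: str, after: str) -> float:
    """Fraction by which the line count shrank from `before` to `after`."""
    before_lines = len(before.splitlines())
    after_lines = len(after.splitlines())
    if before_lines == 0:
        raise ValueError("input document has no lines")
    return (before_lines - after_lines) / before_lines


def is_substantially_shorter(before: str, after: str, threshold: float = 0.40) -> bool:
    """Check the 'at least 40% shorter in line count' criterion."""
    return line_reduction(before, after) >= threshold


if __name__ == "__main__":
    rubric = json.loads(RUBRIC_JSON)
    print(total_max_score(rubric))

    # Full marks everywhere except half credit on "Before/after shown".
    awarded = {item["name"]: item["max_score"] for item in rubric["checklist"]}
    awarded["Before/after shown"] = 5
    print(weighted_percentage(rubric, awarded))

    original = "\n".join(f"line {i}" for i in range(10))
    revised = "\n".join(f"line {i}" for i in range(5))
    print(is_substantially_shorter(original, revised))
```

Comparing line counts rather than characters or tokens follows the wording of the "SKILL.md substantially shorter" criterion; a real grader may normalise blank lines or trailing whitespace differently.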