Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.
91%
Does it follow best practices?
Impact
92%
1.10x Average score across 25 eval scenarios
Passed
No known issues
{
"context": "Tests whether the agent correctly analyzes activation eval results: produces a skill coverage summary identifying skills that never fired, applies zero-activation analysis by cross-referencing scored eval data to distinguish routing gaps from out-of-scope tasks, and auto-suggests minimal description rewrites for confirmed routing gaps.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Routing table present",
"description": "Report includes a table or list showing which skill fired for each of the six scenarios (three with activations, three with no activation)",
"max_score": 8
},
{
"name": "Skill coverage summary correct",
"description": "Report includes a skill coverage summary showing all three skills (markdown-formatter, citation-generator, link-checker) each fired in at least one scenario — the report does NOT claim any skill never fired, since each appears in the activated skills data at least once",
"max_score": 10
},
{
"name": "rewrite-intro out-of-scope determination",
"description": "The zero-activation analysis for 'rewrite-intro-paragraph' concludes it is out of scope (not a routing gap), supported by the high baseline score of 88% showing the agent handles this task well without the skill",
"max_score": 12
},
{
"name": "generate-bibliography routing gap determination",
"description": "The zero-activation analysis for 'generate-bibliography' concludes it IS a routing gap — the agent struggles (31% baseline) and the skill isn't firing despite the task being within the tile's domain (bibliography/citations)",
"max_score": 12
},
{
"name": "fix-heading-hierarchy routing gap determination",
"description": "The zero-activation analysis for 'fix-heading-hierarchy' concludes it IS a routing gap — the agent benefits from the skill (22% baseline → 67% with context, +45 delta) but the skill is not being activated",
"max_score": 12
},
{
"name": "citation-generator description rewrite",
"description": "A proposed description rewrite for citation-generator is present that adds terminology covering bibliography or IEEE format (or reference lists) — expanding beyond APA, MLA, Chicago to cover the missed 'generate-bibliography' scenario",
"max_score": 12
},
{
"name": "markdown-formatter description rewrite",
"description": "A proposed description rewrite for markdown-formatter is present that adds coverage for heading hierarchy correction, heading levels, or heading nesting — expanding beyond table alignment and code blocks",
"max_score": 12
},
{
"name": "Minimal rewrite principle",
"description": "The proposed description rewrites add the missing trigger phrasing without completely replacing the existing description — the core original description language is preserved",
"max_score": 8
},
{
"name": "Rewrites presented together",
"description": "All proposed description changes are presented together in a summary section (not scattered throughout the document), making it easy to review and approve them as a batch",
"max_score": 8
},
{
"name": "Scored eval data cited",
"description": "The zero-activation analysis references specific numbers from the scored eval data (baseline scores and/or deltas) to support the routing gap vs out-of-scope determination",
"max_score": 6
}
]
}
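The checklist above asks the report to separate routing gaps from out-of-scope tasks using the scored eval data. A minimal sketch of that decision follows, in Python. The thresholds (`HIGH_BASELINE`, `MIN_USEFUL_DELTA`), the `ScenarioResult` type, and the `in_domain` flag are all hypothetical names introduced for illustration. They are not part of the tile or its eval format, and the cut-off values are assumptions picked only so that the three scenarios named in the checklist land where the checklist expects.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds (assumptions, not taken from the tile).
HIGH_BASELINE = 80      # percent: agent already handles the task without the skill
MIN_USEFUL_DELTA = 20   # percentage points: skill context clearly helps


@dataclass
class ScenarioResult:
    """One zero-activation scenario with its scored eval data (illustrative type)."""
    name: str
    baseline: int                       # score without the skill, in percent
    with_context: Optional[int] = None  # score with skill context, if measured
    in_domain: bool = True              # does the task fall within the tile's domain?


def classify_zero_activation(r: ScenarioResult) -> str:
    """Label a scenario in which no skill fired."""
    # A high baseline means the agent does fine alone: not a routing problem.
    if r.baseline >= HIGH_BASELINE:
        return "out-of-scope"
    # A large measured gain with context means the skill should have fired.
    if r.with_context is not None and r.with_context - r.baseline >= MIN_USEFUL_DELTA:
        return "routing-gap"
    # A low baseline on an in-domain task is also treated as a routing gap.
    return "routing-gap" if r.in_domain else "out-of-scope"


# The three zero-activation scenarios, with the numbers cited in the checklist.
SCENARIOS = [
    ScenarioResult("rewrite-intro-paragraph", baseline=88),
    ScenarioResult("generate-bibliography", baseline=31),
    ScenarioResult("fix-heading-hierarchy", baseline=22, with_context=67),
]

results = {r.name: classify_zero_activation(r) for r in SCENARIOS}

if __name__ == "__main__":
    for name, label in results.items():
        print(f"{name}: {label}")
```

Checking the baseline first keeps a task the agent already handles well from being flagged just because no skill fired. The delta check then captures cases like 'fix-heading-hierarchy', where the skill demonstrably helps but is never activated.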