Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.
91%
Does it follow best practices?
Impact
92%
1.10x average score across 25 eval scenarios
Passed
No known issues
{
  "context": "Tests whether the agent runs tessl tile lint after applying fixes, checks for token cost increases, and recommends moving heavy content to docs (on-demand) rather than rules (always loaded) when front-loaded tokens balloon. This covers the Phase 3.3 lint-after-each-fix instruction from optimize-skill-performance.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "tessl tile lint command used",
      "description": "Script uses tessl tile lint as the command for tile validation (not tessl lint tile, tessl validate, or other variants)",
      "max_score": 22
    },
    {
      "name": "Tile path argument provided",
      "description": "Script passes the tile path (./database-migrator/ or equivalent) as an argument to tessl tile lint",
      "max_score": 10
    },
    {
      "name": "Lint run after each change set",
      "description": "Script runs tessl tile lint at least twice, once after each distinct set of changes (i.e., there is more than one lint invocation, not a single final lint)",
      "max_score": 15
    },
    {
      "name": "Token cost ballooning flagged",
      "description": "Script or notes include logic or commentary that flags when front-loaded token costs increase significantly, not just that lint passes or fails",
      "max_score": 14
    },
    {
      "name": "Move to docs recommended",
      "description": "validation-notes.md recommends moving heavy content to docs (or equivalent on-demand storage) when front-loaded tokens increase significantly",
      "max_score": 20
    },
    {
      "name": "Docs vs rules distinction",
      "description": "validation-notes.md explicitly distinguishes between docs (loaded on-demand) and rules (always loaded), and states that heavy content should go to docs, not rules",
      "max_score": 14
    },
    {
      "name": "Does NOT recommend rules for heavy content",
      "description": "Script and notes do NOT recommend keeping large additions in rules or SKILL.md inline when token costs have ballooned",
      "max_score": 5
    }
  ]
}
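To make the weighting in the checklist above concrete, here is a minimal Python sketch of how a weighted_checklist could be turned into a single scenario score. The aggregation rule (points earned divided by the sum of max_score values), the function names, and the example grades are all assumptions for illustration; the source does not state how the evaluator actually combines the items.

```python
import json

# Abbreviated copy of the checklist above: names and max_score values only.
CRITERIA_JSON = """
{
  "type": "weighted_checklist",
  "checklist": [
    {"name": "tessl tile lint command used", "max_score": 22},
    {"name": "Tile path argument provided", "max_score": 10},
    {"name": "Lint run after each change set", "max_score": 15},
    {"name": "Token cost ballooning flagged", "max_score": 14},
    {"name": "Move to docs recommended", "max_score": 20},
    {"name": "Docs vs rules distinction", "max_score": 14},
    {"name": "Does NOT recommend rules for heavy content", "max_score": 5}
  ]
}
"""


def total_weight(criteria):
    """Sum of max_score over every checklist item."""
    return sum(item["max_score"] for item in criteria["checklist"])


def scenario_score(criteria, awarded):
    """Fraction of available points earned (assumed aggregation rule).

    `awarded` maps item name -> points given by a grader. Missing items
    count as 0, and points are clamped to the range [0, max_score].
    """
    earned = 0
    for item in criteria["checklist"]:
        points = awarded.get(item["name"], 0)
        earned += max(0, min(points, item["max_score"]))
    return earned / total_weight(criteria)


criteria = json.loads(CRITERIA_JSON)

# Hypothetical grades for one run; these numbers are not from the source.
awarded = {
    "tessl tile lint command used": 22,
    "Tile path argument provided": 10,
    "Lint run after each change set": 15,
    "Token cost ballooning flagged": 7,
    "Move to docs recommended": 20,
    "Docs vs rules distinction": 0,
    "Does NOT recommend rules for heavy content": 5,
}

print(total_weight(criteria))                      # 100
print(f"{scenario_score(criteria, awarded):.0%}")  # 79%
```

Because the seven max_score values sum to 100, each weight reads directly as a percentage of the scenario score under this assumed rule: using the wrong lint command costs 22 points, while the final negative check is worth only 5.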