Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.
88
94%
Does it follow best practices?
Impact
88%
1.07x
Average score across 24 eval scenarios
Passed
No known issues
{
  "context": "Tests whether the agent can identify orphaned files in a skill bundle (files that exist but are never referenced in SKILL.md) and provide appropriate recommendations. The bundle contains 8 files: SKILL.md references MIGRATIONS.md and POOLING.md, leaving 5 orphaned files (TRANSACTIONS.md, PERFORMANCE.md, SECURITY.md should be linked; LEGACY_EXAMPLES.md and DRAFT_REPLICATION.md should be removed).",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Lists all bundle files",
      "description": "Identifies all 8 files in the bundle (SKILL.md, MIGRATIONS.md, POOLING.md, TRANSACTIONS.md, PERFORMANCE.md, SECURITY.md, LEGACY_EXAMPLES.md, DRAFT_REPLICATION.md)",
      "max_score": 10
    },
    {
      "name": "Identifies referenced files",
      "description": "Correctly identifies that MIGRATIONS.md and POOLING.md are referenced in SKILL.md",
      "max_score": 10
    },
    {
      "name": "Identifies orphaned files",
      "description": "Correctly identifies all 5 orphaned files (TRANSACTIONS.md, PERFORMANCE.md, SECURITY.md, LEGACY_EXAMPLES.md, DRAFT_REPLICATION.md) - files that exist but are never linked from SKILL.md",
      "max_score": 15
    },
    {
      "name": "TRANSACTIONS.md recommendation",
      "description": "Recommends linking TRANSACTIONS.md (valuable content about atomic operations) with clear routing signals like 'for atomic operations and rollback patterns' or similar",
      "max_score": 10
    },
    {
      "name": "PERFORMANCE.md recommendation",
      "description": "Recommends linking PERFORMANCE.md (valuable optimization content) with clear routing signals like 'for indexing strategies and batch operation optimization' or similar",
      "max_score": 10
    },
    {
      "name": "SECURITY.md recommendation",
      "description": "Recommends linking SECURITY.md (valuable security content) with clear routing signals like 'for credential management and connection security practices' or similar",
      "max_score": 10
    },
    {
      "name": "LEGACY_EXAMPLES.md recommendation",
      "description": "Recommends REMOVING LEGACY_EXAMPLES.md - it's deprecated content from v1.x that shouldn't be in the bundle anymore",
      "max_score": 10
    },
    {
      "name": "DRAFT_REPLICATION.md recommendation",
      "description": "Recommends REMOVING DRAFT_REPLICATION.md - it's unfinished/unimplemented content that shouldn't be in a published tile",
      "max_score": 10
    },
    {
      "name": "Bloat reduction framing",
      "description": "Frames orphaned files as adding bloat to the tile - mentions tile size, unnecessary files, or reducing bundle size",
      "max_score": 5
    },
    {
      "name": "Clear routing signals emphasis",
      "description": "When recommending links, emphasizes the need for clear routing signals (WHEN to open each file) rather than generic 'see FILE.md for more info' links",
      "max_score": 5
    },
    {
      "name": "Link vs remove justification",
      "description": "Provides clear reasoning for why some orphaned files should be linked (valuable content) vs removed (deprecated, unfinished, or no longer relevant)",
      "max_score": 5
    }
  ]
}
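
The orphan check this scenario describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool's actual implementation: it treats a file as "referenced" when its exact file name appears anywhere in SKILL.md, and the names `find_orphans`, `write_demo_bundle`, and the SKILL.md text are hypothetical. Only the eight file names come from the checklist above.

```python
import re
import tempfile
from pathlib import Path

# File names are taken from the scenario; the SKILL.md text is invented
# for illustration and only needs to mention the two referenced files.
DEMO_SKILL_TEXT = (
    "For schema changes, open MIGRATIONS.md.\n"
    "For connection pool tuning, open POOLING.md.\n"
)
DEMO_OTHER_FILES = [
    "MIGRATIONS.md", "POOLING.md", "TRANSACTIONS.md", "PERFORMANCE.md",
    "SECURITY.md", "LEGACY_EXAMPLES.md", "DRAFT_REPLICATION.md",
]


def write_demo_bundle(bundle_dir: Path) -> None:
    """Create a hypothetical 8-file bundle matching the scenario's file names."""
    (bundle_dir / "SKILL.md").write_text(DEMO_SKILL_TEXT, encoding="utf-8")
    for name in DEMO_OTHER_FILES:
        (bundle_dir / name).write_text("placeholder\n", encoding="utf-8")


def find_orphans(bundle_dir: Path, entry: str = "SKILL.md") -> list[str]:
    """Return names of top-level bundle files never mentioned in the entry file."""
    entry_text = (bundle_dir / entry).read_text(encoding="utf-8")
    orphans = []
    for path in sorted(bundle_dir.iterdir()):
        if not path.is_file() or path.name == entry:
            continue
        # Assumption: any exact occurrence of the file name counts as a
        # reference. The lookarounds stop a shorter name from matching
        # inside a longer one (e.g. X.md inside DRAFT_X.md).
        pattern = r"(?<![\w.-])" + re.escape(path.name) + r"(?![\w-])"
        if not re.search(pattern, entry_text):
            orphans.append(path.name)
    return orphans


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bundle = Path(tmp)
        write_demo_bundle(bundle)
        for name in find_orphans(bundle):
            print(name)
```

A script like this only finds the orphans. Deciding whether each one should be linked with a routing signal or removed still depends on reading its content, which is what the recommendation items in the checklist score.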