Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.
88
Does it follow best practices? 94%
Impact: 88% (1.07x average score across 24 eval scenarios)
Passed: no known issues
{
"context": "Tests whether the agent understands progressive disclosure as routing clarity rather than just splitting files. The SKILL.md contains 10 file references: 4 with clear routing signals (AUTHENTICATION.md, RETRIES.md, RATE_LIMITING.md, WEBHOOKS.md, ERROR_HANDLING.md) and 5 with ambiguous signals (CONFIGURATION.md, GUIDE.md, EXAMPLES.md, ADVANCED.md, REFERENCE.md). The agent should identify which references allow confident routing decisions vs force speculative opening.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Identifies good references",
"description": "Correctly identifies at least 3 of the good references (AUTHENTICATION.md with OAuth2/token details, RETRIES.md with 5xx/timeout context, RATE_LIMITING.md with token bucket/429 details, ERROR_HANDLING.md with 4xx/5xx debugging, WEBHOOKS.md with signature verification)",
"max_score": 15
},
{
"name": "Explains why good",
"description": "Explains that good references have clear WHEN signals - the link text tells the agent exactly when the content is relevant (e.g., 'when requests fail with 5xx', 'for OAuth2 flows and token refresh')",
"max_score": 10
},
{
"name": "Identifies poor references",
"description": "Correctly identifies at least 3 of the poor references (CONFIGURATION.md 'for more information', GUIDE.md 'for additional details', EXAMPLES.md 'for examples', ADVANCED.md 'for advanced features', REFERENCE.md 'for the complete API reference')",
"max_score": 15
},
{
"name": "Explains why poor",
"description": "Explains that poor references force speculative opening - the agent can't tell WHEN the content is relevant without reading it ('more information' about what? 'additional details' on which topic?)",
"max_score": 10
},
{
"name": "Token efficiency framing",
"description": "Frames the problem in terms of token efficiency or wasted context - mentions that ambiguous links cause agents to open files 'just in case' or 'defensively', defeating the purpose of splitting content",
"max_score": 10
},
{
"name": "Routing gate test",
"description": "Applies the 'can agent decide WITHOUT opening' test - explicitly asks or checks whether each link allows routing decisions based on link text alone",
"max_score": 10
},
{
"name": "Improves CONFIGURATION.md",
"description": "Provides revised link text for CONFIGURATION.md that specifies what configuration details are covered (e.g., 'for timeout settings, retry limits, and connection pool sizing')",
"max_score": 5
},
{
"name": "Improves GUIDE.md",
"description": "Provides revised link text for GUIDE.md that specifies what guidance is covered or identifies it as redundant/should be inlined",
"max_score": 5
},
{
"name": "Improves EXAMPLES.md",
"description": "Provides revised link text for EXAMPLES.md that specifies what kinds of examples (e.g., 'for complete integration examples with Django, Flask, and FastAPI')",
"max_score": 5
},
{
"name": "Improves ADVANCED.md or REFERENCE.md",
"description": "Provides revised link text for ADVANCED.md or REFERENCE.md with specific content signals, or suggests inlining if scope is too broad",
"max_score": 5
},
{
"name": "Questions blind split recommendation",
"description": "Notes that the rubric rewards progressive disclosure but questions whether splitting helps if routing is unclear - suggests that inlining might be better in some cases when link text can't provide clear signals",
"max_score": 10
}
]
}
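The checklist above is a `weighted_checklist` whose `max_score` values sum to 100. As a rough sketch of how such a rubric could be totaled, assuming the grader awards each criterion between 0 and its `max_score` and reports the plain sum as a percentage (the real grader's aggregation is not documented here), the following Python loads an abbreviated copy of the rubric and computes a score. The `score_weighted_checklist` function and the `awarded` grades are hypothetical.

```python
import json

# Abbreviated copy of the rubric above (only name and max_score are used here).
RUBRIC_JSON = """
{
  "type": "weighted_checklist",
  "checklist": [
    {"name": "Identifies good references", "max_score": 15},
    {"name": "Explains why good", "max_score": 10},
    {"name": "Identifies poor references", "max_score": 15},
    {"name": "Explains why poor", "max_score": 10},
    {"name": "Token efficiency framing", "max_score": 10},
    {"name": "Routing gate test", "max_score": 10},
    {"name": "Improves CONFIGURATION.md", "max_score": 5},
    {"name": "Improves GUIDE.md", "max_score": 5},
    {"name": "Improves EXAMPLES.md", "max_score": 5},
    {"name": "Improves ADVANCED.md or REFERENCE.md", "max_score": 5},
    {"name": "Questions blind split recommendation", "max_score": 10}
  ]
}
"""


def score_weighted_checklist(rubric: dict, awarded: dict[str, int]) -> float:
    """Return the awarded total as a percentage of the maximum possible score.

    Assumption (hypothetical grader): each criterion is scored independently,
    clamped to [0, max_score], and missing criteria count as 0.
    """
    total = 0
    maximum = 0
    for item in rubric["checklist"]:
        cap = item["max_score"]
        maximum += cap
        points = awarded.get(item["name"], 0)
        total += max(0, min(points, cap))  # clamp out-of-range grades
    return 100.0 * total / maximum


rubric = json.loads(RUBRIC_JSON)

# Hypothetical grades for one agent response.
awarded = {
    "Identifies good references": 15,
    "Explains why good": 10,
    "Identifies poor references": 15,
    "Explains why poor": 5,
    "Routing gate test": 10,
}

print(sum(item["max_score"] for item in rubric["checklist"]))  # 100
print(score_weighted_checklist(rubric, awarded))  # 55.0
```

Clamping keeps an over-generous grade on one criterion from inflating the total; whether the actual grader clamps or normalizes this way is an assumption.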
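The "routing gate" criterion asks whether an agent can decide to open a referenced file from the link text alone. A crude way to flag the most obvious failures is a denylist of vague phrasings, sketched below in Python. The phrase list, the `is_routable` function, and the file-to-text pairing are illustrative assumptions drawn from the examples quoted in this rubric; they are not part of the skill or its grader.

```python
import re

# Hypothetical denylist built from the vague link texts quoted in this scenario.
VAGUE_PHRASES = [
    r"for more information",
    r"for additional details",
    r"for (more )?examples",
    r"for advanced features",
    r"for the complete api reference",
]


def is_routable(link_text: str) -> bool:
    """Return False if the link text is one of the known vague phrasings.

    A denylist can only reject known-bad text; passing this check does not
    prove the link carries a clear WHEN signal.
    """
    # Normalize case, collapse whitespace, and drop a trailing period.
    normalized = " ".join(link_text.lower().split()).rstrip(".")
    return not any(re.fullmatch(p, normalized) for p in VAGUE_PHRASES)


# Link texts taken from the rubric's examples; the pairing with files is illustrative.
links = {
    "AUTHENTICATION.md": "for OAuth2 flows and token refresh",
    "RETRIES.md": "when requests fail with 5xx",
    "CONFIGURATION.md": "for more information",
    "GUIDE.md": "for additional details",
    "EXAMPLES.md": "for examples",
}

for name, text in links.items():
    verdict = "routable" if is_routable(text) else "ambiguous"
    print(f"{name}: {verdict}")
```

Because it is a denylist, this kind of check suits a quick lint pass; deciding whether a passing link actually tells the agent when to open the file still requires applying the routing gate test by judgment.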