Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.
88
94%
Does it follow best practices?
Impact
88%
1.07x
Average score across 24 eval scenarios
Passed
No known issues
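The eval rubric below scores whether an agent applies the Phase 4 validation checks to a SKILL.md. As a rough illustration, here is a minimal Python sketch of three of them: `ast.parse` for Python blocks, `node --check` for JavaScript blocks, and an existence check for relative markdown links. It assumes the SKILL.md uses triple-backtick fences and sits at the root of its bundle directory. The function names (`extract_code_blocks`, `check_python`, `check_javascript`, `local_link_targets`, `report`) are hypothetical and are not part of any skill or tool named on this page.

```python
from __future__ import annotations

import ast
import os
import re
import shutil
import subprocess
import tempfile
from pathlib import Path

# Fenced blocks such as ```python ... ```; the language tag is optional.
FENCE_RE = re.compile(r"```(\w*)\n(.*?)```", re.DOTALL)
# Markdown links [text](target), capturing the target without any #anchor.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s#]+)(?:#[^)]*)?\)")


def extract_code_blocks(markdown: str) -> list[tuple[str, str]]:
    """Return (language, code) pairs for every fenced block."""
    return [(m.group(1).lower(), m.group(2)) for m in FENCE_RE.finditer(markdown)]


def check_python(code: str) -> tuple[bool, str]:
    """Parse with ast.parse (nothing is executed) and report the first syntax error."""
    try:
        ast.parse(code)
        return True, "ok"
    except SyntaxError as exc:
        return False, f"line {exc.lineno}: {exc.msg}"


def check_javascript(code: str) -> tuple[bool, str] | None:
    """Run `node --check` on the snippet; return None if node is not installed."""
    node = shutil.which("node")
    if node is None:
        return None
    with tempfile.NamedTemporaryFile("w", suffix=".js", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run([node, "--check", path], capture_output=True, text=True)
    finally:
        os.unlink(path)  # always remove the temporary .js file
    return result.returncode == 0, result.stderr.strip() or "ok"


def local_link_targets(markdown: str) -> list[str]:
    """Relative link targets; external URLs are skipped."""
    return [t for t in LINK_RE.findall(markdown) if "://" not in t]


def report(skill_md: Path) -> list[str]:
    """Validate the current SKILL.md (before any edits); one ✓/✗ line per check."""
    text = skill_md.read_text(encoding="utf-8")
    lines = []
    for i, (lang, code) in enumerate(extract_code_blocks(text), start=1):
        if lang in ("python", "py"):
            ok, detail = check_python(code)
        elif lang in ("javascript", "js"):
            result = check_javascript(code)
            if result is None:
                lines.append(f"- block {i} ({lang}): skipped, node not found")
                continue
            ok, detail = result
        else:
            continue  # other languages (e.g. bash) are not syntax-checked here
        lines.append(f"{'✓' if ok else '✗'} block {i} ({lang}): {detail}")
    for target in local_link_targets(text):
        ok = (skill_md.parent / target).exists()
        status = "found" if ok else "missing from bundle"
        lines.append(f"{'✓' if ok else '✗'} link {target}: {status}")
    return lines
```

On a SKILL.md like the one the rubric describes, `report(Path("SKILL.md"))` should flag the unclosed parenthesis in the `run_ingestion` signature and the missing SETUP_GUIDE.md. Command-flag checks (e.g. comparing a snippet's flags against `pipeline validate --help`) depend on the specific CLI, so this sketch leaves them out.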
{
"context": "Tests whether the agent correctly applies the Phase 4 validation methods: checking Python syntax with ast.parse, JavaScript syntax with node --check, command-flag validity against --help output, and the existence of referenced files. The provided SKILL.md contains a Python syntax error and a broken file reference.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Python via ast.parse",
"description": "Uses Python's ast.parse() (or equivalent ast module usage) to validate the Python code block — not just visual inspection",
"max_score": 15
},
{
"name": "Python error identified",
"description": "Correctly identifies the syntax error in the Python snippet (the missing closing parenthesis in the function signature `def run_ingestion(source_url: str, dest_table: str:`)",
"max_score": 10
},
{
"name": "JavaScript via node --check",
"description": "Uses `node --check` to validate the JavaScript code block — not just visual inspection",
"max_score": 15
},
{
"name": "Command flag validation",
"description": "Attempts to validate command flags from the bash snippet (e.g. checks `pipeline validate --help` or equivalent) rather than assuming flags are correct",
"max_score": 10
},
{
"name": "File reference check",
"description": "Checks whether the files referenced in markdown links (SETUP_GUIDE.md, TROUBLESHOOT.md) exist in the skill bundle directory",
"max_score": 15
},
{
"name": "Broken reference identified",
"description": "Correctly identifies that SETUP_GUIDE.md is referenced but does not exist in the bundle",
"max_score": 10
},
{
"name": "Validation before application",
"description": "Report frames the checks as performed BEFORE any changes are applied (not after), explicitly validating the current state",
"max_score": 5
},
{
"name": "Per-check pass/fail",
"description": "Report shows clear pass/fail (or ✓/✗) for each individual check, not just a summary",
"max_score": 10
},
{
"name": "Fix suggestions",
"description": "Provides a specific suggested fix for each issue found (e.g. the corrected Python syntax, how to create the missing file)",
"max_score": 10
}
]
}
evals
scenario-1
scenario-2
scenario-3
scenario-4
scenario-5
scenario-6
scenario-7
scenario-8
scenario-9
scenario-10
scenario-11
scenario-12
scenario-13
scenario-14
scenario-15
scenario-16
scenario-17
scenario-18
scenario-19
scenario-20
scenario-21
scenario-22
scenario-23
scenario-24
skills
compare-skill-model-performance
optimize-skill-instructions
references
optimize-skill-performance
optimize-skill-performance-and-instructions