Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.
88
94%
Does it follow best practices?
Impact
88%
1.07x
Average score across 24 eval scenarios
Passed
No known issues
{
  "context": "Tests whether the agent uses the correct `tessl skill review` command in the automation script, runs it both before and after changes, structures the workflow so validation happens before changes are applied, and incorporates multiple validation methods from Phase 4.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "tessl skill review command",
      "description": "Script uses `tessl skill review <path>` (exact command name) for the evaluation step, not a generic alternative",
      "max_score": 15
    },
    {
      "name": "Review before changes",
      "description": "Script runs `tessl skill review` BEFORE making any changes to capture the baseline score",
      "max_score": 10
    },
    {
      "name": "Review after changes",
      "description": "Script runs `tessl skill review` a SECOND TIME after changes are applied to verify improvement",
      "max_score": 10
    },
    {
      "name": "Validation before apply",
      "description": "Script includes a validation step that is placed BEFORE the change/edit step in the workflow",
      "max_score": 10
    },
    {
      "name": "Python ast.parse validation",
      "description": "Script includes Python syntax validation using `ast.parse` or `python -c 'import ast; ...'`, specifically the ast module",
      "max_score": 8
    },
    {
      "name": "node --check JS validation",
      "description": "Script includes JavaScript syntax validation using `node --check <file>` command",
      "max_score": 8
    },
    {
      "name": "Command --help flag validation",
      "description": "Script validates command flags by consulting the command's `--help` output",
      "max_score": 8
    },
    {
      "name": "File reference validation",
      "description": "Script checks that file references in the SKILL.md actually exist on disk",
      "max_score": 7
    },
    {
      "name": "Before/after score output",
      "description": "Script captures or outputs both the before and after scores side-by-side, enabling comparison",
      "max_score": 8
    },
    {
      "name": "Script accepts SKILL.md path",
      "description": "Script accepts the SKILL.md path as an argument (e.g. `$1` or script parameter), not hardcoded",
      "max_score": 8
    },
    {
      "name": "Phases are ordered",
      "description": "Script structure follows: baseline review → validation → (change step) → post-change review, in that order",
      "max_score": 8
    }
  ]
}
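The checklist above describes a script with four ordered phases: baseline review, validation, change step, post-change review. The following is a minimal Python sketch of that shape, not the tile's own script. All function names are hypothetical. It assumes `tessl` and `node` are on `PATH`, that `tessl skill review` accepts the SKILL.md path as its `<path>` argument, and that file references in SKILL.md are written as markdown links. The output format of `tessl skill review` is not assumed, so the raw output is printed rather than parsed for a score.

```python
import ast
import re
import shutil
import subprocess
import sys
from pathlib import Path


def python_syntax_ok(source: str) -> bool:
    """Return True if `source` parses as Python (syntax check only; nothing is executed)."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def js_syntax_ok(path: Path) -> bool:
    """Syntax-check a JavaScript file with `node --check <file>`."""
    if shutil.which("node") is None:
        raise RuntimeError("node not found on PATH")
    result = subprocess.run(["node", "--check", str(path)], capture_output=True)
    return result.returncode == 0


def flag_in_help(command: list[str], flag: str) -> bool:
    """Naive check that `flag` appears somewhere in the command's `--help` output."""
    result = subprocess.run([*command, "--help"], capture_output=True, text=True)
    return flag in (result.stdout + result.stderr)


def missing_file_references(skill_md: Path) -> list[str]:
    """List relative link targets in SKILL.md that do not exist on disk.

    Assumption: references use markdown link syntax, [text](target).
    URLs containing "://" and pure "#anchor" links are ignored.
    """
    text = skill_md.read_text(encoding="utf-8")
    targets = re.findall(r"\[[^\]]*\]\(([^)\s#]+)", text)
    return [t for t in targets
            if "://" not in t and not (skill_md.parent / t).exists()]


def review(skill_md: Path) -> str:
    """Run `tessl skill review <path>` and return its raw output."""
    result = subprocess.run(["tessl", "skill", "review", str(skill_md)],
                            capture_output=True, text=True)
    return result.stdout + result.stderr


def main(argv: list[str]) -> int:
    # The SKILL.md path comes from the command line, not a hardcoded value.
    if len(argv) != 2:
        print(f"usage: {argv[0]} <path-to-SKILL.md>", file=sys.stderr)
        return 2
    skill_md = Path(argv[1])

    before = review(skill_md)                    # 1. baseline review
    missing = missing_file_references(skill_md)  # 2. validation, before any change
    if missing:
        print("missing file references:", missing, file=sys.stderr)
        return 1
    # 3. change step: apply the proposed edits here (omitted in this sketch)
    after = review(skill_md)                     # 4. post-change review

    # Print both reviews together so the two runs can be compared.
    print("=== BEFORE ===", before, "=== AFTER ===", after, sep="\n")
    return 0
```

To use it as a script, call `sys.exit(main(sys.argv))` at the bottom of the file. The syntax and `--help` helpers are shown separately because which code snippets and command flags to check depends on the contents of the SKILL.md being edited.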