Optimize your skills and plugins: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.
88
87%
Does it follow best practices?
Impact
89%
1.14x
Average score across 29 eval scenarios
Advisory
Suggest reviewing before use
{
"context": "Tests whether the agent correctly determines which type of eval (activation vs scored) to run first based on plugin structure and existing results, and uses the correct command to detect skill count. The plugin is multi-skill with no existing eval results — the correct strategy is to run activation first.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Skill count detection command",
"description": "Script or notes include the command: ls skills/*/SKILL.md 2>/dev/null | wc -l (or functionally equivalent using find) to count the number of skills in the plugin",
"max_score": 14
},
{
"name": "Activation eval run first",
"description": "Script runs tessl eval run with --skip-forced-context-activation --skip-scoring as the FIRST eval command (before any scored eval run), since the plugin is multi-skill with no existing results",
"max_score": 20
},
{
"name": "Scored eval follows activation",
"description": "Script or notes describe running scored evals (bare tessl eval run, without the --skip-forced-context-activation --skip-scoring flags) AFTER reviewing activation results, not before",
"max_score": 14
},
{
"name": "Routing-clean gate explained",
"description": "Notes file explains that scored evals should only be run for scenarios where routing is clean (activation succeeded), to avoid wasting scored-eval time on scenarios that route to the wrong skill",
"max_score": 12
},
{
"name": "Skip activation condition stated",
"description": "Notes file states that activation would be skipped entirely if the plugin had only a single skill (since there is no routing to test for single-skill plugins)",
"max_score": 14
},
{
"name": "Correct eval run command format",
"description": "Script uses tessl eval run <path/to/plugin> as the base command format (not tessl run eval or other variants)",
"max_score": 8
},
{
"name": "--skip-forced-context-activation --skip-scoring flags used",
"description": "Script uses the --skip-forced-context-activation --skip-scoring flags in the first eval run command (not --type=activation, --solver=activation, or other variant)",
"max_score": 10
},
{
"name": "Plugin path used consistently",
"description": "Script references ./invoice-processor/ (or a variable holding that path) rather than a hardcoded absolute path",
"max_score": 8
}
]
}
.tessl-plugin
evals
scenario-1
scenario-2
scenario-3
scenario-4
scenario-5
scenario-6
scenario-7
scenario-8
scenario-9
scenario-10
scenario-11
scenario-12
scenario-13
scenario-14
scenario-15
scenario-16
scenario-17
scenario-18
scenario-19
scenario-20
scenario-21
scenario-22
scenario-23
scenario-24
scenario-25
scenario-26
scenario-27
scenario-28
scenario-29
skills
compare-skill-model-performance
optimize-skill-instructions
references
optimize-skill-performance
optimize-skill-performance-and-instructions
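
The ordering rule in the scenario checklist above (count the skills, run activation first for a multi-skill plugin, run the scored eval directly for a single-skill plugin) can be sketched as a small shell helper. This is a minimal sketch, not the plugin's own script: the function names count_skills and first_eval_command are hypothetical, placing the flags after the plugin path is an assumption, and the helper assumes there are no existing eval results, as in the scenario. It only prints the command it would run and never invokes tessl.

```shell
# Hypothetical helper names; nothing here is part of the tessl CLI.

# Count SKILL.md files one level below <plugin>/skills/.
# Same idea as the checklist command: ls skills/*/SKILL.md 2>/dev/null | wc -l
count_skills() {
  ls "$1"/skills/*/SKILL.md 2>/dev/null | wc -l | tr -d '[:space:]'
}

# Print (do not execute) the first eval command for a plugin directory.
# Assumes no existing eval results, as in the scenario.
first_eval_command() {
  fec_dir=$1
  fec_count=$(count_skills "$fec_dir")

  if [ "$fec_count" -eq 0 ]; then
    echo "no skills found under $fec_dir/skills" >&2
    return 1
  elif [ "$fec_count" -gt 1 ]; then
    # Multi-skill plugin: routing has to be checked first, so run activation only.
    # Flag placement after the path is an assumption.
    echo "tessl eval run $fec_dir --skip-forced-context-activation --skip-scoring"
  else
    # Single skill: there is no routing to test, so go straight to the scored run.
    echo "tessl eval run $fec_dir"
  fi
}
```

Calling first_eval_command ./invoice-processor/ from the directory that contains the plugin prints the command to run first. After reviewing the activation results, the bare tessl eval run command would follow for the scenarios whose routing is clean.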