Optimize your skills and plugins: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.
Improve your skill using tessl review run plus validation and context: it reviews your full skill bundle, validates syntax, explains WHY changes help, and catches mistakes before applying.
For a hands-off, automated improve loop, use tessl review fix instead — see Fast path: automated fix loop.
tessl review flags: derive them from actual review output.
tessl review run <path-to-skill> --label "baseline"
Pass --threshold <percent> to exit non-zero below a score gate. Re-open the most recent result with tessl review view --last.
Parse output for scores, validation issues, and judge suggestions. Prioritize fixes: Critical (ERRORs) → High (missing "Use when...", low actionability/conciseness) → Medium (other dimensions) → Low (warnings)
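The prioritization above can be sketched as a simple sort. This is a minimal illustration, assuming each finding has already been parsed into a dict with hypothetical `severity` and `message` keys. The real review output format is not shown here, so the parsing step has to be adapted to what `tessl review run` actually prints.

```python
# Hypothetical severity labels, ordered as in the priority list above.
PRIORITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def prioritize(findings):
    """Return findings sorted Critical -> High -> Medium -> Low.

    Each finding is an assumed dict with 'severity' and 'message' keys.
    Unknown severities sort last; ties keep their original order.
    """
    return sorted(
        findings,
        key=lambda f: PRIORITY_ORDER.get(f["severity"].lower(), len(PRIORITY_ORDER)),
    )


# Made-up findings, for illustration only.
findings = [
    {"severity": "Low", "message": "validation warning"},
    {"severity": "Critical", "message": "validation ERROR"},
    {"severity": "Medium", "message": "other dimension below target"},
    {"severity": "High", "message": 'description is missing "Use when..."'},
]
ordered = prioritize(findings)
for finding in ordered:
    print(f'{finding["severity"]:<8} {finding["message"]}')
```

Because `sorted` is stable, findings of equal severity stay in the order the review reported them.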
Read SKILL.md and list files in its directory. Bundle = SKILL.md + sibling files + referenced files. Check for orphaned files (see Progressive Disclosure section). Use bundle context to improve progressive disclosure.
For each issue, produce a recommendation block:
## [Action verb]: [what to change]
Dimension: [name] [current]/3 → [target]/3 (+Z% overall)
Before:
> [exact current text]
After:
> [exact replacement text]
Why: [one sentence: how this improves routing/clarity/actionability]
If bundle has reference files, recommend linking instead of inlining for progressive disclosure.
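To keep recommendation blocks uniform, the template above could be filled in by a small helper. The sketch below is one possible shape, not part of the tessl tooling: the function and parameter names are hypothetical, and the example description text is invented for illustration.

```python
def quote(text):
    """Prefix every line with '> ' as in the Before/After parts of the template."""
    return "\n".join(f"> {line}" for line in text.splitlines())


def render_recommendation(verb, change, dimension, current, target, gain_pct, before, after, why):
    """Fill in the recommendation block template for one issue."""
    return "\n".join([
        f"## {verb}: {change}",
        f"Dimension: {dimension} {current}/3 → {target}/3 (+{gain_pct}% overall)",
        "Before:",
        quote(before),
        "After:",
        quote(after),
        f"Why: {why}",
    ])


block = render_recommendation(
    verb="Add",
    change='"Use when..." clause to the description',
    dimension="Completeness",
    current=0,
    target=3,
    gain_pct=15,
    before="description: Signs webhook payloads.",  # made-up example text
    after="description: Signs webhook payloads. Use when verifying webhook signatures.",
    why="The trigger clause tells the agent when to route to this skill.",
)
print(block)
```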
CRITICAL: Validate before applying changes
Run each validation step and show the output — do not just describe what you would run:
- Run ast.parse on any Python code blocks and show the output (including any SyntaxError details)
- Run node --check <file> and show the result
- Check --help output to verify flags are valid
- description: field: verify it contains a "Use when..." trigger clause (check the YAML header, not the body)
See REFERENCE.md for validation code snippets (Python ast.parse, JS node --check, bash file-reference checks). When producing an automation script, include each step as executable code (not comments).
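Two of the checks above (the Python syntax check and the description trigger clause) can be sketched in a few lines. This is an illustrative sketch, not the snippet from REFERENCE.md. It assumes Python blocks are fenced with three backticks plus a `python` or `py` tag, and that `description:` is a single-line field in a YAML header delimited by `---`. The function names are hypothetical.

```python
import ast
import re

# Matches fenced blocks tagged python/py; `{3} avoids writing a literal fence here.
PY_BLOCK = re.compile(
    r"^`{3}(?:python|py)[ \t]*\n(.*?)^`{3}[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def check_python_blocks(markdown):
    """ast.parse every fenced Python block; return (block_index, error) pairs."""
    errors = []
    for index, match in enumerate(PY_BLOCK.finditer(markdown), start=1):
        try:
            ast.parse(match.group(1))
        except SyntaxError as exc:
            errors.append((index, f"line {exc.lineno}: {exc.msg}"))
    return errors


def description_has_trigger(markdown):
    """Check the YAML header (not the body) for a 'Use when' clause.

    Assumes a single-line 'description:' field between leading '---' markers.
    """
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return False
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the YAML header
        if line.startswith("description:"):
            return "use when" in line.lower()
    return False


# Made-up SKILL.md content with one deliberately broken Python block.
sample = "\n".join([
    "---",
    "name: demo",
    "description: Signs webhooks. Use when verifying payload signatures.",
    "---",
    "`" * 3 + "python",
    "def broken(:",
    "    pass",
    "`" * 3,
])
print(check_python_blocks(sample))
print(description_has_trigger(sample))
```

Folded or multi-line YAML descriptions are not handled by this sketch; a real YAML parser would be the safer choice for those.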
Start with a priority summary table before individual details:
Priority | Recommendation | Score impact | Dimension
---------|--------------------------|-------------|----------
Critical | Add "Use when..." clause | +15% overall | Completeness 0→3
High | Remove HMAC explanation | +8% overall | Conciseness 1→2
Medium | Add retry example | +5% overall | Actionability 2→3
Then expand each row using the recommendation block format from Phase 3.
The dimension names above are illustrative — use the actual dimensions from your review output. They come from the active reviewer's judges (the default reviewer scores a description and a content judge); a custom reviewer can define different ones.
When a recommendation has a trade-off (e.g. conciseness gain vs. domain context loss), present both options and ask. Frame changes as proposals; get user approval before applying.
Use the Edit tool to update the SKILL.md and any reference docs the review flagged (e.g. a references/*.md link fix, or moving inlined content out of SKILL.md for progressive disclosure). Keep edits minimal and conservative. For issues in non-prose bundle files (scripts/, assets/), surface them for the user rather than rewriting them here. Track applied recommendations and expected impacts.
Run review again:
tessl review run <path-to-skill> --label "verify"
Compare scores:
Before: 72% | After: 89% (+17%)
- Completeness: 2/3 → 3/3 (added "Use when..." clause)
- Actionability: 2/3 → 3/3 (added executable code)
- Conciseness: 1/3 → 2/3 (removed verbose explanations)
Explain which dimensions improved and their impact on the overall score.
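The comparison above is simple arithmetic, and a short sketch makes the format reproducible. It assumes the scores from the two runs have already been collected into dicts with hypothetical `overall` and `dimensions` keys; extracting them from the review output is left out. The figures reuse the example above.

```python
def compare_scores(before, after):
    """Summarize overall and per-dimension changes between two review runs.

    Both arguments are assumed dicts of the form
    {'overall': percent_int, 'dimensions': {name: score_out_of_3}}.
    """
    delta = after["overall"] - before["overall"]
    lines = [f"Before: {before['overall']}% | After: {after['overall']}% ({delta:+d}%)"]
    for name, old in before["dimensions"].items():
        new = after["dimensions"].get(name, old)
        if new != old:  # only report dimensions that moved
            lines.append(f"- {name}: {old}/3 → {new}/3")
    return "\n".join(lines)


baseline = {"overall": 72, "dimensions": {"Completeness": 2, "Actionability": 2, "Conciseness": 1}}
verify = {"overall": 89, "dimensions": {"Completeness": 3, "Actionability": 3, "Conciseness": 2}}
report = compare_scores(baseline, verify)
print(report)
```

The reason for each change (for example, which recommendation was applied) still has to be added by hand from the list of applied recommendations.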
Re-run validation from Phase 4 on the updated SKILL.md:
Fix any issues, then re-run tessl review run to confirm improvement.
When the user wants hands-off iteration rather than the approval-gated workflow above, use tessl review fix. It runs the review-and-fix loop automatically — improving the skill and re-reviewing up to --max-iterations (default 3, max 10), stopping early once it hits --threshold:
tessl review fix <path-to-skill> --max-iterations 3 --threshold 85
It downloads the improved bundle, prints the baseline → final score, and asks for confirmation before applying (pass --yes to auto-apply). Note that tessl review fix has no --label flag.
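When the fix loop is driven from a script, building the argument list in one place helps keep the flags consistent with the limits described above. The sketch below uses only the flags named in this section (`--max-iterations`, `--threshold`, `--yes`). The helper name, the example path, and the lower bound of 1 on iterations are assumptions.

```python
import shlex


def build_fix_command(skill_path, max_iterations=3, threshold=None, auto_apply=False):
    """Build the argv list for `tessl review fix` from the flags described above."""
    # Upper bound of 10 comes from the text; the lower bound of 1 is an assumption.
    if not 1 <= max_iterations <= 10:
        raise ValueError("--max-iterations must be between 1 and 10")
    argv = ["tessl", "review", "fix", skill_path, "--max-iterations", str(max_iterations)]
    if threshold is not None:
        argv += ["--threshold", str(threshold)]
    if auto_apply:
        argv.append("--yes")  # skips the confirmation prompt
    return argv


# "skills/my-skill" is a placeholder path.
argv = build_fix_command("skills/my-skill", max_iterations=3, threshold=85)
print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run(argv)`. Leaving `auto_apply` off keeps the confirmation step in place.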
Choose between the two paths:
- tessl review fix: when you want speed and are comfortable with the reviewer applying changes automatically.
Both tessl review run and tessl review fix use the default reviewer unless you pass --review-plugin <local-dir | workspace/plugin[@version]>. Most users keep the default. To author a custom reviewer that adds or removes judges and scoring dimensions, use the create-review-plugin skill (tessl/review-plugin-creator).
Every link must signal WHEN it's relevant:
Check for orphaned files:
Files in the bundle that are never referenced add bloat without providing value.
# Find files that exist but aren't linked
ls skill_dir/ | grep -v SKILL.md
grep -oE '\[[^]]*\]\(([^)]+\.md)\)' SKILL.md | cut -d'(' -f2 | cut -d')' -f1
# Compare: files that exist but aren't in the grep output = orphaned
For each orphaned file, recommend:
Don't leave unreferenced files in the bundle. They waste space and confuse maintainers.
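The shell commands above list both sides but leave the comparison to the reader. A minimal Python sketch of the full check follows, assuming (as the grep pattern does) that references are markdown links ending in `.md` and that only top-level files in the skill directory are considered. The function name and demo file names are hypothetical.

```python
import re
import tempfile
from pathlib import Path

# Same idea as the grep pattern above: capture the target of [text](target.md).
MD_LINK = re.compile(r"\[[^\]]*\]\(([^)]+\.md)\)")


def find_orphans(skill_dir):
    """Return top-level files in skill_dir that SKILL.md never links to."""
    skill_dir = Path(skill_dir)
    text = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    # Path() drops a leading "./", so "./REFERENCE.md" and "REFERENCE.md" compare equal.
    linked = {Path(target).as_posix() for target in MD_LINK.findall(text)}
    present = {
        entry.name
        for entry in skill_dir.iterdir()
        if entry.is_file() and entry.name != "SKILL.md"
    }
    return sorted(present - linked)


# Demo on a throwaway bundle with one linked and one unlinked file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "SKILL.md").write_text(
        "See [reference](REFERENCE.md) when validating.", encoding="utf-8"
    )
    (root / "REFERENCE.md").write_text("validation snippets", encoding="utf-8")
    (root / "NOTES.md").write_text("never linked", encoding="utf-8")
    orphans = find_orphans(root)
    print(orphans)
```

Files referenced only from other reference files, or linked in a non-markdown way, would be reported as orphaned by this sketch, so its output is a list to review rather than a list to delete.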
tessl review run for evaluation, or tessl review fix for the automated loop.