Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.
91%
Does it follow best practices?
Impact
92%
1.10x
Average score across 25 eval scenarios
Passed
No known issues
Improve your SKILL.md using tessl skill review plus validation and context: reads your full skill bundle, validates syntax, explains WHY changes help, and catches mistakes before applying.
tessl skill review <path-to-SKILL.md>
Parse output for scores, validation issues, and judge suggestions. Prioritize fixes: Critical (ERRORs) → High (missing "Use when...", low actionability/conciseness) → Medium (other dimensions) → Low (warnings).
Read SKILL.md and list files in its directory. Bundle = SKILL.md + sibling files + referenced files. Check for orphaned files (see Progressive Disclosure section). Use bundle context to improve progressive disclosure.
For each issue, provide: what to change, why (dimension + score), before/after, impact, educational note explaining WHY it helps. Apply "Don't invent" principle from Guiding Principles—ask user when unsure.
If bundle has reference files (REFERENCE.md, etc.), recommend linking instead of inlining for progressive disclosure.
CRITICAL: Validate before applying changes
Run each validation step and show the output — do not just describe what you would run:
- Run ast.parse on any Python code blocks and show the output (including any SyntaxError details)
- Run node --check <file> and show the result
- Check --help output to verify flags are valid
- Check the description: field: verify it contains a "Use when..." trigger clause (check the YAML header, not the body)
See references/REFERENCE.md for examples. When producing an automation script, include each step as executable code (not comments).
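As a minimal sketch of two of the checks above (the Python syntax check and the "Use when..." check), the following assumes that code blocks use standard triple-backtick fences tagged `python` or `py`, and that the `description:` field sits on a single line of the YAML header. The helper names `check_python_blocks` and `has_use_when` are hypothetical and are not part of tessl.

```python
import ast
import re

FENCE = "`" * 3  # built indirectly so this example can itself sit inside a fence
PY_BLOCK = re.compile(FENCE + r"(?:python|py)\s*\n(.*?)" + FENCE, re.DOTALL)


def check_python_blocks(markdown: str) -> list[str]:
    """Return one result line per Python block: 'ok' or the SyntaxError details."""
    results = []
    for i, match in enumerate(PY_BLOCK.finditer(markdown), start=1):
        try:
            ast.parse(match.group(1))
            results.append(f"block {i}: ok")
        except SyntaxError as err:
            results.append(f"block {i}: SyntaxError at line {err.lineno}: {err.msg}")
    return results


def has_use_when(markdown: str) -> bool:
    """Check the description: field in the YAML header (not the body)."""
    header = re.match(r"---\s*\n(.*?)\n---", markdown, re.DOTALL)
    if header is None:
        return False
    # Assumption: single-line description; multi-line YAML needs a real parser.
    desc = re.search(r"^description:\s*(.*)$", header.group(1), re.MULTILINE)
    return desc is not None and "use when" in desc.group(1).lower()
```

Printing the list returned by `check_python_blocks` gives the "show the output" evidence the step asks for, rather than a description of what would be run.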
Start with a priority summary table before individual details:
Priority | Recommendation | Score impact | Dimension
---------|--------------------------|-------------|----------
Critical | Add "Use when..." clause | +15% overall | Completeness 0→3
High | Remove HMAC explanation | +8% overall | Conciseness 1→2
Medium | Add retry example | +5% overall | Actionability 2→3
Then for each recommendation: current dimension score, issue, before/after examples, numeric score impact estimate (e.g. "+8% overall, Actionability 2→3"), and educational WHY.
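The priority summary table could be generated from a list of recommendations along these lines. This is an illustrative sketch only: the `Recommendation` record and `summary_table` function are hypothetical names, and the score impacts passed in are whatever estimates the reviewer has already produced.

```python
from dataclasses import dataclass

PRIORITY_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}


@dataclass
class Recommendation:
    priority: str       # one of the PRIORITY_ORDER keys
    summary: str        # short description of the change
    score_impact: str   # estimate, e.g. "+8% overall"
    dimension: str      # e.g. "Conciseness 1→2"


def summary_table(recs: list[Recommendation]) -> str:
    """Render recommendations as a pipe table, highest priority first."""
    header = ["Priority", "Recommendation", "Score impact", "Dimension"]
    rows = [
        [r.priority, r.summary, r.score_impact, r.dimension]
        for r in sorted(recs, key=lambda r: PRIORITY_ORDER[r.priority])
    ]
    # Pad each column to its widest cell so the table lines up in plain text.
    widths = [max(len(row[i]) for row in [header] + rows) for i in range(len(header))]
    lines = [" | ".join(cell.ljust(w) for cell, w in zip(header, widths))]
    lines.append("-|-".join("-" * w for w in widths))
    lines += [" | ".join(cell.ljust(w) for cell, w in zip(row, widths)) for row in rows]
    return "\n".join(line.rstrip() for line in lines)
```

Sorting before rendering keeps Critical items at the top regardless of the order in which issues were discovered.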
Discuss trade-offs, not just score gains.
Frame changes as proposals (e.g., "I recommend X" or "I suggest removing Y") rather than imperative instructions. Get user approval before applying.
Use Edit tool to update SKILL.md. Track applied recommendations and expected impacts.
Run review again:
tessl skill review <path-to-SKILL.md>
Compare scores:
Before: 72% | After: 89% (+17%)
- Completeness: 2/3 → 3/3 (added "Use when..." clause)
- Actionability: 2/3 → 3/3 (added executable code)
- Conciseness: 1/3 → 2/3 (removed verbose explanations)
Explain which dimensions improved and their impact on the overall score.
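A before/after comparison in the format above can be assembled mechanically once both sets of scores are in hand. The sketch below assumes the scores have already been read from the two review outputs into plain integers and dictionaries; `compare_scores` is a hypothetical helper, and the parenthetical explanations still have to be written by the reviewer.

```python
def compare_scores(
    before_pct: int,
    after_pct: int,
    before_dims: dict[str, int],
    after_dims: dict[str, int],
    max_score: int = 3,
) -> str:
    """Format a before/after report listing only the dimensions that changed."""
    delta = after_pct - before_pct
    lines = [f"Before: {before_pct}% | After: {after_pct}% ({delta:+d}%)"]
    for name, old in before_dims.items():
        # A dimension missing from the second run is treated as unchanged.
        new = after_dims.get(name, old)
        if new != old:
            lines.append(f"- {name}: {old}/{max_score} → {new}/{max_score}")
    return "\n".join(lines)


print(compare_scores(
    72, 89,
    {"Completeness": 2, "Actionability": 2, "Conciseness": 1},
    {"Completeness": 3, "Actionability": 3, "Conciseness": 2},
))
```

The signed format (`+d`) also makes regressions visible, which matters when a change improves one dimension at the cost of another.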
Re-run validation from Phase 4 on the updated SKILL.md.
Fix any issues, then re-run tessl skill review to confirm improvement.
40 files is excellent IF each link signals WHEN it's relevant. Bad links force agents to open files "just in case."
The gate: Can the agent decide WITHOUT opening?
If routing is unclear, inlining may be more token-efficient than splitting.
Check for orphaned files:
Files in the bundle that are never referenced add bloat without providing value.
# Find files that exist but aren't linked
ls skill_dir/ | grep -v SKILL.md
grep -oE '\[[^]]*\]\(([^)]+\.md)\)' SKILL.md | cut -d'(' -f2 | cut -d')' -f1
# Compare: files that exist but aren't in the grep output = orphaned
For each orphaned file, recommend:
Don't leave unreferenced files in the bundle. They waste space and confuse maintainers.
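The orphan check can also be done in one step that performs the comparison itself. The sketch below is one possible approach, with stated assumptions: it looks at every Markdown link target in SKILL.md (not only `.md` files), walks subdirectories, and only follows links from SKILL.md itself, so a file referenced solely from another reference file would still be reported. `find_orphans` is a hypothetical helper name.

```python
import re
from pathlib import Path

# Captures the target of a Markdown link, stopping at ')', '#', or whitespace.
LINK = re.compile(r"\[[^\]]*\]\(([^)#\s]+)")


def find_orphans(skill_dir: str) -> list[str]:
    """List bundle files that SKILL.md never links to (paths relative to skill_dir)."""
    root = Path(skill_dir)
    text = (root / "SKILL.md").read_text(encoding="utf-8")
    # Normalize targets such as ./references/REFERENCE.md; skip external URLs.
    linked = {
        Path(target).as_posix()
        for target in LINK.findall(text)
        if "://" not in target
    }
    existing = {
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file()
    }
    existing.discard("SKILL.md")
    return sorted(existing - linked)
```

Calling `find_orphans("skill_dir")` returns a sorted list of relative paths, which can be reviewed one by one before deciding what to do with each file.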
tessl skill review for evaluation