Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.
Download the generated scenarios using the run ID from Phase 2:
tessl scenario download --last -o <tile-dir>/evals/
Use --strategy merge if adding to existing scenarios:
tessl scenario download --last -o <tile-dir>/evals/ --strategy merge
Use --strategy replace only if the user explicitly asked to replace existing scenarios.
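The choice between a plain download and --strategy merge can be sketched as a small helper. This is a minimal sketch, not part of the tessl CLI: the function name build_download_cmd is hypothetical, and it assumes an "existing scenario" means any subdirectory of evals/ that holds a task.md. It only prints the command, and it never adds --strategy replace, since that requires an explicit user request.

```shell
# Hypothetical helper: print the download command for a tile directory,
# adding --strategy merge only when scenarios already exist.
build_download_cmd() {
  tile_dir="${1%/}"
  evals_dir="$tile_dir/evals/"
  cmd="tessl scenario download --last -o $evals_dir"

  # Assumption: an existing scenario is any evals/<name>/task.md.
  # With no match the glob stays literal and ls exits non-zero.
  if ls "$evals_dir"*/task.md >/dev/null 2>&1; then
    cmd="$cmd --strategy merge"
  fi

  printf '%s\n' "$cmd"
}
```

Printing the command instead of running it leaves room to confirm the strategy with the user first.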
ls <tile-dir>/evals/*/task.md
Show the user the downloaded scenario structure:
Downloaded scenarios:
evals/
  checkout-flow/
    task.md
    criteria.json
    scenario.json
  webhook-setup/
    task.md
    criteria.json
    scenario.json
Before asking the user, read each criteria.json and task.md yourself and flag these common problems:
Rubric anti-patterns to catch:
- Does task.md contain specific values (version numbers, URLs, class names) that are also rubric criteria? If a criterion just checks whether the agent copied a value from the task prompt, it's a free point. Remove the value from the task or remove the criterion.
- Is no_unrelated_changes included as a criterion? It scores 1 on nearly every solution, so it doesn't discriminate between solutions. Remove it unless the scenario specifically tests scope discipline.
Present your findings and offer review options:
"You can also:
- Review task.md — see what the agent will be asked to do
- Review criteria.json — see what the rubric checks for
- Edit criteria weights — adjust which criteria matter most
- Proceed to eval run — use the scenarios as-is"
If the user wants to review, read and display the relevant files. Apply any edits they request.
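The two rubric anti-patterns above can be pre-screened with plain text searches before reading each file by hand. This is a rough sketch under stated assumptions: both function names are hypothetical, the search treats criteria.json as plain text (no schema is assumed), and the leaked-value check only recognizes URLs and three-part version numbers, not class names. A flag is a prompt to look closer, not a verdict.

```shell
# Hypothetical helper: list criteria.json files that mention
# no_unrelated_changes anywhere in their text.
flag_no_unrelated_changes() {
  evals_dir="${1%/}"
  for criteria in "$evals_dir"/*/criteria.json; do
    [ -f "$criteria" ] || continue
    if grep -q 'no_unrelated_changes' "$criteria"; then
      printf 'FLAG %s: includes no_unrelated_changes\n' "$criteria"
    fi
  done
}

# Hypothetical helper: report URLs and x.y.z version strings from task.md
# that also appear verbatim in the sibling criteria.json (possible free points).
flag_leaked_values() {
  evals_dir="${1%/}"
  for task in "$evals_dir"/*/task.md; do
    scenario_dir=$(dirname "$task")
    criteria="$scenario_dir/criteria.json"
    [ -f "$task" ] && [ -f "$criteria" ] || continue
    grep -oE 'https?://[^[:space:])"]+|[0-9]+\.[0-9]+\.[0-9]+' "$task" | sort -u |
    while IFS= read -r value; do
      # -F matches the value literally, so dots are not treated as wildcards.
      if grep -qF -- "$value" "$criteria"; then
        printf 'FLAG %s: "%s" appears in both task.md and criteria.json\n' "$scenario_dir" "$value"
      fi
    done
  done
}
```

Both functions take the evals directory as their only argument, for example flag_leaked_values <tile-dir>/evals.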