Optimize your skills and plugins: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.
Score: 88
Best practices: 87%
Impact: 89% (1.14x average score across 29 eval scenarios)
Advisory: Suggest reviewing before use
The engineering team at Fieldstone has built an invoice-processor plugin containing several specialized skills for handling different parts of their invoice automation pipeline. The plugin is new — no evals have been run against it yet. Before investing hours of compute time in scored evals, the tech lead wants a clear plan for how to kick off evaluation in the right order.
The team has heard that running the wrong type of eval first can waste time (e.g., running scored evals against scenarios that route to the wrong skill). They want a concrete plan — ideally a shell script — that:
The plugin is already present on disk at ./invoice-processor/.
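The ordering concern above (avoid spending compute on scored evals while scenarios may still route to the wrong skill) can be sketched as a gated pipeline. This is a minimal sketch, not the expected solution: `routing_check`, `scored_evals`, and `kickoff` are hypothetical names introduced for illustration, and their bodies are placeholders rather than real CLI invocations. Only the `./invoice-processor` path comes from the task description.

```shell
#!/bin/sh
# Sketch only: routing_check and scored_evals are hypothetical placeholders.
# Replace their bodies with whatever commands your eval tooling provides.

# Plugin location from the task description; overridable for testing.
PLUGIN_DIR="${PLUGIN_DIR:-./invoice-processor}"

routing_check() {
  # Placeholder for a cheap check that scenarios route to the intended skill.
  echo "routing check for $1 (placeholder)"
}

scored_evals() {
  # Placeholder for the expensive scored eval run.
  echo "scored evals for $1 (placeholder)"
}

kickoff() {
  # Stop early if the plugin is missing, so no stage runs against nothing.
  if [ ! -d "$PLUGIN_DIR" ]; then
    echo "plugin directory not found: $PLUGIN_DIR" >&2
    return 1
  fi
  # Cheap stage first; only spend compute on scored evals if it passes.
  if ! routing_check "$PLUGIN_DIR"; then
    echo "routing check failed; skipping scored evals" >&2
    return 1
  fi
  scored_evals "$PLUGIN_DIR"
}
```

The block only defines functions; a real script would end with a call to `kickoff`. Gating the expensive stage on the cheap one means a routing problem surfaces before any scored run starts.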
Produce a shell script called eval-kickoff-plan.sh and a short eval-notes.md that explains the reasoning behind the chosen eval ordering strategy.
The script should:
The notes file should explain:
.tessl-plugin
evals
scenario-1 through scenario-29
skills
compare-skill-model-performance
optimize-skill-instructions
references
optimize-skill-performance
optimize-skill-performance-and-instructions