tessl/skill-optimizer

Optimize your skills and plugins: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.

Quality: 87% (Does it follow best practices?)

Impact: 89%, 1.14x (average score across 29 eval scenarios)

Security: Advisory (suggest reviewing before use)

Eval Kickoff Plan for invoice-processor Plugin

Name: tessl/skill-optimizer
Rating: 88.84 (1 review)
Author: tessl

Problem Description

The engineering team at Fieldstone has built an invoice-processor plugin containing several specialized skills for handling different parts of their invoice automation pipeline. The plugin is new — no evals have been run against it yet. Before investing hours of compute time in scored evals, the tech lead wants a clear plan for how to kick off evaluation in the right order.

The team has heard that running the wrong type of eval first can waste time (e.g., running scored evals against scenarios that route to the wrong skill). They want a concrete plan — ideally a shell script — that:

- Detects what kind of plugin they're working with
- Chooses the right eval strategy based on what's found
- Describes what to do after the first round of results comes in

The plugin is already present on disk at ./invoice-processor/.
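The detection step could be sketched as below. This is a minimal illustration only: it assumes each skill is represented by a SKILL.md file somewhere under the plugin directory, and the function names `count_skills` and `classify_plugin` are hypothetical helpers, not part of any tessl tooling.

```shell
#!/usr/bin/env bash
# Sketch: detect what kind of plugin is on disk.
# Assumption: one SKILL.md file per skill, anywhere under the plugin directory.

# Print the number of SKILL.md files under the given directory (0 if it is missing).
count_skills() {
  local plugin_dir="$1"
  find "$plugin_dir" -type f -name 'SKILL.md' 2>/dev/null | wc -l | tr -d '[:space:]'
}

# Map a skill count to a coarse plugin kind (labels are illustrative).
classify_plugin() {
  local count="$1"
  if [ "$count" -eq 0 ]; then
    echo "no-skills"
  elif [ "$count" -eq 1 ]; then
    echo "single-skill"
  else
    echo "multi-skill"
  fi
}
```

For the plugin described here, a call might look like `classify_plugin "$(count_skills ./invoice-processor)"`.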

Output Specification

Produce a file called eval-kickoff-plan.sh (a shell script) and a short eval-notes.md explaining the reasoning behind the eval ordering strategy chosen.

The script should:

- Detect the number of skills in the plugin
- Run the appropriate eval type first given the plugin structure and the absence of existing results
- Show what commands to run after the first results are available
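One possible shape for the ordering decision is sketched below. The task text does not name the eval types, so the labels `routing` and `scored` are placeholders, and the rule itself (check routing first only when there are several skills and no earlier results) is an assumption drawn from the concern about scenarios routing to the wrong skill. `choose_first_eval` is a hypothetical helper, and no real CLI commands are shown.

```shell
#!/usr/bin/env bash
# Sketch: decide which eval type to run first.
# Assumption: "routing" and "scored" are placeholder labels, not confirmed
# subcommands of any tool. The decision rule is illustrative only.

# Usage: choose_first_eval <skill_count> <has_results: yes|no>
choose_first_eval() {
  local skill_count="$1"
  local has_results="$2"
  if [ "$has_results" = "yes" ]; then
    # Earlier results already exist, so the preliminary check is skipped.
    echo "scored"
  elif [ "$skill_count" -gt 1 ]; then
    # Several skills and no results yet: confirm scenarios reach the right skill first.
    echo "routing"
  else
    # A single skill leaves nothing to route between.
    echo "scored"
  fi
}
```

Under these assumptions, `choose_first_eval 3 no` selects the routing check, while a single-skill plugin or one with existing results goes straight to scored evals.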

The notes file should explain:

- Why the chosen eval type was run first
- What to do after those results come back
- Under what circumstances you would skip the first eval type entirely


Source file: evals/scenario-23/task.md