Optimize your skills and plugins: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.
Score: 85
Review (does it follow best practices?): 88%
Impact: 85% (1.08x), average score across 29 eval scenarios
Advisory: suggest reviewing before use
This phase covers two distinct eval types: content evals and activation evals. Both apply to every plugin, single-skill and multi-skill alike.
Activation evals (--skip-forced-context-activation --skip-scoring): observe which skill self-activates on each scenario. They do NOT force activation, so they test routing and description quality. They are fast, completing in ~2–3 min.
Ordering: for multi-skill plugins, run activation evals first, since they catch routing problems before time is invested in scored runs. For single-skill plugins either order is fine, and running both in parallel works too. Both eval types are required; the only variable is the ordering, not whether to run them.
tessl eval run <plugin-path> --skip-forced-context-activation --skip-scoring --label <run-label>
This completes in ~2–3 min (no agent execution needed). Note the eval run URL from the output and share it with the user.
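The ordering rule above can be sketched in Python, assuming you drive the CLI from a script. The sketch only builds argv lists from the flags shown in this section and never calls tessl itself. The function names and the `-activation`/`-content` label suffixes are hypothetical conventions, not part of tessl.

```python
from typing import List

Argv = List[str]


def activation_eval_cmd(plugin_path: str, label: str) -> Argv:
    # Activation eval: no forced context activation and no scoring.
    return ["tessl", "eval", "run", plugin_path,
            "--skip-forced-context-activation", "--skip-scoring",
            "--label", label]


def content_eval_cmd(plugin_path: str, agent: str, label: str) -> Argv:
    # Content eval for a single agent:model.
    return ["tessl", "eval", "run", plugin_path,
            f"--agent={agent}", "--label", label]


def plan_eval_stages(plugin_path: str, skill_count: int, agent: str,
                     label: str) -> List[List[Argv]]:
    """Return stages in run order; commands inside one stage may run in parallel."""
    activation = activation_eval_cmd(plugin_path, f"{label}-activation")
    content = content_eval_cmd(plugin_path, agent, f"{label}-content")
    if skill_count > 1:
        # Multi-skill: activation first, so routing problems surface before scored runs.
        return [[activation], [content]]
    # Single-skill: either order works, so both share one (parallelizable) stage.
    return [[activation, content]]


if __name__ == "__main__":
    stages = plan_eval_stages("./my-plugin", 3, "claude:claude-sonnet-4-6", "run1")
    for i, stage in enumerate(stages, 1):
        print(f"stage {i}:", [" ".join(cmd) for cmd in stage])
```

Modeling the plan as stages keeps the rule in one place. Both eval types always appear in the plan, and only their grouping depends on the skill count.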
Activation results are reviewed in Phase 5. They do not produce a numeric score. Instead, they produce a per-scenario firing pattern: which skill, if any, fired on each scenario. Pair them with the content-eval baseline scores to distinguish "no activation, but the agent handles it fine" from "no activation, and the agent needs help" (a real routing gap).
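Here is a minimal sketch of that pairing, assuming you have already collected each scenario's fired skill (from the activation eval) and its baseline score (from the content eval) into plain Python values. The `ScenarioResult` shape, the 0–100 score scale, and the `BASELINE_OK = 70` cutoff are all hypothetical choices, not tessl defaults.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical cutoff: a baseline score at or above this counts as "handles it fine".
BASELINE_OK = 70.0


@dataclass
class ScenarioResult:
    name: str
    fired_skill: Optional[str]  # skill that self-activated in the activation eval, or None
    baseline_score: float       # content-eval baseline (no context) score, assumed 0-100


def classify(result: ScenarioResult, threshold: float = BASELINE_OK) -> str:
    if result.fired_skill is not None:
        return f"activated:{result.fired_skill}"
    if result.baseline_score >= threshold:
        return "no-activation-ok"  # the agent copes without the skill
    return "routing-gap"           # nothing fired and the agent needs help


def routing_gaps(results: List[ScenarioResult],
                 threshold: float = BASELINE_OK) -> List[str]:
    # Scenarios where the skill description / routing is worth fixing.
    return [r.name for r in results if classify(r, threshold) == "routing-gap"]
```

Only the "routing-gap" bucket points at a description problem. A low baseline with no activation means the skill was needed but never chosen.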
Poll for completion as described in §Polling below.
For a first run, recommend keeping it simple:
"For a first run, I recommend a single agent to keep eval time manageable (~10–15 minutes per scenario). Once you've validated the scenarios are good, you can compare more agents.
Want to go with one agent, or compare several now?
Run tessl eval run --list-agents to see the supported agent:model values and the current default. Each additional agent is a separate run, so it multiplies eval time and cost."
--agent takes a single agent:model value. To compare agents or models, run tessl eval run once per agent; there is no form that accepts multiple --agent values.
If the skill calls external services (APIs, databases, MCPs, third-party tools) and the user has secrets to inject, they can pass an env file to the sandbox:
tessl eval run <plugin-path> --env-file <path-to-env-file> --agent=<agent:model> --label <run-label>
Ask the user whether they have an env file when the plugin clearly exercises external services. Scenarios that don't need secrets run fine without it.
Single agent:
tessl eval run <plugin-path> --agent=<agent:model> --label <run-label>
Comparing agents (one invocation each, with a distinct --label):
tessl eval run <plugin-path> --agent=claude:claude-sonnet-4-6 --label <run-label-sonnet>
tessl eval run <plugin-path> --agent=claude:claude-opus-4-8 --label <run-label-opus>
Note each eval run URL from the output and share it with the user so they can optionally watch progress in the browser.
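Because --agent accepts one value, comparing agents amounts to a loop over agent:model strings. The Python sketch below builds one invocation per agent, using only the flags shown in this section, and adds --env-file when one is supplied. The label scheme (base label plus the agent with `:` replaced by `-`) and the `run_all` helper are hypothetical. `run_all` defaults to a dry run that only prints the commands.

```python
import subprocess
from typing import List, Optional


def content_eval_commands(plugin_path: str, agents: List[str], label: str,
                          env_file: Optional[str] = None) -> List[List[str]]:
    """Build one `tessl eval run` per agent:model, since --agent takes a single value."""
    cmds = []
    for agent in agents:
        cmd = ["tessl", "eval", "run", plugin_path]
        if env_file:
            # Only needed when the plugin calls external services that need secrets.
            cmd += ["--env-file", env_file]
        # Distinct label per agent, e.g. "run1-claude-claude-opus-4-8" (hypothetical scheme).
        cmd += [f"--agent={agent}", "--label", f"{label}-{agent.replace(':', '-')}"]
        cmds.append(cmd)
    return cmds


def run_all(cmds: List[List[str]], dry_run: bool = True) -> None:
    # Runs the invocations one after another; dry_run only prints them.
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(content_eval_commands(
        "./my-plugin", ["claude:claude-sonnet-4-6", "claude:claude-opus-4-8"], "run1"))
```

Deriving the label from the full agent string, not just the model name, keeps labels distinct even if two agents share a model name.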
Polling
tessl eval list --mine --limit 1
For content evals, runs take ~10–15 minutes per scenario per agent, and each scenario runs twice: once as a baseline without context and once with context. Update the user periodically:
"Evals are running... Status: in_progress. With N scenarios and 1 agent, expect about X–Y minutes total. I'll check again shortly."
For activation evals, expect ~2–3 min total, so poll more frequently.
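The X–Y range in the status message can be filled in with simple arithmetic. The Python sketch below takes the ~10–15 minutes per scenario per agent figure at face value and assumes those times simply add up (no concurrency). If scenarios run in parallel, wall-clock time could be lower. The function names are hypothetical.

```python
from typing import Tuple

# Content evals: ~10-15 minutes per scenario per agent (figures from the text above).
PER_SCENARIO_MIN, PER_SCENARIO_MAX = 10, 15


def content_eval_minutes(scenarios: int, agents: int = 1) -> Tuple[int, int]:
    # Each extra agent is a separate run, so the estimate scales with scenarios x agents.
    # Assumes the per-scenario times simply add up (no concurrency).
    runs = scenarios * agents
    return runs * PER_SCENARIO_MIN, runs * PER_SCENARIO_MAX


def status_message(scenarios: int, agents: int = 1) -> str:
    lo, hi = content_eval_minutes(scenarios, agents)
    noun = "agent" if agents == 1 else "agents"
    return (f"Evals are running... Status: in_progress. With {scenarios} scenarios "
            f"and {agents} {noun}, expect about {lo}–{hi} minutes total. "
            "I'll check again shortly.")
```

For example, 6 scenarios with one agent gives (60, 90), and adding a second agent doubles both bounds.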
Wait until status shows completed. If status shows failed, run:
tessl eval retry <id>
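The wait-then-retry loop can be sketched in Python as a small state machine. `get_status` and `retry` are hypothetical callables you would implement around `tessl eval list --mine --limit 1` and `tessl eval retry <id>`. This sketch does not parse tessl's output, whose format isn't shown here. `interval_s` and `max_retries` are assumed values. For activation evals, a shorter interval (say 15 s) suits the ~2–3 min runtime.

```python
import time
from typing import Callable


def wait_for_eval(get_status: Callable[[], str],
                  retry: Callable[[], None],
                  interval_s: float = 60.0,
                  max_retries: int = 1,
                  sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the run is completed; on failure, retry up to max_retries times."""
    retries = 0
    while True:
        status = get_status()  # hypothetical wrapper around `tessl eval list --mine --limit 1`
        if status == "completed":
            return status
        if status == "failed":
            if retries >= max_retries:
                return status  # give up and report the failure to the user
            retries += 1
            retry()  # hypothetical wrapper around `tessl eval retry <id>`
        sleep(interval_s)  # still running (or just retried): wait before the next check
```

Injecting `sleep` lets the loop be exercised without real waits. Between checks, the status message above can be sent to the user.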