tessl/skill-optimizer

Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.

Quality: 91% (Does it follow best practices?)

Impact: 92% (1.10x), average score across 25 eval scenarios

Security: Passed (no known issues)

name: optimize-skill-performance-and-instructions
description: Run the full optimization cycle for a tile: review best practices, generate eval scenarios, run evals, diagnose gaps, fix, and re-run until scores improve. Use when someone says "optimize my skill", "improve my tile", "run evals", "benchmark my tile", or wants to measure and improve how well a tile helps agents solve tasks.

Optimize

Name: tessl/skill-optimizer
Rating: 91.95 (1 review)
Author: tessl

This skill orchestrates optimize-skill-instructions, setup-skill-performance, and optimize-skill-performance into a single end-to-end optimization cycle.

The full cycle takes 1–2 hours depending on how many scenarios and improvement iterations are needed. Set this expectation with the user upfront.

Overview

Review SKILL.md → Apply quick wins → Generate scenarios → Run evals → Analyze → Fix → Re-run → Report
└── optimize-skill-instructions ──┘  └── setup-skill-performance ──┘  └────────── optimize-skill-performance ──────────────┘

Key commands

tessl skill review skills/<name>/SKILL.md     # review a skill (Step 1)
tessl scenario generate <tile-path> --count=5 # generate scenarios (Step 2)
tessl eval run <tile-path> --solver=activation # test skill routing
tessl eval run <tile-path> --agent=claude:claude-sonnet-4-6 # scored eval
tessl eval view --last --json                 # check results

Step 1: Review best practices

Invoke the optimize-skill-instructions skill. This runs tessl skill review on the tile's skill(s), surfaces scoring dimensions and quick wins, and applies approved changes.

Entry criteria: The tile has at least one SKILL.md.

Exit criteria: Review score is presented, approved quick wins are applied. Move to Step 2.

If the review score is already high (≥ 85%) and the user is satisfied, proceed to Step 2 without making changes.

Step 2: Run setup-skill-performance (full pipeline scope)

Invoke the setup-skill-performance skill with scope = "Full pipeline". Skip the scope question — go straight to Phase 1.

Work through all phases of setup-skill-performance (Find Tile → Generate Scenarios → Download & QC → Run Evals → View Results → Next Steps). Key parameters:

Generate 3–5 scenarios from the tile
Quality-check downloaded criteria for anti-patterns before running
Default agent: claude:claude-sonnet-4-6

Decision point after results: If the average eval score is already ≥ 85% with no regressions, stop and report success. Otherwise, continue to Step 3.
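The decision point above can be sketched as a small check. This is a minimal illustration, not part of the tessl tooling: the `ScenarioResult` type and `can_stop_after_setup` function are hypothetical names, and it assumes the "average eval score" means the average with-context score and that a regression means with-context below baseline for any scenario.

```python
from dataclasses import dataclass


@dataclass
class ScenarioResult:
    """Hypothetical container for one scenario's scores, in percent."""
    name: str
    baseline: float       # score without the tile
    with_context: float   # score with the tile loaded


def can_stop_after_setup(results, threshold=85.0):
    """Return True when the average with-context score meets the
    threshold and no scenario scored below its baseline."""
    if not results:
        return False  # nothing to judge yet
    average = sum(r.with_context for r in results) / len(results)
    has_regression = any(r.with_context < r.baseline for r in results)
    return average >= threshold and not has_regression
```

If the function returns False, the cycle continues to Step 3.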

Step 3: Classify and prioritize

Before invoking optimize-skill-performance, do a quick triage of the results:

If baseline is ≥ 80% on most scenarios: The scenarios may be too easy. Consider regenerating harder scenarios before trying to improve the tile.
If regressions exist (with-context < baseline): These are the highest priority, because the tile is actively hurting agent performance on those scenarios.
If with-context has room to grow: Proceed to optimize-skill-performance.
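As a rough sketch, the triage rules above could be expressed as follows. The function name and action labels are hypothetical, "most scenarios" is assumed to mean more than half, and "room to grow" is assumed to mean a non-regressed with-context score below 100%.

```python
def triage(results, easy_threshold=80.0):
    """Triage eval results before the improve cycle.

    results: list of (scenario_name, baseline, with_context) tuples,
    with scores in percent. Returns (action, scenario_names) pairs,
    highest priority first.
    """
    findings = []

    # Regressions come first: the tile makes these scenarios worse.
    regressions = [name for name, base, ctx in results if ctx < base]
    if regressions:
        findings.append(("fix-regressions", regressions))

    # A high baseline on most scenarios suggests they are too easy.
    easy = [name for name, base, _ in results if base >= easy_threshold]
    if len(easy) > len(results) / 2:
        findings.append(("regenerate-harder-scenarios", easy))

    # Everything else that is not yet perfect can still be improved.
    growable = [name for name, base, ctx in results if base <= ctx < 100]
    if growable:
        findings.append(("optimize", growable))

    return findings
```

Returning an ordered list keeps the priority explicit: regressions are always reported before anything else.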

Step 4: Run optimize-skill-performance

Invoke the optimize-skill-performance skill starting from Phase 1 (it will detect the existing results).

Work through the improve cycle:

Analyze results — classify every criterion into buckets (working / gap / redundant / regression)
Diagnose root causes by reading the failing criteria and the tile files
Apply targeted, minimal fixes to the appropriate files
Re-run evals
Compare before/after
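The bucket names in the first step come from the text above, but their exact definitions are not given here. The sketch below assumes one plausible reading based on whether a criterion passes without the tile (baseline) and with it (with-context); the function names and criterion names in the usage note are hypothetical.

```python
def classify_criterion(baseline_pass, with_context_pass):
    """Assign one criterion to a bucket (assumed definitions)."""
    if with_context_pass and not baseline_pass:
        return "working"      # the tile makes the difference
    if with_context_pass and baseline_pass:
        return "redundant"    # the agent passes without the tile anyway
    if baseline_pass and not with_context_pass:
        return "regression"   # the tile makes things worse
    return "gap"              # fails either way; the tile does not cover it


def bucket_criteria(criteria):
    """criteria: dict mapping criterion name -> (baseline_pass, with_context_pass)."""
    buckets = {"working": [], "gap": [], "redundant": [], "regression": []}
    for name, (baseline_pass, with_context_pass) in criteria.items():
        buckets[classify_criterion(baseline_pass, with_context_pass)].append(name)
    return buckets
```

For example, `bucket_criteria({"uses-cli": (False, True), "cites-docs": (False, False)})` places `uses-cli` in `working` and `cites-docs` in `gap`.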

Iteration rule: Run up to 2 improve iterations. After the second, report results and stop — the user should review before investing more time.
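The iteration rule amounts to a bounded loop. The sketch below is illustrative only: `apply_fixes` and `run_evals` are hypothetical callables standing in for the manual fix and re-run steps, and the early exit at 85% is an assumption borrowed from the stop criteria (it ignores regressions for simplicity).

```python
def run_improve_cycle(initial_score, apply_fixes, run_evals,
                      max_iterations=2, target=85.0):
    """Run at most `max_iterations` fix-and-re-run rounds.

    initial_score: average eval score from the existing results.
    apply_fixes:   callable that edits the tile files (stand-in).
    run_evals:     callable that re-runs evals and returns the new average.
    Returns (iterations_used, score_history).
    """
    history = [initial_score]
    iterations = 0
    while iterations < max_iterations and history[-1] < target:
        apply_fixes()
        history.append(run_evals())
        iterations += 1
    return iterations, history
```

The returned history gives the before/after scores needed for the comparison step and the final report.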

Step 5: Report

Present a final summary:

Optimization Complete

  Tile:         <tile-name>
  Review score: XX% → YY%
  Scenarios:    N scenarios
  Iterations:   X (1 setup + Y improve rounds)

  Eval before (baseline):     XX%
  Eval after (with tile):     YY%  (Δ +ZZpp)

  Criteria improved:  [list]
  Still failing:      [list with brief reason]

  Eval run: [URL to latest run]

If criteria remain stuck after 2 iterations, note whether the gap is addressable via documentation (suggest specific follow-up) or is inherently hard for the agent (suggest accepting or replacing the scenario).
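One way to fill in the summary template is a small formatting helper. This is a sketch under assumptions: `render_report` and its parameters are hypothetical names, scores are passed as percentages, and the total iteration count is taken to be one setup run plus the improve rounds, as the template suggests.

```python
def render_report(tile, review_before, review_after, scenario_count,
                  improve_rounds, eval_before, eval_after,
                  improved, still_failing, run_url):
    """Render the final summary as plain text.

    improved:      list of criterion names that now pass.
    still_failing: list of (criterion_name, brief_reason) tuples.
    """
    delta = eval_after - eval_before  # change in percentage points
    failing = "; ".join(f"{name} ({reason})" for name, reason in still_failing)
    lines = [
        "Optimization Complete",
        "",
        f"  Tile:         {tile}",
        f"  Review score: {review_before}% → {review_after}%",
        f"  Scenarios:    {scenario_count} scenarios",
        f"  Iterations:   {1 + improve_rounds} (1 setup + {improve_rounds} improve rounds)",
        "",
        f"  Eval before (baseline):     {eval_before}%",
        f"  Eval after (with tile):     {eval_after}%  (Δ {delta:+g}pp)",
        "",
        f"  Criteria improved:  {', '.join(improved) or 'none'}",
        f"  Still failing:      {failing or 'none'}",
        "",
        f"  Eval run: {run_url}",
    ]
    return "\n".join(lines)
```

Keeping the reason next to each failing criterion makes it easy to add the follow-up suggestion (documentation fix, or accept or replace the scenario) in the same line.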

When to stop

Stop when:

Review score is high (≥ 85%) AND eval average ≥ 85% with no regressions
2 improve iterations have been completed
The user says they're satisfied
Further improvements would require restructuring the tile significantly (suggest this as a separate effort)
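The stop conditions can be combined into a single check, sketched below. The function name and return strings are hypothetical, and applying the same 85% threshold to the review score is an assumption carried over from Step 1.

```python
def stop_reason(review_score, eval_average, has_regressions,
                improve_iterations, user_satisfied=False,
                needs_restructure=False, threshold=85.0, max_iterations=2):
    """Return a short reason to stop, or None to keep going."""
    if user_satisfied:
        return "user satisfied"
    if (review_score >= threshold and eval_average >= threshold
            and not has_regressions):
        return "targets met"
    if improve_iterations >= max_iterations:
        return "iteration cap reached"
    if needs_restructure:
        return "needs restructuring (separate effort)"
    return None
```

Any single condition is enough to stop; the order of the checks only decides which reason is reported when several apply.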

