tessl/skill-optimizer

Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.

Quality: 91% (Does it follow best practices?)

Impact: 92%, 1.10x (average score across 25 eval scenarios)

Security (by Snyk): Passed, no known issues


skills/setup-skill-performance/references/phase4-run-evals.md

Phase 4: Configure and Run Evals

4.1 Choose agents and models

For a first run, recommend keeping it simple:

"For a first run, I recommend just using claude:claude-sonnet-4-6 to keep eval time manageable (~10–15 minutes per scenario). Once you've validated the scenarios are good, you can add more agents to compare.

Want to go with the default, or test multiple agents now?

Available agents:

| Agent | Models |
| --- | --- |
| claude | claude-sonnet-4-6 (default), claude-opus-4-6, claude-sonnet-4-5, claude-opus-4-5, claude-haiku-4-5 |
| cursor | auto, composer-1.5 |

Note: Each additional agent multiplies the eval run time and cost."

Build the --agent flags from the user's choice. For a multi-agent run, pass each agent:model pair as its own --agent flag:

--agent=claude:claude-sonnet-4-6 --agent=cursor:auto
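As a rough illustration of the flag-building step, the sketch below joins a list of agent:model pairs into repeated --agent flags. The helper name `build_agent_flags` is hypothetical and not part of the tessl CLI; only the `--agent=<agent:model>` form comes from this document.

```shell
# Hypothetical helper (not part of tessl): emit one --agent flag per
# "agent:model" pair passed as an argument.
build_agent_flags() {
  flags=""
  for pair in "$@"; do
    # Add a separating space only when flags already has content.
    flags="${flags:+$flags }--agent=$pair"
  done
  printf '%s\n' "$flags"
}

# Example: the two-agent combination shown above.
build_agent_flags claude:claude-sonnet-4-6 cursor:auto
```

The resulting string can then be appended to the `tessl eval run <tile-path>` command in the next step.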

4.2a Run the evals (standard)

tessl eval run <tile-path> \
  --agent=<agent1:model1> \
  [--agent=<agent2:model2>]

Note the eval run URL from the output and share it with the user so they can optionally watch progress in the browser.

4.2b Run an activation check

tessl eval run <tile-path> --solver=activation

An activation check completes in ~2–3 minutes because no agent execution is needed. Poll as in Phase 4.3, but expect a much shorter wait. Then proceed to Phase 5 to view results.

4.3 Poll for completion

tessl eval list --mine --limit 1

Eval runs take ~10–15 minutes per scenario per agent. Each scenario runs twice: once as a baseline without context and once with context. Update the user periodically:

"Evals are running... Status: in_progress. With N scenarios and 1 agent, expect about X–Y minutes total. I'll check again shortly."
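One way to fill in the X–Y range in that message is to multiply the per-scenario estimate by the number of scenarios and agents. This is a minimal sketch with a hypothetical helper name, `estimate_eval_minutes`. It assumes the ~10–15 minute figure already covers both the baseline and with-context runs, and that scenarios run one after another. Neither assumption is confirmed by the source.

```shell
# Hypothetical helper: rough total wait in minutes for an eval run.
# Assumes 10-15 minutes per scenario per agent, run sequentially.
estimate_eval_minutes() {
  scenarios=$1
  agents=${2:-1}   # default to a single agent
  low=$((scenarios * agents * 10))
  high=$((scenarios * agents * 15))
  printf '%s-%s\n' "$low" "$high"
}

# Example: 5 scenarios with 1 agent.
estimate_eval_minutes 5 1   # prints 50-75
```

Treat the output as a planning figure for the status update, not a guarantee.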

Wait until status shows completed. If status shows failed, run:

tessl eval retry <id>
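The polling step above could be scripted roughly as follows. Both function names are hypothetical, and the sketch assumes the status words completed and failed appear verbatim somewhere in the command output. The exact output format of `tessl eval list` is not shown in this document, so adjust the matching to what the CLI actually prints.

```shell
# Map raw command output to one of: completed, failed, in_progress.
# Assumption: the status words appear verbatim in the output text.
classify_status() {
  case "$1" in
    *failed*)    echo failed ;;
    *completed*) echo completed ;;
    *)           echo in_progress ;;
  esac
}

# Re-run a status command until the run is no longer in progress.
# $1 = seconds between checks; remaining args = command that prints the status.
wait_for_eval() {
  interval=$1
  shift
  while :; do
    state=$(classify_status "$("$@")")
    if [ "$state" != in_progress ]; then
      echo "$state"
      return 0
    fi
    sleep "$interval"
  done
}

# Usage (not executed here):
#   wait_for_eval 60 tessl eval list --mine --limit 1
```

The loop has no timeout and treats empty output as still in progress, so a real script would likely want a maximum number of attempts. If the result is failed, follow up with the retry command shown above.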
