Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.
You handle tile eval setup — generating scenarios from a tile, running evals, and presenting results.
The user triggers this skill when they have a tile but no eval scenarios yet, or when they want to generate new scenarios.
Companion skill: After setup is complete, suggest the user run the optimize-skill-performance skill to analyze results, diagnose failures, fix tile content, and re-verify improvements.
Time expectations: Set these upfront so the user isn't surprised:
Before diving in, figure out what the user wants to accomplish in this session. If the user's request already makes the scope clear (e.g., "run my evals", "generate scenarios"), skip the question and go straight to the relevant phase.
Otherwise, ask:
"What would you like to do?
- Full pipeline — generate scenarios, run evals, and see results (start-to-finish, ~1 hour)
- Generate scenarios only — generate and download scenarios, but don't run evals yet
- Run evals on existing scenarios — skip generation, just run and compare results on scenarios already in evals/
- Something else — tell me what you need"
Map the user's choice to phases:
| Choice | Phases to run |
|---|---|
| Full pipeline | 1 → 2 → 3 → 4 → 5 → 6 |
| Generate scenarios only | 1 → 2 → 3 |
| Run evals on existing scenarios | 1 → 4 → 5 → 6 |
For partial runs, skip phases not in scope — don't load their reference files.
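The choice-to-phase mapping above can be sketched as a simple lookup. This is illustrative only — the skill just follows the table — and the dictionary keys here are lowercased versions of the choice labels:

```python
# Map each session scope to the phases it runs (mirrors the table above).
PHASES_BY_CHOICE = {
    "full pipeline": [1, 2, 3, 4, 5, 6],
    "generate scenarios only": [1, 2, 3],
    "run evals on existing scenarios": [1, 4, 5, 6],
}

def phases_for(choice: str) -> list[int]:
    """Return the phases to run; an unmapped choice ('Something else') needs follow-up."""
    key = choice.strip().lower()
    if key not in PHASES_BY_CHOICE:
        raise ValueError(f"Unmapped choice {choice!r}: ask the user what they need")
    return PHASES_BY_CHOICE[key]
```

Phases not in the returned list are skipped entirely, including their reference files.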
**Phase 1: Gather context**
Locate the tile and check for existing scenarios.
Read references/phase1-gather-context.md for the full procedure.
**Phase 2: Generate scenarios**
Run tessl scenario generate against the tile and review what was generated.
Read references/phase2-generate-scenarios.md for the full procedure.
**Phase 3: Download scenarios**
Download scenarios to evals/, verify the structure, and quality-check for rubric anti-patterns (answer leakage, double-counting, free points) before proceeding.
Read references/phase3-download-scenarios.md for the full procedure.
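The three rubric anti-patterns can be checked mechanically. The scenario shape below (`expected` answer plus a list of `rubric` criterion strings) is an assumption for illustration — the real scenario file format may differ, so treat this as a sketch of the checks, not a validator for actual files:

```python
def rubric_issues(scenario: dict) -> list[str]:
    """Flag common rubric anti-patterns in a (hypothetical) scenario dict."""
    issues = []
    expected = scenario.get("expected", "").lower()
    criteria = [c.lower() for c in scenario.get("rubric", [])]
    # Answer leakage: a criterion contains the expected answer verbatim,
    # so the grader effectively hands the model the answer.
    if expected and any(expected in c for c in criteria):
        issues.append("answer leakage")
    # Double-counting: the same criterion appears more than once,
    # inflating the score for a single behavior.
    if len(set(criteria)) < len(criteria):
        issues.append("double-counting")
    # Free points: criteria so generic that almost any response passes.
    generic = {"the response is helpful", "the response answers the question"}
    if any(c in generic for c in criteria):
        issues.append("free points")
    return issues
```

A clean scenario returns an empty list; anything flagged is worth fixing before spending eval runs on it.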
**Phase 4: Run evals**
Choose agents/models, run tessl eval run, and poll for completion.
Read references/phase4-run-evals.md for the full procedure.
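The reference file covers how to query run status; the exact tessl status command isn't shown here, so the polling step is sketched generically — `check` stands in for whatever status query the eval runner exposes:

```python
import time

def poll_until(check, timeout_s=3600, interval_s=5.0):
    """Call check() repeatedly until it returns a truthy result or timeout_s elapses.

    check: zero-argument callable wrapping the (assumed) eval status query.
    Returns check()'s first truthy result; raises TimeoutError otherwise.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = check()
        if result:
            return result
        time.sleep(interval_s)
    raise TimeoutError("eval run did not complete in time")
```

A generous timeout matters here: a full run over a couple dozen scenarios can take most of the session's hour.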
**Phase 5: View results**
Show baseline vs. with-context scores and per-scenario breakdown.
Read references/phase5-view-results.md for the full procedure.
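The comparison itself is simple arithmetic over the two score sets. The per-scenario score dicts below are hypothetical — real numbers come from the eval run — but the summary shape (per-scenario delta plus an overall lift ratio) is what this phase presents:

```python
def summarize(baseline: dict, with_context: dict) -> dict:
    """Build a per-scenario breakdown and overall lift from two score dicts.

    baseline / with_context: {scenario_name: score} with matching keys.
    """
    rows = {}
    for name in baseline:
        b, w = baseline[name], with_context[name]
        rows[name] = {"baseline": b, "with_context": w, "delta": w - b}
    avg_b = sum(baseline.values()) / len(baseline)
    avg_w = sum(with_context.values()) / len(with_context)
    return {
        "rows": rows,
        "avg_baseline": avg_b,
        "avg_with_context": avg_w,
        "lift": avg_w / avg_b,  # e.g. 1.05 means +5% from the tile's context
    }
```

Scenarios with a negative delta are the ones to hand to the companion optimize-skill-performance skill.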
**Phase 6: Next steps**
Summarize the setup, suggest next actions based on scores, and offer to continue.
Read references/phase6-next-steps.md for the full procedure.
Stop when:
- The user hands off to the optimize-skill-performance skill, or asks to stop.