tessl/skill-optimizer

Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.

1.22x

Quality

91%

Does it follow best practices?

Impact

86%

1.22x

Average score across 29 eval scenarios

Securityby

Advisory

Suggest reviewing before use

Phase 3: Fixtures and setup scripts

Name: tessl/skill-optimizer
Rating: 86.50999999999999 (1 reviews)
Author: tessl

Eval scenarios start in an empty working directory unless they declare environment preparation. Two kinds exist:

Fixtures — static context installed into the working directory before the agent runs. Declared as a named record under fixtures in scenario.json.
Setup scripts — shell scripts that run after fixtures install and before the agent task begins. Either declared as setup: ["./setup.sh"] in scenario.json, or auto-detected from a setup.sh placed next to the scenario.

This reference covers when each is needed, how to source/generate them silently when possible, and the skip rules that protect user intent.

Schema reminder

A scenario.json with both kinds of preparation looks like:

{
  "fixtures": {
    "codebase": {
      "type": "commit",
      "repoUrl": "https://github.com/acme/example.git",
      "ref": "main"
    }
  },
  "setup": ["./setup.sh"]
}

Fixture types:

commit — { type: "commit", repoUrl, ref, installPath?, exclude? } — git snapshot at a ref.
directory — { type: "directory", path, installPath } — local directory copied in.

Detection rules

For each scenario, read its task.md, criteria.json, and the parent tile's SKILL.md / docs/ to classify needs.

Fixture-warranted signals

Fire if any of these apply:

The tile's SKILL.md talks about modifying or refactoring existing code, or uses phrases like "this codebase", "your repo", or names file paths the user is expected to edit.
The scenario's task.md references files that must already exist — e.g. "fix the bug in src/foo.ts", "update the migration in db/".
The rubric in criteria.json checks for edits to existing files rather than from-scratch creation.

Setup-script-warranted signals

Fire if any of these apply:

The tile mentions an external CLI or dependency the agent must run (npm install, pip install, brew install, cargo build).
The task assumes a running service, database, env var, credentials, or migration state that can't be expressed as static files.

Skip rules (always honour user intent)

Do not generate or overwrite when:

scenario.json already declares a fixtures record — leave it alone.
scenario.json already declares setup, or a setup.sh already exists next to the scenario — leave it alone.

Skip rules apply per-scenario. A tile with five scenarios may need fixture generation on three of them and nothing on the other two.

Fixture sourcing — silent inference first

When the fixture signal fires for a scenario:

Scan the tile silently. Look in SKILL.md, the tile's docs/, and any README.md for:
- A repo URL (anything matching https://github.com/… or a git@github.com:… reference) together with an obvious ref (branch name, tag, or commit), OR
- A sample directory path inside the tile (e.g. examples/, fixtures/).
If a plausible source is found, write it into scenario.json under fixtures.<name> using the schema above. Name the fixture descriptively — codebase for the primary repo snapshot is conventional; use examples or similar for directory fixtures.
If nothing plausible is found, ask the user once:

"Scenario <slug> looks like it needs an existing codebase to operate on, but I couldn't find a repo URL or sample path in the tile. Want to provide one (<repo-url>#<ref> or a local directory path), or skip fixture generation for this scenario?"
If the user declines or can't provide a source, skip fixture generation for that scenario, record it in the pre-run summary, and continue. Do not stop the phase — the user has explicitly opted into a degraded eval.

Be silent when signals are clear. Only ask when sourcing genuinely can't be inferred.

Setup-script generation

When the setup-script signal fires for a scenario:

Generate setup.sh next to the scenario (in the same directory as scenario.json / task.md). The eval system auto-detects this file — there's no need to declare it in scenario.json unless you want to be explicit.
Script content: the minimum init commands implied by the signal. Include a shebang on the first line — the scenario lint rule warns without it.
```
#!/bin/bash
set -euo pipefail

npm install
```
chmod +x setup.sh after writing so the eval runner can execute it.
If the signal isn't specific enough to write a concrete script (e.g. you can see the tile expects a database but not which migrations to run), ask the user to confirm the commands before writing:
"Scenario <slug> looks like it needs setup before the agent runs. My best guess is:
```
#!/bin/bash
set -euo pipefail
npm install
```
Use this, or do you want to provide different commands?"

Pre-run summary

After processing all scenarios — and before kicking off tessl eval run — show the user one concise summary so they can catch wrong inferences early:

Generated 2 fixtures and 1 setup script across 4 scenarios.
  - checkout-flow: fixture (commit: acme/example#main), setup.sh (npm install)
  - webhook-setup: fixture (commit: acme/example#main)
  - custom-scenario: fixture (user-declared), setup.sh (user-declared)
  - empty-state: no preparation needed

For scenarios where skip rules fired (a fixtures record, setup array, or setup.sh was already present), list each as user-declared rather than no preparation needed, so the user can confirm their declarations were respected and catch any mismatches.

If any scenario was skipped because the user declined to provide a source, list it here so the user knows the eval will run that scenario in an empty workdir.