
tessl/skill-optimizer

Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.

Quality: 91% (Does it follow best practices?)
Impact: 92%, 1.10x (average score across 25 eval scenarios)
Security by Snyk: Passed, no known issues


evals/scenario-6/criteria.json

{
  "context": "Tests whether the agent correctly determines which type of eval (activation vs scored) to run first based on tile structure and existing results, and uses the correct command to detect skill count. The tile is multi-skill with no existing eval results — the correct strategy is to run activation first.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Skill count detection command",
      "description": "Script or notes include the command: ls skills/*/SKILL.md 2>/dev/null | wc -l (or functionally equivalent using find) to count the number of skills in the tile",
      "max_score": 14
    },
    {
      "name": "Activation eval run first",
      "description": "Script runs tessl eval run with --solver=activation as the FIRST eval command (before any scored eval run), since the tile is multi-skill with no existing results",
      "max_score": 20
    },
    {
      "name": "Scored eval follows activation",
      "description": "Script or notes describe running scored evals (tessl eval run without --solver=activation) AFTER reviewing activation results, not before",
      "max_score": 14
    },
    {
      "name": "Routing-clean gate explained",
      "description": "Notes file explains that scored evals should only be run for scenarios where routing is clean (activation succeeded), to avoid wasting scored-eval time on scenarios that route to the wrong skill",
      "max_score": 12
    },
    {
      "name": "Skip activation condition stated",
      "description": "Notes file states that activation would be skipped entirely if the tile had only a single skill (since there is no routing to test for single-skill tiles)",
      "max_score": 14
    },
    {
      "name": "Correct eval run command format",
      "description": "Script uses tessl eval run <path/to/tile> as the base command format (not tessl run eval or other variants)",
      "max_score": 8
    },
    {
      "name": "--solver=activation flag used",
      "description": "Script uses --solver=activation flag in the first eval run command (not --type=activation or other variant)",
      "max_score": 10
    },
    {
      "name": "Tile path used consistently",
      "description": "Script references ./invoice-processor/ (or a variable holding that path) rather than a hardcoded absolute path",
      "max_score": 8
    }
  ]
}
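
The decision the checklist describes (count the skills, then run activation first for a multi-skill tile) might be sketched as follows. This is a minimal illustration, not the skill's own script: the helper names `count_skills` and `first_eval_command` are hypothetical, placing `--solver=activation` after the tile path is an assumption, and the check for existing eval results is left out because the criteria do not say how to detect them. The functions only print the command they would run, so nothing is executed against `tessl`.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of the tessl CLI.

# Tile path from the checklist; relative, never a hardcoded absolute path.
TILE="${TILE:-./invoice-processor}"

# Count skills in a tile using the command quoted in the checklist.
# Prints 0 when the tile has no skills/*/SKILL.md files.
count_skills() {
  ( cd "$1" 2>/dev/null && ls skills/*/SKILL.md 2>/dev/null | wc -l | tr -d ' ' )
}

# Print the eval command to run first for the given tile.
# Multi-skill tile: activation first, because routing has to be checked.
# Single-skill tile: activation is skipped, since there is no routing to test.
first_eval_command() {
  local tile="$1" n
  n="$(count_skills "$tile")"
  n="${n:-0}"   # tile directory missing: treat as zero skills
  if [ "$n" -gt 1 ]; then
    # Flag position after the path is an assumption.
    echo "tessl eval run $tile --solver=activation"
  else
    echo "tessl eval run $tile"
  fi
}

first_eval_command "$TILE"
```

After reviewing the activation results, the scored run (`tessl eval run` without `--solver=activation`) would follow, and only for scenarios whose routing came out clean.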
