
tessl-labs/skill-optimizer

Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.

Quality: 94% (Does it follow best practices?)
Impact: 88% (1.07x), average score across 24 eval scenarios
Security (by Snyk): Passed, no known issues


evals/scenario-24/criteria.json

{
  "context": "Testing whether an agent following the setup-skill-performance skill correctly guides a new user through the full eval setup pipeline, including prerequisites, commit browsing, context file detection, scenario generation, and running the eval.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "checks_prerequisites",
      "description": "The agent verifies the user is logged in (e.g., runs `tessl whoami`) before proceeding with setup steps.",
      "max_score": 8
    },
    {
      "name": "browses_commits",
      "description": "The agent runs `tessl repo select-commits acme/backend` to show actual commits from the repo, rather than asking the user to supply commit hashes directly without any browsing step.",
      "max_score": 25
    },
    {
      "name": "auto_detects_context_files",
      "description": "The agent searches the repository for context files (CLAUDE.md, *.mdc, AGENTS.md, tessl.json, etc.) automatically — rather than asking the user to specify them without any investigation.",
      "max_score": 17
    },
    {
      "name": "uses_context_flag",
      "description": "The agent includes a `--context` flag when running `tessl scenario generate`, specifying appropriate glob patterns for the detected context files.",
      "max_score": 17
    },
    {
      "name": "workspace_in_eval_run",
      "description": "For project evals (codebase evaluations using git commits), the agent includes `--workspace=<name>` when running `tessl eval run`. Omitting `--workspace` would cause the command to fail.",
      "max_score": 17
    },
    {
      "name": "explains_baseline_vs_context",
      "description": "The agent explains that each scenario runs twice — once without context files (baseline) and once with them injected — and that the delta shows whether CLAUDE.md is helping the agent.",
      "max_score": 16
    }
  ]
}
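
The six `max_score` values above sum to 100, so each weight reads directly as a percentage of the scenario's total. As a rough sketch of how a `weighted_checklist` might be tallied, the Python below sums awarded points and caps each item at its `max_score`. The aggregation rule, the `total_score` helper, and the sample grades are assumptions made for illustration, not the documented behavior of the Tessl grader.

```python
import json

# Trimmed copy of the checklist above: names and max_score values only.
CRITERIA = json.loads("""
{
  "type": "weighted_checklist",
  "checklist": [
    {"name": "checks_prerequisites", "max_score": 8},
    {"name": "browses_commits", "max_score": 25},
    {"name": "auto_detects_context_files", "max_score": 17},
    {"name": "uses_context_flag", "max_score": 17},
    {"name": "workspace_in_eval_run", "max_score": 17},
    {"name": "explains_baseline_vs_context", "max_score": 16}
  ]
}
""")


def total_score(criteria, awarded):
    """Return (earned, possible), clamping each item to [0, max_score].

    `awarded` maps checklist item names to points; missing items score 0.
    This aggregation rule is an assumption, not documented grader behavior.
    """
    earned = 0
    possible = 0
    for item in criteria["checklist"]:
        cap = item["max_score"]
        points = awarded.get(item["name"], 0)
        earned += max(0, min(points, cap))
        possible += cap
    return earned, possible


# Hypothetical grades: the agent skipped commit browsing and only
# partly explained the baseline-vs-context comparison.
example = {
    "checks_prerequisites": 8,
    "browses_commits": 0,
    "auto_detects_context_files": 17,
    "uses_context_flag": 17,
    "workspace_in_eval_run": 17,
    "explains_baseline_vs_context": 8,
}

earned, possible = total_score(CRITERIA, example)
print(f"{earned}/{possible} = {earned / possible:.0%}")  # prints "67/100 = 67%"
```

Under this assumed rule, skipping the commit-browsing step alone costs a quarter of the scenario's score, which matches its position as the heaviest item in the checklist.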
