
tessl-labs/skill-optimizer

Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.

Quality

94%

Does it follow best practices?

Impact

88%

1.07x

Average score across 24 eval scenarios

Security by Snyk

Passed

No known issues


evals/scenario-23/criteria.json

{
  "context": "Tests whether the agent correctly applies the setup-skill-performance skill's commit selection criteria: hard-skip gates to eliminate trivial/docs/config/generated commits, the 7 complexity signals to score survivors, and a final recommendation of only the structurally complex commits.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "skips_trivial_commits",
      "description": "Commits 1 (rename, 1 file, 0 lines) and 4 (utility, 2 files, 40 lines) are rejected by the hard-skip gates: each changes fewer than 3 source files or fewer than 50 lines of source code",
      "max_score": 15
    },
    {
      "name": "skips_docs_and_config_only",
      "description": "Commit 2 (README + CONTRIBUTING) is rejected as docs-only and commit 3 (package.json + lock file) is rejected as config/generated-only",
      "max_score": 10
    },
    {
      "name": "skips_mechanical_generated_commit",
      "description": "Commit 7 (198-line SQL migration in 1 file) is rejected despite its large line count — it is a single auto-generated migration file with no structural complexity",
      "max_score": 10
    },
    {
      "name": "scores_payment_commit_high",
      "description": "Commit 5 (payment processing) receives a high complexity score (4+/7), recognizing signals such as: new abstractions (PaymentRequest types, StripeClient), cross-cutting scope (routes, middleware, services, webhooks), wiring/registration (route + middleware + webhook handler integration), and domain-specific logic (payment flows, idempotency)",
      "max_score": 15
    },
    {
      "name": "scores_auth_refactor_highest",
      "description": "Commit 6 (auth system refactor) receives the highest complexity score among all commits (5+/7), recognizing it hits more signals than commit 5 due to: refactoring existing code (not just adding new), migrating a data store, multiple interdependent changes, and cross-cutting scope across 8 files in auth/config/routes",
      "max_score": 10
    },
    {
      "name": "references_complexity_signals",
      "description": "The analysis explicitly uses at least 5 of the 7 named complexity signals: new abstractions, cross-cutting scope, wiring and registration, non-obvious control flow, domain-specific logic, multiple interdependent changes, no single-point solution",
      "max_score": 15
    },
    {
      "name": "recommends_two_or_three_commits",
      "description": "Final recommendation includes 2 or 3 commits (not 1, not 4+). The selected set must include commits 5 and 6; commit 4 is acceptable as a borderline third pick only if accompanied by a caveat about its lower complexity",
      "max_score": 10
    },
    {
      "name": "explains_selection_rationale",
      "description": "For each recommended commit, provides a specific explanation of why it would produce a challenging eval scenario — not just restating the complexity score but explaining what about the commit would be hard for an agent to reproduce without codebase context",
      "max_score": 15
    }
  ]
}
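
The selection flow these criteria describe (hard-skip gates first, then a count of complexity signals for the survivors) can be sketched roughly as follows. This is an illustration under stated assumptions, not the skill's actual implementation: the `Commit` shape, the `kind` labels, and the `min_score=4` cutoff are hypothetical, the file and line counts for commit 5 and the line count for commit 6 are placeholders, and the signal set assigned to commit 6 is only one plausible reading of the checklist.

```python
from dataclasses import dataclass
from typing import Optional

# The 7 named complexity signals, as listed in the checklist.
SIGNALS = frozenset({
    "new abstractions",
    "cross-cutting scope",
    "wiring and registration",
    "non-obvious control flow",
    "domain-specific logic",
    "multiple interdependent changes",
    "no single-point solution",
})


@dataclass
class Commit:
    number: int
    kind: str            # hypothetical label: "source", "docs", "config", "generated"
    source_files: int    # source files changed
    source_lines: int    # lines of source code changed
    signals: frozenset = frozenset()


def hard_skip_reason(c: Commit) -> Optional[str]:
    """Return why a commit is skipped, or None if it survives the gates."""
    if c.kind == "docs":
        return "docs-only"
    if c.kind == "config":
        return "config/generated-only"
    if c.kind == "generated":
        return "mechanical generated change"
    if c.source_files < 3:
        return "fewer than 3 source files"
    if c.source_lines < 50:
        return "fewer than 50 source lines"
    return None


def complexity_score(c: Commit) -> int:
    """Count how many of the 7 named signals the commit hits."""
    return len(c.signals & SIGNALS)


def recommend(commits, min_score=4):
    """Rank gate survivors by score; keep those at or above the assumed cutoff."""
    survivors = [c for c in commits if hard_skip_reason(c) is None]
    ranked = sorted(survivors, key=complexity_score, reverse=True)
    return [c.number for c in ranked if complexity_score(c) >= min_score]


COMMITS = [
    Commit(1, "source", 1, 0),
    Commit(2, "docs", 0, 0),
    Commit(3, "config", 0, 0),
    Commit(4, "source", 2, 40),
    # Signals for commit 5 follow the checklist; 6 files / 300 lines are placeholders.
    Commit(5, "source", 6, 300, frozenset({
        "new abstractions", "cross-cutting scope",
        "wiring and registration", "domain-specific logic",
    })),
    # 8 files comes from the checklist; 400 lines and the signal set are assumed.
    Commit(6, "source", 8, 400, frozenset({
        "cross-cutting scope", "wiring and registration",
        "non-obvious control flow", "multiple interdependent changes",
        "no single-point solution",
    })),
    Commit(7, "generated", 1, 198),
]

for c in COMMITS:
    print(c.number, hard_skip_reason(c) or f"score {complexity_score(c)}/7")
print("recommended:", recommend(COMMITS))
```

The type checks run before the size gates so that commit 7 is reported as a mechanical generated change rather than as a small commit, which matches how the checklist describes its rejection.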

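A `weighted_checklist` like the one above can be read as a set of per-item point budgets. The sketch below shows one way a grader's awarded points might be turned into a percentage; the `percent_score` helper and the aggregation rule (sum of awarded points over sum of `max_score`) are assumptions for illustration, not a description of how the platform actually computes scores. The `max_score` values are copied from the checklist.

```python
# max_score per checklist item, copied from criteria.json.
MAX_SCORES = {
    "skips_trivial_commits": 15,
    "skips_docs_and_config_only": 10,
    "skips_mechanical_generated_commit": 10,
    "scores_payment_commit_high": 15,
    "scores_auth_refactor_highest": 10,
    "references_complexity_signals": 15,
    "recommends_two_or_three_commits": 10,
    "explains_selection_rationale": 15,
}


def percent_score(awarded: dict) -> float:
    """Assumed aggregation: awarded points as a percentage of the total budget."""
    for name, points in awarded.items():
        if name not in MAX_SCORES:
            raise KeyError(f"unknown checklist item: {name}")
        if not 0 <= points <= MAX_SCORES[name]:
            raise ValueError(f"{name}: {points} is outside 0..{MAX_SCORES[name]}")
    # Items missing from `awarded` count as zero points.
    return 100 * sum(awarded.values()) / sum(MAX_SCORES.values())


# Hypothetical grading: full marks everywhere except the docs/config item.
awarded = dict(MAX_SCORES, skips_docs_and_config_only=0)
print(sum(MAX_SCORES.values()))   # total budget of this checklist
print(percent_score(awarded))
```

Because the eight budgets in this checklist add up to 100, awarded points and percentage coincide here; the division only matters for checklists whose budgets sum to something else.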