tessl/skill-optimizer

Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.

Quality: 91% (Does it follow best practices?)

Impact: 92%, 1.10x (average score across 25 eval scenarios)

Security by Snyk: Passed, no known issues

evals/scenario-1/criteria.json

{
  "context": "Tests whether the agent correctly analyzes activation eval results: produces a skill coverage summary identifying skills that never fired, applies zero-activation analysis by cross-referencing scored eval data to distinguish routing gaps from out-of-scope tasks, and auto-suggests minimal description rewrites for confirmed routing gaps.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Routing table present",
      "description": "Report includes a table or list showing which skill fired for each of the six scenarios (three with activations, three with no activation)",
      "max_score": 8
    },
    {
      "name": "Skill coverage summary correct",
      "description": "Report includes a skill coverage summary showing all three skills (markdown-formatter, citation-generator, link-checker) each fired in at least one scenario — the report does NOT claim any skill never fired, since each appears in the activated skills data at least once",
      "max_score": 10
    },
    {
      "name": "rewrite-intro out-of-scope determination",
      "description": "The zero-activation analysis for 'rewrite-intro-paragraph' concludes it is out of scope (not a routing gap), supported by the high baseline score of 88% showing the agent handles this task well without the skill",
      "max_score": 12
    },
    {
      "name": "generate-bibliography routing gap determination",
      "description": "The zero-activation analysis for 'generate-bibliography' concludes it IS a routing gap — the agent struggles (31% baseline) and the skill isn't firing despite the task being within the tile's domain (bibliography/citations)",
      "max_score": 12
    },
    {
      "name": "fix-heading-hierarchy routing gap determination",
      "description": "The zero-activation analysis for 'fix-heading-hierarchy' concludes it IS a routing gap — the agent benefits from the skill (22% baseline → 67% with context, +45 delta) but the skill is not being activated",
      "max_score": 12
    },
    {
      "name": "citation-generator description rewrite",
      "description": "A proposed description rewrite for citation-generator is present that adds terminology covering bibliography or IEEE format (or reference lists) — expanding beyond APA, MLA, Chicago to cover the missed 'generate-bibliography' scenario",
      "max_score": 12
    },
    {
      "name": "markdown-formatter description rewrite",
      "description": "A proposed description rewrite for markdown-formatter is present that adds coverage for heading hierarchy correction, heading levels, or heading nesting — expanding beyond table alignment and code blocks",
      "max_score": 12
    },
    {
      "name": "Minimal rewrite principle",
      "description": "The proposed description rewrites add the missing trigger phrasing without completely replacing the existing description — the core original description language is preserved",
      "max_score": 8
    },
    {
      "name": "Rewrites presented together",
      "description": "All proposed description changes are presented together in a summary section (not scattered throughout the document), making it easy to review and approve them as a batch",
      "max_score": 8
    },
    {
      "name": "Scored eval data cited",
      "description": "The zero-activation analysis references specific numbers from the scored eval data (baseline scores and/or deltas) to support the routing gap vs out-of-scope determination",
      "max_score": 6
    }
  ]
}
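
The checklist above assigns each criterion a `max_score`, and the ten values sum to 100. The page does not say how the grader combines them, so the following is only a minimal sketch under the assumption that a scenario's score is the awarded points divided by the total available points. The function names `total_max_score` and `percent_score`, and the `awarded` mapping, are hypothetical and not part of the tile.

```python
import json

# Trimmed copy of the checklist above: names and max_score values only.
CRITERIA_JSON = """
{
  "type": "weighted_checklist",
  "checklist": [
    {"name": "Routing table present", "max_score": 8},
    {"name": "Skill coverage summary correct", "max_score": 10},
    {"name": "rewrite-intro out-of-scope determination", "max_score": 12},
    {"name": "generate-bibliography routing gap determination", "max_score": 12},
    {"name": "fix-heading-hierarchy routing gap determination", "max_score": 12},
    {"name": "citation-generator description rewrite", "max_score": 12},
    {"name": "markdown-formatter description rewrite", "max_score": 12},
    {"name": "Minimal rewrite principle", "max_score": 8},
    {"name": "Rewrites presented together", "max_score": 8},
    {"name": "Scored eval data cited", "max_score": 6}
  ]
}
"""


def total_max_score(criteria: dict) -> int:
    """Sum the points available across all checklist items."""
    return sum(item["max_score"] for item in criteria["checklist"])


def percent_score(criteria: dict, awarded: dict) -> float:
    """Assumed aggregation: awarded points over available points, as a percentage.

    `awarded` maps criterion name to points given by a grader (hypothetical input).
    Missing criteria count as 0, and each award is clamped to [0, max_score].
    """
    earned = 0
    for item in criteria["checklist"]:
        points = awarded.get(item["name"], 0)
        earned += min(max(points, 0), item["max_score"])
    return 100.0 * earned / total_max_score(criteria)


criteria = json.loads(CRITERIA_JSON)
```

For example, a report that earns full marks only on the first two criteria would receive 18 of the 100 available points under this assumed scheme.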