
tessl-labs/skill-optimizer

Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.

Quality: 94% (Does it follow best practices?)
Impact: 88%, 1.07x (average score across 24 eval scenarios)
Security (by Snyk): Passed, no known issues


evals/scenario-22/criteria.json

{
  "context": "Tests whether the agent correctly identifies and applies the three key scoring dimensions with their weights (Completeness 40%, Conciseness 30%, Actionability 30%), correctly identifies the missing 'Use when...' clause as the highest-impact fix, recommends executable code over abstract prose, and flags known-concept content that reduces conciseness.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Completeness weight correct",
      "description": "Plan states that Completeness has the highest weight (40%) among the dimensions",
      "max_score": 10
    },
    {
      "name": "Conciseness weight correct",
      "description": "Plan states Conciseness is 30% weight",
      "max_score": 8
    },
    {
      "name": "Actionability weight correct",
      "description": "Plan states Actionability is 30% weight",
      "max_score": 8
    },
    {
      "name": "Use when clause highest impact",
      "description": "Plan identifies adding a 'Use when...' clause as the highest-priority or highest-impact change",
      "max_score": 15
    },
    {
      "name": "Use when quantified",
      "description": "Plan states that the missing 'Use when...' clause adds approximately +15% to overall score (or is flagged as the most valuable single change)",
      "max_score": 10
    },
    {
      "name": "Revised description includes Use when",
      "description": "The revised description block in the output includes a 'Use when...' clause (e.g. 'Use when you need to retrieve, rotate, or audit secrets...')",
      "max_score": 12
    },
    {
      "name": "Executable code recommended",
      "description": "Plan recommends replacing abstract prose instructions with executable code examples (for actionability)",
      "max_score": 12
    },
    {
      "name": "Known concepts flagged",
      "description": "Plan identifies that the existing prose explains things an agent already knows (e.g. 'don't hardcode secrets', 'use environment variables') as reducing Conciseness",
      "max_score": 10
    },
    {
      "name": "High-impact first ordering",
      "description": "Quick wins are ordered so the highest-impact item (Use when clause) appears first",
      "max_score": 10
    },
    {
      "name": "Dimension coverage",
      "description": "Plan covers all three main scoring dimensions (Completeness, Conciseness, Actionability) with separate assessments",
      "max_score": 5
    }
  ]
}
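The page does not say how a `weighted_checklist` is aggregated. As a minimal sketch, assuming the result is simply points earned divided by the sum of `max_score` values, the checklist above could be scored as follows. The names `MAX_SCORES`, `checklist_percentage`, `DIMENSION_WEIGHTS`, and `weighted_overall` are hypothetical and are not part of any Tessl API; only the `max_score` values and the 40/30/30 dimension weights come from the criteria file.

```python
# max_score values copied from the checklist in criteria.json above.
MAX_SCORES = {
    "Completeness weight correct": 10,
    "Conciseness weight correct": 8,
    "Actionability weight correct": 8,
    "Use when clause highest impact": 15,
    "Use when quantified": 10,
    "Revised description includes Use when": 12,
    "Executable code recommended": 12,
    "Known concepts flagged": 10,
    "High-impact first ordering": 10,
    "Dimension coverage": 5,
}

# Dimension weights named in the "context" field of criteria.json.
DIMENSION_WEIGHTS = {
    "Completeness": 0.40,
    "Conciseness": 0.30,
    "Actionability": 0.30,
}


def checklist_percentage(awarded):
    """Return the percentage of checklist points earned.

    ASSUMPTION: the score is sum(awarded) / sum(max_score). Items missing
    from `awarded` count as 0, and each award is clamped to [0, max_score].
    """
    total_max = sum(MAX_SCORES.values())
    earned = 0.0
    for name, max_score in MAX_SCORES.items():
        points = awarded.get(name, 0)
        earned += min(max(points, 0), max_score)
    return 100.0 * earned / total_max


def weighted_overall(dimension_scores):
    """Combine per-dimension scores (0-100) using the 40/30/30 weights.

    ASSUMPTION: the overall score is a plain weighted sum of the dimensions.
    """
    return sum(
        weight * dimension_scores[name]
        for name, weight in DIMENSION_WEIGHTS.items()
    )


if __name__ == "__main__":
    # A plan that satisfies every item except the highest-weighted one.
    awarded = dict(MAX_SCORES)
    awarded["Use when clause highest impact"] = 0
    print(checklist_percentage(awarded))
    print(weighted_overall(
        {"Completeness": 50, "Conciseness": 100, "Actionability": 100}
    ))
```

Because the ten `max_score` values sum to 100, each item's `max_score` reads directly as its percentage share under this assumed scheme, which is why the 'Use when...' item (15 points) is the single most costly one to miss.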
