
tessl-labs/skill-optimizer

Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.

Quality: 94% (Does it follow best practices?)

Impact: 88%, 1.07x (Average score across 24 eval scenarios)

Security by Snyk: Passed (No known issues)


evals/scenario-7/criteria.json

{
  "context": "Tests whether the agent correctly processes skill review output into prioritized, well-formatted recommendations following the skill-optimizer workflow. Covers priority ordering (Critical first), the required per-recommendation format, and the summary presentation format.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Critical issues first",
      "description": "Recommendations are ordered with Critical issues (ERRORs) presented before High, Medium, and Low priority items",
      "max_score": 12
    },
    {
      "name": "High before Medium/Low",
      "description": "High priority items (missing 'Use when...', low actionability) appear before Medium and Low items in the document",
      "max_score": 8
    },
    {
      "name": "Summary with priorities",
      "description": "Document includes a summary section that groups all recommendations by priority level (Critical/High/Medium/Low) before individual details",
      "max_score": 10
    },
    {
      "name": "Expected improvement in summary",
      "description": "Summary section includes expected overall score improvement (e.g. percentage gain estimate)",
      "max_score": 8
    },
    {
      "name": "Dimension score included",
      "description": "Each recommendation states the current score for the affected dimension (e.g. 'Completeness: 1/3')",
      "max_score": 10
    },
    {
      "name": "Before/after examples",
      "description": "Each recommendation includes a concrete before and after example showing the proposed text change",
      "max_score": 12
    },
    {
      "name": "Impact stated per recommendation",
      "description": "Each recommendation includes the expected score impact (e.g. '+15% overall')",
      "max_score": 8
    },
    {
      "name": "Educational WHY included",
      "description": "Each recommendation explains WHY the change improves the skill — not just what to change, but why it helps the dimension",
      "max_score": 12
    },
    {
      "name": "All four issues addressed",
      "description": "Recommendations cover all key issues from the review: missing Use-when clause, broken file reference, abstract instructions, and known-concepts bloat",
      "max_score": 10
    },
    {
      "name": "Approval framing",
      "description": "Document is framed as proposed changes for review/approval — not already applied changes",
      "max_score": 10
    }
  ]
}
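
The ten `max_score` values above sum to 100, so one plausible reading of `weighted_checklist` is that the points awarded per item are added up and reported as a share of the total. The file does not state the aggregation rule, so the sketch below is an assumption: `weighted_score`, the `awarded` mapping, and the clamping behavior are hypothetical names and choices used for illustration only, not the evaluator's actual implementation.

```python
# Checklist item names and max_score values copied from criteria.json above.
CHECKLIST = [
    ("Critical issues first", 12),
    ("High before Medium/Low", 8),
    ("Summary with priorities", 10),
    ("Expected improvement in summary", 8),
    ("Dimension score included", 10),
    ("Before/after examples", 12),
    ("Impact stated per recommendation", 8),
    ("Educational WHY included", 12),
    ("All four issues addressed", 10),
    ("Approval framing", 10),
]


def weighted_score(checklist, awarded):
    """Return the percentage of available points earned.

    `awarded` maps a checklist item name to the points a grader gave it.
    Assumption: missing items count as 0 and each award is clamped to
    the range [0, max_score] before summing.
    """
    total = sum(max_score for _, max_score in checklist)
    earned = 0
    for name, max_score in checklist:
        points = awarded.get(name, 0)
        earned += max(0, min(points, max_score))
    return 100.0 * earned / total


if __name__ == "__main__":
    # Hypothetical grading: full marks on ordering, half marks on examples.
    example = {"Critical issues first": 12, "Before/after examples": 6}
    print(f"{weighted_score(CHECKLIST, example):.1f}%")
```

Under this reading, the heaviest items (priority ordering, before/after examples, and the educational WHY, at 12 points each) move the result most, which matches the emphasis in the `context` field.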
