tessl-labs/skill-optimizer

Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.

Quality: 94% (Does it follow best practices?)

Impact: 88%, 1.07x (average score across 24 eval scenarios)

Security (by Snyk): Passed, no known issues

evals/scenario-10/criteria.json

{
  "context": "Tests whether the agent runs all five Phase 8 final accuracy checks: code syntax validity, command flag correctness, file reference existence, 'Use when...' clause in description, and absence of concepts the agent already knows. The SKILL.md has a Python syntax error, a missing Use-when clause, and explanatory text about known concepts (HTTP/TCP).",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Code syntax check included",
      "description": "Report includes a dedicated check for code syntax validity (covering Python, JavaScript, or bash code blocks)",
      "max_score": 10
    },
    {
      "name": "Python syntax error found",
      "description": "Report correctly identifies the Python syntax error (missing closing parenthesis in `json.loads(result.stdout`)",
      "max_score": 15
    },
    {
      "name": "Command flags check included",
      "description": "Report includes a dedicated check for command flag correctness (validating logq or other command flags)",
      "max_score": 8
    },
    {
      "name": "File references check included",
      "description": "Report includes a dedicated check for whether all linked files exist",
      "max_score": 8
    },
    {
      "name": "File reference passes",
      "description": "Report correctly identifies that QUERY_PATTERNS.md reference passes (file exists in the bundle)",
      "max_score": 8
    },
    {
      "name": "Use when clause check included",
      "description": "Report includes a dedicated check for whether the description contains a 'Use when...' trigger clause",
      "max_score": 12
    },
    {
      "name": "Use when clause fails",
      "description": "Report correctly identifies that the description is MISSING a 'Use when...' clause (current description doesn't start with or include 'Use when')",
      "max_score": 12
    },
    {
      "name": "Known concepts check included",
      "description": "Report includes a dedicated check for whether the skill explains concepts that an agent already knows",
      "max_score": 10
    },
    {
      "name": "Known concepts issue found",
      "description": "Report flags the HTTP/TCP explanation ('HTTP is a stateless request/response protocol...') as content an agent already knows",
      "max_score": 12
    },
    {
      "name": "Readiness summary",
      "description": "Report ends with a clear summary of whether the skill is ready to publish or lists required fixes first",
      "max_score": 5
    }
  ]
}
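
As a rough illustration of how a `weighted_checklist` like the one above might be turned into a single score, the sketch below sums the points awarded per item and divides by the total of the `max_score` values (which add up to 100 here). The aggregation rule, the `score_report` helper, and the `awarded` mapping are assumptions made for illustration; the source does not state how the eval runner actually combines the items.

```python
import json

# Names and max_score values copied from the checklist above;
# descriptions are omitted to keep the sketch short.
CRITERIA = json.loads("""
{
  "type": "weighted_checklist",
  "checklist": [
    {"name": "Code syntax check included", "max_score": 10},
    {"name": "Python syntax error found", "max_score": 15},
    {"name": "Command flags check included", "max_score": 8},
    {"name": "File references check included", "max_score": 8},
    {"name": "File reference passes", "max_score": 8},
    {"name": "Use when clause check included", "max_score": 12},
    {"name": "Use when clause fails", "max_score": 12},
    {"name": "Known concepts check included", "max_score": 10},
    {"name": "Known concepts issue found", "max_score": 12},
    {"name": "Readiness summary", "max_score": 5}
  ]
}
""")


def total_weight(criteria):
    """Sum of max_score over every checklist item."""
    return sum(item["max_score"] for item in criteria["checklist"])


def score_report(criteria, awarded):
    """Return the fraction of available points earned.

    `awarded` is a hypothetical mapping of item name -> points given by a
    grader. Missing items count as 0, and each value is clamped to the
    range [0, max_score] (an assumed rule, not taken from the source).
    """
    earned = 0
    for item in criteria["checklist"]:
        points = awarded.get(item["name"], 0)
        earned += max(0, min(points, item["max_score"]))
    return earned / total_weight(criteria)


if __name__ == "__main__":
    # Example: a report that only catches the two "fails" findings in full.
    partial = {"Python syntax error found": 15, "Use when clause fails": 12}
    print(total_weight(CRITERIA))
    print(score_report(CRITERIA, partial))
```

Under this assumed rule, a report that merely includes each dedicated check without finding the planted issues would still earn points for the five "check included" items, which is why the finding-specific items carry separate weights.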