tessl-labs/skill-optimizer

Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.

Quality: 94% (Does it follow best practices?)

Impact: 88% (1.07x), average score across 24 eval scenarios

Security (by Snyk): Passed, no known issues

evals/scenario-14/criteria.json

{
  "context": "Tests whether the agent understands progressive disclosure as routing clarity rather than just splitting files. The SKILL.md contains 10 file references: 5 with clear routing signals (AUTHENTICATION.md, RETRIES.md, RATE_LIMITING.md, WEBHOOKS.md, ERROR_HANDLING.md) and 5 with ambiguous signals (CONFIGURATION.md, GUIDE.md, EXAMPLES.md, ADVANCED.md, REFERENCE.md). The agent should identify which references allow confident routing decisions and which force speculative opening.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Identifies good references",
      "description": "Correctly identifies at least 3 of the good references (AUTHENTICATION.md with OAuth2/token details, RETRIES.md with 5xx/timeout context, RATE_LIMITING.md with token bucket/429 details, ERROR_HANDLING.md with 4xx/5xx debugging, WEBHOOKS.md with signature verification)",
      "max_score": 15
    },
    {
      "name": "Explains why good",
      "description": "Explains that good references have clear WHEN signals - the link text tells the agent exactly when the content is relevant (e.g., 'when requests fail with 5xx', 'for OAuth2 flows and token refresh')",
      "max_score": 10
    },
    {
      "name": "Identifies poor references",
      "description": "Correctly identifies at least 3 of the poor references (CONFIGURATION.md 'for more information', GUIDE.md 'for additional details', EXAMPLES.md 'for examples', ADVANCED.md 'for advanced features', REFERENCE.md 'for the complete API reference')",
      "max_score": 15
    },
    {
      "name": "Explains why poor",
      "description": "Explains that poor references force speculative opening - the agent can't tell WHEN the content is relevant without reading it ('more information' about what? 'additional details' on which topic?)",
      "max_score": 10
    },
    {
      "name": "Token efficiency framing",
      "description": "Frames the problem in terms of token efficiency or wasted context - mentions that ambiguous links cause agents to open files 'just in case' or 'defensively', defeating the purpose of splitting content",
      "max_score": 10
    },
    {
      "name": "Routing gate test",
      "description": "Applies the 'can the agent decide WITHOUT opening' test - explicitly asks or checks whether each link allows a routing decision based on link text alone",
      "max_score": 10
    },
    {
      "name": "Improves CONFIGURATION.md",
      "description": "Provides revised link text for CONFIGURATION.md that specifies what configuration details are covered (e.g., 'for timeout settings, retry limits, and connection pool sizing')",
      "max_score": 5
    },
    {
      "name": "Improves GUIDE.md",
      "description": "Provides revised link text for GUIDE.md that specifies what guidance is covered or identifies it as redundant/should be inlined",
      "max_score": 5
    },
    {
      "name": "Improves EXAMPLES.md",
      "description": "Provides revised link text for EXAMPLES.md that specifies what kinds of examples (e.g., 'for complete integration examples with Django, Flask, and FastAPI')",
      "max_score": 5
    },
    {
      "name": "Improves ADVANCED.md or REFERENCE.md",
      "description": "Provides revised link text for ADVANCED.md or REFERENCE.md with specific content signals, or suggests inlining if scope is too broad",
      "max_score": 5
    },
    {
      "name": "Questions blind split recommendation",
      "description": "Notes that the rubric rewards progressive disclosure but questions whether splitting helps if routing is unclear - suggests that inlining might be better in some cases when link text can't provide clear signals",
      "max_score": 10
    }
  ]
}
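
To make the `weighted_checklist` structure concrete, here is a minimal sketch of how one graded response could be totalled against this rubric. The item names and `max_score` values are copied from the file above. The `score_response` helper, the clamping behaviour, and the example grades are assumptions for illustration; the source does not show how the evaluator actually aggregates scores.

```python
import json

# Abbreviated copy of the rubric: names and max_score values come from criteria.json.
CRITERIA = json.loads("""
{
  "type": "weighted_checklist",
  "checklist": [
    {"name": "Identifies good references", "max_score": 15},
    {"name": "Explains why good", "max_score": 10},
    {"name": "Identifies poor references", "max_score": 15},
    {"name": "Explains why poor", "max_score": 10},
    {"name": "Token efficiency framing", "max_score": 10},
    {"name": "Routing gate test", "max_score": 10},
    {"name": "Improves CONFIGURATION.md", "max_score": 5},
    {"name": "Improves GUIDE.md", "max_score": 5},
    {"name": "Improves EXAMPLES.md", "max_score": 5},
    {"name": "Improves ADVANCED.md or REFERENCE.md", "max_score": 5},
    {"name": "Questions blind split recommendation", "max_score": 10}
  ]
}
""")


def score_response(criteria, awarded):
    """Return (points, max_points, percent) for one graded response.

    `awarded` maps checklist item names to points given by a grader.
    Missing items count as 0; values are clamped to [0, max_score].
    This aggregation rule is an assumption, not the evaluator's documented one.
    """
    points = 0
    max_points = 0
    for item in criteria["checklist"]:
        cap = item["max_score"]
        max_points += cap
        points += min(max(awarded.get(item["name"], 0), 0), cap)
    return points, max_points, 100.0 * points / max_points


# Hypothetical grades, for illustration only.
example_awarded = {
    "Identifies good references": 15,
    "Explains why good": 10,
    "Identifies poor references": 15,
    "Routing gate test": 5,
    "Improves CONFIGURATION.md": 9,  # above the cap, so clamped to 5
}

points, max_points, percent = score_response(CRITERIA, example_awarded)
print(f"{points}/{max_points} ({percent:.0f}%)")  # prints "50/100 (50%)"
```

The eleven `max_score` values sum to 100, so under this assumed rule the point total and the percentage coincide. The four "Improves ..." items together carry 20 points, while identifying and explaining the good and poor references carries 50.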
