Name: tessl-labs/skill-optimizer
Rating: 88.64 (1 review)
Author: tessl-labs

Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.

Quality

94%

Does it follow best practices?

Impact

88%

1.07x

Average score across 24 eval scenarios

Security

Passed

No known issues

{
  "context": "Tests whether the agent correctly classifies eval criteria into four distinct performance buckets using the 80% threshold rules, and whether the analysis includes the right diagnosis, priority signals, and action recommendations for each bucket type.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Bucket A: idempotency key",
      "description": "The 'Stripe idempotency key' criterion (13/15 = 87% with-context, significantly above 2/15 baseline) is classified as working well / no action needed",
      "max_score": 8
    },
    {
      "name": "Bucket B: webhook signature",
      "description": "The 'Webhook signature validation' criterion (5/10 = 50% with-context, low baseline 1/10) is classified as a tile gap that needs a fix",
      "max_score": 8
    },
    {
      "name": "Bucket C: HTTP status codes",
      "description": "The 'HTTP status code handling' criterion (9/10 baseline already high, >= 80% without tile) is classified as redundant / agents already know this",
      "max_score": 8
    },
    {
      "name": "Bucket B: currency precision",
      "description": "The 'Currency precision' criterion (7/15 = 47% with-context, low baseline 3/15) is classified as a tile gap",
      "max_score": 8
    },
    {
      "name": "Bucket D: API version pinning",
      "description": "The 'API version pinning' criterion (with-context 4/10 is LOWER than baseline 6/10) is classified as a regression",
      "max_score": 15
    },
    {
      "name": "Bucket D highest priority",
      "description": "The regression criterion (API version pinning) is explicitly marked as highest priority or most urgent to fix",
      "max_score": 10
    },
    {
      "name": "Bucket B diagnosis present",
      "description": "Both Bucket B criteria include a diagnosis (what the tile is likely missing) and a reference to which tile file should be updated",
      "max_score": 15
    },
    {
      "name": "Bucket C action suggested",
      "description": "The Bucket C criterion (HTTP status codes) is flagged for potential removal from criteria.json or for making the eval task harder",
      "max_score": 10
    },
    {
      "name": "Bucket A no-action",
      "description": "The Bucket A criterion (idempotency key) is noted as leave alone / no action needed / tile's strength",
      "max_score": 8
    },
    {
      "name": "80% threshold applied",
      "description": "Classifications are consistent with the 80% of max threshold (e.g., 87% passes Bucket A; 50%, 47% fall into Bucket B; 90% baseline triggers Bucket C)",
      "max_score": 10
    }
  ]
}
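The bucket rules that this checklist grades can be sketched in a few lines of Python. This is only an illustration inferred from the criterion descriptions above, not the tile's actual implementation: the function name `classify`, the order in which the rules are checked (regression first, then redundancy), and the 9/10 with-context score for "HTTP status code handling" (the checklist states only its baseline) are all assumptions.

```python
from fractions import Fraction

# "80% of max" threshold named in the checklist; Fraction keeps 12/15 == 4/5 exact.
THRESHOLD = Fraction(4, 5)

# Actions paraphrased from the checklist descriptions.
ACTIONS = {
    "A": "working well: no action needed",
    "B": "tile gap: diagnose what is missing and update the tile file",
    "C": "redundant: consider removing it from criteria.json or making the task harder",
    "D": "regression: highest priority to fix",
}


def classify(baseline: int, with_context: int, max_score: int) -> str:
    """Return the bucket letter for one criterion (assumed rule order)."""
    base = Fraction(baseline, max_score)
    ctx = Fraction(with_context, max_score)
    if ctx < base:
        return "D"  # the tile makes the agent worse than the baseline
    if base >= THRESHOLD:
        return "C"  # agents already score >= 80% without the tile
    if ctx >= THRESHOLD:
        return "A"  # low baseline, tile lifts the score to >= 80%
    return "B"      # low baseline and the tile still falls short


# (baseline, with_context, max_score) taken from the checklist descriptions.
# The HTTP with-context value of 9 is hypothetical; the source gives only the baseline.
CRITERIA = {
    "Stripe idempotency key": (2, 13, 15),
    "Webhook signature validation": (1, 5, 10),
    "HTTP status code handling": (9, 9, 10),
    "Currency precision": (3, 7, 15),
    "API version pinning": (6, 4, 10),
}

if __name__ == "__main__":
    for name, scores in CRITERIA.items():
        bucket = classify(*scores)
        print(f"{name}: Bucket {bucket} ({ACTIONS[bucket]})")
```

Checking the regression case first means a criterion whose score drops with the tile is flagged as Bucket D even when its baseline is already above 80%; whether the tile resolves that overlap the same way is not stated in the checklist.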

skills

README.md

tile.json

evals/scenario-2/criteria.json