tessl-labs/skill-optimizer

Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.

Quality: 94% (Does it follow best practices?)

Impact: 88%, 1.07x (average score across 24 eval scenarios)

Security: Passed, no known issues

Payments Tile Eval Analysis

Name: tessl-labs/skill-optimizer
Rating: 88.64 (1 review)
Author: tessl-labs

Problem Description

Your team has been running evals on your payments-gateway tile for three months. A senior engineer is planning the next round of tile improvements and wants a clear picture of where the tile is actually earning its keep versus where it's redundant or causing problems.

Specifically, they want to know:

Which criteria is the tile genuinely helping with (agents perform much better with it than without)?
Which criteria are agents already handling on their own, even without the tile?
Which criteria should be top priorities because agents are doing worse with the tile than without it?
For anything problematic: what's a plausible explanation and where in the tile should someone look?

The engineer doesn't want raw scores — they want an actionable breakdown that tells them exactly what to do next.
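One way to turn the questions above into status groups is sketched below. The 80% thresholds, the label names, and the catch-all "needs work" group are assumptions made for illustration; the task itself does not define them.

```python
# Hypothetical thresholds; the task description does not specify any.
HANDLED_THRESHOLD = 0.80  # baseline at or above this fraction of max: agents manage alone
HELPING_THRESHOLD = 0.80  # with-tile score at or above this fraction of max: clear win


def classify(max_score: int, baseline: int, with_tile: int) -> str:
    """Assign one criterion to a status group from its three scores."""
    if with_tile < baseline:
        # Agents do worse with the tile than without it.
        return "regression"
    if baseline / max_score >= HANDLED_THRESHOLD:
        # Agents already score well without the tile.
        return "already handled"
    if with_tile / max_score >= HELPING_THRESHOLD:
        # Low baseline, high with-tile score: the tile is doing the work.
        return "tile helps"
    # Flat or improved, but still well short of the maximum.
    return "needs work"
```

Checking regressions first keeps a criterion with a high baseline from being filed as "already handled" when the tile is actively lowering its score.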

Output Specification

Produce a file called analysis_report.md that:

Categorizes each criterion into a status group with a short label
For each criterion, shows its scores (baseline and with-tile) and the percentage of max it achieves
For criteria that need action, includes a short diagnosis (1-2 sentences on what's likely missing or wrong) and which file in the tile to look at
Ends with a prioritized list of recommended next steps
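The per-criterion score line could be produced along these lines. This is a minimal sketch: the table layout and the helper names `pct_of_max` and `report_row` are hypothetical, and percentages are rounded to whole numbers by assumption.

```python
def pct_of_max(score: int, max_score: int) -> int:
    """Score as a whole-number percentage of the criterion's maximum."""
    return round(100 * score / max_score)


def report_row(name: str, max_score: int, baseline: int, with_tile: int) -> str:
    """Render one Markdown table row: baseline, with-tile, and the change."""
    delta = with_tile - baseline
    return (
        f"| {name} "
        f"| {baseline}/{max_score} ({pct_of_max(baseline, max_score)}%) "
        f"| {with_tile}/{max_score} ({pct_of_max(with_tile, max_score)}%) "
        f"| {delta:+d} |"
    )


print(report_row("Stripe idempotency key", 15, 2, 13))
# | Stripe idempotency key | 2/15 (13%) | 13/15 (87%) | +11 |
```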

Input Files

The following files are provided as inputs. Extract them before beginning.

=============== FILE: inputs/eval_results.json ===============
{
  "tile": "payments-gateway",
  "eval_id": "eval_2026_03_15_payments",
  "scenarios": [
    {
      "name": "checkout-flow",
      "criteria": [
        {
          "name": "Stripe idempotency key",
          "description": "Uses idempotency key in Stripe charge requests to prevent duplicate charges",
          "max_score": 15,
          "baseline_score": 2,
          "with_context_score": 13
        },
        {
          "name": "Webhook signature validation",
          "description": "Validates Stripe webhook signatures using the signing secret before processing events",
          "max_score": 10,
          "baseline_score": 1,
          "with_context_score": 5
        },
        {
          "name": "HTTP status code handling",
          "description": "Returns appropriate HTTP status codes (200, 400, 422, 500) in API responses",
          "max_score": 10,
          "baseline_score": 9,
          "with_context_score": 10
        },
        {
          "name": "Currency precision",
          "description": "Represents all currency values as integer cents rather than floating point dollars",
          "max_score": 15,
          "baseline_score": 3,
          "with_context_score": 7
        },
        {
          "name": "API version pinning",
          "description": "Pins the Stripe API version string (e.g. '2023-10-16') in all API requests",
          "max_score": 10,
          "baseline_score": 6,
          "with_context_score": 4
        }
      ]
    }
  ]
}
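A sketch of how the input file might be read and ordered for the final list of next steps. The field names come from eval_results.json; the helper names and the ordering rule (regressions first, then the largest remaining gap to the maximum score) are assumptions, not something the task prescribes.

```python
import json
from pathlib import Path


def load_results(path: str) -> dict:
    """Read the eval results file from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def flatten(results: dict) -> list[dict]:
    """One row per criterion, with the change and the remaining headroom."""
    rows = []
    for scenario in results["scenarios"]:
        for c in scenario["criteria"]:
            rows.append({
                "scenario": scenario["name"],
                "criterion": c["name"],
                "delta": c["with_context_score"] - c["baseline_score"],
                "headroom": c["max_score"] - c["with_context_score"],
            })
    return rows


def prioritize(rows: list[dict]) -> list[dict]:
    """Assumed ordering: regressions first, then most points still unclaimed."""
    return sorted(rows, key=lambda r: (r["delta"] >= 0, -r["headroom"]))
```

Calling `prioritize(flatten(load_results("inputs/eval_results.json")))` on the file above would place "API version pinning" first, since it is the only criterion whose score drops with the tile.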
