Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.
88
94%
Does it follow best practices?
Impact
88%
1.07x
Average score across 24 eval scenarios
Passed
No known issues
{
"context": "Tests whether the agent correctly applies the setup-skill-performance skill's commit selection criteria: hard-skip gates to eliminate trivial/docs/config/generated commits, the 7 complexity signals to score survivors, and a final recommendation of only the structurally complex commits.",
"type": "weighted_checklist",
"checklist": [
{
"name": "skips_trivial_commits",
"description": "Commits 1 (rename, 1 file, 0 lines) and 4 (utility, 2 files, 40 lines) are rejected by hard-skip gates: fewer than 3 source files changed or fewer than 50 lines of source code changed",
"max_score": 15
},
{
"name": "skips_docs_and_config_only",
"description": "Commit 2 (README + CONTRIBUTING) is rejected as docs-only and commit 3 (package.json + lock file) is rejected as config/generated-only",
"max_score": 10
},
{
"name": "skips_mechanical_generated_commit",
"description": "Commit 7 (198-line SQL migration in 1 file) is rejected despite its large line count — it is a single auto-generated migration file with no structural complexity",
"max_score": 10
},
{
"name": "scores_payment_commit_high",
"description": "Commit 5 (payment processing) receives a high complexity score (4+/7), recognizing signals such as: new abstractions (PaymentRequest types, StripeClient), cross-cutting scope (routes, middleware, services, webhooks), wiring/registration (route + middleware + webhook handler integration), and domain-specific logic (payment flows, idempotency)",
"max_score": 15
},
{
"name": "scores_auth_refactor_highest",
"description": "Commit 6 (auth system refactor) receives the highest complexity score among all commits (5+/7), recognizing it hits more signals than commit 5 due to: refactoring existing code (not just adding new), migrating a data store, multiple interdependent changes, and cross-cutting scope across 8 files in auth/config/routes",
"max_score": 10
},
{
"name": "references_complexity_signals",
"description": "The analysis explicitly uses at least 5 of the 7 named complexity signals: new abstractions, cross-cutting scope, wiring and registration, non-obvious control flow, domain-specific logic, multiple interdependent changes, no single-point solution",
"max_score": 15
},
{
"name": "recommends_two_or_three_commits",
"description": "Final recommendation includes 2 or 3 commits (not 1, not 4+). The selected set must include commits 5 and 6; commit 4 is acceptable as a borderline third pick only if accompanied by a caveat about its lower complexity",
"max_score": 10
},
{
"name": "explains_selection_rationale",
"description": "For each recommended commit, provides a specific explanation of why it would produce a challenging eval scenario — not just restating the complexity score but explaining what about the commit would be hard for an agent to reproduce without codebase context",
"max_score": 15
}
]
}
evals
scenario-1
scenario-2
scenario-3
scenario-4
scenario-5
scenario-6
scenario-7
scenario-8
scenario-9
scenario-10
scenario-11
scenario-12
scenario-13
scenario-14
scenario-15
scenario-16
scenario-17
scenario-18
scenario-19
scenario-20
scenario-21
scenario-22
scenario-23
scenario-24
skills
compare-skill-model-performance
optimize-skill-instructions
references
optimize-skill-performance
optimize-skill-performance-and-instructions
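
The selection logic that the checklist above grades can be sketched in Python. This is a minimal illustration under stated assumptions, not the skill's actual implementation. The two thresholds and the seven signal names are quoted from the checklist. The `Commit` record, the function names, the order of the gates, and the idea that a `weighted_checklist` score is the sum of awarded points divided by the sum of `max_score` values are all assumptions made here for illustration.

```python
from dataclasses import dataclass

# Thresholds quoted from the checklist: skip a commit that changes fewer than
# 3 source files or fewer than 50 lines of source code.
MIN_SOURCE_FILES = 3
MIN_SOURCE_LINES = 50

# The 7 complexity signal names as listed in the checklist.
COMPLEXITY_SIGNALS = (
    "new abstractions",
    "cross-cutting scope",
    "wiring and registration",
    "non-obvious control flow",
    "domain-specific logic",
    "multiple interdependent changes",
    "no single-point solution",
)


@dataclass(frozen=True)
class Commit:
    """Hypothetical commit summary; the field names are assumptions."""
    label: str
    source_files: int
    source_lines: int
    docs_only: bool = False
    config_or_generated_only: bool = False
    signals: frozenset = frozenset()


def hard_skip_reason(commit):
    """Return why a commit is skipped, or None if it survives the gates.

    The gate order is an assumption; the checklist only names the gates.
    """
    if commit.docs_only:
        return "docs-only"
    if commit.config_or_generated_only:
        return "config/generated-only"
    if commit.source_files < MIN_SOURCE_FILES:
        return "fewer than 3 source files changed"
    if commit.source_lines < MIN_SOURCE_LINES:
        return "fewer than 50 lines of source code changed"
    return None


def complexity_score(commit):
    """Count how many of the 7 named signals the commit exhibits (0 to 7)."""
    return sum(1 for signal in COMPLEXITY_SIGNALS if signal in commit.signals)


def weighted_total(awarded, checklist):
    """Assumed scoring rule: clamp each award to its max_score, then return
    the earned points as a percentage of the total available points."""
    available = sum(item["max_score"] for item in checklist)
    earned = sum(
        min(max(awarded.get(item["name"], 0), 0), item["max_score"])
        for item in checklist
    )
    return 100.0 * earned / available


# Commits 1, 4 and 7 use the figures given in the checklist descriptions.
commit_1 = Commit("rename", source_files=1, source_lines=0)
commit_4 = Commit("utility", source_files=2, source_lines=40)
commit_7 = Commit("SQL migration", source_files=1, source_lines=198,
                  config_or_generated_only=True)

for commit in (commit_1, commit_4, commit_7):
    print(commit.label, "->", hard_skip_reason(commit))
```

A commit that survives every gate would then be ranked by `complexity_score`, and only the highest-scoring survivors recommended. The `max_score` values in the checklist above sum to 100, so under the assumed scoring rule an agent awarded full marks on every item would score 100%.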