Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.
88
94%
Does it follow best practices?
Impact
88%
1.07x (average score across 24 eval scenarios)
Passed
No known issues
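This page does not define the "1.07x" figure. One plausible reading, which is an assumption and not confirmed here, is the ratio of the average with-context score to the average baseline score, where each scenario runs once without context files and once with them injected. A minimal Python sketch of that calculation. The function names are hypothetical and the scores are made up for illustration:

```python
from statistics import mean


def impact_multiplier(baseline: list[float], with_context: list[float]) -> float:
    """Ratio of the mean with-context score to the mean baseline score.

    Assumption: every scenario is scored twice (baseline vs. context injected)
    and the headline multiplier is the ratio of the two averages.
    """
    if not baseline or len(baseline) != len(with_context):
        raise ValueError("need one baseline and one with-context score per scenario")
    base = mean(baseline)
    if base == 0:
        raise ValueError("baseline average is zero; ratio is undefined")
    return mean(with_context) / base


def per_scenario_deltas(baseline: list[float], with_context: list[float]) -> list[float]:
    """Positive delta = the context files helped on that scenario."""
    return [ctx - base for base, ctx in zip(baseline, with_context)]


# Hypothetical scores for three scenarios (not real eval data).
baseline = [60.0, 80.0, 100.0]
with_context = [70.0, 90.0, 100.0]
print(round(impact_multiplier(baseline, with_context), 2))
print(per_scenario_deltas(baseline, with_context))
```

With these inputs it prints `1.08` and `[10.0, 10.0, 0.0]`. Averaging before dividing keeps one low-baseline scenario from dominating the ratio, while the per-scenario deltas show where the context files help or hurt.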
{
"context": "Testing whether an agent following the setup-skill-performance skill correctly guides a new user through the full eval setup pipeline, including prerequisites, commit browsing, context file detection, scenario generation, and running the eval.",
"type": "weighted_checklist",
"checklist": [
{
"name": "checks_prerequisites",
"description": "The agent verifies the user is logged in (e.g., runs `tessl whoami`) before proceeding with setup steps.",
"max_score": 8
},
{
"name": "browses_commits",
"description": "The agent runs `tessl repo select-commits acme/backend` to show actual commits from the repo, rather than asking the user to supply commit hashes directly without any browsing step.",
"max_score": 25
},
{
"name": "auto_detects_context_files",
"description": "The agent searches the repository for context files (CLAUDE.md, *.mdc, AGENTS.md, tessl.json, etc.) automatically — rather than asking the user to specify them without any investigation.",
"max_score": 17
},
{
"name": "uses_context_flag",
"description": "The agent includes a `--context` flag when running `tessl scenario generate`, specifying appropriate glob patterns for the detected context files.",
"max_score": 17
},
{
"name": "workspace_in_eval_run",
"description": "For project evals (codebase evaluations using git commits), the agent includes `--workspace=<name>` when running `tessl eval run`. Omitting `--workspace` would cause the command to fail.",
"max_score": 17
},
{
"name": "explains_baseline_vs_context",
"description": "The agent explains that each scenario runs twice — once without context files (baseline) and once with them injected — and that the delta shows whether CLAUDE.md is helping the agent.",
"max_score": 16
}
]
}

evals
scenario-1
scenario-2
scenario-3
scenario-4
scenario-5
scenario-6
scenario-7
scenario-8
scenario-9
scenario-10
scenario-11
scenario-12
scenario-13
scenario-14
scenario-15
scenario-16
scenario-17
scenario-18
scenario-19
scenario-20
scenario-21
scenario-22
scenario-23
scenario-24
skills
compare-skill-model-performance
optimize-skill-instructions
references
optimize-skill-performance
optimize-skill-performance-and-instructions
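The weighted checklist above can be scored by capping each criterion at its `max_score` and summing. The six weights add up to 100, so the total reads directly as a percentage. The sketch below assumes all-or-nothing grading per criterion and treats a missing criterion as zero. `score_checklist` is a hypothetical helper, not part of the `tessl` CLI, and the awarded points are invented:

```python
import json

# Criterion names and weights copied from the checklist above; descriptions omitted.
CHECKLIST_JSON = """
{
  "type": "weighted_checklist",
  "checklist": [
    {"name": "checks_prerequisites", "max_score": 8},
    {"name": "browses_commits", "max_score": 25},
    {"name": "auto_detects_context_files", "max_score": 17},
    {"name": "uses_context_flag", "max_score": 17},
    {"name": "workspace_in_eval_run", "max_score": 17},
    {"name": "explains_baseline_vs_context", "max_score": 16}
  ]
}
"""


def score_checklist(spec: dict, awarded: dict[str, float]) -> tuple[float, float]:
    """Return (total, maximum), clamping each awarded value to [0, max_score]."""
    total = 0.0
    maximum = 0.0
    for item in spec["checklist"]:
        cap = float(item["max_score"])
        got = float(awarded.get(item["name"], 0.0))  # missing criterion scores 0
        total += min(max(got, 0.0), cap)
        maximum += cap
    return total, maximum


spec = json.loads(CHECKLIST_JSON)
# Hypothetical grading of one run: the agent forgot --workspace on `tessl eval run`.
awarded = {
    "checks_prerequisites": 8,
    "browses_commits": 25,
    "auto_detects_context_files": 17,
    "uses_context_flag": 17,
    "workspace_in_eval_run": 0,
    "explains_baseline_vs_context": 16,
}
total, maximum = score_checklist(spec, awarded)
print(f"{total:.0f}/{maximum:.0f} = {total / maximum:.0%}")
```

With these inputs it prints `83/100 = 83%`. The weighting makes `browses_commits` (25 points) the single most costly criterion to miss, while skipping the login check costs only 8.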