Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.
91%
Does it follow best practices?
Impact
92%
1.10x
Average score across 25 eval scenarios
Passed
No known issues
The content-tools tile has three skills: markdown-formatter, citation-generator, and link-checker. The team recently ran a quick eval to check whether each skill activates when it should, and the results are concerning. Several scenarios produced no skill activation at all, and the team isn't sure whether those reflect gaps in how the skills describe themselves or whether the scenarios are simply out of scope for the tile.
The tile has also had scored evals run previously, so there's comparison data available to help distinguish routing problems from out-of-scope tasks. The team wants a complete routing health report they can act on, including concrete suggestions for fixing any description mismatches they find.
Produce a file called routing-health-report.md containing:
– [...] if none fired), an analysis of whether it represents a routing gap or an out-of-scope task for this tile

The following files are provided as inputs. Extract them before beginning.
=============== FILE: inputs/activation-results.json ===============
{
  "eval_run_id": "act-run-881",
  "solver": "activation",
  "tile": "content-tools",
  "scenarios": [
    {
      "name": "format-markdown-table",
      "task_summary": "Convert a CSV dataset into a well-formatted Markdown table with aligned columns",
      "activated_skills": ["markdown-formatter"]
    },
    {
      "name": "add-apa-citations",
      "task_summary": "Add APA-style citations to a research summary document",
      "activated_skills": ["citation-generator"]
    },
    {
      "name": "audit-broken-links",
      "task_summary": "Scan a docs site and report any broken hyperlinks or anchor references",
      "activated_skills": ["link-checker"]
    },
    {
      "name": "rewrite-intro-paragraph",
      "task_summary": "Rewrite the introduction paragraph of a blog post for clarity and conciseness",
      "activated_skills": []
    },
    {
      "name": "generate-bibliography",
      "task_summary": "Produce a bibliography from a list of academic sources in IEEE format",
      "activated_skills": []
    },
    {
      "name": "fix-heading-hierarchy",
      "task_summary": "Correct the heading levels in a Markdown document so H1 through H4 nest properly",
      "activated_skills": []
    }
  ]
}
=============== FILE: inputs/scored-eval-results.json ===============
{
  "eval_run_id": "scored-run-445",
  "solver": "default",
  "tile": "content-tools",
  "scenarios": [
    {
      "name": "rewrite-intro-paragraph",
      "baseline_score_pct": 88,
      "with_context_score_pct": 87,
      "delta": -1
    },
    {
      "name": "generate-bibliography",
      "baseline_score_pct": 31,
      "with_context_score_pct": 29,
      "delta": -2
    },
    {
      "name": "fix-heading-hierarchy",
      "baseline_score_pct": 22,
      "with_context_score_pct": 67,
      "delta": 45
    }
  ]
}
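One way to combine the two input files is to join them on scenario name and use the scored delta as a signal: if tile context lifts the score a lot but no skill activated, that points to a routing gap; if the baseline is already strong and context changes nothing, the task is probably out of scope. The sketch below is only an illustration of that reasoning. The `triage` function, its labels, and the `min_delta` and `strong_baseline` thresholds are assumptions, not part of the tile or its tooling.

```python
# Data copied from the two input files above (only the fields used here).
ACTIVATED = {
    "format-markdown-table": ["markdown-formatter"],
    "add-apa-citations": ["citation-generator"],
    "audit-broken-links": ["link-checker"],
    "rewrite-intro-paragraph": [],
    "generate-bibliography": [],
    "fix-heading-hierarchy": [],
}

SCORED = {
    "rewrite-intro-paragraph": {"baseline_score_pct": 88, "with_context_score_pct": 87, "delta": -1},
    "generate-bibliography": {"baseline_score_pct": 31, "with_context_score_pct": 29, "delta": -2},
    "fix-heading-hierarchy": {"baseline_score_pct": 22, "with_context_score_pct": 67, "delta": 45},
}


def triage(activated, scored, min_delta=10, strong_baseline=80):
    """Label every scenario in which no skill activated.

    min_delta and strong_baseline are hypothetical thresholds chosen
    for illustration; tune them against your own eval history.
    """
    labels = {}
    for name, skills in activated.items():
        if skills:
            continue  # a skill fired, so there is nothing to triage
        result = scored.get(name)
        if result is None:
            labels[name] = "no scored data"
        elif result["delta"] >= min_delta:
            # Context helps a lot, yet nothing activated.
            labels[name] = "likely routing gap"
        elif result["baseline_score_pct"] >= strong_baseline:
            # The model already does well without the tile.
            labels[name] = "likely out of scope"
        else:
            # Weak baseline and no lift: the heuristic cannot decide.
            labels[name] = "needs manual review"
    return labels


labels = triage(ACTIVATED, SCORED)
for name, label in labels.items():
    print(f"{name}: {label}")
```

A flat delta on a weak baseline (as with generate-bibliography) is deliberately left as "needs manual review": the numbers alone cannot say whether the task is out of scope or whether the skill content itself fails to help.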
Applies consistent Markdown style: table alignment, heading normalization, code block language tagging, and list indentation.
Produces correctly formatted citations from DOI, URL, or manual metadata input.
Scans documents for hyperlinks and reports broken or unreachable ones.
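When drafting description fixes, it can help to list the task vocabulary that a skill description never mentions. The sketch below is a crude word-overlap check, not a model of how routing actually decides. It assumes the three descriptions above belong to markdown-formatter, citation-generator, and link-checker in that order (the source does not label them), and the `terms` and `missing_terms` helpers and the stopword list are hypothetical.

```python
import re

# Assumed mapping of the three descriptions above to skill names.
DESCRIPTIONS = {
    "markdown-formatter": (
        "Applies consistent Markdown style: table alignment, heading "
        "normalization, code block language tagging, and list indentation."
    ),
    "citation-generator": (
        "Produces correctly formatted citations from DOI, URL, or manual "
        "metadata input."
    ),
    "link-checker": (
        "Scans documents for hyperlinks and reports broken or unreachable ones."
    ),
}

# Small illustrative stopword list; extend as needed.
STOPWORDS = {"the", "and", "from", "for", "that", "with", "through"}


def terms(text):
    """Lowercased words of three or more characters, minus stopwords."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if len(w) > 2 and w not in STOPWORDS}


def missing_terms(task_summary, description):
    """Task words that do not appear anywhere in the skill description."""
    return sorted(terms(task_summary) - terms(description))


heading_gap = missing_terms(
    "Correct the heading levels in a Markdown document so H1 through H4 nest properly",
    DESCRIPTIONS["markdown-formatter"],
)
bibliography_gap = missing_terms(
    "Produce a bibliography from a list of academic sources in IEEE format",
    DESCRIPTIONS["citation-generator"],
)
print("markdown-formatter is missing:", heading_gap)
print("citation-generator is missing:", bibliography_gap)
```

The output is a list of candidate words to consider adding to a description. Exact-match comparison ignores synonyms and word forms, so treat the result as a prompt for review rather than a verdict.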