Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.
91%
Does it follow best practices?
Impact
92%
1.10x
Average score across 25 eval scenarios
Passed
No known issues
Routing table present
100%
100%
Skill coverage summary correct
100%
100%
rewrite-intro out-of-scope determination
100%
100%
generate-bibliography routing gap determination
100%
0%
fix-heading-hierarchy routing gap determination
100%
100%
citation-generator description rewrite
100%
0%
markdown-formatter description rewrite
100%
100%
Minimal rewrite principle
100%
100%
Rewrites presented together
100%
100%
Scored eval data cited
100%
100%
SKILL.md not modified
100%
100%
Explicit approval request
100%
100%
Trade-off discussion
100%
100%
Risk assessment per recommendation
100%
100%
Grouped presentation
100%
100%
All key issues addressed
100%
100%
Priority summary present
80%
100%
Current score per recommendation
70%
100%
Retry count contradiction found
100%
100%
Auth failure contradiction found
100%
100%
All three files referenced
100%
100%
File attribution per contradiction
100%
100%
Auth contradiction despite scope
100%
100%
Verbatim quotes included
100%
100%
Bucket A: idempotency key
100%
100%
Bucket B: webhook signature
100%
100%
Bucket C: HTTP status codes
100%
100%
Bucket B: currency precision
100%
100%
Bucket D: API version pinning
100%
100%
Bucket D highest priority
100%
100%
Bucket B diagnosis present
100%
100%
Bucket C action suggested
60%
100%
Bucket A no-action
75%
100%
80% threshold applied
100%
100%
Overall summary table
100%
100%
Per-scenario breakdown
100%
100%
Per-criterion table
100%
100%
Correct symbol thresholds
0%
100%
Baseline interpretation
100%
100%
Universal Failure identified
100%
100%
Capability Gradient identified
100%
100%
Regression identified
100%
100%
Fix before publish recommendation
100%
100%
eval-improve mentioned
100%
0%
Re-run offer
100%
100%
Skill count detection command
78%
100%
Activation eval run first
25%
100%
Scored eval follows activation
85%
100%
Routing-clean gate explained
91%
100%
Skip activation condition stated
100%
100%
Correct eval run command format
0%
100%
--solver=activation flag used
0%
100%
Tile path used consistently
100%
100%
REFERENCE.md not recreated
100%
100%
No REFERENCE.md changes proposed
100%
100%
SKILL.md produced
100%
100%
Use when clause added
100%
100%
Inline duplication removed
100%
100%
REFERENCE.md linked
100%
100%
Core examples retained
100%
100%
SKILL.md shorter
100%
100%
Change log documents SKILL.md changes
100%
100%
Change log explains why
100%
100%
Regression identified
100%
100%
Regression is highest priority
100%
100%
High baseline warning present
0%
100%
Scenario regeneration suggested
0%
100%
Tile is actively hurting
100%
100%
Per-criterion regression analysis
100%
100%
Correct prioritization order
100%
100%
Lists all bundle files
100%
100%
Identifies referenced files
100%
100%
Identifies orphaned files
100%
100%
TRANSACTIONS.md recommendation
100%
100%
PERFORMANCE.md recommendation
100%
100%
SECURITY.md recommendation
100%
100%
LEGACY_EXAMPLES.md recommendation
100%
100%
DRAFT_REPLICATION.md recommendation
100%
100%
Bloat reduction framing
60%
80%
Clear routing signals emphasis
100%
100%
Link vs remove justification
100%
100%
Python via ast.parse
53%
86%
Python error identified
100%
100%
JavaScript via node --check
33%
40%
Command flag validation
40%
60%
File reference check
100%
100%
Broken reference identified
100%
100%
Validation before application
100%
100%
Per-check pass/fail
100%
100%
Fix suggestions
100%
100%
Overall before/after format
100%
100%
Percentage delta shown
100%
100%
Per-dimension breakdown
100%
100%
Arrow notation or equivalent
100%
100%
Dimension change labelled
100%
100%
Dimensions impact explained
100%
100%
Correct overall scores
100%
100%
Completeness improvement noted
100%
100%
Actionability improvement noted
100%
100%
Conciseness unchanged noted
100%
100%
Robustness improvement noted
100%
100%
Code syntax check included
100%
100%
Python syntax error found
100%
100%
Command flags check included
100%
100%
File references check included
100%
100%
File reference passes
100%
100%
Use when clause check included
0%
100%
Use when clause fails
0%
100%
Known concepts check included
70%
100%
Known concepts issue found
83%
100%
Readiness summary
100%
100%
Excludes .tessl cache
0%
100%
.tessl/tiles warning
0%
100%
Scenario existence check
40%
100%
Scenario generation guidance
0%
100%
Login verification
20%
100%
No --workspace flag
100%
100%
Default model names
0%
100%
Model subset confirmation
0%
37%
Time estimate provided
33%
100%
Run count option
0%
0%
Critical issues first
100%
100%
High before Medium/Low
100%
100%
Summary with priorities
50%
100%
Expected improvement in summary
100%
100%
Dimension score included
80%
100%
Before/after examples
75%
100%
Impact stated per recommendation
12%
100%
Educational WHY included
100%
100%
All four issues addressed
100%
100%
Approval framing
40%
100%
Identifies good references
100%
100%
Explains why good
100%
100%
Identifies poor references
100%
80%
Explains why poor
100%
100%
Token efficiency framing
70%
100%
Routing gate test
100%
100%
Improves CONFIGURATION.md
100%
100%
Improves GUIDE.md
100%
100%
Improves EXAMPLES.md
100%
100%
Improves ADVANCED.md or REFERENCE.md
100%
100%
Questions blind split recommendation
0%
0%
Linking over inlining
100%
100%
Reference file identified
100%
100%
Severity mappings removed
100%
80%
Flag tables removed
100%
100%
Template list removed
100%
100%
SKILL.md substantially shorter
100%
100%
Core examples preserved
100%
100%
Before/after shown
100%
100%
WHY explained
100%
100%
REFERENCE.md not modified
100%
100%
All redundant criteria identified
100%
100%
Options presented per criterion
100%
100%
Useful criteria preserved
100%
100%
Weight redistribution correct
100%
100%
80% threshold applied
100%
100%
Non-redundant scores unchanged
100%
100%
Below-threshold excluded
100%
100%
Removal option named explicitly
100%
100%
Contradicting clause identified
100%
100%
Contradiction mechanism explained
100%
100%
Remove/clarify approach taken
100%
100%
Specific text targeted
100%
100%
No compensating additions
100%
100%
Other sections preserved
100%
100%
Pre-review list intact
100%
100%
Uses --strategy merge
100%
100%
Does NOT use --strategy replace
100%
100%
Correct base command
100%
100%
Output directory specified
100%
100%
Verification step present
100%
100%
Run ID or --last used
100%
100%
Existing scenarios preserved
50%
100%
identifies_scenario_1_acceptable
0%
0%
detects_answer_leakage
100%
95%
explains_leakage_impact
100%
80%
detects_double_counting
100%
90%
detects_free_point_criterion
100%
93%
proposes_specific_fixes
100%
86%
no_false_positives_scenario_1
0%
20%
Completeness weight correct
100%
100%
Conciseness weight correct
100%
100%
Actionability weight correct
100%
100%
Use when clause highest impact
100%
100%
Use when quantified
100%
100%
Revised description includes Use when
100%
100%
Executable code recommended
100%
100%
Known concepts flagged
90%
100%
High-impact first ordering
100%
100%
Dimension coverage
100%
100%
Correct base command
100%
100%
--agent flag format
20%
100%
All three default models
40%
100%
Sequential execution
100%
100%
Run ID capture
100%
100%
Model-to-ID mapping
100%
100%
Monitoring URL output
12%
100%
Polls with tessl eval view
100%
100%
Retry on failure
100%
100%
Waits for all to complete
100%
100%
No --workspace flag
100%
100%
Explicit retry intervals
100%
100%
Rubric language used
100%
100%
HMAC section unchanged
100%
100%
TLS section unchanged
100%
100%
Observability section unchanged
100%
100%
Processing section unchanged
100%
100%
Retry section only changed
100%
100%
Concise addition
100%
60%
Max retry count preserved
100%
100%
Fast acknowledgement preserved
100%
100%
tessl skill review command
0%
100%
Review before changes
0%
100%
Review after changes
0%
100%
Validation before apply
100%
100%
Python ast.parse validation
0%
0%
node --check JS validation
0%
0%
Command --help flag validation
0%
0%
File reference validation
0%
100%
Before/after score output
100%
100%
Script accepts SKILL.md path
100%
100%
Phases are ordered
100%
100%
tessl tile lint command used
0%
0%
Tile path argument provided
0%
50%
Lint run after each change set
33%
46%
Token cost ballooning flagged
100%
92%
Move to docs recommended
100%
95%
Docs vs rules distinction
78%
64%
Does NOT recommend rules for heavy content
100%
100%