Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.
88
94%
Does it follow best practices?
Impact
88%
1.07x Average score across 24 eval scenarios
Passed
No known issues
Retry count contradiction found | 100% | 100%
Auth failure contradiction found | 100% | 100%
All three files referenced | 100% | 100%
File attribution per contradiction | 100% | 100%
Auth contradiction despite scope | 100% | 100%
Verbatim quotes included | 100% | 100%
Bucket A: idempotency key | 100% | 100%
Bucket B: webhook signature | 100% | 100%
Bucket C: HTTP status codes | 100% | 100%
Bucket B: currency precision | 100% | 100%
Bucket D: API version pinning | 100% | 100%
Bucket D highest priority | 100% | 100%
Bucket B diagnosis present | 100% | 100%
Bucket C action suggested | 70% | 100%
Bucket A no-action | 75% | 100%
80% threshold applied | 100% | 100%
All redundant criteria identified | 100% | 100%
Options presented per criterion | 100% | 100%
Useful criteria preserved | 100% | 100%
Weight redistribution correct | 100% | 100%
80% threshold applied | 100% | 100%
Non-redundant scores unchanged | 100% | 100%
Below-threshold excluded | 100% | 100%
Removal option named explicitly | 100% | 100%
Contradicting clause identified | 100% | 100%
Contradiction mechanism explained | 100% | 100%
Remove/clarify approach taken | 100% | 100%
Specific text targeted | 100% | 100%
No compensating additions | 100% | 100%
Other sections preserved | 100% | 100%
Pre-review list intact | 100% | 100%
Explicit retry intervals | 100% | 100%
Rubric language used | 100% | 100%
HMAC section unchanged | 100% | 100%
TLS section unchanged | 100% | 100%
Observability section unchanged | 100% | 100%
Processing section unchanged | 100% | 100%
Retry section only changed | 100% | 100%
Concise addition | 0% | 100%
Max retry count preserved | 100% | 100%
Fast acknowledgement preserved | 100% | 100%
Phase 4 syntax and reference validation
Python via ast.parse | 100% | 100%
Python error identified | 100% | 100%
JavaScript via node --check | 0% | 100%
Command flag validation | 40% | 20%
File reference check | 100% | 100%
Broken reference identified | 100% | 100%
Validation before application | 100% | 100%
Per-check pass/fail | 100% | 100%
Fix suggestions | 100% | 100%
Prioritized recommendation generation from review output
Critical issues first | 100% | 100%
High before Medium/Low | 100% | 100%
Summary with priorities | 20% | 100%
Expected improvement in summary | 100% | 75%
Dimension score included | 80% | 100%
Before/after examples | 75% | 75%
Impact stated per recommendation | 25% | 62%
Educational WHY included | 100% | 100%
All four issues addressed | 100% | 100%
Approval framing | 70% | 80%
Progressive disclosure via reference file linking
Linking over inlining | 100% | 100%
Reference file identified | 100% | 100%
Severity mappings removed | 100% | 100%
Flag tables removed | 60% | 100%
Template list removed | 100% | 100%
SKILL.md substantially shorter | 25% | 100%
Core examples preserved | 100% | 100%
Before/after shown | 100% | 100%
WHY explained | 90% | 100%
REFERENCE.md not modified | 100% | 100%
Phase 7 before/after score comparison
Overall before/after format | 100% | 100%
Percentage delta shown | 100% | 100%
Per-dimension breakdown | 100% | 100%
Arrow notation or equivalent | 100% | 100%
Dimension change labelled | 100% | 100%
Dimensions impact explained | 100% | 100%
Correct overall scores | 100% | 100%
Completeness improvement noted | 100% | 100%
Actionability improvement noted | 100% | 100%
Conciseness unchanged noted | 100% | 100%
Robustness improvement noted | 100% | 100%
Phase 8 final accuracy check (5 criteria)
Code syntax check included | 100% | 100%
Python syntax error found | 100% | 100%
Command flags check included | 100% | 100%
File references check included | 100% | 100%
File reference passes | 100% | 100%
Use when clause check included | 0% | 0%
Use when clause fails | 0% | 0%
Known concepts check included | 80% | 100%
Known concepts issue found | 100% | 100%
Readiness summary | 100% | 100%
tessl skill review command and workflow scripting
tessl skill review command | 0% | 100%
Review before changes | 0% | 100%
Review after changes | 0% | 100%
Validation before apply | 0% | 100%
Python ast.parse validation | 0% | 100%
node --check JS validation | 0% | 100%
Command --help flag validation | 0% | 0%
File reference validation | 0% | 100%
Before/after score output | 75% | 100%
Script accepts SKILL.md path | 100% | 100%
Phases are ordered | 37% | 100%
Only modify SKILL.md, not other bundle files
REFERENCE.md not recreated | 100% | 100%
No REFERENCE.md changes proposed | 100% | 100%
SKILL.md produced | 100% | 100%
Use when clause added | 100% | 100%
Inline duplication removed | 100% | 100%
REFERENCE.md linked | 100% | 100%
Core examples retained | 100% | 100%
SKILL.md shorter | 100% | 100%
Change log documents SKILL.md changes | 100% | 100%
Change log explains why | 100% | 100%
Sequential multi-model eval execution
Correct base command | 100% | 100%
--agent flag format | 80% | 100%
All three default models | 70% | 100%
Sequential execution | 100% | 100%
Run ID capture | 100% | 100%
Model-to-ID mapping | 50% | 100%
Monitoring URL output | 25% | 100%
Polls with tessl eval view | 100% | 100%
Retry on failure | 100% | 100%
Waits for all to complete | 100% | 100%
No --workspace flag | 100% | 100%
Identifies good references | 93% | 100%
Explains why good | 100% | 100%
Identifies poor references | 80% | 100%
Explains why poor | 100% | 100%
Token efficiency framing | 70% | 70%
Routing gate test | 90% | 80%
Improves CONFIGURATION.md | 100% | 100%
Improves GUIDE.md | 100% | 100%
Improves EXAMPLES.md | 100% | 100%
Improves ADVANCED.md or REFERENCE.md | 100% | 100%
Questions blind split recommendation | 0% | 0%
Lists all bundle files | 100% | 100%
Identifies referenced files | 100% | 100%
Identifies orphaned files | 100% | 100%
TRANSACTIONS.md recommendation | 100% | 100%
PERFORMANCE.md recommendation | 100% | 100%
SECURITY.md recommendation | 100% | 100%
LEGACY_EXAMPLES.md recommendation | 100% | 100%
DRAFT_REPLICATION.md recommendation | 100% | 100%
Bloat reduction framing | 40% | 100%
Clear routing signals emphasis | 80% | 100%
Link vs remove justification | 100% | 100%
SKILL.md not modified | 100% | 100%
Explicit approval request | 100% | 100%
Trade-off discussion | 100% | 100%
Risk assessment per recommendation | 100% | 100%
Grouped presentation | 100% | 100%
All key issues addressed | 100% | 100%
Priority summary present | 100% | 80%
Current score per recommendation | 30% | 100%
identifies_mdc_files | 100% | 100%
identifies_claude_md | 100% | 100%
identifies_agents_md | 100% | 100%
identifies_tessl_json | 100% | 100%
excludes_tessl_cache | 100% | 100%
excludes_generic_docs | 100% | 100%
excludes_source_and_build_config | 100% | 100%
constructs_valid_context_flag | 100% | 100%
does_not_use_last_only | 16% | 0%
finds_generation_ids | 60% | 0%
downloads_each_separately | 40% | 0%
explains_why | 0% | 0%
Overall summary table | 100% | 100%
Per-scenario breakdown | 100% | 100%
Per-criterion table | 100% | 100%
Correct symbol thresholds | 0% | 100%
Baseline interpretation | 100% | 100%
Universal Failure identified | 80% | 100%
Capability Gradient identified | 80% | 100%
Regression identified | 100% | 100%
Fix before publish recommendation | 100% | 100%
eval-improve mentioned | 100% | 0%
Re-run offer | 80% | 100%
Excludes .tessl cache | 0% | 80%
.tessl/tiles warning | 0% | 90%
Scenario existence check | 70% | 100%
Scenario generation guidance | 0% | 100%
Login verification | 10% | 100%
No --workspace flag | 100% | 100%
Default model names | 0% | 100%
Model subset confirmation | 0% | 50%
Time estimate provided | 41% | 100%
Run count option | 0% | 0%
identifies_scenario_1_acceptable | 100% | 100%
detects_answer_leakage | 100% | 80%
explains_leakage_impact | 100% | 100%
detects_double_counting | 100% | 100%
detects_free_point_criterion | 100% | 100%
proposes_specific_fixes | 100% | 100%
no_false_positives_scenario_1 | 100% | 100%
Completeness weight correct | 100% | 100%
Conciseness weight correct | 100% | 100%
Actionability weight correct | 100% | 100%
Use when clause highest impact | 100% | 100%
Use when quantified | 100% | 100%
Revised description includes Use when | 100% | 100%
Executable code recommended | 100% | 100%
Known concepts flagged | 100% | 100%
High-impact first ordering | 100% | 100%
Dimension coverage | 100% | 100%
skips_trivial_commits | 100% | 100%
skips_docs_and_config_only | 100% | 100%
skips_mechanical_generated_commit | 100% | 100%
scores_payment_commit_high | 100% | 100%
scores_auth_refactor_highest | 80% | 70%
references_complexity_signals | 100% | 100%
recommends_two_or_three_commits | 100% | 100%
explains_selection_rationale | 100% | 100%
checks_prerequisites | 100% | 100%
browses_commits | 0% | 0%
auto_detects_context_files | 0% | 0%
uses_context_flag | 58% | 0%
workspace_in_eval_run | 0% | 0%
explains_baseline_vs_context | 100% | 100%