
tessl/skill-optimizer

Optimize your skills and plugins: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.

Quality: 88% (Does it follow best practices?)
Impact: 85%, 1.06x (average score across 29 eval scenarios)
Security by Snyk: Advisory (suggest reviewing before use)


Evaluation results

Approval-Gated Skill Change Proposal
Score with context: 95% (-2% vs. without context)
Criteria (without context / with context):
SKILL.md not modified: 100% / 100%
Explicit approval request: 100% / 100%
Trade-off discussion: 100% / 100%
Risk assessment per recommendation: 100% / 100%
Grouped presentation: 100% / 100%
All key issues addressed: 100% / 100%
Priority summary present: 100% / 100%
Current score per recommendation: 70% / 50%

Skill Quality Improvement
Score with context: 84% (-16% vs. without context)
Criteria (without context / with context):
REFERENCE.md not recreated: 100% / 100%
No REFERENCE.md changes proposed: 100% / 100%
SKILL.md produced: 100% / 100%
Use when clause added: 100% / 100%
Inline duplication removed: 100% / 33%
REFERENCE.md linked: 100% / 100%
Core examples retained: 100% / 100%
SKILL.md shorter: 100% / 0%
Change log documents SKILL.md changes: 100% / 100%
Change log explains why: 100% / 100%

Skill Optimization Results Report
Score with context: 100%
Criteria (without context / with context):
Overall before/after format: 100% / 100%
Percentage delta shown: 100% / 100%
Per-dimension breakdown: 100% / 100%
Arrow notation or equivalent: 100% / 100%
Dimension change labelled: 100% / 100%
Dimensions impact explained: 100% / 100%
Correct overall scores: 100% / 100%
Completeness improvement noted: 100% / 100%
Actionability improvement noted: 100% / 100%
Conciseness unchanged noted: 100% / 100%
Robustness improvement noted: 100% / 100%

Skill Post-Edit Quality Audit
Score with context: 63% (-8% vs. without context)
Criteria (without context / with context):
Code syntax check included: 100% / 100%
Python syntax error found: 100% / 100%
Command flags check included: 100% / 12%
File references check included: 100% / 100%
File reference passes: 100% / 100%
Use when clause check included: 0% / 0%
Use when clause fails: 0% / 0%
Known concepts check included: 70% / 60%
Known concepts issue found: 83% / 83%
Readiness summary: 100% / 100%

Skill Improvement Recommendations
Score with context: 96% (+7% vs. without context)
Criteria (without context / with context):
Critical issues first: 100% / 100%
High before Medium/Low: 100% / 100%
Summary with priorities: 50% / 100%
Expected improvement in summary: 100% / 100%
Dimension score included: 100% / 100%
Before/after examples: 83% / 83%
Impact stated per recommendation: 50% / 100%
Educational WHY included: 100% / 100%
All four issues addressed: 100% / 100%
Approval framing: 100% / 80%

Progressive Disclosure Evaluation
Score with context: 88% (+8% vs. without context)
Criteria (without context / with context):
Identifies good references: 100% / 100%
Explains why good: 90% / 100%
Identifies poor references: 73% / 100%
Explains why poor: 90% / 100%
Token efficiency framing: 70% / 80%
Routing gate test: 90% / 100%
Improves CONFIGURATION.md: 100% / 100%
Improves GUIDE.md: 100% / 100%
Improves EXAMPLES.md: 100% / 100%
Improves ADVANCED.md or REFERENCE.md: 100% / 100%
Questions blind split recommendation: 0% / 0%

Skill Length Reduction
Score with context: 100% (+5% vs. without context)
Criteria (without context / with context):
Linking over inlining: 100% / 100%
Reference file identified: 100% / 100%
Severity mappings removed: 100% / 100%
Flag tables removed: 100% / 100%
Template list removed: 100% / 100%
SKILL.md substantially shorter: 83% / 100%
Core examples preserved: 100% / 100%
Before/after shown: 100% / 100%
WHY explained: 70% / 100%
REFERENCE.md not modified: 100% / 100%

Eval Scenario Quality Review
Score with context: 41% (-43% vs. without context)
Criteria (without context / with context):
identifies_scenario_1_acceptable: 70% / 100%
detects_answer_leakage: 100% / 0%
explains_leakage_impact: 90% / 0%
detects_double_counting: 75% / 80%
detects_free_point_criterion: 100% / 0%
proposes_specific_fixes: 86% / 33%
no_false_positives_scenario_1: 50% / 100%

Skill Score Maximization
Score with context: 100%
Criteria (without context / with context):
Completeness weight correct: 100% / 100%
Conciseness weight correct: 100% / 100%
Actionability weight correct: 100% / 100%
Use when clause highest impact: 100% / 100%
Use when quantified: 100% / 100%
Revised description includes Use when: 100% / 100%
Executable code recommended: 100% / 100%
Known concepts flagged: 100% / 100%
High-impact first ordering: 100% / 100%
Dimension coverage: 100% / 100%

Skill Optimization Automation
Score with context: 100% (+69% vs. without context)
Criteria (without context / with context):
tessl skill review command: 0% / 100%
Review before changes: 0% / 100%
Review after changes: 0% / 100%
Validation before apply: 100% / 100%
Python ast.parse validation: 0% / 100%
node --check JS validation: 0% / 100%
Command --help flag validation: 0% / 100%
File reference validation: 14% / 100%
Before/after score output: 100% / 100%
Script accepts SKILL.md path: 100% / 100%
Phases are ordered: 50% / 100%

Webhook Processor Plugin: Retry Reliability Fix
Score with context: 100%
Criteria (without context / with context):
Explicit retry intervals: 100% / 100%
Rubric language used: 100% / 100%
HMAC section unchanged: 100% / 100%
TLS section unchanged: 100% / 100%
Observability section unchanged: 100% / 100%
Processing section unchanged: 100% / 100%
Retry section only changed: 100% / 100%
Concise addition: 100% / 100%
Max retry count preserved: 100% / 100%
Fast acknowledgement preserved: 100% / 100%

Payments Plugin Eval Analysis
Score with context: 92%
Criteria (without context / with context):
Bucket A: idempotency key: 100% / 100%
Bucket B: webhook signature: 100% / 100%
Bucket C: HTTP status codes: 100% / 100%
Bucket B: currency precision: 100% / 100%
Bucket D: API version pinning: 100% / 100%
Bucket D highest priority: 100% / 100%
Bucket B diagnosis present: 100% / 100%
Bucket C action suggested: 20% / 40%
Bucket A no-action: 100% / 100%
80% threshold applied: 100% / 80%

Multi-Skill Routing Collision Diagnosis
Score with context: 68% (+23% vs. without context)
Criteria (without context / with context):
Uses activation eval to surface collisions: 6% / 40%
Proposes description disambiguation: 84% / 96%

API Integration Plugin: Eval Rubric Review
Score with context: 100%
Criteria (without context / with context):
All redundant criteria identified: 100% / 100%
Options presented per criterion: 100% / 100%
Useful criteria preserved: 100% / 100%
Weight redistribution correct: 100% / 100%
80% threshold applied: 100% / 100%
Non-redundant scores unchanged: 100% / 100%
Below-threshold excluded: 100% / 100%
Removal option named explicitly: 100% / 100%

Pre-Publish Skill Reachability Check
Score with context: 77% (+27% vs. without context)
Criteria (without context / with context):
Recommends activation eval first: 30% / 90%
Defines pass/fail criteria: 70% / 64%

Code Review Plugin: Regression Investigation
Score with context: 100%
Criteria (without context / with context):
Contradicting clause identified: 100% / 100%
Contradiction mechanism explained: 100% / 100%
Remove/clarify approach taken: 100% / 100%
Specific text targeted: 100% / 100%
No compensating additions: 100% / 100%
Other sections preserved: 100% / 100%
Pre-review list intact: 100% / 100%

Bundle File Audit
Score with context: 100%
Criteria (without context / with context):
Lists all bundle files: 100% / 100%
Identifies referenced files: 100% / 100%
Identifies orphaned files: 100% / 100%
TRANSACTIONS.md recommendation: 100% / 100%
PERFORMANCE.md recommendation: 100% / 100%
SECURITY.md recommendation: 100% / 100%
LEGACY_EXAMPLES.md recommendation: 100% / 100%
DRAFT_REPLICATION.md recommendation: 100% / 100%
Bloat reduction framing: 100% / 100%
Clear routing signals emphasis: 100% / 100%
Link vs remove justification: 100% / 100%
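The Bundle File Audit criteria separate the files SKILL.md actually reaches from orphaned ones. One way to compute that split is sketched below. It assumes that references are ordinary relative Markdown links and that reachability is transitive, so a file linked only from another linked file still counts as referenced. `audit_bundle` is a hypothetical helper and is not part of Tessl; the file layout in the test merely reuses the file names from this scenario.

```python
from __future__ import annotations

import posixpath
import re
from pathlib import Path

# ](target) with an optional #anchor, which is dropped from the captured path.
LINK = re.compile(r"\]\(([^)\s#]+)(?:#[^)]*)?\)")


def audit_bundle(bundle: Path, entry: str = "SKILL.md") -> dict[str, list[str]]:
    """Classify bundle files as referenced (reachable from entry) or orphaned."""
    files = {p.relative_to(bundle).as_posix() for p in bundle.rglob("*") if p.is_file()}
    reachable: set[str] = set()
    stack = [entry]
    while stack:  # depth-first walk over relative links
        current = stack.pop()
        if current in reachable or current not in files:
            continue
        reachable.add(current)
        if not current.endswith(".md"):
            continue  # only Markdown files are scanned for further links
        text = (bundle / current).read_text(encoding="utf-8")
        base = posixpath.dirname(current)
        for target in LINK.findall(text):
            if "://" not in target:  # ignore external URLs
                stack.append(posixpath.normpath(posixpath.join(base, target)))
    return {
        "referenced": sorted(reachable - {entry}),
        "orphaned": sorted(files - reachable),
    }
```

Whether each orphan should then be linked from SKILL.md or deleted is the judgment call that the "Link vs remove justification" criterion grades. The sketch only produces the two lists.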

Activation Zero-Firing Diagnosis
Score with context: 76% (-24% vs. without context)
Criteria (without context / with context):
Distinguishes routing gap from out-of-scope: 100% / 88%
Addresses never-fired skills: 100% / 64%

Verify Description Edits Still Route Correctly
Score with context: 0% (-20% vs. without context)
Criteria (without context / with context):
Points to activation eval as the fast check: 0% / 0%
Suggests before/after comparison: 40% / 0%
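The two-criterion scenarios make it easy to check how the headline numbers relate to the criteria. In Activation Zero-Firing Diagnosis and Verify Description Edits Still Route Correctly, the reported score equals the plain mean of the with-context criterion scores, and the reported delta equals that mean minus the without-context mean. The sketch below assumes exactly that, giving every criterion equal weight. The page does not document its scoring formula, and `scenario_score` is a hypothetical helper.

```python
from __future__ import annotations

from statistics import mean


def scenario_score(criteria: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Return (score with context, delta vs. without context) in percent.

    Assumes every criterion carries equal weight; values are
    (without context, with context) pairs as listed on this page.
    """
    without = mean(w for w, _ in criteria.values())
    with_context = mean(c for _, c in criteria.values())
    return round(with_context), round(with_context - without)


zero_firing = {
    "Distinguishes routing gap from out-of-scope": (100, 88),
    "Addresses never-fired skills": (100, 64),
}
verify_edits = {
    "Points to activation eval as the fast check": (0, 0),
    "Suggests before/after comparison": (40, 0),
}

print(scenario_score(zero_firing))   # (76, -24)
print(scenario_score(verify_edits))  # (0, -20)
```

The same function also reproduces Multi-Skill Routing Collision Diagnosis (68%, +23%) and Pre-Publish Skill Reachability Check (77%, +27%). It does not match every scenario, however. For Approval-Gated Skill Change Proposal, the unweighted mean is 94% with context, against a reported 95%, which suggests that some criteria carry weights.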

Data Pipeline Plugin: Consistency Audit
Score with context: 100%
Criteria (without context / with context):
Retry count contradiction found: 100% / 100%
Auth failure contradiction found: 100% / 100%
All three files referenced: 100% / 100%
File attribution per contradiction: 100% / 100%
Auth contradiction despite scope: 100% / 100%
Verbatim quotes included: 100% / 100%

Skill Bundle Validation
Score with context: 87%
Criteria (without context / with context):
Python via ast.parse: 73% / 80%
Python error identified: 100% / 100%
JavaScript via node --check: 80% / 86%
Command flag validation: 40% / 30%
File reference check: 100% / 100%
Broken reference identified: 100% / 100%
Validation before application: 100% / 100%
Per-check pass/fail: 100% / 100%
Fix suggestions: 100% / 90%
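Several scenarios grade the same pre-apply checks: Python syntax through `ast.parse`, JavaScript through `node --check`, and file references that must resolve. A minimal validator along those lines is sketched below. It illustrates the checks and is not the skill-optimizer's own script. `validate_skill` and its helpers are hypothetical names, and the fence and link regexes only handle simple Markdown. The command-flag (`--help`) check is left out because it depends on which CLIs a given skill invokes.

```python
from __future__ import annotations

import ast
import re
import shutil
import subprocess
import tempfile
from pathlib import Path

# Simple Markdown patterns: a fence is three backticks plus a language tag,
# a link is ](target) with an optional #anchor that is ignored.
FENCE = re.compile(r"`{3}(\w+)\n(.*?)`{3}", re.DOTALL)
LINK = re.compile(r"\]\(([^)\s#]+)(?:#[^)]*)?\)")


def check_python(src: str) -> tuple[bool, str]:
    """Syntax-check Python with ast.parse; the code is parsed, never executed."""
    try:
        ast.parse(src)
        return True, "ok"
    except SyntaxError as exc:
        return False, f"line {exc.lineno}: {exc.msg}"


def check_javascript(src: str) -> tuple[bool, str]:
    """Syntax-check JavaScript with `node --check` when Node.js is on PATH."""
    if shutil.which("node") is None:
        return True, "skipped (node not found)"
    with tempfile.NamedTemporaryFile("w", suffix=".js", delete=False) as tmp:
        tmp.write(src)
    try:
        proc = subprocess.run(["node", "--check", tmp.name],
                              capture_output=True, text=True)
    finally:
        Path(tmp.name).unlink()
    return proc.returncode == 0, proc.stderr.strip() or "ok"


CHECKERS = {"python": check_python, "javascript": check_javascript,
            "js": check_javascript}


def validate_skill(skill_md: Path) -> list[tuple[str, bool, str]]:
    """Return one (check, passed, detail) row per code block and local link."""
    text = skill_md.read_text(encoding="utf-8")
    rows = []
    for n, (lang, body) in enumerate(FENCE.findall(text), start=1):
        checker = CHECKERS.get(lang.lower())
        if checker is not None:
            ok, detail = checker(body)
            rows.append((f"{lang} block {n}", ok, detail))
    for target in LINK.findall(text):
        if "://" in target:  # external URLs are out of scope here
            continue
        ok = (skill_md.parent / target).exists()
        rows.append((f"reference {target}", ok, "ok" if ok else "file not found"))
    return rows
```

Each row is one check with a pass/fail flag and a detail string, which matches the per-check reporting the criteria ask for. When Node.js is not installed, the JavaScript check is reported as passing with a "skipped" note, so check the detail column as well as the flag.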

Routing Health Report for content-tools Plugin
Score with context: 100%
Criteria (without context / with context):
Routing table present: 100% / 100%
Skill coverage summary correct: 100% / 100%
rewrite-intro out-of-scope determination: 100% / 100%
generate-bibliography routing gap determination: 100% / 100%
fix-heading-hierarchy routing gap determination: 100% / 100%
citation-generator description rewrite: 100% / 100%
markdown-formatter description rewrite: 100% / 100%
Minimal rewrite principle: 100% / 100%
Rewrites presented together: 100% / 100%
Scored eval data cited: 100% / 100%

Eval Kickoff Plan for invoice-processor Plugin
Score with context: 77% (+36% vs. without context)
Criteria (without context / with context):
Skill count detection command: 35% / 28%
Activation eval run first: 0% / 100%
Scored eval follows activation: 42% / 100%
Routing-clean gate explained: 83% / 100%
Skip activation condition stated: 85% / 7%
Correct eval run command format: 0% / 100%
--skip-forced-context-activation --skip-scoring flags used: 0% / 100%
Plugin path used consistently: 100% / 100%

Optimization Decision Point for pull-request-reviewer Plugin
Score with context: 67%
Criteria (without context / with context):
Regression identified: 100% / 100%
Regression is highest priority: 100% / 100%
High baseline warning present: 0% / 0%
Scenario regeneration suggested: 0% / 0%
Plugin is actively hurting: 100% / 100%
Per-criterion regression analysis: 100% / 100%
Correct prioritization order: 100% / 100%

Expand Eval Coverage for shopify-connector Plugin
Score with context: 100%
Criteria (without context / with context):
Uses --strategy merge: 100% / 100%
Does NOT use --strategy replace: 100% / 100%
Correct base command: 100% / 100%
Output directory specified: 100% / 100%
Verification step present: 100% / 100%
Run ID or --last used: 100% / 100%
Existing scenarios preserved: 100% / 100%

Post-Fix Validation for database-migrator Plugin
Score: Pending

Plugin Eval Readiness Checker
Score with context: 81% (+67% vs. without context)
Criteria (without context / with context):
Excludes .tessl cache: 0% / 80%
.tessl/plugins warning: 0% / 100%
Scenario existence check: 20% / 100%
Scenario generation guidance: 0% / 100%
Login verification: 0% / 100%
No --workspace flag: 100% / 100%
Default model names: 0% / 100%
Model subset confirmation: 0% / 37%
Time estimate provided: 33% / 100%
Run count option: 0% / 0%

Multi-Model Plugin Benchmark Automation
Score with context: 100% (+11% vs. without context)
Criteria (without context / with context):
Correct base command: 100% / 100%
--agent flag format: 100% / 100%
All three default models: 70% / 100%
Sequential execution: 100% / 100%
Run ID capture: 100% / 100%
Model-to-ID mapping: 87% / 100%
Monitoring URL output: 12% / 100%
Polls with tessl eval view: 100% / 100%
Retry on failure: 100% / 100%
Waits for all to complete: 100% / 100%
No --workspace flag: 100% / 100%
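The Multi-Model Plugin Benchmark Automation criteria describe a launcher with four behaviors: it starts one eval per model in sequence, records a model-to-run-ID mapping, retries failed launches, and polls until every run finishes. That control flow can be sketched as follows. `start_run` and `is_finished` are hypothetical hooks. In practice they would wrap the eval run command and `tessl eval view`, but this page shows neither the exact command syntax nor the output format, so that wiring is left out.

```python
from __future__ import annotations

import time
from typing import Callable


def run_benchmark(
    models: list[str],
    start_run: Callable[[str], str],
    is_finished: Callable[[str], bool],
    max_attempts: int = 3,
    poll_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, str]:
    """Launch one eval per model in sequence, then wait for all of them.

    start_run(model) returns a run ID and raises RuntimeError on a failed
    launch; is_finished(run_id) reports completion. Both are hypothetical
    hooks to be wired to the real CLI.
    """
    run_ids: dict[str, str] = {}
    for model in models:  # sequential: never two launches at once
        for attempt in range(1, max_attempts + 1):
            try:
                run_ids[model] = start_run(model)
                break
            except RuntimeError:
                if attempt == max_attempts:
                    raise  # give up on this model after the last attempt
                sleep(poll_seconds)
    pending = set(run_ids.values())
    while pending:  # poll until every captured run reports completion
        pending = {rid for rid in pending if not is_finished(rid)}
        if pending:
            sleep(poll_seconds)
    return run_ids
```

Injecting `sleep` keeps the loop testable without real waiting. The three-attempt limit and the 30-second interval are arbitrary defaults, not values taken from the criteria.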

Model Benchmark Comparison Report
Score with context: 100% (+9% vs. without context)
Criteria (without context / with context):
Overall summary table: 100% / 100%
Per-scenario breakdown: 100% / 100%
Per-criterion table: 100% / 100%
Correct symbol thresholds: 30% / 100%
Baseline interpretation: 100% / 100%
Universal Failure identified: 100% / 100%
Capability Gradient identified: 100% / 100%
Regression identified: 100% / 100%
Fix before publish recommendation: 100% / 100%
eval-improve mentioned: 100% / 100%
Re-run offer: 80% / 100%

Evaluated with agent Claude Code and model Claude Sonnet 4.6.