Name: tessl/skill-optimizer
Rating: 89.16 (1 review)
Author: tessl

Optimize your skills and plugins: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.

Quality

90%

Does it follow best practices?

Impact

89%

1.14x

Average score across 29 eval scenarios

Security

Passed

No findings from the security scan

Evaluation results

100%

Activation Zero-Firing Diagnosis

Criteria

Baseline

With context

Distinguishes routing gap from out-of-scope

100%

Addresses never-fired skills

100%

API Integration Plugin: Eval Rubric Review

Criteria

Baseline

With context

All redundant criteria identified

100%

Options presented per criterion

100%

Useful criteria preserved

100%

Weight redistribution correct

100%

80% threshold applied

100%

Non-redundant scores unchanged

100%

Below-threshold excluded

100%

Removal option named explicitly

100%

98%

Approval-Gated Skill Change Proposal

Criteria

Baseline

With context

SKILL.md not modified

100%

Explicit approval request

100%

Trade-off discussion

100%

Risk assessment per recommendation

100%

Grouped presentation

100%

All key issues addressed

100%

Priority summary present

70%

80%

Current score per recommendation

100%

Bundle File Audit

Criteria

Baseline

With context

Lists all bundle files

100%

Identifies referenced files

100%

Identifies orphaned files

100%

TRANSACTIONS.md recommendation

100%

PERFORMANCE.md recommendation

100%

SECURITY.md recommendation

100%

LEGACY_EXAMPLES.md recommendation

100%

DRAFT_REPLICATION.md recommendation

100%

Bloat reduction framing

100%

Clear routing signals emphasis

100%

Link vs remove justification

100%

Code Review Plugin: Regression Investigation

Criteria

Baseline

With context

Contradicting clause identified

100%

Contradiction mechanism explained

100%

Remove/clarify approach taken

100%

Specific text targeted

100%

No compensating additions

100%

Other sections preserved

100%

Pre-review list intact

100%

Data Pipeline Plugin: Consistency Audit

Criteria

Baseline

With context

Retry count contradiction found

100%

Auth failure contradiction found

100%

All three files referenced

100%

File attribution per contradiction

100%

Auth contradiction despite scope

100%

Verbatim quotes included

100%

56%

Eval Kickoff Plan for invoice-processor Plugin

Criteria

Baseline

With context

Skill count detection command

21%

100%

Activation eval run first

100%

Scored eval follows activation

50%

100%

Routing-clean gate explained

100%

Skip activation condition stated

100%

Correct eval run command format

100%

--skip-forced-context-activation --skip-scoring flags used

100%

Plugin path used consistently

100%

78%

Eval Scenario Quality Review

Criteria

Baseline

With context

identifies_scenario_1_acceptable

detects_answer_leakage

90%

100%

explains_leakage_impact

80%

100%

detects_double_counting

85%

100%

detects_free_point_criterion

100%

proposes_specific_fixes

86%

no_false_positives_scenario_1

100%

39%

Expand Eval Coverage for shopify-connector Plugin

Criteria

Baseline

With context

Uses --strategy merge

100%

Does NOT use --strategy replace

100%

Correct base command

100%

Output directory specified

66%

100%

Verification step present

100%

Run ID or --last used

100%

Existing scenarios preserved

100%

17%

Model Benchmark Comparison Report

Criteria

Baseline

With context

Overall summary table

100%

Per-scenario breakdown

100%

Per-criterion table

100%

Correct symbol thresholds

100%

Baseline interpretation

100%

Universal Failure identified

80%

100%

Capability Gradient identified

80%

100%

Regression identified

100%

Fix before publish recommendation

62%

100%

eval-improve mentioned

100%

Re-run offer

100%

95%

Multi-Model Plugin Benchmark Automation

Criteria

Baseline

With context

Correct base command

100%

--agent flag format

90%

100%

All three default models

70%

100%

Sequential execution

100%

Run ID capture

100%

Model-to-ID mapping

100%

37%

Monitoring URL output

25%

100%

Polls with tessl eval view

100%

Retry on failure

100%

Waits for all to complete

100%

No --workspace flag

100%

47%

Multi-Skill Routing Collision Diagnosis

Criteria

Baseline

With context

Uses activation eval to surface collisions

16%

100%

Proposes description disambiguation

90%

100%

33%

Optimization Decision Point for pull-request-reviewer Plugin

Criteria

Baseline

With context

Regression identified

100%

Regression is highest priority

100%

High baseline warning present

100%

Scenario regeneration suggested

100%

Plugin is actively hurting

100%

Per-criterion regression analysis

100%

Correct prioritization order

100%

94%

Payments Plugin Eval Analysis

Criteria

Baseline

With context

Bucket A: idempotency key

100%

Bucket B: webhook signature

100%

Bucket C: HTTP status codes

100%

Bucket B: currency precision

100%

87%

Bucket D: API version pinning

100%

Bucket D highest priority

100%

Bucket B diagnosis present

100%

Bucket C action suggested

50%

Bucket A no-action

87%

100%

80% threshold applied

100%

67%

47%

Plugin Eval Readiness Checker

Criteria

Baseline

With context

Excludes .tessl cache

20%

.tessl/plugins warning

100%

Scenario existence check

60%

100%

Scenario generation guidance

100%

20%

100%

No --workspace flag

100%

Default model names

50%

Model subset confirmation

Time estimate provided

33%

100%

Run count option

100%

58%

Post-Fix Validation for database-migrator Plugin

Criteria

Baseline

With context

tessl plugin lint command used

100%

Plugin path argument provided

20%

100%

Lint run after each change set

33%

100%

Token cost ballooning flagged

85%

100%

Move to docs recommended

70%

100%

Docs vs rules distinction

28%

100%

Does NOT recommend rules for heavy content

100%

0%

-65%

Pre-Publish Skill Reachability Check

Criteria

Baseline

With context

Recommends activation eval first

40%

Defines pass/fail criteria

90%

89%

-1%

Progressive Disclosure Evaluation

Criteria

Baseline

With context

Identifies good references

100%

Explains why good

100%

Identifies poor references

100%

Explains why poor

100%

Token efficiency framing

100%

90%

Routing gate test

100%

Improves CONFIGURATION.md

100%

Improves GUIDE.md

100%

Improves EXAMPLES.md

100%

Improves ADVANCED.md or REFERENCE.md

100%

Questions blind split recommendation

100%

Routing Health Report for content-tools Plugin

Criteria

Baseline

With context

Routing table present

100%

Skill coverage summary correct

100%

rewrite-intro out-of-scope determination

100%

generate-bibliography routing gap determination

100%

fix-heading-hierarchy routing gap determination

100%

citation-generator description rewrite

100%

markdown-formatter description rewrite

100%

Minimal rewrite principle

100%

Rewrites presented together

100%

Scored eval data cited

100%

70%

Skill Bundle Validation

Criteria

Baseline

With context

Python via ast.parse

40%

Python error identified

100%

JavaScript via node --check

20%

Command flag validation

20%

File reference check

100%

Broken reference identified

100%

Validation before application

100%

Per-check pass/fail

100%

Fix suggestions

100%

90%

100%

12%

Skill Improvement Recommendations

Criteria

Baseline

With context

Critical issues first

100%

High before Medium/Low

100%

Summary with priorities

40%

100%

Expected improvement in summary

100%

Dimension score included

100%

Before/after examples

91%

100%

Impact stated per recommendation

50%

100%

Educational WHY included

100%

All four issues addressed

100%

Approval framing

90%

100%

Skill Length Reduction

Criteria

Baseline

With context

Linking over inlining

100%

Reference file identified

100%

Severity mappings removed

100%

Flag tables removed

100%

Template list removed

100%

SKILL.md substantially shorter

100%

Core examples preserved

100%

Before/after shown

100%

WHY explained

100%

REFERENCE.md not modified

100%

62%

42%

Skill Optimization Automation

Criteria

Baseline

With context

tessl skill review command

Review before changes

10%

20%

Review after changes

10%

20%

Validation before apply

100%

Python ast.parse validation

100%

node --check JS validation

100%

Command --help flag validation

12%

File reference validation

100%

Before/after score output

100%

Script accepts SKILL.md path

100%

Phases are ordered

25%

100%

Skill Optimization Results Report

Criteria

Baseline

With context

Overall before/after format

100%

Percentage delta shown

100%

Per-dimension breakdown

100%

Arrow notation or equivalent

100%

Dimension change labelled

100%

Dimensions impact explained

100%

Correct overall scores

100%

Completeness improvement noted

100%

Actionability improvement noted

100%

Conciseness unchanged noted

100%

Robustness improvement noted

100%

76%

Skill Post-Edit Quality Audit

Criteria

Baseline

With context

Code syntax check included

100%

Python syntax error found

100%

Command flags check included

100%

File references check included

100%

File reference passes

100%

Use when clause check included

Use when clause fails

Known concepts check included

90%

100%

Known concepts issue found

100%

Readiness summary

100%

99%

-1%

Skill Quality Improvement

Criteria

Baseline

With context

REFERENCE.md not recreated

100%

No REFERENCE.md changes proposed

100%

SKILL.md produced

100%

Use when clause added

100%

Inline duplication removed

100%

91%

REFERENCE.md linked

100%

Core examples retained

100%

SKILL.md shorter

100%

Change log documents SKILL.md changes

100%

Change log explains why

100%

Skill Score Maximization

Criteria

Baseline

With context

Completeness weight correct

100%

Conciseness weight correct

100%

Actionability weight correct

100%

Use when clause highest impact

100%

Use when quantified

100%

Revised description includes Use when

100%

Executable code recommended

100%

Known concepts flagged

60%

100%

High-impact first ordering

100%

Dimension coverage

100%

60%

12%

Verify Description Edits Still Route Correctly

Criteria

Baseline

With context

Points to activation eval as the fast check

16%

90%

Suggests before/after comparison

80%

30%

100%

Webhook Processor Plugin: Retry Reliability Fix

Criteria

Baseline

With context

Explicit retry intervals

100%

Rubric language used

100%

HMAC section unchanged

100%

TLS section unchanged

100%

Observability section unchanged

100%

Processing section unchanged

100%

Retry section only changed

100%

Concise addition

100%

Max retry count preserved

100%

Fast acknowledgement preserved

100%

Evaluated: about 1 month ago
Agent: Claude Code
Model: Claude Sonnet 4.6
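
Several criteria above ("Python via ast.parse", "File reference check", "Validation before application") name checks without showing what they might look like. The following is a minimal Python sketch of two of them, not the skill's actual implementation. The function names, the regular expressions, and the assumption that bundle files are referenced as relative Markdown links are all illustrative.

```python
import ast
import re
from pathlib import Path

# Hypothetical patterns: fenced code blocks and Markdown links in a SKILL.md.
FENCE = re.compile(r"`{3}(\w*)\n(.*?)`{3}", re.DOTALL)
LINK = re.compile(r"\[[^\]]*\]\(([^)#\s]+)\)")


def check_python_blocks(markdown: str) -> list[str]:
    """Return one message per Python code block that fails to parse."""
    errors = []
    for lang, body in FENCE.findall(markdown):
        if lang.lower() not in {"python", "py"}:
            continue
        try:
            ast.parse(body)  # syntax check only; the code is never executed
        except SyntaxError as exc:
            errors.append(f"line {exc.lineno}: {exc.msg}")
    return errors


def check_file_references(markdown: str, bundle_dir: Path) -> list[str]:
    """Return relative link targets that do not exist under bundle_dir."""
    missing = []
    for target in LINK.findall(markdown):
        if "://" in target:  # skip external URLs
            continue
        if not (bundle_dir / target).exists():
            missing.append(target)
    return missing
```

Running both checks on a proposed SKILL.md before writing it to disk would match the "Validation before application" ordering. A JavaScript block could be checked in a similar way by writing it to a temporary file and calling `node --check` on it, assuming Node.js is installed.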

Table of Contents

Approval-Gated Skill Change Proposal
Skill Quality Improvement
Skill Optimization Results Report
Skill Post-Edit Quality Audit
Skill Improvement Recommendations
Progressive Disclosure Evaluation
Skill Length Reduction
Eval Scenario Quality Review
Skill Score Maximization
Skill Optimization Automation
Webhook Processor Plugin: Retry Reliability Fix
Payments Plugin Eval Analysis
Multi-Skill Routing Collision Diagnosis
API Integration Plugin: Eval Rubric Review
Pre-Publish Skill Reachability Check
Code Review Plugin: Regression Investigation
Bundle File Audit
Activation Zero-Firing Diagnosis
Verify Description Edits Still Route Correctly
Data Pipeline Plugin: Consistency Audit
Skill Bundle Validation
Routing Health Report for content-tools Plugin
Eval Kickoff Plan for invoice-processor Plugin
Optimization Decision Point for pull-request-reviewer Plugin
Expand Eval Coverage for shopify-connector Plugin
Post-Fix Validation for database-migrator Plugin
Plugin Eval Readiness Checker
Multi-Model Plugin Benchmark Automation
Model Benchmark Comparison Report