
tessl-labs/skill-optimizer

Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.

Quality

93%

Does it follow best practices?

Impact

88%

1.07x

Average score across 24 eval scenarios

Security by Snyk

Passed

No known issues

This plugin was archived by the owner on May 19, 2026

Reason: Tile archived: Superseded by tessl/skill-optimizer - go to https://tessl.io/registry/tessl/skill-optimizer


Quality

Discovery

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a well-crafted skill description that excels across all dimensions. It provides specific concrete actions, includes natural trigger terms users would actually say, explicitly addresses both what and when, and carves out a distinct niche around evaluation debugging and iteration workflows.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Lists multiple specific concrete actions: 'Run task evals, analyze results, diagnose failures, apply targeted fixes, and re-run to verify improvements.' These are clear, actionable capabilities. | 3 / 3 |
| Completeness | Clearly answers both what (run evals, analyze, diagnose, fix, re-run) AND when with explicit 'Use when...' clause covering debugging scores, fixing failures, improving content, and iterating on test results. | 3 / 3 |
| Trigger Term Quality | Includes natural keywords users would say: 'evaluation scores', 'failing', 'regressed criteria', 'eval run', 'agent performance test results', 'debugging'. Good coverage of domain-specific terms. | 3 / 3 |
| Distinctiveness Conflict Risk | Clear niche focused on task evaluation debugging and iteration. Terms like 'eval run', 'regressed criteria', 'agent performance test results' are distinct and unlikely to conflict with general debugging or testing skills. | 3 / 3 |
| Total | | 12 / 12 |

Passed

Implementation

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-structured, highly actionable skill with excellent workflow clarity and validation checkpoints throughout the improvement cycle. The main weakness is length: at ~300 lines, some content could be more concise or split into reference files. The explicit bucket classification system and before/after reporting patterns are particularly strong.

Suggestions

Tighten the bucket definitions in Phase 1.2: the explanations after each bullet are somewhat redundant with the classification logic itself.

Consider moving Phase 5 (Scenario Quality Review) to a separate reference file, since it is marked as 'Bonus' and adds significant length.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The skill is comprehensive but includes some redundant explanations (e.g., explaining what each bucket means multiple times, verbose example outputs). Some sections could be tightened without losing clarity, though it avoids explaining concepts Claude already knows. | 2 / 3 |
| Actionability | Provides fully executable bash commands throughout, specific classification criteria with exact thresholds (>=80%), concrete example outputs, and copy-paste ready code blocks. The guidance is precise and immediately usable. | 3 / 3 |
| Workflow Clarity | Excellent multi-phase workflow with clear sequencing (Phases 0-5), explicit validation checkpoints (lint after each fix, poll for completion, before/after comparison), and feedback loops (re-run and verify cycle, 'Want me to take another pass?'). | 3 / 3 |
| Progressive Disclosure | Content is well-organized with clear phase headers, but the entire workflow is in one monolithic file. Advanced content like scenario quality review could be split to a separate reference file. The skill does reference a companion skill appropriately. | 2 / 3 |
| Total | | 10 / 12 |

Passed
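
As a quick check on the Implementation scores above, the per-dimension values can be summed to reproduce the reported total. This is only an illustrative sketch: the `tally` helper and the `implementation_scores` dictionary are hypothetical names, not part of any Tessl tooling. The page does not say how the 77% headline figure is derived from these scores, so the sketch stops at the raw total.

```python
# Per-dimension rubric scores copied from the Implementation review (each out of 3).
# The names below are hypothetical and exist only for this illustration.
MAX_PER_DIMENSION = 3

implementation_scores = {
    "Conciseness": 2,
    "Actionability": 3,
    "Workflow Clarity": 3,
    "Progressive Disclosure": 2,
}


def tally(scores, max_per_dimension=MAX_PER_DIMENSION):
    """Return (total, maximum) for a mapping of dimension name -> score."""
    total = sum(scores.values())
    maximum = max_per_dimension * len(scores)
    return total, maximum


total, maximum = tally(implementation_scores)
print(f"Total: {total} / {maximum}")
```

The same helper applies to the Discovery scores, where every dimension is rated 3 out of 3.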

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Reviewed
