tessl-labs/best-practice-skill-improver

Eval-driven process for improving best-practice skills — analyse eval results, research what agents get wrong, rewrite for maximum uplift, and measure improvement with scenarios.

Quality

84%

Does it follow best practices?

Impact

Pending

No eval scenarios have been run

Security by Snyk

Advisory

Suggest reviewing before use

Quality

Discovery

85%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a well-structured description that clearly states both what the skill does and when to use it. Its main weakness is a reliance on domain-specific terminology ('tessl', 'verifiers', 'eval-driven') that may not match the words users naturally reach for. The explicit 'Use when...' clause and the comprehensive action list are strong points.

Suggestions

Add more natural language trigger terms that users might say, such as 'make better', 'refine', 'enhance', or 'fix' alongside the existing technical terms

Consider adding common user phrasings like 'skill isn't working well' or 'improve skill performance' to capture troubleshooting scenarios

Dimension | Reasoning | Score

Specificity

Lists multiple specific concrete actions: 'analysing eval results', 'identifying high-uplift practices', 'rewriting skills', 'updating verifiers', and 'measuring improvement with scenarios'.

3 / 3

Completeness

Clearly answers both what ('Eval-driven process for improving tessl skills...Covers analysing eval results, identifying high-uplift practices, rewriting skills, updating verifiers, and measuring improvement') AND when ('Use when asked to improve, optimize, or iterate on a tessl tile or skill, or when creating a new best-practice skill from scratch').

3 / 3

Trigger Term Quality

Includes some natural keywords like 'improve', 'optimize', 'iterate', 'skill', 'tile', but uses domain-specific jargon ('tessl', 'verifiers', 'eval-driven') that users may not naturally say. Missing common variations like 'make better', 'refine', 'enhance'.

2 / 3

Distinctiveness Conflict Risk

Clear niche focused specifically on 'tessl skills' and 'tessl tile' with distinct domain terminology. The combination of eval-driven improvement and tessl-specific context makes it unlikely to conflict with general skill improvement or other evaluation skills.

3 / 3

Total: 11 / 12

Passed

Implementation

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a high-quality, actionable skill for an eval-driven improvement workflow. Its greatest strengths are the clear phase-based structure with explicit exit criteria and the concrete CLI commands throughout. The main weakness is length: in trying to be comprehensive, the document becomes token-heavy, and some content (such as the extensive anti-patterns section and the detailed leakage examples) could be moved to reference files.

Suggestions

Extract Phase 8 (Audit Scenario Quality) into a separate SCENARIO_QUALITY.md file and reference it from the main skill. This section alone is ~100 lines and works as a standalone reference.

Move the anti-patterns section to a separate ANTI_PATTERNS.md file, keeping only a brief summary with a link in the main skill

Trim the 'good task formula' and 'proactive application' explanations. They are valuable, but could be condensed to examples with one-line explanations rather than multi-paragraph prose

Dimension | Reasoning | Score

Conciseness

The skill is comprehensive but could be tightened. Some sections like the anti-patterns list and the detailed explanations of leakage patterns are valuable but verbose. The skill assumes Claude's competence in most areas but occasionally over-explains concepts like what uplift means.

2 / 3

Actionability

Excellent actionability with specific CLI commands (tessl install, eval run, scenario generate), concrete code examples for verifier JSON structure, and copy-paste ready templates. Every phase has explicit commands and expected outputs.

3 / 3

Workflow Clarity

Outstanding workflow structure with 9 clearly sequenced phases, each with explicit goals and exit criteria. Validation checkpoints are built into the process (Phase 6 checks for regressions, Phase 8 audits scenario quality). The feedback loop of eval → diagnose → fix → re-eval is explicit throughout.

3 / 3

Progressive Disclosure

The skill is a monolithic document (~400 lines) that could benefit from splitting into separate files for each phase or topic area. While internally well-organized with clear headers, there are no references to external files for detailed content like the verifier JSON schema or scenario writing guidelines.

2 / 3

Total: 10 / 12

Passed

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 Passed

Validation for skill structure

Criteria | Description | Result

frontmatter_unknown_keys

Unknown frontmatter key(s) found; consider removing or moving to metadata

Warning

Total: 10 / 11

Passed

Reviewed
