skill-creator

Create new skills, modify and improve existing skills, and measure skill performance. Use when users want to create a skill from scratch, edit, or optimize an existing skill, run evals to test a skill, benchmark skill performance with variance analysis, or optimize a skill's description for better triggering accuracy.

90

1.87x
Quality

Does it follow best practices?

Impact

88%

1.87x

Average score across 3 eval scenarios

Security by Snyk

Advisory

Suggest reviewing before use

Quality

Content

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

A dense, highly actionable skill body with an excellent sequenced eval workflow, held back by chatty filler that inflates token use and by file references that don't match the actual bundle structure. Tightening prose and fixing the broken paths would lift both weak dimensions.

Suggestions

Remove the editorial asides (e.g. "Cool? Cool.", the plumbers/grandparents anecdote, and the "billions a year in economic value" line) and consolidate the three restatements of the core loop into one, to cut tokens without losing guidance.

Fix the broken file paths: the viewer generator lives at scripts/generate_review.py (not eval-viewer/generate_review.py), and the agents/grader.md, agents/comparator.md, and agents/analyzer.md references point to an agents/ directory that is not in the bundle — either add the files or correct the paths.

Split the large monolithic body — e.g. move the platform-specific (Claude.ai / Cowork) instructions and the Description Optimization workflow into reference files — to get comfortably under the 500-line target and improve navigation.
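The broken-path suggestion above could be caught automatically before publishing. The following is a minimal sketch, not part of the skill or of the reviewer's tooling: it assumes the skill body is available as a string and the bundle is a local directory, and the regular expression and the function name `find_missing_references` are hypothetical choices made for illustration.

```python
import re
from pathlib import Path

# Hypothetical pattern: relative paths with at least one directory component
# and one of a few known extensions, e.g. scripts/generate_review.py.
PATH_PATTERN = re.compile(r"\b(?:[\w.-]+/)+[\w.-]+\.(?:py|md|html|json)\b")


def find_missing_references(skill_text: str, bundle_root: Path) -> list[str]:
    """Return referenced relative paths that are not files under bundle_root."""
    referenced = sorted(set(PATH_PATTERN.findall(skill_text)))
    return [path for path in referenced if not (bundle_root / path).is_file()]
```

Run against a bundle that contains scripts/generate_review.py but no agents/ directory, a check like this would list eval-viewer/generate_review.py and the three agents/*.md files as missing. A pattern-based scan can produce false positives (for example, paths that appear only as illustrations), so its output is a prompt for manual review rather than a verdict.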

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | Mostly efficient and actionable, but padded with chatty editorializing that adds no information (e.g. "Cool? Cool.", the "plumbers... grandparents googling how to install npm" anecdote, and "we are trying to create billions a year in economic value here!"). The core loop is also restated three times, which keeps the skill short of lean. | 2 / 3 |
| Actionability | Concrete executable commands (e.g. `python -m scripts.aggregate_benchmark <workspace>/iteration-N --skill-name <name>`), exact JSON structures, and explicit field-name requirements ("must use the fields `text`, `passed`, and `evidence`... the viewer depends on these exact field names") make the guidance copy-paste ready. | 3 / 3 |
| Workflow Clarity | The eval process is a clearly numbered Step 1–5 sequence with explicit checkpoints (spawn all runs at once, capture timing as notifications arrive, grade → aggregate → analyst pass → viewer) and a feedback loop (read feedback.json, focus on complaints, iterate until satisfied). | 3 / 3 |
| Progressive Disclosure | Sections are well-organized and real references are one level deep (references/schemas.md, assets/eval_review.html), but the ~485-line body is near its own 500-line cap and points to paths absent from the bundle (eval-viewer/generate_review.py is actually scripts/generate_review.py; the agents/ directory with grader.md/comparator.md/analyzer.md does not exist), which breaks navigation. | 2 / 3 |
| Total | | 10 / 12 |

Passed
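The Actionability row quotes the skill's requirement that grading output use the fields `text`, `passed`, and `evidence`. A minimal sketch of how that requirement might be checked follows. The review does not show the surrounding JSON layout, so the assumption here is that graded expectations arrive as a list of dictionaries; `REQUIRED_FIELDS` and `missing_fields` are hypothetical names, not part of the skill.

```python
# Field names quoted in the review; the list-of-dicts layout is an assumption.
REQUIRED_FIELDS = ("text", "passed", "evidence")


def missing_fields(expectations: list[dict]) -> dict[int, list[str]]:
    """Map the index of each incomplete entry to the required fields it lacks."""
    problems: dict[int, list[str]] = {}
    for index, entry in enumerate(expectations):
        missing = [field for field in REQUIRED_FIELDS if field not in entry]
        if missing:
            problems[index] = missing
    return problems
```

An empty result means every entry carries all three fields. A check like this only confirms that the keys are present; it says nothing about whether the values are well-formed.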

Description

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

A strong, third-person description that states concrete capabilities and provides an explicit "Use when" trigger clause with good coverage of creation, editing, optimization, evals, and benchmarking. It cleanly answers both what and when with minimal fluff.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Lists multiple concrete actions: "Create new skills, modify and improve existing skills, and measure skill performance" plus "run evals... benchmark skill performance... optimize a skill's description". This matches the multi-action anchor rather than the single-domain anchor below. | 3 / 3 |
| Completeness | Explicitly answers both what ("Create new skills, modify and improve existing skills, and measure skill performance") and when via a clear "Use when users want to..." clause with enumerated triggers, rather than leaving the when implied. | 3 / 3 |
| Trigger Term Quality | Good coverage of natural phrasings a user would say ("create a skill from scratch", "edit", "optimize an existing skill", "test a skill", "benchmark"), with only mild jargon ("variance analysis", "triggering accuracy") that does not outweigh the natural terms. | 3 / 3 |
| Distinctiveness Conflict Risk | Occupies a clear niche (skill authoring, eval-running, benchmarking, and description optimization) with specific triggers unlikely to fire for unrelated skills, rather than the generic "Helps with code and documents" profile. | 3 / 3 |
| Total | | 12 / 12 |

Passed

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 16 / 16 Passed

Validation for skill structure

No warnings or errors.

Repository
netlify/context-and-tools
Reviewed
