
skill-creator

Create new skills, modify and improve existing skills, and measure skill performance. Use when users want to create a skill from scratch for Claude Code or Cursor, update or optimize an existing skill, run evals to test a skill, benchmark skill performance with variance analysis, or optimize a skill's description for better triggering accuracy.

68

Quality

81%

Does it follow best practices?

Impact

No eval scenarios have been run

Security by Snyk

Passed

No known issues


Quality

Discovery

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a strong description that clearly articulates specific capabilities (creating, modifying, evaluating, and benchmarking skills) and provides explicit trigger guidance via a well-structured 'Use when...' clause. It uses third-person voice consistently, includes natural trigger terms users would employ, and occupies a distinct niche that minimizes conflict risk with other skills.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Lists multiple specific concrete actions: 'Create new skills', 'modify and improve existing skills', 'measure skill performance', 'run evals to test a skill', 'benchmark skill performance with variance analysis', 'optimize a skill's description for better triggering accuracy'. | 3 / 3 |
| Completeness | Clearly answers both 'what' (create, modify, measure, benchmark, optimize skills) and 'when' with an explicit 'Use when...' clause listing multiple specific trigger scenarios. | 3 / 3 |
| Trigger Term Quality | Includes strong natural keywords users would say: 'create a skill', 'Claude Code', 'Cursor', 'update or optimize', 'run evals', 'benchmark', 'variance analysis', 'triggering accuracy', 'skill description'. These cover a good range of terms a user working with skills would naturally use. | 3 / 3 |
| Distinctiveness / Conflict Risk | The description carves out a very clear niche around skill creation, modification, evaluation, and benchmarking specifically for Claude Code and Cursor. This is a distinct domain unlikely to conflict with other skills. | 3 / 3 |
| Total | | 12 / 12 |

Passed

Implementation

62%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill provides an impressively thorough and actionable workflow for creating, testing, and iterating on skills, with concrete commands, JSON schemas, and clear step sequencing. However, it is severely undermined by excessive verbosity — conversational asides, motivational commentary, explanations of obvious concepts, and philosophical tangents inflate the token cost significantly. The content would benefit greatly from aggressive trimming and splitting large sections into referenced files.

Suggestions

Cut all conversational filler ('Cool? Cool.', 'Good luck!', motivational asides about billions in economic value) and meta-commentary about communication style — these waste tokens without adding actionable guidance.

Move the 'Communicating with the user' section, the 'How to think about improvements' philosophy section, and the 'Cursor-Specific Instructions' into separate reference files to keep SKILL.md under 300 lines.

Remove the repeated summary of the core loop at the end — it's already clear from the structure and adds ~15 lines of redundancy.

Tighten the 'How skill triggering works' explanation — Claude doesn't need a paragraph explaining that agents decide whether to consult skills; just state the implication for eval query design.
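Two of the suggestions above are mechanical enough to check automatically: the 300-line budget for SKILL.md and the filler phrases the review quotes. The following is a minimal sketch, not part of the reviewed skill or of any Tessl tooling; the function name `audit_skill`, the returned keys, and the phrase list are assumptions chosen for illustration.

```python
# Hypothetical helper: audit a SKILL.md body against two review suggestions.
# The phrase list and the 300-line budget come from the review text above;
# everything else (names, return shape) is an illustrative assumption.

FILLER_PHRASES = ("Cool? Cool.", "Good luck!")
MAX_LINES = 300


def audit_skill(text, max_lines=MAX_LINES, filler=FILLER_PHRASES):
    """Return the line count, a budget flag, and (line_number, phrase) hits."""
    lines = text.splitlines()
    hits = [
        (number, phrase)
        for number, line in enumerate(lines, start=1)
        for phrase in filler
        if phrase in line
    ]
    return {
        "line_count": len(lines),
        "over_budget": len(lines) > max_lines,
        "filler_hits": hits,
    }
```

To run it against a real file, one could pass `pathlib.Path("SKILL.md").read_text(encoding="utf-8")` as `text`. An exact substring match is deliberately simple; it will miss reworded filler, so it complements a manual read rather than replacing it.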

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The skill is extremely verbose at ~400+ lines with significant conversational padding ('Cool? Cool.'), unnecessary meta-commentary about communication style, extensive explanations of concepts Claude already knows (what JSON is, how to think about improvements), and motivational asides ('we are trying to create billions a year in economic value here!'). Much of this could be cut without losing actionable content. | 1 / 3 |
| Actionability | Despite the verbosity, the skill provides highly concrete, executable guidance: specific CLI commands for running scripts, exact JSON schemas for eval files and feedback, precise directory structures, copy-paste ready code blocks for spawning runs, grading, aggregation, and launching the viewer. The workflow is thoroughly specified with real commands. | 3 / 3 |
| Workflow Clarity | The multi-step workflow is clearly sequenced with explicit numbered steps, validation checkpoints (grade before generating viewer, validate assertions exist before proceeding), feedback loops (iterate until user is satisfied), and clear error recovery patterns. The iteration loop is well-defined with explicit stopping criteria. | 3 / 3 |
| Progressive Disclosure | The skill references external files appropriately (agents/grader.md, agents/comparator.md, agents/analyzer.md, references/schemas.md) with clear guidance on when to read them. However, the SKILL.md body itself is monolithic and contains substantial content that could be split into reference files: the description optimization section, Cursor-specific instructions, and the detailed improvement philosophy could all be separate documents to keep the main file leaner. | 2 / 3 |
| Total | | 9 / 12 |

Passed
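The page does not say how the section percentages relate to the rubric totals, and 9 / 12 is 75%, not the 62% shown for Implementation. One mapping that happens to reproduce every figure on this page is to rescale each 1-to-3 dimension score to a 0-to-2 range before averaging, then take Quality as the mean of Discovery and Implementation. The sketch below works through that arithmetic; the formula is an inference from the displayed numbers, not documented behaviour, and `rubric_percent` is a hypothetical name.

```python
# Assumed scoring model (inferred from the numbers on this page, not confirmed):
# each dimension is scored from 1 to 3, and the percentage is the share of
# points earned above the minimum.

def rubric_percent(scores, lowest=1, highest=3):
    """Rescale per-dimension scores from [lowest, highest] to 0-100."""
    earned = sum(score - lowest for score in scores)
    available = (highest - lowest) * len(scores)
    return 100 * earned / available


discovery = rubric_percent([3, 3, 3, 3])        # 100.0
implementation = rubric_percent([1, 3, 3, 2])   # 62.5, shown as 62%
quality = (discovery + implementation) / 2      # 81.25, shown as 81%
```

Under this reading the displayed values are truncated to whole percentages, and the validation score acts as a gate rather than contributing to the Quality average.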

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 passed

Validation for skill structure

No warnings or errors.

Repository
cognitedata/builder-skills
Reviewed
