skill-creator

Create new skills, modify and improve existing skills, and measure skill performance. Use when users want to create a skill from scratch for Claude Code or Cursor, update or optimize an existing skill, run evals to test a skill, benchmark skill performance with variance analysis, or optimize a skill's description for better triggering accuracy.

Quality

Does it follow best practices?

Impact

No eval scenarios have been run.

Security

Passed. No known issues.

Quality

Content

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

A thorough, highly actionable skill body with clearly sequenced workflows and explicit validation gates. However, it exceeds the token budget it advocates and contains several dangling references to bundle paths (agents/*.md, eval-viewer/generate_review.py) that do not exist in the bundle.

Suggestions

Fix dangling references: the bundle has no agents/ directory, so either add 'agents/grader.md', 'agents/comparator.md', and 'agents/analyzer.md' to the bundle (for example under references/) or correct the paths to wherever those files actually live. Likewise, replace 'eval-viewer/generate_review.py' with the actual script, 'scripts/generate_report.py'.
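This path check can be automated before publishing. The sketch below is minimal and makes a few assumptions. The bundle must be a local directory. References must be written as bundle-relative paths that start with one of the directory names in `REF_PATTERN`. That pattern is a hypothetical heuristic and is not part of skill-creator or the registry's reviewer:

```python
import fnmatch
import re
from pathlib import Path

# Hypothetical heuristic: a bundle reference is a path that starts with one of
# these top-level directories and ends in a file extension. Extend as needed.
REF_PATTERN = re.compile(
    r"\b(?:agents|references|assets|scripts|eval-viewer)/[\w./*-]+\.\w+"
)


def list_bundle_files(root: Path) -> set[str]:
    """Collect every file in the bundle as a POSIX path relative to root."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def find_dangling_references(skill_md: str, bundle_files: set[str]) -> list[str]:
    """Return referenced paths (or globs) that no file in the bundle satisfies."""
    dangling = []
    for ref in sorted(set(REF_PATTERN.findall(skill_md))):
        # fnmatch handles both exact paths and globs such as agents/*.md.
        if not fnmatch.filter(bundle_files, ref):
            dangling.append(ref)
    return dangling


body = "Read agents/grader.md, then run eval-viewer/generate_review.py. See references/schemas.md."
bundle = {"SKILL.md", "references/schemas.md", "scripts/generate_report.py"}
print(find_dangling_references(body, bundle))
# ['agents/grader.md', 'eval-viewer/generate_review.py']
```

Against a real bundle, pass `list_bundle_files(Path("<bundle-dir>"))` as the second argument and the text of its SKILL.md as the first.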

Trim conversational filler to respect the token budget the skill itself advocates — remove 'Cool? Cool.', the plumbers/parents anecdote, the 'billions a year in economic value' aside, 'Good luck!', and the final verbatim re-statement of the core loop already covered above.

Consolidate the repeated core-loop summaries (opening list, mid-body restatements, and closing 'Repeating one more time' section) into a single concise statement to reduce redundancy.
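Verbatim restatements like the closing core-loop summary can be found mechanically before consolidating them. The minimal sketch below assumes paragraphs are separated by blank lines. The `min_words` threshold and the sample loop text are illustrative only and are not taken from the skill:

```python
import re
from collections import defaultdict


def repeated_paragraphs(text: str, min_words: int = 8) -> dict[str, list[int]]:
    """Map each paragraph that occurs more than once (after normalizing case and
    whitespace) to the 1-based positions of its occurrences."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    positions: dict[str, list[int]] = defaultdict(list)
    for index, paragraph in enumerate(paragraphs, start=1):
        # Collapse whitespace so a re-wrapped copy still counts as a duplicate.
        key = " ".join(paragraph.lower().split())
        # Skip short paragraphs such as headings, which repeat legitimately.
        if len(key.split()) >= min_words:
            positions[key].append(index)
    return {key: idx for key, idx in positions.items() if len(idx) > 1}


# Illustrative text only: the third paragraph is the first one re-wrapped.
loop = "Draft the skill, run the test prompts, review the outputs with the user, then improve the skill."
sample = loop + "\n\nDetails of each step.\n\n" + loop.replace(", ", ",\n")
print(repeated_paragraphs(sample))
```

Here the loop paragraph is reported at positions 1 and 3. The short "Details" paragraph falls below `min_words`, so it is ignored.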

Dimension	Reasoning	Score
Conciseness	The ~480-line body is mostly efficient and assumes Claude's competence (no explaining what libraries/PDFs are), but carries notable conversational padding — 'Cool? Cool.', the 'plumbers opening terminals / parents googling npm' anecdote, '(we are trying to create billions a year in economic value here!)', 'Good luck!', and a verbatim re-statement of the core loop at the end. Not a 3 because these tokens do not earn their place; not a 1 because it never explains basic concepts Claude already knows.	2 / 3
Actionability	Provides copy-paste-ready, fully specified commands — 'python -m scripts.aggregate_benchmark <workspace>/iteration-N --skill-name <name>', 'nohup python <skill-creator-path>/eval-viewer/generate_review.py ... --static <output_path>', 'python -m scripts.run_loop --eval-set ... --max-iterations 5' — plus exact JSON shapes and exact field names ('text', 'passed', 'evidence'). Not a 2 because the guidance is concrete and executable, not pseudocode or abstract.	3 / 3
Workflow Clarity	The eval/iteration workflows are clearly sequenced (Step 1–5: spawn runs, draft assertions, capture timing, grade, aggregate+launch viewer) with explicit gating checkpoints — 'Do NOT generate the viewer or benchmark until grading.json exists for every run', 'always run the grader first', and a repeat loop with explicit stop conditions. Not a 2 because validation checkpoints and feedback loops are explicit rather than implicit.	3 / 3
Progressive Disclosure	Content is split across real, one-level-deep, clearly signaled files (references/schemas.md, assets/eval_review.html, scripts/*), but the body repeatedly points at paths that do not exist in the bundle — 'agents/grader.md', 'agents/comparator.md', 'agents/analyzer.md' (no agents/ directory) and 'eval-viewer/generate_review.py' (the actual file is scripts/generate_report.py). Not a 3 because following this navigation would fail; not a 1 because genuine structure exists and several referenced files do resolve.	2 / 3
	Total	10 / 12 Passed
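The Total row is the sum of the four dimension scores: 2 + 3 + 3 + 2 = 10 out of a possible 12. The minimal sketch below reproduces that tally. It assumes each tab-separated row ends in an `earned / possible` cell, as the rubric rows here do:

```python
def tally(rows: list[str]) -> tuple[int, int]:
    """Sum the 'earned / possible' score cells of tab-separated rubric rows."""
    earned = possible = 0
    for row in rows:
        score_cell = row.split("\t")[-1]  # e.g. "2 / 3"
        got, out_of = (int(part) for part in score_cell.split("/"))
        earned += got
        possible += out_of
    return earned, possible


rows = [
    "Conciseness\t(reasoning)\t2 / 3",
    "Actionability\t(reasoning)\t3 / 3",
    "Workflow Clarity\t(reasoning)\t3 / 3",
    "Progressive Disclosure\t(reasoning)\t2 / 3",
]
print(tally(rows))  # (10, 12)
```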

Description

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

A strong, third-person description that explicitly pairs concrete capabilities with an explicit 'Use when' trigger clause and good coverage of natural user phrasings. Minor technical jargon ('variance analysis', 'triggering accuracy') slightly tempers trigger-term naturalness but does not undermine overall clarity.

Dimension	Reasoning	Score
Specificity	Lists multiple specific concrete actions — 'Create new skills, modify and improve existing skills, and measure skill performance', 'run evals', 'benchmark skill performance with variance analysis', 'optimize a skill's description' — rather than vague language. Not a 2 because the action set is comprehensive and concrete, not just a domain plus a few actions.	3 / 3
Completeness	Explicitly answers both 'what' (create/modify/improve/measure skills) and 'when' via an explicit 'Use when users want to...' clause. Not a 2 because the trigger guidance is explicit, not merely implied.	3 / 3
Trigger Term Quality	The 'Use when' clause covers natural user phrasings — 'create a skill from scratch for Claude Code or Cursor', 'update or optimize an existing skill', 'run evals to test a skill', 'benchmark skill performance'. Not a 2 because it spans multiple natural variations a user would actually say, though minor jargon ('variance analysis', 'triggering accuracy') keeps it from being flawless.	3 / 3
Distinctiveness Conflict Risk	Occupies a clear niche — skill meta-tooling (authoring, evaluating, optimizing other skills) — with distinct triggers unlikely to overlap with domain skills. Not a 2 because its scope (skills-about-skills) is unambiguous and unlikely to fire for the wrong skill.	3 / 3
	Total	12 / 12 Passed
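The Completeness and Trigger Term Quality judgments depend on an explicit 'Use when' clause and on third-person phrasing. The sketch below is a rough pre-check for both. Its heuristics, including the `PERSONAL_OPENINGS` list, are hypothetical and are not the reviewer's scoring method. The sample is a shortened excerpt of the skill-creator description:

```python
import re

# Openings that signal first- or second-person phrasing (hypothetical list).
PERSONAL_OPENINGS = {"i", "i'm", "we", "we're", "you", "you're", "my", "our", "your"}


def description_checks(description: str) -> dict[str, bool]:
    """Run two rough heuristics over a skill description."""
    words = description.split()
    first_word = words[0].lower().strip(".,;:!?") if words else ""
    return {
        # An explicit trigger clause such as "Use when users want to ...".
        "has_use_when_clause": re.search(r"\buse when\b", description, re.IGNORECASE) is not None,
        # Third-person descriptions do not open with a personal pronoun.
        "third_person_opening": bool(first_word) and first_word not in PERSONAL_OPENINGS,
    }


skill_creator = (
    "Create new skills, modify and improve existing skills, and measure skill performance. "
    "Use when users want to create a skill from scratch for Claude Code or Cursor."
)
print(description_checks(skill_creator))
# {'has_use_when_clause': True, 'third_person_opening': True}
```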

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 16 / 16 Passed

Validation for skill structure

No warnings or errors.
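This page does not itemize the 16 structural checks. As an illustration only, two common ones can be sketched as below. The sketch assumes the SKILL.md frontmatter rules published for Anthropic's Agent Skills: a `name` of lowercase letters, digits, and hyphens of at most 64 characters, and a non-empty `description` of at most 1024 characters. The parser is deliberately naive. It understands only single-line `key: value` fields, not full YAML:

```python
import re

# Lowercase letters and digits, separated by single hyphens (e.g. "skill-creator").
NAME_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def check_frontmatter(skill_md: str) -> list[str]:
    """Return structural problems in a SKILL.md frontmatter block."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return ["missing opening '---' delimiter"]
    end = next((i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"), None)
    if end is None:
        return ["missing closing '---' delimiter"]

    # Naive parse: only single-line "key: value" pairs are understood.
    fields = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()

    problems = []
    name = fields.get("name", "")
    if len(name) > 64 or not NAME_RE.fullmatch(name):
        problems.append(f"invalid name: {name!r}")
    description = fields.get("description", "")
    if not description or len(description) > 1024:
        problems.append("description is empty or longer than 1024 characters")
    return problems


doc = "---\nname: skill-creator\ndescription: Create new skills. Use when users want to create a skill.\n---\n# Body"
print(check_frontmatter(doc))  # []
```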

Repository: cognitedata/builder-skills
Commit: ab7b5f8

Reviewed: 1 day ago
