Create new skills, modify and improve existing skills, and measure skill performance. Use when users want to create a skill from scratch, edit or optimize an existing skill, run evals to test a skill, benchmark skill performance with variance analysis, or optimize a skill's description for better triggering accuracy.
Quality: 88%. Does it follow best practices? Passed; no known issues.
Impact: 1.87x. Average score across 3 eval scenarios: 85%.
Quality
Discovery: 100%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong skill description that clearly articulates what the skill does (create, modify, evaluate, and optimize skills) and when to use it (with explicit trigger scenarios). It uses third-person voice, includes natural trigger terms, and occupies a distinct niche that minimizes conflict risk with other skills.
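As an illustration only (not the actual grader), a trigger-term check in the spirit of this analysis might scan the description for the keywords a user would naturally say. The term list and the verbatim-substring matching rule below are assumptions:

```python
# Illustrative sketch: scan a skill description for natural trigger terms.
# The term list and matching rule are assumptions, not the real grader.
DESCRIPTION = (
    "Create new skills, modify and improve existing skills, and measure "
    "skill performance. Use when users want to create a skill from scratch, "
    "edit or optimize an existing skill, or run evals to test a skill."
)

TRIGGER_TERMS = ["create a skill", "edit", "optimize", "evals", "benchmark"]

def found_terms(description: str, terms: list[str]) -> list[str]:
    """Return the trigger terms that appear verbatim in the description."""
    text = description.lower()
    return [term for term in terms if term in text]

print(found_terms(DESCRIPTION, TRIGGER_TERMS))
# → ['create a skill', 'edit', 'optimize', 'evals']
```

Note that 'benchmark' is not matched by the truncated sample description above, which is exactly the kind of gap such a check surfaces.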
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific, concrete actions: 'Create new skills', 'modify and improve existing skills', 'measure skill performance', 'run evals', 'benchmark skill performance with variance analysis', 'optimize a skill's description for better triggering accuracy'. | 3 / 3 |
| Completeness | Clearly answers both 'what' (create, modify, improve, measure skills) and 'when' with an explicit 'Use when...' clause listing specific trigger scenarios like creating from scratch, editing, running evals, benchmarking, and optimizing descriptions. | 3 / 3 |
| Trigger Term Quality | Includes strong natural keywords users would say: 'create a skill', 'edit', 'optimize', 'evals', 'benchmark', 'skill performance', 'triggering accuracy', 'description'. These cover a good range of terms a user would naturally use when wanting to work with skills. | 3 / 3 |
| Distinctiveness / Conflict Risk | The meta-skill domain (creating/editing/evaluating skills themselves) is a clear niche that is unlikely to conflict with other skills. Terms like 'skill', 'evals', 'triggering accuracy', and 'variance analysis' are highly distinctive to this particular capability. | 3 / 3 |
| Total | | 12 / 12 Passed |
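The rubric arithmetic above can be restated as a minimal sketch. The pass threshold of 10 is an assumption inferred from this report (both 12 / 12 and 10 / 12 totals are marked Passed); it is not taken from the actual grader:

```python
# Minimal sketch of rubric aggregation: four dimensions scored 0-3 each.
# PASS_THRESHOLD is an assumption inferred from this report, not the spec.
PASS_THRESHOLD = 10

def grade(scores: dict[str, int]) -> str:
    total = sum(scores.values())
    verdict = "Passed" if total >= PASS_THRESHOLD else "Failed"
    return f"{total} / {3 * len(scores)} {verdict}"

discovery = {
    "Specificity": 3,
    "Completeness": 3,
    "Trigger Term Quality": 3,
    "Distinctiveness / Conflict Risk": 3,
}
print(grade(discovery))  # → 12 / 12 Passed
```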
Implementation: 70%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a comprehensive, highly actionable skill with excellent workflow clarity and progressive disclosure. Its main weakness is significant verbosity — conversational asides, repeated statements of the core loop (3 times), explanations of things Claude already knows, and stylistic padding that inflates the token count substantially. The content would benefit greatly from aggressive trimming while preserving its strong actionable core.
Suggestions

- Remove the three redundant restatements of the core loop (intro, end of 'Running and evaluating test cases', and the final section); state it once clearly at the top.
- Cut conversational filler ('Cool? Cool.', 'Good luck!', 'Sorry in advance but I'm gonna go all caps here') and the extended discussion about user communication styles; these waste tokens without adding actionable guidance.
- Consolidate the Claude.ai-specific and Cowork-specific sections into a single 'Environment Adaptations' table or compact section rather than repeating modified versions of the same instructions.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is extremely verbose at ~500+ lines with significant conversational padding ('Cool? Cool.', 'Good luck!'), repeated instructions (the core loop is stated 3 times), explanations of concepts Claude knows (what JSON is, how subagents work), and lengthy asides about user communication style. Much of this could be cut without losing actionability. | 1 / 3 |
| Actionability | The skill provides highly concrete, executable guidance throughout: specific CLI commands, exact JSON schemas, file path conventions, script invocations with arguments, and detailed step-by-step procedures. Code examples are copy-paste ready and the workflow is thoroughly specified. | 3 / 3 |
| Workflow Clarity | The multi-step workflow is clearly sequenced with explicit steps (Step 1 through Step 5), validation checkpoints (grading, user review via viewer, feedback loops), error recovery patterns (iteration loops), and clear sequencing of parallel operations. The feedback loop of draft → test → review → improve is well-defined with explicit validation at each stage. | 3 / 3 |
| Progressive Disclosure | The skill effectively uses progressive disclosure with clear references to external files (agents/grader.md, agents/comparator.md, agents/analyzer.md, references/schemas.md) that are one level deep and well-signaled with context about when to read them. The directory structure is clearly documented and bundled resources are appropriately separated. | 3 / 3 |
| Total | | 10 / 12 Passed |
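The progressive-disclosure property credited above is mechanically checkable. The sketch below is a hypothetical helper, not part of the skill's tooling; the file paths come from the review, and the one-level-deep rule is taken from its wording:

```python
# Hypothetical check: the bundled files credited for progressive disclosure
# exist and sit only one level deep inside the skill directory.
from pathlib import Path

REFERENCED = [
    "agents/grader.md",
    "agents/comparator.md",
    "agents/analyzer.md",
    "references/schemas.md",
]

def check_references(skill_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means all references look fine."""
    problems = []
    for rel in REFERENCED:
        if len(Path(rel).parts) > 2:  # deeper than one subdirectory
            problems.append(f"{rel}: nested more than one level deep")
        if not (skill_dir / rel).is_file():
            problems.append(f"{rel}: missing from skill directory")
    return problems
```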
Validation: 100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
11 / 11 checks passed. Validation for skill structure reported no warnings or errors.
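A minimal structural check in the spirit of those validations might parse a SKILL.md's frontmatter and confirm required fields. The field names, delimiter handling, and error messages below are assumptions for illustration, not the actual 11-check suite:

```python
# Hypothetical structural validator: parse YAML-style frontmatter from a
# SKILL.md and require 'name' and 'description'. Assumed checks only.
def validate_frontmatter(text: str) -> list[str]:
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ["missing opening frontmatter delimiter"]
    fields: dict[str, str] = {}
    closed = False
    for line in lines[1:]:
        if line.strip() == "---":
            closed = True
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    errors = [] if closed else ["missing closing frontmatter delimiter"]
    for required in ("name", "description"):
        if not fields.get(required):
            errors.append(f"missing required field: {required}")
    return errors

sample = "---\nname: skill-creator\ndescription: Create new skills.\n---\n# Body"
print(validate_frontmatter(sample))  # → []
```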