Build new Claude skills from scratch or supercharge existing ones through rigorous evaluation and iterative improvement. Use when the user wants to create, build, improve, evaluate, audit, enhance, benchmark, test, or package a skill. Also trigger for "turn this into a skill", "make this reusable", "I keep repeating this workflow", or references to SKILL.md, skill frontmatter, description optimization, or skill packaging. Do NOT use for general coding tasks, document creation, or other non-skill workflows. Even if the user just says "skill" in the context of Claude capabilities, this is likely the right skill to load.
94
92%
Does it follow best practices?
Impact
Pending
No eval scenarios have been run
Advisory
Suggest reviewing before use
Quality
Discovery
100%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong, well-crafted description that excels across all dimensions. It provides specific capabilities, extensive natural trigger terms, explicit 'Use when' and 'Do NOT use' clauses, and clear boundaries that distinguish it from general coding or document skills. The inclusion of quoted user phrases like 'turn this into a skill' and 'I keep repeating this workflow' is particularly effective for matching real user intent.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple concrete actions: build new skills, improve existing ones through evaluation, iterative improvement, benchmarking, testing, packaging. Also specifies what NOT to use it for, adding further specificity. | 3 / 3 |
| Completeness | Clearly answers both 'what' (build new skills, improve existing ones through evaluation and iteration) and 'when' (explicit 'Use when...' clause with extensive trigger terms, plus a 'Do NOT use' clause for disambiguation). Both dimensions are thoroughly addressed. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural trigger terms: 'create', 'build', 'improve', 'evaluate', 'audit', 'enhance', 'benchmark', 'test', 'package', 'turn this into a skill', 'make this reusable', 'I keep repeating this workflow', 'SKILL.md', 'skill frontmatter', 'description optimization', 'skill packaging'. These are terms users would naturally say. | 3 / 3 |
| Distinctiveness Conflict Risk | Highly distinctive with a clear niche around skill creation/improvement. The explicit 'Do NOT use for general coding tasks, document creation, or other non-skill workflows' clause actively reduces conflict risk with other skills. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
85%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-crafted meta-skill with strong actionability and excellent progressive disclosure. The two-mode structure (Create/Enhance) is clearly delineated with concrete phases, validation checkpoints, and specific tooling references. The main weakness is moderate verbosity — some principles are repeated across sections, and the Improvement Philosophy section partially duplicates guidance already given in the phases — but the content density is generally high enough to justify its length.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is lengthy (~300+ lines) but most content is structural and actionable. Some sections could be tightened: the Improvement Philosophy section restates points already made earlier (e.g., description importance, gotchas), and some guidance like 'explain why over MUST' appears multiple times. However, it avoids explaining concepts Claude already knows and stays focused on novel workflow instructions. | 2 / 3 |
| Actionability | Highly actionable throughout: concrete CLI commands (python scripts/aggregate_benchmark.py, python scripts/package_skill.py), specific JSON schemas for eval files, exact file paths for references, clear phase-by-phase instructions with specific deliverables at each step. The test case design guidance includes concrete categories and a complete JSON example. | 3 / 3 |
| Workflow Clarity | Both Create and Enhance modes follow clearly numbered phases with explicit sequencing. Phase 5 (Run & Evaluate) has a particularly well-structured 5-step workflow with validation checkpoints ('Wait for user feedback before making changes'), feedback loops ('fix and re-validate'), and explicit ordering ('Launch everything at once', 'Don't wait idle'). The iterate phase includes clear stop conditions. | 3 / 3 |
| Progressive Disclosure | Excellent progressive disclosure: SKILL.md serves as the orchestration overview, with detailed content delegated to clearly-signaled reference files (references/skill-anatomy.md, references/writing-guide.md, etc.). The reference tables at the bottom provide clear navigation with 'When to read' guidance. Agent files, scripts, and hooks are each in their own well-organized tables. No deeply nested references. | 3 / 3 |
| Total | | 11 / 12 Passed |
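
The two Total rows are the sums of the per-dimension scores, each out of 3. A minimal sketch of that tally follows; the dictionary layout and the `tally` helper are illustrative assumptions, not part of the review tooling. The headline percentages (100%, 85%) are taken from the review as reported and are not recomputed here, since the review does not state how they are weighted.

```python
# Per-dimension scores copied from the Discovery and Implementation tables.
# The data layout and helper name are hypothetical, chosen for illustration.
MAX_PER_DIMENSION = 3

discovery = {
    "Specificity": 3,
    "Completeness": 3,
    "Trigger Term Quality": 3,
    "Distinctiveness Conflict Risk": 3,
}

implementation = {
    "Conciseness": 2,
    "Actionability": 3,
    "Workflow Clarity": 3,
    "Progressive Disclosure": 3,
}


def tally(scores):
    """Return (points earned, points available) for one rubric table."""
    earned = sum(scores.values())
    available = MAX_PER_DIMENSION * len(scores)
    return earned, available


discovery_total = tally(discovery)
implementation_total = tally(implementation)

print("Discovery: {} / {}".format(*discovery_total))
print("Implementation: {} / {}".format(*implementation_total))
```

The single point lost is in Conciseness, which matches the verbosity noted in the Implementation summary.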
Validation
100%Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
95142b6