Guide for creating effective skills. This command should be used when users want to create a new skill (or update an existing skill) that extends Claude's capabilities with specialized knowledge, workflows, or tool integrations. Use when creating new skills, editing existing skills, or verifying skills work before deployment - applies TDD to process documentation by testing with subagents before writing, iterating until bulletproof against rationalization
69
62%
Does it follow best practices?
Impact
Pending
No eval scenarios have been run
Passed
No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./plugins/customaize-agent/skills/create-skill/SKILL.mdThis command provides guidance for creating effective skills.
Writing skills IS Test-Driven Development applied to process documentation.
Personal skills live in agent-specific directories (~/.claude/skills for Claude Code, ~/.codex/skills for Codex)
You write test cases (pressure scenarios with subagents), watch them fail (baseline behavior), write the skill (documentation), watch tests pass (agents comply), and refactor (close loopholes).
Core principle: If you didn't watch an agent fail without the skill, you don't know if the skill teaches the right thing.
REQUIRED BACKGROUND: You MUST understand Test-Driven Development before using this skill. That skill defines the fundamental RED-GREEN-REFACTOR cycle. This skill adapts TDD to documentation.
Official guidance: The Anthropic's official skill authoring best practices provided at the /apply-anthropic-skill-best-practices command, they enhance prompt-engineering skill. Use skill and the document, as they not copy but add to each other. These document provides additional patterns and guidelines that complement the TDD-focused approach in this skill.
Skills are modular, self-contained packages that extend Claude's capabilities by providing specialized knowledge, workflows, and tools. Think of them as "onboarding guides" for specific domains or tasks—they transform Claude from a general-purpose agent into a specialized agent equipped with procedural knowledge that no model can fully possess.
A skill is a reference guide for proven techniques, patterns, or tools. Skills help future Claude instances find and apply effective approaches.
Skills are: Reusable techniques, patterns, tools, reference guides
Skills are NOT: Narratives about how you solved a problem once
| TDD Concept | Skill Creation |
|---|---|
| Test case | Pressure scenario with subagent |
| Production code | Skill document (SKILL.md) |
| Test fails (RED) | Agent violates rule without skill (baseline) |
| Test passes (GREEN) | Agent complies with skill present |
| Refactor | Close loopholes while maintaining compliance |
| Write test first | Run baseline scenario BEFORE writing skill |
| Watch it fail | Document exact rationalizations agent uses |
| Minimal code | Write skill addressing those specific violations |
| Watch it pass | Verify agent now complies |
| Refactor cycle | Find new rationalizations → plug → re-verify |
The entire skill creation process follows RED-GREEN-REFACTOR.
Create when:
Don't create for:
Concrete method with steps to follow (condition-based-waiting, root-cause-tracing)
Way of thinking about problems (flatten-with-flags, test-invariants)
API docs, syntax guides, tool documentation (office docs)
skills/
skill-name/
SKILL.md # Main reference (required)
supporting-file.* # Only if neededFlat namespace - all skills in one searchable namespace
Separate files for:
Keep inline:
Every skill consists of a required SKILL.md file and optional bundled resources:
skill-name/
├── SKILL.md (required)
│ ├── YAML frontmatter metadata (required)
│ │ ├── name: (required)
│ │ └── description: (required)
│ └── Markdown instructions (required)
└── Bundled Resources (optional)
├── scripts/ - Executable code (Python/Bash/etc.)
├── references/ - Documentation intended to be loaded into context as needed
└── assets/ - Files used in output (templates, icons, fonts, etc.)Metadata Quality: The name and description in YAML frontmatter determine when Claude will use the skill. Be specific about what the skill does and when to use it. Use the third-person (e.g. "This skill should be used when..." instead of "Use this skill when...").
Frontmatter (YAML):
name and descriptionname: Use letters, numbers, and hyphens only (no parentheses, special chars)description: Third-person, includes BOTH what it does AND when to use it
---
name: Skill-Name-With-Hyphens
description: Use when [specific triggering conditions and symptoms] - [what the skill does and how it helps, written in third person]
---
# Skill Name
## Overview
What is this? Core principle in 1-2 sentences.
## When to Use
[Small inline flowchart IF decision non-obvious]
Bullet list with SYMPTOMS and use cases
When NOT to use
## Core Pattern (for techniques/patterns)
Before/after code comparison
## Quick Reference
Table or bullets for scanning common operations
## Implementation
Inline code for simple patterns
Link to file for heavy reference or reusable tools
## Common Mistakes
What goes wrong + fixes
## Real-World Impact (optional)
Concrete resultsscripts/)Executable code (Python/Bash/etc.) for tasks that require deterministic reliability or are repeatedly rewritten.
scripts/rotate_pdf.py for PDF rotation tasksreferences/)Documentation and reference material intended to be loaded as needed into context to inform Claude's process and thinking.
references/finance.md for financial schemas, references/mnda.md for company NDA template, references/policies.md for company policies, references/api_docs.md for API specificationsassets/)Files not intended to be loaded into context, but rather used within the output Claude produces.
assets/logo.png for brand assets, assets/slides.pptx for PowerPoint templates, assets/frontend-template/ for HTML/React boilerplate, assets/font.ttf for typographySkills use a three-level loading system to manage context efficiently:
*Unlimited because scripts can be executed without reading into context window.
Critical for discovery: Future Claude needs to FIND your skill
Purpose: Claude reads description to decide which skills to load for a given task. Make it answer: "Should I read this skill right now?"
Format: Start with "Use when..." to focus on triggering conditions, then explain what it does
Content:
# ❌ BAD: Too abstract, vague, doesn't include when to use
description: For async testing
# ❌ BAD: First person
description: I can help you with async tests when they're flaky
# ❌ BAD: Mentions technology but skill isn't specific to it
description: Use when tests use setTimeout/sleep and are flaky
# ✅ GOOD: Starts with "Use when", describes problem, then what it does
description: Use when tests have race conditions, timing dependencies, or pass/fail inconsistently - replaces arbitrary timeouts with condition polling for reliable async tests
# ✅ GOOD: Technology-specific skill with explicit trigger
description: Use when using React Router and handling authentication redirects - provides patterns for protected routes and auth state managementUse words Claude would search for:
Use active voice, verb-first:
creating-skills not skill-creationtesting-skills-with-subagents not subagent-skill-testingProblem: getting-started and frequently-referenced skills load into EVERY conversation. Every token counts.
Target word counts:
Techniques:
Move details to tool help:
# ❌ BAD: Document all flags in SKILL.md
search-conversations supports --text, --both, --after DATE, --before DATE, --limit N
# ✅ GOOD: Reference --help
search-conversations supports multiple modes and filters. Run --help for details.Use cross-references:
# ❌ BAD: Repeat workflow details
When searching, dispatch subagent with template...
[20 lines of repeated instructions]
# ✅ GOOD: Reference other skill
Always use subagents (50-100x context savings). REQUIRED: Use [other-skill-name] for workflow.Compress examples:
# ❌ BAD: Verbose example (42 words)
your human partner: "How did we handle authentication errors in React Router before?"
You: I'll search past conversations for React Router authentication patterns.
[Dispatch subagent with search query: "React Router authentication error handling 401"]
# ✅ GOOD: Minimal example (20 words)
Partner: "How did we handle auth errors in React Router?"
You: Searching...
[Dispatch subagent → synthesis]Eliminate redundancy:
Verification:
wc -w skills/path/SKILL.md
# getting-started workflows: aim for <150 each
# Other frequently-loaded: aim for <200 totalName by what you DO or core insight:
condition-based-waiting > async-test-helpersusing-skills not skill-usageflatten-with-flags > data-structure-refactoringroot-cause-tracing > debugging-techniquesGerunds (-ing) work well for processes:
creating-skills, testing-skills, debugging-with-logsWhen writing documentation that references other skills:
Use skill name only, with explicit requirement markers:
**REQUIRED SUB-SKILL:** Use superpowers:test-driven-development**REQUIRED BACKGROUND:** You MUST understand superpowers:systematic-debuggingSee skills/testing/test-driven-development (unclear if required)@skills/testing/test-driven-development/SKILL.md (force-loads, burns context)Why no @ links: @ syntax force-loads files immediately, consuming 200k+ context before you need them.
digraph when_flowchart {
"Need to show information?" [shape=diamond];
"Decision where I might go wrong?" [shape=diamond];
"Use markdown" [shape=box];
"Small inline flowchart" [shape=box];
"Need to show information?" -> "Decision where I might go wrong?" [label="yes"];
"Decision where I might go wrong?" -> "Small inline flowchart" [label="yes"];
"Decision where I might go wrong?" -> "Use markdown" [label="no"];
}Use flowcharts ONLY for:
Never use flowcharts for:
See graphviz-conventions.dot for graphviz style rules.
One excellent example beats many mediocre ones
Choose most relevant language:
Good example:
Don't:
You're good at porting - one great example is enough.
defense-in-depth/
SKILL.md # Everything inlineWhen: All content fits, no heavy reference needed
condition-based-waiting/
SKILL.md # Overview + patterns
example.ts # Working helpers to adaptWhen: Tool is reusable code, not just narrative
pptx/
SKILL.md # Overview + workflows
pptxgenjs.md # 600 lines API reference
ooxml.md # 500 lines XML structure
scripts/ # Executable toolsWhen: Reference material too large for inline
Different skill types need different test approaches:
Examples: TDD, verification-before-completion, designing-before-coding
Test with:
Success criteria: Agent follows rule under maximum pressure
Examples: condition-based-waiting, root-cause-tracing, defensive-programming
Test with:
Success criteria: Agent successfully applies technique to new scenario
Examples: reducing-complexity, information-hiding concepts
Test with:
Success criteria: Agent correctly identifies when/how to apply pattern
Examples: API documentation, command references, library guides
Test with:
Success criteria: Agent finds and correctly applies reference information
Skills that enforce discipline (like TDD) need to resist rationalization. Agents are smart and will find loopholes when under pressure.
Psychology note: Understanding WHY persuasion techniques work helps you apply them systematically. See persuasion-principles.md for research foundation (Cialdini, 2021; Meincke et al., 2025) on authority, commitment, scarcity, social proof, and unity principles.
Don't just state the rule - forbid specific workarounds:
<Bad> ```markdown Write code before test? Delete it. ``` </Bad> <Good> ```markdown Write code before test? Delete it. Start over.No exceptions:
</Good>
### Address "Spirit vs Letter" Arguments
Add foundational principle early:
```markdown
**Violating the letter of the rules is violating the spirit of the rules.**This cuts off entire class of "I'm following the spirit" rationalizations.
Capture rationalizations from baseline testing (see Testing section below). Every excuse agents make goes in the table:
| Excuse | Reality |
|--------|---------|
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |Make it easy for agents to self-check when rationalizing:
## Red Flags - STOP and Start Over
- Code before test
- "I already manually tested it"
- "Tests after achieve the same purpose"
- "It's about spirit not ritual"
- "This is different because..."
**All of these mean: Delete code. Start over with TDD.**Add to description: symptoms of when you're ABOUT to violate the rule:
description: use when implementing any feature or bugfix, before writing implementation codeFollow the TDD cycle:
Run pressure scenario with subagent WITHOUT the skill. Document exact behavior:
This is "watch the test fail" - you must see what agents naturally do before writing the skill.
Write skill that addresses those specific rationalizations. Don't add extra content for hypothetical cases.
Run same scenarios WITH skill. Agent should now comply.
Agent found new rationalization? Add explicit counter. Re-test until bulletproof.
REQUIRED SUB-SKILL: Use superpowers:testing-skills-with-subagents for the complete testing methodology:
"In session 2025-10-03, we found empty projectDir caused..." Why bad: Too specific, not reusable
example-js.js, example-py.py, example-go.go Why bad: Mediocre quality, maintenance burden
step1 [label="import fs"];
step2 [label="read file"];Why bad: Can't copy-paste, hard to read
helper1, helper2, step3, pattern4 Why bad: Labels should have semantic meaning
After writing ANY skill, you MUST STOP and complete the deployment process.
Do NOT:
The deployment checklist below is MANDATORY for EACH skill.
Deploying untested skills = deploying untested code. It's a violation of quality standards.
IMPORTANT: Use TodoWrite to create todos for EACH checklist item below.
RED Phase - Write Failing Test:
GREEN Phase - Write Minimal Skill:
REFACTOR Phase - Close Loopholes:
Quality Checks:
Deployment:
How future Claude finds your skill:
Optimize for this flow - put searchable terms early and often.
Creating skills IS TDD for process documentation.
Same Iron Law: No skill without failing test first. Same cycle: RED (baseline) → GREEN (write skill) → REFACTOR (close loopholes). Same benefits: Better quality, fewer surprises, bulletproof results.
If you follow TDD for code, follow it for skills. It's the same discipline applied to documentation.
To create a skill, follow the "Skill Creation Process" in order, skipping steps only if there is a clear reason why they are not applicable.
Skip this step only when the skill's usage patterns are already clearly understood. It remains valuable even when working with an existing skill.
To create an effective skill, clearly understand concrete examples of how the skill will be used. This understanding can come from either direct user examples or generated examples that are validated with user feedback.
For example, when building an image-editor skill, relevant questions include:
To avoid overwhelming users, avoid asking too many questions in a single message. Start with the most important questions and follow up as needed for better effectiveness.
Conclude this step when there is a clear sense of the functionality the skill should support.
To turn concrete examples into an effective skill, analyze each example by:
Example: When building a pdf-editor skill to handle queries like "Help me rotate this PDF," the analysis shows:
scripts/rotate_pdf.py script would be helpful to store in the skillExample: When designing a frontend-webapp-builder skill for queries like "Build me a todo app" or "Build me a dashboard to track my steps," the analysis shows:
assets/hello-world/ template containing the boilerplate HTML/React project files would be helpful to store in the skillExample: When building a big-query skill to handle queries like "How many users have logged in today?" the analysis shows:
references/schema.md file documenting the table schemas would be helpful to store in the skillTo establish the skill's contents, analyze each concrete example to create a list of the reusable resources to include: scripts, references, and assets.
Skip this step if the skill already exists and only needs iteration.
Create the skill directory and required files:
mkdir -p skills/<skill-name>SKILL.md with YAML frontmatter:---
name: skill-name
description: Use when [triggering conditions] - [what the skill does]
---
# Skill Name
## Critical Guidlines
[Core principle in 1-2 sentences. Each start wit "You MUST ..."]
## How to Use
[Think in steps, use problem decomposition, etc.]
## Guide
[Procedures, patterns]
## Examples
[Examples of how to use the skill, include agent input and output]
## Troubleshooting
[Common mistakes and how to avoid them]
## Resources
[Scripts, references, assets]scripts/ — reusable executable codereferences/ — documentation loaded on demandassets/ — files used in output (templates, images)When editing the (newly-generated or existing) skill, remember that the skill is being created for another instance of Claude to use. Focus on including information that would be beneficial and non-obvious to Claude. Consider what procedural knowledge, domain-specific details, or reusable assets would help another Claude instance execute these tasks more effectively.
To begin implementation, start with the reusable resources identified above: scripts/, references/, and assets/ files. Note that this step may require user input. For example, when implementing a brand-guidelines skill, the user may need to provide brand assets or templates to store in assets/, or documentation to store in references/.
Remove any resource subdirectories not needed for the skill. Most skills need only SKILL.md.
Writing Style: Write the entire skill using imperative/infinitive form (verb-first instructions), not second person. Use objective, instructional language (e.g., "To accomplish X, do Y" rather than "You should do X" or "If you need to do X"). This maintains consistency and clarity for AI consumption.
To complete SKILL.md, answer the following questions:
Before deploying, verify the skill meets requirements:
name and description (max 1024 chars total)SKILL.md exists at skills/<skill-name>/SKILL.mdAfter testing the skill, users may request improvements. Often this happens right after using the skill, with fresh context of how the skill performed.
Iteration workflow:
dedca19
If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.