Behavioral guidelines to reduce common LLM coding mistakes. Use when writing, reviewing, or refactoring code to avoid overcomplication, make surgical changes, surface assumptions, and define verifiable success criteria.
85
80%
Does it follow best practices?
Impact
92%
1.14x
Average score across 3 eval scenarios
Passed
No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./skills/karpathy-guidelines/SKILL.md
Quality
Discovery
59%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description has good structural completeness with an explicit 'Use when' clause and explains its purpose clearly. However, its main weakness is that the triggers are so broad (any code writing, reviewing, or refactoring) that it would conflict with almost every other coding skill. The capabilities listed are more like abstract principles than concrete actions, making it harder to distinguish from general coding assistance.
Suggestions
Narrow the trigger conditions to be more specific — e.g., 'Use when Claude is generating code and needs to follow best practices for minimal, focused changes' rather than triggering on all code writing/reviewing/refactoring.
Add distinctiveness by clarifying this is a meta-skill or behavioral overlay (e.g., 'Provides coding discipline guidelines that complement other coding skills') to reduce conflict with task-specific coding skills.
Make the capabilities more concrete — instead of 'surface assumptions', specify actions like 'flags implicit assumptions in requirements, requests clarification before implementing ambiguous features, limits changes to only affected lines'.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description names a domain (LLM coding mistakes) and lists some actions ('avoid overcomplication, make surgical changes, surface assumptions, define verifiable success criteria'), but these are more like principles/guidelines than concrete discrete actions. They describe behavioral qualities rather than specific operations. | 2 / 3 |
| Completeness | The description clearly answers both 'what' (behavioral guidelines to reduce common LLM coding mistakes) and 'when' (Use when writing, reviewing, or refactoring code), with an explicit 'Use when...' clause and additional context about the specific goals. | 3 / 3 |
| Trigger Term Quality | Includes some relevant terms like 'writing', 'reviewing', 'refactoring code', but these are extremely broad and would match nearly any coding task. Missing more specific natural trigger terms that would help distinguish when this skill is uniquely needed versus general coding assistance. | 2 / 3 |
| Distinctiveness Conflict Risk | This skill would trigger on virtually any coding task since 'writing, reviewing, or refactoring code' covers nearly all code-related requests. It would heavily conflict with any other coding-related skills, as its triggers are extremely broad and non-distinctive. | 1 / 3 |
| Total | | 8 / 12 Passed |
Implementation
100%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is an excellent behavioral guideline skill that is concise, actionable, and well-structured. Each section delivers clear, specific instructions with concrete examples and decision heuristics. The skill respects Claude's intelligence by stating rules directly without over-explaining, and the verification loop in section 4 provides a strong workflow pattern.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Every section is lean and purposeful. No unnecessary explanations of concepts Claude already knows. The guidelines are stated as direct imperatives with concrete examples of what to avoid, and the tradeoff note at the top is a single line. | 3 / 3 |
| Actionability | Each guideline provides specific, concrete behavioral instructions rather than vague advice. Section 4 transforms abstract tasks into verifiable patterns with a concrete plan template. The 'ask yourself' heuristics and the 'the test' line in section 3 give clear decision criteria. | 3 / 3 |
| Workflow Clarity | Section 4 explicitly defines a verification loop pattern (define criteria → execute → verify) with a concrete template. The overall structure follows a logical sequence: think first → keep it simple → make surgical changes → verify results. For a behavioral guideline skill (not a destructive/batch operation), this level of workflow clarity is appropriate and complete. | 3 / 3 |
| Progressive Disclosure | For a skill under 50 lines that is self-contained behavioral guidance, the content is well-organized into four clearly labeled sections with bold summaries. No external references are needed, and the structure supports quick scanning. | 3 / 3 |
| Total | | 12 / 12 Passed |
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
2c60614