karpathy-guidelines

Behavioral guidelines to reduce common LLM coding mistakes. Use when writing, reviewing, or refactoring code to avoid overcomplication, make surgical changes, surface assumptions, and define verifiable success criteria.

Quality: 83%
Does it follow best practices?

Impact: 92% (1.14x)
Average score across 3 eval scenarios

Security: Passed
No findings from the security scan

Quality

Content: 100%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is an excellent behavioral guideline skill that is concise, well-structured, and highly actionable. Each section follows a consistent pattern of a bold one-line principle followed by concrete, specific rules. The skill avoids common pitfalls like over-explaining concepts Claude already knows, and the transformation examples in Section 4 are particularly effective at making abstract advice concrete.

Dimension	Reasoning	Score
Conciseness	Every section is lean and purposeful. No unnecessary explanations of concepts Claude already knows. The guidelines are expressed as crisp imperatives with concrete examples of what to avoid, and the tradeoff note at the top is a single sentence.	3 / 3
Actionability	Despite being an instruction-only skill (no code libraries to demonstrate), the guidance is highly concrete: specific do/don't rules, concrete transformation examples ('Add validation' → 'Write tests for invalid inputs, then make them pass'), and a clear plan template. Each guideline has actionable checks rather than vague advice.	3 / 3
Workflow Clarity	Section 4 provides an explicit verify-after-each-step workflow pattern with a template. The overall skill is sequenced logically (think → simplify → change surgically → verify), and the 'trace every changed line to the user's request' test in Section 3 serves as a validation checkpoint for surgical changes.	3 / 3
Progressive Disclosure	For a standalone behavioral guideline skill under 50 lines with no need for external references, the content is well-organized into four clearly labeled sections with bold summaries. No monolithic walls of text, no unnecessary nesting. The structure is appropriate for the content's scope.	3 / 3
	Total	12 / 12 Passed

Description: 67%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description is reasonably well-structured with a clear 'Use when' clause and identifies its purpose as behavioral guidelines for reducing LLM coding mistakes. Its main weaknesses are that the capabilities listed are more like abstract principles than concrete actions, and the trigger terms are broad enough to potentially conflict with other coding-related skills. The LLM-specific framing provides some distinctiveness but could be sharper.

Suggestions

Add more natural trigger terms users would say, such as 'code quality', 'best practices', 'code review', 'clean code', or 'debugging' to improve discoverability.

Increase distinctiveness by clarifying what makes this different from general coding skills — e.g., specify that this is a meta-guideline/checklist skill rather than a skill that writes code directly.

Dimension	Reasoning	Score
Specificity	The description names the domain (coding) and lists some actions ('writing, reviewing, or refactoring code') along with goals ('avoid overcomplication, make surgical changes, surface assumptions, define verifiable success criteria'), but these are more like behavioral principles than concrete, discrete actions. It's between 2 and 3 — it names multiple things but they're guidelines/goals rather than specific executable actions like 'extract text' or 'fill forms'.	2 / 3
Completeness	The description clearly answers both 'what' (behavioral guidelines to reduce common LLM coding mistakes) and 'when' (Use when writing, reviewing, or refactoring code), with an explicit 'Use when...' clause and specific trigger scenarios.	3 / 3
Trigger Term Quality	Includes relevant terms like 'writing code', 'reviewing', 'refactoring', and 'coding mistakes', which users might naturally say. However, it misses common variations like 'debugging', 'code quality', 'best practices', 'code review', 'clean code', or 'technical debt'. The terms 'surface assumptions' and 'verifiable success criteria' are not phrases users would naturally use.	2 / 3
Distinctiveness Conflict Risk	The description targets a somewhat specific niche — behavioral guidelines for LLM coding quality — but 'writing, reviewing, or refactoring code' is extremely broad and could overlap with virtually any coding-related skill. The 'LLM coding mistakes' angle adds some distinctiveness, but the triggers are generic enough to conflict with general coding skills.	2 / 3
	Total	9 / 12 Passed

Validation: 100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Repository: forrestchang/andrej-karpathy-skills
Path: skills/karpathy-guidelines/SKILL.md
Commit: 2c60614

Reviewed: about 16 hours ago
