Search-first troubleshooting with a diagnostic phase — use when an error, bug, or unexpected behaviour is reported.
Overall score: 75%
Quality — does it follow best practices?

Impact: Pending — no eval scenarios have been run.
Advisory: suggest reviewing before use.
Discovery — 72%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description excels at trigger term coverage and completeness by clearly stating when to use the skill and providing explicit trigger keywords. However, it is weak on specificity—it describes a methodology ('search-first troubleshooting with diagnostic phase') rather than concrete actions the skill performs. Adding specific capabilities would make this significantly stronger.
Suggestions
Add specific concrete actions the skill performs, e.g., 'Analyzes error messages, searches codebase for root causes, checks logs, suggests fixes, and verifies solutions.'
Clarify what 'search-first troubleshooting with diagnostic phase' means in practice—list the diagnostic steps or outputs the skill provides.
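As a sketch of both suggestions, a more action-oriented description (hypothetical wording, assuming the standard skill frontmatter format; trigger terms kept from the original) might read:

```yaml
# Hypothetical rewrite of the skill's frontmatter description.
# The phrasing is illustrative, not the skill's actual text.
description: >-
  Search-first troubleshooting: analyzes error messages and stack traces,
  searches the codebase and web for known causes, reproduces the failure,
  confirms the root cause, and verifies the fix. Use when the user reports
  a bug, error, crash, exception, or something broken, failing, or not
  working.
```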
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description does not list any concrete actions or capabilities. It mentions 'search-first troubleshooting with diagnostic phase', which is a vague methodology description rather than specific actions like 'analyze stack traces, check logs, identify root causes'. | 1 / 3 |
| Completeness | It explicitly answers both 'what' (search-first troubleshooting with diagnostic phase) and 'when' (when the user reports an error, bug, or something not working), with explicit trigger terms listed. The 'what' is thin but present, and the 'when' is clearly stated. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural trigger terms users would actually say: 'debug, error, broken, not working, failing, crash, exception'. These are highly natural and cover common variations of how users report problems. | 3 / 3 |
| Distinctiveness / Conflict Risk | The debugging/troubleshooting domain is fairly broad and could overlap with language-specific debugging skills, testing skills, or code review skills. However, the explicit trigger terms help narrow it somewhat. | 2 / 3 |
| Total | | 9 / 12 (Passed) |
Implementation — 70%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured troubleshooting skill with excellent workflow clarity and progressive disclosure. Its main weaknesses are verbosity in the anti-patterns/philosophy/when-to-use sections (which explain reasoning Claude already possesses) and a lack of concrete, executable examples — the usage examples are pseudocode comments rather than actionable demonstrations. The duplicated references section also wastes tokens.
Suggestions
Remove or significantly condense the 'When to Use', 'When Not to Use', 'Anti-Patterns', and 'Philosophy' sections — these explain reasoning Claude already understands and consume significant tokens.
Replace the pseudocode usage examples with concrete, executable examples showing actual commands, search queries, or diagnostic outputs that Claude would produce.
Consolidate the duplicated 'Refs' and 'References' sections into a single section to save tokens.
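To illustrate the second suggestion, a concrete, runnable usage example of the kind the skill could include (log path, log content, and repository layout are all invented here for illustration) might look like:

```shell
# Hypothetical concrete usage example for the skill's "Usage Examples"
# section. The log file and error text are fabricated for demonstration.
cat > /tmp/app.log <<'EOF'
2024-05-01T10:00:00Z INFO  server started
2024-05-01T10:00:05Z ERROR TypeError: cannot read properties of undefined
EOF

# Step 1: pull the most recent error line from the log.
grep 'ERROR' /tmp/app.log | tail -n 1

# Step 2: search the codebase for code that could raise it
# (the src/ path is an assumption about the project layout):
# grep -rn "cannot read properties" src/
```

An example like this gives the agent an executable pattern to imitate, rather than a comment-only template to interpret.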
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is mostly efficient, with good use of shorthand and structured steps, but includes some redundancy: the 'When to Use'/'When Not to Use'/'Anti-Patterns' sections are verbose and explain reasoning Claude already understands. The Philosophy section and the Anti-Patterns explanations with 'Why:' annotations add bulk. The Refs section is duplicated (listed twice with slightly different formatting). | 2 / 3 |
| Actionability | The workflow steps provide structured guidance but lack concrete executable examples. The 'Usage Examples' section shows only comments/pseudocode rather than actual commands or code. Key steps like 'WebSearch: [error] [stack] [framework]' are templates rather than executable instructions. The AskUserQuestion guard is concrete and actionable, but most diagnostic steps remain abstract. | 2 / 3 |
| Workflow Clarity | The workflow is clearly sequenced (0–6) with explicit validation checkpoints: search before diagnosing, reproduce before theorizing, and confirm the root cause before persisting; the OODA loop has clear exit conditions. The completion checklist and mandatory persist step with preconditions demonstrate good feedback loops. The AskUserQuestion guard adds an important error-recovery mechanism. | 3 / 3 |
| Progressive Disclosure | The skill provides a clear overview with well-signaled one-level-deep references to diagnose.md, search-multi-source.md, and reference.md. Step 3 explicitly defers to 'references/protocols/diagnose.md for details.' The References section clearly describes what each linked file contains. Content is appropriately split between the overview and the detailed reference files. | 3 / 3 |
| Total | | 10 / 12 (Passed) |
Validation — 81%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation for skill structure — 9 / 11 Passed
| Criteria | Description | Result |
|---|---|---|
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 9 / 11 Passed | |
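A sketch of how the two warnings might be resolved in the frontmatter (key names follow the skill spec as far as we can tell from the validator messages; the tool list and metadata values are hypothetical):

```yaml
# Hypothetical frontmatter cleanup for the two warnings above.
# 'allowed-tools' is restricted to commonly recognized tool names,
# and the unknown top-level key is moved under 'metadata', as the
# validator suggests.
name: troubleshoot
description: >-
  Search-first troubleshooting with a diagnostic phase — use when an
  error, bug, or unexpected behaviour is reported.
allowed-tools: Read, Grep, Glob, WebSearch
metadata:
  version: "1.0"   # previously an unknown top-level key (illustrative)
```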