Structured self-debugging workflow for AI agent failures using capture, diagnosis, contained recovery, and introspection reports.
Overall score: 56%

Does it follow best practices?

Impact: Pending. No eval scenarios have been run.
Status: Passed. No known issues.

Quality
Discovery: 17%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description identifies a niche domain (AI agent self-debugging) but relies heavily on abstract process terminology rather than concrete actions or natural user language. It lacks an explicit 'Use when...' clause and misses common trigger terms users would actually say when encountering errors or needing debugging help.
Suggestions
- Add a 'Use when...' clause with natural trigger terms, e.g. 'Use when the agent encounters an error, failure, unexpected behavior, or needs to debug its own actions'.
- Replace abstract phase names with concrete actions, e.g. 'Captures error context and stack traces, diagnoses root causes, applies targeted fixes, and generates post-mortem reports'.
- Include natural keywords users would say, such as 'error', 'bug', 'fix', 'debug', 'troubleshoot', 'failure', 'crash', or 'stuck'.
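Taken together, the suggestions above could yield frontmatter like the following sketch. The skill name and exact field layout are assumptions for illustration, not taken from the skill under review:

```yaml
---
# Hypothetical skill name; the actual skill's name is not shown in this review
name: self-debugging-workflow
description: >
  Debug and recover from AI agent errors and failures. Captures error
  context and stack traces, diagnoses root causes, applies targeted
  fixes, and generates post-mortem reports. Use when the agent
  encounters an error, crash, failure, or unexpected behavior, or
  needs to troubleshoot and fix its own actions.
---
```

This version front-loads concrete actions and natural trigger terms ('error', 'crash', 'failure', 'debug', 'fix', 'troubleshoot') while keeping the niche framing of agent self-debugging.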
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names a domain (AI agent self-debugging) and lists phases (capture, diagnosis, contained recovery, introspection reports), but these are abstract process labels rather than concrete actions like 'analyze error logs' or 'generate fix suggestions'. | 2 / 3 |
| Completeness | Describes the what at a high level (structured self-debugging workflow) but completely lacks a 'Use when...' clause or any explicit trigger guidance, which per the rubric caps completeness at 2, and the 'what' itself is vague enough to warrant a 1. | 1 / 3 |
| Trigger Term Quality | Uses technical jargon like 'contained recovery', 'introspection reports', and 'capture' that users would not naturally say. Missing natural trigger terms like 'error', 'bug', 'fix', 'debug', 'failure', 'crash', or 'troubleshoot'. | 1 / 3 |
| Distinctiveness / Conflict Risk | The 'AI agent failures' and 'self-debugging' framing provides some niche specificity, but the broad terms 'debugging' and 'failures' could overlap with general debugging or error-handling skills. | 2 / 3 |
| Total | | 6 / 12 (Passed) |
Implementation: 77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured workflow skill with clear phased progression, concrete templates, and good diagnostic guidance. Its main strengths are the actionable four-phase loop with validation checkpoints and the specific pattern-matching diagnostic table. Minor weaknesses include some verbosity in framing sections and the monolithic structure that could benefit from splitting the diagnostic table or templates into referenced files.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is reasonably efficient, but some sections could be tightened: the 'Scope Boundaries' section has redundant framing, and the 'When to Activate' list overlaps with the intro paragraph. However, it mostly avoids explaining things Claude already knows, and the tables and checklists are information-dense. | 2 / 3 |
| Actionability | The skill provides concrete, copy-paste-ready markdown templates for each phase, a specific diagnostic table mapping patterns to causes and checks, an ordered heuristic list, and explicit good/bad pattern examples. The guidance is specific and directly executable as a workflow. | 3 / 3 |
| Workflow Clarity | The four-phase loop is clearly sequenced (Capture → Diagnosis → Recovery → Report) with explicit validation at each step: diagnosis questions before acting, a recovery checklist requiring evidence of success, and a final report requiring result status. The recovery heuristics provide a prioritized sequence with feedback loops (verify before retry). | 3 / 3 |
| Progressive Disclosure | The skill references other skills (verification-loop, continuous-learning-v2, council, workspace-surface-audit) in the Integration section, which is good navigation. However, the content is somewhat monolithic; the diagnostic table and templates could potentially be split out. For a skill of this length (~120 lines of content), it is borderline, but the inline content is all relevant enough to justify staying in one file. | 2 / 3 |
| Total | | 10 / 12 (Passed) |
Validation: 90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation for skill structure: 10 / 11 checks passed
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 10 / 11 (Passed) |
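The single warning points to unrecognized top-level frontmatter keys. Assuming the spec accepts a `metadata` mapping for extra keys (an assumption to verify against the actual spec, and `version` here is a purely hypothetical offender), the fix might look like:

```yaml
# Before (hypothetical): an unknown top-level key triggers the warning
---
name: self-debugging-workflow
description: ...
version: 1.2          # unknown key at top level
---

# After: the unknown key is moved under a metadata mapping
---
name: self-debugging-workflow
description: ...
metadata:
  version: 1.2
---
```

Alternatively, if the key carries no information the skill needs, simply deleting it also clears the warning.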