Structured self-debugging workflow for AI agent failures using capture, diagnosis, contained recovery, and introspection reports.
Overall score: 56%

Does it follow best practices?

Impact: Pending. No eval scenarios have been run.
Status: Passed. No known issues.

Quality
Discovery: 17%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description identifies a niche domain (AI agent self-debugging) but relies heavily on abstract process terminology rather than concrete actions or natural user language. It lacks an explicit 'Use when...' clause and misses common trigger terms users would actually say when encountering errors or needing debugging help.
Suggestions
- Add a 'Use when...' clause with natural trigger terms, e.g. 'Use when the agent encounters an error, failure, unexpected behavior, or needs to debug its own actions'.
- Replace abstract phase names with concrete actions, e.g. 'Captures error context and stack traces, diagnoses root causes, applies targeted fixes, and generates post-mortem reports'.
- Include natural keywords users would say, such as 'error', 'bug', 'fix', 'debug', 'troubleshoot', 'failure', 'crash', or 'stuck'.
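Taken together, the suggestions above could yield frontmatter like the following sketch. The skill name and exact field layout are assumptions for illustration, not taken from the skill under review:

```yaml
---
# Hypothetical skill name; the actual skill's name is not shown in this review
name: self-debugging-workflow
description: >
  Debug and recover from AI agent errors and failures. Captures error
  context and stack traces, diagnoses root causes, applies targeted
  fixes, and generates post-mortem reports. Use when the agent
  encounters an error, crash, failure, or unexpected behavior, or
  needs to troubleshoot and fix its own actions.
---
```

This version front-loads concrete actions and natural trigger terms ('error', 'crash', 'failure', 'debug', 'fix', 'troubleshoot') while keeping the niche framing of agent self-debugging.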
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names a domain (AI agent self-debugging) and lists phases (capture, diagnosis, contained recovery, introspection reports), but these are abstract process labels rather than concrete actions like 'analyze error logs' or 'generate fix suggestions'. | 2 / 3 |
| Completeness | Describes the what at a high level (structured self-debugging workflow) but completely lacks a 'Use when...' clause or any explicit trigger guidance, which per the rubric caps completeness at 2, and the 'what' itself is vague enough to warrant a 1. | 1 / 3 |
| Trigger Term Quality | Uses technical jargon like 'contained recovery', 'introspection reports', and 'capture' that users would not naturally say. Missing natural trigger terms like 'error', 'bug', 'fix', 'debug', 'failure', 'crash', or 'troubleshoot'. | 1 / 3 |
| Distinctiveness / Conflict Risk | The 'AI agent failures' and 'self-debugging' framing provides some niche specificity, but the broad terms 'debugging' and 'failures' could overlap with general debugging or error-handling skills. | 2 / 3 |
| Total | | 6 / 12 (Passed) |
Implementation: 77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured workflow skill with clear phased progression, concrete templates, and good diagnostic guidance. Its main strengths are the actionable four-phase loop with validation checkpoints and the specific pattern-matching diagnostic table. Minor weaknesses include some verbosity in framing sections and the monolithic structure that could benefit from splitting the diagnostic table or templates into referenced files.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is reasonably efficient, but some sections could be tightened: the 'Scope Boundaries' section has redundant framing, and the 'When to Activate' list overlaps with the intro paragraph. However, it mostly avoids explaining things Claude already knows, and the tables and checklists are information-dense. | 2 / 3 |
| Actionability | The skill provides concrete, copy-paste-ready markdown templates for each phase, a specific diagnostic table mapping patterns to causes and checks, an ordered heuristic list, and explicit good/bad pattern examples. The guidance is specific and directly executable as a workflow. | 3 / 3 |
| Workflow Clarity | The four-phase loop is clearly sequenced (Capture → Diagnosis → Recovery → Report) with explicit validation at each step: diagnosis questions before acting, a recovery checklist requiring evidence of success, and a final report requiring result status. The recovery heuristics provide a prioritized sequence with feedback loops (verify before retry). | 3 / 3 |
| Progressive Disclosure | The skill references other skills (verification-loop, continuous-learning-v2, council, workspace-surface-audit) in the Integration section, which is good navigation. However, the content is somewhat monolithic; the diagnostic table and templates could potentially be split out. For a skill of this length (~120 lines of content), it is borderline, but the inline content is all relevant enough to justify staying in one file. | 2 / 3 |
| Total | | 10 / 12 (Passed) |
Validation: 90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation for skill structure: 10 / 11 checks passed
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 10 / 11 (Passed) |
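The single warning points to unrecognized top-level frontmatter keys. Assuming the spec accepts a `metadata` mapping for extra keys (an assumption to verify against the actual spec, and `version` here is a purely hypothetical offender), the fix might look like:

```yaml
# Before (hypothetical): an unknown top-level key triggers the warning
---
name: self-debugging-workflow
description: ...
version: 1.2          # unknown key at top level
---

# After: the unknown key is moved under a metadata mapping
---
name: self-debugging-workflow
description: ...
metadata:
  version: 1.2
---
```

Alternatively, if the key carries no information the skill needs, simply deleting it also clears the warning.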