
tdg-personal/agent-introspection-debugging

Structured self-debugging workflow for AI agent failures using capture, diagnosis, contained recovery, and introspection reports.


Quality: 56% (Does it follow best practices?)

Impact: Pending (No eval scenarios have been run)

Security by Snyk: Passed (No known issues)


Quality

Discovery: 17%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description identifies a niche domain (AI agent self-debugging) but relies heavily on abstract process terminology rather than concrete actions or natural user language. It lacks an explicit 'Use when...' clause and misses common trigger terms users would actually say when encountering errors or needing debugging help.

Suggestions

- Add a 'Use when...' clause with natural trigger terms like 'Use when the agent encounters an error, failure, unexpected behavior, or needs to debug its own actions'.
- Replace abstract phase names with concrete actions, e.g., 'Captures error context and stack traces, diagnoses root causes, applies targeted fixes, and generates post-mortem reports'.
- Include natural keywords users would say such as 'error', 'bug', 'fix', 'debug', 'troubleshoot', 'failure', 'crash', or 'stuck'.
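Putting those suggestions together, the skill's frontmatter description might read something like the sketch below. This is a hypothetical rewrite, not the skill's actual frontmatter; the `name`/`description` field layout is assumed from the common skill format.

```yaml
# Hypothetical frontmatter rewrite (illustrative only)
name: agent-introspection-debugging
description: >
  Structured self-debugging workflow for AI agents. Captures error context
  and stack traces, diagnoses root causes, applies targeted fixes, and
  generates post-mortem reports. Use when the agent encounters an error,
  bug, failure, crash, or unexpected behavior, or needs to debug or
  troubleshoot its own actions.
```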

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Names a domain (AI agent self-debugging) and lists phases (capture, diagnosis, contained recovery, introspection reports), but these are abstract process labels rather than concrete actions like 'analyze error logs' or 'generate fix suggestions'. | 2 / 3 |
| Completeness | Describes what at a high level (structured self-debugging workflow) but completely lacks a 'Use when...' clause or any explicit trigger guidance, which per the rubric caps completeness at 2, and the 'what' itself is vague enough to warrant a 1. | 1 / 3 |
| Trigger Term Quality | Uses technical jargon like 'contained recovery', 'introspection reports', and 'capture' that users would not naturally say. Missing natural trigger terms like 'error', 'bug', 'fix', 'debug', 'failure', 'crash', or 'troubleshoot'. | 1 / 3 |
| Distinctiveness Conflict Risk | The 'AI agent failures' and 'self-debugging' framing provides some niche specificity, but the broad terms 'debugging' and 'failures' could overlap with general debugging or error-handling skills. | 2 / 3 |
| Total | | 6 / 12 |

Passed

Implementation: 77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-structured workflow skill with clear phased progression, concrete templates, and good diagnostic guidance. Its main strengths are the actionable four-phase loop with validation checkpoints and the specific pattern-matching diagnostic table. Minor weaknesses include some verbosity in framing sections and the monolithic structure that could benefit from splitting the diagnostic table or templates into referenced files.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The skill is reasonably efficient but includes some sections that could be tightened—e.g., the 'Scope Boundaries' section has some redundant framing, and the 'When to Activate' list overlaps with the intro paragraph. However, it mostly avoids explaining things Claude already knows and the tables/checklists are information-dense. | 2 / 3 |
| Actionability | The skill provides concrete, copy-paste-ready markdown templates for each phase, a specific diagnostic table mapping patterns to causes and checks, an ordered heuristic list, and explicit good/bad pattern examples. The guidance is specific and directly executable as a workflow. | 3 / 3 |
| Workflow Clarity | The four-phase loop is clearly sequenced (Capture → Diagnosis → Recovery → Report) with explicit validation at each step—diagnosis questions before acting, a recovery checklist requiring evidence of success, and a final report requiring result status. The recovery heuristics provide a prioritized sequence with feedback loops (verify before retry). | 3 / 3 |
| Progressive Disclosure | The skill references other skills (verification-loop, continuous-learning-v2, council, workspace-surface-audit) in the Integration section, which is good navigation. However, the content is somewhat monolithic—the diagnostic table and templates could potentially be split out. For a skill of this length (~120 lines of content), it's borderline but the inline content is all relevant enough to justify staying in one file. | 2 / 3 |
| Total | | 10 / 12 |
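The four-phase loop the review describes (Capture → Diagnosis → Recovery → Report) could be sketched roughly as follows. All names here (`Incident`, `handle_failure`, the timeout heuristic) are hypothetical illustrations, not the skill's actual templates or diagnostic table:

```python
from dataclasses import dataclass


@dataclass
class Incident:
    """Hypothetical record of one agent failure (illustrative only)."""
    error: str
    context: dict
    diagnosis: str = ""
    fix_applied: str = "none"
    recovered: bool = False
    report: str = ""


def handle_failure(error: str, context: dict) -> Incident:
    # Phase 1: Capture - record the error and surrounding context.
    incident = Incident(error=error, context=context)

    # Phase 2: Diagnosis - answer diagnostic questions before acting,
    # e.g. by pattern-matching the error against known causes.
    if "timeout" in error.lower():
        incident.diagnosis = "likely slow or unreachable dependency"
        incident.fix_applied = "retry with backoff"
    else:
        incident.diagnosis = "unclassified; needs manual pattern matching"

    # Phase 3: Contained recovery - apply the targeted fix and require
    # evidence of success (the review's 'verify before retry' loop).
    incident.recovered = incident.fix_applied != "none"

    # Phase 4: Report - generate a post-mortem with the result status.
    incident.report = (
        f"error={incident.error!r} diagnosis={incident.diagnosis!r} "
        f"fix={incident.fix_applied!r} recovered={incident.recovered}"
    )
    return incident


inc = handle_failure("Request timeout after 30s", {"step": "fetch docs"})
print(inc.report)
```

A real implementation would replace the single timeout heuristic with the skill's full pattern-to-cause diagnostic table and its recovery checklist.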

Passed

Validation: 90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 Passed

Validation for skill structure

| Criteria | Description | Result |
| --- | --- | --- |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
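One way to clear this warning, assuming the spec allows arbitrary keys under a `metadata` block (an assumption; check the skill spec for the exact allowed keys), is to move unrecognized top-level keys there. The `author` key below is a made-up example, not the actual offending key:

```yaml
# Before (hypothetical): an unknown top-level key triggers the warning
#   author: tdg-personal
# After: nest unrecognized keys under metadata
name: agent-introspection-debugging
description: Structured self-debugging workflow for AI agent failures.
metadata:
  author: tdg-personal
```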

Total: 10 / 11

Passed
