Comprehensive DAG failure diagnosis and root cause analysis. Use for complex debugging requests requiring deep investigation like "diagnose and fix the pipeline", "full root cause analysis", "why is this failing and how to prevent it". For simple debugging ("why did dag fail", "show logs"), the airflow entrypoint skill handles it directly. This skill provides structured investigation and prevention recommendations.
86
83%
Does it follow best practices?
Impact
Pending
No eval scenarios have been run
Passed
No known issues
Quality
Discovery
89%Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a well-crafted description that excels at completeness and distinctiveness by explicitly defining when to use it versus a related simpler skill. The trigger terms are natural and varied. The main weakness is that the specific capabilities could be more concrete—listing actual investigation steps or analysis techniques rather than the somewhat abstract 'structured investigation and prevention recommendations.'
Suggestions
Add more concrete action verbs describing what the skill actually does, e.g., 'traces task dependency chains, analyzes scheduler logs, identifies upstream data issues, and recommends retry/alerting strategies.'
| Dimension | Reasoning | Score |
|---|---|---|
Specificity | The description names the domain (DAG failure diagnosis, root cause analysis) and mentions 'structured investigation and prevention recommendations,' but doesn't list multiple concrete actions like 'analyze task logs, trace dependency failures, check scheduler health, review XCom data.' | 2 / 3 |
Completeness | Clearly answers both 'what' (comprehensive DAG failure diagnosis and root cause analysis with structured investigation and prevention recommendations) and 'when' (explicit 'Use for' clause with trigger phrases, plus boundary clarification distinguishing it from the simpler airflow entrypoint skill). | 3 / 3 |
Trigger Term Quality | Includes strong natural trigger terms users would say: 'diagnose and fix the pipeline', 'full root cause analysis', 'why is this failing and how to prevent it', plus differentiates from simpler queries like 'why did dag fail' and 'show logs'. Good coverage of natural language variations. | 3 / 3 |
Distinctiveness Conflict Risk | Explicitly distinguishes itself from the 'airflow entrypoint skill' for simple debugging, clearly carving out a niche for complex/deep investigation requests. The boundary between simple and complex debugging is well-defined with example phrases for each. | 3 / 3 |
Total | 11 / 12 Passed |
Implementation
77%Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured diagnostic skill with a clear multi-step workflow and concrete CLI commands that make it highly actionable. The sequential investigation process with decision branches is well-designed. Minor weaknesses include some slightly verbose sections (Prevention suggestions, impact assessment prompts) that Claude could generate independently, and all content being inline rather than leveraging progressive disclosure for platform-specific guidance.
Suggestions
Consider moving the Astro-specific and OSS-specific diagnosis sections into separate reference files to keep the main skill leaner and improve progressive disclosure.
Trim the Prevention section to just 'Recommend specific prevention measures based on the failure category' since Claude can generate appropriate suggestions contextually without the example bullet list.
| Dimension | Reasoning | Score |
|---|---|---|
Conciseness | Generally efficient but includes some unnecessary elaboration. Phrases like 'Be specific - not "the task failed" but "the task failed because column X was null..."' are helpful examples but the Prevention section lists somewhat obvious suggestions (add alerting, update documentation) that Claude would naturally produce. The Astro/OSS sections add useful context without excessive padding. | 2 / 3 |
Actionability | Provides specific, executable CLI commands throughout (af runs diagnose, af tasks logs, af runs clear, etc.) with concrete argument patterns. The failure categorization taxonomy is specific and actionable, and the output structure template gives Claude a clear format to follow. | 3 / 3 |
Workflow Clarity | Clear 4-step sequential workflow with logical progression from identification → error details → context gathering → actionable output. Each step has explicit sub-steps and decision points (e.g., 'If a specific DAG was mentioned' vs 'If no DAG was specified'). The workflow naturally includes validation through context-checking in Step 3 before providing diagnosis in Step 4. | 3 / 3 |
Progressive Disclosure | Content is well-structured with clear headers and logical sections, but everything is inline in a single file. The Astro-specific and OSS-specific sections could be split into separate reference files. For a skill of this length (~80 lines of content), some separation would improve scannability, though the current organization is reasonable. | 2 / 3 |
Total | 10 / 12 Passed |
Validation
100%Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
166c98a
Table of Contents
If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.