Complex DAG testing workflows with debugging and fixing cycles. Use for multi-step testing requests like "test this dag and fix it if it fails", "test and debug", "run the pipeline and troubleshoot issues". For simple test requests ("test dag", "run dag"), the airflow entrypoint skill handles them directly. This skill is for iterative test-debug-fix cycles.
Overall score: 86

Quality: 83% (Does it follow best practices?)
Impact: Pending (No eval scenarios have been run)
Validation: Passed (No known issues)
Discovery
Score: 89%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a well-crafted description that excels at completeness and distinctiveness by explicitly defining when to use it versus a related simpler skill. The trigger terms are natural and varied. The main weakness is that the specific capabilities could be more concrete—listing specific actions like 'analyzes error logs, modifies task code, re-runs failed tasks' would strengthen specificity.
Suggestions
Add more concrete action verbs describing what the skill actually does during the debug-fix cycle, e.g., 'analyzes task error logs, identifies failure points, modifies DAG code, and re-runs failed tasks'.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description names the domain (DAG testing workflows) and mentions actions like debugging and fixing cycles, but doesn't list multiple specific concrete actions (e.g., parsing error logs, modifying task definitions, re-running specific tasks). 'Complex DAG testing workflows with debugging and fixing cycles' is somewhat abstract. | 2 / 3 |
| Completeness | Clearly answers both 'what' (complex DAG testing workflows with debugging and fixing cycles) and 'when' (explicit 'Use for' clause with trigger phrases, plus a clear boundary distinguishing it from the simpler airflow entrypoint skill). | 3 / 3 |
| Trigger Term Quality | Includes strong natural trigger phrases users would actually say: 'test this dag and fix it if it fails', 'test and debug', 'run the pipeline and troubleshoot issues', 'iterative test-debug-fix cycles'. Also clearly distinguishes from simpler triggers like 'test dag' or 'run dag'. | 3 / 3 |
| Distinctiveness / Conflict Risk | Explicitly differentiates itself from the 'airflow entrypoint skill' for simple test requests, clearly carving out its niche as the iterative test-debug-fix cycle skill. This boundary-setting significantly reduces conflict risk. | 3 / 3 |
| Total | | 11 / 12 (Passed) |
Implementation
Score: 77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a strong, highly actionable skill with an excellent iterative workflow structure and clear feedback loops. Its main weakness is verbosity — the document repeats concepts across the workflow description, scenarios, and debugging tips sections, and includes some explanatory content Claude doesn't need (e.g., what task logs contain). Trimming redundant scenarios and moving debugging tips to a separate reference file would significantly improve token efficiency.
Suggestions
Consolidate the 6 testing scenarios into 2-3 that cover distinct paths (happy path, failure+fix, DAG-not-found) — scenarios 4 and 6 largely repeat patterns already shown
Move the 'Debugging Tips' and 'Common Error Patterns' sections to a separate DEBUGGING_REFERENCE.md file and link to it, reducing the main skill's token footprint
Remove the 'Reading Task Logs' subsection — Claude already knows how to read stack traces and log output
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is reasonably well-structured but includes unnecessary verbosity: the ASCII workflow diagram, redundant scenario examples that repeat the same commands, explanations of what task logs 'typically show' (Claude knows this), and the 'Why this is the preferred method' section. The response interpretation JSON examples are useful but could be more compact. The 'DO NOT' list is valuable but the overall document could be ~30% shorter. | 2 / 3 |
| Actionability | Excellent actionability throughout: every command is fully executable with concrete examples, specific flags (--timeout, --conf, --try), real command patterns, and a comprehensive quick reference table. The common fixes table maps issues to specific actions. All commands follow the consistent `af` pattern with clear arguments. | 3 / 3 |
| Workflow Clarity | The test → debug → fix cycle is clearly sequenced with explicit validation checkpoints. The workflow has a clear feedback loop ('Repeat the test → debug → fix loop until the DAG succeeds'), handles multiple outcome branches (success, timeout, failure), and each phase has clear entry/exit conditions. The 'FIRST ACTION' directive is an excellent guardrail against unnecessary pre-flight checks. | 3 / 3 |
| Progressive Disclosure | The document references related skills at the bottom (authoring-dags, debugging-dags, deploying-airflow) but includes a lot of inline content that could be split out: the 6 testing scenarios, debugging tips, and common error patterns sections make this quite long. The CLI quick reference table is well-placed, but the scenarios section largely duplicates the workflow already described above. | 2 / 3 |
| Total | | 10 / 12 (Passed) |
Validation
Score: 100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
11 / 11 checks passed

Validation for skill structure: no warnings or errors.