Complex DAG testing workflows with debugging and fixing cycles. Use for multi-step testing requests like "test this dag and fix it if it fails", "test and debug", "run the pipeline and troubleshoot issues". For simple test requests ("test dag", "run dag"), the airflow entrypoint skill handles it directly. This skill is for iterative test-debug-fix cycles.
Overall score: 86
Quality: 83% (Does it follow best practices?)
Impact: Pending (no eval scenarios have been run)
Validation: Passed (no known issues)
Quality
Discovery
89%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a well-crafted description that excels at completeness and distinctiveness by explicitly defining both when to use it and when NOT to use it (deferring simple requests to another skill). The trigger terms are natural and varied. The main weakness is that the specific capabilities could be more concrete—listing specific actions like error log analysis, task modification, or retry logic would strengthen it further.
Suggestions
Add more specific concrete actions beyond 'debugging and fixing cycles', such as 'parses error logs, identifies failing tasks, applies fixes, and re-runs tests' to improve specificity.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description names the domain (DAG testing workflows) and mentions actions like debugging and fixing cycles, but doesn't list multiple specific concrete actions (e.g., parsing error logs, modifying task definitions, re-running specific tasks). It stays at a moderate level of specificity. | 2 / 3 |
| Completeness | Clearly answers both 'what' (complex DAG testing workflows with debugging and fixing cycles) and 'when' (explicit 'Use for' clause with trigger phrases, plus explicit boundary distinguishing it from the simpler airflow entrypoint skill). | 3 / 3 |
| Trigger Term Quality | Includes strong natural trigger phrases users would actually say: 'test this dag and fix it if it fails', 'test and debug', 'run the pipeline and troubleshoot issues'. Also distinguishes from simpler requests like 'test dag' or 'run dag', which is helpful for routing. | 3 / 3 |
| Distinctiveness / Conflict Risk | Explicitly differentiates itself from the simpler 'airflow entrypoint skill' for basic test requests, clearly carving out its niche as the iterative test-debug-fix cycle handler. This boundary-setting significantly reduces conflict risk. | 3 / 3 |
| Total | | 11 / 12 (Passed) |
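The suggestion above asks for concrete actions such as parsing error logs and identifying failing tasks. As a rough illustration of what that capability might look like, here is a hedged sketch; the log format and function name are made up for the example, and real Airflow task logs differ by version:

```python
import re

# Hypothetical Airflow-style task log; the real format varies by version.
LOG = """\
[2024-01-01] {taskinstance.py} INFO - Task extract_data: started
[2024-01-01] {taskinstance.py} ERROR - Task extract_data failed: FileNotFoundError
[2024-01-01] {taskinstance.py} INFO - Task load: started
"""

def failing_tasks(log):
    """Return the names of tasks that logged an ERROR-level failure."""
    return re.findall(r"ERROR - Task (\w+) failed", log)

print(failing_tasks(LOG))  # prints ['extract_data']
```

Spelling out actions at this level of concreteness in the description would address the specificity gap noted in the table.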
Implementation
77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured skill with excellent actionability and workflow clarity — the iterative test-debug-fix cycle is clearly defined with concrete commands and explicit feedback loops. The main weaknesses are verbosity (redundant scenarios, explanations of concepts Claude already knows like how to read stack traces) and a monolithic structure that could benefit from splitting debugging tips and advanced scenarios into separate referenced files.
Suggestions
Move 'Debugging Tips', 'Common Error Patterns', and 'Reading Task Logs' sections into a separate DEBUGGING-REFERENCE.md file and link to it, keeping SKILL.md focused on the core workflow.
Reduce the six testing scenarios to 2-3 representative ones — scenarios 1, 2, and 3 cover the key paths; scenarios 4 and 6 are minor variations that add length without new concepts.
Remove the 'Reading Task Logs' section entirely — Claude already knows how to read stack traces and log output.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is reasonably well-structured but includes unnecessary verbosity: the ASCII workflow diagram, redundant scenario examples that repeat the same commands, explanations of what task logs 'typically show' (Claude knows this), and the 'Philosophy' statement. The response interpretation JSON examples are helpful but could be more compact. The 'DO NOT' list is useful but the overall document could be ~30-40% shorter. | 2 / 3 |
| Actionability | Excellent actionability throughout — every phase has concrete, copy-paste-ready commands with real arguments. The CLI quick reference table, common fixes table, and scenario walkthroughs all provide specific executable guidance. Commands include actual flags like `--timeout 300` and `--conf '{...}'`. | 3 / 3 |
| Workflow Clarity | The test → debug → fix cycle is clearly sequenced with explicit validation checkpoints. The workflow has a clear feedback loop ('Repeat the test → debug → fix loop until the DAG succeeds'), handles multiple outcome branches (success, timeout, failure), and each phase has clear entry/exit conditions. The 'FIRST ACTION' directive is unambiguous. | 3 / 3 |
| Progressive Disclosure | The document references related skills at the bottom (authoring-dags, debugging-dags, deploying-airflow) which is good, but the main content is monolithic — the testing scenarios, debugging tips, and Astro-specific content could be split into separate files. The inline content is quite long for a SKILL.md overview, with sections like 'Debugging Tips' and 'Reading Task Logs' that could be in a referenced file. | 2 / 3 |
| Total | | 10 / 12 (Passed) |
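The workflow-clarity row above describes a test → debug → fix loop with a success exit and a bounded retry. A minimal runnable sketch of that control flow, using simulated logs in place of real `airflow dags test` output (the sample logs and helper names here are hypothetical, not taken from the skill):

```python
# Simulated outcomes of three successive test runs; a real implementation
# would invoke `airflow dags test <dag_id> <logical_date>` instead.
SAMPLE_LOGS = [
    "Task extract_data failed: FileNotFoundError: /data/input.csv",
    "Task transform failed: KeyError: 'amount'",
    "DAG run succeeded",
]

def run_dag_test(attempt):
    """Stand-in for running the DAG test; returns (success, log output)."""
    log = SAMPLE_LOGS[attempt]
    return "succeeded" in log, log

def find_failing_task(log):
    """Extract the failing task name from a (simulated) log line."""
    if "Task " in log and " failed" in log:
        return log.split("Task ")[1].split(" ")[0]
    return None

def test_debug_fix(max_attempts=5):
    """Repeat the test -> debug -> fix loop until the DAG succeeds."""
    for attempt in range(max_attempts):
        ok, log = run_dag_test(attempt)
        if ok:
            return attempt + 1  # total runs needed
        failing = find_failing_task(log)
        # A real implementation would apply a fix to the task `failing`
        # here before the loop re-runs the test.
    return None

print(test_debug_fix())  # prints 3
```

The explicit success exit, the per-iteration fix step, and the `max_attempts` bound mirror the entry/exit conditions the table credits the skill with.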
Validation
100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.