Complex DAG testing workflows with debugging and fixing cycles. Use for multi-step testing requests like "test this dag and fix it if it fails", "test and debug", "run the pipeline and troubleshoot issues". For simple test requests ("test dag", "run dag"), the airflow entrypoint skill handles them directly. This skill is for iterative test-debug-fix cycles.
Overall score: 86

Quality: 83% (Does it follow best practices?)
Impact: Pending (No eval scenarios have been run)
Validation: Passed (No known issues)
Discovery
Score: 89%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a well-crafted description that excels at completeness and distinctiveness by explicitly defining when to use it versus a related simpler skill. The trigger terms are natural and varied. The main weakness is that the specific capabilities could be more concrete—listing specific actions like 'analyzes error logs, modifies task code, re-runs failed tasks' would strengthen specificity.
Suggestions
Add more concrete action verbs describing what the skill actually does during the debug-fix cycle, e.g., 'analyzes task error logs, identifies failure points, modifies DAG code, and re-runs failed tasks'.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description names the domain (DAG testing workflows) and mentions actions like debugging and fixing cycles, but doesn't list multiple specific concrete actions (e.g., parsing error logs, modifying task definitions, re-running specific tasks). 'Complex DAG testing workflows with debugging and fixing cycles' is somewhat abstract. | 2 / 3 |
| Completeness | Clearly answers both 'what' (complex DAG testing workflows with debugging and fixing cycles) and 'when' (explicit 'Use for' clause with trigger phrases, plus a clear boundary distinguishing it from the simpler airflow entrypoint skill). | 3 / 3 |
| Trigger Term Quality | Includes strong natural trigger phrases users would actually say: 'test this dag and fix it if it fails', 'test and debug', 'run the pipeline and troubleshoot issues', 'iterative test-debug-fix cycles'. Also clearly distinguishes from simpler triggers like 'test dag' or 'run dag'. | 3 / 3 |
| Distinctiveness / Conflict Risk | Explicitly differentiates itself from the 'airflow entrypoint skill' for simple test requests, clearly carving out its niche as the iterative test-debug-fix cycle skill. This boundary-setting significantly reduces conflict risk. | 3 / 3 |
| Total | | 11 / 12 (Passed) |
Implementation
Score: 77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a strong, highly actionable skill with an excellent iterative workflow structure and clear feedback loops. Its main weakness is verbosity — the document repeats concepts across the workflow description, scenarios, and debugging tips sections, and includes some explanatory content Claude doesn't need (e.g., what task logs contain). Trimming redundant scenarios and moving debugging tips to a separate reference file would significantly improve token efficiency.
Suggestions
Consolidate the 6 testing scenarios into 2-3 that cover distinct paths (happy path, failure+fix, DAG-not-found) — scenarios 4 and 6 largely repeat patterns already shown
Move the 'Debugging Tips' and 'Common Error Patterns' sections to a separate DEBUGGING_REFERENCE.md file and link to it, reducing the main skill's token footprint
Remove the 'Reading Task Logs' subsection — Claude already knows how to read stack traces and log output
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is reasonably well-structured but includes unnecessary verbosity: the ASCII workflow diagram, redundant scenario examples that repeat the same commands, explanations of what task logs 'typically show' (Claude knows this), and the 'Why this is the preferred method' section. The response interpretation JSON examples are useful but could be more compact. The 'DO NOT' list is valuable but the overall document could be ~30% shorter. | 2 / 3 |
| Actionability | Excellent actionability throughout: every command is fully executable with concrete examples, specific flags (--timeout, --conf, --try), real command patterns, and a comprehensive quick reference table. The common fixes table maps issues to specific actions. All commands follow the consistent `af` pattern with clear arguments. | 3 / 3 |
| Workflow Clarity | The test → debug → fix cycle is clearly sequenced with explicit validation checkpoints. The workflow has a clear feedback loop ('Repeat the test → debug → fix loop until the DAG succeeds'), handles multiple outcome branches (success, timeout, failure), and each phase has clear entry/exit conditions. The 'FIRST ACTION' directive is an excellent guardrail against unnecessary pre-flight checks. | 3 / 3 |
| Progressive Disclosure | The document references related skills at the bottom (authoring-dags, debugging-dags, deploying-airflow) but includes a lot of inline content that could be split out: the 6 testing scenarios, debugging tips, and common error patterns sections make this quite long. The CLI quick reference table is well-placed, but the scenarios section largely duplicates the workflow already described above. | 2 / 3 |
| Total | | 10 / 12 (Passed) |
Validation
Score: 100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
11 / 11 checks passed

Validation for skill structure: no warnings or errors.