testing-dags

Complex DAG testing workflows with debugging and fixing cycles. Use for multi-step testing requests like "test this dag and fix it if it fails", "test and debug", "run the pipeline and troubleshoot issues". For simple test requests ("test dag", "run dag"), the airflow entrypoint skill handles it directly. This skill is for iterative test-debug-fix cycles.

86

Quality: 83%
Does it follow best practices?

Impact: Pending
No eval scenarios have been run.

Security (by Snyk): Passed
No known issues.

Quality

Discovery

89%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a well-crafted description that excels at completeness and distinctiveness by explicitly defining both when to use it and when NOT to use it (deferring simple requests to another skill). The trigger terms are natural and varied. The main weakness is that the specific capabilities could be more concrete—listing specific actions like error log analysis, task modification, or retry logic would strengthen it further.

Suggestions

Add more specific concrete actions beyond 'debugging and fixing cycles', such as 'parses error logs, identifies failing tasks, applies fixes, and re-runs tests' to improve specificity.

Dimension / Reasoning / Score

Specificity

The description names the domain (DAG testing workflows) and mentions actions like debugging and fixing cycles, but doesn't list multiple specific concrete actions (e.g., parsing error logs, modifying task definitions, re-running specific tasks). It stays at a moderate level of specificity.

2 / 3

Completeness

Clearly answers both 'what' (complex DAG testing workflows with debugging and fixing cycles) and 'when' (explicit 'Use for' clause with trigger phrases, plus explicit boundary distinguishing it from the simpler airflow entrypoint skill).

3 / 3

Trigger Term Quality

Includes strong natural trigger phrases users would actually say: 'test this dag and fix it if it fails', 'test and debug', 'run the pipeline and troubleshoot issues'. Also distinguishes from simpler requests like 'test dag' or 'run dag', which is helpful for routing.

3 / 3

Distinctiveness / Conflict Risk

Explicitly differentiates itself from the simpler 'airflow entrypoint skill' for basic test requests, clearly carving out its niche as the iterative test-debug-fix cycle handler. This boundary-setting significantly reduces conflict risk.

3 / 3

Total: 11 / 12

Passed

Implementation

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-structured skill with excellent actionability and workflow clarity — the iterative test-debug-fix cycle is clearly defined with concrete commands and explicit feedback loops. The main weaknesses are verbosity (redundant scenarios, explanations of concepts Claude already knows like how to read stack traces) and a monolithic structure that could benefit from splitting debugging tips and advanced scenarios into separate referenced files.

Suggestions

Move 'Debugging Tips', 'Common Error Patterns', and 'Reading Task Logs' sections into a separate DEBUGGING-REFERENCE.md file and link to it, keeping SKILL.md focused on the core workflow.

Reduce the six testing scenarios to 2-3 representative ones — scenarios 1, 2, and 3 cover the key paths; scenarios 4 and 6 are minor variations that add length without new concepts.

Remove the 'Reading Task Logs' section entirely — Claude already knows how to read stack traces and log output.

Dimension / Reasoning / Score

Conciseness

The skill is reasonably well-structured but includes unnecessary verbosity: the ASCII workflow diagram, redundant scenario examples that repeat the same commands, explanations of what task logs 'typically show' (Claude knows this), and the 'Philosophy' statement. The response interpretation JSON examples are helpful but could be more compact. The 'DO NOT' list is useful but the overall document could be ~30-40% shorter.

2 / 3

Actionability

Excellent actionability throughout — every phase has concrete, copy-paste-ready commands with real arguments. The CLI quick reference table, common fixes table, and scenario walkthroughs all provide specific executable guidance. Commands include actual flags like `--timeout 300` and `--conf '{...}'`.

3 / 3

Workflow Clarity

The test → debug → fix cycle is clearly sequenced with explicit validation checkpoints. The workflow has a clear feedback loop ('Repeat the test → debug → fix loop until the DAG succeeds'), handles multiple outcome branches (success, timeout, failure), and each phase has clear entry/exit conditions. The 'FIRST ACTION' directive is unambiguous.

3 / 3

Progressive Disclosure

The document references related skills at the bottom (authoring-dags, debugging-dags, deploying-airflow) which is good, but the main content is monolithic — the testing scenarios, debugging tips, and Astro-specific content could be split into separate files. The inline content is quite long for a SKILL.md overview, with sections like 'Debugging Tips' and 'Reading Task Logs' that could be in a referenced file.

2 / 3

Total: 10 / 12

Passed
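The "copy-paste-ready commands" praised under Actionability are of the `airflow dags test` variety. As a hedged illustration (not taken from the skill), here is a minimal Python wrapper around one such invocation; the dag id, logical date, and conf payload are placeholders, and it assumes an Airflow 2.x CLI where `dags test` accepts `--conf`.

```python
import shutil
import subprocess

# Placeholder identifiers; substitute a real dag id and logical date.
cmd = [
    "airflow", "dags", "test",
    "example_dag", "2024-01-01",
    "--conf", '{"env": "dev"}',
]

if shutil.which("airflow"):
    # Real invocation when the Airflow CLI is on PATH.
    subprocess.run(cmd, check=False)
else:
    # Keep the sketch runnable anywhere by just printing the command.
    print("would run:", " ".join(cmd))
```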

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Repository: astronomer/agents (Reviewed)
