
testing-dags

Complex DAG testing workflows with debugging and fixing cycles. Use for multi-step testing requests like "test this dag and fix it if it fails", "test and debug", "run the pipeline and troubleshoot issues". For simple test requests ("test dag", "run dag"), the airflow entrypoint skill handles it directly. This skill is for iterative test-debug-fix cycles.

86

Quality: 83%
Does it follow best practices?

Impact: Pending
No eval scenarios have been run

Security (by Snyk): Passed
No known issues


Quality

Discovery: 89%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a well-crafted description that excels at completeness and distinctiveness by explicitly defining when to use it versus a related simpler skill. The trigger terms are natural and varied. The main weakness is that the specific capabilities could be more concrete—listing specific actions like 'analyzes error logs, modifies task code, re-runs failed tasks' would strengthen specificity.

Suggestions

Add more concrete action verbs describing what the skill actually does during the debug-fix cycle, e.g., 'analyzes task error logs, identifies failure points, modifies DAG code, and re-runs failed tasks'.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | The description names the domain (DAG testing workflows) and mentions actions like debugging and fixing cycles, but doesn't list multiple specific concrete actions (e.g., parsing error logs, modifying task definitions, re-running specific tasks). 'Complex DAG testing workflows with debugging and fixing cycles' is somewhat abstract. | 2 / 3 |
| Completeness | Clearly answers both 'what' (complex DAG testing workflows with debugging and fixing cycles) and 'when' (explicit 'Use for' clause with trigger phrases, plus a clear boundary distinguishing it from the simpler airflow entrypoint skill). | 3 / 3 |
| Trigger Term Quality | Includes strong natural trigger phrases users would actually say: 'test this dag and fix it if it fails', 'test and debug', 'run the pipeline and troubleshoot issues', 'iterative test-debug-fix cycles'. Also clearly distinguishes from simpler triggers like 'test dag' or 'run dag'. | 3 / 3 |
| Distinctiveness / Conflict Risk | Explicitly differentiates itself from the 'airflow entrypoint skill' for simple test requests, clearly carving out its niche as the iterative test-debug-fix cycle skill. This boundary-setting significantly reduces conflict risk. | 3 / 3 |
| Total | | 11 / 12 |

Passed

Implementation: 77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a strong, highly actionable skill with an excellent iterative workflow structure and clear feedback loops. Its main weakness is verbosity: the document repeats concepts across the workflow description, scenarios, and debugging-tips sections, and includes some explanatory content Claude doesn't need (e.g., what task logs contain). Trimming redundant scenarios and moving debugging tips to a separate reference file would significantly improve token efficiency.

Suggestions

Consolidate the 6 testing scenarios into 2-3 that cover distinct paths (happy path, failure+fix, DAG-not-found) — scenarios 4 and 6 largely repeat patterns already shown

Move the 'Debugging Tips' and 'Common Error Patterns' sections to a separate DEBUGGING_REFERENCE.md file and link to it, reducing the main skill's token footprint

Remove the 'Reading Task Logs' subsection — Claude already knows how to read stack traces and log output

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The skill is reasonably well-structured but includes unnecessary verbosity: the ASCII workflow diagram, redundant scenario examples that repeat the same commands, explanations of what task logs 'typically show' (Claude knows this), and the 'Why this is the preferred method' section. The response interpretation JSON examples are useful but could be more compact. The 'DO NOT' list is valuable but the overall document could be ~30% shorter. | 2 / 3 |
| Actionability | Excellent actionability throughout: every command is fully executable with concrete examples, specific flags (--timeout, --conf, --try), real command patterns, and a comprehensive quick reference table. The common fixes table maps issues to specific actions. All commands follow the consistent `af` pattern with clear arguments. | 3 / 3 |
| Workflow Clarity | The test → debug → fix cycle is clearly sequenced with explicit validation checkpoints. The workflow has a clear feedback loop ('Repeat the test → debug → fix loop until the DAG succeeds'), handles multiple outcome branches (success, timeout, failure), and each phase has clear entry/exit conditions. The 'FIRST ACTION' directive is an excellent guardrail against unnecessary pre-flight checks. | 3 / 3 |
| Progressive Disclosure | The document references related skills at the bottom (authoring-dags, debugging-dags, deploying-airflow) but includes a lot of inline content that could be split out: the 6 testing scenarios, debugging tips, and common error patterns sections make this quite long. The CLI quick reference table is well-placed, but the scenarios section largely duplicates the workflow already described above. | 2 / 3 |
| Total | | 10 / 12 |

Passed

Validation: 100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 passed

Validation for skill structure

No warnings or errors.

Repository: astronomer/agents (Reviewed)
