
testing-dags

Complex DAG testing workflows with debugging and fixing cycles. Use for multi-step testing requests like "test this dag and fix it if it fails", "test and debug", "run the pipeline and troubleshoot issues". For simple test requests ("test dag", "run dag"), the airflow entrypoint skill handles it directly. This skill is for iterative test-debug-fix cycles.

68

Quality

83%

Does it follow best practices?

Impact

No eval scenarios have been run

Security by Snyk

Passed

No known issues


Quality

Content

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-structured, highly actionable skill with a clear iterative workflow and excellent concrete guidance. Its main weakness is verbosity — the document could be significantly shorter by removing redundant scenario examples, trimming JSON response samples, and offloading reference material (debugging tips, CLI quick reference) into separate bundle files. The workflow clarity is a standout strength with explicit feedback loops and anti-patterns.

Suggestions

Move the 'Debugging Tips', 'Testing Scenarios', and 'CLI Quick Reference' sections into separate bundle files (e.g., DEBUGGING_TIPS.md, SCENARIOS.md) and reference them from the main skill to improve progressive disclosure and reduce token usage.

Trim the JSON response interpretation examples to show only the key fields (state, failed_tasks) rather than full response bodies — Claude can interpret JSON without seeing every field.

Scores by dimension

Conciseness: 2 / 3

The skill is reasonably efficient but includes redundant content: the workflow overview diagram repeats what's already stated in text, response interpretation examples are overly verbose with full JSON blocks, and the 'Debugging Tips' section at the end covers patterns Claude would already know (e.g., 'ModuleNotFoundError means package missing'). The testing scenarios section also repeats the same workflow multiple times with minor variations.

Actionability: 3 / 3

Excellent actionability throughout: every step has concrete, copy-paste-ready CLI commands with specific flags and arguments. The common fixes table maps issues to specific actions, and the CLI quick reference provides a complete command catalog. Commands are real and executable, not pseudocode.

Workflow Clarity: 3 / 3

The test → debug → fix cycle is clearly sequenced with explicit phases, a visual diagram, and validation checkpoints. The feedback loop is explicit ('Repeat the test → debug → fix loop until the DAG succeeds'), and the workflow handles multiple outcomes (success, timeout, failure) with clear branching instructions. The 'FIRST ACTION' section establishes a strong anti-pattern guard.

Progressive Disclosure: 2 / 3

The content is well-structured with clear sections and headers, but it's quite long (~300 lines) with no bundle files to offload detail into. The response interpretation JSON examples, debugging tips, and testing scenarios could be split into separate reference files. Related skills are mentioned at the end, but there are no actual bundle files to support progressive disclosure.

Total: 10 / 12

Passed

Description

89%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a well-crafted description that excels at routing clarity by explicitly distinguishing itself from a related simpler skill. The trigger terms are natural and varied, and the 'when' guidance is thorough with concrete example phrases. The main weakness is that the specific capabilities could be more granular—listing concrete actions beyond the general 'debugging and fixing cycles' would strengthen it.

Suggestions

Add more specific concrete actions to improve specificity, e.g., 'Runs DAG tests, parses error logs, identifies failing tasks, applies fixes, and reruns until passing.'

Scores by dimension

Specificity: 2 / 3

The description names the domain (DAG testing workflows) and describes the general action (debugging and fixing cycles, iterative test-debug-fix cycles), but doesn't list multiple specific concrete actions like 'parse error logs', 'modify task definitions', 'rerun failed tasks', etc.

Completeness: 3 / 3

Clearly answers both 'what' (complex DAG testing workflows with debugging and fixing cycles) and 'when' (multi-step testing requests with explicit example phrases). Also explicitly states when NOT to use it (simple test requests), which adds clarity.

Trigger Term Quality: 3 / 3

Includes strong natural trigger terms users would actually say: 'test this dag and fix it if it fails', 'test and debug', 'run the pipeline and troubleshoot issues'. Also distinguishes from simpler requests like 'test dag' or 'run dag', which helps with routing.

Distinctiveness / Conflict Risk: 3 / 3

Explicitly distinguishes itself from the 'airflow entrypoint skill' for simple test requests, creating a clear boundary. The focus on iterative test-debug-fix cycles carves out a distinct niche that is unlikely to conflict with other skills.

Total: 11 / 12

Passed

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 passed

Validation for skill structure

No warnings or errors.

Repository
astronomer/agents
Reviewed
