Data engineering skill for building scalable data pipelines, ETL/ELT systems, and data infrastructure. Expertise in Python, SQL, Spark, Airflow, dbt, Kafka, and modern data stack. Includes data modeling, pipeline orchestration, data quality, and DataOps. Use when designing data architectures, building data pipelines, optimizing data workflows, implementing data governance, or troubleshooting data issues.
Scorecard: 71 · 49% (Does it follow best practices?) · Impact 83% · 1.13x average score across 6 eval scenarios · Passed · No known issues
Optimize this skill with Tessl:

`npx tessl skill review --optimize ./engineering-team/senior-data-engineer/SKILL.md`

Quality
Discovery: 92%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong skill description that clearly articulates capabilities, names specific technologies, and includes an explicit 'Use when' clause with multiple trigger scenarios. Its main weakness is the very broad scope, which could cause overlap with more specialized skills in areas like SQL, Python, or specific tools like Kafka or Airflow. The description uses proper third-person voice throughout.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific, concrete actions and domains: building data pipelines, ETL/ELT systems, data modeling, pipeline orchestration, data quality, and DataOps. Also names specific technologies (Spark, Airflow, dbt, Kafka). | 3 / 3 |
| Completeness | Clearly answers both 'what' (building data pipelines, ETL/ELT systems, data infrastructure, data modeling, orchestration, data quality, DataOps) and 'when' with an explicit 'Use when...' clause covering five distinct trigger scenarios. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural terms users would say: 'data pipelines', 'ETL', 'ELT', 'Spark', 'Airflow', 'dbt', 'Kafka', 'data modeling', 'data quality', 'data governance', 'data architecture'. These are terms practitioners naturally use. | 3 / 3 |
| Distinctiveness / Conflict Risk | While it carves out a clear data engineering niche, the breadth is very wide—covering everything from SQL to Kafka to data governance—which could overlap with more specialized skills for database design, SQL optimization, Kafka messaging, or general Python development. | 2 / 3 |
| Total |  | 11 / 12 Passed |
Implementation: 7%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is mostly a high-level overview document that explains concepts Claude already knows (batch vs streaming, Lambda vs Kappa, warehouse vs lakehouse) while failing to provide any actionable, executable guidance. The Quick Start references non-existent scripts, the Workflows section is empty, and the troubleshooting is fully delegated. The skill would need a fundamental rewrite to focus on concrete, executable patterns rather than conceptual comparisons.
Suggestions
- Remove the Trigger Phrases section entirely and remove or drastically reduce the architecture comparison tables—Claude already knows these tradeoffs. Replace them with project-specific conventions or non-obvious decision criteria.
- Add actual executable code examples: a real Airflow DAG, a dbt model, a Spark job, a Kafka consumer—concrete patterns Claude can adapt rather than fictional CLI tools.
- Inline the Workflows content with clear step-by-step sequences, including validation checkpoints (e.g., 'Run dbt test after each transformation step; only proceed if all tests pass').
- Replace the Quick Start's fictional scripts with real, executable commands using actual tools (e.g., 'airflow dags test', 'dbt run --select', 'great_expectations checkpoint run').
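The validation-checkpoint pattern suggested above can be sketched in plain Python (the step names and stub callables here are hypothetical; a real pipeline would shell out to commands like `dbt test` and check the exit status instead of calling a lambda):

```python
from typing import Callable, List, Tuple

# (name, run, validate) -- validate returns True when the step's checks pass,
# e.g. the exit status of `dbt test` in a real pipeline.
Step = Tuple[str, Callable[[], None], Callable[[], bool]]

def run_pipeline(steps: List[Step]) -> List[str]:
    """Run each step, then its validation check; halt at the first failure."""
    completed = []
    for name, run, validate in steps:
        run()
        if not validate():
            raise RuntimeError(f"validation failed after step {name!r}")
        completed.append(name)
    return completed

# Hypothetical two-step flow: transform, then load, each gated by a check.
steps = [
    ("transform", lambda: None, lambda: True),
    ("load", lambda: None, lambda: True),
]
print(run_pipeline(steps))  # ['transform', 'load']
```

The point is the gate between steps: a later step never runs against data an earlier check has not passed, which is exactly the "only proceed if all tests pass" behavior the suggestion describes.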
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is verbose, with significant padding. The 'Trigger Phrases' section (~30 lines) is entirely unnecessary—Claude doesn't need to be told when to activate. The architecture decision tables, while informative, explain concepts Claude already knows well (batch vs streaming tradeoffs, Lambda vs Kappa). The tech stack table is a simple enumeration that adds no actionable value. | 1 / 3 |
| Actionability | The Quick Start references fictional scripts (pipeline_orchestrator.py, data_quality_validator.py, etl_performance_optimizer.py) that don't exist and aren't provided. The Workflows section is entirely empty, just pointing to a reference file. There is no executable code, no real commands, no concrete examples of actual pipeline code, SQL, Spark jobs, or dbt models that Claude could use. | 1 / 3 |
| Workflow Clarity | The Workflows section is completely empty—it's just a pointer to 'references/workflows.md' with no content. There are no step-by-step processes, no validation checkpoints, no feedback loops. For a skill covering ETL pipelines and data quality (which involve destructive/batch operations), this is a critical gap. | 1 / 3 |
| Progressive Disclosure | The skill does attempt progressive disclosure with references to external files (references/data_pipeline_architecture.md, references/data_modeling_patterns.md, etc.) and provides clear bullet-point summaries of what each contains. However, the main file itself has too much inlined content that Claude already knows (architecture comparison tables), while the actually useful content (workflows, troubleshooting) is entirely delegated to external files with no inline substance. | 2 / 3 |
| Total |  | 5 / 12 Passed |
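As a sketch of the kind of concrete, executable pattern the Actionability row asks for, here is a minimal batch data-quality check in plain Python (the function, field names, and sample batch are hypothetical illustrations, not the fictional data_quality_validator.py the review flags):

```python
def check_quality(rows, required_fields, min_rows=1):
    """Return human-readable failures for a batch; an empty list means it passes."""
    failures = []
    if len(rows) < min_rows:
        failures.append(f"expected at least {min_rows} rows, got {len(rows)}")
    for i, row in enumerate(rows):
        for field in required_fields:
            # Treat None and empty string as missing values.
            if row.get(field) in (None, ""):
                failures.append(f"row {i}: missing {field!r}")
    return failures

# Hypothetical batch with one bad row.
batch = [{"order_id": 1, "amount": 9.5}, {"order_id": None, "amount": 3.0}]
print(check_quality(batch, ["order_id", "amount"]))  # ["row 1: missing 'order_id'"]
```

Even a check this small gives a skill something an agent can actually run and extend, which is the gap the Actionability and Workflow Clarity rows identify.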
Validation: 100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.