Data engineering skill for building scalable data pipelines, ETL/ELT systems, and data infrastructure. Expertise in Python, SQL, Spark, Airflow, dbt, Kafka, and modern data stack. Includes data modeling, pipeline orchestration, data quality, and DataOps. Use when designing data architectures, building data pipelines, optimizing data workflows, implementing data governance, or troubleshooting data issues.
71

49% (Does it follow best practices?)

Impact: 83% (1.13x average score across 6 eval scenarios)

Passed: No known issues
Optimize this skill with Tessl:

`npx tessl skill review --optimize ./engineering-team/senior-data-engineer/SKILL.md`

Quality
Discovery
92%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong skill description that clearly articulates capabilities, names specific technologies, and includes an explicit 'Use when' clause with multiple trigger scenarios. Its main weakness is breadth—it covers so much of the data engineering domain that it could potentially conflict with more specialized skills (e.g., a dedicated Airflow skill, a SQL optimization skill, or a Kafka skill). The description uses proper third-person voice throughout.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific, concrete actions and domains: building data pipelines, ETL/ELT systems, data modeling, pipeline orchestration, data quality, and DataOps. Also names specific technologies (Spark, Airflow, dbt, Kafka). | 3 / 3 |
| Completeness | Clearly answers both 'what' (building data pipelines, ETL/ELT systems, data infrastructure, data modeling, orchestration, data quality, DataOps) and 'when' with an explicit 'Use when...' clause covering five distinct trigger scenarios. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural terms users would say: 'data pipelines', 'ETL', 'ELT', 'Spark', 'Airflow', 'dbt', 'Kafka', 'data modeling', 'data quality', 'data governance', 'data architecture'. These are terms practitioners naturally use. | 3 / 3 |
| Distinctiveness / Conflict Risk | While data engineering is a recognizable niche, the description is quite broad and could overlap with general Python/SQL skills, database administration skills, or cloud infrastructure skills. Terms like 'troubleshooting data issues' and 'optimizing data workflows' are somewhat generic and could conflict with analytics or database skills. | 2 / 3 |
| Total | | 11 / 12 (Passed) |
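The conflict risk flagged in the last row could be reduced by narrowing the 'Use when' clause. A hypothetical rewrite of the skill's frontmatter, for illustration only; the field names and wording below are not the skill's actual metadata:

```yaml
# Illustrative sketch of a narrowed description (not the skill's real frontmatter)
name: senior-data-engineer
description: >
  Build and operate batch and streaming data pipelines with Spark, Airflow,
  dbt, and Kafka. Use when designing multi-stage pipeline architectures,
  orchestrating DAGs, or adding data-quality gates between pipeline stages;
  defer to dedicated Airflow, SQL, or Kafka skills for single-tool questions.
```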
Implementation
7%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is mostly a high-level overview of data engineering concepts that Claude already knows, with no actionable, executable content. The Quick Start references non-existent scripts, the workflows section is empty, and the bulk of the content consists of comparison tables for well-known architectural patterns. The skill would need a fundamental rewrite to provide actual value—replacing conceptual overviews with concrete, executable code examples and real workflow steps.
Suggestions:

- Replace the fictional CLI scripts in Quick Start with real, executable code examples (e.g., a complete Airflow DAG, a dbt model, a Spark job, or a Kafka consumer).
- Remove the Trigger Phrases section entirely and remove or drastically condense the architecture comparison tables—Claude already knows these concepts.
- Add actual workflow content inline with concrete steps, validation checkpoints, and error recovery loops (e.g., a step-by-step ETL pipeline build with data quality validation between stages).
- Include at least one complete, copy-paste-ready example for a common task, such as building a dbt model with tests, writing an Airflow DAG, or setting up Great Expectations checks.
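The "validation checkpoints between stages" suggestion can be sketched in plain Python. This is a minimal, standard-library-only illustration of the shape the review asks for, not code from the skill; every function and field name here (`extract`, `validate`, `run_pipeline`, `amount`) is invented for the example.

```python
# Minimal ETL run with a data-quality checkpoint between extract and load.
# All names are hypothetical; a real pipeline would read from a source
# system and write to a warehouse instead of using in-memory lists.

def extract():
    # Stand-in for reading rows from a source system.
    return [
        {"order_id": 1, "amount": 19.99},
        {"order_id": 2, "amount": None},  # bad row: missing amount
    ]

def validate(rows):
    # Checkpoint: quarantine bad rows before they reach the warehouse.
    good, bad = [], []
    for row in rows:
        (good if row["amount"] is not None else bad).append(row)
    return good, bad

def load(rows):
    # Stand-in for writing to a warehouse table; returns rows written.
    return len(rows)

def run_pipeline():
    rows = extract()
    good, bad = validate(rows)
    return {"loaded": load(good), "quarantined": len(bad)}

print(run_pipeline())  # → {'loaded': 1, 'quarantined': 1}
```

In a real skill this checkpoint would sit between Airflow tasks or dbt models, but the structure (extract, gate, load) is the same.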
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is verbose with significant padding. The 'Trigger Phrases' section (~30 lines) is entirely unnecessary—Claude doesn't need to be told when to activate. The architecture decision tables explain concepts Claude already knows well (batch vs streaming, Lambda vs Kappa, warehouse vs lakehouse). The tech stack table is a simple enumeration of well-known tools adding no actionable value. | 1 / 3 |
| Actionability | The Quick Start references fictional scripts (pipeline_orchestrator.py, data_quality_validator.py, etl_performance_optimizer.py) that don't exist and aren't provided. The workflows section is entirely empty, pointing to a reference file. There is no executable code, no real commands, no concrete examples of actual pipeline code, SQL, Spark jobs, or dbt models. | 1 / 3 |
| Workflow Clarity | The Workflows section is completely empty—just a pointer to 'references/workflows.md'. There are no sequenced steps, no validation checkpoints, no feedback loops. For a skill covering ETL pipelines and data quality (which involve destructive/batch operations), the total absence of workflow steps is a critical gap. | 1 / 3 |
| Progressive Disclosure | The skill does attempt progressive disclosure with a table of contents and references to external files (references/data_pipeline_architecture.md, etc.) with clear topic summaries. However, the main file contains too much inline content that Claude already knows (architecture comparison tables) while the actually useful content (workflows, troubleshooting) is entirely delegated to external files with no inline summary. | 2 / 3 |
| Total | | 5 / 12 (Passed) |
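The "no feedback loops" gap called out under Workflow Clarity can also be illustrated briefly: even a small retry-with-backoff wrapper around a load step would qualify. A hedged, standard-library-only sketch; `retry`, `flaky_load`, and the parameters are invented for this example and are not part of the skill.

```python
import time

def retry(fn, attempts=3, base_delay=0.01):
    # Minimal error-recovery loop: retry a pipeline step with
    # exponential backoff, re-raising after the final attempt.
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

calls = {"n": 0}

def flaky_load():
    # Hypothetical load step that fails on its first call,
    # simulating a transient warehouse connection error.
    calls["n"] += 1
    if calls["n"] < 2:
        raise RuntimeError("transient connection error")
    return "loaded"

print(retry(flaky_load))  # → loaded
```

In an orchestrated pipeline the same loop is usually expressed declaratively (e.g., task-level retry settings in Airflow) rather than hand-rolled.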
Validation
100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.