Data engineering skill for building scalable data pipelines, ETL/ELT systems, and data infrastructure. Expertise in Python, SQL, Spark, Airflow, dbt, Kafka, and the modern data stack. Covers data modeling, pipeline orchestration, data quality, and DataOps. Use when designing data architectures, building data pipelines, optimizing data workflows, implementing data governance, or troubleshooting data issues.
Score: 71
Does it follow best practices? 49%
Impact: 83%
1.13x average score across 6 eval scenarios
Passed · No known issues
Optimize this skill with Tessl:

    npx tessl skill review --optimize ./engineering-team/senior-data-engineer/SKILL.md

dbt project structure and testing
| Check | Baseline | With skill |
|---|---|---|
| Staging layer exists | 100% | 100% |
| Intermediate layer exists | 100% | 100% |
| Marts layer exists | 100% | 100% |
| Staging materialized as view | 100% | 100% |
| Intermediate materialized as ephemeral | 100% | 100% |
| Incremental merge strategy | 100% | 100% |
| on_schema_change set | 100% | 100% |
| cluster_by configured | 0% | 100% |
| Watermark filter in incremental block | 100% | 100% |
| Column tests: unique and not_null | 100% | 100% |
| accepted_range test on amount | 100% | 100% |
| Recency test present | 0% | 100% |
| Source freshness config | 100% | 100% |
| Surrogate key macro used | 0% | 0% |
| Reusable macro defined | 100% | 100% |
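Several of the checks above (incremental merge strategy, `on_schema_change`, `cluster_by`, the watermark filter) map to a single dbt model config block. A minimal sketch, with hypothetical model and column names (`stg_orders`, `order_id`, `updated_at`, `order_date`):

```sql
-- models/marts/fct_orders.sql (hypothetical names)
{{
  config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='order_id',
    on_schema_change='sync_all_columns',
    cluster_by=['order_date']
  )
}}

select
    order_id,
    customer_id,
    amount,
    order_date,
    updated_at
from {{ ref('stg_orders') }}

{% if is_incremental() %}
  -- watermark filter: only process rows newer than what is already loaded
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

The test-related rows (unique/not_null, an accepted_range on amount, recency, source freshness) would live in the companion schema.yml rather than in the model file itself.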
Kafka Spark streaming pipeline with DLQ

| Check | Baseline | With skill |
|---|---|---|
| Kafka partitions=12 | 100% | 0% |
| Kafka replication-factor=3 | 100% | 100% |
| Kafka retention config | 100% | 100% |
| Kafka cleanup.policy=delete | 100% | 100% |
| Checkpoint location set | 100% | 100% |
| shuffle.partitions=12 | 100% | 0% |
| failOnDataLoss=false | 0% | 100% |
| Watermark applied | 100% | 100% |
| Delta Lake append output | 100% | 100% |
| processingTime trigger | 50% | 100% |
| foreachBatch pattern used | 100% | 100% |
| DLQ write with error columns | 100% | 100% |
| Prometheus Counter metric | 100% | 100% |
| Prometheus Gauge metric | 100% | 100% |
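The DLQ checks above come down to splitting each micro-batch into parseable records and failures, with the failures annotated with error columns before being written to a dead-letter topic or table. A pure-Python sketch of that routing logic (in a real job this would run inside a Spark `foreachBatch` function; the name `route_batch` and the `event_id` field are illustrative):

```python
import json
from datetime import datetime, timezone

def route_batch(raw_records):
    """Split raw Kafka payloads into good rows and DLQ rows.

    DLQ rows carry error columns (raw_value, error_message, failed_at),
    mirroring the 'DLQ write with error columns' check above.
    """
    good, dlq = [], []
    for raw in raw_records:
        try:
            row = json.loads(raw)
            if "event_id" not in row:
                raise ValueError("missing event_id")
            good.append(row)
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            dlq.append({
                "raw_value": raw,
                "error_message": str(exc),
                "failed_at": datetime.now(timezone.utc).isoformat(),
            })
    return good, dlq

good, dlq = route_batch(['{"event_id": 1}', "not-json"])
```

In the streaming job itself, the two outputs would be written as separate Delta appends inside the same `foreachBatch` call, with the checkpoint location providing restart safety.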
Airflow ETL pipeline and data contracts

| Check | Baseline | With skill |
|---|---|---|
| Uses pipeline_orchestrator.py | 50% | 100% |
| Orchestrator source/destination flags | 50% | 100% |
| Orchestrator incremental mode | 57% | 57% |
| Uses data_quality_validator.py | 50% | 100% |
| Validator checks specified | 0% | 0% |
| Airflow retries=2 | 0% | 0% |
| Airflow retry_delay=5min | 0% | 100% |
| Airflow email_on_failure=True | 100% | 100% |
| Airflow schedule `0 5 * * *` | 100% | 100% |
| Airflow catchup=False | 100% | 100% |
| Airflow tags present | 100% | 100% |
| Data contract schema section | 100% | 100% |
| Data contract SLA section | 71% | 100% |
| Data contract consumers section | 83% | 100% |
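The Airflow rows above correspond to a handful of DAG settings. A minimal sketch of the `default_args` and DAG-level options the checks look for, shown as plain values (tag names are hypothetical; the actual DAG wiring is outlined in comments):

```python
from datetime import timedelta

# default_args applied to every task in the DAG
default_args = {
    "retries": 2,                         # retry each failed task twice
    "retry_delay": timedelta(minutes=5),  # wait 5 minutes between retries
    "email_on_failure": True,             # alert when a task exhausts retries
}

# DAG-level options (normally passed straight to airflow.DAG)
SCHEDULE = "0 5 * * *"   # daily at 05:00
CATCHUP = False          # do not auto-backfill missed intervals
TAGS = ["etl", "daily"]  # hypothetical tags

# In a real DAG file:
#   with DAG("daily_etl", default_args=default_args,
#            schedule=SCHEDULE, catchup=CATCHUP, tags=TAGS) as dag:
#       ...
```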
Spark batch optimization and medallion architecture

| Check | Baseline | With skill |
|---|---|---|
| KryoSerializer configured | 0% | 100% |
| Adaptive query execution enabled | 100% | 100% |
| Executor memory 8g | 100% | 0% |
| Executor cores 4 | 0% | 100% |
| Driver memory 4g | 100% | 0% |
| Shuffle partitions 200 | 0% | 0% |
| Broadcast join for small table | 100% | 100% |
| MEMORY_AND_DISK persistence | 0% | 0% |
| Unpersist called | 0% | 0% |
| Bronze: mergeSchema=true | 0% | 100% |
| Bronze: metadata columns | 42% | 100% |
| Bronze: append mode | 0% | 100% |
| Silver: DeltaTable merge upsert | 100% | 100% |
| Gold: partitioned by date | 100% | 100% |
| All layers use Delta format | 100% | 100% |
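The tuning rows above translate directly into SparkSession configuration keys. A sketch of those settings as a plain dict, with the resource values taken from the table (broker of the session builder shown in comments; this is illustrative, not a sizing recommendation):

```python
# Spark batch-job settings corresponding to the checks above
spark_conf = {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.sql.adaptive.enabled": "true",   # adaptive query execution (AQE)
    "spark.executor.memory": "8g",
    "spark.executor.cores": "4",
    "spark.driver.memory": "4g",
    "spark.sql.shuffle.partitions": "200",
}

# In a real job:
#   builder = SparkSession.builder.appName("batch_job")
#   for key, value in spark_conf.items():
#       builder = builder.config(key, value)
#   spark = builder.getOrCreate()
# Reused DataFrames would be cached with
# persist(StorageLevel.MEMORY_AND_DISK) and released with unpersist()
# once the downstream (silver/gold) writes complete.
```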
SCD Type 2 dimensional modeling in dbt

| Check | Baseline | With skill |
|---|---|---|
| Surrogate key via generate_surrogate_key | 50% | 70% |
| incremental materialization | 100% | 100% |
| strategy='check' with check_cols | 0% | 0% |
| effective_start_date column | 100% | 100% |
| effective_end_date = 9999-12-31 | 0% | 100% |
| is_current boolean | 100% | 100% |
| row_hash for change detection | 20% | 100% |
| is_incremental() filter block | 100% | 100% |
| Natural key retained | 100% | 100% |
| Staging materialized as view | 0% | 100% |
| Tests: unique and not_null on surrogate key | 100% | 100% |
| Tests: recency on updated_at or effective_start_date | 0% | 0% |
| Macro file created | 100% | 100% |
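Most of these rows describe a hand-rolled SCD2 incremental model (note that `strategy='check'` with `check_cols` is dbt's built-in snapshot configuration, an alternative to rolling your own). A sketch of the incremental variant, with hypothetical model and column names (`stg_customers`, `customer_id`, `updated_at`); closing out superseded rows (setting `effective_end_date` and `is_current = false` on the previous version) needs an additional merge or post-hook not shown here:

```sql
-- models/marts/dim_customers_scd2.sql (hypothetical names)
{{
  config(
    materialized='incremental',
    unique_key='customer_sk'
  )
}}

with source as (
    select * from {{ ref('stg_customers') }}
    {% if is_incremental() %}
      where updated_at > (select max(effective_start_date) from {{ this }})
    {% endif %}
)

select
    {{ dbt_utils.generate_surrogate_key(['customer_id', 'updated_at']) }}
        as customer_sk,
    customer_id,                                  -- natural key retained
    customer_name,
    email,
    {{ dbt_utils.generate_surrogate_key(['customer_name', 'email']) }}
        as row_hash,                              -- change detection
    updated_at as effective_start_date,
    cast('9999-12-31' as date) as effective_end_date,  -- open-ended current row
    true as is_current
from source
```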
Exactly-once Kafka producer and idempotent Airflow backfill

| Check | Baseline | With skill |
|---|---|---|
| acks='all' | 100% | 100% |
| enable_idempotence=True | 100% | 100% |
| max_in_flight=5 | 100% | 100% |
| retries=max int | 100% | 100% |
| transactional_id set | 100% | 100% |
| init_transactions called | 100% | 0% |
| begin/commit/abort pattern | 100% | 100% |
| send_offsets_to_transaction | 100% | 100% |
| catchup=True | 100% | 100% |
| max_active_runs set | 100% | 100% |
| Processes by execution_date not today | 100% | 100% |
| Idempotent load pattern | 100% | 100% |
| Loads to date partition | 100% | 100% |
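The producer rows above are the standard exactly-once settings. A sketch of the configuration using confluent-kafka-style property names (the broker address and `transactional.id` are hypothetical; the transactional call flow is outlined in comments):

```python
import sys

# Producer properties for exactly-once delivery (confluent-kafka style)
producer_conf = {
    "bootstrap.servers": "localhost:9092",       # assumed broker address
    "acks": "all",                               # wait for all in-sync replicas
    "enable.idempotence": True,                  # broker-side dedupe on retry
    "max.in.flight.requests.per.connection": 5,  # max allowed with idempotence
    "retries": sys.maxsize,                      # retry until delivered or timed out
    "transactional.id": "orders-producer-1",     # hypothetical stable id
}

# Transactional flow (Producer API calls, outlined):
#   producer.init_transactions()
#   producer.begin_transaction()
#   ... produce(...); then, for consume-transform-produce loops:
#   producer.send_offsets_to_transaction(offsets, consumer_group_metadata)
#   producer.commit_transaction()   # or abort_transaction() on failure
```

The Airflow half of this scenario pairs these settings with `catchup=True`, a bounded `max_active_runs`, and tasks that read/write the partition for their `execution_date` rather than "today", so re-running any interval overwrites the same date partition idempotently.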