Optimize Apache Spark jobs with partitioning, caching, shuffle optimization, and memory tuning. Use when improving Spark performance, debugging slow jobs, or scaling data processing pipelines.
77
71%
Does it follow best practices?
Impact: 77% (1.28x)
Average score across 3 eval scenarios
Passed
No known issues
Optimize this skill with Tessl:
npx tessl skill review --optimize ./tests/ext_conformance/artifacts/agents-wshobson/data-engineering/skills/spark-optimization/SKILL.md

Join optimization and session configuration
AQE enabled                 100%  100%
AQE coalesce partitions     100%  100%
AQE skew join               100%  100%
Kryo serializer             100%  100%
Shuffle partitions          100%  100%
Broadcast join hint         100%  100%
Broadcast threshold config  100%  100%
Shuffle compression           0%  100%
Compression codec lz4         0%  100%
mergeSchema false           100%  100%
Column pruning                0%  100%
maxPartitionBytes config    100%  100%
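
Most of the criteria in this scenario correspond to standard Spark configuration keys. As a minimal sketch, the block below collects them into a plain Python dict and renders them as `spark-submit --conf` arguments. The values are illustrative placeholders, not tuned recommendations and not taken from the skill itself, and the helper name `to_submit_args` is hypothetical.

```python
# Illustrative Spark settings matching the criteria listed above.
# All values are assumptions for demonstration; tune them per workload.
SPARK_CONF = {
    "spark.sql.adaptive.enabled": "true",                     # AQE enabled
    "spark.sql.adaptive.coalescePartitions.enabled": "true",  # AQE coalesce partitions
    "spark.sql.adaptive.skewJoin.enabled": "true",            # AQE skew join
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.sql.shuffle.partitions": "200",
    "spark.sql.autoBroadcastJoinThreshold": str(50 * 1024 * 1024),  # bytes
    "spark.shuffle.compress": "true",
    "spark.io.compression.codec": "lz4",
    "spark.sql.parquet.mergeSchema": "false",
    "spark.sql.files.maxPartitionBytes": str(128 * 1024 * 1024),    # bytes
}


def to_submit_args(conf):
    """Render a config dict as a flat list of spark-submit arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(to_submit_args(SPARK_CONF)))
```

With PySpark installed, the same pairs could instead be passed one by one to `SparkSession.builder.config(key, value)`. The broadcast join hint and column pruning are expressed in query code rather than configuration, for example through `broadcast()` from `pyspark.sql.functions` and an explicit `select` of the needed columns.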

Caching, persistence, and iterative pipeline patterns
MEMORY_AND_DISK cache          100%  100%
Cache before multiple actions  100%  100%
Unpersist after use            100%   50%
Checkpoint used                  0%    0%
Checkpoint dir set               0%    0%
approx_count_distinct used     100%  100%
No Python UDFs                 100%  100%
No large collect               100%  100%
No count for existence         100%  100%
AQE enabled                    100%  100%
Kryo serializer                  0%  100%

Data skew handling and partitioning strategy
Skew detection              100%  100%
Salt column added           100%    0%
Salted key column            50%    0%
Other side exploded         100%    0%
AQE skew factor               0%  100%
AQE skew threshold          100%  100%
Coalesce not repartition      0%    0%
Repartition with key        100%    0%
Write with partitionBy      100%  100%
Memory configuration        100%  100%
Partition count config      100%  100%
Parquet snappy                0%  100%
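
The salting criteria in this scenario (salt column added, salted key column, other side exploded) describe a three-step pattern for spreading a hot join key across several partitions. The sketch below illustrates the mechanics in plain Python, without Spark, so the logic can be checked in isolation. The function names, the salt count, and the sample data are all hypothetical. In Spark the same steps would typically be a random salt column on the skewed DataFrame and an exploded range of salt values on the other side.

```python
import random
from collections import defaultdict

NUM_SALTS = 4  # assumed number of salt buckets; chosen only for illustration


def salt_skewed_side(rows, num_salts, rng):
    """Attach a random salt to each key on the large, skewed side."""
    return [((key, rng.randrange(num_salts)), value) for key, value in rows]


def explode_other_side(rows, num_salts):
    """Replicate each row of the other side once per possible salt value."""
    return [((key, salt), value) for key, value in rows for salt in range(num_salts)]


def hash_join(left, right):
    """Inner join of two lists of (key, value) pairs on key."""
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    return [(key, lv, rv) for key, lv in left for rv in index.get(key, [])]


def salted_join(large, small, num_salts=NUM_SALTS, seed=0):
    """Join on (key, salt), then drop the salt from the result."""
    rng = random.Random(seed)
    joined = hash_join(
        salt_skewed_side(large, num_salts, rng),
        explode_other_side(small, num_salts),
    )
    return [(key, lv, rv) for (key, _salt), lv, rv in joined]


if __name__ == "__main__":
    large = [("hot", i) for i in range(10)] + [("cold", 99)]
    small = [("hot", "H"), ("cold", "C")]
    # The salted join returns the same rows as the plain join.
    print(sorted(salted_join(large, small)) == sorted(hash_join(large, small)))
```

Each salted row on the large side matches exactly one replicated row on the other side, so the result is unchanged while the hot key is split across up to `NUM_SALTS` distinct join keys. The cost is that the other side grows by a factor of `NUM_SALTS`, which is why salting is usually reserved for keys that are known to be skewed.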
47823e3