Optimize Apache Spark jobs with partitioning, caching, shuffle optimization, and memory tuning. Use when improving Spark performance, debugging slow jobs, or scaling data processing pipelines.
- Overall score: 75
- Does it follow best practices? 71%
- Impact: 73%
- 1.40x average score across 6 eval scenarios
- Passed; no known issues
Optimize this skill with Tessl:

```shell
npx tessl skill review --optimize ./plugins/data-engineering/skills/spark-optimization/SKILL.md
```

## Join optimization and session configuration
| Check | Baseline | With skill |
|---|---|---|
| AQE enabled | 100% | 100% |
| AQE coalesce partitions | 100% | 100% |
| AQE skew join | 100% | 100% |
| Kryo serializer | 0% | 100% |
| Shuffle partitions | 0% | 0% |
| Broadcast join hint | 100% | 100% |
| Broadcast threshold config | 100% | 100% |
| Shuffle compression | 0% | 100% |
| Compression codec lz4 | 0% | 100% |
| mergeSchema false | 0% | 100% |
| Column pruning | 0% | 100% |
| maxPartitionBytes config | 0% | 100% |
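Most of the session-level checks in this scenario map onto standard Spark configuration keys. A minimal sketch follows; the keys are real Spark settings, but the specific values (partition count, broadcast threshold, split size) are illustrative assumptions, not prescriptions from the skill:

```python
# Session-level tuning sketch; values below are illustrative defaults.
spark_conf = {
    # Adaptive Query Execution: re-optimize plans at runtime
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    "spark.sql.adaptive.skewJoin.enabled": "true",
    # Kryo is faster and more compact than Java serialization
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    # Shuffle partition count; size to the cluster (assumed value)
    "spark.sql.shuffle.partitions": "200",
    # Auto-broadcast tables up to 64 MB to skip shuffles on small joins
    "spark.sql.autoBroadcastJoinThreshold": str(64 * 1024 * 1024),
    # Compress shuffle files with lz4
    "spark.shuffle.compress": "true",
    "spark.io.compression.codec": "lz4",
    # Skip Parquet schema merging when file schemas are uniform
    "spark.sql.parquet.mergeSchema": "false",
    # Cap input split size at 128 MB per partition
    "spark.sql.files.maxPartitionBytes": str(128 * 1024 * 1024),
}

# Applying it (requires a Spark installation):
# from pyspark.sql import SparkSession
# builder = SparkSession.builder.appName("tuned-job")
# for key, value in spark_conf.items():
#     builder = builder.config(key, value)
# spark = builder.getOrCreate()
```

Column pruning and the broadcast join hint are query-side rather than session-side: select only the columns you need, and use `broadcast(small_df)` from `pyspark.sql.functions` when the optimizer misses a small table.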
## Caching, persistence, and iterative pipeline patterns

| Check | Baseline | With skill |
|---|---|---|
| MEMORY_AND_DISK cache | 50% | 100% |
| Cache before multiple actions | 100% | 100% |
| Unpersist after use | 100% | 100% |
| Checkpoint used | 0% | 0% |
| Checkpoint dir set | 0% | 0% |
| approx_count_distinct used | 100% | 100% |
| No Python UDFs | 100% | 100% |
| No large collect | 100% | 100% |
| No count for existence | 100% | 100% |
| AQE enabled | 0% | 100% |
| Kryo serializer | 100% | 100% |
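The caching checks above describe one pattern: persist once, reuse across several actions, then release. A hedged sketch of that pattern; the pipeline, column names, and checkpoint path are hypothetical, and the pyspark imports are kept inside the function so the sketch can be defined without a Spark installation:

```python
def enrich_and_summarize(spark, df):
    # pyspark imports live inside the function so this module-level
    # sketch does not require Spark on the path to be defined
    from pyspark import StorageLevel
    from pyspark.sql import functions as F

    # Checkpointing truncates long lineage chains in iterative jobs
    spark.sparkContext.setCheckpointDir("/tmp/spark-checkpoints")

    enriched = df.withColumn("value_x2", F.col("value") * 2)

    # MEMORY_AND_DISK spills to disk on eviction instead of recomputing
    enriched.persist(StorageLevel.MEMORY_AND_DISK)

    # Multiple actions reuse the cached result instead of re-running the plan
    total_rows = enriched.count()
    distinct_users = enriched.agg(
        F.approx_count_distinct("user_id")  # HLL sketch, not an exact count
    ).first()[0]

    # Existence check via take(1), never via a full count()
    has_rows = len(enriched.take(1)) > 0

    # Release executor memory once the DataFrame is no longer needed
    enriched.unpersist()
    return total_rows, distinct_users, has_rows
```

Staying in DataFrame expressions (no Python UDFs) keeps the work inside Spark's codegen path, and `take(1)` avoids scanning every row just to test for existence.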
## Data skew handling and partitioning strategy

| Check | Baseline | With skill |
|---|---|---|
| Skew detection | 100% | 100% |
| Salt column added | 100% | 80% |
| Salted key column | 60% | 80% |
| Other side exploded | 100% | 80% |
| AQE skew factor | 100% | 100% |
| AQE skew threshold | 100% | 100% |
| Coalesce not repartition | 0% | 0% |
| Repartition with key | 0% | 0% |
| Write with partitionBy | 100% | 100% |
| Memory configuration | 100% | 100% |
| Partition count config | 100% | 100% |
| Parquet snappy | 0% | 100% |
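Salting spreads one hot join key across many shuffle partitions: the skewed side gets a random salt suffix, and the other side is exploded so every salted variant still finds its match. The key-manipulation logic in plain Python (the salt count and `key#salt` format are illustrative choices):

```python
import random

NUM_SALTS = 8  # illustrative; pick roughly the skew factor of the hot key

def salt_key(key):
    # Skewed side: append a random salt so one hot key is spread
    # across NUM_SALTS shuffle partitions
    return f"{key}#{random.randrange(NUM_SALTS)}"

def explode_keys(key):
    # Other (smaller) side: replicate each key once per salt value so
    # every salted variant on the skewed side still joins correctly
    return [f"{key}#{salt}" for salt in range(NUM_SALTS)]
```

With AQE on, the related checks are config-only: `spark.sql.adaptive.skewJoin.skewedPartitionFactor` and `spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes` control when Spark treats a partition as skewed and splits it automatically.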
## Bucket joins and Delta Lake optimization

| Check | Baseline | With skill |
|---|---|---|
| bucketBy used | 0% | 0% |
| sortBy used | 0% | 0% |
| saveAsTable used | 0% | 0% |
| Matching bucket count | 0% | 0% |
| Delta optimizeWrite | 0% | 100% |
| Delta autoCompact | 0% | 100% |
| ZORDER BY applied | 100% | 100% |
| Parquet block size | 0% | 0% |
| AQE enabled | 100% | 100% |
| Kryo serializer | 0% | 100% |
| mergeSchema false | 0% | 0% |
| Column pruning | 0% | 0% |
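The bucketing checks belong together: `bucketBy`, `sortBy`, and `saveAsTable` only pay off when both join sides share the bucket count. A sketch, where the table name, join column, and bucket count are assumptions, and the Delta settings use the Databricks-style config key names:

```python
def write_bucketed(df, table_name, buckets=64):
    # Both join sides must be bucketed on the join key with the SAME
    # bucket count for Spark to plan a shuffle-free sort-merge join;
    # bucketing metadata only survives via saveAsTable (metastore).
    (df.write
       .bucketBy(buckets, "user_id")   # "user_id" is an assumed join key
       .sortBy("user_id")
       .saveAsTable(table_name))

# Delta Lake small-file mitigation (Databricks-style config key names)
delta_conf = {
    "spark.databricks.delta.optimizeWrite.enabled": "true",
    "spark.databricks.delta.autoCompact.enabled": "true",
}

# Co-locating data for selective filters happens after the write:
# spark.sql("OPTIMIZE events ZORDER BY (user_id)")  # table/column assumed
```

`ZORDER BY` reorders existing Delta files so rows with nearby values of the chosen column land in the same files, which lets data skipping prune far more aggressively.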
## Shuffle pre-aggregation and Arrow optimization

| Check | Baseline | With skill |
|---|---|---|
| Local pre-aggregation | 0% | 16% |
| Global aggregation on partial sum | 0% | 0% |
| Arrow enabled | 100% | 100% |
| approx_count_distinct used | 100% | 100% |
| openCostInBytes config | 0% | 0% |
| Shuffle compression | 0% | 100% |
| Compression codec lz4 | 0% | 100% |
| AQE shuffle auto | 100% | 100% |
| No large collect | 100% | 100% |
| Coalesce for output | 0% | 0% |
| Kryo serializer | 0% | 100% |
| mergeSchema false | 0% | 0% |
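Local pre-aggregation is the map-side combine idea: each partition emits one partial result per key before the shuffle, and the global step only merges partials. The two phases in plain Python, with made-up sample data:

```python
from collections import Counter

# Made-up sample: two "partitions" of (key, value) records
partitions = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("a", 4), ("b", 1)],
]

# Phase 1: local pre-aggregation (map-side combine). Only one record
# per key per partition crosses the shuffle boundary.
partials = []
for part in partitions:
    local = Counter()
    for key, value in part:
        local[key] += value
    partials.append(local)

# Phase 2: global aggregation over the partial sums
totals = Counter()
for local in partials:
    totals.update(local)  # Counter.update adds counts

print(dict(totals))  # {'a': 8, 'b': 3}
```

In PySpark, DataFrame aggregations already combine map-side; the check is about not defeating that with row-by-row Python logic. The Arrow check is a single setting, `spark.sql.execution.arrow.pyspark.enabled=true`, which vectorizes `toPandas()` and pandas UDF exchange.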
## Performance monitoring and query plan analysis

| Check | Baseline | With skill |
|---|---|---|
| explain() used | 100% | 100% |
| explain cost mode | 50% | 100% |
| spark_partition_id skew check | 100% | 100% |
| Skew ratio calculation | 62% | 100% |
| statusTracker used | 100% | 100% |
| Stage metrics printed | 100% | 100% |
| Executor memory monitoring | 70% | 100% |
| Memory formatted GB | 100% | 100% |
| wholeStage codegen | 0% | 0% |
| AQE enabled | 100% | 100% |
| calculate_partitions function | 100% | 100% |
| Kryo serializer | 100% | 100% |
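Two of these checks reduce to small arithmetic helpers: sizing partition counts from input bytes, and flagging skew from per-partition row counts. A sketch, where the 128 MB target and the skew cutoff are rules of thumb rather than values taken from the skill:

```python
def calculate_partitions(total_bytes, target_mb=128):
    # Aim for ~128 MB of input per partition (rule of thumb)
    return max(1, total_bytes // (target_mb * 1024 * 1024))

def skew_ratio(rows_per_partition):
    # max/mean row count across partitions; a ratio much above ~3x
    # usually means one partition dominates the stage
    mean = sum(rows_per_partition) / len(rows_per_partition)
    return max(rows_per_partition) / mean

print(calculate_partitions(10 * 1024**3))  # 80 partitions for 10 GB
print(skew_ratio([100, 100, 100, 500]))    # 2.5
```

Per-partition row counts typically come from `df.groupBy(F.spark_partition_id()).count()`, plan statistics from `df.explain("cost")`, and live stage/executor information from `spark.sparkContext.statusTracker()`.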