
# spark-optimization

Optimize Apache Spark jobs with partitioning, caching, shuffle optimization, and memory tuning. Use when improving Spark performance, debugging slow jobs, or scaling data processing pipelines.

Overall score: 75 (1.40x improvement)

- Quality: 71% — does it follow best practices?
- Impact: 73% (1.40x) — average score across 6 eval scenarios
- Security (by Snyk): Passed — no known issues

Optimize this skill with Tessl:

```shell
npx tessl skill review --optimize ./plugins/data-engineering/skills/spark-optimization/SKILL.md
```

## Evaluation results

### Sales Analytics Pipeline Optimization

Join optimization and session configuration. Score with context: 92% (+52% over the no-context run).

| Criteria | Without context | With context |
| --- | --- | --- |
| AQE enabled | 100% | 100% |
| AQE coalesce partitions | 100% | 100% |
| AQE skew join | 100% | 100% |
| Kryo serializer | 0% | 100% |
| Shuffle partitions | 0% | 0% |
| Broadcast join hint | 100% | 100% |
| Broadcast threshold config | 100% | 100% |
| Shuffle compression | 0% | 100% |
| Compression codec lz4 | 0% | 100% |
| mergeSchema false | 0% | 100% |
| Column pruning | 0% | 100% |
| maxPartitionBytes config | 0% | 100% |
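The criteria above map to a handful of session settings plus a broadcast hint. A minimal sketch of a session that would satisfy them (the `s3://bucket/...` paths, column names, and the 400-partition / 50 MB values are illustrative assumptions, not from the evaluated skill):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = (
    SparkSession.builder.appName("sales-analytics")
    # Adaptive Query Execution: coalesce tiny partitions, split skewed ones
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
    .config("spark.sql.adaptive.skewJoin.enabled", "true")
    # Kryo is faster and more compact than default Java serialization
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    # Size shuffle partitions to the workload (200 is the default)
    .config("spark.sql.shuffle.partitions", "400")
    # Auto-broadcast dimension tables under ~50 MB instead of shuffling both sides
    .config("spark.sql.autoBroadcastJoinThreshold", str(50 * 1024 * 1024))
    # Compress shuffle blocks with lz4
    .config("spark.shuffle.compress", "true")
    .config("spark.io.compression.codec", "lz4")
    # Skip per-file schema merging when reading Parquet
    .config("spark.sql.parquet.mergeSchema", "false")
    # Cap input split size at 128 MB
    .config("spark.sql.files.maxPartitionBytes", str(128 * 1024 * 1024))
    .getOrCreate()
)

# Column pruning: read only the needed columns; hint the small side for broadcast
sales = spark.read.parquet("s3://bucket/sales").select("order_id", "store_id", "amount")
stores = spark.read.parquet("s3://bucket/stores")
joined = sales.join(broadcast(stores), "store_id")
```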

### Multi-Feature ML Feature Engineering Pipeline

Caching, persistence, and iterative pipeline patterns. Score with context: 82% (+13% over the no-context run).

| Criteria | Without context | With context |
| --- | --- | --- |
| MEMORY_AND_DISK cache | 50% | 100% |
| Cache before multiple actions | 100% | 100% |
| Unpersist after use | 100% | 100% |
| Checkpoint used | 0% | 0% |
| Checkpoint dir set | 0% | 0% |
| approx_count_distinct used | 100% | 100% |
| No Python UDFs | 100% | 100% |
| No large collect | 100% | 100% |
| No count for existence | 100% | 100% |
| AQE enabled | 0% | 100% |
| Kryo serializer | 100% | 100% |
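The caching criteria in this scenario (including the checkpoint checks that both runs missed) fit a common pattern: persist once before fan-out, checkpoint to cut lineage, unpersist when done. A sketch under assumed paths and columns:

```python
from pyspark import StorageLevel
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("feature-engineering").getOrCreate()
# checkpoint() fails unless a checkpoint directory is configured first
spark.sparkContext.setCheckpointDir("s3://bucket/checkpoints")

events = spark.read.parquet("s3://bucket/events")

features = events.groupBy("user_id").agg(
    # approx_count_distinct trades ~2% error for skipping the exact-distinct shuffle
    F.approx_count_distinct("session_id").alias("sessions"),
    F.sum("duration").alias("total_duration"),
)

# Persist with spill-to-disk because several actions reuse this frame
features.persist(StorageLevel.MEMORY_AND_DISK)

features.filter("sessions > 5").write.parquet("s3://bucket/active_users")
features.filter("sessions <= 5").write.parquet("s3://bucket/casual_users")

# For iterative pipelines, checkpoint truncates the growing lineage so
# later stages don't replay the whole DAG on failure
snapshot = features.checkpoint()

features.unpersist()  # release executor memory once downstream work is done
```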

### E-Commerce Order Attribution Pipeline with Skewed Data

Data skew handling and partitioning strategy. Score with context: 78% (+6% over the no-context run).

| Criteria | Without context | With context |
| --- | --- | --- |
| Skew detection | 100% | 100% |
| Salt column added | 100% | 80% |
| Salted key column | 60% | 80% |
| Other side exploded | 100% | 80% |
| AQE skew factor | 100% | 100% |
| AQE skew threshold | 100% | 100% |
| Coalesce not repartition | 0% | 0% |
| Repartition with key | 0% | 0% |
| Write with partitionBy | 100% | 100% |
| Memory configuration | 100% | 100% |
| Partition count config | 100% | 100% |
| Parquet snappy | 0% | 100% |
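The salting criteria describe the standard skewed-join pattern: salt the hot side, explode the other side across all salt values, join on the composite key. A sketch with assumed table paths, column names, and a salt count of 8:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder.appName("order-attribution")
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.skewJoin.enabled", "true")
    # Flag a partition as skewed at 5x the median size and above 256 MB
    .config("spark.sql.adaptive.skewJoin.skewedPartitionFactor", "5")
    .config("spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes", "256MB")
    .config("spark.sql.parquet.compression.codec", "snappy")
    .getOrCreate()
)

N = 8  # salt buckets; tune to the observed skew ratio
orders = spark.read.parquet("s3://bucket/orders")        # skewed on customer_id
customers = spark.read.parquet("s3://bucket/customers")  # small dimension

# Salt the skewed side: spread each hot key over N sub-keys
salted_orders = (
    orders.withColumn("salt", (F.rand() * N).cast("int"))
    .withColumn("salted_key", F.concat_ws("_", "customer_id", "salt"))
)

# Explode the other side: replicate every row once per salt value so each
# sub-key still finds its match
salts = spark.range(N).withColumnRenamed("id", "salt")
exploded_customers = customers.crossJoin(salts).withColumn(
    "salted_key", F.concat_ws("_", "customer_id", "salt")
)

joined = salted_orders.join(
    exploded_customers.drop("customer_id", "salt"), "salted_key"
)

# coalesce (not repartition) shrinks output file count without a full shuffle
joined.coalesce(64).write.partitionBy("order_date").parquet("s3://bucket/attributed")
```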

### Recurring Order-Customer Analytics Pipeline

Bucket joins and Delta Lake optimization. Score with context: 38% (+22% over the no-context run).

| Criteria | Without context | With context |
| --- | --- | --- |
| bucketBy used | 0% | 0% |
| sortBy used | 0% | 0% |
| saveAsTable used | 0% | 0% |
| Matching bucket count | 0% | 0% |
| Delta optimizeWrite | 0% | 100% |
| Delta autoCompact | 0% | 100% |
| ZORDER BY applied | 100% | 100% |
| Parquet block size | 0% | 0% |
| AQE enabled | 100% | 100% |
| Kryo serializer | 0% | 100% |
| mergeSchema false | 0% | 0% |
| Column pruning | 0% | 0% |
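Both runs missed the bucketing criteria entirely. The idea they test: write both sides bucketed and sorted on the join key with the same bucket count, so a recurring join skips the shuffle. A sketch assuming hypothetical table names and Databricks-style Delta config keys (OSS Delta exposes the same features as table properties):

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("order-customer-analytics")
    # Delta Lake file-size management (Databricks-style config names)
    .config("spark.databricks.delta.optimizeWrite.enabled", "true")
    .config("spark.databricks.delta.autoCompact.enabled", "true")
    .getOrCreate()
)

orders = spark.read.parquet("s3://bucket/orders")
customers = spark.read.parquet("s3://bucket/customers")

# Bucket BOTH tables on the join key with the SAME bucket count (64 here),
# sorted within buckets; bucketBy requires saveAsTable, not a plain path write
(orders.write.bucketBy(64, "customer_id").sortBy("customer_id")
    .mode("overwrite").saveAsTable("orders_bucketed"))
(customers.write.bucketBy(64, "customer_id").sortBy("customer_id")
    .mode("overwrite").saveAsTable("customers_bucketed"))

# Subsequent joins on customer_id avoid the shuffle on both sides
joined = spark.table("orders_bucketed").join(
    spark.table("customers_bucketed"), "customer_id"
)

# For Delta tables, co-locate frequently filtered columns within files
spark.sql("OPTIMIZE orders_delta ZORDER BY (customer_id, order_date)")
```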

### Clickstream Event Aggregation Service

Shuffle pre-aggregation and Arrow optimization. Score with context: 57% (+23% over the no-context run).

| Criteria | Without context | With context |
| --- | --- | --- |
| Local pre-aggregation | 0% | 16% |
| Global aggregation on partial sum | 0% | 0% |
| Arrow enabled | 100% | 100% |
| approx_count_distinct used | 100% | 100% |
| openCostInBytes config | 0% | 0% |
| Shuffle compression | 0% | 100% |
| Compression codec lz4 | 0% | 100% |
| AQE shuffle auto | 100% | 100% |
| No large collect | 100% | 100% |
| Coalesce for output | 0% | 0% |
| Kryo serializer | 0% | 100% |
| mergeSchema false | 0% | 0% |
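The two criteria both runs largely missed (local pre-aggregation, then a global aggregation over the partial sums) are a two-stage reduce: shrink the data with a finer-grained groupBy before the wide shuffle, then sum the partials. A sketch with assumed paths and columns:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder.appName("clickstream-agg")
    # Arrow accelerates toPandas()/pandas-UDF data transfer
    .config("spark.sql.execution.arrow.pyspark.enabled", "true")
    # Estimated cost of opening a file; tunes small-file coalescing at read time
    .config("spark.sql.files.openCostInBytes", str(4 * 1024 * 1024))
    .config("spark.shuffle.compress", "true")
    .config("spark.io.compression.codec", "lz4")
    .getOrCreate()
)

events = spark.read.parquet("s3://bucket/clickstream")

# Stage 1: partial aggregation on (page, 5-minute window) — far fewer rows
# cross the shuffle than raw events
partial = events.groupBy("page", F.window("ts", "5 minutes").alias("w")).agg(
    F.count("*").alias("partial_count")
)

# Stage 2: global aggregation over the partial sums only
totals = partial.groupBy("page").agg(F.sum("partial_count").alias("views"))

# Approximate distinct users instead of an exact countDistinct shuffle
uniques = events.groupBy("page").agg(
    F.approx_count_distinct("user_id").alias("users")
)

# Coalesce before writing to avoid thousands of tiny output files
totals.join(uniques, "page").coalesce(16).write.parquet("s3://bucket/page_stats")
```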

### Spark Performance Diagnostic Toolkit

Performance monitoring and query plan analysis. Score with context: 92% (+10% over the no-context run).

| Criteria | Without context | With context |
| --- | --- | --- |
| explain() used | 100% | 100% |
| explain cost mode | 50% | 100% |
| spark_partition_id skew check | 100% | 100% |
| Skew ratio calculation | 62% | 100% |
| statusTracker used | 100% | 100% |
| Stage metrics printed | 100% | 100% |
| Executor memory monitoring | 70% | 100% |
| Memory formatted GB | 100% | 100% |
| wholeStage codegen | 0% | 0% |
| AQE enabled | 100% | 100% |
| calculate_partitions function | 100% | 100% |
| Kryo serializer | 100% | 100% |
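In a live session, the plan and skew checks above pair `df.explain("cost")` with per-partition row counts via `F.spark_partition_id()`. The two pure arithmetic criteria can be sketched without a cluster; these helpers are illustrative stand-ins for the skill's `calculate_partitions` and skew-ratio checks, not its actual code:

```python
def calculate_partitions(total_bytes: int,
                         target_partition_bytes: int = 128 * 1024 * 1024,
                         min_partitions: int = 1) -> int:
    """Suggest a partition count: roughly one partition per 128 MB of data."""
    if total_bytes <= 0:
        return min_partitions
    # Ceiling division without importing math
    return max(min_partitions, -(-total_bytes // target_partition_bytes))


def skew_ratio(partition_sizes: list) -> float:
    """Largest partition over the mean; ratios above ~3-5 signal problem skew."""
    mean = sum(partition_sizes) / len(partition_sizes)
    return max(partition_sizes) / mean


print(calculate_partitions(10 * 1024 ** 3))            # 10 GB -> 80
print(round(skew_ratio([100, 100, 100, 900]), 2))      # 3.0
```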

Evaluation setup:

- Repository: wshobson/agents
- Agent: Claude Code
- Model: Claude Sonnet 4.6
