Optimize Apache Spark jobs with partitioning, caching, shuffle optimization, and memory tuning. Use when improving Spark performance, debugging slow jobs, or scaling data processing pipelines.
Overall score: 75
Quality: 71% (Does it follow best practices?)
Impact: 73% (1.40x average score across 6 eval scenarios)
Passed: No known issues
Optimize this skill with Tessl:

npx tessl skill review --optimize ./plugins/data-engineering/skills/spark-optimization/SKILL.md

Quality
Discovery
100%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a well-crafted skill description that concisely covers specific Spark optimization techniques, includes natural trigger terms users would employ, and clearly delineates both what the skill does and when to use it. It uses proper third-person voice and is distinct enough to avoid conflicts with other data-related skills.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific, concrete actions: partitioning, caching, shuffle optimization, and memory tuning. These are distinct, well-defined Spark optimization techniques. | 3 / 3 |
| Completeness | Clearly answers both 'what' (optimize Spark jobs with partitioning, caching, shuffle optimization, memory tuning) and 'when' (improving Spark performance, debugging slow jobs, scaling data processing pipelines) with an explicit 'Use when' clause. | 3 / 3 |
| Trigger Term Quality | Includes strong natural keywords users would say: 'Spark', 'partitioning', 'caching', 'shuffle', 'memory tuning', 'performance', 'slow jobs', 'data processing pipelines'. These cover common terms a user would use when seeking Spark optimization help. | 3 / 3 |
| Distinctiveness / Conflict Risk | Clearly scoped to Apache Spark optimization specifically, with distinct triggers like 'Spark', 'shuffle optimization', and 'partitioning' that are unlikely to conflict with general data engineering or other processing skills. | 3 / 3 |
| **Total** | | **12 / 12 (Passed)** |
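The four techniques the description names map to a small set of standard Spark configuration keys. A minimal sketch of what that configuration surface looks like; the keys are real Spark SQL/AQE settings, but the values are illustrative starting points, not tuned recommendations:

```python
# Illustrative Spark config for the techniques the description names.
# Values are starting points only; tune per workload.
spark_conf = {
    # Shuffle optimization: let AQE coalesce and split shuffle partitions
    "spark.sql.shuffle.partitions": "200",
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    "spark.sql.adaptive.skewJoin.enabled": "true",
    # Join strategy: broadcast small tables instead of shuffling both sides
    "spark.sql.autoBroadcastJoinThreshold": str(10 * 1024 * 1024),
    # Memory tuning: executor heap size and the unified-memory fraction
    "spark.executor.memory": "4g",
    "spark.memory.fraction": "0.6",
}

# Applied to a session builder like:
# builder = SparkSession.builder
# for k, v in spark_conf.items():
#     builder = builder.config(k, v)
```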
Implementation
42%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
The skill provides highly actionable, executable Spark optimization code covering a comprehensive range of patterns, which is its primary strength. However, it is far too verbose for a SKILL.md: it explains concepts Claude already knows (Spark's execution model, what the storage levels mean) and packs roughly 300 lines of detailed patterns into a single file that should instead use progressive disclosure, with references to sub-files. The lack of a diagnostic workflow connecting the patterns reduces its effectiveness as a troubleshooting guide.
Suggestions
- Move detailed patterns (join optimization, memory tuning, caching, data formats, monitoring) into separate referenced files, keeping only the Quick Start and Configuration Cheat Sheet in SKILL.md with clear links.
- Remove the 'Core Concepts' section entirely: Claude already understands Spark's execution model, what shuffles are, and basic performance factors.
- Add a diagnostic workflow at the top: 'Step 1: Check Spark UI for skew/spills → Step 2: Identify bottleneck type → Step 3: Apply relevant pattern → Step 4: Verify improvement with metrics comparison.'
- Remove inline comments that explain obvious things (e.g., '# Fast compression', '# No shuffle', '# Spark only reads these columns') to reduce token count.
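The suggested workflow starts by checking for skew. A minimal, pure-Python sketch of the skew check behind Step 1; the `detect_skew` helper and the ratio threshold are illustrative, not part of the reviewed skill:

```python
import statistics

def detect_skew(partition_row_counts, ratio=5.0):
    """Flag partitions whose row count exceeds `ratio` times the median.

    In Spark, per-partition counts can be sampled with
    df.rdd.glom().map(len).collect() (expensive; use on a sample)
    or read off the Spark UI's per-task metrics for the stage.
    """
    median = statistics.median(partition_row_counts)
    return [i for i, n in enumerate(partition_row_counts)
            if median > 0 and n > ratio * median]

counts = [1000, 950, 1020, 48000, 990]   # one hot partition
print(detect_skew(counts))               # → [3]
```

A hit here points toward the skew-handling patterns (salting, AQE skew join); an empty result suggests looking at spills or join strategy instead.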
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is extremely verbose at roughly 300 lines. It explains basic Spark concepts Claude already knows (execution model, what shuffles are, storage level definitions), includes redundant tables of concepts, and has extensive inline comments explaining obvious things. The 'Core Concepts' section with the execution model diagram and key performance factors table adds little value for Claude. | 1 / 3 |
| Actionability | The skill provides fully executable Python code throughout, from session configuration to partitioning, join optimization, caching, memory tuning, and monitoring. Code examples are copy-paste ready with specific configurations, function definitions, and concrete patterns like the salt_join function. | 3 / 3 |
| Workflow Clarity | While individual patterns are clear, there is no overarching workflow for diagnosing and fixing Spark performance issues. The patterns are presented as isolated techniques without a clear decision tree or sequence. The monitoring/debugging pattern (Pattern 7) comes last rather than being positioned as a diagnostic first step. There are no validation checkpoints for verifying that optimizations actually improved performance. | 2 / 3 |
| Progressive Disclosure | This is a monolithic wall of content with 7 detailed patterns, a configuration cheat sheet, and best practices all inline. At this length, patterns like join optimization, memory tuning, and data format optimization should be split into separate referenced files. There are no references to external files for deeper dives. | 1 / 3 |
| **Total** | | **7 / 12 (Passed)** |
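The salt_join pattern the Actionability row credits is the standard fix for skewed join keys. A toy pure-Python model of the technique, to make the shape concrete; this `salt_join` is a hypothetical stand-in, not the reviewed skill's implementation, which would use the DataFrame API (`withColumn` + `rand()` on the large side, `explode` of a salt array on the small side):

```python
import random

def salt_join(large, small, num_salts=4, seed=0):
    """Toy model of the skewed-join salting pattern.

    `large` and `small` are lists of (key, value) pairs. Each large-side
    row gets a random salt in [0, num_salts); each small-side row is
    replicated once per salt so every (key, salt) pair still matches.
    Spreading a hot key across num_salts buckets evens out partition sizes.
    """
    rng = random.Random(seed)
    salted_large = [((k, rng.randrange(num_salts)), v) for k, v in large]
    salted_small = [((k, s), v) for k, v in small for s in range(num_salts)]
    lookup = dict(salted_small)
    return [(k, lv, lookup[(k, s)])
            for (k, s), lv in salted_large if (k, s) in lookup]

large = [("a", 1), ("a", 2), ("a", 3), ("b", 4)]  # "a" is the hot key
small = [("a", "A"), ("b", "B")]
print(sorted(salt_join(large, small)))
# → [('a', 1, 'A'), ('a', 2, 'A'), ('a', 3, 'A'), ('b', 4, 'B')]
```

The trade-off is that the small side is replicated num_salts times, so the pattern only pays off when the small side is genuinely small relative to the skew it removes.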
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.