Optimize Apache Spark jobs with partitioning, caching, shuffle optimization, and memory tuning. Use when improving Spark performance, debugging slow jobs, or scaling data processing pipelines.
57
64%
Does it follow best practices?
Impact
—
No eval scenarios have been run
Passed
No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./plugins/data-engineering/skills/spark-optimization/SKILL.md
Quality
Discovery
100%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a well-crafted skill description that concisely covers specific Spark optimization techniques, includes natural trigger terms users would employ, and clearly delineates both what the skill does and when it should be used. It uses proper third-person voice and is distinct enough to avoid conflicts with other skills.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: partitioning, caching, shuffle optimization, and memory tuning. These are distinct, well-defined Spark optimization techniques. | 3 / 3 |
| Completeness | Clearly answers both what ('Optimize Apache Spark jobs with partitioning, caching, shuffle optimization, and memory tuning') and when ('Use when improving Spark performance, debugging slow jobs, or scaling data processing pipelines') with explicit trigger guidance. | 3 / 3 |
| Trigger Term Quality | Includes strong natural keywords users would say: 'Spark', 'partitioning', 'caching', 'shuffle', 'memory tuning', 'performance', 'slow jobs', 'data processing pipelines'. These cover common terms a user would use when seeking Spark optimization help. | 3 / 3 |
| Distinctiveness Conflict Risk | Clearly scoped to Apache Spark optimization specifically, with domain-specific triggers like 'Spark', 'shuffle optimization', 'partitioning' that are unlikely to conflict with general data processing or other big data skills. | 3 / 3 |
| Total | Passed | 12 / 12 |
Implementation
29%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill provides a reasonable starting template for Spark optimization with a functional Quick Start code block and useful best practices list, but it falls short on actionability by deferring all detailed patterns to a non-existent reference file. The workflow for actually diagnosing and resolving Spark performance issues is absent, and the core concepts section explains things Claude already knows, wasting token budget.
Suggestions
Add a concrete diagnostic workflow: 1) Check Spark UI for stage durations/skew, 2) Identify bottleneck type, 3) Apply specific fix, 4) Verify improvement via metrics—with explicit validation checkpoints.
Either include the referenced 'references/details.md' file with concrete patterns for partitioning, shuffle optimization, and skew handling, or inline the most critical patterns directly in the SKILL.md.
Remove or drastically condense the 'Core Concepts' section (execution model diagram, performance factors table)—Claude already knows these. Replace with specific, executable code examples for each optimization pattern (e.g., salted join for skew, broadcast join threshold config).
Convert the Do's/Don'ts from abstract advice into concrete code examples showing the anti-pattern and the correct pattern side by side.
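As an illustration of the kind of side-by-side example the suggestions ask for, here is a minimal sketch of the salted-join idea. It is a plain-Python simulation, not PySpark and not code from the skill: `partition_of`, `NUM_PARTITIONS`, `NUM_SALTS`, and the sample rows are all hypothetical, and the byte-sum partitioner only stands in for Spark's hash partitioning. It shows the anti-pattern (one hot key landing on a single partition) next to the salted version (the same key spread over several sub-keys, with the small side replicated once per salt).

```python
from collections import Counter

NUM_PARTITIONS = 4  # hypothetical partition count
NUM_SALTS = 4       # hypothetical salt fan-out for the large side


def partition_of(key):
    """Deterministic stand-in for hash partitioning (not Spark's real hash)."""
    return sum(str(key).encode()) % NUM_PARTITIONS


def salted_join(large, small):
    """Join (key, value) rows: salt the large side, replicate the small side."""
    # Large side: attach a round-robin salt so a hot key becomes NUM_SALTS sub-keys.
    salted_large = [((k, i % NUM_SALTS), v) for i, (k, v) in enumerate(large)]
    # Small side: copy each row once per salt so every sub-key still finds its match.
    lookup = {}
    for k, v in small:
        for s in range(NUM_SALTS):
            lookup.setdefault((k, s), []).append(v)
    joined = [
        (k, lv, sv)
        for (k, s), lv in salted_large
        for sv in lookup.get((k, s), [])
    ]
    # Rows per partition on the large side after salting.
    load = Counter(partition_of(sk) for sk, _ in salted_large)
    return joined, load


# Hypothetical skewed input: eight rows share the key "hot".
large = [("hot", i) for i in range(8)] + [("cold", 99)]
small = [("hot", "H"), ("cold", "C")]

# Anti-pattern: partition by the raw key, so all "hot" rows share one partition.
plain_load = Counter(partition_of(k) for k, _ in large)

# Salted version: same join result, flatter per-partition load.
joined, load = salted_join(large, small)
```

In this toy input the salted join returns the same nine matched rows as an unsalted join would, while the busiest partition holds fewer rows than in `plain_load`. The cost is that the small side is replicated `NUM_SALTS` times, so the fan-out is a trade-off rather than a free win.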
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The 'Core Concepts' section explaining Spark's execution model (Driver → Job → Stages → Tasks) and the key performance factors table are things Claude already knows well. The 'When to Use This Skill' section is also somewhat redundant. However, the code examples and best practices are reasonably tight. | 2 / 3 |
| Actionability | The Quick Start provides executable code for session configuration and a basic ETL pattern, but the detailed optimization patterns (partitioning strategies, shuffle optimization, memory tuning, data skew handling) are deferred to a references/details.md file that doesn't exist in the bundle. The best practices are bullet-point advice rather than concrete, executable guidance. | 2 / 3 |
| Workflow Clarity | There is no clear multi-step workflow for diagnosing and fixing Spark performance issues. The skill describes what to do conceptually but lacks a sequenced process with validation checkpoints (e.g., how to identify the bottleneck via Spark UI, apply a fix, then verify improvement). For a performance tuning skill involving potentially destructive changes to production configs, this is a significant gap. | 1 / 3 |
| Progressive Disclosure | The skill references 'references/details.md' for detailed patterns, but no bundle files are provided, meaning this reference leads nowhere. The content is also somewhat monolithic: the core concepts section could be omitted or moved to a reference file, while the actual detailed optimization patterns that should be in the main skill are missing entirely. | 1 / 3 |
| Total | Passed | 6 / 12 |
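The missing diagnose, fix, verify loop noted under Workflow Clarity could be sketched roughly as follows. This is an assumption-laden illustration, not part of the reviewed skill: the function names, the metric inputs (per-task durations, shuffle read size, and spill size, as one might copy by hand from a stage page in the Spark UI), and every threshold are hypothetical placeholders that would need tuning per workload.

```python
from statistics import median


def classify_stage(task_seconds, shuffle_read_mb, spill_mb, skew_ratio_limit=3.0):
    """Return a coarse bottleneck label for one stage (thresholds are hypothetical)."""
    # Skew check: the slowest task is far slower than the typical task.
    skew_ratio = max(task_seconds) / median(task_seconds)
    if skew_ratio > skew_ratio_limit:
        return "skew"
    # Any spill to disk suggests memory pressure in this stage.
    if spill_mb > 0:
        return "memory"
    # Large shuffle reads point at shuffle volume (1 GiB cutoff is arbitrary).
    if shuffle_read_mb > 1024:
        return "shuffle"
    return "ok"


def verify(before_seconds, after_seconds, min_gain=0.10):
    """Validation checkpoint: accept a fix only if stage time fell by at least min_gain."""
    return (before_seconds - after_seconds) / before_seconds >= min_gain


# Hypothetical readings for two stages.
skewed = classify_stage([10, 11, 9, 95], shuffle_read_mb=200, spill_mb=0)
spilling = classify_stage([10, 11, 9, 12], shuffle_read_mb=200, spill_mb=512)

# Hypothetical before/after stage durations for a candidate fix.
keep_fix = verify(before_seconds=120, after_seconds=90)
revert_fix = not verify(before_seconds=120, after_seconds=115)
```

The ordering of the checks is a design choice: skew is tested first because a single straggler task can also cause spill and large shuffle reads, so treating those symptoms first could mask the underlying cause.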
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
cf6059d