
spark-optimization

Optimize Apache Spark jobs with partitioning, caching, shuffle optimization, and memory tuning. Use when improving Spark performance, debugging slow jobs, or scaling data processing pipelines.
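The description names four tuning areas. As a hedged illustration of what those areas typically touch (these are real Spark configuration keys, but the values are generic starting points chosen for this sketch, not settings taken from the skill itself):

```python
# Hypothetical baseline of Spark tuning settings covering the four areas
# named in the description. Keys are real Spark SQL/core configs; values
# are illustrative defaults, not recommendations from the skill.
baseline_confs = {
    # Shuffle optimization: size shuffle parallelism and let AQE
    # coalesce small post-shuffle partitions.
    "spark.sql.shuffle.partitions": "200",
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    # Partitioning / skew: AQE can split skewed join partitions at runtime.
    "spark.sql.adaptive.skewJoin.enabled": "true",
    # Memory tuning: executor heap and the unified-memory fraction.
    "spark.executor.memory": "4g",
    "spark.memory.fraction": "0.6",
    # Caching-adjacent: Kryo is compact for serialized storage levels.
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
}

def as_submit_args(confs):
    """Render the dict as spark-submit --conf flags."""
    return [f"--conf {k}={v}" for k, v in sorted(confs.items())]
```

Rendering the dict through `as_submit_args` gives one `--conf key=value` flag per setting, ready to paste after `spark-submit`.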

Score: 75 (1.40x)

Quality: 71% (Does it follow best practices?)

Impact: 73% (1.40x), average score across 6 eval scenarios

Security by Snyk: Passed, no known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./plugins/data-engineering/skills/spark-optimization/SKILL.md

Quality

Discovery

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a well-crafted skill description that concisely covers specific Spark optimization techniques, includes natural trigger terms users would employ, and clearly delineates both what the skill does and when to use it. It uses proper third-person voice and is distinct enough to avoid conflicts with other data-related skills.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Lists multiple specific concrete actions: partitioning, caching, shuffle optimization, and memory tuning. These are distinct, well-defined Spark optimization techniques. | 3 / 3 |
| Completeness | Clearly answers both 'what' (optimize Spark jobs with partitioning, caching, shuffle optimization, memory tuning) and 'when' (improving Spark performance, debugging slow jobs, scaling data processing pipelines) with an explicit 'Use when' clause. | 3 / 3 |
| Trigger Term Quality | Includes strong natural keywords users would say: 'Spark', 'partitioning', 'caching', 'shuffle', 'memory tuning', 'performance', 'slow jobs', 'data processing pipelines'. These cover common terms a user would use when seeking Spark optimization help. | 3 / 3 |
| Distinctiveness / Conflict Risk | Clearly scoped to Apache Spark optimization specifically, with distinct triggers like 'Spark', 'shuffle optimization', and 'partitioning' that are unlikely to conflict with general data engineering or other processing skills. | 3 / 3 |
| Total | | 12 / 12 |

Passed

Implementation

42%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The skill provides highly actionable, executable Spark optimization code covering a comprehensive range of patterns, which is its primary strength. However, it is far too verbose for a SKILL.md: it explains concepts Claude already knows (Spark's execution model, what storage levels mean) and packs ~300 lines of detailed patterns into a single file that should instead use progressive disclosure with references to sub-files. The lack of a diagnostic workflow connecting the patterns reduces its effectiveness as a troubleshooting guide.

Suggestions

- Move detailed patterns (join optimization, memory tuning, caching, data formats, monitoring) into separate referenced files, keeping only the Quick Start and Configuration Cheat Sheet in SKILL.md with clear links.
- Remove the 'Core Concepts' section entirely: Claude already understands Spark's execution model, what shuffles are, and basic performance factors.
- Add a diagnostic workflow at the top: 'Step 1: Check Spark UI for skew/spills → Step 2: Identify bottleneck type → Step 3: Apply relevant pattern → Step 4: Verify improvement with metrics comparison.'
- Remove inline comments that explain obvious things (e.g., '# Fast compression', '# No shuffle', '# Spark only reads these columns') to reduce token count.
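The diagnostic workflow suggested above could be encoded as a small triage map from Spark UI symptoms to the pattern worth trying first. The symptom strings and pattern labels below are illustrative assumptions for this sketch, not taken from the skill:

```python
# Hypothetical triage map: what the Spark UI shows -> which optimization
# pattern to try first. Labels are illustrative, not the skill's own.
TRIAGE = {
    "few tasks much slower than the rest": "data skew -> salted join or AQE skew join",
    "memory/disk spill in stage metrics": "memory tuning -> raise executor memory or partition count",
    "very large shuffle read/write": "shuffle optimization -> broadcast join or repartition earlier",
    "many tiny tasks": "partitioning -> coalesce or raise maxPartitionBytes",
}

def diagnose(symptom: str) -> str:
    """Steps 1-3 of the suggested workflow: observe a symptom in the
    Spark UI, map it to a pattern; Step 4 (verify with metrics) follows."""
    return TRIAGE.get(symptom, "profile further in the Spark UI before changing config")
```

A decision tree like this is what the review means by "connecting the patterns": the monitoring step becomes the entry point rather than the last pattern.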

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The skill is extremely verbose at ~300+ lines. It explains basic Spark concepts Claude already knows (execution model, what shuffles are, storage level definitions), includes redundant tables of concepts, and has extensive inline comments explaining obvious things. The 'Core Concepts' section with the execution model diagram and key performance factors table adds little value for Claude. | 1 / 3 |
| Actionability | The skill provides fully executable Python code throughout, from session configuration to partitioning, join optimization, caching, memory tuning, and monitoring. Code examples are copy-paste ready with specific configurations, function definitions, and concrete patterns like the salt_join function. | 3 / 3 |
| Workflow Clarity | While individual patterns are clear, there is no overarching workflow for diagnosing and fixing Spark performance issues. The patterns are presented as isolated techniques without a clear decision tree or sequence. The monitoring/debugging pattern (Pattern 7) comes last rather than being positioned as a diagnostic first step. No validation checkpoints for verifying optimizations actually improved performance. | 2 / 3 |
| Progressive Disclosure | This is a monolithic wall of content with 7 detailed patterns, a configuration cheat sheet, and best practices all inline. At this length, patterns like join optimization, memory tuning, and data format optimization should be split into separate referenced files. There are no references to external files for deeper dives. | 1 / 3 |
| Total | | 7 / 12 |

Passed
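The Actionability assessment cites a salt_join function from the skill. The skill's actual implementation is not reproduced here, but the salting idea it presumably relies on can be sketched in plain Python (no Spark required), spreading one hot key across N synthetic sub-keys; the function names are hypothetical:

```python
import random

def salt_key(key: str, n_salts: int, rng: random.Random) -> str:
    """Large-side rows: append a random salt suffix so one hot key is
    spread over n_salts shuffle partitions instead of one."""
    return f"{key}#{rng.randrange(n_salts)}"

def explode_key(key: str, n_salts: int) -> list:
    """Small-side rows: replicate each key once per salt value so every
    salted large-side row still finds its match in the join."""
    return [f"{key}#{i}" for i in range(n_salts)]

# Every salted variant of a hot key is guaranteed to be covered
# by the exploded small side, so the join result is unchanged.
rng = random.Random(0)
salted = {salt_key("hot_user", 8, rng) for _ in range(1000)}
assert salted <= set(explode_key("hot_user", 8))
```

In real Spark code the same idea is applied with column expressions (a random salt column on the large side, an exploded salt range on the small side) before joining on the combined key.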

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 passed

Validation for skill structure

No warnings or errors.

Repository: wshobson/agents (Reviewed)
