
spark-engineer

Use when writing Spark jobs, debugging performance issues, or configuring cluster settings for Apache Spark applications, distributed data processing pipelines, or big data workloads. Invoke to write DataFrame transformations, optimize Spark SQL queries, implement RDD pipelines, tune shuffle operations, configure executor memory, process .parquet files, handle data partitioning, or build structured streaming analytics.
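As an illustration of the "configure executor memory" and "tune shuffle operations" tasks the description names, a cluster submission might look like the following. The script name and all numbers are placeholders for illustration, not recommendations; real values depend on the cluster.

```shell
# Hypothetical invocation -- tune every value to your own cluster.
spark-submit \
  --master yarn \
  --executor-memory 8g \
  --executor-cores 4 \
  --num-executors 20 \
  --conf spark.sql.shuffle.partitions=400 \
  my_pipeline.py
```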

Overall score: 91 (1.05x)

Quality: 92% (Does it follow best practices?)
Impact: 89% (1.05x, average score across 6 eval scenarios)
Security (by Snyk): Passed, no known issues


Quality

Discovery: 100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a strong skill description that excels across all dimensions. It provides comprehensive, specific actions tied to the Apache Spark ecosystem, includes abundant natural trigger terms that users would actually say, and clearly delineates both what the skill does and when to invoke it. The description is well-structured with a 'Use when' clause followed by an 'Invoke to' clause, and uses proper third-person voice throughout.

Dimension scores

Specificity: 3 / 3
Lists multiple specific concrete actions: 'write DataFrame transformations', 'optimize Spark SQL queries', 'implement RDD pipelines', 'tune shuffle operations', 'configure executor memory', 'process .parquet files', 'handle data partitioning', 'build structured streaming analytics'.

Completeness: 3 / 3
Clearly answers both 'what' (write DataFrame transformations, optimize queries, implement RDD pipelines, etc.) and 'when' with explicit triggers ('Use when writing Spark jobs, debugging performance issues, or configuring cluster settings'). The 'Use when...' and 'Invoke to...' clauses provide clear guidance.

Trigger Term Quality: 3 / 3
Excellent coverage of natural terms users would say: 'Spark jobs', 'performance issues', 'cluster settings', 'Apache Spark', 'distributed data processing', 'big data', 'DataFrame', 'Spark SQL', 'RDD', 'shuffle', 'executor memory', '.parquet', 'partitioning', 'structured streaming'. These are terms a user working with Spark would naturally use.

Distinctiveness / Conflict Risk: 3 / 3
Highly distinctive with a clear niche around Apache Spark specifically. Terms like 'Spark SQL', 'RDD pipelines', 'executor memory', 'shuffle operations', and '.parquet files' are unique to the Spark ecosystem and unlikely to conflict with general data processing or other big data skills.

Total: 12 / 12 (Passed)

Implementation: 85%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a strong skill with excellent actionability through complete, executable code examples and good progressive disclosure via the reference table. The workflow includes proper validation checkpoints and feedback loops. Minor improvements could be made by trimming the persona description, the 'Knowledge Reference' section, and some constraints that state things Claude already knows.

Suggestions

Remove the 'Knowledge Reference' section at the bottom — Claude already knows these concepts and this wastes tokens.

Trim the opening persona sentence and some obvious MUST NOT items (e.g., 'Run transformations without understanding lazy evaluation') to improve conciseness.

Dimension scores

Conciseness: 2 / 3
Generally efficient but includes some unnecessary content, such as the 'Knowledge Reference' section at the bottom (Claude already knows these concepts) and the persona description at the top. The constraints section has some items that are basic Spark knowledge Claude would already have (e.g., 'understand lazy evaluation').
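The 'lazy evaluation' point flagged above as baseline knowledge can be illustrated without Spark at all. A Python generator is a loose analogy, not Spark itself: like a Spark transformation, calling it builds a pipeline but performs no work until something forces the results.

```python
def double_all(rows):
    # Generator body runs lazily: calling double_all() builds the
    # pipeline but processes no rows, much like a Spark transformation.
    for r in rows:
        yield r * 2

plan = double_all([1, 2, 3])   # nothing computed yet
result = list(plan)            # the "action": forces evaluation
print(result)                  # [2, 4, 6]
```

The same shape explains why a Spark job with only transformations appears to finish instantly: no action, no computation.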

Actionability: 3 / 3
Provides fully executable PySpark code examples covering multiple common scenarios (mini-pipeline, broadcast join, skew handling with salting, caching pattern). All examples are copy-paste ready with proper imports and realistic patterns.
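The skew-salting pattern credited here can be sketched independently of Spark: appending a random salt to a hot key spreads its rows across several shuffle partitions. This is a plain-Python illustration of the idea, not the skill's actual code; the key name and salt count are made up.

```python
import random

NUM_SALTS = 8  # illustrative; choose based on observed skew

def salted_key(key, rng=random):
    # A hot key like "user_42" becomes "user_42_0" .. "user_42_7",
    # so its rows hash to up to NUM_SALTS different partitions.
    return f"{key}_{rng.randrange(NUM_SALTS)}"

# All rows share one hot key; salting fans them out.
rows = ["user_42"] * 10_000
buckets = {salted_key(k) for k in rows}
assert len(buckets) <= NUM_SALTS
```

In Spark the same trick requires exploding the other side of the join across all salt values so every salted key still finds its match.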

Workflow Clarity: 3 / 3
The core workflow has clear sequencing with explicit validation checkpoints: checking the Spark UI for shuffle spill, verifying partition counts, and a feedback loop ('if spill or skew detected, return to step 4'). Code examples also embed validation steps like printing partition counts before writing.
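The 'verify partition counts before writing' checkpoint can be sketched as a small helper that computes a target partition count from the input size. The function name and the 128 MiB target are illustrative assumptions (a common rule of thumb, not a fixed rule).

```python
import math

TARGET_PARTITION_BYTES = 128 * 1024 * 1024  # rule of thumb, not a law

def target_partitions(total_bytes, target=TARGET_PARTITION_BYTES):
    # At least one partition; otherwise size / target, rounded up.
    return max(1, math.ceil(total_bytes / target))

# e.g. a 10 GiB dataset -> 80 partitions of roughly 128 MiB each
print(target_partitions(10 * 1024**3))  # 80
```

In a real job the result would feed `df.repartition(n)` or `df.coalesce(n)` before the write, with the actual count re-checked in the Spark UI.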

Progressive Disclosure: 3 / 3
Excellent structure with a clear overview, concise inline examples for common patterns, and a well-organized reference table pointing to one-level-deep topic-specific files with clear 'Load When' guidance for each reference.

Total: 11 / 12 (Passed)

Validation: 100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation checks: 11 / 11 passed

Validation for skill structure: no warnings or errors.

Repository: jeffallan/claude-skills (reviewed)
