spark-engineer

Use when writing Spark jobs, debugging performance issues, or configuring cluster settings for Apache Spark applications, distributed data processing pipelines, or big data workloads. Invoke to write DataFrame transformations, optimize Spark SQL queries, implement RDD pipelines, tune shuffle operations, configure executor memory, process .parquet files, handle data partitioning, or build structured streaming analytics.

Quality

88%

Does it follow best practices?

Security

Passed

No findings from the security scan

Quality

Content

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a strong, actionable skill with executable code examples covering key Spark patterns (broadcast joins, skew salting, caching) and a well-structured workflow with explicit validation checkpoints and feedback loops. The main weaknesses are minor verbosity (the persona description and a knowledge reference section listing concepts Claude already knows) and the fact that none of the 5 files referenced in the progressive disclosure table exist in the bundle, which reduces the practical value of the reference structure.

Suggestions

Remove the 'Knowledge Reference' section — it lists concepts Claude already knows and adds no actionable value.

Provide the referenced files (e.g., `references/spark-sql-dataframes.md`) or remove the reference table if they don't exist, as broken references reduce trust in the skill's structure.
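One way to catch the broken references described above before publishing is a small check script. The sketch below is only an illustration: the function name `find_missing_references` and the assumption that references appear as backticked `references/....md` paths inside `SKILL.md` are hypothetical, not part of the skill or of any registry tooling.

```python
import re
from pathlib import Path

# Assumption: references are written as backticked relative paths,
# e.g. `references/spark-sql-dataframes.md`.
REF_PATTERN = re.compile(r"`(references/[\w./-]+\.md)`")


def find_missing_references(skill_dir):
    """Return the referenced paths in SKILL.md that are not files on disk."""
    skill_dir = Path(skill_dir)
    text = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    # De-duplicate and sort so the report is stable between runs.
    referenced = sorted(set(REF_PATTERN.findall(text)))
    return [ref for ref in referenced if not (skill_dir / ref).is_file()]
```

Calling `find_missing_references("skills/spark-engineer")` would return an empty list once every referenced file ships with the bundle.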

Dimension	Reasoning	Score
Conciseness	Generally efficient but includes some unnecessary framing (e.g., the opening sentence describing the persona, the 'Knowledge Reference' section listing concepts Claude already knows like 'catalyst optimizer, tungsten execution engine'). The constraints section has some obvious items ('understand lazy evaluation') but most content earns its place.	2 / 3
Actionability	Provides fully executable PySpark code examples covering multiple common scenarios (mini-pipeline, broadcast join, skew handling with salting, caching pattern). Code is copy-paste ready with imports, complete function calls, and inline comments explaining intent. The constraints section gives specific, actionable rules with concrete thresholds (e.g., '<200MB' for broadcast, '200-1000 partitions per executor core').	3 / 3
Workflow Clarity	The core workflow has a clear 5-step sequence with an explicit validation checkpoint in step 5 that includes specific verification commands (`df.rdd.getNumPartitions()`), what to check in Spark UI (shuffle spill), and a feedback loop ('if spill or skew detected, return to step 4'). Code examples also embed validation steps (e.g., printing partition count before writing, materializing cache and checking for spill).	3 / 3
Progressive Disclosure	The reference table with 5 topic-specific files is well-structured and clearly signaled with 'Load When' guidance. However, no bundle files were provided, so the referenced files (e.g., `references/spark-sql-dataframes.md`) don't actually exist, making the progressive disclosure aspirational rather than functional. The main file itself is reasonably well-organized but includes substantial inline content (constraints, output templates, knowledge reference) that could be trimmed or moved.	2 / 3
	Total	10 / 12 Passed
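The review credits the skill with a skew-salting example but does not reproduce it. As a rough illustration of the idea only, the sketch below uses plain Python in place of Spark: a random salt spreads a hot key across several sub-keys for a first partial aggregation, and a second pass strips the salt and combines the partials. The function name `salted_sum` and its parameters are hypothetical and are not taken from the skill.

```python
import random
from collections import defaultdict


def salted_sum(records, num_salts, seed=0):
    """Two-stage sum over (key, value) pairs using key salting.

    Returns (totals, partials): the final per-key sums and the
    intermediate per-(key, salt) sums.
    """
    rng = random.Random(seed)

    # Stage 1: append a random salt so one hot key is split across
    # up to num_salts sub-keys, each aggregated independently.
    partials = defaultdict(int)
    for key, value in records:
        salt = rng.randrange(num_salts)
        partials[(key, salt)] += value

    # Stage 2: drop the salt and combine the partial sums per key.
    totals = defaultdict(int)
    for (key, _salt), partial in partials.items():
        totals[key] += partial

    return dict(totals), dict(partials)
```

In Spark the two stages would correspond to two grouped aggregations, with the first one grouping on the key plus the salt column so that no single task receives every row for the hot key.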

Description

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a strong skill description that excels across all dimensions. It provides comprehensive, specific actions, uses natural trigger terms that Spark users would employ, clearly delineates both what the skill does and when to use it, and is highly distinctive to the Apache Spark domain. The description is well-structured with a 'Use when' clause followed by an 'Invoke to' clause, and uses proper third-person voice throughout.

Dimension	Reasoning	Score
Specificity	Lists multiple specific concrete actions: 'write DataFrame transformations', 'optimize Spark SQL queries', 'implement RDD pipelines', 'tune shuffle operations', 'configure executor memory', 'process .parquet files', 'handle data partitioning', 'build structured streaming analytics'.	3 / 3
Completeness	Clearly answers both 'what' (write DataFrame transformations, optimize queries, tune shuffle operations, etc.) and 'when' with explicit triggers ('Use when writing Spark jobs, debugging performance issues, or configuring cluster settings'). The 'Use when...' and 'Invoke to...' clauses provide clear guidance.	3 / 3
Trigger Term Quality	Excellent coverage of natural terms users would say: 'Spark jobs', 'performance issues', 'cluster settings', 'Apache Spark', 'distributed data processing', 'big data', 'DataFrame', 'Spark SQL', 'RDD', 'shuffle', 'executor memory', '.parquet', 'partitioning', 'structured streaming'. These are terms a user working with Spark would naturally use.	3 / 3
Distinctiveness / Conflict Risk	Highly distinctive with a clear niche around Apache Spark specifically. Terms like 'Spark SQL', 'RDD pipelines', 'executor memory', 'shuffle operations', and '.parquet files' are strongly associated with Spark and unlikely to conflict with general data processing or other big data skills.	3 / 3
	Total	12 / 12 Passed

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Repository: Jeffallan/claude-skills
Path: skills/spark-engineer/SKILL.md
Commit: e8be415

Reviewed: about 5 hours ago
