Use when writing Spark jobs, debugging performance issues, or configuring cluster settings for Apache Spark applications, distributed data processing pipelines, or big data workloads. Invoke to write DataFrame transformations, optimize Spark SQL queries, implement RDD pipelines, tune shuffle operations, configure executor memory, process .parquet files, handle data partitioning, or build structured streaming analytics.
- Overall score: 91
- Quality: 92% (Does it follow best practices?)
- Impact: 89% (1.05x average score across 6 eval scenarios)
- Validation: Passed, no known issues
Quality
Discovery
100%. Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong skill description that excels across all dimensions. It provides comprehensive, specific actions tied to the Apache Spark ecosystem, includes abundant natural trigger terms that users would actually say, and clearly delineates both what the skill does and when to invoke it. The description is well-structured with a 'Use when' clause followed by an 'Invoke to' clause, and uses proper third-person voice throughout.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: 'write DataFrame transformations', 'optimize Spark SQL queries', 'implement RDD pipelines', 'tune shuffle operations', 'configure executor memory', 'process .parquet files', 'handle data partitioning', 'build structured streaming analytics'. | 3 / 3 |
| Completeness | Clearly answers both 'what' (write DataFrame transformations, optimize queries, implement RDD pipelines, etc.) and 'when' with explicit triggers ('Use when writing Spark jobs, debugging performance issues, or configuring cluster settings'). The 'Use when...' and 'Invoke to...' clauses provide clear guidance. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural terms users would say: 'Spark jobs', 'performance issues', 'cluster settings', 'Apache Spark', 'distributed data processing', 'big data', 'DataFrame', 'Spark SQL', 'RDD', 'shuffle', 'executor memory', '.parquet', 'partitioning', 'structured streaming'. These are terms a user working with Spark would naturally use. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive with a clear niche around Apache Spark specifically. Terms like 'Spark SQL', 'RDD pipelines', 'executor memory', 'shuffle operations', and '.parquet files' are unique to the Spark ecosystem and unlikely to conflict with general data processing or other big data skills. | 3 / 3 |
| Total | | 12 / 12 (Passed) |
Implementation
85%. Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a strong skill with excellent actionability through complete, executable code examples and good progressive disclosure via the reference table. The workflow includes proper validation checkpoints and feedback loops. Minor improvements could be made by trimming the persona description, the 'Knowledge Reference' section, and some constraints that state things Claude already knows.
Suggestions
Remove the 'Knowledge Reference' section at the bottom — Claude already knows these concepts and this wastes tokens.
Trim the opening persona sentence and some obvious MUST NOT items (e.g., 'Run transformations without understanding lazy evaluation') to improve conciseness.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Generally efficient but includes some unnecessary content like the 'Knowledge Reference' section at the bottom (Claude already knows these concepts) and the persona description at the top. The constraints section has some items that are basic Spark knowledge Claude would already have (e.g., 'understand lazy evaluation'). | 2 / 3 |
| Actionability | Provides fully executable PySpark code examples covering multiple common scenarios (mini-pipeline, broadcast join, skew handling with salting, caching pattern). All examples are copy-paste ready with proper imports and realistic patterns. | 3 / 3 |
| Workflow Clarity | The core workflow has clear sequencing with explicit validation checkpoints: checking Spark UI for shuffle spill, verifying partition counts, and a feedback loop ('if spill or skew detected, return to step 4'). Code examples also embed validation steps like printing partition counts before writing. | 3 / 3 |
| Progressive Disclosure | Excellent structure with a clear overview, concise inline examples for common patterns, and a well-organized reference table pointing to one-level-deep topic-specific files with clear 'Load When' guidance for each reference. | 3 / 3 |
| Total | | 11 / 12 (Passed) |
Validation
100%. Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.