spark-engineer

Use when building Apache Spark applications, distributed data processing pipelines, or optimizing big data workloads. Invoke for DataFrame API, Spark SQL, RDD operations, performance tuning, streaming analytics.

Install with Tessl CLI

npx tessl i github:jeffallan/claude-skills --skill spark-engineer

Overall score

67%

Review — 65%

Does it follow best practices?

If you maintain this skill, you can automatically optimize it using the tessl CLI to improve its score:

npx tessl skill review --optimize ./path/to/skill

Validation — 12 / 16 Passed

Validation for skill structure

Discovery

89%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a solid skill description with excellent trigger term coverage and clear 'when to use' guidance. The main weakness is that it lists technical concepts rather than concrete actions Claude can perform. Adding specific verbs like 'write', 'optimize', 'debug', or 'migrate' would strengthen the specificity dimension.

Suggestions

Add concrete action verbs to describe what Claude does: e.g., 'Write and optimize Spark applications, debug performance issues, migrate RDD code to DataFrame API'

Consider adding file extensions or common phrases: '.scala', '.py Spark jobs', 'PySpark', 'spark-submit'

Dimension	Reasoning	Score
Specificity	Names the domain (Apache Spark, distributed data processing) and lists some technical areas (DataFrame API, Spark SQL, RDD operations, performance tuning, streaming analytics), but doesn't describe concrete actions like 'write', 'optimize', 'debug', or 'configure'.	2 / 3
Completeness	Explicitly answers both what (building Spark applications, distributed data processing, big data workloads) and when ('Use when building...', 'Invoke for...') with clear trigger guidance at the start of the description.	3 / 3
Trigger Term Quality	Good coverage of terms developers naturally use when working with Spark: 'Spark', 'DataFrame', 'Spark SQL', 'RDD', 'big data', 'streaming analytics', 'performance tuning', 'distributed data processing'.	3 / 3
Distinctiveness Conflict Risk	Highly distinctive with Spark-specific terminology (DataFrame API, RDD, Spark SQL) that clearly separates it from general data processing or other big data tools like Hadoop or Flink.	3 / 3
	Total	11 / 12 Passed

Implementation

42%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill provides good structural organization and clear progressive disclosure to reference materials, but critically lacks actionable code examples. The content describes Spark best practices at a conceptual level without providing the executable code, specific commands, or concrete examples that would make it immediately useful. The constraints sections (MUST DO/MUST NOT DO) are valuable but would benefit from accompanying code snippets.

Suggestions

Add executable PySpark code examples for common operations (DataFrame creation with schema, broadcast join, handling skew with salting)

Include specific configuration examples with actual values (e.g., spark.sql.shuffle.partitions=400, executor memory settings)

Add a concrete example showing Spark UI analysis workflow with specific metrics to check and thresholds

Remove or condense the 'Knowledge Reference' keyword list which adds little actionable value
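As a rough illustration of the second suggestion, the sketch below pins a few settings explicitly and renders them as `--conf` flags for `spark-submit`. Only `spark.sql.shuffle.partitions=400` comes from the review; the executor memory and broadcast threshold values, the `job.py` path, and the helper name `spark_submit_command` are assumptions chosen for the example, not recommendations from the skill.

```python
import shlex

# Illustrative values only; tune against the actual cluster and data volume.
SPARK_CONF = {
    "spark.sql.shuffle.partitions": "400",  # value quoted in the suggestion above
    "spark.executor.memory": "8g",          # assumed, not taken from the review
    "spark.sql.autoBroadcastJoinThreshold": str(10 * 1024 * 1024),  # 10 MiB, in bytes
}


def spark_submit_command(app_path, conf):
    """Build a spark-submit command line with one --conf flag per setting."""
    args = ["spark-submit"]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app_path)
    # Quote each argument so the result can be pasted into a shell.
    return " ".join(shlex.quote(arg) for arg in args)


# "job.py" is a hypothetical application path.
print(spark_submit_command("job.py", SPARK_CONF))
```

Keeping the settings in one dictionary makes the chosen values visible in review, which is the kind of concreteness the suggestion asks for.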

Dimension	Reasoning	Score
Conciseness	The skill contains some unnecessary verbosity in the role definition section that repeats information Claude already knows. The constraints and workflow sections are reasonably efficient, but the 'Knowledge Reference' section is essentially a keyword list that adds little value.	2 / 3
Actionability	The skill lacks any concrete, executable code examples. It describes what to do at a high level (use DataFrame API, broadcast joins, etc.) but provides no actual Spark code, commands, or copy-paste ready examples. The guidance is abstract rather than instructional.	1 / 3
Workflow Clarity	The core workflow provides a clear 5-step sequence, but lacks validation checkpoints and feedback loops. For operations involving production data pipelines, there's no explicit 'validate before proceeding' step or error recovery guidance.	2 / 3
Progressive Disclosure	The skill effectively uses a reference table to point to detailed guidance in separate files, with clear topic categorization and 'Load When' conditions. This is well-organized one-level-deep progressive disclosure.	3 / 3
	Total	8 / 12 Passed
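The Actionability finding could be addressed with examples such as the salting technique named in the suggestions. The sketch below shows only the idea behind salting, in plain Python so it runs without a cluster; it is not PySpark code and not taken from the skill. The row data, the partition and salt counts, and the `partition_for` helper (a stand-in for a hash partitioner) are all assumptions made for the illustration.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 8
NUM_SALTS = 8  # assumed salt range; in practice sized to the observed skew


def partition_for(key):
    """Stand-in for a hash partitioner: stable hash of the key modulo partition count."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


# Made-up data: 1,000 rows for one hot key plus a few rows for ordinary keys.
rows = ["user_42"] * 1000 + ["user_1", "user_2", "user_3"] * 10

# Without salting, every row of the hot key lands in the same partition.
unsalted = Counter(partition_for(key) for key in rows)

# Salting: append a rotating suffix so the hot key becomes NUM_SALTS distinct keys.
salted_keys = [f"{key}#{i % NUM_SALTS}" for i, key in enumerate(rows)]
salted = Counter(partition_for(key) for key in salted_keys)

print("largest partition without salt:", max(unsalted.values()))
print("largest partition with salt:", max(salted.values()))
```

In a real join, the other side of the join would have to be replicated once per salt value so that every salted key still finds its match; that step is omitted here.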

Validation

75%

Warnings & errors only

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Criteria	Description	Result
metadata_version	'metadata' field is not a dictionary	Warning
license_field	'license' field is missing	Warning
frontmatter_unknown_keys	Unknown frontmatter key(s) found; consider removing or moving to metadata	Warning
body_examples	No examples detected (no code fences and no 'Example' wording)	Warning

	Total	12 / 16 Passed
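To make three of the warnings above concrete, here is a minimal sketch of how such checks might look once the frontmatter has been parsed into a dictionary. It is not the Tessl validator: the `KNOWN_KEYS` set, the `check_frontmatter` function, and the sample frontmatter are hypothetical, and only the criterion names are taken from the table.

```python
# Hypothetical set of recognised top-level keys; the real spec may differ.
KNOWN_KEYS = {"name", "description", "license", "metadata"}


def check_frontmatter(frontmatter):
    """Return the names of the criteria that would raise a warning."""
    warnings = []
    # metadata_version: 'metadata' is present but is not a dictionary.
    if "metadata" in frontmatter and not isinstance(frontmatter["metadata"], dict):
        warnings.append("metadata_version")
    # license_field: no 'license' key at the top level.
    if "license" not in frontmatter:
        warnings.append("license_field")
    # frontmatter_unknown_keys: keys outside the recognised set.
    if set(frontmatter) - KNOWN_KEYS:
        warnings.append("frontmatter_unknown_keys")
    return warnings


# Made-up frontmatter that trips all three checks.
sample = {
    "name": "spark-engineer",
    "description": "Use when building Apache Spark applications.",
    "metadata": "1.0",      # a string, not a dictionary
    "triggers": ["Spark"],  # hypothetical unrecognised key
}
print(check_frontmatter(sample))
```

The fourth warning, `body_examples`, concerns the body of SKILL.md rather than the frontmatter, so it is left out of this sketch.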

Reviewed: about 1 month ago
