
spark-engineer

tessl i github:jeffallan/claude-skills --skill spark-engineer
github.com/jeffallan/claude-skills

Use when building Apache Spark applications, distributed data processing pipelines, or optimizing big data workloads. Invoke for DataFrame API, Spark SQL, RDD operations, performance tuning, streaming analytics.

Review Score: 67%
Validation Score: 12/16
Implementation Score: 42%
Activation Score: 90%


Validation

Total: 12/16 passed

Failing criteria:

  • metadata_version: 'metadata' field is not a dictionary
  • license_field: 'license' field is missing
  • frontmatter_unknown_keys: Unknown frontmatter key(s) found; consider removing or moving to metadata
  • body_examples: No examples detected (no code fences and no 'Example' wording)

Implementation

Score: 42%

Overall Assessment

This skill provides good structural organization and clear progressive disclosure into reference materials, but it critically lacks actionable code examples. The content describes Spark best practices at a conceptual level without the executable code, specific commands, or concrete examples that would make it immediately useful. The constraints sections (MUST DO / MUST NOT DO) are valuable but would benefit from accompanying code snippets.

Suggestions

  • Add executable PySpark code examples for common operations (DataFrame creation with schema, broadcast join, handling skew with salting); a sketch of these follows this list
  • Include specific configuration examples with actual values (e.g., spark.sql.shuffle.partitions=400, executor memory settings)
  • Add a concrete example showing Spark UI analysis workflow with specific metrics to check and thresholds
  • Remove or condense the 'Knowledge Reference' keyword list which adds little actionable value
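
To make the first two suggestions concrete, here is a minimal PySpark sketch of what such examples could look like (the session name, sample rows, and salt bucket count are illustrative, not taken from the skill; `spark.sql.shuffle.partitions=400` mirrors the value suggested above):

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, LongType

# Hypothetical session; the shuffle partition count mirrors the suggestion above.
spark = (
    SparkSession.builder
    .appName("spark-engineer-demo")
    .config("spark.sql.shuffle.partitions", "400")
    .getOrCreate()
)

# DataFrame creation with an explicit schema rather than relying on inference.
schema = StructType([
    StructField("user_id", LongType(), nullable=False),
    StructField("country", StringType(), nullable=True),
])
users = spark.createDataFrame([(1, "US"), (2, "DE"), (3, "US")], schema=schema)

# Broadcast join: ship the small dimension table to every executor so the
# large side is never shuffled.
countries = spark.createDataFrame(
    [("US", "United States"), ("DE", "Germany")], ["country", "name"]
)
joined = users.join(F.broadcast(countries), on="country", how="left")

# Skew handling with salting: spread a hot key across N buckets, aggregate
# per bucket, then combine the partial results.
N = 8  # illustrative bucket count
salted = users.withColumn("salt", (F.rand() * N).cast("int"))
partial = salted.groupBy("country", "salt").count()
counts = partial.groupBy("country").agg(F.sum("count").alias("count"))
```

Even two or three blocks like this, placed next to the corresponding MUST DO constraints, would address the actionability gap described below.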
Dimension Scores

Conciseness: 2/3

The skill contains some unnecessary verbosity in the role definition section that repeats information Claude already knows. The constraints and workflow sections are reasonably efficient, but the 'Knowledge Reference' section is essentially a keyword list that adds little value.

Actionability: 1/3

The skill lacks any concrete, executable code examples. It describes what to do at a high level (use DataFrame API, broadcast joins, etc.) but provides no actual Spark code, commands, or copy-paste ready examples. The guidance is abstract rather than instructional.

Workflow Clarity: 2/3

The core workflow provides a clear 5-step sequence, but lacks validation checkpoints and feedback loops. For operations involving production data pipelines, there's no explicit 'validate before proceeding' step or error recovery guidance.
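
For instance, a 'validate before proceeding' checkpoint could be as small as the following sketch (the function name, required columns, and row threshold are hypothetical, not from the skill):

```python
from pyspark.sql import DataFrame

def validate_before_write(df: DataFrame, min_rows: int = 1,
                          required_cols: tuple = ("user_id",)) -> DataFrame:
    """Fail fast before an expensive write; thresholds are illustrative."""
    missing = [c for c in required_cols if c not in df.columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    n = df.count()  # an action: forces one evaluation before the write
    if n < min_rows:
        raise ValueError(f"expected at least {min_rows} rows, got {n}")
    return df

# usage: validate_before_write(result_df).write.mode("overwrite").parquet(path)
```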

Progressive Disclosure: 3/3

The skill effectively uses a reference table to point to detailed guidance in separate files, with clear topic categorization and 'Load When' conditions. This is well-organized one-level-deep progressive disclosure.

Activation

Score: 90%

Overall Assessment

This is a solid skill description with excellent trigger term coverage and clear 'when to use' guidance. The main weakness is that it lists technical concepts rather than concrete actions Claude can perform. Adding specific verbs like 'write', 'optimize', 'debug', or 'migrate' would strengthen the specificity dimension.

Suggestions

  • Add concrete action verbs to describe what Claude does: e.g., 'Write and optimize Spark applications, debug performance issues, migrate RDD code to DataFrame API'
  • Consider adding file extensions or common phrases: '.scala', '.py Spark jobs', 'PySpark', 'spark-submit'
Dimension Scores

Specificity: 2/3

Names the domain (Apache Spark, distributed data processing) and lists some technical areas (DataFrame API, Spark SQL, RDD operations, performance tuning, streaming analytics), but doesn't describe concrete actions like 'write', 'optimize', 'debug', or 'configure'.

Completeness: 3/3

Explicitly answers both what (building Spark applications, distributed data processing, big data workloads) and when ('Use when building...', 'Invoke for...') with clear trigger guidance at the start of the description.

Trigger Term Quality: 3/3

Good coverage of natural terms users would say: 'Spark', 'DataFrame', 'Spark SQL', 'RDD', 'big data', 'streaming analytics', 'performance tuning', 'distributed data processing' - these are terms developers naturally use when working with Spark.

Distinctiveness / Conflict Risk: 3/3

Highly distinctive with Spark-specific terminology (DataFrame API, RDD, Spark SQL) that clearly separates it from general data processing or other big data tools like Hadoop or Flink.