Use when building Apache Spark applications, distributed data processing pipelines, or optimizing big data workloads. Invoke for DataFrame API, Spark SQL, RDD operations, performance tuning, streaming analytics.
Install with the Tessl CLI:

`npx tessl i github:jeffallan/claude-skills --skill spark-engineer`

Overall score — 67%
Does it follow best practices?
If you maintain this skill, you can automatically optimize it using the Tessl CLI to improve its score:

`npx tessl skill review --optimize ./path/to/skill`
Discovery — 89%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a solid skill description with excellent trigger-term coverage and clear "when to use" guidance. The main weakness is that it lists technical concepts rather than concrete actions Claude can perform. Adding specific verbs like "write", "optimize", "debug", or "migrate" would strengthen the specificity dimension.
Suggestions
- Add concrete action verbs to describe what Claude does, e.g. "Write and optimize Spark applications, debug performance issues, migrate RDD code to the DataFrame API"
- Consider adding file extensions or common phrases: '.scala', '.py Spark jobs', 'PySpark', 'spark-submit'
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (Apache Spark, distributed data processing) and lists some technical areas (DataFrame API, Spark SQL, RDD operations, performance tuning, streaming analytics), but doesn't describe concrete actions like 'write', 'optimize', 'debug', or 'configure'. | 2 / 3 |
| Completeness | Explicitly answers both what (building Spark applications, distributed data processing, big data workloads) and when ('Use when building...', 'Invoke for...') with clear trigger guidance at the start of the description. | 3 / 3 |
| Trigger Term Quality | Good coverage of natural terms users would say: 'Spark', 'DataFrame', 'Spark SQL', 'RDD', 'big data', 'streaming analytics', 'performance tuning', 'distributed data processing' - these are terms developers naturally use when working with Spark. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive with Spark-specific terminology (DataFrame API, RDD, Spark SQL) that clearly separates it from general data processing or other big data tools like Hadoop or Flink. | 3 / 3 |
| Total | | 11 / 12 Passed |
Implementation — 42%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill provides good structural organization and clear progressive disclosure to reference materials, but critically lacks actionable code examples. The content describes Spark best practices at a conceptual level without providing the executable code, specific commands, or concrete examples that would make it immediately useful. The constraints sections (MUST DO/MUST NOT DO) are valuable but would benefit from accompanying code snippets.
Suggestions
- Add executable PySpark code examples for common operations (DataFrame creation with an explicit schema, broadcast join, handling skew with salting)
- Include specific configuration examples with actual values (e.g., spark.sql.shuffle.partitions=400, executor memory settings)
- Add a concrete example showing a Spark UI analysis workflow, with specific metrics to check and thresholds
- Remove or condense the 'Knowledge Reference' keyword list, which adds little actionable value
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill contains some unnecessary verbosity in the role definition section that repeats information Claude already knows. The constraints and workflow sections are reasonably efficient, but the 'Knowledge Reference' section is essentially a keyword list that adds little value. | 2 / 3 |
| Actionability | The skill lacks any concrete, executable code examples. It describes what to do at a high level (use the DataFrame API, broadcast joins, etc.) but provides no actual Spark code, commands, or copy-paste-ready examples. The guidance is abstract rather than instructional. | 1 / 3 |
| Workflow Clarity | The core workflow provides a clear 5-step sequence, but lacks validation checkpoints and feedback loops. For operations involving production data pipelines, there's no explicit 'validate before proceeding' step or error recovery guidance. | 2 / 3 |
| Progressive Disclosure | The skill effectively uses a reference table to point to detailed guidance in separate files, with clear topic categorization and 'Load When' conditions. This is well-organized, one-level-deep progressive disclosure. | 3 / 3 |
| Total | | 8 / 12 Passed |
Validation — 75%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 12 / 16 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| metadata_version | 'metadata' field is not a dictionary | Warning |
| license_field | 'license' field is missing | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| body_examples | No examples detected (no code fences and no 'Example' wording) | Warning |
| Total | 12 / 16 Passed | |
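The three frontmatter warnings could be resolved with a SKILL.md header along these lines. This is an illustrative sketch, not the actual file: the `license` value and `metadata` contents are placeholders, and the exact accepted keys depend on the skill spec in force.

```yaml
---
name: spark-engineer
description: Use when building Apache Spark applications, distributed data processing pipelines, or optimizing big data workloads.
license: MIT              # example value; use the repository's actual license
metadata:                 # a dictionary, addressing the metadata_version warning
  version: "1.0.0"        # any formerly unknown top-level keys can move here
---
```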