Use when writing Spark jobs, debugging performance issues, or configuring cluster settings for Apache Spark applications, distributed data processing pipelines, or big data workloads. Invoke to write DataFrame transformations, optimize Spark SQL queries, implement RDD pipelines, tune shuffle operations, configure executor memory, process .parquet files, handle data partitioning, or build structured streaming analytics.
72
88%
Does it follow best practices?
Impact
—
No eval scenarios have been run
Passed
No known issues
Quality
Discovery
100%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong skill description that clearly defines its scope around Apache Spark development and optimization. It excels in specificity by listing numerous concrete actions, provides excellent trigger term coverage with domain-specific keywords users would naturally use, and explicitly addresses both what the skill does and when to invoke it. The description is well-structured, uses third person voice appropriately, and is distinctive enough to avoid conflicts with other skills.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: 'write DataFrame transformations', 'optimize Spark SQL queries', 'implement RDD pipelines', 'tune shuffle operations', 'configure executor memory', 'process .parquet files', 'handle data partitioning', 'build structured streaming analytics'. | 3 / 3 |
| Completeness | Clearly answers both 'what' (write DataFrame transformations, optimize queries, tune shuffle operations, etc.) and 'when' with explicit triggers ('Use when writing Spark jobs, debugging performance issues, or configuring cluster settings'). The 'Use when...' and 'Invoke to...' clauses provide clear guidance. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural terms users would say: 'Spark jobs', 'performance issues', 'cluster settings', 'Apache Spark', 'distributed data processing', 'big data', 'DataFrame', 'Spark SQL', 'RDD', 'shuffle', 'executor memory', '.parquet', 'partitioning', 'structured streaming'. These are terms a user working with Spark would naturally use. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive with a clear niche around Apache Spark specifically. Terms like 'Spark SQL', 'RDD pipelines', 'executor memory', 'shuffle operations', and '.parquet files' are strongly associated with Spark and unlikely to conflict with general data processing or other big data tool skills. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
77%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a strong, well-structured Spark skill with excellent actionability through complete, executable code examples covering key patterns (broadcast joins, skew salting, caching). The workflow includes proper validation checkpoints and feedback loops. Main weaknesses are minor verbosity (role description, knowledge reference section listing concepts Claude knows) and missing bundle files for the referenced guides, which limits the progressive disclosure score.
Suggestions
- Remove the 'Knowledge Reference' section: it lists concepts Claude already knows and adds no actionable guidance.
- Remove or shorten the opening role description ('Senior Apache Spark engineer specializing in...'), as it doesn't provide actionable instruction.
- Create the referenced bundle files (references/spark-sql-dataframes.md, etc.) to fulfill the progressive disclosure promise of the reference table.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Generally efficient but includes some unnecessary content, such as the 'Knowledge Reference' section, which just lists concepts Claude already knows, and the role description at the top ('Senior Apache Spark engineer specializing in...'), which adds no actionable value. The constraints section has some obvious items (e.g., 'Run transformations without understanding lazy evaluation'). However, the code examples are lean and the reference table is well-structured. | 2 / 3 |
| Actionability | Provides fully executable PySpark code examples covering multiple common scenarios (mini-pipeline, broadcast join, skew handling with salting, caching pattern). Each example is copy-paste ready with imports, concrete values, and inline comments explaining intent. The constraints section gives specific thresholds (e.g., '<200MB for broadcast', '200-1000 partitions per executor core'). | 3 / 3 |
| Workflow Clarity | The core workflow has a clear 5-step sequence with an explicit validation checkpoint in step 5 that includes checking Spark UI for shuffle spill, verifying partition counts, and a feedback loop ('if spill or skew detected, return to step 4'). The caching example also includes a validation step ('Materialize immediately; check Spark UI for spill') and cleanup ('unpersist when done'). | 3 / 3 |
| Progressive Disclosure | The reference table with 5 topic-specific files is well-structured with clear 'Load When' guidance, which is excellent design. However, no bundle files were provided, meaning all referenced files (references/spark-sql-dataframes.md, etc.) are missing. The skill inlines a substantial amount of content (code examples, constraints, output templates) that could arguably be split out, though the inline content is useful as a quick reference. | 2 / 3 |
| Total | | 10 / 12 Passed |
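The skill's own examples are not reproduced on this page, so as a rough illustration of the "skew handling with salting" pattern named in the Actionability row, here is a minimal plain-Python sketch (no Spark required). The function names, the salt count, and the sample data are all hypothetical; in a real Spark job the same idea would be applied to DataFrame join keys rather than Python lists.

```python
import random
from collections import Counter

NUM_SALTS = 4  # hypothetical salt count; an assumption, not a value from the skill


def salt_key(key, rng, num_salts=NUM_SALTS):
    """Append a random salt suffix so one hot key spreads over several buckets."""
    return f"{key}_{rng.randrange(num_salts)}"


def explode_small_side(rows, num_salts=NUM_SALTS):
    """Replicate each (key, value) row of the small side once per salt value."""
    return [(f"{key}_{s}", value) for key, value in rows for s in range(num_salts)]


def salted_join(large_rows, small_rows, seed=0, num_salts=NUM_SALTS):
    """Inner-join large_rows to small_rows on salted keys.

    Returns the joined rows plus a count of rows per salted key, which
    stands in for the per-partition load a shuffle would see.
    """
    rng = random.Random(seed)
    lookup = dict(explode_small_side(small_rows, num_salts))
    joined = []
    buckets = Counter()
    for key, payload in large_rows:
        salted = salt_key(key, rng, num_salts)
        buckets[salted] += 1
        if salted in lookup:
            joined.append((key, payload, lookup[salted]))
    return joined, buckets


# Hypothetical skewed input: one "hot" key dominates the large side.
large = [("hot", i) for i in range(1000)] + [("cold", 0)]
small = [("hot", "H"), ("cold", "C")]
joined, buckets = salted_join(large, small)
```

The join result is unchanged by salting (every large-side row still finds its match), but the 1000 "hot" rows are now spread over up to `NUM_SALTS` distinct keys instead of landing in a single bucket.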
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.