Use when writing Spark jobs, debugging performance issues, or configuring cluster settings for Apache Spark applications, distributed data processing pipelines, or big data workloads. Invoke to write DataFrame transformations, optimize Spark SQL queries, implement RDD pipelines, tune shuffle operations, configure executor memory, process .parquet files, handle data partitioning, or build structured streaming analytics.
- Overall score: 91
- Quality: 92% (Does it follow best practices?)
- Impact: 89% (1.05x average score across 6 eval scenarios)
- Validation: Passed, no known issues
Quality
Discovery
100%. Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong skill description that excels across all dimensions. It provides comprehensive, specific actions tied to the Apache Spark ecosystem, includes abundant natural trigger terms that users would actually say, and clearly delineates both what the skill does and when to invoke it. The description is well-structured with a 'Use when' clause followed by an 'Invoke to' clause, and uses proper third-person voice throughout.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: 'write DataFrame transformations', 'optimize Spark SQL queries', 'implement RDD pipelines', 'tune shuffle operations', 'configure executor memory', 'process .parquet files', 'handle data partitioning', 'build structured streaming analytics'. | 3 / 3 |
| Completeness | Clearly answers both 'what' (write DataFrame transformations, optimize queries, implement RDD pipelines, etc.) and 'when' with explicit triggers ('Use when writing Spark jobs, debugging performance issues, or configuring cluster settings'). The 'Use when...' and 'Invoke to...' clauses provide clear guidance. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural terms users would say: 'Spark jobs', 'performance issues', 'cluster settings', 'Apache Spark', 'distributed data processing', 'big data', 'DataFrame', 'Spark SQL', 'RDD', 'shuffle', 'executor memory', '.parquet', 'partitioning', 'structured streaming'. These are terms a user working with Spark would naturally use. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive with a clear niche around Apache Spark specifically. Terms like 'Spark SQL', 'RDD pipelines', 'executor memory', 'shuffle operations', and '.parquet files' are unique to the Spark ecosystem and unlikely to conflict with general data processing or other big data skills. | 3 / 3 |
| Total | | 12 / 12 (Passed) |
Implementation
85%. Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a strong skill with excellent actionability through complete, executable code examples and good progressive disclosure via the reference table. The workflow includes proper validation checkpoints and feedback loops. Minor improvements could be made by trimming the persona description, the 'Knowledge Reference' section, and some constraints that state things Claude already knows.
Suggestions
Remove the 'Knowledge Reference' section at the bottom — Claude already knows these concepts and this wastes tokens.
Trim the opening persona sentence and some obvious MUST NOT items (e.g., 'Run transformations without understanding lazy evaluation') to improve conciseness.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Generally efficient but includes some unnecessary content like the 'Knowledge Reference' section at the bottom (Claude already knows these concepts) and the persona description at the top. The constraints section has some items that are basic Spark knowledge Claude would already have (e.g., 'understand lazy evaluation'). | 2 / 3 |
| Actionability | Provides fully executable PySpark code examples covering multiple common scenarios (mini-pipeline, broadcast join, skew handling with salting, caching pattern). All examples are copy-paste ready with proper imports and realistic patterns. | 3 / 3 |
| Workflow Clarity | The core workflow has clear sequencing with explicit validation checkpoints: checking Spark UI for shuffle spill, verifying partition counts, and a feedback loop ('if spill or skew detected, return to step 4'). Code examples also embed validation steps like printing partition counts before writing. | 3 / 3 |
| Progressive Disclosure | Excellent structure with a clear overview, concise inline examples for common patterns, and a well-organized reference table pointing to one-level-deep topic-specific files with clear 'Load When' guidance for each reference. | 3 / 3 |
| Total | | 11 / 12 (Passed) |
Validation
100%. Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.