Comprehensive guide to Spark Structured Streaming for production workloads. Use when building streaming pipelines, working with Kafka ingestion, implementing Real-Time Mode (RTM), configuring triggers (processingTime, availableNow), handling stateful operations with watermarks, optimizing checkpoints, performing stream-stream or stream-static joins, writing to multiple sinks, or tuning streaming cost and performance.
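The trigger modes the description names (`processingTime`, `availableNow`) can be illustrated with a minimal PySpark sketch. This is an assumption-laden illustration, not content from the skill itself: the helper names are invented, the console sink stands in for a real one, and a live SparkSession is assumed when `start_stream` is actually called.

```python
def trigger_kwargs(mode):
    """Map a friendly mode name to kwargs for DataStreamWriter.trigger()."""
    if mode == "micro_batch":
        # Fire a micro-batch on a fixed cadence.
        return {"processingTime": "10 seconds"}
    if mode == "incremental_batch":
        # Drain all currently available data, then stop (batch-style reprocessing).
        return {"availableNow": True}
    raise ValueError(f"unknown trigger mode: {mode!r}")


def start_stream(df, checkpoint_path, mode="micro_batch"):
    """Sketch: wire the chosen trigger into a streaming write.

    Assumes `df` is a streaming DataFrame and `checkpoint_path` points at
    durable storage; the console sink is a placeholder for a real sink.
    """
    return (
        df.writeStream
        .format("console")
        .option("checkpointLocation", checkpoint_path)
        .trigger(**trigger_kwargs(mode))
        .start()
    )
```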
Score: 95

Quality: 93%. Does it follow best practices? Passed; no known issues.
Impact: Pending; no eval scenarios have been run.

Discovery
100%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an excellent skill description that clearly defines its scope (Spark Structured Streaming for production workloads), provides comprehensive trigger terms covering the domain's key concepts, and includes an explicit 'Use when...' clause with numerous specific scenarios. It uses proper third-person voice and is both thorough and well-organized without being padded with fluff.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description lists multiple specific concrete actions: building streaming pipelines, Kafka ingestion, implementing Real-Time Mode, configuring triggers (with specific types), handling stateful operations with watermarks, optimizing checkpoints, performing stream-stream/stream-static joins, writing to multiple sinks, and tuning cost/performance. | 3 / 3 |
| Completeness | Clearly answers both 'what' (comprehensive guide to Spark Structured Streaming for production workloads) and 'when' with an explicit 'Use when...' clause listing numerous specific trigger scenarios covering the full breadth of the skill. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural terms users would say: 'streaming pipelines', 'Kafka', 'Spark Structured Streaming', 'triggers', 'processingTime', 'availableNow', 'watermarks', 'checkpoints', 'stream-stream joins', 'stream-static joins', 'sinks', 'RTM'. These are terms a developer working with Spark streaming would naturally use. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive with a clear niche in Spark Structured Streaming specifically. The combination of Spark, streaming, Kafka, watermarks, checkpoints, and specific trigger types like processingTime/availableNow makes it very unlikely to conflict with other skills. | 3 / 3 |
| Total | | 12 / 12 Passed |
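The watermark and stateful-aggregation terms credited above can be sketched in a few lines of PySpark. The column names (`event_time`, `user_id`) and window sizes are hypothetical, and the pyspark import is deferred inside the function so the snippet stays inert without a Spark installation:

```python
def windowed_user_counts(events_df):
    """Sketch: stateful windowed aggregation bounded by a watermark.

    Assumes `events_df` is a streaming DataFrame with hypothetical
    `event_time` (timestamp) and `user_id` columns.
    """
    from pyspark.sql import functions as F  # deferred: requires pyspark

    return (
        events_df
        # Events later than 10 minutes behind the watermark are dropped,
        # which also lets Spark purge old aggregation state.
        .withWatermark("event_time", "10 minutes")
        .groupBy(F.window("event_time", "5 minutes"), "user_id")
        .count()
    )
```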
Implementation
87%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured skill that excels as a navigation hub for Spark Structured Streaming. It provides an actionable quick-start example, a useful production checklist, and clear references to detailed sub-topics. The main weakness is the lack of explicit validation/verification steps in the workflow, particularly around checkpoint setup and stream health verification before going to production.
Suggestions
Add a brief validation workflow section (e.g., 'After starting a stream: verify checkpoint writes, check streaming query progress via query.lastProgress, confirm no data loss') to improve workflow clarity.
Consider adding a 2-3 step 'troubleshooting' or 'common errors' quick reference inline, since streaming failures (checkpoint corruption, schema evolution issues) are common and benefit from immediate guidance.
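The health check proposed in the first suggestion could be sketched as a small helper over the progress report a running query exposes via `query.lastProgress`. The keep-up heuristic below is an illustration, not an official API; the field names follow the standard streaming progress JSON (`inputRowsPerSecond`, `processedRowsPerSecond`):

```python
def looks_healthy(progress):
    """Crude post-start health check over a StreamingQuery.lastProgress dict.

    Heuristic sketch: the query is 'keeping up' when it processes rows at
    least as fast as they arrive in the most recent micro-batch.
    """
    if not progress:
        return False  # no micro-batch has completed yet
    in_rate = progress.get("inputRowsPerSecond", 0.0)
    out_rate = progress.get("processedRowsPerSecond", 0.0)
    return out_rate >= in_rate
```

In practice this would be polled a few times after `query.start()` before declaring the pipeline production-ready.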
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The content is lean and efficient. It provides a quick-start code example without explaining what Kafka or Spark Streaming is, assumes Claude's competence, and uses tables for navigation rather than verbose prose. Every section earns its place. | 3 / 3 |
| Actionability | The quick-start example is fully executable, copy-paste ready PySpark code covering Kafka-to-Delta streaming. The production checklist provides specific, concrete guidance (e.g., 'UC volumes, not DBFS', 'fixed-size cluster, no autoscaling'). Detailed patterns are delegated to referenced files. | 3 / 3 |
| Workflow Clarity | The quick-start shows a single pipeline setup but lacks explicit validation steps or error handling checkpoints. The production checklist is helpful but is a static list rather than a sequenced workflow with feedback loops. For streaming pipelines that involve destructive/batch operations, the absence of validate-fix-retry patterns caps this at 2. | 2 / 3 |
| Progressive Disclosure | Excellent progressive disclosure structure: a concise overview with quick-start code, then well-organized tables pointing to one-level-deep reference files for each major topic (kafka-streaming.md, checkpoint-best-practices.md, etc.). Navigation is clear and well-signaled. | 3 / 3 |
| Total | | 11 / 12 Passed |
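The Kafka-to-Delta quick start the Actionability row describes might look roughly like the sketch below. Every option value is a placeholder, the parameter names are invented for illustration, and the Kafka and Delta connectors are assumed to be on the classpath of a live SparkSession:

```python
def kafka_to_delta(spark, bootstrap_servers, topic, table_path, checkpoint_path):
    """Sketch of a Kafka-to-Delta streaming pipeline (placeholder values)."""
    raw = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .option("startingOffsets", "earliest")
        .load()
    )
    # Kafka delivers key/value as binary; cast to strings for downstream use.
    parsed = raw.selectExpr(
        "CAST(key AS STRING)", "CAST(value AS STRING)", "timestamp"
    )
    return (
        parsed.writeStream
        .format("delta")
        # Checkpoint on durable storage, per the production checklist.
        .option("checkpointLocation", checkpoint_path)
        .trigger(processingTime="30 seconds")
        .start(table_path)
    )
```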
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.