databricks-spark-structured-streaming

Comprehensive guide to Spark Structured Streaming for production workloads. Use when building streaming pipelines, working with Kafka ingestion, implementing Real-Time Mode (RTM), configuring triggers (processingTime, availableNow), handling stateful operations with watermarks, optimizing checkpoints, performing stream-stream or stream-static joins, writing to multiple sinks, or tuning streaming cost and performance.

Quality

93%

Does it follow best practices?

Impact

—

No eval scenarios have been run

Securityby

Passed

No known issues

Quality

Discovery

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is an excellent skill description that clearly defines its scope around Spark Structured Streaming for production workloads. It provides comprehensive trigger terms covering the full breadth of streaming concepts, includes an explicit 'Use when...' clause with numerous specific scenarios, and is highly distinctive from other potential skills. The description is well-structured, uses third person voice appropriately, and balances conciseness with thoroughness.

Dimension	Reasoning	Score
Specificity	Lists multiple specific concrete actions: building streaming pipelines, Kafka ingestion, implementing Real-Time Mode, configuring triggers, handling stateful operations with watermarks, optimizing checkpoints, performing stream-stream/stream-static joins, writing to multiple sinks, and tuning cost/performance.	3 / 3
Completeness	Clearly answers both 'what' (comprehensive guide to Spark Structured Streaming for production workloads) and 'when' with an explicit 'Use when...' clause listing numerous specific trigger scenarios.	3 / 3
Trigger Term Quality	Excellent coverage of natural terms users would say: 'Spark Structured Streaming', 'streaming pipelines', 'Kafka ingestion', 'RTM', 'processingTime', 'availableNow', 'watermarks', 'checkpoints', 'stream-stream joins', 'stream-static joins', 'multiple sinks', 'streaming cost and performance'. These are terms a developer would naturally use.	3 / 3
Distinctiveness Conflict Risk	Highly distinctive with a clear niche in Spark Structured Streaming. The specific mentions of Kafka, RTM, watermarks, checkpoints, and streaming joins make it very unlikely to conflict with other skills like general Spark batch processing or generic data pipeline skills.	3 / 3
	Total	12 / 12 Passed

Implementation

87%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-structured navigation skill that efficiently provides a quick-start example and organizes detailed content across clearly referenced sub-files. Its main weakness is the lack of validation steps or feedback loops in the quick-start workflow (e.g., checking stream status, handling failures). The production checklist is a nice touch but could benefit from being more actionable with specific commands or code snippets.

Suggestions

Add a brief validation step after the quick-start example, such as checking stream status with `query.status` or `query.lastProgress` to verify the pipeline is running correctly.

Consider adding a 1-2 line error recovery note (e.g., what to do if the stream fails to start or checkpoint is corrupted) to improve workflow clarity.

Dimension	Reasoning	Score
Conciseness	The content is lean and efficient. It provides a quick-start code example without explaining what Kafka or Spark Streaming is, assumes Claude's competence, and uses tables for navigation rather than verbose prose. Every section earns its place.	3 / 3
Actionability	The quick-start example is fully executable, copy-paste ready PySpark code covering Kafka-to-Delta streaming with proper options. The production checklist provides specific, concrete guidance (e.g., 'UC volumes, not DBFS', 'fixed-size cluster, no autoscaling').	3 / 3
Workflow Clarity	The skill is primarily a navigation hub, so multi-step workflows are delegated to referenced files. However, the production checklist lacks sequencing and there are no validation/verification steps or feedback loops for the quick-start streaming pipeline (e.g., how to verify the stream is running correctly, error handling).	2 / 3
Progressive Disclosure	Excellent progressive disclosure structure: concise overview with quick-start code, then well-organized tables pointing to one-level-deep references for each topic area (kafka-streaming.md, stream-stream-joins.md, etc.). Navigation is clear and well-signaled.	3 / 3
	Total	11 / 12 Passed

Validation

100%

Warnings & errors only

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Repository: databricks-solutions/ai-dev-kit
Commit: 93cb4e3

Reviewed: 1 day ago

Table of Contents

Discovery Implementation Validation

Is this your skill?

If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.