spark-python-data-source

Build custom Python data sources for Apache Spark using the PySpark DataSource API — batch and streaming readers/writers for external systems. Use this skill whenever someone wants to connect Spark to an external system (database, API, message queue, custom protocol), build a Spark connector or plugin in Python, implement a DataSourceReader or DataSourceWriter, pull data from or push data to a system via Spark, or work with the PySpark DataSource API in any way. Even if they just say "read from X in Spark" or "write DataFrame to Y" and there's no native connector, this skill applies.

Quality

86%

Does it follow best practices?

Impact

—

No eval scenarios have been run

Security

Advisory

Suggest reviewing before use

Quality

Discovery

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is an excellent skill description that clearly defines its scope (custom PySpark DataSource API connectors), lists concrete actions, provides explicit trigger guidance with a 'Use this skill whenever...' clause, and includes natural language examples of user requests. It uses proper third-person voice throughout and covers both technical and casual phrasings that users might employ.

Dimension	Reasoning	Score
Specificity	Lists multiple specific concrete actions: 'Build custom Python data sources', 'batch and streaming readers/writers', 'connect Spark to an external system (database, API, message queue, custom protocol)', 'implement a DataSourceReader or DataSourceWriter', 'pull data from or push data to a system via Spark'.	3 / 3
Completeness	Clearly answers both 'what' (build custom Python data sources for Spark with batch/streaming readers/writers) and 'when' with an explicit 'Use this skill whenever...' clause listing multiple trigger scenarios, plus natural language examples like 'read from X in Spark'.	3 / 3
Trigger Term Quality	Excellent coverage of natural terms users would say: 'Spark connector', 'plugin', 'read from X in Spark', 'write DataFrame to Y', 'PySpark DataSource API', 'database, API, message queue', 'external system', 'DataSourceReader', 'DataSourceWriter'. Covers both technical and casual phrasings.	3 / 3
Distinctiveness / Conflict Risk	Highly distinctive niche: PySpark DataSource API for custom connectors. The description is specific enough to avoid conflicting with general Spark usage skills, general Python skills, or generic database skills. The focus on building custom data sources/connectors is a clear differentiator.	3 / 3
	Total	12 / 12 Passed

Implementation

72%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-structured skill that excels at progressive disclosure and conciseness, providing a clear architectural overview without over-explaining concepts Claude already knows. Its main weakness is that actionability depends heavily on reference files that aren't provided in the bundle, meaning the SKILL.md alone gives limited executable guidance for the core task of implementing a data source. The workflow could benefit from an explicit step-by-step sequence with validation checkpoints.

Suggestions

Add a minimal but complete executable example of a simple DataSource implementation (e.g., a trivial in-memory or CSV-based source) directly in SKILL.md so the core pattern is actionable without consulting references.

Add an explicit end-to-end workflow section with numbered steps and validation checkpoints, e.g.: 1. Create project → 2. Implement DataSource class → 3. Register with Spark → 4. Verify with spark.read.format('your-format').load() → 5. Run tests → 6. Build wheel.
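As an illustration of what the first suggestion asks for, here is a minimal sketch of an in-memory data source. It is not taken from the skill itself: `CounterDataSource`, `CounterReader`, the `counter` format name, and the `count` option are hypothetical names chosen for the example. It assumes a PySpark release that ships `pyspark.sql.datasource` (Spark 4.0 or later, to the best of our knowledge); the `ImportError` fallback exists only so the reader logic can be exercised without Spark installed.

```python
try:
    from pyspark.sql.datasource import DataSource, DataSourceReader, InputPartition
except ImportError:
    # Stand-ins (assumption: used only when PySpark is not installed) so the
    # reader logic below can still be unit-tested in plain Python.
    class DataSource:
        def __init__(self, options):
            self.options = options

    class DataSourceReader:
        pass

    class InputPartition:
        def __init__(self, value):
            self.value = value


class CounterReader(DataSourceReader):
    """Hypothetical reader that emits the integers 0..count-1."""

    def __init__(self, count, num_partitions=2):
        # Keep only plain values here: the reader is shipped to executors.
        self.count = count
        self.num_partitions = num_partitions

    def partitions(self):
        # Split [0, count) into contiguous ranges, one per partition.
        step = max(1, -(-self.count // self.num_partitions))  # ceiling division
        return [
            InputPartition((start, min(start + step, self.count)))
            for start in range(0, self.count, step)
        ]

    def read(self, partition):
        # Runs on an executor; yields tuples that match the declared schema.
        start, stop = partition.value
        for i in range(start, stop):
            yield (i, f"row-{i}")


class CounterDataSource(DataSource):
    """Hypothetical in-memory source, for illustration only."""

    @classmethod
    def name(cls):
        return "counter"  # short name passed to spark.read.format(...)

    def schema(self):
        return "id INT, label STRING"  # schema as a DDL string

    def reader(self, schema):
        return CounterReader(int(self.options.get("count", "4")))


# With a live SparkSession named `spark`, steps 3 and 4 of the suggested
# workflow would look roughly like this (not executed here):
#   spark.dataSource.register(CounterDataSource)
#   spark.read.format("counter").option("count", "4").load().show()
```

Keeping the partitioning and row generation in plain methods means they can be checked in a unit test before any Spark session is involved, which fits the "run tests" checkpoint in the second suggestion.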

Dimension	Reasoning	Score
Conciseness	The content is lean and efficient. It avoids explaining basic Spark or Python concepts, focuses on PySpark DataSource API-specific constraints (serialization, executor imports, flat inheritance), and every section earns its place. The design constraints section explicitly states it won't repeat general Python best practices.	3 / 3
Actionability	The skill provides some concrete guidance — project setup commands, a test example, and directory structure — but the core implementation details are deferred to reference files (implementation-template.md, partitioning-patterns.md, etc.) which are not provided. The main SKILL.md itself lacks executable code for the actual data source implementation, relying on references that can't be evaluated.	2 / 3
Workflow Clarity	The core architecture section outlines the four-class structure clearly, and the project setup provides a clear sequence. However, there's no explicit end-to-end workflow with validation checkpoints — no 'build → register → test → validate → deploy' sequence with feedback loops. For a multi-step process like building a data source connector, the lack of explicit validation steps (e.g., verify registration, test serialization) is a gap.	2 / 3
Progressive Disclosure	Excellent progressive disclosure structure. The SKILL.md provides a clear overview with well-signaled, one-level-deep references to specific topics (implementation-template, partitioning-patterns, authentication-patterns, etc.). Each reference includes a brief description of when to consult it. The References section at the bottom provides a comprehensive index with contextual guidance on when each file is relevant.	3 / 3
	Total	10 / 12 Passed
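The "test serialization" checkpoint mentioned under Workflow Clarity could be approximated with a small helper like the one below. This is a sketch under assumptions, not part of the reviewed skill: `unpicklable_attrs`, `GoodReader`, and `BadReader` are hypothetical names, and the standard `pickle` module is used as a conservative stand-in for whatever serializer Spark applies when shipping reader objects to executors.

```python
import pickle
import threading


def unpicklable_attrs(obj):
    """Return the names of instance attributes that fail a pickle round trip."""
    bad = []
    for name, value in vars(obj).items():
        try:
            pickle.loads(pickle.dumps(value))
        except Exception:
            bad.append(name)
    return bad


class GoodReader:
    """Hypothetical reader that stores only plain configuration values."""

    def __init__(self, url, batch_size):
        self.url = url
        self.batch_size = batch_size


class BadReader:
    """Hypothetical reader that wrongly keeps a live resource on the instance."""

    def __init__(self, url):
        self.url = url
        self.lock = threading.Lock()  # locks cannot be pickled


print(unpicklable_attrs(GoodReader("https://example.invalid", 100)))
print(unpicklable_attrs(BadReader("https://example.invalid")))
```

A check like this names the offending attribute directly, so a connection, lock, or client object stored in `__init__` is caught in a unit test rather than at job submission time.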

Validation

100%

Warnings & errors only

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Repository: databricks-solutions/ai-dev-kit
Commit: 93cb4e3

Reviewed: 1 day ago

