spark-python-data-source

Build custom Python data sources for Apache Spark using the PySpark DataSource API — batch and streaming readers/writers for external systems. Use this skill whenever someone wants to connect Spark to an external system (database, API, message queue, custom protocol), build a Spark connector or plugin in Python, implement a DataSourceReader or DataSourceWriter, pull data from or push data to a system via Spark, or work with the PySpark DataSource API in any way. Even if they just say "read from X in Spark" or "write DataFrame to Y" and there's no native connector, this skill applies.

95

Quality: 93%
Does it follow best practices?

Impact: Pending
No eval scenarios have been run.

Security by Snyk: Advisory
Suggest reviewing before use.


Quality

Discovery: 100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is an excellent skill description that clearly defines its scope (custom PySpark DataSource API connectors), lists concrete actions and use cases, and provides explicit trigger guidance with natural language examples. It uses proper third-person voice throughout and covers both common and edge-case trigger scenarios. The description is comprehensive without being unnecessarily verbose.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Lists multiple specific concrete actions: 'Build custom Python data sources', 'batch and streaming readers/writers', 'connect Spark to an external system (database, API, message queue, custom protocol)', 'implement a DataSourceReader or DataSourceWriter', 'pull data from or push data to a system via Spark'. | 3 / 3 |
| Completeness | Clearly answers both 'what' (build custom Python data sources for Spark with batch/streaming readers/writers) and 'when' with an explicit 'Use this skill whenever...' clause listing multiple trigger scenarios, plus natural language examples like 'read from X in Spark'. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural terms users would say: 'Spark connector', 'plugin', 'read from X in Spark', 'write DataFrame to Y', 'PySpark DataSource API', 'database, API, message queue', 'external system'. These match how users naturally phrase requests. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive niche: PySpark DataSource API for custom connectors. The specificity of 'DataSource API', 'DataSourceReader', 'DataSourceWriter', and the Python+Spark combination makes it very unlikely to conflict with general Spark skills or general Python skills. | 3 / 3 |
| Total | | 12 / 12 |

Passed

Implementation: 87%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-crafted skill that efficiently communicates PySpark DataSource API-specific knowledge without over-explaining basics. Its strengths are excellent progressive disclosure with clearly signaled references, actionable code examples, and concise Spark-specific design constraints. The main weakness is the lack of an explicit end-to-end implementation workflow with validation checkpoints between steps.

Suggestions

Add an explicit numbered workflow section (e.g., 'Building a Data Source Step-by-Step') with validation checkpoints such as: implement DataSource class → verify registration with spark.read.format() → implement reader → run unit tests → implement writer → run integration tests → build wheel.
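The 'implement reader → run unit tests' checkpoint in that suggested workflow can be exercised without a SparkSession at all, since a reader is plain Python. A sketch (the reader class and row values are hypothetical):

```python
class FakeReader:
    """Stand-in for a DataSourceReader subclass; yields fixed rows."""
    def read(self, partition):
        yield ("alice", 1)
        yield ("bob", 2)

def test_reader_yields_expected_rows():
    # Calling read() directly means this checkpoint needs no Spark cluster.
    rows = list(FakeReader().read(None))
    assert rows == [("alice", 1), ("bob", 2)]

test_reader_yields_expected_rows()
```

The later checkpoints (registration via `spark.read.format()`, integration tests, wheel build) then cover the pieces that do require a running session.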

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The content is lean and efficient. It avoids explaining basic Spark or Python concepts, focuses on PySpark DataSource API-specific constraints (serialization, executor imports, flat inheritance), and every section earns its place. The design constraints section explicitly states it won't repeat general Python best practices. | 3 / 3 |
| Actionability | Provides executable code for project setup (bash commands), directory structure, test examples with real pytest fixtures and mocking patterns, and concrete commands for running tests/linting. The design constraints are specific and actionable (e.g., 'import third-party libraries inside executor methods'). | 3 / 3 |
| Workflow Clarity | The skill covers project setup and testing but lacks an explicit end-to-end workflow with validation checkpoints. There's no clear sequence like 'implement → validate schema → test locally → build → deploy' with verification steps. For a skill involving building connectors that interact with external systems, explicit validation/verification steps would strengthen the workflow. | 2 / 3 |
| Progressive Disclosure | Excellent progressive disclosure with a clear overview in the main file and well-signaled, one-level-deep references to 8 separate reference files. Each reference includes a brief description of when to read it. The References section at the bottom provides a complete navigation guide with context for each file. | 3 / 3 |
| Total | | 11 / 12 |

Passed
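The 'import third-party libraries inside executor methods' constraint praised above can be sketched in plain Python. The `JsonApiReader` name is hypothetical, and the stdlib `json` module stands in for a real third-party client so the example is self-contained:

```python
import pickle

class JsonApiReader:
    """Stand-in for a DataSourceReader; stores only picklable state."""
    def __init__(self, url):
        self.url = url  # plain string: safe to ship to executors

    def read(self, partition):
        # Deferred import: resolved on the executor, so the driver never
        # needs the dependency and the instance stays picklable.
        import json  # stand-in for a third-party client library
        yield (json.dumps({"url": self.url}),)

# Spark pickles reader instances to ship them to executors; this
# round-trip mimics that and succeeds because no client object
# (connection, session, etc.) is stored on the instance.
reader = pickle.loads(pickle.dumps(JsonApiReader("https://example.com")))
```

Keeping unpicklable objects out of instance attributes, and constructing them lazily inside `read()`, is what the serialization constraint is guarding against.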

Validation: 100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 passed

Validation for skill structure: no warnings or errors.

Repository: databricks-solutions/ai-dev-kit (Reviewed)

