Build custom Python data sources for Apache Spark using the PySpark DataSource API — batch and streaming readers/writers for external systems. Use this skill whenever someone wants to connect Spark to an external system (database, API, message queue, custom protocol), build a Spark connector or plugin in Python, implement a DataSourceReader or DataSourceWriter, pull data from or push data to a system via Spark, or work with the PySpark DataSource API in any way. Even if they just say "read from X in Spark" or "write DataFrame to Y" and there's no native connector, this skill applies.
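For orientation, a minimal sketch of the writer half of that API ("write DataFrame to Y"), assuming the pyspark.sql.datasource module introduced in PySpark 4.0; the SinkDataSource class, the 'mysink' format name, and the stubbed-out push are hypothetical illustrations, not part of the skill itself:

```python
from pyspark.sql.datasource import DataSource, DataSourceWriter, WriterCommitMessage

class SinkWriter(DataSourceWriter):
    def write(self, iterator):
        # Runs once per partition on the executors; push each row to the external system.
        for row in iterator:
            pass  # e.g. POST row.asDict() to the target API (stubbed out here)
        return WriterCommitMessage()

class SinkDataSource(DataSource):
    @classmethod
    def name(cls):
        return "mysink"  # hypothetical format name

    def writer(self, schema, overwrite):
        return SinkWriter()

# With an active SparkSession `spark` and a DataFrame `df`:
#   spark.dataSource.register(SinkDataSource)
#   df.write.format("mysink").mode("append").save()
```

The batch-reader half of the API is sketched under Suggestions below.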
Quality: 86%
Does it follow best practices?

Impact: Pending
No eval scenarios have been run.

Advisory
Suggest reviewing before use.

Quality

Discovery: 100%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an excellent skill description that clearly defines its niche (custom PySpark DataSource API connectors), lists concrete actions and capabilities, and provides comprehensive trigger guidance with both technical terms and natural language examples. It uses proper third-person voice throughout and covers edge cases like when no native connector exists.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific, concrete actions: 'Build custom Python data sources', 'batch and streaming readers/writers', 'connect Spark to an external system (database, API, message queue, custom protocol)', 'implement a DataSourceReader or DataSourceWriter', 'pull data from or push data to a system via Spark'. | 3 / 3 |
| Completeness | Clearly answers both 'what' (build custom Python data sources for Spark with batch/streaming readers/writers) and 'when', with an explicit 'Use this skill whenever...' clause listing multiple trigger scenarios, plus natural-language examples like 'read from X in Spark'. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of terms users would naturally say: 'Spark connector', 'plugin', 'read from X in Spark', 'write DataFrame to Y', 'PySpark DataSource API', 'database, API, message queue', 'external system', 'DataSourceReader', 'DataSourceWriter'. Covers both technical and casual phrasings. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive niche: the PySpark DataSource API for custom connectors. The specificity of 'DataSource API', 'DataSourceReader', 'DataSourceWriter', and the Python + Spark combination makes conflicts with general data-processing or generic Spark skills very unlikely. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation: 72%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured skill that excels at progressive disclosure and conciseness, providing Spark DataSource API-specific constraints and patterns without over-explaining. Its main weaknesses are the lack of an inline minimal working example of a DataSource class (the core artifact) and the absence of explicit validation checkpoints in the workflow. The reference architecture is excellent but the main file could be more self-contained for the happy path.
Suggestions

- Add a minimal but complete inline DataSource class example (even 15-20 lines) showing the simplest possible batch reader, so the skill is actionable without opening implementation-template.md. A sketch of what such an example might look like follows this list.
- Add an explicit end-to-end workflow with validation steps, e.g.: 1) create the project → 2) implement the DataSource → 3) register it with spark.dataSource.register() → 4) verify with spark.read.format('your-format').load() → 5) run tests.
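To make the first suggestion concrete, here is a minimal sketch of such an inline example, assuming the pyspark.sql.datasource module from PySpark 4.0; the DemoDataSource class, the 'demo' format name, and the generated rows are hypothetical rather than taken from the skill under review:

```python
from pyspark.sql import SparkSession
from pyspark.sql.datasource import DataSource, DataSourceReader

class DemoReader(DataSourceReader):
    def read(self, partition):
        # Yield tuples matching the schema declared by the data source.
        for i in range(3):
            yield (i, f"row-{i}")

class DemoDataSource(DataSource):
    @classmethod
    def name(cls):
        return "demo"  # hypothetical format name

    def schema(self):
        return "id INT, value STRING"

    def reader(self, schema):
        return DemoReader()

spark = SparkSession.builder.getOrCreate()
spark.dataSource.register(DemoDataSource)  # step 3 of the suggested workflow
spark.read.format("demo").load().show()    # step 4: verify end to end
```

At roughly twenty lines this fits the size the suggestion asks for, and running the final two lines doubles as the registration and verification checkpoints from the second suggestion.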
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The content is lean and efficient. It avoids explaining general Python or Spark concepts Claude already knows, and explicitly states that it won't repeat general best practices. Every section provides Spark DataSource API-specific knowledge that earns its place. | 3 / 3 |
| Actionability | The skill provides concrete project-setup commands, a directory structure, and a testing example with executable code. However, the core implementation guidance is delegated to reference files (implementation-template.md) rather than including even a minimal inline executable example of a DataSource class, leaving the main skill somewhat incomplete on its own. | 2 / 3 |
| Workflow Clarity | The core architecture section outlines a clear four-part structure, and the project setup provides sequenced commands. However, there is no explicit end-to-end workflow with validation checkpoints — e.g., no step to verify that the data source registers correctly, no feedback loop for debugging serialization issues — and the 'Key Implementation Decisions' section reads more like a menu than a sequenced process. | 2 / 3 |
| Progressive Disclosure | Excellent progressive disclosure, with a clear overview in the main file and well-signaled, one-level-deep references to 7+ specialized reference files. Each reference includes a brief description of when to read it, making navigation easy and purposeful. | 3 / 3 |
| Total | | 10 / 12 Passed |
Validation: 100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
11 / 11 checks passed: validation for skill structure. No warnings or errors.