Distributed computing for larger-than-RAM pandas/NumPy workflows. Use when you need to scale existing pandas/NumPy code beyond memory or across clusters. Best for parallel file processing, distributed ML, and integration with existing pandas code. For out-of-core analytics on a single machine, use vaex; for in-memory speed, use polars.
Overall score: 88
Impact: 91% (1.71x average score across 3 eval scenarios)
Advisory: Suggest reviewing before use

Quality: 86% (Does it follow best practices?)

Discovery: 100%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an excellent skill description that clearly defines its purpose, provides explicit trigger conditions, and proactively distinguishes itself from related tools. The inclusion of when NOT to use it (preferring vaex or polars in specific scenarios) is particularly strong for reducing conflict risk. The description uses proper third-person voice and is concise yet comprehensive.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific, concrete actions and use cases: 'larger-than-RAM pandas/NumPy workflows', 'parallel file processing', 'distributed ML', 'integration with existing pandas code', and even contrasts with alternatives (vaex, polars) for different scenarios. | 3 / 3 |
| Completeness | Clearly answers both 'what' (distributed computing for larger-than-RAM pandas/NumPy workflows) and 'when' ('Use when you need to scale existing pandas/NumPy code beyond memory or across clusters'), with explicit trigger guidance and even negative triggers distinguishing it from vaex and polars. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural terms users would say: 'pandas', 'NumPy', 'larger-than-RAM', 'distributed computing', 'clusters', 'parallel file processing', 'distributed ML', 'out-of-core', 'scale', plus mentions of alternative tools (vaex, polars), which helps with disambiguation. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive by explicitly carving out its niche (distributed/larger-than-RAM pandas/NumPy) and differentiating from competing skills (vaex for out-of-core single machine, polars for in-memory speed), making it very unlikely to conflict with other data processing skills. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation: 72%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured skill with strong actionability and excellent progressive disclosure to reference files. Its main weaknesses are moderate verbosity (redundant 'when to use' guidance, explanatory content Claude doesn't need) and a lack of explicit validation/feedback loops in multi-step workflows. The code examples are consistently high quality and executable.
Suggestions
Remove or significantly condense the 'When to Use This Skill' section and per-component 'When to Use' blocks — the description metadata and quick examples already convey this; merge the decision guide into a single compact table.
Add explicit validation checkpoints to workflow patterns (e.g., check task graph size before compute, verify partition count after reading, validate output row counts after ETL).
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is reasonably well organized but includes unnecessary explanatory content Claude already knows (e.g., the 'When to Use This Skill' section largely restates the overview, the 'Selecting the Right Component' decision guide is somewhat redundant with the per-section 'When to Use' blocks, and the overview explains what Dask is). The 'Start with Simpler Solutions' section is also padding. Could be tightened significantly. | 2 / 3 |
| Actionability | All code examples are concrete, executable, and copy-paste ready. The anti-patterns in Best Practices show both wrong and correct approaches with real code. Quick examples for each component are functional and demonstrate actual API usage patterns. | 3 / 3 |
| Workflow Clarity | The debugging/development workflow has a clear 3-step progression (synchronous → threads → distributed), and the ETL pipeline patterns are well sequenced. However, there are no explicit validation checkpoints or feedback loops for error recovery in the workflow patterns. The Common Issues section lists problems but doesn't integrate them as validation steps within workflows. | 2 / 3 |
| Progressive Disclosure | Excellent progressive disclosure structure: the main file provides concise overviews with quick examples for each component, then clearly signals one-level-deep references (references/dataframes.md, references/arrays.md, etc.) with descriptions of what each contains. Navigation is clear and well organized with a dedicated Reference Files section. | 3 / 3 |
| Total | | 10 / 12 Passed |
Validation: 90%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| metadata_version | 'metadata.version' is missing | Warning |
| Total | | 10 / 11 Passed |