Distributed computing for larger-than-RAM pandas/NumPy workflows. Use when you need to scale existing pandas/NumPy code beyond memory or across clusters. Best for parallel file processing, distributed ML, integration with existing pandas code. For out-of-core analytics on single machine use vaex; for in-memory speed use polars.
Overall score: 83
Does it follow best practices? 78%
Impact: 91% (1.71x average score across 3 eval scenarios)
Advisory: suggest reviewing before use
Quality
Discovery
100% — Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an excellent skill description that clearly communicates what the skill does (distributed computing for scaling pandas/NumPy), when to use it (beyond memory limits or across clusters), and importantly when NOT to use it (providing alternatives like vaex and polars). The description is concise, uses third-person voice, and includes rich natural trigger terms that users would actually use.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific, concrete actions and use cases: 'larger-than-RAM pandas/NumPy workflows', 'parallel file processing', 'distributed ML', 'integration with existing pandas code'. Also provides differentiation against alternatives (vaex, polars). | 3 / 3 |
| Completeness | Clearly answers both 'what' (distributed computing for larger-than-RAM pandas/NumPy workflows) and 'when' ('Use when you need to scale existing pandas/NumPy code beyond memory or across clusters'). Also includes negative triggers distinguishing from vaex and polars use cases. | 3 / 3 |
| Trigger Term Quality | Includes strong natural keywords users would say: 'pandas', 'NumPy', 'distributed computing', 'larger-than-RAM', 'memory', 'clusters', 'parallel file processing', 'distributed ML', 'out-of-core', 'polars', 'vaex'. These cover many natural ways a user might describe their need. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive by explicitly carving out its niche (distributed/larger-than-RAM pandas/NumPy) and differentiating from related tools (vaex for out-of-core single machine, polars for in-memory speed). This makes it very unlikely to conflict with skills for those other tools. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
57% — Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
The skill is well-structured with excellent progressive disclosure and highly actionable code examples covering all five Dask components. Its main weakness is significant verbosity — the overview, 'when to use' sections, component descriptions, and decision guides contain substantial information Claude already knows, roughly doubling the token cost without proportional value. Workflow clarity would benefit from explicit validation checkpoints in multi-step processes.
Suggestions
- Cut the 'Overview', 'When to Use This Skill', and 'Selecting the Right Component' sections significantly — Claude already knows what Dask is and can infer component selection from the quick examples and reference pointers.
- Remove or drastically shorten the 'When to Use' and 'Key Points' subsections under each component — these restate information Claude already knows about parallel computing concepts.
- Add explicit validation checkpoints to the ETL and development workflows (e.g., 'Verify row count after filtering: `len(ddf)` before proceeding to aggregation').
- Consolidate the 'Integration Considerations' section into the relevant reference files rather than including it in the main skill body.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is extremely verbose at ~400+ lines. It explains what Dask is, when to use each component, and includes extensive decision guides and integration notes that Claude already knows. The 'Overview' section explains basic concepts, 'When to Use This Skill' restates the description, and sections like 'Selecting the Right Component' and 'Integration Considerations' contain information Claude can infer. Much of this could be cut in half while preserving all actionable content. | 1 / 3 |
| Actionability | The skill provides fully executable, copy-paste ready code examples throughout — reading CSVs, array operations, bag processing, futures, scheduler selection, ETL pipelines, and debugging workflows. Anti-patterns are shown with correct alternatives (e.g., don't load locally then hand to Dask). All examples are concrete and runnable. | 3 / 3 |
| Workflow Clarity | The debugging/development workflow has a clear 3-step sequence (synchronous → threads → distributed), and the ETL pipeline shows extract/transform/load steps. However, there are no explicit validation checkpoints or feedback loops — no 'verify the output before proceeding' steps, no error recovery guidance in the workflows, and the common issues section is reactive rather than integrated into workflows. | 2 / 3 |
| Progressive Disclosure | The skill has excellent progressive disclosure with a clear overview structure and well-signaled one-level-deep references to six specific reference files (dataframes.md, arrays.md, bags.md, futures.md, schedulers.md, best-practices.md). Each component section provides a quick example inline and points to the detailed reference for comprehensive guidance. | 3 / 3 |
| Total | | 9 / 12 Passed |
Validation
90% — Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| metadata_version | 'metadata.version' is missing | Warning |
| Total | | 10 / 11 Passed |
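Clearing the `metadata_version` warning would mean adding a `metadata.version` field to the skill's frontmatter. A hedged sketch follows — the surrounding field names and nesting are assumptions about the SKILL.md schema, so check the Tessl skill spec for the exact layout:

```yaml
---
name: dask
description: Distributed computing for larger-than-RAM pandas/NumPy workflows. ...
metadata:
  version: 1.0.0   # assumed field; the warning names 'metadata.version'
---
```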