Distributed computing for larger-than-RAM pandas/NumPy workflows. Use when you need to scale existing pandas/NumPy code beyond memory or across clusters. Best for parallel file processing, distributed ML, integration with existing pandas code. For out-of-core analytics on single machine use vaex; for in-memory speed use polars.
Overall: 83

Does it follow best practices? 78%
Impact: 91%
Average score across 3 eval scenarios: 1.71x

Advisory: suggest reviewing before use.
Optimize this skill with Tessl: `npx tessl skill review --optimize ./scientific-skills/dask/SKILL.md`

Quality
Discovery
100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an excellent skill description that clearly communicates what the skill does (distributed computing for scaling pandas/NumPy), when to use it (beyond memory limits or across clusters), and importantly, when NOT to use it (directing users to vaex or polars for alternative scenarios). The description is concise, uses natural trigger terms, and is highly distinctive.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions and use cases: 'larger-than-RAM pandas/NumPy workflows', 'parallel file processing', 'distributed ML', 'integration with existing pandas code'. Also distinguishes from alternatives (vaex, polars) with specific criteria. | 3 / 3 |
| Completeness | Clearly answers both 'what' (distributed computing for larger-than-RAM pandas/NumPy workflows) and 'when' ('Use when you need to scale existing pandas/NumPy code beyond memory or across clusters'). Also includes explicit negative triggers distinguishing from vaex and polars. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural terms users would say: 'distributed computing', 'larger-than-RAM', 'pandas', 'NumPy', 'scale', 'memory', 'clusters', 'parallel file processing', 'distributed ML', 'out-of-core'. These are terms a user working with big data in Python would naturally use. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive by explicitly carving out its niche relative to competing tools (vaex for out-of-core single machine, polars for in-memory speed). The focus on distributed computing with pandas/NumPy compatibility creates a clear, non-overlapping identity. This is likely Dask. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
57%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
The skill has excellent structure with strong progressive disclosure and fully actionable code examples. However, it is significantly too verbose—it over-explains concepts Claude already knows (what parallel computing is, when to use DataFrames vs Arrays), duplicates information across sections (the component descriptions and the selection guide), and includes unnecessary 'When to Use' and 'Purpose' sections that add little value. Trimming redundancy could cut this by 40-50% without losing actionable content.
Suggestions

- Remove the 'Overview', 'When to Use This Skill', and 'Selecting the Right Component' sections entirely; Claude can infer these from the component descriptions and the skill's YAML metadata.
- Collapse each component section to just the Quick Example, Key Points, and reference link; remove the 'Purpose' and 'When to Use' sub-sections, which explain things Claude already knows.
- Add explicit validation steps to workflow patterns (e.g., check task graph size before compute, verify output row counts after ETL) to strengthen workflow clarity.
- Merge the 'Common Issues' troubleshooting into the Best Practices reference file rather than duplicating guidance in the main skill.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is excessively verbose at ~350+ lines. It explains what Dask is, when to use each component, and includes extensive 'When to Use' bullet lists that Claude already knows. The 'Overview' section explains basic concepts like parallel processing and distributed computing. The 'Selecting the Right Component' decision guide largely duplicates information already covered in each component section. | 1 / 3 |
| Actionability | All code examples are concrete, executable, and copy-paste ready. The anti-patterns in Best Practices show both wrong and correct approaches with real code. Quick examples for each component are functional and demonstrate actual API usage. | 3 / 3 |
| Workflow Clarity | The debugging/development workflow has a clear 3-step sequence (synchronous → threads → distributed), and the ETL pipeline patterns are well-sequenced. However, there are no explicit validation checkpoints or feedback loops for error recovery in the workflow patterns, and the common issues section is reactive rather than integrated into workflows. | 2 / 3 |
| Progressive Disclosure | Excellent progressive disclosure with clear overview content and well-signaled one-level-deep references to six specific reference files. Each component section provides a quick example inline while pointing to detailed reference docs for comprehensive guidance. | 3 / 3 |
| Total | | 9 / 12 Passed |
Validation
90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| metadata_version | 'metadata.version' is missing | Warning |
| Total | 10 / 11 Passed | |
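One way to clear the `metadata_version` warning is to declare a version in the skill's frontmatter. A hypothetical sketch only: the exact Tessl schema may differ, and every field name and value below is an assumption, not taken from the skill:

```yaml
---
name: dask
description: Distributed computing for larger-than-RAM pandas/NumPy workflows.
metadata:
  version: 1.0.0   # assumed field; satisfies the 'metadata.version' check
---
```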