Distributed computing for larger-than-RAM pandas/NumPy workflows. Use when you need to scale existing pandas/NumPy code beyond memory or across clusters. Best for parallel file processing, distributed ML, and integration with existing pandas code. For out-of-core analytics on a single machine, use vaex; for in-memory speed, use polars.
- Overall score: 87
- Does it follow best practices? 85%
- Impact: 91% (1.71x average score across 3 eval scenarios)
- Advisory: suggest reviewing before use
**DataFrame loading and computation anti-patterns**

| Check | Without skill | With skill |
| --- | --- | --- |
| Dask-native loading | 100% | 100% |
| No pandas preload | 100% | 100% |
| Batched compute | 0% | 100% |
| No compute in loop | 100% | 100% |
| Parquet output | 100% | 100% |
| map_partitions usage | 0% | 100% |
| No row-wise apply | 100% | 100% |
| Column early selection | 0% | 0% |
| Lazy pipeline | 100% | 100% |
| Glob pattern read | 100% | 100% |
**Bag processing, foldby aggregation, scheduler selection**

| Check | Without skill | With skill |
| --- | --- | --- |
| Bag for JSON ingestion | 0% | 100% |
| foldby not groupby | 0% | 0% |
| No bag groupby | 100% | 100% |
| Bag to DataFrame | 0% | 100% |
| Process scheduler | 100% | 70% |
| Lazy chain | 0% | 100% |
| JSON parsing with map | 0% | 100% |
| Filter before convert | 0% | 100% |
| Parquet or structured output | 100% | 100% |
| Functional operations | 0% | 100% |
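A minimal sketch of the bag-side checks (JSON parsed with `map`, filter before conversion, `foldby` instead of `groupby`, an explicit scheduler); the records and field names are invented:

```python
import json

import dask.bag as db

# Stand-in for db.read_text("logs-*.json"): a bag of raw JSON lines,
# parsed with a functional map rather than a Python loop.
lines = [
    '{"status": "ok", "bytes": 10}',
    '{"status": "err", "bytes": 5}',
    '{"status": "ok", "bytes": 7}',
]
bag = db.from_sequence(lines, npartitions=2).map(json.loads)

# Filter early, then aggregate with foldby: it reduces within each partition
# first, avoiding the full shuffle that bag.groupby would trigger.
totals = bag.filter(lambda r: r["status"] == "ok").foldby(
    key=lambda r: r["status"],
    binop=lambda acc, r: acc + r["bytes"],
    initial=0,
    combine=lambda a, b: a + b,
    combine_initial=0,
)

# Choose the scheduler explicitly: "processes" suits CPU-bound parsing;
# "threads" is used here only so the sketch runs anywhere.
result = dict(totals.compute(scheduler="threads"))
print(result)
```

For tabular downstream work, the filtered bag of dicts could be handed to `bag.to_dataframe()` and written out as Parquet instead of being collected locally.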
**Futures, scatter, gather, and distributed client usage**

| Check | Without skill | With skill |
| --- | --- | --- |
| Distributed Client created | 0% | 100% |
| Futures used for tasks | 0% | 100% |
| Data pre-scattered | 50% | 100% |
| No repeated data passing | 100% | 100% |
| client.gather for bulk retrieval | 50% | 100% |
| No sequential result loop | 100% | 100% |
| Task granularity appropriate | 100% | 100% |
| Client closed | 66% | 100% |
| Immediate execution noted | 80% | 100% |
| Results written to file | 100% | 100% |
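The futures checks can be illustrated with an in-process client; the `score` function, chunk data, and worker settings are made up for the sketch, and real code would connect to a cluster address instead:

```python
import numpy as np
from dask.distributed import Client

def score(chunk, weights):
    # Stand-in for real per-chunk work; coarse enough to amortize task overhead.
    return float((chunk * weights).sum())

# In-process client so the sketch runs anywhere; a deployment would pass
# the scheduler address, e.g. Client("tcp://scheduler:8786").
client = Client(processes=False)

weights = np.ones(4)
chunks = [np.arange(4) * i for i in range(5)]

# Scatter the shared array once instead of re-serializing it with every task.
weights_future = client.scatter(weights, broadcast=True)

# submit() returns futures immediately; execution starts right away on workers.
futures = [client.submit(score, c, weights_future) for c in chunks]

# One bulk gather, not future.result() in a sequential loop.
results = client.gather(futures)
print(results)

# Always release cluster resources when done.
client.close()
```

In a real pipeline the gathered results would then be written to a file (e.g. Parquet or JSON) rather than just printed.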
Version: 899a51b