Use this skill for processing and analyzing large tabular datasets (billions of rows) that exceed available RAM. Vaex excels at out-of-core DataFrame operations, lazy evaluation, fast aggregations, efficient visualization of big data, and machine learning on large datasets. Apply when users need to work with large CSV/HDF5/Arrow/Parquet files, perform fast statistics on massive datasets, create visualizations of big data, or build ML pipelines that do not fit in memory.
- Score: 84
- Does it follow best practices? 86%
- Impact: 72%
- Average score across 3 eval scenarios: 1.22x
- Advisory: suggest reviewing before use
Batched aggregations and virtual columns

| Criterion | Score A | Score B |
|---|---|---|
| delay=True usage | 0% | 100% |
| vaex.execute() batching | 0% | 33% |
| Virtual column creation | 100% | 100% |
| Named selections used | 0% | 0% |
| Selection-based aggregation | 0% | 0% |
| Avoids to_pandas_df on main data | 100% | 100% |
| Avoids .values for computation | 100% | 100% |
| HDF5 format used | 50% | 0% |
| Multi-column statistics | 100% | 100% |
| Segment comparison output | 100% | 100% |
Large dataset visualization with density plots

| Criterion | Score A | Score B |
|---|---|---|
| df.plot() for 2D density | 25% | 0% |
| df.plot1d() for histograms | 20% | 0% |
| Percentile-based limits | 0% | 0% |
| Logarithmic color scale | 0% | 0% |
| Named selections for comparison | 0% | 100% |
| Figures saved to files | 100% | 100% |
| No full-dataset sampling for main plots | 100% | 100% |
| shape parameter used | 100% | 100% |
| Multiple plot types | 100% | 100% |
| Comparison visualization | 42% | 100% |
ML pipeline with vaex.ml and state management

| Criterion | Score A | Score B |
|---|---|---|
| vaex.ml import | 100% | 100% |
| StandardScaler or MinMaxScaler | 100% | 100% |
| Categorical encoder from vaex.ml | 100% | 100% |
| vaex.ml.sklearn.Predictor | 100% | 100% |
| fit on training data | 100% | 100% |
| state_write() called | 53% | 100% |
| state.json file present | 100% | 100% |
| state_load() on new data | 33% | 100% |
| Virtual column preservation | 100% | 100% |
| Predictions output | 100% | 100% |