Fast in-memory DataFrame library for datasets that fit in RAM. Use when pandas is too slow but data still fits in memory. Lazy evaluation, parallel execution, Apache Arrow backend. Best for 1-100GB datasets, ETL pipelines, faster pandas replacement. For larger-than-RAM data use dask or vaex.
**Does it follow best practices?** 81%
**Average score across 3 eval scenarios:** 84% baseline, 86% with skill (1.01x impact)
**Status:** Passed, no known issues
**Lazy evaluation and query optimization**

| Check | Baseline | With skill |
| --- | --- | --- |
| Lazy CSV reading | 100% | 100% |
| Calls collect() | 100% | 100% |
| Filter before aggregation | 100% | 100% |
| Categorical for low-cardinality columns | 0% | 0% |
| Parquet output | 100% | 100% |
| Expression-based column references | 100% | 100% |
| Multiple filter args | 66% | 100% |
| No row iteration | 100% | 100% |
| No in-place assignment | 100% | 100% |
| Chained operations | 50% | 50% |
| group_by syntax | 100% | 100% |
| Pipeline log file | 100% | 100% |
**Pandas to Polars migration patterns**

| Check | Baseline | With skill |
| --- | --- | --- |
| group_by spelling | 100% | 100% |
| No reset_index | 100% | 100% |
| when/then/otherwise | 100% | 100% |
| with_columns for new columns | 100% | 100% |
| No Python apply/lambda | 100% | 100% |
| Expression-based filter | 100% | 100% |
| No row iteration | 100% | 100% |
| Polars import only | 100% | 100% |
| pl.col() references | 100% | 100% |
| Produces output CSV | 100% | 100% |
**Window functions and parallel column computation**

| Check | Baseline | With skill |
| --- | --- | --- |
| over() for group stats | 100% | 100% |
| over() for ranking | 100% | 100% |
| Parallel with_columns | 50% | 50% |
| rechunk on concat | 0% | 0% |
| Categorical for region | 0% | 0% |
| No map_elements/apply | 100% | 100% |
| No row iteration | 100% | 100% |
| Parquet output | 100% | 100% |
| Expression-based | 100% | 100% |
| Functional chaining | 50% | 80% |
b58ad7e