# Polars

A fast DataFrame library for Python built on the Apache Arrow columnar format, with both lazy and eager execution modes. Polars provides data manipulation and analysis with multi-threaded processing, SIMD optimization, query optimization, and an expression API designed for performance in data science workflows.

## Package Information

- **Package Name**: polars
- **Language**: Python
- **Installation**: `pip install polars`
- **Documentation**: https://docs.pola.rs/api/python/stable/reference/index.html

## Core Imports

```python
import polars as pl
```

For specific components:

```python
from polars import DataFrame, LazyFrame, Series, Expr
from polars import col, lit, when
from polars import read_csv, read_parquet, scan_csv
```

## Basic Usage

```python
import polars as pl

# Create DataFrame from dictionary
df = pl.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "city": ["New York", "London", "Tokyo"]
})

# Basic operations
result = (
    df
    .filter(pl.col("age") > 27)
    .select([
        pl.col("name"),
        pl.col("age"),
        pl.col("city").alias("location")
    ])
    .sort("age", descending=True)
)

print(result)

# Lazy evaluation for query optimization
lazy_result = (
    pl.scan_csv("data.csv")
    .filter(pl.col("revenue") > 1000)
    .group_by("department")
    .agg([
        pl.col("revenue").sum().alias("total_revenue"),
        pl.col("employee_id").count().alias("employee_count")
    ])
    .collect()
)
```

## Architecture

Polars provides two main execution paradigms:

- **Eager Execution**: Immediate computation with DataFrame and Series
- **Lazy Execution**: Deferred computation with LazyFrame for query optimization

Key architectural components:

- **DataFrame**: Eager evaluation data structure for immediate operations
- **LazyFrame**: Lazy evaluation with automatic query optimization and predicate pushdown
- **Series**: One-dimensional data structure with vectorized operations
- **Expr**: Expression API for column operations and transformations
- **Arrow Integration**: Built on Apache Arrow for efficient memory layout and interoperability

## Capabilities

### Core Data Structures

Primary data structures for eager and lazy computation, providing comprehensive data manipulation capabilities with vectorized operations and type safety.

```python { .api }
class DataFrame:
    def __init__(self, data=None, schema=None, *, schema_overrides=None, strict=True, orient=None, infer_schema_length=100, nan_to_null=False): ...
    def select(self, *exprs, **named_exprs) -> DataFrame: ...
    def filter(self, *predicates) -> DataFrame: ...
    def with_columns(self, *exprs, **named_exprs) -> DataFrame: ...
    def group_by(self, *by, maintain_order=False) -> GroupBy: ...

class LazyFrame:
    def select(self, *exprs, **named_exprs) -> LazyFrame: ...
    def filter(self, *predicates) -> LazyFrame: ...
    def with_columns(self, *exprs, **named_exprs) -> LazyFrame: ...
    def collect(self, **kwargs) -> DataFrame: ...

class Series:
    def __init__(self, name=None, values=None, dtype=None): ...
    def filter(self, predicate) -> Series: ...
    def map_elements(self, function, return_dtype=None) -> Series: ...

class Expr:
    def alias(self, name: str) -> Expr: ...
    def filter(self, predicate) -> Expr: ...
    def sum(self) -> Expr: ...
```

[Core Data Structures](./core-data-structures.md)

### Data Types and Schema

Comprehensive type system supporting primitive types, temporal data, nested structures, and schema validation with automatic type inference and casting.

```python { .api }
# Primitive Types
Boolean: DataType
Int8, Int16, Int32, Int64, Int128: DataType
UInt8, UInt16, UInt32, UInt64: DataType
Float32, Float64: DataType
Decimal: DataType

# String and Binary Types
String: DataType
Binary: DataType
Categorical: DataType
Enum: DataType

# Temporal Types
Date: DataType
Datetime: DataType
Time: DataType
Duration: DataType

# Nested Types
List: DataType
Array: DataType
Struct: DataType

class Schema:
    def __init__(self, schema): ...
    def names(self) -> list[str]: ...
    def dtypes(self) -> list[DataType]: ...
```

[Data Types and Schema](./data-types.md)

### Functions and Expressions

90+ utility functions for data construction, aggregation, statistical operations, and expression building with support for vectorized computations and window functions.

```python { .api }
# Construction Functions
def col(name: str) -> Expr: ...
def lit(value) -> Expr: ...
def when(predicate) -> When: ...
def struct(*exprs) -> Expr: ...

# Aggregation Functions
def sum(*exprs) -> Expr: ...
def mean(*exprs) -> Expr: ...
def count(*exprs) -> Expr: ...
def max(*exprs) -> Expr: ...
def min(*exprs) -> Expr: ...

# Range Functions
def arange(start, end, step=1, dtype=None) -> Expr: ...
def date_range(start, end, interval="1d") -> Expr: ...
def int_range(start, end, step=1, dtype=None) -> Expr: ...

# Statistical Functions
def corr(a, b, method="pearson") -> Expr: ...
def std(column, ddof=1) -> Expr: ...
def var(column, ddof=1) -> Expr: ...
```

[Functions and Expressions](./functions-expressions.md)

### Input/Output Operations

Comprehensive I/O support for 15+ file formats including CSV, Parquet, JSON, Excel, databases, and cloud storage with both eager reading and lazy scanning capabilities.

```python { .api }
# Read Functions (Eager)
def read_csv(source, **kwargs) -> DataFrame: ...
def read_parquet(source, **kwargs) -> DataFrame: ...
def read_json(source, **kwargs) -> DataFrame: ...
def read_excel(source, **kwargs) -> DataFrame: ...
def read_database(query, connection, **kwargs) -> DataFrame: ...

# Scan Functions (Lazy)
def scan_csv(source, **kwargs) -> LazyFrame: ...
def scan_parquet(source, **kwargs) -> LazyFrame: ...
def scan_ndjson(source, **kwargs) -> LazyFrame: ...
def scan_delta(source, **kwargs) -> LazyFrame: ...

# Cloud Credentials
class CredentialProviderAWS:
    def __init__(self, **kwargs): ...

class CredentialProviderGCP:
    def __init__(self, **kwargs): ...
```

[Input/Output Operations](./io-operations.md)

### SQL Interface

SQL query execution capabilities with SQLContext for managing multiple DataFrames and native SQL expression support within DataFrame operations.

```python { .api }
class SQLContext:
    def __init__(self): ...
    def register(self, name: str, frame) -> SQLContext: ...
    # Returns a LazyFrame unless eager execution is requested
    def execute(self, query: str, **kwargs) -> LazyFrame | DataFrame: ...
    def tables(self) -> list[str]: ...

# Returns a LazyFrame unless eager execution is requested
def sql(query: str, **kwargs) -> LazyFrame | DataFrame: ...
def sql_expr(sql: str) -> Expr: ...
```

[SQL Interface](./sql-interface.md)

### Configuration and Optimization

Global configuration system for controlling formatting, streaming behavior, and optimization settings with context managers and persistent configuration.

```python { .api }
class Config:
    @classmethod
    def set_fmt_str_lengths(cls, n: int) -> type[Config]: ...
    @classmethod
    def set_tbl_rows(cls, n: int) -> type[Config]: ...
    @classmethod
    def set_streaming_chunk_size(cls, size: int) -> type[Config]: ...
    @classmethod
    def restore_defaults(cls) -> type[Config]: ...

class QueryOptFlags:
    def __init__(self, **kwargs): ...

class GPUEngine:
    def __init__(self, **kwargs): ...
```

[Configuration and Optimization](./configuration.md)

### Column Selection

Advanced column selection system with 30+ selector functions supporting pattern matching, data type filtering, and logical operations for complex column manipulation.

```python { .api }
import polars.selectors as cs

# Data Type Selectors
def by_dtype(dtypes) -> Selector: ...
def numeric() -> Selector: ...
def string() -> Selector: ...
def temporal() -> Selector: ...
def boolean() -> Selector: ...

# Pattern Selectors
def contains(pattern: str) -> Selector: ...
def starts_with(prefix: str) -> Selector: ...
def ends_with(suffix: str) -> Selector: ...
def matches(pattern: str) -> Selector: ...

# Index Selectors
def by_index(indices) -> Selector: ...
def first() -> Selector: ...
def last() -> Selector: ...
```

[Column Selection](./column-selection.md)

### Data Conversion

Seamless integration with pandas, NumPy, PyArrow, and PyTorch through conversion functions supporting bidirectional data exchange with automatic schema mapping.

```python { .api }
def from_pandas(df, **kwargs) -> DataFrame: ...
def from_numpy(data, schema=None, **kwargs) -> DataFrame: ...
def from_arrow(data, **kwargs) -> DataFrame: ...
def from_dict(data, schema=None) -> DataFrame: ...
def from_dicts(dicts, schema=None) -> DataFrame: ...
def from_torch(tensor, **kwargs) -> DataFrame: ...
def json_normalize(data, **kwargs) -> DataFrame: ...
```

[Data Conversion](./data-conversion.md)

### Error Handling and Exceptions

Comprehensive exception hierarchy for handling data errors, computation failures, and I/O issues with specific error types for precise error handling.

```python { .api }
# Base Exceptions
class PolarsError(Exception): ...
class ComputeError(PolarsError): ...

# Data Exceptions
class ColumnNotFoundError(PolarsError): ...
class SchemaError(PolarsError): ...
class DuplicateError(PolarsError): ...
class ShapeError(PolarsError): ...

# Additional Row-Related Exceptions
class RowsError(PolarsError): ...
class NoRowsReturnedError(RowsError): ...
class TooManyRowsReturnedError(RowsError): ...

# SQL Exceptions
class SQLInterfaceError(PolarsError): ...
class SQLSyntaxError(PolarsError): ...

# Warning Types
class PerformanceWarning(UserWarning): ...
class CategoricalRemappingWarning(UserWarning): ...
```

[Error Handling](./error-handling.md)