# Polars

A blazingly fast DataFrame library for Python built on the Apache Arrow columnar format, with lazy and eager execution modes. Polars provides comprehensive data manipulation and analysis capabilities with multi-threaded processing, SIMD optimization, query optimization, and a powerful expression API designed for maximum performance in data science workflows.

## Package Information

- **Package Name**: polars
- **Language**: Python
- **Installation**: `pip install polars`
- **Documentation**: https://docs.pola.rs/api/python/stable/reference/index.html

## Core Imports

```python
import polars as pl
```

For specific components:

```python
from polars import DataFrame, LazyFrame, Series, Expr
from polars import col, lit, when
from polars import read_csv, read_parquet, scan_csv
```

## Basic Usage

```python
import polars as pl

# Create DataFrame from dictionary
df = pl.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "city": ["New York", "London", "Tokyo"]
})

# Basic operations
result = (
    df
    .filter(pl.col("age") > 27)
    .select([
        pl.col("name"),
        pl.col("age"),
        pl.col("city").alias("location")
    ])
    .sort("age", descending=True)
)

print(result)

# Lazy evaluation for query optimization
lazy_result = (
    pl.scan_csv("data.csv")
    .filter(pl.col("revenue") > 1000)
    .group_by("department")
    .agg([
        pl.col("revenue").sum().alias("total_revenue"),
        pl.col("employee_id").count().alias("employee_count")
    ])
    .collect()
)
```

## Architecture

Polars provides two main execution paradigms:

- **Eager Execution**: Immediate computation with DataFrame and Series
- **Lazy Execution**: Deferred computation with LazyFrame for query optimization

Key architectural components:

- **DataFrame**: Eager-evaluation data structure for immediate operations
- **LazyFrame**: Lazy evaluation with automatic query optimization and predicate pushdown
- **Series**: One-dimensional data structure with vectorized operations
- **Expr**: Expression API for column operations and transformations
- **Arrow Integration**: Built on Apache Arrow for efficient memory layout and interoperability

## Capabilities

### Core Data Structures

Primary data structures for eager and lazy computation, providing comprehensive data manipulation capabilities with vectorized operations and type safety.

```python { .api }
class DataFrame:
    def __init__(self, data=None, schema=None, *, schema_overrides=None, strict=True, orient=None, infer_schema_length=None, nan_to_null=False): ...
    def select(self, *exprs, **named_exprs) -> DataFrame: ...
    def filter(self, *predicates) -> DataFrame: ...
    def with_columns(self, *exprs, **named_exprs) -> DataFrame: ...
    def group_by(self, *by, maintain_order=False) -> GroupBy: ...

class LazyFrame:
    def select(self, *exprs, **named_exprs) -> LazyFrame: ...
    def filter(self, *predicates) -> LazyFrame: ...
    def with_columns(self, *exprs, **named_exprs) -> LazyFrame: ...
    def collect(self, **kwargs) -> DataFrame: ...

class Series:
    def __init__(self, name=None, values=None, dtype=None): ...
    def filter(self, predicate) -> Series: ...
    def map_elements(self, function, return_dtype=None) -> Series: ...

class Expr:
    def alias(self, name: str) -> Expr: ...
    def filter(self, predicate) -> Expr: ...
    def sum(self) -> Expr: ...
```

[Core Data Structures](./core-data-structures.md)

### Data Types and Schema

Comprehensive type system supporting primitive types, temporal data, nested structures, and schema validation with automatic type inference and casting.

```python { .api }
# Primitive Types
Boolean: DataType
Int8, Int16, Int32, Int64, Int128: DataType
UInt8, UInt16, UInt32, UInt64: DataType
Float32, Float64: DataType
Decimal: DataType

# String and Binary Types
String: DataType
Binary: DataType
Categorical: DataType
Enum: DataType

# Temporal Types
Date: DataType
Datetime: DataType
Time: DataType
Duration: DataType

# Nested Types
List: DataType
Array: DataType
Struct: DataType

class Schema:
    def __init__(self, schema): ...
    def names(self) -> list[str]: ...
    def dtypes(self) -> list[DataType]: ...
```

[Data Types and Schema](./data-types.md)

### Functions and Expressions

90+ utility functions for data construction, aggregation, statistical operations, and expression building, with support for vectorized computations and window functions.

```python { .api }
# Construction Functions
def col(name: str) -> Expr: ...
def lit(value) -> Expr: ...
def when(predicate) -> When: ...
def struct(*exprs) -> Expr: ...

# Aggregation Functions
def sum(*exprs) -> Expr: ...
def mean(*exprs) -> Expr: ...
def count(*exprs) -> Expr: ...
def max(*exprs) -> Expr: ...
def min(*exprs) -> Expr: ...

# Range Functions
def arange(start, end, step=1, dtype=None) -> Expr: ...
def date_range(start, end, interval="1d") -> Expr: ...
def int_range(start, end, step=1, dtype=None) -> Expr: ...

# Statistical Functions
def corr(a, b, method="pearson") -> Expr: ...
def std(column, ddof=1) -> Expr: ...
def var(column, ddof=1) -> Expr: ...
```

[Functions and Expressions](./functions-expressions.md)

### Input/Output Operations

Comprehensive I/O support for 15+ file formats including CSV, Parquet, JSON, Excel, databases, and cloud storage, with both eager reading and lazy scanning capabilities.

```python { .api }
# Read Functions (Eager)
def read_csv(source, **kwargs) -> DataFrame: ...
def read_parquet(source, **kwargs) -> DataFrame: ...
def read_json(source, **kwargs) -> DataFrame: ...
def read_excel(source, **kwargs) -> DataFrame: ...
def read_database(query, connection, **kwargs) -> DataFrame: ...

# Scan Functions (Lazy)
def scan_csv(source, **kwargs) -> LazyFrame: ...
def scan_parquet(source, **kwargs) -> LazyFrame: ...
def scan_ndjson(source, **kwargs) -> LazyFrame: ...
def scan_delta(source, **kwargs) -> LazyFrame: ...

# Cloud Credentials
class CredentialProviderAWS:
    def __init__(self, **kwargs): ...

class CredentialProviderGCP:
    def __init__(self, **kwargs): ...
```

[Input/Output Operations](./io-operations.md)

### SQL Interface

SQL query execution capabilities, with SQLContext for managing multiple DataFrames and native SQL expression support within DataFrame operations.

```python { .api }
class SQLContext:
    def __init__(self): ...
    def register(self, name: str, frame) -> None: ...
    def execute(self, query: str, **kwargs) -> DataFrame: ...
    def tables(self) -> list[str]: ...

def sql(query: str, **kwargs) -> DataFrame: ...
def sql_expr(sql: str) -> Expr: ...
```

[SQL Interface](./sql-interface.md)

### Configuration and Optimization

Global configuration system for controlling formatting, streaming behavior, and optimization settings, with context managers and persistent configuration.

```python { .api }
class Config:
    @classmethod
    def set_fmt_str_lengths(cls, n: int) -> type[Config]: ...
    @classmethod
    def set_tbl_rows(cls, n: int) -> type[Config]: ...
    @classmethod
    def set_streaming_chunk_size(cls, size: int) -> type[Config]: ...
    @classmethod
    def restore_defaults(cls) -> type[Config]: ...

class QueryOptFlags:
    def __init__(self, **kwargs): ...

class GPUEngine:
    def __init__(self, **kwargs): ...
```

[Configuration and Optimization](./configuration.md)

### Column Selection

Advanced column selection system with 30+ selector functions supporting pattern matching, data type filtering, and logical operations for complex column manipulation.

```python { .api }
import polars.selectors as cs

# Data Type Selectors
def by_dtype(dtypes) -> Selector: ...
def numeric() -> Selector: ...
def string() -> Selector: ...
def temporal() -> Selector: ...
def boolean() -> Selector: ...

# Pattern Selectors
def contains(pattern: str) -> Selector: ...
def starts_with(prefix: str) -> Selector: ...
def ends_with(suffix: str) -> Selector: ...
def matches(pattern: str) -> Selector: ...

# Index Selectors
def by_index(indices) -> Selector: ...
def first(n: int = 1) -> Selector: ...
def last(n: int = 1) -> Selector: ...
```

[Column Selection](./column-selection.md)

### Data Conversion

Seamless integration with pandas, NumPy, PyArrow, and PyTorch through conversion functions supporting bidirectional data exchange with automatic schema mapping.

```python { .api }
def from_pandas(df, **kwargs) -> DataFrame: ...
def from_numpy(data, schema=None, **kwargs) -> DataFrame: ...
def from_arrow(data, **kwargs) -> DataFrame: ...
def from_dict(data, schema=None) -> DataFrame: ...
def from_dicts(dicts, schema=None) -> DataFrame: ...
def from_torch(tensor, **kwargs) -> DataFrame: ...
def json_normalize(data, **kwargs) -> DataFrame: ...
```

[Data Conversion](./data-conversion.md)

### Error Handling and Exceptions

Comprehensive exception hierarchy for handling data errors, computation failures, and I/O issues, with specific error types for precise error handling.

```python { .api }
# Base Exceptions
class PolarsError(Exception): ...
class ComputeError(PolarsError): ...

# Data Exceptions
class ColumnNotFoundError(PolarsError): ...
class SchemaError(PolarsError): ...
class DuplicateError(PolarsError): ...
class ShapeError(PolarsError): ...

# Row-Related Exceptions
class RowsError(PolarsError): ...
class NoRowsReturnedError(RowsError): ...
class TooManyRowsReturnedError(RowsError): ...

# SQL Exceptions
class SQLInterfaceError(PolarsError): ...
class SQLSyntaxError(PolarsError): ...

# Warning Types
class PerformanceWarning(UserWarning): ...
class CategoricalRemappingWarning(UserWarning): ...
```

[Error Handling](./error-handling.md)