Tessl Tile for pypi/polars-u64-idx@1.33.0

tessl/pypi-polars-u64-idx

Blazingly fast DataFrame library with 64-bit index support for handling datasets with more than 4.2 billion rows

Workspace: tessl
Visibility: Public
Created: 3 months ago
Last updated: 3 months ago
Describes: pkg:pypi/polars-u64-idx@1.33.x

To install, run

npx @tessl/cli install tessl/pypi-polars-u64-idx@1.33.0

# Polars u64-idx

Polars is a DataFrame library optimized for speed and memory efficiency. This variant is built with a 64-bit row index, which allows analysis of datasets with more than roughly 4.2 billion rows (the limit of the 32-bit index used by standard Polars). It is written in Rust on top of the Apache Arrow columnar format and provides lazy and eager execution, multi-threading, SIMD optimization, query optimization, and hybrid streaming for larger-than-RAM datasets.

## Package Information

- **Package Name**: polars-u64-idx
- **Language**: Python
- **Installation**: `pip install polars-u64-idx`

## Core Imports

```python
import polars as pl
```

For specific functionality:

```python
# Core data structures
from polars import DataFrame, Series, LazyFrame

# Data types
from polars import Int64, Float64, String, Date, Datetime

# Functions and expressions
from polars import col, lit, when, concat
```

29
## Basic Usage
30

31
```python
32
import polars as pl
33

34
# Create a DataFrame
35
df = pl.DataFrame({
36
    "name": ["Alice", "Bob", "Charlie"],
37
    "age": [25, 30, 35],
38
    "city": ["New York", "London", "Tokyo"]
39
})
40

41
# Basic operations
42
result = (df
43
    .filter(pl.col("age") > 28)
44
    .select([
45
        pl.col("name"),
46
        pl.col("age"),
47
        pl.col("city").alias("location")
48
    ])
49
    .sort("age")
50
)
51

52
print(result)
53

54
# Lazy evaluation for larger datasets
55
lazy_df = (pl
56
    .scan_csv("large_file.csv")
57
    .filter(pl.col("amount") > 1000)
58
    .group_by("category")
59
    .agg([
60
        pl.col("amount").sum().alias("total_amount"),
61
        pl.col("id").count().alias("count")
62
    ])
63
)
64

65
# Execute the lazy computation
66
result = lazy_df.collect()
67
```
68

## Architecture

Polars uses a columnar data model built on Apache Arrow with several key components:

- **DataFrame/Series**: Eager evaluation data structures for immediate computation
- **LazyFrame**: Deferred evaluation with query optimization for better performance
- **Expressions**: Composable operations that work on columns (Expr class)
- **Data Types**: Comprehensive type system with 20+ types including nested types
- **I/O Engine**: Native support for 10+ file formats with lazy scanning capabilities
- **Query Engine**: Rust-based OLAP engine with predicate pushdown, projection pushdown, and streaming

The 64-bit index variant removes the roughly 4.2 billion row limit of standard Polars while keeping the same API. The wider index uses more memory and can be somewhat slower, so the standard build remains the better choice unless a dataset actually exceeds that limit.

## Capabilities

### Core Data Structures

Primary data structures for working with tabular data, including eager DataFrame/Series for immediate operations and LazyFrame for optimized query execution.

```python { .api }
class DataFrame:
    def __init__(self, data=None, schema=None, schema_overrides=None, orient=None, infer_schema_length=N_INFER_DEFAULT, nan_to_null=False): ...
    def select(self, *exprs, **named_exprs) -> DataFrame: ...
    def filter(self, *predicates, **constraints) -> DataFrame: ...
    def with_columns(self, *exprs, **named_exprs) -> DataFrame: ...
    def group_by(self, *by, maintain_order=False, **named_by) -> GroupBy: ...
    def sort(self, by, *, descending=False, nulls_last=False, multithreaded=True) -> DataFrame: ...
    def join(self, other, on=None, how="inner", *, left_on=None, right_on=None, suffix="_right", validate="m:m", join_nulls=False, coalesce=None) -> DataFrame: ...

class Series:
    def __init__(self, name=None, values=None, dtype=None, strict=True, nan_to_null=False, dtype_if_empty=Null): ...

class LazyFrame:
    def select(self, *exprs, **named_exprs) -> LazyFrame: ...
    def filter(self, *predicates, **constraints) -> LazyFrame: ...
    def collect(self, *, type_coercion=True, predicate_pushdown=True, projection_pushdown=True, simplify_expression=True, slice_pushdown=True, comm_subplan_elim=True, comm_subexpr_elim=True, cluster_with_columns=True, no_optimization=False, streaming=False, background=False, _eager=False) -> DataFrame: ...
```

[Core Data Structures](./core-data-structures.md)

### Expressions and Column Operations

Powerful expression system for column transformations, aggregations, and complex operations that work across DataFrame and LazyFrame.

```python { .api }
class Expr:
    def alias(self, name: str) -> Expr: ...
    def cast(self, dtype: DataType | type[Any], *, strict: bool = True) -> Expr: ...
    def filter(self, predicate: Expr) -> Expr: ...
    def sort(self, *, descending: bool = False, nulls_last: bool = False) -> Expr: ...
    def sum(self) -> Expr: ...
    def mean(self) -> Expr: ...
    def max(self) -> Expr: ...
    def min(self) -> Expr: ...
    def count(self) -> Expr: ...

def col(name: str | DataType) -> Expr: ...
def lit(value: Any, dtype: DataType | None = None) -> Expr: ...
def when(predicate: Expr) -> When: ...
```

[Expressions and Column Operations](./expressions.md)

### Data Types and Schema

Comprehensive type system with numeric, text, temporal, and nested types, plus schema definition and validation capabilities.

```python { .api }
# Numeric types
class Int8: ...
class Int16: ...
class Int32: ...
class Int64: ...
class Int128: ...
class UInt8: ...
class UInt16: ...
class UInt32: ...
class UInt64: ...
class Float32: ...
class Float64: ...
class Decimal: ...

# Text types
class String: ...
class Binary: ...

# Temporal types
class Date: ...
class Datetime: ...
class Time: ...
class Duration: ...

# Special types
class Boolean: ...
class Categorical: ...
class Enum: ...
class List: ...
class Array: ...
class Struct: ...

class Schema:
    def __init__(self, schema: Mapping[str, DataType] | Iterable[tuple[str, DataType]] | None = None): ...
```

[Data Types and Schema](./data-types.md)

### I/O Operations

Comprehensive I/O capabilities supporting 10+ file formats with both eager reading and lazy scanning for performance optimization.

```python { .api }
# CSV
def read_csv(source: str | Path | IO[str] | IO[bytes] | bytes, **kwargs) -> DataFrame: ...
def scan_csv(source: str | Path | list[str] | list[Path], **kwargs) -> LazyFrame: ...

# Parquet
def read_parquet(source: str | Path | IO[bytes] | bytes, **kwargs) -> DataFrame: ...
def scan_parquet(source: str | Path | list[str] | list[Path], **kwargs) -> LazyFrame: ...

# JSON
def read_json(source: str | Path | IO[str] | IO[bytes] | bytes, **kwargs) -> DataFrame: ...
def read_ndjson(source: str | Path | IO[str] | IO[bytes] | bytes, **kwargs) -> DataFrame: ...

# Database
def read_database(query: str, connection: str | ConnectionOrCursor, **kwargs) -> DataFrame: ...

# Excel
def read_excel(source: str | Path | IO[bytes] | bytes, **kwargs) -> DataFrame: ...
```

[I/O Operations](./io-operations.md)

### Functions and Utilities

Built-in functions for aggregation, transformations, date/time operations, string manipulation, and utility functions.

```python { .api }
# Aggregation functions
def sum(*exprs) -> Expr: ...
def mean(*exprs) -> Expr: ...
def max(*exprs) -> Expr: ...
def min(*exprs) -> Expr: ...
def count(*exprs) -> Expr: ...
def all(*exprs) -> Expr: ...
def any(*exprs) -> Expr: ...

# Date/time functions
def date(year: int | Expr, month: int | Expr, day: int | Expr) -> Expr: ...
def datetime(year: int | Expr, month: int | Expr, day: int | Expr, hour: int | Expr = 0, minute: int | Expr = 0, second: int | Expr = 0, microsecond: int | Expr = 0, *, time_unit: TimeUnit = "us", time_zone: str | None = None) -> Expr: ...
def date_range(start: date | datetime | IntoExpr, end: date | datetime | IntoExpr, interval: str | timedelta = "1d", *, closed: ClosedInterval = "both", time_unit: TimeUnit | None = None, time_zone: str | None = None, eager: bool = False) -> Expr | Series: ...

# String functions
def concat_str(exprs: IntoExpr, *, separator: str = "", ignore_nulls: bool = False) -> Expr: ...
```

[Functions and Utilities](./functions.md)

### SQL Interface

SQL query interface allowing standard SQL operations on DataFrames and integration with existing SQL workflows.

```python { .api }
class SQLContext:
    def __init__(self, frames: dict[str, DataFrame | LazyFrame] | None = None, **named_frames: DataFrame | LazyFrame): ...
    def execute(self, query: str, *, eager: bool = True) -> DataFrame | LazyFrame: ...
    def register(self, name: str, frame: DataFrame | LazyFrame) -> SQLContext: ...
    def unregister(self, name: str) -> SQLContext: ...

def sql(query: str, *, eager: bool = True, **named_frames: DataFrame | LazyFrame) -> DataFrame | LazyFrame: ...
```

[SQL Interface](./sql-interface.md)

## Error Handling

Polars provides a comprehensive exception hierarchy for different error scenarios:

```python { .api }
# Core exceptions
class PolarsError(Exception): ...
class ColumnNotFoundError(PolarsError): ...
class ComputeError(PolarsError): ...
class DuplicateError(PolarsError): ...
class InvalidOperationError(PolarsError): ...
class NoDataError(PolarsError): ...
class OutOfBoundsError(PolarsError): ...
class PanicException(PolarsError): ...
class SchemaError(PolarsError): ...
class SchemaFieldNotFoundError(PolarsError): ...
class ShapeError(PolarsError): ...
class SQLInterfaceError(PolarsError): ...
class SQLSyntaxError(PolarsError): ...

# Warnings
class PolarsWarning(Exception): ...
class PerformanceWarning(PolarsWarning): ...
```

267
All operations can raise these exceptions when encountering invalid data, schema mismatches, or computational errors. Proper exception handling should be used for production code.