tessl/pypi-polars-u64-idx

Blazingly fast DataFrame library with 64-bit index support for handling datasets with more than 4.2 billion rows

- **Workspace**: tessl
- **Visibility**: Public
- **Describes**: `pkg:pypi/polars-u64-idx@1.33.x`

To install, run `npx @tessl/cli install tessl/pypi-polars-u64-idx@1.33.0`

# Polars u64-idx

Polars is a blazingly fast DataFrame library optimized for performance and memory efficiency. This variant provides 64-bit index support, enabling analysis of datasets with more than 4.2 billion rows. Built in Rust on the Apache Arrow columnar format, it offers lazy and eager execution, multi-threading, SIMD optimization, query optimization, and hybrid streaming for larger-than-RAM datasets.

## Package Information

- **Package Name**: polars-u64-idx
- **Language**: Python
- **Installation**: `pip install polars-u64-idx`

## Core Imports

```python
import polars as pl
```

For specific functionality:

```python
# Core data structures
from polars import DataFrame, Series, LazyFrame

# Data types
from polars import Int64, Float64, String, Date, Datetime

# Functions and expressions
from polars import col, lit, when, concat
```

## Basic Usage

```python
import polars as pl

# Create a DataFrame
df = pl.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "city": ["New York", "London", "Tokyo"],
})

# Basic operations
result = (
    df.filter(pl.col("age") > 28)
    .select([
        pl.col("name"),
        pl.col("age"),
        pl.col("city").alias("location"),
    ])
    .sort("age")
)

print(result)

# Lazy evaluation for larger datasets
# (large_file.csv is expected to contain amount, category and id columns)
lazy_df = (
    pl.scan_csv("large_file.csv")
    .filter(pl.col("amount") > 1000)
    .group_by("category")
    .agg([
        pl.col("amount").sum().alias("total_amount"),
        pl.col("id").count().alias("count"),
    ])
)

# Execute the lazy computation
result = lazy_df.collect()
```

## Architecture

Polars uses a columnar data model built on Apache Arrow, with several key components:

- **DataFrame/Series**: Eagerly evaluated data structures; each operation runs immediately
- **LazyFrame**: Deferred evaluation; operations build a query plan that is optimized and then executed on `collect()`
- **Expressions**: Composable column operations (the `Expr` class), shared by DataFrame and LazyFrame
- **Data Types**: A type system with 20+ types, including nested types such as `List`, `Array`, and `Struct`
- **I/O Engine**: Native readers and writers for 10+ file formats, with lazy `scan_*` functions
- **Query Engine**: A Rust-based OLAP engine with predicate pushdown, projection pushdown, and streaming execution

The 64-bit index variant removes the 4.2 billion row limit of standard Polars while keeping the same API. Because row indices are stored as 64-bit rather than 32-bit integers, index-heavy operations such as joins, group-bys, and sorts can use more memory than in the default build.

## Capabilities

### Core Data Structures

Primary data structures for working with tabular data: the eager DataFrame and Series for immediate operations, and LazyFrame for optimized query execution.

```python { .api }
class DataFrame:
    def __init__(self, data=None, schema=None, *, schema_overrides=None, strict=True, orient=None, infer_schema_length=N_INFER_DEFAULT, nan_to_null=False): ...
    def select(self, *exprs, **named_exprs) -> DataFrame: ...
    def filter(self, *predicates, **constraints) -> DataFrame: ...
    def with_columns(self, *exprs, **named_exprs) -> DataFrame: ...
    def group_by(self, *by, maintain_order=False, **named_by) -> GroupBy: ...
    def sort(self, by, *more_by, descending=False, nulls_last=False, multithreaded=True, maintain_order=False) -> DataFrame: ...
    # join_nulls was renamed to nulls_equal in the 1.x series
    def join(self, other, on=None, how="inner", *, left_on=None, right_on=None, suffix="_right", validate="m:m", nulls_equal=False, coalesce=None, maintain_order=None) -> DataFrame: ...

class Series:
    def __init__(self, name=None, values=None, dtype=None, *, strict=True, nan_to_null=False): ...

class LazyFrame:
    def select(self, *exprs, **named_exprs) -> LazyFrame: ...
    def filter(self, *predicates, **constraints) -> LazyFrame: ...
    # streaming=True is deprecated in 1.x; pass engine="streaming" instead
    def collect(self, *, type_coercion=True, predicate_pushdown=True, projection_pushdown=True, simplify_expression=True, slice_pushdown=True, comm_subplan_elim=True, comm_subexpr_elim=True, cluster_with_columns=True, no_optimization=False, engine="auto", background=False) -> DataFrame: ...
```

[Core Data Structures](./core-data-structures.md)

### Expressions and Column Operations

An expression system for column transformations, aggregations, and conditional logic; the same expressions work with both DataFrame and LazyFrame.

```python { .api }
class Expr:
    def alias(self, name: str) -> Expr: ...
    def cast(self, dtype: DataType | type[Any], *, strict: bool = True) -> Expr: ...
    def filter(self, *predicates: Expr, **constraints: Any) -> Expr: ...
    def sort(self, *, descending: bool = False, nulls_last: bool = False) -> Expr: ...
    def sum(self) -> Expr: ...
    def mean(self) -> Expr: ...
    def max(self) -> Expr: ...
    def min(self) -> Expr: ...
    def count(self) -> Expr: ...  # counts non-null values

def col(name: str | DataType, *more_names: str | DataType) -> Expr: ...
def lit(value: Any, dtype: DataType | None = None) -> Expr: ...
def when(*predicates: Expr, **constraints: Any) -> When: ...
```

[Expressions and Column Operations](./expressions.md)

### Data Types and Schema

Comprehensive type system with numeric, text, temporal, and nested types, plus schema definition and validation capabilities.

```python { .api }
# Numeric types
class Int8: ...
class Int16: ...
class Int32: ...
class Int64: ...
class Int128: ...
class UInt8: ...
class UInt16: ...
class UInt32: ...
class UInt64: ...
class Float32: ...
class Float64: ...
class Decimal: ...

# Text types
class String: ...
class Binary: ...

# Temporal types
class Date: ...
class Datetime: ...
class Time: ...
class Duration: ...

# Special types
class Boolean: ...
class Categorical: ...
class Enum: ...
class List: ...
class Array: ...
class Struct: ...

class Schema:
    def __init__(self, schema: Mapping[str, DataType] | Iterable[tuple[str, DataType]] | None = None): ...
```

[Data Types and Schema](./data-types.md)

### I/O Operations

I/O support for 10+ file formats, with eager `read_*` functions and lazy `scan_*` functions that let the query optimizer push filters and column selections down into the reader.

```python { .api }
# CSV
def read_csv(source: str | Path | IO[str] | IO[bytes] | bytes, **kwargs) -> DataFrame: ...
def scan_csv(source: str | Path | list[str] | list[Path], **kwargs) -> LazyFrame: ...

# Parquet
def read_parquet(source: str | Path | IO[bytes] | bytes, **kwargs) -> DataFrame: ...
def scan_parquet(source: str | Path | list[str] | list[Path], **kwargs) -> LazyFrame: ...

# JSON
def read_json(source: str | Path | IO[str] | IO[bytes] | bytes, **kwargs) -> DataFrame: ...
def read_ndjson(source: str | Path | IO[str] | IO[bytes] | bytes, **kwargs) -> DataFrame: ...

# Database
def read_database(query: str, connection: str | ConnectionOrCursor, **kwargs) -> DataFrame: ...

# Excel
def read_excel(source: str | Path | IO[bytes] | bytes, **kwargs) -> DataFrame: ...
```

[I/O Operations](./io-operations.md)

### Functions and Utilities

Built-in functions for aggregation, date/time construction, string manipulation, and other utilities.

```python { .api }
# Aggregation functions (shorthand for pl.col(name).<agg>())
def sum(*names: str) -> Expr: ...
def mean(*names: str) -> Expr: ...
def max(*names: str) -> Expr: ...
def min(*names: str) -> Expr: ...
def count(*names: str) -> Expr: ...
def all(*names: str, ignore_nulls: bool = True) -> Expr: ...
def any(*names: str, ignore_nulls: bool = True) -> Expr: ...

# Date/time functions
def date(year: int | Expr, month: int | Expr, day: int | Expr) -> Expr: ...
def datetime(year: int | Expr, month: int | Expr, day: int | Expr, hour: int | Expr = 0, minute: int | Expr = 0, second: int | Expr = 0, microsecond: int | Expr = 0, *, time_unit: TimeUnit = "us", time_zone: str | None = None) -> Expr: ...
# date_range produces Date values; use datetime_range for time_unit/time_zone
def date_range(start: date | datetime | IntoExpr, end: date | datetime | IntoExpr, interval: str | timedelta = "1d", *, closed: ClosedInterval = "both", eager: bool = False) -> Expr | Series: ...

# String functions
def concat_str(exprs: IntoExpr | Iterable[IntoExpr], *more_exprs: IntoExpr, separator: str = "", ignore_nulls: bool = False) -> Expr: ...
```

[Functions and Utilities](./functions.md)

### SQL Interface

A SQL interface for running standard SQL queries against DataFrames and LazyFrames and integrating with existing SQL workflows.

```python { .api }
class SQLContext:
    def __init__(self, frames: Mapping[str, DataFrame | LazyFrame] | None = None, *, register_globals: bool | int = False, eager: bool = False, **named_frames: DataFrame | LazyFrame): ...
    def execute(self, query: str, *, eager: bool | None = None) -> DataFrame | LazyFrame: ...
    def register(self, name: str, frame: DataFrame | LazyFrame) -> SQLContext: ...
    def unregister(self, names: str | Collection[str]) -> SQLContext: ...

# Queries frames found in the calling scope; returns a LazyFrame unless eager=True
def sql(query: str, *, eager: bool = False) -> DataFrame | LazyFrame: ...
```

[SQL Interface](./sql-interface.md)

## Error Handling

Polars raises exceptions from a common hierarchy, importable from `polars.exceptions`:

```python { .api }
# Core exceptions
class PolarsError(Exception): ...
class ColumnNotFoundError(PolarsError): ...
class ComputeError(PolarsError): ...
class DuplicateError(PolarsError): ...
class InvalidOperationError(PolarsError): ...
class NoDataError(PolarsError): ...
class OutOfBoundsError(PolarsError): ...
class PanicException(PolarsError): ...
class SchemaError(PolarsError): ...
class SchemaFieldNotFoundError(PolarsError): ...
class ShapeError(PolarsError): ...
class SQLInterfaceError(PolarsError): ...
class SQLSyntaxError(PolarsError): ...

# Warnings
class PolarsWarning(Exception): ...
class PerformanceWarning(PolarsWarning): ...
```

Operations raise these exceptions on invalid data, schema mismatches, or computation errors. In production code, catch the specific subclass you expect (for example `ColumnNotFoundError`) or the `PolarsError` base class rather than a bare `Exception`.