# Polars

A blazingly fast DataFrame library for Python built on the Apache Arrow columnar format, with lazy and eager execution modes. Polars provides comprehensive data manipulation and analysis capabilities with multi-threaded processing, SIMD optimization, query optimization, and a powerful expression API designed for maximum performance in data science workflows.

## Package Information

- **Package Name**: polars
- **Language**: Python
- **Installation**: `pip install polars`
- **Documentation**: https://docs.pola.rs/api/python/stable/reference/index.html

## Core Imports

```python
import polars as pl
```

For specific components:

```python
from polars import DataFrame, LazyFrame, Series, Expr
from polars import col, lit, when
from polars import read_csv, read_parquet, scan_csv
```

## Basic Usage

```python
import polars as pl

# Create DataFrame from dictionary
df = pl.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "city": ["New York", "London", "Tokyo"]
})

# Basic operations
result = (
    df
    .filter(pl.col("age") > 27)
    .select([
        pl.col("name"),
        pl.col("age"),
        pl.col("city").alias("location")
    ])
    .sort("age", descending=True)
)

print(result)

# Lazy evaluation for query optimization
lazy_result = (
    pl.scan_csv("data.csv")
    .filter(pl.col("revenue") > 1000)
    .group_by("department")
    .agg([
        pl.col("revenue").sum().alias("total_revenue"),
        pl.col("employee_id").count().alias("employee_count")
    ])
    .collect()
)
```

## Architecture

Polars provides two main execution paradigms:

- **Eager Execution**: Immediate computation with DataFrame and Series
- **Lazy Execution**: Deferred computation with LazyFrame for query optimization

Key architectural components:

- **DataFrame**: Eager-evaluation data structure for immediate operations
- **LazyFrame**: Lazy evaluation with automatic query optimization and predicate pushdown
- **Series**: One-dimensional data structure with vectorized operations
- **Expr**: Expression API for column operations and transformations
- **Arrow Integration**: Built on Apache Arrow for efficient memory layout and interoperability

## Capabilities

### Core Data Structures

Primary data structures for eager and lazy computation, providing comprehensive data manipulation capabilities with vectorized operations and type safety.

```python { .api }
class DataFrame:
    def __init__(self, data=None, schema=None, *, schema_overrides=None, strict=True, orient=None, infer_schema_length=None, nan_to_null=False): ...
    def select(self, *exprs, **named_exprs) -> DataFrame: ...
    def filter(self, *predicates) -> DataFrame: ...
    def with_columns(self, *exprs, **named_exprs) -> DataFrame: ...
    def group_by(self, *by, maintain_order=False) -> GroupBy: ...

class LazyFrame:
    def select(self, *exprs, **named_exprs) -> LazyFrame: ...
    def filter(self, *predicates) -> LazyFrame: ...
    def with_columns(self, *exprs, **named_exprs) -> LazyFrame: ...
    def collect(self, **kwargs) -> DataFrame: ...

class Series:
    def __init__(self, name=None, values=None, dtype=None): ...
    def filter(self, predicate) -> Series: ...
    def map_elements(self, function, return_dtype=None) -> Series: ...

class Expr:
    def alias(self, name: str) -> Expr: ...
    def filter(self, predicate) -> Expr: ...
    def sum(self) -> Expr: ...
```

[Core Data Structures](./core-data-structures.md)

### Data Types and Schema

Comprehensive type system supporting primitive types, temporal data, nested structures, and schema validation with automatic type inference and casting.

```python { .api }
# Primitive Types
Boolean: DataType
Int8, Int16, Int32, Int64, Int128: DataType
UInt8, UInt16, UInt32, UInt64: DataType
Float32, Float64: DataType
Decimal: DataType

# String and Binary Types
String: DataType
Binary: DataType
Categorical: DataType
Enum: DataType

# Temporal Types
Date: DataType
Datetime: DataType
Time: DataType
Duration: DataType

# Nested Types
List: DataType
Array: DataType
Struct: DataType

class Schema:
    def __init__(self, schema): ...
    def names(self) -> list[str]: ...
    def dtypes(self) -> list[DataType]: ...
```

[Data Types and Schema](./data-types.md)

### Functions and Expressions

90+ utility functions for data construction, aggregation, statistical operations, and expression building, with support for vectorized computations and window functions.

```python { .api }
# Construction Functions
def col(name: str) -> Expr: ...
def lit(value) -> Expr: ...
def when(predicate) -> When: ...
def struct(*exprs) -> Expr: ...

# Aggregation Functions
def sum(*exprs) -> Expr: ...
def mean(*exprs) -> Expr: ...
def count(*exprs) -> Expr: ...
def max(*exprs) -> Expr: ...
def min(*exprs) -> Expr: ...

# Range Functions
def arange(start, end, step=1, dtype=None) -> Expr: ...
def date_range(start, end, interval="1d") -> Expr: ...
def int_range(start, end, step=1, dtype=None) -> Expr: ...

# Statistical Functions
def corr(a, b, method="pearson") -> Expr: ...
def std(column, ddof=1) -> Expr: ...
def var(column, ddof=1) -> Expr: ...
```

[Functions and Expressions](./functions-expressions.md)

### Input/Output Operations

Comprehensive I/O support for 15+ file formats including CSV, Parquet, JSON, Excel, databases, and cloud storage, with both eager reading and lazy scanning capabilities.

```python { .api }
# Read Functions (Eager)
def read_csv(source, **kwargs) -> DataFrame: ...
def read_parquet(source, **kwargs) -> DataFrame: ...
def read_json(source, **kwargs) -> DataFrame: ...
def read_excel(source, **kwargs) -> DataFrame: ...
def read_database(query, connection, **kwargs) -> DataFrame: ...

# Scan Functions (Lazy)
def scan_csv(source, **kwargs) -> LazyFrame: ...
def scan_parquet(source, **kwargs) -> LazyFrame: ...
def scan_ndjson(source, **kwargs) -> LazyFrame: ...
def scan_delta(source, **kwargs) -> LazyFrame: ...

# Cloud Credentials
class CredentialProviderAWS:
    def __init__(self, **kwargs): ...

class CredentialProviderGCP:
    def __init__(self, **kwargs): ...
```

[Input/Output Operations](./io-operations.md)

### SQL Interface

SQL query execution capabilities, with SQLContext for managing multiple DataFrames and native SQL expression support within DataFrame operations.

```python { .api }
class SQLContext:
    def __init__(self): ...
    def register(self, name: str, frame) -> None: ...
    def execute(self, query: str, **kwargs) -> DataFrame: ...
    def tables(self) -> list[str]: ...

def sql(query: str, **kwargs) -> DataFrame: ...
def sql_expr(sql: str) -> Expr: ...
```

[SQL Interface](./sql-interface.md)

### Configuration and Optimization

Global configuration system for controlling formatting, streaming behavior, and optimization settings, with context managers and persistent configuration.

```python { .api }
class Config:
    @classmethod
    def set_fmt_str_lengths(cls, n: int) -> type[Config]: ...
    @classmethod
    def set_tbl_rows(cls, n: int) -> type[Config]: ...
    @classmethod
    def set_streaming_chunk_size(cls, size: int) -> type[Config]: ...
    @classmethod
    def restore_defaults(cls) -> type[Config]: ...

class QueryOptFlags:
    def __init__(self, **kwargs): ...

class GPUEngine:
    def __init__(self, **kwargs): ...
```

[Configuration and Optimization](./configuration.md)

### Column Selection

Advanced column selection system with 30+ selector functions supporting pattern matching, data type filtering, and logical operations for complex column manipulation.

```python { .api }
import polars.selectors as cs

# Data Type Selectors
def by_dtype(dtypes) -> Selector: ...
def numeric() -> Selector: ...
def string() -> Selector: ...
def temporal() -> Selector: ...
def boolean() -> Selector: ...

# Pattern Selectors
def contains(pattern: str) -> Selector: ...
def starts_with(prefix: str) -> Selector: ...
def ends_with(suffix: str) -> Selector: ...
def matches(pattern: str) -> Selector: ...

# Index Selectors
def by_index(indices) -> Selector: ...
def first(n: int = 1) -> Selector: ...
def last(n: int = 1) -> Selector: ...
```

[Column Selection](./column-selection.md)

### Data Conversion

Seamless integration with pandas, NumPy, PyArrow, and PyTorch through conversion functions supporting bidirectional data exchange with automatic schema mapping.

```python { .api }
def from_pandas(df, **kwargs) -> DataFrame: ...
def from_numpy(data, schema=None, **kwargs) -> DataFrame: ...
def from_arrow(data, **kwargs) -> DataFrame: ...
def from_dict(data, schema=None) -> DataFrame: ...
def from_dicts(dicts, schema=None) -> DataFrame: ...
def from_torch(tensor, **kwargs) -> DataFrame: ...
def json_normalize(data, **kwargs) -> DataFrame: ...
```

[Data Conversion](./data-conversion.md)

### Error Handling and Exceptions

Comprehensive exception hierarchy for handling data errors, computation failures, and I/O issues, with specific error types for precise error handling.

```python { .api }
# Base Exceptions
class PolarsError(Exception): ...
class ComputeError(PolarsError): ...

# Data Exceptions
class ColumnNotFoundError(PolarsError): ...
class SchemaError(PolarsError): ...
class DuplicateError(PolarsError): ...
class ShapeError(PolarsError): ...

# Row-Related Exceptions
class RowsError(PolarsError): ...
class NoRowsReturnedError(RowsError): ...
class TooManyRowsReturnedError(RowsError): ...

# SQL Exceptions
class SQLInterfaceError(PolarsError): ...
class SQLSyntaxError(PolarsError): ...

# Warning Types
class PerformanceWarning(UserWarning): ...
class CategoricalRemappingWarning(UserWarning): ...
```

[Error Handling](./error-handling.md)