
# Data Types and Schema


Polars provides a comprehensive type system covering numeric, text, temporal, and nested types, together with schema definition and validation. Explicit types help ensure data integrity and enable optimized operations.


## Capabilities


### Numeric Data Types


Integer and floating-point data types with various precision levels.


```python { .api }
# Signed integers
class Int8:
    """8-bit signed integer (-128 to 127)."""

class Int16:
    """16-bit signed integer (-32,768 to 32,767)."""

class Int32:
    """32-bit signed integer (-2^31 to 2^31-1)."""

class Int64:
    """64-bit signed integer (-2^63 to 2^63-1)."""

class Int128:
    """128-bit signed integer."""

# Unsigned integers
class UInt8:
    """8-bit unsigned integer (0 to 255)."""

class UInt16:
    """16-bit unsigned integer (0 to 65,535)."""

class UInt32:
    """32-bit unsigned integer (0 to 2^32-1)."""

class UInt64:
    """64-bit unsigned integer (0 to 2^64-1)."""

# Floating point
class Float32:
    """32-bit floating point number."""

class Float64:
    """64-bit floating point number."""

# Decimal
class Decimal:
    """Decimal number with optional precision and non-negative scale (stored as a 128-bit fixed-point value)."""
    def __init__(self, precision: int | None = None, scale: int = 0):
        """
        Create decimal type.

        Parameters:
        - precision: Number of significant digits
        - scale: Number of digits after the decimal point
        """
```
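
The integer ranges quoted in the docstrings above follow directly from the bit width. The sketch below recomputes them in plain Python; `int_range` is a hypothetical helper written for illustration and is not part of Polars.

```python
def int_range(bits: int, signed: bool = True) -> tuple[int, int]:
    """Return the (min, max) values representable with the given bit width."""
    if signed:
        # Two's complement: one bit is spent on the sign.
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


# Recompute the ranges for the signed and unsigned widths listed above.
signed_ranges = {f"Int{b}": int_range(b) for b in (8, 16, 32, 64, 128)}
unsigned_ranges = {f"UInt{b}": int_range(b, signed=False) for b in (8, 16, 32, 64)}

for name, (lo, hi) in {**signed_ranges, **unsigned_ranges}.items():
    print(f"{name}: {lo} to {hi}")
```

Choosing the narrowest type whose range covers the data is a common way to reduce memory use, at the cost of overflow risk if values later grow.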


### Text Data Types


String and binary data types for text processing.


```python { .api }
class String:
    """UTF-8 encoded string data (variable length)."""

class Utf8:
    """UTF-8 encoded string data (alias for String)."""

class Binary:
    """Binary data (bytes)."""
```


### Temporal Data Types


Date, time, and duration types for temporal data processing.


```python { .api }
class Date:
    """Calendar date (year, month, day)."""

class Datetime:
    """Date and time with optional timezone."""
    def __init__(self, time_unit: TimeUnit = "us", time_zone: str | None = None):
        """
        Create datetime type.

        Parameters:
        - time_unit: Time precision ("ns", "us", or "ms")
        - time_zone: Timezone (e.g., "UTC", "America/New_York")
        """

class Time:
    """Time of day (hour, minute, second, subsecond)."""

class Duration:
    """Time duration/interval."""
    def __init__(self, time_unit: TimeUnit = "us"):
        """
        Create duration type.

        Parameters:
        - time_unit: Time precision ("ns", "us", or "ms")
        """
```


### Boolean and Special Types


Boolean values and special data types.


```python { .api }
class Boolean:
    """Boolean true/false values."""

class Null:
    """Null type (no data)."""

class Unknown:
    """Unknown type placeholder."""

class Object:
    """Python object type (stores arbitrary Python objects)."""
```


### Categorical and Enumerated Types


Types for categorical and enumerated data with optimized storage.


```python { .api }
class Categorical:
    """Categorical data with string categories."""
    def __init__(self, ordering: CategoricalOrdering = "physical"):
        """
        Create categorical type.

        Parameters:
        - ordering: Category ordering ("physical" or "lexical")
        """

class Enum:
    """Enumerated type with fixed set of string values."""
    def __init__(self, categories: list[str] | Series):
        """
        Create enum type.

        Parameters:
        - categories: Fixed list of valid string values
        """

class Categories:
    """Categories metadata for categorical types."""
```


### Nested Data Types


Complex nested data structures including lists, arrays, and structs.


```python { .api }
class List:
    """Variable-length list of same-typed elements."""
    def __init__(self, inner: DataType):
        """
        Create list type.

        Parameters:
        - inner: Element data type
        """

class Array:
    """Fixed-length array of same-typed elements."""
    def __init__(self, inner: DataType, shape: int | tuple[int, ...]):
        """
        Create array type.

        Parameters:
        - inner: Element data type
        - shape: Array dimensions
        """

class Struct:
    """Struct/record type with named fields."""
    def __init__(self, fields: list[Field] | dict[str, DataType]):
        """
        Create struct type.

        Parameters:
        - fields: List of Field objects or dict mapping names to types
        """

class Field:
    """Named field in struct type."""
    def __init__(self, name: str, dtype: DataType):
        """
        Create field.

        Parameters:
        - name: Field name
        - dtype: Field data type
        """
```


### Schema Definition


Schema class for defining and validating DataFrame structure.


```python { .api }
class Schema:
    def __init__(self, schema: Mapping[str, DataType] | Iterable[tuple[str, DataType]] | None = None):
        """
        Create schema.

        Parameters:
        - schema: Mapping of column names to data types
        """

    def __getitem__(self, item: str) -> DataType:
        """Get data type for column."""

    def __contains__(self, item: str) -> bool:
        """Check if column exists in schema."""

    def __iter__(self) -> Iterator[str]:
        """Iterate over column names."""

    def __len__(self) -> int:
        """Get number of columns."""

    def names(self) -> list[str]:
        """Get all column names."""

    def dtypes(self) -> list[DataType]:
        """Get all data types."""

    def to_python(self) -> dict[str, type]:
        """Convert to Python type mapping."""
```


### Type Utilities


Utility functions for working with data types.


```python { .api }
def dtype_to_py_type(dtype: DataType) -> type:
    """
    Convert Polars data type to Python type.

    Parameters:
    - dtype: Polars data type

    Returns:
    Corresponding Python type
    """

def is_polars_dtype(dtype: Any) -> bool:
    """
    Check if object is a Polars data type.

    Parameters:
    - dtype: Object to check

    Returns:
    True if Polars data type
    """

def py_type_to_constructor(py_type: type) -> DataType:
    """
    Get Polars constructor for Python type.

    Parameters:
    - py_type: Python type

    Returns:
    Polars data type constructor
    """

def numpy_char_code_to_dtype(char_code: str) -> DataType | None:
    """
    Convert NumPy character code to Polars data type.

    Parameters:
    - char_code: NumPy dtype character code

    Returns:
    Polars data type or None
    """

def unpack_dtypes(*dtypes: DataType | Iterable[DataType]) -> list[DataType]:
    """
    Unpack and flatten data type specifications.

    Parameters:
    - dtypes: Data type specifications

    Returns:
    Flattened list of data types
    """
```


### Type Groups and Constants


Type groups and constants for working with related data types.


```python { .api }
class IntegerType:
    """Base class for integer types."""

class TemporalType:
    """Base class for temporal types."""

class DataTypeClass:
    """Metaclass for data type classes."""

# Constants
N_INFER_DEFAULT: int  # Default number of rows for type inference
DTYPE_TEMPORAL_UNITS: frozenset[str]  # Valid temporal units
```


## Usage Examples


### Basic Type Usage


```python
import polars as pl

# Creating DataFrames with explicit types
df = pl.DataFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
    "salary": [50000.0, 60000.0, 70000.0],
    "is_active": [True, False, True]
}, schema={
    "id": pl.Int32,
    "name": pl.String,
    "salary": pl.Float64,
    "is_active": pl.Boolean
})

# Schema inspection
print(df.schema)
print(df.dtypes)
```


### Working with Temporal Types


```python
import polars as pl

# Creating datetime columns with different precisions
# (no format is given, so the format is inferred from the strings)
df = pl.DataFrame({
    "timestamp_us": ["2023-01-01 12:00:00"],
    "timestamp_ms": ["2023-01-01 12:00:00"],
    "date_only": ["2023-01-01"],
    "time_only": ["12:00:00"]
}).with_columns([
    pl.col("timestamp_us").str.strptime(pl.Datetime("us")),
    pl.col("timestamp_ms").str.strptime(pl.Datetime("ms")),
    pl.col("date_only").str.strptime(pl.Date),
    pl.col("time_only").str.strptime(pl.Time)
])

# Working with timezones
df_tz = pl.DataFrame({
    "utc_time": ["2023-01-01 12:00:00"]
}).with_columns([
    pl.col("utc_time").str.strptime(pl.Datetime("us", "UTC"))
])
```


### Categorical and Enum Types


```python
import polars as pl

# Categorical data
df = pl.DataFrame({
    "category": ["A", "B", "A", "C", "B"]
}).with_columns([
    pl.col("category").cast(pl.Categorical)
])

# Enum with fixed categories
df = pl.DataFrame({
    "status": ["active", "inactive", "pending"]
}).with_columns([
    pl.col("status").cast(pl.Enum(["active", "inactive", "pending"]))
])
```


### Nested Data Types


```python
import polars as pl

# List columns
df = pl.DataFrame({
    "numbers": [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
})
print(df.dtypes)  # [List(Int64)]

# Struct columns
df = pl.DataFrame({
    "person": [
        {"name": "Alice", "age": 25},
        {"name": "Bob", "age": 30}
    ]
})
print(df.dtypes)  # a single Struct dtype with fields name (String) and age (Int64)

# Creating nested types explicitly
schema = pl.Schema({
    "id": pl.Int32,
    "scores": pl.List(pl.Float64),
    "metadata": pl.Struct([
        pl.Field("created_at", pl.Datetime),
        pl.Field("version", pl.String)
    ])
})
```


### Type Casting and Conversion


```python
import polars as pl

df = pl.DataFrame({
    "text_numbers": ["1", "2", "3"],
    "floats": [1.0, 2.0, 3.0]
})

# Cast to different types
result = df.with_columns([
    pl.col("text_numbers").cast(pl.Int32).alias("integers"),
    # Float-to-integer casts drop the fractional part rather than rounding
    pl.col("floats").cast(pl.Int64).alias("truncated")
])

# Non-strict casting: values that cannot be converted become null instead of raising
result = df.with_columns([
    pl.col("text_numbers").cast(pl.Int32, strict=False).alias("safe_cast")
])
```


### Schema Validation


```python
import polars as pl

# Define expected schema
expected_schema = pl.Schema({
    "id": pl.Int32,
    "name": pl.String,
    "amount": pl.Float64,
    "timestamp": pl.Datetime("us")
})

# Read with an explicit schema (columns are parsed as the declared types)
df = pl.read_csv("data.csv", schema=expected_schema)

# Override specific types
df = pl.read_csv("data.csv", schema_overrides={
    "id": pl.String,  # Read ID as string instead of number
    "amount": pl.Decimal(10, 2)  # Use decimal for precise amounts
})
```


### Working with Decimal Types


```python
import polars as pl

# High precision decimal calculations
df = pl.DataFrame({
    "price": ["19.99", "29.99", "9.95"]
}).with_columns([
    pl.col("price").cast(pl.Decimal(10, 2))
])

# Financial calculations maintaining precision
# (string literals are cast to Decimal first; a Decimal column cannot be multiplied by a string)
result = df.with_columns([
    (pl.col("price") * pl.lit("1.08").cast(pl.Decimal(10, 2))).alias("with_tax"),
    (pl.col("price") * pl.lit("0.9").cast(pl.Decimal(10, 2))).alias("discounted")
])
```


### Type Inspection and Utilities


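```python
import polars as pl
# Assumption: the helper is importable from polars.datatypes (it is not a top-level pl attribute)
from polars.datatypes import dtype_to_py_type

# Check data types (a column holds a single type, so values cannot be mixed within one column)
df = pl.DataFrame({
    "values": [1, 2, 3],
    "tags": [["a"], ["b", "c"], []],
    "point": [{"x": 1, "y": 2}, {"x": 3, "y": 4}, {"x": 5, "y": 6}]
})
print(dtype_to_py_type(df.dtypes[0]))

# Type checking
schema = df.schema
for name, dtype in schema.items():
    print(f"{name}: {dtype}")
    if isinstance(dtype, pl.List):
        print(f"  List element type: {dtype.inner}")
    elif isinstance(dtype, pl.Struct):
        print(f"  Struct fields: {dtype.fields}")
```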