# Data Types and Schema

Polars provides a comprehensive type system covering primitive types, temporal data, and nested structures, with schema validation, automatic type inference, casting, and interoperability with Arrow and other data formats.

## Capabilities

### Primitive Data Types

Fundamental numeric and boolean types with support for various precision levels and null value handling.

```python { .api }
# Integer Types
Int8: DataType    # 8-bit signed integer
Int16: DataType   # 16-bit signed integer
Int32: DataType   # 32-bit signed integer
Int64: DataType   # 64-bit signed integer
Int128: DataType  # 128-bit signed integer

# Unsigned Integer Types
UInt8: DataType   # 8-bit unsigned integer
UInt16: DataType  # 16-bit unsigned integer
UInt32: DataType  # 32-bit unsigned integer
UInt64: DataType  # 64-bit unsigned integer

# Floating Point Types
Float32: DataType  # 32-bit floating point
Float64: DataType  # 64-bit floating point

# Decimal Type
Decimal: DataType  # High-precision decimal type

# Boolean Type
Boolean: DataType  # Boolean true/false
```

### String and Binary Types

Text and binary data types with categorical optimization and encoding support.

```python { .api }
# String Types
String: DataType  # UTF-8 string type
Utf8: DataType    # Alias for String (deprecated)

# Binary Type
Binary: DataType  # Binary data type

# Categorical Types
Categorical: DataType  # Categorical string type for efficiency
Enum: DataType         # Enumerated string type with fixed categories

# Special Types
Null: DataType     # Null type
Unknown: DataType  # Unknown type placeholder
Object: DataType   # Python object type
```

### Temporal Data Types

Date, time, and duration types with timezone support and various precision levels.

```python { .api }
# Date and Time Types
Date: DataType      # Date type (days since epoch)
Time: DataType      # Time of day type
Duration: DataType  # Time duration type

# DateTime Type with timezone support
Datetime: DataType  # DateTime with optional timezone

# DateTime constructor
def Datetime(time_unit="us", time_zone=None) -> DataType:
    """
    Create datetime type with specified precision and timezone.

    Parameters:
    - time_unit: Precision ("ns", "us", "ms")
    - time_zone: Timezone string (e.g., "UTC", "America/New_York")

    Returns:
    Datetime data type
    """

# Duration constructor
def Duration(time_unit="us") -> DataType:
    """
    Create duration type with specified precision.

    Parameters:
    - time_unit: Precision ("ns", "us", "ms")

    Returns:
    Duration data type
    """
```

### Nested Data Types

Complex nested structures supporting lists, arrays, and structured data.

```python { .api }
# List Type (variable length)
List: DataType

def List(inner=None) -> DataType:
    """
    Create list type with specified inner type.

    Parameters:
    - inner: Inner data type for list elements

    Returns:
    List data type
    """

# Array Type (fixed length)
Array: DataType

def Array(inner=None, width=None) -> DataType:
    """
    Create array type with specified inner type and width.

    Parameters:
    - inner: Inner data type for array elements
    - width: Fixed width of array

    Returns:
    Array data type
    """

# Struct Type
Struct: DataType

def Struct(fields=None) -> DataType:
    """
    Create struct type with specified fields.

    Parameters:
    - fields: List of Field objects or dict mapping names to types

    Returns:
    Struct data type
    """
```

### Schema Management

Schema definition and validation with field specifications and type checking.

```python { .api }
class Schema:
    def __init__(self, schema=None):
        """
        Create schema from various inputs.

        Parameters:
        - schema: Dict mapping column names to types, list of Field objects, or existing Schema
        """

    def names(self) -> list[str]:
        """Get column names in schema order."""

    def dtypes(self) -> list[DataType]:
        """Get column data types in schema order."""

    def len(self) -> int:
        """Get number of columns in schema."""

    def __contains__(self, item) -> bool:
        """Check if column name exists in schema."""

    def __getitem__(self, item) -> DataType:
        """Get data type for column name."""

    def __iter__(self):
        """Iterate over column names; use items() for (name, dtype) pairs."""

class Field:
    def __init__(self, name: str, dtype: DataType):
        """
        Create field definition.

        Parameters:
        - name: Field name
        - dtype: Field data type
        """

    @property
    def name(self) -> str:
        """Field name."""

    @property
    def dtype(self) -> DataType:
        """Field data type."""
```

### Type Utilities and Checking

Functions for type inspection, validation, and conversion operations.

```python { .api }
def dtype_of(value) -> DataType:
    """
    Get the data type of a value or expression.

    Parameters:
    - value: Value or expression to inspect

    Returns:
    Data type of the value
    """

class DataType:
    def __eq__(self, other) -> bool:
        """Check type equality."""

    def __ne__(self, other) -> bool:
        """Check type inequality."""

    def __hash__(self) -> int:
        """Hash for use in sets/dicts."""

    def __repr__(self) -> str:
        """String representation."""

    def is_numeric(self) -> bool:
        """Check if type is numeric."""

    def is_integer(self) -> bool:
        """Check if type is integer."""

    def is_float(self) -> bool:
        """Check if type is floating point."""

    def is_temporal(self) -> bool:
        """Check if type is temporal."""

    def is_nested(self) -> bool:
        """Check if type is nested (List, Array, Struct)."""
```

### Categorical Types

Categorical and enumerated types for memory-efficient string handling with optional ordering.

```python { .api }
def Categorical(ordering=None) -> DataType:
    """
    Create categorical type.

    Parameters:
    - ordering: Ordering type ("physical" or "lexical")

    Returns:
    Categorical data type
    """

def Enum(categories=None) -> DataType:
    """
    Create enum type with fixed categories.

    Parameters:
    - categories: List of valid category strings

    Returns:
    Enum data type
    """

class Categories:
    def __init__(self, categories=None):
        """
        Create categories definition.

        Parameters:
        - categories: List of category strings
        """
```

### Decimal Type

High-precision decimal type for financial and scientific calculations requiring exact decimal representation.

```python { .api }
def Decimal(precision=None, scale=0) -> DataType:
    """
    Create decimal type with specified precision and scale.

    Parameters:
    - precision: Total number of digits (default: inferred)
    - scale: Number of digits after decimal point

    Returns:
    Decimal data type
    """
```

## Usage Examples

### Basic Type Creation and Usage

```python
from datetime import date

import polars as pl

# Create DataFrame with explicit types
df = pl.DataFrame({
    "id": [1, 2, 3],
    "price": [10.5, 20.0, 15.75],
    "category": ["A", "B", "A"],
    # Date columns are built from datetime.date values here;
    # string dates can be parsed with .str.to_date() instead.
    "date": [date(2023, 1, 1), date(2023, 1, 2), date(2023, 1, 3)]
}, schema={
    "id": pl.Int32,
    "price": pl.Float64,
    "category": pl.Categorical,
    "date": pl.Date
})

# Check schema
print(df.schema)
print(df.dtypes)
```

### Working with Temporal Types

```python
from datetime import date, datetime, timedelta, timezone

import polars as pl

# Create datetime with timezone
dt_type = pl.Datetime("ms", "UTC")

# Create DataFrame with temporal types from Python temporal objects
df = pl.DataFrame({
    "timestamp": [
        datetime(2023, 1, 1, 10, 30, tzinfo=timezone.utc),
        datetime(2023, 1, 1, 11, 45, tzinfo=timezone.utc),
    ],
    "date": [date(2023, 1, 1), date(2023, 1, 2)],
    "duration": [timedelta(hours=1, minutes=30), timedelta(hours=2, minutes=15)]
}, schema={
    "timestamp": pl.Datetime("ms", "UTC"),
    "date": pl.Date,
    "duration": pl.Duration("ms")
})

# Convert and work with temporal data
result = df.with_columns([
    pl.col("timestamp").dt.hour().alias("hour"),
    pl.col("date").dt.day().alias("day"),
    pl.col("duration").dt.total_seconds().alias("duration_seconds")
])
```

### Nested Types: Lists and Structs

```python
import polars as pl

# Working with List types
df = pl.DataFrame({
    "id": [1, 2, 3],
    "scores": [[85, 90, 88], [92, 87, 95], [78, 82, 85]]
}, schema={
    "id": pl.Int32,
    "scores": pl.List(pl.Int32)
})

# Operations on lists
result = df.with_columns([
    pl.col("scores").list.mean().alias("avg_score"),
    pl.col("scores").list.max().alias("max_score"),
    pl.col("scores").list.len().alias("num_scores")
])

# Working with Struct types
df = pl.DataFrame({
    "person": [
        {"name": "Alice", "age": 25, "city": "NYC"},
        {"name": "Bob", "age": 30, "city": "LA"},
    ]
}, schema={
    "person": pl.Struct([
        pl.Field("name", pl.String),
        pl.Field("age", pl.Int32),
        pl.Field("city", pl.String)
    ])
})

# Access struct fields
result = df.with_columns([
    pl.col("person").struct.field("name").alias("name"),
    pl.col("person").struct.field("age").alias("age")
])
```

### Type Casting and Conversion

```python
import polars as pl

# Type casting
df = pl.DataFrame({
    "int_col": [1, 2, 3],
    "str_col": ["10", "20", "30"],
    "float_col": [1.1, 2.2, 3.3]
})

# Cast between types
result = df.with_columns([
    pl.col("int_col").cast(pl.Float64).alias("int_as_float"),
    pl.col("str_col").cast(pl.Int32).alias("str_as_int"),
    pl.col("float_col").cast(pl.String).alias("float_as_str")
])

# Non-strict casting: values that cannot be converted become null
# instead of raising an error
result = df.with_columns([
    pl.col("str_col").cast(pl.Int32, strict=False).alias("safe_cast")
])
```

### Schema Validation and Overrides

```python
import polars as pl

# Define schema with validation
schema = pl.Schema({
    "id": pl.Int64,
    "name": pl.String,
    "score": pl.Float64,
    "category": pl.Categorical
})

# Create DataFrame with schema validation
df = pl.DataFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
    "score": [85.5, 92.0, 78.5],
    "category": ["A", "B", "A"]
}, schema=schema)

# Schema overrides for specific columns
df = pl.DataFrame({
    "values": [1, 2, 3]
}, schema_overrides={
    "values": pl.Int32  # Override inferred Int64 type
})
```

### Working with Categorical Data

```python
import polars as pl

# Create categorical for memory efficiency.
# schema_overrides changes only "category"; "id" keeps its inferred type.
df = pl.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "category": ["Small", "Large", "Medium", "Small", "Large"]
}, schema_overrides={
    "category": pl.Categorical
})

# Enum with fixed categories (kept in a separate frame)
sizes = pl.DataFrame({
    "size": ["S", "M", "L", "S", "M"]
}, schema={
    "size": pl.Enum(["S", "M", "L", "XL"])
})

# Operations on categorical data
result = df.group_by("category").agg([
    pl.col("id").count().alias("count")
])
```

### High-Precision Decimal Arithmetic

```python
from decimal import Decimal

import polars as pl

# Financial calculations with exact precision.
# Values are passed as decimal.Decimal objects rather than strings.
df = pl.DataFrame({
    "amount": [Decimal("123.456789"), Decimal("987.654321"), Decimal("555.111222")]
}, schema={
    "amount": pl.Decimal(precision=10, scale=6)
})

# Precise calculations (the multiplier is a decimal literal, not a string)
result = df.with_columns([
    (pl.col("amount") * pl.lit(Decimal("1.05"))).alias("with_tax"),
    pl.col("amount").round(2).alias("rounded")
])
```