# Data Types

Polars provides a comprehensive type system covering numeric, text, temporal, and nested data types, with full type safety and memory efficiency. These types map efficiently to Arrow's columnar format.

## Capabilities

### Numeric Types

Integer and floating-point types at several precision levels, so that memory usage and performance can be matched to the data.

```python { .api }
# Signed integers
class Int8:
    """8-bit signed integer (-128 to 127)"""

class Int16:
    """16-bit signed integer (-32,768 to 32,767)"""

class Int32:
    """32-bit signed integer (-2^31 to 2^31-1)"""

class Int64:
    """64-bit signed integer (-2^63 to 2^63-1)"""

class Int128:
    """128-bit signed integer"""

# Unsigned integers
class UInt8:
    """8-bit unsigned integer (0 to 255)"""

class UInt16:
    """16-bit unsigned integer (0 to 65,535)"""

class UInt32:
    """32-bit unsigned integer (0 to 2^32-1)"""

class UInt64:
    """64-bit unsigned integer (0 to 2^64-1)"""

# Floating point
class Float32:
    """32-bit floating point number"""

class Float64:
    """64-bit floating point number"""

# Decimal
class Decimal:
    def __init__(self, precision: int, scale: int = 0):
        """
        Fixed-point decimal type.

        Parameters:
        - precision: Total number of digits
        - scale: Number of digits after decimal point
        """
```
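
The ranges quoted in the integer docstrings follow directly from the bit width. As a minimal sketch in plain Python (the helper `int_range` is hypothetical and not part of Polars), they can be derived like this:

```python
def int_range(bits: int, signed: bool) -> tuple[int, int]:
    """Return the (min, max) values representable in `bits` bits."""
    if signed:
        # Two's complement: one bit is used for the sign
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


for bits in (8, 16, 32, 64):
    print(f"Int{bits}:  {int_range(bits, signed=True)}")
    print(f"UInt{bits}: {int_range(bits, signed=False)}")
```

Comparing a column's minimum and maximum against these bounds is one way to decide which integer type is wide enough.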

### Text Types

String and binary data types with full Unicode support and efficient storage.

```python { .api }
class String:
    """UTF-8 encoded string type"""

class Utf8:
    """Alias for String type"""

class Binary:
    """Binary data type for storing raw bytes"""
```

### Temporal Types

Date, time, and duration types with timezone support and flexible precision.

```python { .api }
class Date:
    """Date type (year, month, day)"""

class Datetime:
    def __init__(self, time_unit: str = "us", time_zone: str | None = None):
        """
        Datetime type with optional timezone.

        Parameters:
        - time_unit: Precision ('ns', 'us', 'ms')
        - time_zone: Timezone name (e.g., 'UTC', 'America/New_York')
        """

class Time:
    """Time of day (hour, minute, second, sub-second), stored with nanosecond precision"""

class Duration:
    def __init__(self, time_unit: str = "us"):
        """
        Duration type for time intervals.

        Parameters:
        - time_unit: Precision ('ns', 'us', 'ms')
        """
```
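
The `time_unit` parameter controls the resolution of the integer that backs each value: a `Datetime` is held as a 64-bit count of ticks since the Unix epoch. The sketch below illustrates that idea in plain Python. The helper `to_epoch_ticks` and the `TICKS_PER_SECOND` table are hypothetical names used for illustration, not Polars APIs.

```python
from datetime import datetime, timezone

# Ticks per second for each supported time unit
TICKS_PER_SECOND = {"ms": 1_000, "us": 1_000_000, "ns": 1_000_000_000}
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ticks(ts: datetime, time_unit: str = "us") -> int:
    """Return `ts` as an integer count of `time_unit` ticks since the Unix epoch."""
    delta = ts - EPOCH
    # Work in whole microseconds (the resolution of Python's datetime)
    # so that no floating-point rounding is involved
    micros = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    return micros * TICKS_PER_SECOND[time_unit] // 1_000_000


ts = datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc)
for unit in ("ms", "us", "ns"):
    print(unit, to_epoch_ticks(ts, unit))
```

A finer unit gives more precision but a narrower representable date range, because the same 64 bits must cover more ticks per second.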

### Boolean and Null Types

Logical and null value types.

```python { .api }
class Boolean:
    """Boolean type (True/False/null)"""

class Null:
    """Null type containing only null values"""

class Unknown:
    """Unknown type placeholder for type inference"""
```

### Complex Types

Nested and structured data types for handling complex data structures.

```python { .api }
class List:
    def __init__(self, inner: type):
        """
        Variable-length list type.

        Parameters:
        - inner: Type of list elements
        """

class Array:
    def __init__(self, inner: type, shape: int | tuple[int, ...]):
        """
        Fixed-length array type.

        Parameters:
        - inner: Type of array elements
        - shape: Array dimensions
        """

class Struct:
    def __init__(self, fields: list[Field] | dict[str, type]):
        """
        Structured type with named fields.

        Parameters:
        - fields: List of Field objects or dict of {name: type}
        """

class Field:
    def __init__(self, name: str, dtype: type):
        """
        Schema field definition.

        Parameters:
        - name: Field name
        - dtype: Field data type
        """
```

### Categorical Types

Types for handling categorical data with efficient storage and operations.

```python { .api }
class Categorical:
    def __init__(self, ordering: str = "physical"):
        """
        Categorical type for string categories.

        Parameters:
        - ordering: Ordering method ('physical' or 'lexical')
        """

class Enum:
    def __init__(self, categories: list[str]):
        """
        Enumerated type with fixed categories.

        Parameters:
        - categories: List of allowed category values
        """

class Categories:
    """Categorical metadata container"""
```
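
The storage savings of categorical types come from dictionary encoding: each distinct string is kept once, and every row holds only a small integer code. The following is a conceptual sketch in plain Python, not Polars' actual implementation, and `dictionary_encode` is a hypothetical helper.

```python
def dictionary_encode(values: list[str]) -> tuple[list[str], list[int]]:
    """Encode strings as (categories, codes); each distinct string is stored once."""
    categories: list[str] = []
    index: dict[str, int] = {}
    codes: list[int] = []
    for value in values:
        if value not in index:
            # First occurrence: assign the next free code
            index[value] = len(categories)
            categories.append(value)
        codes.append(index[value])
    return categories, codes


categories, codes = dictionary_encode(["A", "B", "A", "C", "B"])
print(categories)  # ['A', 'B', 'C']
print(codes)       # [0, 1, 0, 2, 1]

# Decoding recovers the original values
decoded = [categories[code] for code in codes]
```

In this picture, an `Enum` corresponds to fixing the `categories` list up front, while a `Categorical` grows it as new values are seen.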

### Special Types

Additional types for Python object storage and type system utilities.

```python { .api }
class Object:
    """Python object type for arbitrary Python objects"""

class DataType:
    """Base class for all data types"""
```

## Type System Utilities

### Type Checking and Conversion

```python { .api }
def is_polars_dtype(dtype: Any) -> bool:
    """
    Check if object is a Polars data type.

    Parameters:
    - dtype: Object to check

    Returns:
    - bool: True if dtype is a Polars type
    """

def dtype_to_py_type(dtype: type) -> type:
    """
    Convert Polars data type to Python type.

    Parameters:
    - dtype: Polars data type

    Returns:
    - type: Corresponding Python type
    """

def parse_into_dtype(dtype: str | type) -> type:
    """
    Parse string or type into Polars data type.

    Parameters:
    - dtype: String representation or type object

    Returns:
    - type: Polars data type
    """
```

### Schema Operations

```python { .api }
class Schema:
    def __init__(self, schema: dict[str, type] | list[tuple[str, type]] | None = None):
        """
        Schema definition for DataFrames.

        Parameters:
        - schema: Column definitions as dict or list of (name, type) tuples
        """

    def __getitem__(self, key: str) -> type:
        """Get column type by name."""

    def __contains__(self, key: str) -> bool:
        """Check if column exists in schema."""

    def names(self) -> list[str]:
        """Get column names."""

    def dtypes(self) -> list[type]:
        """Get column types."""

    def to_python(self) -> dict[str, type]:
        """Convert to Python dict."""
```

## Usage Examples

### Basic Type Usage

```python
from datetime import date

import polars as pl

# Create DataFrame with explicit types
df = pl.DataFrame({
    "id": pl.Series([1, 2, 3], dtype=pl.Int32),
    "name": pl.Series(["Alice", "Bob", "Charlie"], dtype=pl.String),
    "score": pl.Series([95.5, 87.2, 92.1], dtype=pl.Float64),
    "active": pl.Series([True, False, True], dtype=pl.Boolean),
    # Date columns are built from date objects; strings must be parsed first
    "created": pl.Series([date(2023, 1, 1), date(2023, 1, 2), date(2023, 1, 3)], dtype=pl.Date)
})

print(df.dtypes)
# [Int32, String, Float64, Boolean, Date]
```

### Complex Types

```python
# List type
df_with_lists = pl.DataFrame({
    "id": [1, 2, 3],
    "scores": [[95, 87, 92], [88, 91], [85, 89, 93, 87]]
}, schema={"id": pl.Int32, "scores": pl.List(pl.Int32)})

# Struct type
df_with_struct = pl.DataFrame({
    "person": [
        {"name": "Alice", "age": 25},
        {"name": "Bob", "age": 30},
        {"name": "Charlie", "age": 35}
    ]
}, schema={"person": pl.Struct([
    pl.Field("name", pl.String),
    pl.Field("age", pl.Int32)
])})

# Array type (fixed length)
df_with_arrays = pl.DataFrame({
    "coordinates": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
}, schema={"coordinates": pl.Array(pl.Float64, 3)})
```

### Datetime with Timezone

```python
from datetime import datetime, timedelta, timezone

# Datetime with timezone (built from timezone-aware datetime objects)
df_with_tz = pl.DataFrame({
    "timestamp": [
        datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc),
        datetime(2023, 1, 1, 15, 30, tzinfo=timezone.utc),
    ],
}, schema={"timestamp": pl.Datetime("us", "UTC")})

# Duration type (built from timedelta objects)
df_with_duration = pl.DataFrame({
    "elapsed": [timedelta(hours=1, minutes=30), timedelta(hours=2, minutes=15), timedelta(minutes=45)]
}, schema={"elapsed": pl.Duration("us")})
```

### Categorical Types

```python
# Categorical type
df_categorical = pl.DataFrame({
    "category": ["A", "B", "A", "C", "B"]
}, schema={"category": pl.Categorical()})

# Enum type with fixed categories
df_enum = pl.DataFrame({
    "grade": ["A", "B", "A", "C"]
}, schema={"grade": pl.Enum(["A", "B", "C", "D", "F"])})
```

### Decimal Type

```python
from decimal import Decimal

# Decimal type for precise arithmetic (built from Python Decimal objects)
df_decimal = pl.DataFrame({
    "price": [Decimal("19.99"), Decimal("25.50"), Decimal("12.75")]
}, schema={"price": pl.Decimal(precision=10, scale=2)})
```
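
To make the meaning of `precision` and `scale` concrete, the sketch below checks whether a value can be stored exactly under a given pair, using only Python's `decimal` module. It illustrates the definitions (precision is the total number of digits, scale is the number of digits after the decimal point) and is not Polars' validation logic; `fits` is a hypothetical helper that handles finite values only.

```python
from decimal import Decimal


def fits(value: Decimal, precision: int, scale: int) -> bool:
    """Return True if a finite `value` is representable without rounding."""
    _sign, digits, exponent = value.as_tuple()
    frac_digits = max(0, -exponent)             # digits after the decimal point
    int_digits = max(0, len(digits) + exponent)  # digits before the decimal point
    return frac_digits <= scale and int_digits <= precision - scale


print(fits(Decimal("19.99"), precision=10, scale=2))         # True
print(fits(Decimal("19.999"), precision=10, scale=2))        # False: three fractional digits
print(fits(Decimal("123456789.01"), precision=10, scale=2))  # False: nine integer digits, eight allowed
```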

### Type Casting

```python
# Cast between types
df_cast = df.select([
    pl.col("id").cast(pl.Int64),
    pl.col("score").cast(pl.Float32),
    pl.col("created").cast(pl.Datetime("us"))
])

# Cast with error handling
df_safe_cast = df.select([
    pl.col("score").cast(pl.Int32, strict=False)  # Returns null on cast failure
])
```

### Schema Definition

```python
# Define schema explicitly
schema = pl.Schema({
    "id": pl.Int64,
    "name": pl.String,
    "timestamp": pl.Datetime("us", "UTC"),
    "values": pl.List(pl.Float64)
})

# Use a schema when reading data (CSV has no nested types,
# so only flat columns are declared here)
df_with_schema = pl.read_csv("data.csv", schema={"id": pl.Int64, "name": pl.String})

# Schema overrides for specific columns
df_override = pl.read_csv("data.csv", schema_overrides={
    "id": pl.Int32,  # Override inferred type
    "date": pl.Date  # Override inferred type
})
```

### Type Checking

```python
# Note: depending on the Polars version, these helpers may need to be
# imported from polars.datatypes rather than accessed as pl.<name>.

# Check if value is a Polars type
assert pl.is_polars_dtype(pl.Int64)
assert not pl.is_polars_dtype(int)

# Convert to Python type
py_type = pl.dtype_to_py_type(pl.Float64)  # Returns float

# Parse string to type
parsed_type = pl.parse_into_dtype("int64")  # Returns pl.Int64
```

## Type Hierarchies

### Numeric Type Hierarchy

```
DataType
├── Int8, Int16, Int32, Int64, Int128
├── UInt8, UInt16, UInt32, UInt64
├── Float32, Float64
└── Decimal
```

### Temporal Type Hierarchy

```
DataType
├── Date
├── Datetime
├── Time
└── Duration
```

### Complex Type Hierarchy

```
DataType
├── List
├── Array
├── Struct
├── Categorical
├── Enum
└── Object
```

## Memory Efficiency

Polars types are designed for optimal memory usage:

- **Integer types**: Choose the smallest type that fits your data range
- **Categorical**: Use for repeated string values to save memory
- **List vs Array**: Use Array for fixed-size data, List for variable-size data
- **String interning**: Enable the string cache for categorical-like string data
- **Null representation**: Nulls are tracked efficiently with validity bitmaps
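
The effect of choosing a narrower integer type can be illustrated without Polars at all. The sketch below uses Python's standard `array` module as a stand-in for a typed column (an analogy, not how Polars allocates memory) and compares the bytes needed to hold the same values as 8-bit and 64-bit integers.

```python
from array import array

values = list(range(100))  # every value fits in 8 bits

as_int8 = array("b", values)   # "b": signed 8-bit integers
as_int64 = array("q", values)  # "q": signed 64-bit integers

int8_bytes = len(as_int8) * as_int8.itemsize
int64_bytes = len(as_int64) * as_int64.itemsize
print(int8_bytes, int64_bytes)
```

The same eightfold ratio applies per value to a column stored as `Int8` instead of `Int64`, provided the data range allows it.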

## Type Compatibility

### Arrow Integration

Polars types map to Apache Arrow types, so most conversions to and from Arrow avoid copying data:

```python
# Convert to Arrow
arrow_table = df.to_arrow()

# Convert from Arrow
df_from_arrow = pl.from_arrow(arrow_table)
```

### Pandas Integration

Polars types convert to pandas types with appropriate handling:

```python
# Convert to pandas
pandas_df = df.to_pandas()

# Convert from pandas
df_from_pandas = pl.from_pandas(pandas_df)
```

### NumPy Integration

Numeric types integrate seamlessly with NumPy:

```python
# Convert to numpy
numpy_array = df.select(pl.col("score")).to_numpy()

# Convert from numpy
df_from_numpy = pl.from_numpy(numpy_array, schema=["values"])
```