# Data Types and Schema

Polars provides a comprehensive type system covering numeric, text, temporal, and nested types, plus schema definition and validation. Explicit types help preserve data integrity and enable optimized operations.

## Capabilities

### Numeric Data Types

Integer, floating-point, and decimal data types at several bit widths and precisions.

```python { .api }
# Signed integers
class Int8:
    """8-bit signed integer (-128 to 127)."""

class Int16:
    """16-bit signed integer (-32,768 to 32,767)."""

class Int32:
    """32-bit signed integer (-2^31 to 2^31-1)."""

class Int64:
    """64-bit signed integer (-2^63 to 2^63-1)."""

class Int128:
    """128-bit signed integer (-2^127 to 2^127-1)."""

# Unsigned integers
class UInt8:
    """8-bit unsigned integer (0 to 255)."""

class UInt16:
    """16-bit unsigned integer (0 to 65,535)."""

class UInt32:
    """32-bit unsigned integer (0 to 2^32-1)."""

class UInt64:
    """64-bit unsigned integer (0 to 2^64-1)."""

# Floating point
class Float32:
    """32-bit floating point number."""

class Float64:
    """64-bit floating point number."""

class Decimal:
    """Fixed-point decimal number backed by a 128-bit integer (up to 38 digits of precision)."""
    def __init__(self, precision: int | None = None, scale: int = 0):
        """
        Create decimal type.

        Parameters:
        - precision: Total number of significant digits (inferred if None)
        - scale: Number of digits to the right of the decimal point
        """
```
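
The ranges quoted in the integer docstrings follow directly from the bit width. A minimal sketch in plain Python that reproduces them; the helper `int_range` is hypothetical and not part of Polars:

```python
from typing import Tuple


def int_range(bits: int, signed: bool) -> Tuple[int, int]:
    """Return the (min, max) values representable with the given bit width."""
    if signed:
        # Two's complement: half of the bit patterns represent negative values.
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


for name, bits, signed in [
    ("Int8", 8, True),
    ("Int16", 16, True),
    ("UInt8", 8, False),
    ("UInt16", 16, False),
]:
    low, high = int_range(bits, signed)
    print(f"{name}: {low} to {high}")
```

Choosing the narrowest type whose range covers the data reduces memory use, at the cost of overflow if values later exceed that range.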

### Text Data Types

String and binary data types for text and raw byte data.

```python { .api }
class String:
    """UTF-8 encoded string data (variable length)."""

class Utf8:
    """UTF-8 encoded string data (alias for String)."""

class Binary:
    """Binary data (bytes)."""
```
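
Because `String` values are UTF-8 encoded, the number of characters and the number of stored bytes can differ. The plain-Python sketch below makes no Polars calls; it only illustrates the text-versus-bytes distinction that `String` and `Binary` reflect:

```python
text = "h\u00e9llo"            # "héllo": five characters
raw = text.encode("utf-8")     # the bytes a UTF-8 encoding of the text occupies

print(len(text))  # character count
print(len(raw))   # byte count; "é" takes two bytes in UTF-8

# Bytes convert back to text only if they are valid UTF-8.
round_tripped = raw.decode("utf-8")
print(round_tripped == text)
```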

### Temporal Data Types

Date, time, and duration types for temporal data processing.

```python { .api }
class Date:
    """Calendar date (year, month, day)."""

class Datetime:
    """Date and time with optional timezone."""
    def __init__(self, time_unit: TimeUnit = "us", time_zone: str | None = None):
        """
        Create datetime type.

        Parameters:
        - time_unit: Time precision ("ns", "us", "ms")
        - time_zone: Timezone (e.g., "UTC", "America/New_York")
        """

class Time:
    """Time of day (hour, minute, second, subsecond)."""

class Duration:
    """Time duration/interval."""
    def __init__(self, time_unit: TimeUnit = "us"):
        """
        Create duration type.

        Parameters:
        - time_unit: Time precision ("ns", "us", "ms")
        """
```
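
The `time_unit` sets the resolution at which instants are stored: a datetime is held as an integer count of units since the Unix epoch. The sketch below is plain Python rather than Polars code, and `to_ticks` is a hypothetical helper; it only shows how one instant maps to different integers depending on the unit.

```python
from datetime import datetime, timezone

# Ticks per second for the units used in this sketch.
TICKS_PER_SECOND = {"ms": 1_000, "us": 1_000_000, "ns": 1_000_000_000}

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_ticks(moment: datetime, time_unit: str) -> int:
    """Count whole ticks of `time_unit` between the Unix epoch and `moment`."""
    delta = moment - EPOCH
    # Work in integer microseconds to avoid floating-point rounding.
    micros = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    return micros * TICKS_PER_SECOND[time_unit] // 1_000_000


moment = datetime(2023, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
for unit in ("ms", "us", "ns"):
    print(unit, to_ticks(moment, unit))
```

A finer unit gives more sub-second resolution but, for a fixed-width integer, covers a shorter span of representable dates.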

### Boolean and Special Types

Boolean values and special data types.

```python { .api }
class Boolean:
    """Boolean true/false values."""

class Null:
    """Null type (no data)."""

class Unknown:
    """Unknown type placeholder."""

class Object:
    """Python object type (stores arbitrary Python objects)."""
```

### Categorical and Enumerated Types

Types for categorical and enumerated data with optimized storage.

```python { .api }
class Categorical:
    """Categorical data with string categories."""
    def __init__(self, ordering: CategoricalOrdering = "physical"):
        """
        Create categorical type.

        Parameters:
        - ordering: Category ordering ("physical" or "lexical")
        """

class Enum:
    """Enumerated type with fixed set of string values."""
    def __init__(self, categories: list[str] | Series):
        """
        Create enum type.

        Parameters:
        - categories: Fixed list of valid string values
        """

class Categories:
    """Categories metadata for categorical types."""
```
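
Categorical and enum columns save space by storing each distinct string once and referring to it by a small integer code. The following is a simplified plain-Python sketch of that idea, not Polars' actual implementation; `encode` and `encode_enum` are hypothetical names.

```python
from typing import Dict, List, Tuple


def encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Dictionary-encode strings: categories in first-seen order, plus codes."""
    categories: List[str] = []
    index: Dict[str, int] = {}
    codes: List[int] = []
    for value in values:
        if value not in index:
            index[value] = len(categories)
            categories.append(value)
        codes.append(index[value])
    return categories, codes


def encode_enum(values: List[str], categories: List[str]) -> List[int]:
    """Encode against a fixed category list; unknown values are an error."""
    index = {category: i for i, category in enumerate(categories)}
    unknown = [value for value in values if value not in index]
    if unknown:
        raise ValueError(f"values not in enum categories: {unknown}")
    return [index[value] for value in values]


categories, codes = encode(["B", "A", "B", "C"])
print(categories, codes)

# "physical" ordering sorts by code, "lexical" ordering sorts by the string.
physical = [categories[c] for c in sorted(codes)]
lexical = [categories[c] for c in sorted(codes, key=lambda code: categories[code])]
print(physical)
print(lexical)
```

The enum variant differs only in that the category list is fixed up front, so a value outside it is rejected instead of extending the dictionary.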

### Nested Data Types

Complex nested data structures including lists, arrays, and structs.

```python { .api }
class List:
    """Variable-length list of same-typed elements."""
    def __init__(self, inner: DataType):
        """
        Create list type.

        Parameters:
        - inner: Element data type
        """

class Array:
    """Fixed-length array of same-typed elements."""
    def __init__(self, inner: DataType, shape: int | tuple[int, ...]):
        """
        Create array type.

        Parameters:
        - inner: Element data type
        - shape: Array dimensions
        """

class Struct:
    """Struct/record type with named fields."""
    def __init__(self, fields: list[Field] | dict[str, DataType]):
        """
        Create struct type.

        Parameters:
        - fields: List of Field objects or dict mapping names to types
        """

class Field:
    """Named field in struct type."""
    def __init__(self, name: str, dtype: DataType):
        """
        Create field.

        Parameters:
        - name: Field name
        - dtype: Field data type
        """
```
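
The practical difference between `List` and `Array` is whether every row must hold the same number of elements. A small plain-Python sketch of that constraint, using hypothetical helpers `conforms_to_list` and `conforms_to_array` rather than any Polars call:

```python
from typing import List


def conforms_to_list(rows: List[list], inner: type) -> bool:
    """Every element has type `inner`; rows may differ in length."""
    return all(isinstance(item, inner) for row in rows for item in row)


def conforms_to_array(rows: List[list], inner: type, width: int) -> bool:
    """Like a list column, but every row must have exactly `width` elements."""
    return conforms_to_list(rows, inner) and all(len(row) == width for row in rows)


ragged = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
fixed = [[1, 2], [3, 4], [5, 6]]

print(conforms_to_list(ragged, int))
print(conforms_to_array(ragged, int, width=3))
print(conforms_to_array(fixed, int, width=2))
```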

### Schema Definition

Schema class for defining and validating DataFrame structure.

```python { .api }
class Schema:
    def __init__(self, schema: Mapping[str, DataType] | Iterable[tuple[str, DataType]] | None = None):
        """
        Create schema.

        Parameters:
        - schema: Mapping of column names to data types
        """

    def __getitem__(self, item: str) -> DataType:
        """Get data type for column."""

    def __contains__(self, item: str) -> bool:
        """Check if column exists in schema."""

    def __iter__(self) -> Iterator[str]:
        """Iterate over column names."""

    def __len__(self) -> int:
        """Get number of columns."""

    def names(self) -> list[str]:
        """Get all column names."""

    def dtypes(self) -> list[DataType]:
        """Get all data types."""

    def to_python(self) -> dict[str, type]:
        """Convert to Python type mapping."""
```

### Type Utilities

Utility functions for working with data types.

```python { .api }
def dtype_to_py_type(dtype: DataType) -> type:
    """
    Convert Polars data type to Python type.

    Parameters:
    - dtype: Polars data type

    Returns:
    Corresponding Python type
    """

def is_polars_dtype(dtype: Any) -> bool:
    """
    Check if object is a Polars data type.

    Parameters:
    - dtype: Object to check

    Returns:
    True if Polars data type
    """

def py_type_to_constructor(py_type: type) -> DataType:
    """
    Get Polars constructor for Python type.

    Parameters:
    - py_type: Python type

    Returns:
    Polars data type constructor
    """

def numpy_char_code_to_dtype(char_code: str) -> DataType | None:
    """
    Convert NumPy character code to Polars data type.

    Parameters:
    - char_code: NumPy dtype character code

    Returns:
    Polars data type or None
    """

def unpack_dtypes(*dtypes: DataType | Iterable[DataType]) -> list[DataType]:
    """
    Unpack and flatten data type specifications.

    Parameters:
    - dtypes: Data type specifications

    Returns:
    Flattened list of data types
    """
```

### Type Groups and Constants

Type groups and constants for working with related data types.

```python { .api }
class IntegerType:
    """Base class for integer types."""

class TemporalType:
    """Base class for temporal types."""

class DataTypeClass:
    """Metaclass for data type classes."""

# Constants
N_INFER_DEFAULT: int  # Default number of rows for type inference
DTYPE_TEMPORAL_UNITS: frozenset[str]  # Valid temporal units
```

## Usage Examples

### Basic Type Usage

```python
import polars as pl

# Creating DataFrames with explicit types
df = pl.DataFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
    "salary": [50000.0, 60000.0, 70000.0],
    "is_active": [True, False, True]
}, schema={
    "id": pl.Int32,
    "name": pl.String,
    "salary": pl.Float64,
    "is_active": pl.Boolean
})

# Schema inspection
print(df.schema)
print(df.dtypes)
```

### Working with Temporal Types

```python
import polars as pl

# Creating datetime columns with different precisions
df = pl.DataFrame({
    "timestamp_us": ["2023-01-01 12:00:00"],
    "timestamp_ms": ["2023-01-01 12:00:00"],
    "date_only": ["2023-01-01"],
    "time_only": ["12:00:00"]
}).with_columns([
    pl.col("timestamp_us").str.strptime(pl.Datetime("us")),
    pl.col("timestamp_ms").str.strptime(pl.Datetime("ms")),
    pl.col("date_only").str.strptime(pl.Date),
    pl.col("time_only").str.strptime(pl.Time)
])

# Working with timezones: parse into a timezone-aware datetime
df_tz = pl.DataFrame({
    "utc_time": ["2023-01-01 12:00:00"]
}).with_columns([
    pl.col("utc_time").str.strptime(pl.Datetime("us", "UTC"))
])
```

### Categorical and Enum Types

```python
import polars as pl

# Categorical data: categories are collected from the values
df = pl.DataFrame({
    "category": ["A", "B", "A", "C", "B"]
}).with_columns([
    pl.col("category").cast(pl.Categorical)
])

# Enum with fixed categories: every value must be one of the listed strings
df = pl.DataFrame({
    "status": ["active", "inactive", "pending"]
}).with_columns([
    pl.col("status").cast(pl.Enum(["active", "inactive", "pending"]))
])
```

### Nested Data Types

```python
import polars as pl

# List columns
df = pl.DataFrame({
    "numbers": [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
})
print(df.dtypes)  # [List(Int64)]

# Struct columns
df = pl.DataFrame({
    "person": [
        {"name": "Alice", "age": 25},
        {"name": "Bob", "age": 30}
    ]
})
print(df.dtypes)  # [Struct([Field('name', String), Field('age', Int64)])]

# Creating nested types explicitly
schema = pl.Schema({
    "id": pl.Int32,
    "scores": pl.List(pl.Float64),
    "metadata": pl.Struct([
        pl.Field("created_at", pl.Datetime),
        pl.Field("version", pl.String)
    ])
})
```

### Type Casting and Conversion

```python
import polars as pl

df = pl.DataFrame({
    "text_numbers": ["1", "2", "3"],
    "floats": [1.0, 2.0, 3.0]
})

# Cast to different types
# Note: casting a float to an integer drops the fractional part; it does not round
result = df.with_columns([
    pl.col("text_numbers").cast(pl.Int32).alias("integers"),
    pl.col("floats").cast(pl.Int64).alias("truncated")
])

# Non-strict casting: values that cannot be converted become null instead of raising
result = df.with_columns([
    pl.col("text_numbers").cast(pl.Int32, strict=False).alias("safe_cast")
])
```
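
The difference between strict and non-strict casting, and the fact that a float-to-integer conversion drops the fractional part, can be illustrated without Polars. In this plain-Python sketch, `cast_to_int` is a hypothetical stand-in for that behaviour, with `None` playing the role of a null:

```python
from typing import List, Optional


def cast_to_int(values: List[str], strict: bool = True) -> List[Optional[int]]:
    """Parse strings as integers; non-strict mode yields None on failure."""
    result: List[Optional[int]] = []
    for value in values:
        try:
            result.append(int(value))
        except ValueError:
            if strict:
                raise
            result.append(None)
    return result


print(cast_to_int(["1", "2", "3"]))
print(cast_to_int(["1", "oops", "3"], strict=False))

# Converting a float to an integer truncates toward zero.
print(int(2.9), int(-2.9))
```

Non-strict mode is convenient for messy input, but it hides bad values as nulls, so counting nulls after the cast is a reasonable follow-up check.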

### Schema Validation

```python
import polars as pl

# Define expected schema
expected_schema = pl.Schema({
    "id": pl.Int32,
    "name": pl.String,
    "amount": pl.Float64,
    "timestamp": pl.Datetime("us")
})

# Read with an explicit schema: column types come from the schema, not from inference
df = pl.read_csv("data.csv", schema=expected_schema)

# Override specific types and infer the rest
df = pl.read_csv("data.csv", schema_overrides={
    "id": pl.String,  # Read ID as string instead of number
    "amount": pl.Decimal(10, 2)  # Use decimal for precise amounts
})
```
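
Beyond fixing types at read time, it can be useful to compare an actual schema against the expected one and report every difference. Since a schema behaves like a mapping from column names to types, the check can be sketched over plain mappings. The helper `schema_mismatches` is hypothetical, and type names are plain strings here so the sketch runs without Polars; applying the same comparison to `df.schema` is an assumption this sketch does not test.

```python
from typing import List, Mapping


def schema_mismatches(expected: Mapping[str, object], actual: Mapping[str, object]) -> List[str]:
    """Describe every column that is missing, unexpected, or has the wrong type."""
    problems: List[str] = []
    for name, dtype in expected.items():
        if name not in actual:
            problems.append(f"missing column: {name}")
        elif actual[name] != dtype:
            problems.append(f"{name}: expected {dtype}, got {actual[name]}")
    for name in actual:
        if name not in expected:
            problems.append(f"unexpected column: {name}")
    return problems


expected = {"id": "Int32", "name": "String", "amount": "Float64"}
actual = {"id": "Int64", "name": "String", "extra": "Boolean"}
for problem in schema_mismatches(expected, actual):
    print(problem)
```

Returning a list of problems rather than raising on the first one makes it easier to fix all schema issues in a single pass.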

### Working with Decimal Types

```python
import polars as pl

# Parse prices into a fixed-point decimal column (precision 10, scale 2)
df = pl.DataFrame({
    "price": ["19.99", "29.99", "9.95"]
}).with_columns([
    pl.col("price").cast(pl.Decimal(10, 2))
])

# Financial calculations maintaining precision: cast the string factors to
# Decimal as well, since a Decimal column cannot be multiplied by a string
result = df.with_columns([
    (pl.col("price") * pl.lit("1.08").cast(pl.Decimal(10, 2))).alias("with_tax"),
    (pl.col("price") * pl.lit("0.9").cast(pl.Decimal(10, 2))).alias("discounted")
])
```
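
Precision and scale can be illustrated with Python's standard `decimal` module. This is a sketch of the two concepts only, not of how Polars stores decimals; `fits` is a hypothetical helper that checks whether a value is representable with a given precision and scale.

```python
from decimal import Decimal


def fits(value: Decimal, precision: int, scale: int) -> bool:
    """Check that `value` needs at most `scale` fractional and `precision` total digits."""
    _sign, digits, exponent = value.as_tuple()
    fractional_digits = max(0, -exponent)
    if fractional_digits > scale:
        return False
    # Integer digits available once `scale` digits are reserved for the fraction.
    integer_digits = max(0, len(digits) + exponent)
    return integer_digits <= precision - scale


price = Decimal("19.99")
print(fits(price, precision=10, scale=2))
print(fits(Decimal("19.999"), precision=10, scale=2))        # too many fractional digits
print(fits(Decimal("123456789.00"), precision=10, scale=2))  # too many integer digits

# Decimal arithmetic keeps exact digits where binary floats may not.
with_tax = price * Decimal("1.08")
print(with_tax)
print(19.99 * 1.08)
```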
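
### Type Inspection and Utilities

```python
import polars as pl
from polars.datatypes import dtype_to_py_type

# Polars columns hold a single dtype, so each column uses homogeneous values
df = pl.DataFrame({
    "values": [1.0, 2.5, 4.0],
    "tags": [["a"], ["b", "c"], ["d"]],
    "point": [{"x": 1, "y": 2}, {"x": 3, "y": 4}, {"x": 5, "y": 6}]
})

# Check data types: map a Polars dtype to the corresponding Python type
print(dtype_to_py_type(df.dtypes[0]))

# Type checking: inspect nested dtypes through the schema
schema = df.schema
for name, dtype in schema.items():
    print(f"{name}: {dtype}")
    if isinstance(dtype, pl.List):
        print(f"  List element type: {dtype.inner}")
    elif isinstance(dtype, pl.Struct):
        print(f"  Struct fields: {dtype.fields}")
```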