# Data Types

Polars provides a type-safe, memory-efficient type system covering numeric, text, temporal, and nested data. Its data types map efficiently onto Arrow's columnar format.

## Capabilities

### Numeric Types

Integer and floating-point types at several precision levels, so you can trade value range for memory usage and performance.

```python { .api }
# Signed integers
class Int8:
    """8-bit signed integer (-128 to 127)"""

class Int16:
    """16-bit signed integer (-32,768 to 32,767)"""

class Int32:
    """32-bit signed integer (-2^31 to 2^31-1)"""

class Int64:
    """64-bit signed integer (-2^63 to 2^63-1)"""

class Int128:
    """128-bit signed integer"""

# Unsigned integers
class UInt8:
    """8-bit unsigned integer (0 to 255)"""

class UInt16:
    """16-bit unsigned integer (0 to 65,535)"""

class UInt32:
    """32-bit unsigned integer (0 to 2^32-1)"""

class UInt64:
    """64-bit unsigned integer (0 to 2^64-1)"""

# Floating point
class Float32:
    """32-bit floating point number"""

class Float64:
    """64-bit floating point number"""

# Decimal
class Decimal:
    def __init__(self, precision: int, scale: int = 0):
        """
        Fixed-point decimal type.

        Parameters:
        - precision: Total number of digits
        - scale: Number of digits after decimal point
        """
```
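
The integer ranges listed above follow directly from each type's bit width. As a quick sanity check, the sketch below recomputes them in plain Python. `int_range` and `INT_TYPES` are hypothetical names used only for illustration; they are not part of Polars.

```python
def int_range(bits: int, signed: bool = True) -> tuple[int, int]:
    """Return the (min, max) values representable in `bits` bits."""
    if signed:
        # Two's complement: half of the values are negative.
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


# Hypothetical lookup from type name to (bit width, signed).
INT_TYPES = {
    "Int8": (8, True), "Int16": (16, True),
    "Int32": (32, True), "Int64": (64, True),
    "UInt8": (8, False), "UInt16": (16, False),
    "UInt32": (32, False), "UInt64": (64, False),
}

for name, (bits, signed) in INT_TYPES.items():
    low, high = int_range(bits, signed)
    print(f"{name}: {low} to {high}")
```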

### Text Types

String and binary data types with full Unicode support and efficient storage.

```python { .api }
class String:
    """UTF-8 encoded string type"""

class Utf8:
    """Alias for String type"""

class Binary:
    """Binary data type for storing raw bytes"""
```

### Temporal Types

Date, time, and duration types with timezone support and flexible precision.

```python { .api }
class Date:
    """Date type (year, month, day)"""

class Datetime:
    def __init__(self, time_unit: str = "us", time_zone: str | None = None):
        """
        Datetime type with optional timezone.

        Parameters:
        - time_unit: Precision ('ns', 'us', 'ms')
        - time_zone: Timezone name (e.g., 'UTC', 'America/New_York')
        """

class Time:
    """Time-of-day type (hour, minute, second, sub-second; stored with nanosecond precision)"""

class Duration:
    def __init__(self, time_unit: str = "us"):
        """
        Duration type for time intervals.

        Parameters:
        - time_unit: Precision ('ns', 'us', 'ms')
        """
```

### Boolean and Null Types

Logical and null value types.

```python { .api }
class Boolean:
    """Boolean type (True/False/null)"""

class Null:
    """Null type containing only null values"""

class Unknown:
    """Unknown type placeholder for type inference"""
```

### Complex Types

Nested and structured data types for handling complex data structures.

```python { .api }
class List:
    def __init__(self, inner: type):
        """
        Variable-length list type.

        Parameters:
        - inner: Type of list elements
        """

class Array:
    def __init__(self, inner: type, shape: int | tuple[int, ...]):
        """
        Fixed-length array type.

        Parameters:
        - inner: Type of array elements
        - shape: Array dimensions
        """

class Struct:
    def __init__(self, fields: list[Field] | dict[str, type]):
        """
        Structured type with named fields.

        Parameters:
        - fields: List of Field objects or dict of {name: type}
        """

class Field:
    def __init__(self, name: str, dtype: type):
        """
        Schema field definition.

        Parameters:
        - name: Field name
        - dtype: Field data type
        """
```

### Categorical Types

Types for handling categorical data with efficient storage and operations.

```python { .api }
class Categorical:
    def __init__(self, ordering: str = "physical"):
        """
        Categorical type for string categories.

        Parameters:
        - ordering: Ordering method ('physical' or 'lexical')
        """

class Enum:
    def __init__(self, categories: list[str]):
        """
        Enumerated type with fixed categories.

        Parameters:
        - categories: List of allowed category values
        """

class Categories:
    """Categorical metadata container"""
```

### Special Types

Additional types for Python object storage and type system utilities.

```python { .api }
class Object:
    """Python object type for arbitrary Python objects"""

class DataType:
    """Base class for all data types"""
```

## Type System Utilities

### Type Checking and Conversion

```python { .api }
def is_polars_dtype(dtype: Any) -> bool:
    """
    Check if object is a Polars data type.

    Parameters:
    - dtype: Object to check

    Returns:
    - bool: True if dtype is a Polars type
    """

def dtype_to_py_type(dtype: type) -> type:
    """
    Convert Polars data type to Python type.

    Parameters:
    - dtype: Polars data type

    Returns:
    - type: Corresponding Python type
    """

def parse_into_dtype(dtype: str | type) -> type:
    """
    Parse string or type into Polars data type.

    Parameters:
    - dtype: String representation or type object

    Returns:
    - type: Polars data type
    """
```

### Schema Operations

```python { .api }
class Schema:
    def __init__(self, schema: dict[str, type] | list[tuple[str, type]] | None = None):
        """
        Schema definition for DataFrames.

        Parameters:
        - schema: Column definitions as dict or list of (name, type) tuples
        """

    def __getitem__(self, key: str) -> type:
        """Get column type by name."""

    def __contains__(self, key: str) -> bool:
        """Check if column exists in schema."""

    def names(self) -> list[str]:
        """Get column names."""

    def dtypes(self) -> list[type]:
        """Get column types."""

    def to_python(self) -> dict[str, type]:
        """Convert to Python dict."""
```

## Usage Examples

### Basic Type Usage

```python
from datetime import date

import polars as pl

# Create DataFrame with explicit types
df = pl.DataFrame({
    "id": pl.Series([1, 2, 3], dtype=pl.Int32),
    "name": pl.Series(["Alice", "Bob", "Charlie"], dtype=pl.String),
    "score": pl.Series([95.5, 87.2, 92.1], dtype=pl.Float64),
    "active": pl.Series([True, False, True], dtype=pl.Boolean),
    # Date columns are built from date objects; strings are not parsed implicitly
    "created": pl.Series([date(2023, 1, 1), date(2023, 1, 2), date(2023, 1, 3)], dtype=pl.Date),
})

print(df.dtypes)
# [Int32, String, Float64, Boolean, Date]
```

### Complex Types

```python
# List type
df_with_lists = pl.DataFrame({
    "id": [1, 2, 3],
    "scores": [[95, 87, 92], [88, 91], [85, 89, 93, 87]]
}, schema={"id": pl.Int32, "scores": pl.List(pl.Int32)})

# Struct type
df_with_struct = pl.DataFrame({
    "person": [
        {"name": "Alice", "age": 25},
        {"name": "Bob", "age": 30},
        {"name": "Charlie", "age": 35}
    ]
}, schema={"person": pl.Struct([
    pl.Field("name", pl.String),
    pl.Field("age", pl.Int32)
])})

# Array type (fixed length)
df_with_arrays = pl.DataFrame({
    "coordinates": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
}, schema={"coordinates": pl.Array(pl.Float64, 3)})
```
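
### Datetime with Timezone

```python
from datetime import datetime, timedelta, timezone

# Datetime with timezone (built from timezone-aware Python datetimes)
df_with_tz = pl.DataFrame({
    "timestamp": [
        datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc),
        datetime(2023, 1, 1, 15, 30, tzinfo=timezone.utc),
    ],
}, schema={"timestamp": pl.Datetime("us", "UTC")})

# Duration type (built from Python timedelta objects)
df_with_duration = pl.DataFrame({
    "elapsed": [timedelta(hours=1, minutes=30), timedelta(hours=2, minutes=15), timedelta(minutes=45)]
}, schema={"elapsed": pl.Duration("us")})
```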

### Categorical Types

```python
# Categorical type
df_categorical = pl.DataFrame({
    "category": ["A", "B", "A", "C", "B"]
}, schema={"category": pl.Categorical()})

# Enum type with fixed categories
df_enum = pl.DataFrame({
    "grade": ["A", "B", "A", "C"]
}, schema={"grade": pl.Enum(["A", "B", "C", "D", "F"])})
```

### Decimal Type

```python
from decimal import Decimal

# Decimal type for precise arithmetic (values supplied as Python Decimal objects)
df_decimal = pl.DataFrame({
    "price": [Decimal("19.99"), Decimal("25.50"), Decimal("12.75")]
}, schema={"price": pl.Decimal(precision=10, scale=2)})
```
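
To show what `precision` and `scale` mean in practice, the sketch below checks whether a value fits a given decimal specification using only Python's standard `decimal` module. `fits` is a hypothetical helper written for this illustration; it is not a Polars function and does not describe how Polars validates values internally.

```python
from decimal import Decimal


def fits(value: Decimal, precision: int, scale: int) -> bool:
    """Check whether `value` can be stored exactly with the given precision and scale."""
    if not value.is_finite():
        return False
    _sign, digits, exponent = value.as_tuple()
    # Digits after the decimal point (0 for integers such as Decimal("12")).
    fractional = max(-exponent, 0)
    # Digits before the decimal point.
    integral = max(len(digits) + exponent, 0)
    return fractional <= scale and integral <= precision - scale


print(fits(Decimal("19.99"), precision=10, scale=2))         # True
print(fits(Decimal("19.999"), precision=10, scale=2))        # False: three fractional digits
print(fits(Decimal("123456789.00"), precision=10, scale=2))  # False: nine integral digits, only eight allowed
```

With `precision=10` and `scale=2`, two of the ten digits are reserved for the fractional part, which leaves eight for the integral part.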

### Type Casting

```python
# Cast between types
df_cast = df.select([
    pl.col("id").cast(pl.Int64),
    pl.col("score").cast(pl.Float32),
    pl.col("created").cast(pl.Datetime("us"))
])

# Cast with error handling
df_safe_cast = df.select([
    pl.col("score").cast(pl.Int32, strict=False)  # Returns null on cast failure
])
```

### Schema Definition

```python
# Define schema explicitly
# (CSV has no nested types, so a schema used with read_csv sticks to flat types)
schema = pl.Schema({
    "id": pl.Int64,
    "name": pl.String,
    "timestamp": pl.Datetime("us", "UTC"),
    "values": pl.Float64
})

# Use schema when reading data
df_with_schema = pl.read_csv("data.csv", schema=schema)

# Schema overrides for specific columns
df_override = pl.read_csv("data.csv", schema_overrides={
    "id": pl.Int32,  # Override inferred type
    "date": pl.Date  # Override inferred type
})
```

### Type Checking

```python
# Check if value is a Polars type
assert pl.is_polars_dtype(pl.Int64)
assert not pl.is_polars_dtype(int)

# Convert to Python type
py_type = pl.dtype_to_py_type(pl.Float64)  # Returns float

# Parse string to type
parsed_type = pl.parse_into_dtype("int64")  # Returns pl.Int64
```

## Type Hierarchies

### Numeric Type Hierarchy

```
DataType
├── Int8, Int16, Int32, Int64, Int128
├── UInt8, UInt16, UInt32, UInt64
├── Float32, Float64
└── Decimal
```

### Temporal Type Hierarchy

```
DataType
├── Date
├── Datetime
├── Time
└── Duration
```

### Complex Type Hierarchy

```
DataType
├── List
├── Array
├── Struct
├── Categorical
├── Enum
└── Object
```

## Memory Efficiency

Polars types are designed for optimal memory usage:

- **Integer types**: Choose the smallest type that fits your data range
- **Categorical**: Use for repeated string values to save memory
- **List vs Array**: Use Array for fixed-size data, List for variable-size data
- **String interning**: Enable the string cache for categorical-like string data
- **Null representation**: Nulls are tracked with validity bitmaps, so no sentinel values are needed
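
The first guideline in the list above can be sketched in plain Python. `smallest_int_type` and `CANDIDATES` are hypothetical names for this illustration, not Polars functions; the helper returns the name of the narrowest integer type whose range covers the observed minimum and maximum.

```python
# Candidate types ordered from narrowest to widest; at each width the signed
# type is tried first, then the unsigned one.
CANDIDATES = [
    ("Int8", -(2 ** 7), 2 ** 7 - 1),
    ("UInt8", 0, 2 ** 8 - 1),
    ("Int16", -(2 ** 15), 2 ** 15 - 1),
    ("UInt16", 0, 2 ** 16 - 1),
    ("Int32", -(2 ** 31), 2 ** 31 - 1),
    ("UInt32", 0, 2 ** 32 - 1),
    ("Int64", -(2 ** 63), 2 ** 63 - 1),
    ("UInt64", 0, 2 ** 64 - 1),
]


def smallest_int_type(low: int, high: int) -> str:
    """Return the name of the narrowest integer type covering [low, high]."""
    for name, type_min, type_max in CANDIDATES:
        if type_min <= low and high <= type_max:
            return name
    raise ValueError(f"no 64-bit or smaller integer type covers [{low}, {high}]")


print(smallest_int_type(0, 200))            # UInt8
print(smallest_int_type(-5, 100))           # Int8
print(smallest_int_type(0, 70_000))         # Int32
print(smallest_int_type(0, 3_000_000_000))  # UInt32
```

The returned name can then guide which dtype to pass to `cast`. Leaving some headroom for future values is usually wiser than picking the tightest possible fit.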

## Type Compatibility

### Arrow Integration

Most Polars types map directly to Apache Arrow types, so conversion is mostly zero-copy (requires `pyarrow`):

```python
# Convert to Arrow
arrow_table = df.to_arrow()

# Convert from Arrow
df_from_arrow = pl.from_arrow(arrow_table)
```

### Pandas Integration

Polars DataFrames convert to and from pandas, with each Polars type mapped to a matching pandas dtype (requires `pandas` and `pyarrow`):

```python
# Convert to pandas
pandas_df = df.to_pandas()

# Convert from pandas
df_from_pandas = pl.from_pandas(pandas_df)
```

### NumPy Integration

Numeric columns convert directly to and from NumPy arrays:

```python
# Convert to numpy
numpy_array = df.select(pl.col("score")).to_numpy()

# Convert from numpy
df_from_numpy = pl.from_numpy(numpy_array, schema=["values"])
```