# Type Checking & Validation

cuDF provides comprehensive type checking utilities for validating and working with GPU data types. The type system extends pandas' type checking to handle cuDF-specific types including nested data structures and GPU-accelerated dtypes.

## Import Statements

```python
# Main type utilities
from cudf.api.types import dtype

# Data type checking functions
from cudf.api.types import (
    is_numeric_dtype, is_string_dtype, is_integer_dtype, is_float_dtype,
    is_bool_dtype, is_categorical_dtype, is_datetime64_dtype, is_timedelta64_dtype
)

# cuDF-specific type checking
from cudf.api.types import (
    is_decimal_dtype, is_list_dtype, is_struct_dtype, is_interval_dtype
)

# Value type checking
from cudf.api.types import is_scalar, is_list_like
```

## Data Type Utilities

Core utilities for working with cuDF data types and conversions.

```{ .api }
def dtype(dtype_obj) -> numpy.dtype | cudf.core.dtypes.ExtensionDtype:
    """
    Convert input to cuDF-compatible data type

    Normalizes various dtype specifications into a dtype that cuDF supports:
    a NumPy dtype for standard numeric and temporal types, or a cuDF
    extension dtype for categorical, decimal, and nested types.
    Handles pandas dtypes, numpy dtypes, and cuDF-specific extension types.

    Parameters:
        dtype_obj: str, numpy.dtype, pandas.ExtensionDtype, or cuDF ExtensionDtype
            Data type specification to convert

    Returns:
        numpy.dtype or cudf.core.dtypes.ExtensionDtype: Normalized cuDF data type

    Raises:
        TypeError: If dtype cannot be converted to cuDF-compatible type

    Examples:
        # String dtype specifications
        dt = cudf.api.types.dtype('int64')
        dt = cudf.api.types.dtype('float32')
        dt = cudf.api.types.dtype('category')

        # NumPy dtype conversion
        import numpy as np
        dt = cudf.api.types.dtype(np.dtype('datetime64[ns]'))

        # cuDF extension types
        dt = cudf.api.types.dtype(cudf.ListDtype('int32'))
        dt = cudf.api.types.dtype(cudf.Decimal64Dtype(10, 2))

        # Pandas compatibility
        import pandas as pd
        dt = cudf.api.types.dtype(pd.CategoricalDtype(['a', 'b', 'c']))
    """
```
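
Because `dtype` raises `TypeError` on input it cannot convert, a caller can normalize a whole schema and collect the failures in one pass. The sketch below is a minimal illustration of that idea, not part of cuDF: `normalize_schema` and `fake_to_dtype` are hypothetical names, and the converter is passed in as a parameter so the example runs without a GPU.

```python
def normalize_schema(schema, to_dtype):
    """Convert each column's dtype spec with `to_dtype`, collecting failures.

    `to_dtype` is any callable that raises TypeError on bad input. In real
    code this would be `cudf.api.types.dtype` (an assumption of this sketch).
    """
    normalized, errors = {}, {}
    for column, spec in schema.items():
        try:
            normalized[column] = to_dtype(spec)
        except TypeError as exc:
            errors[column] = str(exc)
    return normalized, errors


def fake_to_dtype(spec):
    """Stand-in converter for demonstration only; not a cuDF function."""
    if spec not in {"int64", "float32", "category"}:
        raise TypeError(f"cannot interpret {spec!r} as a dtype")
    return spec


normalized, errors = normalize_schema(
    {"id": "int64", "price": "float32", "blob": "void"}, fake_to_dtype
)
print(normalized)  # columns whose specs converted cleanly
print(errors)      # columns whose specs were rejected, with the message
```

With cuDF installed, passing `cudf.api.types.dtype` in place of `fake_to_dtype` should give the same split between converted and rejected columns.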

## Standard Data Type Checking

Functions for checking standard data types on cuDF objects and dtype specifications.

```{ .api }
def is_numeric_dtype(arr_or_dtype) -> bool:
    """
    Check whether the provided array or dtype is numeric

    Returns True for integer, float, complex, and decimal dtypes.
    Compatible with cuDF extension types and GPU arrays.

    Parameters:
        arr_or_dtype: array-like, Series, Index, or data type
            Object or dtype to check

    Returns:
        bool: True if array/dtype is numeric, False otherwise

    Examples:
        # Check Series dtype
        s_int = cudf.Series([1, 2, 3])
        assert cudf.api.types.is_numeric_dtype(s_int)  # True

        s_str = cudf.Series(['a', 'b', 'c'])
        assert not cudf.api.types.is_numeric_dtype(s_str)  # False

        # Check dtype directly
        assert cudf.api.types.is_numeric_dtype('int64')  # True
        assert cudf.api.types.is_numeric_dtype('float32')  # True
        assert not cudf.api.types.is_numeric_dtype('object')  # False

        # cuDF decimal types
        decimal_dtype = cudf.Decimal64Dtype(10, 2)
        assert cudf.api.types.is_numeric_dtype(decimal_dtype)  # True
    """

def is_string_dtype(arr_or_dtype) -> bool:
    """
    Check whether the provided array or dtype is string

    Returns True for string/object dtypes that contain text data.
    Handles cuDF string columns and object columns with string data.

    Parameters:
        arr_or_dtype: array-like, Series, Index, or data type
            Object or dtype to check

    Returns:
        bool: True if array/dtype contains string data, False otherwise

    Examples:
        # String Series
        s_str = cudf.Series(['hello', 'world'])
        assert cudf.api.types.is_string_dtype(s_str)  # True

        # Object Series with strings
        s_obj = cudf.Series(['a', 'b'], dtype='object')
        assert cudf.api.types.is_string_dtype(s_obj)  # True

        # Non-string data
        s_int = cudf.Series([1, 2, 3])
        assert not cudf.api.types.is_string_dtype(s_int)  # False

        # Check dtype string
        assert cudf.api.types.is_string_dtype('object')  # True
        assert not cudf.api.types.is_string_dtype('int64')  # False
    """

def is_integer_dtype(arr_or_dtype) -> bool:
    """
    Check whether the provided array or dtype is integer

    Returns True for signed and unsigned integer dtypes of all bit widths.
    Excludes floating-point and other numeric types.

    Parameters:
        arr_or_dtype: array-like, Series, Index, or data type
            Object or dtype to check

    Returns:
        bool: True if array/dtype is integer, False otherwise

    Examples:
        # Integer Series
        s_int32 = cudf.Series([1, 2, 3], dtype='int32')
        assert cudf.api.types.is_integer_dtype(s_int32)  # True

        s_uint64 = cudf.Series([1, 2, 3], dtype='uint64')
        assert cudf.api.types.is_integer_dtype(s_uint64)  # True

        # Non-integer numeric types
        s_float = cudf.Series([1.0, 2.0, 3.0])
        assert not cudf.api.types.is_integer_dtype(s_float)  # False

        # Check various integer dtypes
        assert cudf.api.types.is_integer_dtype('int8')  # True
        assert cudf.api.types.is_integer_dtype('uint32')  # True
        assert not cudf.api.types.is_integer_dtype('float64')  # False
    """

def is_float_dtype(arr_or_dtype) -> bool:
    """
    Check whether the provided array or dtype is floating point

    Returns True for single and double precision floating-point dtypes.
    Excludes integer, decimal, and other numeric types.

    Parameters:
        arr_or_dtype: array-like, Series, Index, or data type
            Object or dtype to check

    Returns:
        bool: True if array/dtype is floating point, False otherwise

    Examples:
        # Float Series
        s_float32 = cudf.Series([1.1, 2.2, 3.3], dtype='float32')
        assert cudf.api.types.is_float_dtype(s_float32)  # True

        s_float64 = cudf.Series([1.0, 2.0, 3.0])  # Default float64
        assert cudf.api.types.is_float_dtype(s_float64)  # True

        # Non-float types
        s_int = cudf.Series([1, 2, 3])
        assert not cudf.api.types.is_float_dtype(s_int)  # False

        # Check dtype strings
        assert cudf.api.types.is_float_dtype('float32')  # True
        assert cudf.api.types.is_float_dtype('float64')  # True
        assert not cudf.api.types.is_float_dtype('int32')  # False
    """

def is_bool_dtype(arr_or_dtype) -> bool:
    """
    Check whether the provided array or dtype is boolean

    Returns True for boolean dtypes. Handles cuDF boolean columns
    and boolean masks used in filtering operations.

    Parameters:
        arr_or_dtype: array-like, Series, Index, or data type
            Object or dtype to check

    Returns:
        bool: True if array/dtype is boolean, False otherwise

    Examples:
        # Boolean Series
        s_bool = cudf.Series([True, False, True])
        assert cudf.api.types.is_bool_dtype(s_bool)  # True

        # Boolean mask from comparison
        s_int = cudf.Series([1, 2, 3])
        mask = s_int > 1  # Boolean Series
        assert cudf.api.types.is_bool_dtype(mask)  # True

        # Non-boolean types
        assert not cudf.api.types.is_bool_dtype(s_int)  # False

        # Check dtype
        assert cudf.api.types.is_bool_dtype('bool')  # True
        assert not cudf.api.types.is_bool_dtype('int64')  # False
    """

def is_categorical_dtype(arr_or_dtype) -> bool:
    """
    Check whether the provided array or dtype is categorical

    Returns True for cuDF categorical dtypes and pandas CategoricalDtype.
    Handles both ordered and unordered categorical data.

    Parameters:
        arr_or_dtype: array-like, Series, Index, or data type
            Object or dtype to check

    Returns:
        bool: True if array/dtype is categorical, False otherwise

    Examples:
        # Categorical Series
        categories = ['red', 'blue', 'green']
        s_cat = cudf.Series(['red', 'blue', 'red'], dtype='category')
        assert cudf.api.types.is_categorical_dtype(s_cat)  # True

        # CategoricalIndex
        idx_cat = cudf.CategoricalIndex(['a', 'b', 'c'])
        assert cudf.api.types.is_categorical_dtype(idx_cat)  # True

        # Non-categorical
        s_str = cudf.Series(['red', 'blue', 'green'])
        assert not cudf.api.types.is_categorical_dtype(s_str)  # False

        # Check CategoricalDtype
        cat_dtype = cudf.CategoricalDtype(categories)
        assert cudf.api.types.is_categorical_dtype(cat_dtype)  # True
    """
```
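
Several of these checks overlap: an integer dtype is also numeric, for example. When labeling a column by dtype family, the order in which predicates are tried therefore matters. The sketch below shows one way to apply an ordered list of checks, most specific first. `classify` and the `fake_is_*` predicates are hypothetical stand-ins that work on dtype names so the example runs without a GPU; in real code the `cudf.api.types` functions would take their place.

```python
def classify(obj, checks, default="other"):
    """Return the label of the first predicate in `checks` that accepts `obj`."""
    for label, predicate in checks:
        if predicate(obj):
            return label
    return default


# Stand-in predicates over dtype names (assumptions, not cuDF functions).
def fake_is_integer(name):
    return name.startswith(("int", "uint"))


def fake_is_float(name):
    return name.startswith("float")


def fake_is_numeric(name):
    return fake_is_integer(name) or fake_is_float(name) or name.startswith("decimal")


# Most specific checks first; the broad "numeric" check acts as a catch-all.
CHECKS = [
    ("integer", fake_is_integer),
    ("float", fake_is_float),
    ("numeric", fake_is_numeric),
]

labels = {name: classify(name, CHECKS) for name in ["int8", "float32", "decimal64", "object"]}
print(labels)
```

If the broad `numeric` check were listed first, every integer and float column would be labeled `numeric` and the narrower labels would never be reached.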

## Date and Time Type Checking

Specialized functions for temporal data types.

```{ .api }
def is_datetime64_dtype(arr_or_dtype) -> bool:
    """
    Check whether the provided array or dtype is datetime64

    Returns True for datetime64 dtypes with any time unit resolution.
    Handles cuDF DatetimeIndex and datetime columns.

    Parameters:
        arr_or_dtype: array-like, Series, Index, or data type
            Object or dtype to check

    Returns:
        bool: True if array/dtype is datetime64, False otherwise

    Examples:
        # Datetime Series
        dates = cudf.to_datetime(cudf.Series(['2023-01-01', '2023-01-02']))
        assert cudf.api.types.is_datetime64_dtype(dates)  # True

        # DatetimeIndex
        date_idx = cudf.DatetimeIndex(['2023-01-01', '2023-01-02'])
        assert cudf.api.types.is_datetime64_dtype(date_idx)  # True

        # Non-datetime types
        s_str = cudf.Series(['2023-01-01', '2023-01-02'])  # String, not parsed
        assert not cudf.api.types.is_datetime64_dtype(s_str)  # False

        # Check dtype strings
        assert cudf.api.types.is_datetime64_dtype('datetime64[ns]')  # True
        assert cudf.api.types.is_datetime64_dtype('datetime64[ms]')  # True
        assert not cudf.api.types.is_datetime64_dtype('int64')  # False
    """

def is_timedelta64_dtype(arr_or_dtype) -> bool:
    """
    Check whether the provided array or dtype is timedelta64

    Returns True for timedelta64 dtypes representing time durations.
    Handles cuDF TimedeltaIndex and timedelta columns.

    Parameters:
        arr_or_dtype: array-like, Series, Index, or data type
            Object or dtype to check

    Returns:
        bool: True if array/dtype is timedelta64, False otherwise

    Examples:
        # Timedelta Series (integer counts of seconds)
        deltas = cudf.Series([1, 2, 3], dtype='timedelta64[s]')
        assert cudf.api.types.is_timedelta64_dtype(deltas)  # True

        # TimedeltaIndex
        td_idx = cudf.TimedeltaIndex([1, 2, 3], dtype='timedelta64[s]')
        assert cudf.api.types.is_timedelta64_dtype(td_idx)  # True

        # Computed timedeltas
        start = cudf.to_datetime(cudf.Series(['2023-01-01']))
        end = cudf.to_datetime(cudf.Series(['2023-01-02']))
        diff = end - start  # Timedelta Series
        assert cudf.api.types.is_timedelta64_dtype(diff)  # True

        # Check dtype
        assert cudf.api.types.is_timedelta64_dtype('timedelta64[ns]')  # True
    """
```

## cuDF Extension Type Checking

Functions for checking cuDF-specific extension data types.

```{ .api }
def is_decimal_dtype(arr_or_dtype) -> bool:
    """
    Check whether the provided array or dtype is decimal

    Returns True for cuDF decimal dtypes (Decimal32, Decimal64, Decimal128).
    These provide exact decimal arithmetic without floating-point errors.

    Parameters:
        arr_or_dtype: array-like, Series, Index, or data type
            Object or dtype to check

    Returns:
        bool: True if array/dtype is decimal, False otherwise

    Examples:
        # Decimal Series (built from exact Decimal values, not floats)
        from decimal import Decimal
        decimal_dtype = cudf.Decimal64Dtype(precision=10, scale=2)
        s_decimal = cudf.Series([Decimal('1.23'), Decimal('4.56')], dtype=decimal_dtype)
        assert cudf.api.types.is_decimal_dtype(s_decimal)  # True

        # Different decimal precisions
        dec32 = cudf.Decimal32Dtype(7, 2)
        dec128 = cudf.Decimal128Dtype(20, 4)
        assert cudf.api.types.is_decimal_dtype(dec32)  # True
        assert cudf.api.types.is_decimal_dtype(dec128)  # True

        # Non-decimal numeric types
        s_float = cudf.Series([1.23, 4.56], dtype='float64')
        assert not cudf.api.types.is_decimal_dtype(s_float)  # False

        # Check from dtype object
        assert cudf.api.types.is_decimal_dtype(decimal_dtype)  # True
    """

def is_list_dtype(arr_or_dtype) -> bool:
    """
    Check whether the provided array or dtype is list

    Returns True for cuDF list dtypes representing nested list data.
    Each row contains a variable-length list of elements.

    Parameters:
        arr_or_dtype: array-like, Series, Index, or data type
            Object or dtype to check

    Returns:
        bool: True if array/dtype is list, False otherwise

    Examples:
        # List Series
        list_dtype = cudf.ListDtype('int64')
        s_list = cudf.Series([[1, 2, 3], [4, 5], [6]], dtype=list_dtype)
        assert cudf.api.types.is_list_dtype(s_list)  # True

        # Nested lists with different element types
        str_list_dtype = cudf.ListDtype('str')
        s_str_list = cudf.Series([['a', 'b'], ['c']], dtype=str_list_dtype)
        assert cudf.api.types.is_list_dtype(s_str_list)  # True

        # Non-list types
        s_regular = cudf.Series([1, 2, 3])
        assert not cudf.api.types.is_list_dtype(s_regular)  # False

        # Check dtype object
        assert cudf.api.types.is_list_dtype(list_dtype)  # True
    """

def is_struct_dtype(arr_or_dtype) -> bool:
    """
    Check whether the provided array or dtype is struct

    Returns True for cuDF struct dtypes representing nested structured data.
    Each row contains multiple named fields with potentially different types.

    Parameters:
        arr_or_dtype: array-like, Series, Index, or data type
            Object or dtype to check

    Returns:
        bool: True if array/dtype is struct, False otherwise

    Examples:
        # Struct dtype
        struct_dtype = cudf.StructDtype({
            'x': 'int64',
            'y': 'float64',
            'name': 'str'
        })
        s_struct = cudf.Series([
            {'x': 1, 'y': 1.1, 'name': 'first'},
            {'x': 2, 'y': 2.2, 'name': 'second'}
        ], dtype=struct_dtype)
        assert cudf.api.types.is_struct_dtype(s_struct)  # True

        # Check dtype object directly
        assert cudf.api.types.is_struct_dtype(struct_dtype)  # True

        # Non-struct types
        # (a Series of dicts is inferred as struct, so use strings here)
        s_str = cudf.Series(['a', 'b'])  # String, not struct
        assert not cudf.api.types.is_struct_dtype(s_str)  # False

        # Regular Series
        s_int = cudf.Series([1, 2, 3])
        assert not cudf.api.types.is_struct_dtype(s_int)  # False
    """

def is_interval_dtype(arr_or_dtype) -> bool:
    """
    Check whether the provided array or dtype is interval

    Returns True for cuDF interval dtypes representing interval data.
    Intervals have left and right bounds with configurable closure.

    Parameters:
        arr_or_dtype: array-like, Series, Index, or data type
            Object or dtype to check

    Returns:
        bool: True if array/dtype is interval, False otherwise

    Examples:
        # Interval dtype and data
        interval_dtype = cudf.IntervalDtype('int64', closed='right')
        intervals = cudf.interval_range(0, 10, periods=5)
        assert cudf.api.types.is_interval_dtype(intervals)  # True

        # IntervalIndex
        idx_interval = cudf.IntervalIndex.from_breaks([0, 1, 2, 3])
        assert cudf.api.types.is_interval_dtype(idx_interval)  # True

        # Check dtype object
        assert cudf.api.types.is_interval_dtype(interval_dtype)  # True

        # Non-interval types
        s_float = cudf.Series([1.0, 2.0, 3.0])
        assert not cudf.api.types.is_interval_dtype(s_float)  # False
    """
```

## Value Type Checking

Functions for checking properties of values and objects.

```{ .api }
def is_scalar(val) -> bool:
    """
    Check whether the provided value is scalar

    Returns True for single values (not collections). Handles cuDF-specific
    scalar types including decimal and datetime scalars.

    Parameters:
        val: Any
            Value to check for scalar nature

    Returns:
        bool: True if value is scalar, False otherwise

    Examples:
        # Scalar values
        assert cudf.api.types.is_scalar(1)  # True
        assert cudf.api.types.is_scalar(1.5)  # True
        assert cudf.api.types.is_scalar('hello')  # True
        assert cudf.api.types.is_scalar(True)  # True

        # cuDF-specific scalars
        assert cudf.api.types.is_scalar(cudf.NA)  # True
        assert cudf.api.types.is_scalar(cudf.NaT)  # True

        # Date/time scalars
        date_scalar = cudf.to_datetime('2023-01-01')
        assert cudf.api.types.is_scalar(date_scalar)  # True (single date)

        # Non-scalar collections
        assert not cudf.api.types.is_scalar([1, 2, 3])  # False
        assert not cudf.api.types.is_scalar(cudf.Series([1, 2]))  # False
        assert not cudf.api.types.is_scalar({'a': 1})  # False

        # Edge cases
        import numpy as np
        assert cudf.api.types.is_scalar(np.int64(5))  # True
        assert not cudf.api.types.is_scalar(np.array([1]))  # False (array)
    """

def is_list_like(obj) -> bool:
    """
    Check whether the provided object is list-like

    Returns True for objects that can be iterated over like lists,
    excluding strings and dicts. Includes cuDF Series, Index, and arrays.

    Parameters:
        obj: Any
            Object to check for list-like properties

    Returns:
        bool: True if object is list-like, False otherwise

    Examples:
        # List-like objects
        assert cudf.api.types.is_list_like([1, 2, 3])  # True
        assert cudf.api.types.is_list_like((1, 2, 3))  # True
        assert cudf.api.types.is_list_like({1, 2, 3})  # True (set)

        # cuDF objects
        s = cudf.Series([1, 2, 3])
        assert cudf.api.types.is_list_like(s)  # True

        idx = cudf.Index([1, 2, 3])
        assert cudf.api.types.is_list_like(idx)  # True

        # NumPy/CuPy arrays
        import numpy as np
        assert cudf.api.types.is_list_like(np.array([1, 2, 3]))  # True

        # Non-list-like objects
        assert not cudf.api.types.is_list_like('hello')  # False (string)
        assert not cudf.api.types.is_list_like({'a': 1})  # False (dict)
        assert not cudf.api.types.is_list_like(5)  # False (scalar)
        assert not cudf.api.types.is_list_like(None)  # False

        # DataFrame (debatable, but typically False)
        df = cudf.DataFrame({'A': [1, 2]})
        assert not cudf.api.types.is_list_like(df)  # False
    """
```
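
A common use of a scalar check is to accept either a single value or a collection and normalize both to a list. The sketch below illustrates this; `as_list` and `fake_is_scalar` are hypothetical names, and the predicate is injected so the example runs without a GPU. In real code `cudf.api.types.is_scalar` would be passed instead.

```python
def as_list(value, is_scalar):
    """Wrap a scalar in a one-element list; convert any other iterable to a list."""
    if is_scalar(value):
        return [value]
    return list(value)


def fake_is_scalar(value):
    """Stand-in for `cudf.api.types.is_scalar` (assumption: covers only builtins)."""
    return isinstance(value, (int, float, str, bool))


single = as_list("price", fake_is_scalar)
several = as_list(("price", "qty"), fake_is_scalar)
print(single, several)
```

Checking for a scalar first keeps a string such as `"price"` from being split into characters, which is what `list("price")` alone would do.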

## Type Validation Patterns

Common patterns for type validation in cuDF code:

### Input Validation

```python
import cudf

def process_numeric_data(data):
    """Example function with type validation"""
    if not cudf.api.types.is_numeric_dtype(data):
        raise TypeError("Input data must be numeric")

    # Safe to perform numeric operations
    return data.sum()

def process_categorical_data(data):
    """Handle categorical data specifically"""
    if cudf.api.types.is_categorical_dtype(data):
        # Use categorical-specific operations
        return data.cat.categories
    else:
        # Convert to categorical first
        return cudf.Series(data, dtype='category').cat.categories
```
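
When many functions repeat the same entry check, the check can be factored into a decorator. The following is a minimal sketch under stated assumptions: `require` and `fake_is_numeric` are hypothetical names, not cuDF APIs, and the predicate is a plain-Python stand-in so the example runs without a GPU.

```python
import functools


def require(predicate, description):
    """Build a decorator that validates a function's first argument."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(data, *args, **kwargs):
            if not predicate(data):
                raise TypeError(f"{func.__name__}: input must be {description}")
            return func(data, *args, **kwargs)
        return wrapper
    return decorator


def fake_is_numeric(data):
    """Stand-in for `cudf.api.types.is_numeric_dtype` (checks a plain list)."""
    return all(isinstance(x, (int, float)) for x in data)


@require(fake_is_numeric, "numeric")
def total(data):
    return sum(data)


print(total([1, 2, 3]))
try:
    total(["a", "b"])
except TypeError as exc:
    message = str(exc)
    print(message)
```

With cuDF available, `@require(cudf.api.types.is_numeric_dtype, "numeric")` would apply the same guard to functions that take a Series.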

### Type-Specific Operations

```python
import cudf

def describe_column(series):
    """Provide type-aware column description"""
    if cudf.api.types.is_numeric_dtype(series):
        return series.describe()  # Statistical summary
    elif cudf.api.types.is_categorical_dtype(series):
        return series.value_counts()  # Category frequencies
    elif cudf.api.types.is_datetime64_dtype(series):
        return {
            'min': series.min(),
            'max': series.max(),
            'range': series.max() - series.min()
        }
    else:
        return series.value_counts()  # General frequency count
```

### Extension Type Handling

```python
import cudf

def process_nested_data(series):
    """Handle cuDF extension types"""
    if cudf.api.types.is_list_dtype(series):
        # Process list data
        return series.list.len().mean()  # Average list length
    elif cudf.api.types.is_struct_dtype(series):
        # Process struct data
        return list(series.dtype.fields.keys())  # Field names
    elif cudf.api.types.is_decimal_dtype(series):
        # Exact decimal arithmetic
        return series.sum()  # No precision loss
    else:
        # Standard processing
        return series.describe()
```

## Performance Notes

### Type Checking on GPU Data
- **Metadata-Based Checks**: Dtype checks typically inspect an object's dtype metadata rather than its elements, so their cost does not grow with array size
- **Memory Efficiency**: Checks operate on metadata when possible, avoiding data movement
- **Combining Checks**: Multiple checks on the same object can share its `dtype` attribute instead of inspecting the object repeatedly

### Best Practices
- **Early Validation**: Check types at function entry points to fail fast
- **Type Caching**: Cache type information for repeated operations on same data
- **Batch Checking**: Use vectorized operations instead of element-wise type checks
- **Extension Types**: Prefer cuDF extension types for nested and specialized data
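
The type caching practice can be sketched with `functools.lru_cache`, keyed by a hashable description of the dtype. This is an illustration under stated assumptions: `cached` and `fake_is_numeric` are hypothetical names, and a dtype name string is used as the cache key (for a Series, `str(series.dtype)` would be one possible key).

```python
from functools import lru_cache


def cached(predicate):
    """Wrap a dtype predicate so each distinct key is evaluated only once."""
    @lru_cache(maxsize=None)
    def check(dtype_key):
        return predicate(dtype_key)
    return check


calls = []  # records which keys actually reached the underlying predicate


def fake_is_numeric(dtype_name):
    """Stand-in predicate over dtype names; not a cuDF function."""
    calls.append(dtype_name)
    return dtype_name.startswith(("int", "float"))


is_numeric_cached = cached(fake_is_numeric)
results = [is_numeric_cached(name) for name in ["int64", "int64", "object", "int64"]]
print(results)
print(calls)
```

Whether this saves measurable time depends on how expensive the underlying check is, so it is worth profiling before adopting it.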

### Integration with pandas
- **Compatibility Layer**: Type checking functions work with pandas objects
- **Conversion Awareness**: Functions handle type differences between pandas and cuDF
- **Fallback Support**: Graceful handling of unsupported type combinations