# Type Checking & Validation

cuDF provides comprehensive type checking utilities for validating and working with GPU data types. The type system extends pandas' type checking to handle cuDF-specific types including nested data structures and GPU-accelerated dtypes.

## Import Statements

```python
# Main type utilities
from cudf.api.types import dtype

# Data type checking functions
from cudf.api.types import (
    is_numeric_dtype, is_string_dtype, is_integer_dtype, is_float_dtype,
    is_bool_dtype, is_categorical_dtype, is_datetime64_dtype, is_timedelta64_dtype
)

# cuDF-specific type checking
from cudf.api.types import (
    is_decimal_dtype, is_list_dtype, is_struct_dtype, is_interval_dtype
)

# Value type checking
from cudf.api.types import is_scalar, is_list_like
```

## Data Type Utilities

Core utilities for working with cuDF data types and conversions.

```{ .api }
def dtype(dtype_obj) -> cudf.core.dtypes.ExtensionDtype:
    """
    Convert input to cuDF-compatible data type

    Normalizes various dtype specifications into cuDF ExtensionDtype objects.
    Handles pandas dtypes, numpy dtypes, and cuDF-specific extension types.

    Parameters:
        dtype_obj: str, numpy.dtype, pandas.ExtensionDtype, or cuDF ExtensionDtype
            Data type specification to convert

    Returns:
        cudf.core.dtypes.ExtensionDtype: Normalized cuDF data type

    Raises:
        TypeError: If dtype cannot be converted to cuDF-compatible type

    Examples:
        # String dtype specifications
        dt = cudf.api.types.dtype('int64')
        dt = cudf.api.types.dtype('float32')
        dt = cudf.api.types.dtype('category')

        # NumPy dtype conversion
        import numpy as np
        dt = cudf.api.types.dtype(np.dtype('datetime64[ns]'))

        # cuDF extension types
        dt = cudf.api.types.dtype(cudf.ListDtype('int32'))
        dt = cudf.api.types.dtype(cudf.Decimal64Dtype(10, 2))

        # Pandas compatibility
        import pandas as pd
        dt = cudf.api.types.dtype(pd.CategoricalDtype(['a', 'b', 'c']))
    """
```

## Standard Data Type Checking

Functions for checking standard data types with GPU acceleration.

```{ .api }
def is_numeric_dtype(arr_or_dtype) -> bool:
    """
    Check whether the provided array or dtype is numeric

    Returns True for integer, float, complex, and decimal dtypes.
    Compatible with cuDF extension types and GPU arrays.

    Parameters:
        arr_or_dtype: array-like, Series, Index, or data type
            Object or dtype to check

    Returns:
        bool: True if array/dtype is numeric, False otherwise

    Examples:
        # Check Series dtype
        s_int = cudf.Series([1, 2, 3])
        assert cudf.api.types.is_numeric_dtype(s_int)  # True

        s_str = cudf.Series(['a', 'b', 'c'])
        assert not cudf.api.types.is_numeric_dtype(s_str)  # False

        # Check dtype directly
        assert cudf.api.types.is_numeric_dtype('int64')  # True
        assert cudf.api.types.is_numeric_dtype('float32')  # True
        assert not cudf.api.types.is_numeric_dtype('object')  # False

        # cuDF decimal types
        decimal_dtype = cudf.Decimal64Dtype(10, 2)
        assert cudf.api.types.is_numeric_dtype(decimal_dtype)  # True
    """

def is_string_dtype(arr_or_dtype) -> bool:
    """
    Check whether the provided array or dtype is string

    Returns True for string/object dtypes that contain text data.
    Handles cuDF string columns and object columns with string data.

    Parameters:
        arr_or_dtype: array-like, Series, Index, or data type
            Object or dtype to check

    Returns:
        bool: True if array/dtype contains string data, False otherwise

    Examples:
        # String Series
        s_str = cudf.Series(['hello', 'world'])
        assert cudf.api.types.is_string_dtype(s_str)  # True

        # Object Series with strings
        s_obj = cudf.Series(['a', 'b'], dtype='object')
        assert cudf.api.types.is_string_dtype(s_obj)  # True

        # Non-string data
        s_int = cudf.Series([1, 2, 3])
        assert not cudf.api.types.is_string_dtype(s_int)  # False

        # Check dtype string
        assert cudf.api.types.is_string_dtype('object')  # True
        assert not cudf.api.types.is_string_dtype('int64')  # False
    """

def is_integer_dtype(arr_or_dtype) -> bool:
    """
    Check whether the provided array or dtype is integer

    Returns True for signed and unsigned integer dtypes of all bit widths.
    Excludes floating-point and other numeric types.

    Parameters:
        arr_or_dtype: array-like, Series, Index, or data type
            Object or dtype to check

    Returns:
        bool: True if array/dtype is integer, False otherwise

    Examples:
        # Integer Series
        s_int32 = cudf.Series([1, 2, 3], dtype='int32')
        assert cudf.api.types.is_integer_dtype(s_int32)  # True

        s_uint64 = cudf.Series([1, 2, 3], dtype='uint64')
        assert cudf.api.types.is_integer_dtype(s_uint64)  # True

        # Non-integer numeric types
        s_float = cudf.Series([1.0, 2.0, 3.0])
        assert not cudf.api.types.is_integer_dtype(s_float)  # False

        # Check various integer dtypes
        assert cudf.api.types.is_integer_dtype('int8')  # True
        assert cudf.api.types.is_integer_dtype('uint32')  # True
        assert not cudf.api.types.is_integer_dtype('float64')  # False
    """

def is_float_dtype(arr_or_dtype) -> bool:
    """
    Check whether the provided array or dtype is floating point

    Returns True for single and double precision floating-point dtypes.
    Excludes integer, decimal, and other numeric types.

    Parameters:
        arr_or_dtype: array-like, Series, Index, or data type
            Object or dtype to check

    Returns:
        bool: True if array/dtype is floating point, False otherwise

    Examples:
        # Float Series
        s_float32 = cudf.Series([1.1, 2.2, 3.3], dtype='float32')
        assert cudf.api.types.is_float_dtype(s_float32)  # True

        s_float64 = cudf.Series([1.0, 2.0, 3.0])  # Default float64
        assert cudf.api.types.is_float_dtype(s_float64)  # True

        # Non-float types
        s_int = cudf.Series([1, 2, 3])
        assert not cudf.api.types.is_float_dtype(s_int)  # False

        # Check dtype strings
        assert cudf.api.types.is_float_dtype('float32')  # True
        assert cudf.api.types.is_float_dtype('float64')  # True
        assert not cudf.api.types.is_float_dtype('int32')  # False
    """

def is_bool_dtype(arr_or_dtype) -> bool:
    """
    Check whether the provided array or dtype is boolean

    Returns True for boolean dtypes. Handles cuDF boolean columns
    and boolean masks used in filtering operations.

    Parameters:
        arr_or_dtype: array-like, Series, Index, or data type
            Object or dtype to check

    Returns:
        bool: True if array/dtype is boolean, False otherwise

    Examples:
        # Boolean Series
        s_bool = cudf.Series([True, False, True])
        assert cudf.api.types.is_bool_dtype(s_bool)  # True

        # Boolean mask from comparison
        s_int = cudf.Series([1, 2, 3])
        mask = s_int > 1  # Boolean Series
        assert cudf.api.types.is_bool_dtype(mask)  # True

        # Non-boolean types
        assert not cudf.api.types.is_bool_dtype(s_int)  # False

        # Check dtype
        assert cudf.api.types.is_bool_dtype('bool')  # True
        assert not cudf.api.types.is_bool_dtype('int64')  # False
    """

def is_categorical_dtype(arr_or_dtype) -> bool:
    """
    Check whether the provided array or dtype is categorical

    Returns True for cuDF categorical dtypes and pandas CategoricalDtype.
    Handles both ordered and unordered categorical data.

    Parameters:
        arr_or_dtype: array-like, Series, Index, or data type
            Object or dtype to check

    Returns:
        bool: True if array/dtype is categorical, False otherwise

    Examples:
        # Categorical Series
        categories = ['red', 'blue', 'green']
        s_cat = cudf.Series(['red', 'blue', 'red'], dtype='category')
        assert cudf.api.types.is_categorical_dtype(s_cat)  # True

        # CategoricalIndex
        idx_cat = cudf.CategoricalIndex(['a', 'b', 'c'])
        assert cudf.api.types.is_categorical_dtype(idx_cat)  # True

        # Non-categorical
        s_str = cudf.Series(['red', 'blue', 'green'])
        assert not cudf.api.types.is_categorical_dtype(s_str)  # False

        # Check CategoricalDtype
        cat_dtype = cudf.CategoricalDtype(categories)
        assert cudf.api.types.is_categorical_dtype(cat_dtype)  # True
    """
```

## Date and Time Type Checking

Specialized functions for temporal data types.

```{ .api }
def is_datetime64_dtype(arr_or_dtype) -> bool:
    """
    Check whether the provided array or dtype is datetime64

    Returns True for datetime64 dtypes with any time unit resolution.
    Handles cuDF DatetimeIndex and datetime columns.

    Parameters:
        arr_or_dtype: array-like, Series, Index, or data type
            Object or dtype to check

    Returns:
        bool: True if array/dtype is datetime64, False otherwise

    Examples:
        # Datetime Series
        dates = cudf.to_datetime(['2023-01-01', '2023-01-02'])
        assert cudf.api.types.is_datetime64_dtype(dates)  # True

        # DatetimeIndex
        date_idx = cudf.DatetimeIndex(['2023-01-01', '2023-01-02'])
        assert cudf.api.types.is_datetime64_dtype(date_idx)  # True

        # Non-datetime types
        s_str = cudf.Series(['2023-01-01', '2023-01-02'])  # String, not parsed
        assert not cudf.api.types.is_datetime64_dtype(s_str)  # False

        # Check dtype strings
        assert cudf.api.types.is_datetime64_dtype('datetime64[ns]')  # True
        assert cudf.api.types.is_datetime64_dtype('datetime64[ms]')  # True
        assert not cudf.api.types.is_datetime64_dtype('int64')  # False
    """

def is_timedelta64_dtype(arr_or_dtype) -> bool:
    """
    Check whether the provided array or dtype is timedelta64

    Returns True for timedelta64 dtypes representing time durations.
    Handles cuDF TimedeltaIndex and timedelta columns.

    Parameters:
        arr_or_dtype: array-like, Series, Index, or data type
            Object or dtype to check

    Returns:
        bool: True if array/dtype is timedelta64, False otherwise

    Examples:
        # Timedelta Series
        deltas = cudf.Series(['1 day', '2 hours', '30 minutes'])
        deltas = cudf.to_timedelta(deltas)
        assert cudf.api.types.is_timedelta64_dtype(deltas)  # True

        # TimedeltaIndex
        td_idx = cudf.TimedeltaIndex(['1D', '2H', '30min'])
        assert cudf.api.types.is_timedelta64_dtype(td_idx)  # True

        # Computed timedeltas
        date1 = cudf.to_datetime('2023-01-02')
        date2 = cudf.to_datetime('2023-01-01')
        diff = date1 - date2  # Timedelta
        assert cudf.api.types.is_timedelta64_dtype(diff)  # True

        # Check dtype
        assert cudf.api.types.is_timedelta64_dtype('timedelta64[ns]')  # True
    """
```

## cuDF Extension Type Checking

Functions for checking cuDF-specific extension data types.

```{ .api }
def is_decimal_dtype(arr_or_dtype) -> bool:
    """
    Check whether the provided array or dtype is decimal

    Returns True for cuDF decimal dtypes (Decimal32, Decimal64, Decimal128).
    These provide exact decimal arithmetic without floating-point errors.

    Parameters:
        arr_or_dtype: array-like, Series, Index, or data type
            Object or dtype to check

    Returns:
        bool: True if array/dtype is decimal, False otherwise

    Examples:
        # Decimal Series
        decimal_dtype = cudf.Decimal64Dtype(precision=10, scale=2)
        s_decimal = cudf.Series([1.23, 4.56], dtype=decimal_dtype)
        assert cudf.api.types.is_decimal_dtype(s_decimal)  # True

        # Different decimal precisions
        dec32 = cudf.Decimal32Dtype(7, 2)
        dec128 = cudf.Decimal128Dtype(20, 4)
        assert cudf.api.types.is_decimal_dtype(dec32)  # True
        assert cudf.api.types.is_decimal_dtype(dec128)  # True

        # Non-decimal numeric types
        s_float = cudf.Series([1.23, 4.56], dtype='float64')
        assert not cudf.api.types.is_decimal_dtype(s_float)  # False

        # Check from dtype object
        assert cudf.api.types.is_decimal_dtype(decimal_dtype)  # True
    """

def is_list_dtype(arr_or_dtype) -> bool:
    """
    Check whether the provided array or dtype is list

    Returns True for cuDF list dtypes representing nested list data.
    Each row contains a variable-length list of elements.

    Parameters:
        arr_or_dtype: array-like, Series, Index, or data type
            Object or dtype to check

    Returns:
        bool: True if array/dtype is list, False otherwise

    Examples:
        # List Series
        list_dtype = cudf.ListDtype('int64')
        s_list = cudf.Series([[1, 2, 3], [4, 5], [6]], dtype=list_dtype)
        assert cudf.api.types.is_list_dtype(s_list)  # True

        # Nested lists with different element types
        str_list_dtype = cudf.ListDtype('str')
        s_str_list = cudf.Series([['a', 'b'], ['c']], dtype=str_list_dtype)
        assert cudf.api.types.is_list_dtype(s_str_list)  # True

        # Non-list types
        s_regular = cudf.Series([1, 2, 3])
        assert not cudf.api.types.is_list_dtype(s_regular)  # False

        # Check dtype object
        assert cudf.api.types.is_list_dtype(list_dtype)  # True
    """

def is_struct_dtype(arr_or_dtype) -> bool:
    """
    Check whether the provided array or dtype is struct

    Returns True for cuDF struct dtypes representing nested structured data.
    Each row contains multiple named fields with potentially different types.

    Parameters:
        arr_or_dtype: array-like, Series, Index, or data type
            Object or dtype to check

    Returns:
        bool: True if array/dtype is struct, False otherwise

    Examples:
        # Struct dtype
        struct_dtype = cudf.StructDtype({
            'x': 'int64',
            'y': 'float64',
            'name': 'str'
        })
        s_struct = cudf.Series([
            {'x': 1, 'y': 1.1, 'name': 'first'},
            {'x': 2, 'y': 2.2, 'name': 'second'}
        ], dtype=struct_dtype)
        assert cudf.api.types.is_struct_dtype(s_struct)  # True

        # Check dtype object directly
        assert cudf.api.types.is_struct_dtype(struct_dtype)  # True

        # Non-struct types
        s_dict = cudf.Series([{'a': 1}, {'b': 2}])  # Object, not struct
        assert not cudf.api.types.is_struct_dtype(s_dict)  # False

        # Regular Series
        s_int = cudf.Series([1, 2, 3])
        assert not cudf.api.types.is_struct_dtype(s_int)  # False
    """

def is_interval_dtype(arr_or_dtype) -> bool:
    """
    Check whether the provided array or dtype is interval

    Returns True for cuDF interval dtypes representing interval data.
    Intervals have left and right bounds with configurable closure.

    Parameters:
        arr_or_dtype: array-like, Series, Index, or data type
            Object or dtype to check

    Returns:
        bool: True if array/dtype is interval, False otherwise

    Examples:
        # Interval dtype and data
        interval_dtype = cudf.IntervalDtype('int64', closed='right')
        intervals = cudf.interval_range(0, 10, periods=5)
        assert cudf.api.types.is_interval_dtype(intervals)  # True

        # IntervalIndex
        idx_interval = cudf.IntervalIndex.from_arrays([0, 1, 2], [1, 2, 3])
        assert cudf.api.types.is_interval_dtype(idx_interval)  # True

        # Check dtype object
        assert cudf.api.types.is_interval_dtype(interval_dtype)  # True

        # Non-interval types
        s_float = cudf.Series([1.0, 2.0, 3.0])
        assert not cudf.api.types.is_interval_dtype(s_float)  # False
    """
```

## Value Type Checking

Functions for checking properties of values and objects.

```{ .api }
def is_scalar(val) -> bool:
    """
    Check whether the provided value is scalar

    Returns True for single values (not collections). Handles cuDF-specific
    scalar types including decimal and datetime scalars.

    Parameters:
        val: Any
            Value to check for scalar nature

    Returns:
        bool: True if value is scalar, False otherwise

    Examples:
        # Scalar values
        assert cudf.api.types.is_scalar(1)  # True
        assert cudf.api.types.is_scalar(1.5)  # True
        assert cudf.api.types.is_scalar('hello')  # True
        assert cudf.api.types.is_scalar(True)  # True

        # cuDF-specific scalars
        assert cudf.api.types.is_scalar(cudf.NA)  # True
        assert cudf.api.types.is_scalar(cudf.NaT)  # True

        # Date/time scalars
        date_scalar = cudf.to_datetime('2023-01-01')
        assert cudf.api.types.is_scalar(date_scalar)  # True (single date)

        # Non-scalar collections
        assert not cudf.api.types.is_scalar([1, 2, 3])  # False
        assert not cudf.api.types.is_scalar(cudf.Series([1, 2]))  # False
        assert not cudf.api.types.is_scalar({'a': 1})  # False

        # Edge cases
        import numpy as np
        assert cudf.api.types.is_scalar(np.int64(5))  # True
        assert not cudf.api.types.is_scalar(np.array([1]))  # False (array)
    """

def is_list_like(obj) -> bool:
    """
    Check whether the provided object is list-like

    Returns True for objects that can be iterated over like lists,
    excluding strings and dicts. Includes cuDF Series, Index, and arrays.

    Parameters:
        obj: Any
            Object to check for list-like properties

    Returns:
        bool: True if object is list-like, False otherwise

    Examples:
        # List-like objects
        assert cudf.api.types.is_list_like([1, 2, 3])  # True
        assert cudf.api.types.is_list_like((1, 2, 3))  # True
        assert cudf.api.types.is_list_like({1, 2, 3})  # True (set)

        # cuDF objects
        s = cudf.Series([1, 2, 3])
        assert cudf.api.types.is_list_like(s)  # True

        idx = cudf.Index([1, 2, 3])
        assert cudf.api.types.is_list_like(idx)  # True

        # NumPy/CuPy arrays
        import numpy as np
        assert cudf.api.types.is_list_like(np.array([1, 2, 3]))  # True

        # Non-list-like objects
        assert not cudf.api.types.is_list_like('hello')  # False (string)
        assert not cudf.api.types.is_list_like({'a': 1})  # False (dict)
        assert not cudf.api.types.is_list_like(5)  # False (scalar)
        assert not cudf.api.types.is_list_like(None)  # False

        # DataFrame (debatable, but typically False)
        df = cudf.DataFrame({'A': [1, 2]})
        assert not cudf.api.types.is_list_like(df)  # False
    """
```
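
These two checks are often used together to accept either a single value or a collection for the same parameter. The sketch below is illustrative only: `as_list` is a hypothetical helper, not part of cuDF, and the two stub predicates are plain-Python approximations so the example runs without a GPU. In real code one would presumably pass `cudf.api.types.is_scalar` and `cudf.api.types.is_list_like` instead.

```python
from typing import Any, Callable, List


def as_list(
    value: Any,
    is_scalar: Callable[[Any], bool],
    is_list_like: Callable[[Any], bool],
) -> List[Any]:
    """Wrap a scalar in a list, copy a list-like into a list, reject the rest."""
    if is_scalar(value):
        return [value]
    if is_list_like(value):
        return list(value)
    raise TypeError(f"expected a scalar or list-like, got {type(value).__name__}")


# Stub predicates: assumptions for this demo, not the cuDF implementations.
def scalar_stub(value: Any) -> bool:
    return isinstance(value, (bool, int, float, str))


def list_like_stub(value: Any) -> bool:
    return isinstance(value, (list, tuple, set))


print(as_list("price", scalar_stub, list_like_stub))
print(as_list(("price", "qty"), scalar_stub, list_like_stub))
```

Raising for everything else keeps dicts, which the checks above treat as neither scalar nor list-like, from being silently iterated by key.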

## Type Validation Patterns

Common patterns for type validation in cuDF code:

### Input Validation

```python
import cudf

def process_numeric_data(data):
    """Example function with type validation"""
    if not cudf.api.types.is_numeric_dtype(data):
        raise TypeError("Input data must be numeric")

    # Safe to perform numeric operations
    return data.sum()

def process_categorical_data(data):
    """Handle categorical data specifically"""
    if cudf.api.types.is_categorical_dtype(data):
        # Use categorical-specific operations
        return data.cat.categories
    else:
        # Convert to categorical first
        return cudf.Series(data, dtype='category').cat.categories
```
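
The single-column checks above extend naturally to validating several columns at once and reporting every mismatch together. The following is a minimal sketch under stated assumptions: `find_schema_errors` is a hypothetical helper, not a cuDF function, and the stub predicates operate on plain Python lists so the example runs without a GPU. With cuDF, the predicates would presumably be functions such as `cudf.api.types.is_numeric_dtype`.

```python
from typing import Any, Callable, List, Mapping

# A predicate takes a column (or dtype) and returns bool,
# in the same style as the checks in cudf.api.types.
Predicate = Callable[[Any], bool]


def find_schema_errors(
    columns: Mapping[str, Any],
    expected: Mapping[str, Predicate],
) -> List[str]:
    """Return one message per missing or mistyped column (empty if valid)."""
    errors = []
    for name, predicate in expected.items():
        if name not in columns:
            errors.append(f"missing column: {name}")
        elif not predicate(columns[name]):
            errors.append(f"column {name!r} failed {predicate.__name__}")
    return errors


# Stub predicates: assumptions for this demo, not the cuDF implementations.
def all_numbers(values) -> bool:
    return all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)


def all_strings(values) -> bool:
    return all(isinstance(v, str) for v in values)


table = {"price": [1.5, 2.0], "label": ["a", "b"], "qty": ["3", "4"]}
expected = {
    "price": all_numbers,
    "label": all_strings,
    "qty": all_numbers,
    "sku": all_strings,
}
for message in find_schema_errors(table, expected):
    print(message)
```

Collecting all errors before raising gives the caller one complete report instead of one failure per run.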

### Type-Specific Operations

```python
import cudf

def describe_column(series):
    """Provide type-aware column description"""
    if cudf.api.types.is_numeric_dtype(series):
        return series.describe()  # Statistical summary
    elif cudf.api.types.is_categorical_dtype(series):
        return series.value_counts()  # Category frequencies
    elif cudf.api.types.is_datetime64_dtype(series):
        return {
            'min': series.min(),
            'max': series.max(),
            'range': series.max() - series.min()
        }
    else:
        return series.value_counts()  # General frequency count
```
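
When the chain of `if`/`elif` branches grows, the same first-match logic can be written as an ordered table of (predicate, handler) pairs. This is one possible restructuring, not something cuDF provides: `dispatch` and the rules below are hypothetical, and the predicates use `isinstance` on plain values as stand-ins for dtype checks so the sketch runs anywhere.

```python
from typing import Any, Callable, List, Tuple

Rule = Tuple[Callable[[Any], bool], Callable[[Any], Any]]


def dispatch(value: Any, rules: List[Rule], default: Callable[[Any], Any]) -> Any:
    """Apply the handler of the first rule whose predicate accepts `value`."""
    for predicate, handler in rules:
        if predicate(value):
            return handler(value)
    return default(value)


# Order matters: bool is a subclass of int in Python, so the boolean rule
# must come before the numeric one, just as a more specific dtype check
# should precede a broader one.
rules: List[Rule] = [
    (lambda v: isinstance(v, bool), lambda v: "boolean"),
    (lambda v: isinstance(v, (int, float)), lambda v: "numeric"),
    (lambda v: isinstance(v, str), lambda v: "string"),
]


def fallback(value: Any) -> str:
    return "other"


for sample in (True, 3, "x", None):
    print(repr(sample), "->", dispatch(sample, rules, fallback))
```

A table like this keeps the ordering of checks visible in one place and lets new types be supported by appending a rule rather than editing the function body.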

### Extension Type Handling

```python
import cudf

def process_nested_data(series):
    """Handle cuDF extension types"""
    if cudf.api.types.is_list_dtype(series):
        # Process list data
        return series.list.len().mean()  # Average list length
    elif cudf.api.types.is_struct_dtype(series):
        # Process struct data
        return list(series.dtype.fields.keys())  # Field names
    elif cudf.api.types.is_decimal_dtype(series):
        # Exact decimal arithmetic
        return series.sum()  # No precision loss
    else:
        # Standard processing
        return series.describe()
```

## Performance Notes

### Type Checking on GPU Data

- **Metadata-Based Checks**: dtype predicates read the dtype attached to a Series or Index rather than scanning its values, so their cost does not depend on array size
- **Memory Efficiency**: Checks operate on metadata when possible, avoiding data movement
- **Cheap to Combine**: Because each check only inspects the dtype, several predicates can be applied to the same column without extra passes over the data

### Best Practices

- **Early Validation**: Check types at function entry points to fail fast
- **Type Caching**: Cache type information for repeated operations on same data
- **Batch Checking**: Use vectorized operations instead of element-wise type checks
- **Extension Types**: Prefer cuDF extension types for nested and specialized data
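
The first two practices can be sketched as follows. Everything here is an assumption made for illustration: `require` and `classify` are hypothetical helpers, not cuDF APIs, the stub predicate works on plain lists, and `classify` uses simple name prefixes where real code would call the `cudf.api.types` predicates.

```python
import functools
from typing import Any, Callable


def require(predicate: Callable[[Any], bool], message: str):
    """Decorator: check the first argument on entry and raise TypeError early."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(data, *args, **kwargs):
            if not predicate(data):
                raise TypeError(message)
            return func(data, *args, **kwargs)
        return wrapper
    return decorator


@functools.lru_cache(maxsize=None)
def classify(dtype_name: str) -> str:
    """Map a dtype name to a coarse kind; repeated names hit the cache."""
    if dtype_name.startswith(("int", "uint", "float")):
        return "numeric"
    if dtype_name.startswith("datetime64"):
        return "datetime"
    return "other"


# Stub predicate: an assumption for this demo, standing in for a dtype check.
def is_number_list(values) -> bool:
    return all(isinstance(v, (int, float)) for v in values)


@require(is_number_list, "Input data must be numeric")
def total(data):
    return sum(data)


print(total([1, 2, 3]))
print(classify("int64"), classify("datetime64[ns]"))
```

Keying the cache on the dtype name rather than on the column keeps the cached value small and hashable; with a cuDF Series the key would presumably be `str(series.dtype)`.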

### Integration with pandas

- **Compatibility Layer**: Type checking functions work with pandas objects
- **Conversion Awareness**: Functions handle type differences between pandas and cuDF
- **Fallback Support**: Graceful handling of unsupported type combinations
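
One way to take advantage of this compatibility is to pick the type-checking namespace at import time, so the same code runs with or without a GPU. The sketch below is a hedged illustration: `load_type_checks` is a hypothetical helper, and it assumes only that `cudf.api.types` and `pandas.api.types` both expose the standard predicates such as `is_numeric_dtype`.

```python
import importlib
from types import ModuleType
from typing import Optional, Sequence


def load_type_checks(
    candidates: Sequence[str] = ("cudf.api.types", "pandas.api.types"),
) -> Optional[ModuleType]:
    """Return the first importable module from `candidates`, or None."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except Exception:
            # Broad on purpose: importing a GPU library can fail for reasons
            # other than a missing package (for example, no usable device).
            continue
    return None


checks = load_type_checks()
if checks is None:
    print("neither cuDF nor pandas could be imported")
else:
    print(f"using {checks.__name__}")
```

The cuDF-specific checks (`is_decimal_dtype`, `is_list_dtype`, `is_struct_dtype`) are not part of `pandas.api.types`, so code that uses them after a fallback should guard the call, for example with `hasattr(checks, "is_list_dtype")`.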