# Testing Utilities

cuDF provides comprehensive testing utilities for GPU-aware testing of DataFrames, Series, and Index objects. These functions provide detailed comparison capabilities that handle GPU memory, floating-point precision, and cuDF-specific data types.

## Import Statements

```python
# Core testing functions
from cudf.testing import (
    assert_eq, assert_neq,
    assert_frame_equal, assert_series_equal, assert_index_equal
)

# For use in test suites
import cudf.testing as cudf_testing
```

## Generic Equality Assertions

Universal equality testing function that handles all cuDF object types.

```{ .api }
def assert_eq(
    left,
    right,
    check_dtype=True,
    check_exact=False,
    check_datetimelike_compat=False,
    check_categorical=True,
    check_category_order=True,
    rtol=1e-05,
    atol=1e-08,
    **kwargs
) -> None:
    """
    Generic equality assertion for cuDF objects with GPU-aware comparison

    Comprehensive equality testing that automatically detects object type
    and applies appropriate comparison logic. Handles DataFrames, Series,
    Index objects, and scalar values with GPU memory considerations.

    Parameters:
        left: cuDF object, pandas object, or scalar
            Expected result object
        right: cuDF object, pandas object, or scalar
            Actual result object
        check_dtype: bool, default True
            Whether to check dtype compatibility exactly
        check_exact: bool, default False
            Whether to check exact equality (no floating-point tolerance)
        check_datetimelike_compat: bool, default False
            Whether to compare datetime-like objects across types
        check_categorical: bool, default True
            Whether to check categorical data consistency
        check_category_order: bool, default True
            Whether categorical category order must match
        rtol: float, default 1e-05
            Relative tolerance for floating-point comparisons
        atol: float, default 1e-08
            Absolute tolerance for floating-point comparisons
        **kwargs: additional arguments
            Type-specific comparison options

    Raises:
        AssertionError: If objects are not equal according to specified criteria

    Examples:
        # DataFrame comparison
        expected = cudf.DataFrame({'A': [1, 2, 3], 'B': [4.0, 5.0, 6.0]})
        actual = cudf.DataFrame({'A': [1, 2, 3], 'B': [4.0, 5.0, 6.0]})
        cudf.testing.assert_eq(expected, actual)

        # Series comparison with tolerance
        expected = cudf.Series([1.1, 2.2, 3.3])
        actual = cudf.Series([1.100001, 2.200001, 3.300001])
        cudf.testing.assert_eq(expected, actual, rtol=1e-4)

        # Mixed cuDF/pandas comparison
        cudf_series = cudf.Series([1, 2, 3])
        pandas_series = cudf_series.to_pandas()
        cudf.testing.assert_eq(cudf_series, pandas_series)

        # Scalar comparison
        cudf.testing.assert_eq(5, 5)
        cudf.testing.assert_eq(3.14159, 3.14160, rtol=1e-4)

        # Categorical comparison
        cat1 = cudf.Series(['a', 'b', 'c'], dtype='category')
        cat2 = cudf.Series(['a', 'b', 'c'], dtype='category')
        cudf.testing.assert_eq(cat1, cat2, check_categorical=True)
    """

def assert_neq(
    left,
    right,
    **kwargs
) -> None:
    """
    Assert that two objects are not equal

    Inverse of assert_eq - ensures objects are different according to
    the same comparison criteria used by assert_eq.

    Parameters:
        left: cuDF object, pandas object, or scalar
            First object to compare
        right: cuDF object, pandas object, or scalar
            Second object to compare
        **kwargs: additional arguments
            Passed to underlying comparison functions

    Raises:
        AssertionError: If objects are equal according to comparison criteria

    Examples:
        # Different DataFrames
        df1 = cudf.DataFrame({'A': [1, 2, 3]})
        df2 = cudf.DataFrame({'A': [4, 5, 6]})
        cudf.testing.assert_neq(df1, df2)

        # Different dtypes
        series1 = cudf.Series([1, 2, 3], dtype='int32')
        series2 = cudf.Series([1, 2, 3], dtype='int64')
        cudf.testing.assert_neq(series1, series2, check_dtype=True)

        # Different values
        cudf.testing.assert_neq(5, 6)
        cudf.testing.assert_neq([1, 2, 3], [1, 2, 4])
    """
```
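The "inverse" relationship between the two assertions can be sketched in plain Python. This is a minimal illustration rather than cuDF's implementation: `toy_assert_eq` and `toy_assert_neq` are hypothetical stand-ins that compare ordinary Python values. The point is only that the negative assertion succeeds exactly when the positive one raises `AssertionError`.

```python
def toy_assert_eq(left, right, **kwargs):
    """Hypothetical stand-in for an equality assertion (plain Python values only)."""
    if left != right:
        raise AssertionError(f"{left!r} != {right!r}")


def toy_assert_neq(left, right, **kwargs):
    """Pass only if the equality assertion fails for the same inputs."""
    try:
        toy_assert_eq(left, right, **kwargs)
    except AssertionError:
        return  # the objects differ, which is what we wanted
    raise AssertionError(f"{left!r} unexpectedly equals {right!r}")


toy_assert_neq(5, 6)                  # passes: values differ
toy_assert_neq([1, 2, 3], [1, 2, 4])  # passes: lists differ

try:
    toy_assert_neq(5, 5)              # equal inputs must be rejected
except AssertionError as exc:
    print("raised:", exc)
```

In this sketch the keyword arguments are accepted and forwarded but otherwise ignored. With a real comparison function, forwarding them unchanged is what makes an option such as `check_dtype` affect both assertions in the same way.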

## DataFrame Equality Assertions

Detailed DataFrame comparison with comprehensive options for handling edge cases.

```{ .api }
def assert_frame_equal(
    left,
    right,
    check_dtype=True,
    check_index_type=True,
    check_column_type=True,
    check_frame_type=True,
    check_names=True,
    check_exact=False,
    check_datetimelike_compat=False,
    check_categorical=True,
    check_category_order=True,
    check_like=False,
    rtol=1e-05,
    atol=1e-08,
    **kwargs
) -> None:
    """
    Assert DataFrame equality with comprehensive GPU-aware comparison

    Detailed DataFrame comparison that checks data values, dtypes, indexes,
    column names, and metadata. Optimized for GPU DataFrames with support
    for floating-point tolerance and categorical data.

    Parameters:
        left: DataFrame
            Expected DataFrame result
        right: DataFrame
            Actual DataFrame result
        check_dtype: bool, default True
            Whether to check that dtypes match exactly
        check_index_type: bool, default True
            Whether to check index type compatibility
        check_column_type: bool, default True
            Whether to check column type compatibility
        check_frame_type: bool, default True
            Whether to check that both objects are DataFrames
        check_names: bool, default True
            Whether to check index and column names match
        check_exact: bool, default False
            Whether to use exact equality (no floating-point tolerance)
        check_datetimelike_compat: bool, default False
            Whether to allow comparison of different datetime-like types
        check_categorical: bool, default True
            Whether to check categorical data consistency
        check_category_order: bool, default True
            Whether categorical category order must match exactly
        check_like: bool, default False
            Whether to ignore order of index and columns
        rtol: float, default 1e-05
            Relative tolerance for floating-point comparison
        atol: float, default 1e-08
            Absolute tolerance for floating-point comparison
        **kwargs: additional arguments
            Additional comparison options

    Raises:
        AssertionError: If DataFrames are not equal with detailed diff message

    Examples:
        # Basic DataFrame comparison
        expected = cudf.DataFrame({
            'A': [1, 2, 3],
            'B': [4.0, 5.0, 6.0],
            'C': ['x', 'y', 'z']
        })
        actual = cudf.DataFrame({
            'A': [1, 2, 3],
            'B': [4.0, 5.0, 6.0],
            'C': ['x', 'y', 'z']
        })
        cudf.testing.assert_frame_equal(expected, actual)

        # With custom index
        expected.index = ['row1', 'row2', 'row3']
        actual.index = ['row1', 'row2', 'row3']
        cudf.testing.assert_frame_equal(expected, actual, check_names=True)

        # Floating-point tolerance
        expected = cudf.DataFrame({'vals': [1.1, 2.2, 3.3]})
        actual = cudf.DataFrame({'vals': [1.100001, 2.200001, 3.300001]})
        cudf.testing.assert_frame_equal(expected, actual, rtol=1e-4)

        # Ignore column/index order
        expected = cudf.DataFrame({'A': [1, 2], 'B': [3, 4]})
        actual = cudf.DataFrame({'B': [3, 4], 'A': [1, 2]})
        cudf.testing.assert_frame_equal(expected, actual, check_like=True)

        # Mixed cuDF/pandas comparison
        cudf_df = cudf.DataFrame({'x': [1, 2, 3]})
        pandas_df = cudf_df.to_pandas()
        cudf.testing.assert_frame_equal(cudf_df, pandas_df)

        # Categorical data
        cat_df1 = cudf.DataFrame({
            'cat_col': cudf.Series(['a', 'b', 'c'], dtype='category')
        })
        cat_df2 = cudf.DataFrame({
            'cat_col': cudf.Series(['a', 'b', 'c'], dtype='category')
        })
        cudf.testing.assert_frame_equal(cat_df1, cat_df2, check_categorical=True)
    """
```
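The effect of `check_like=True` can be pictured as a label-based alignment step performed before the values are compared. The sketch below uses plain dictionaries instead of DataFrames. The `align_like` helper and the `{"index": ..., "columns": ...}` layout are hypothetical and assume unique index labels; this is an illustration of the idea, not a description of how cuDF implements the option.

```python
def align_like(left, right):
    """Reorder right's rows and columns to follow left's label order.

    Frames are modelled as {"index": [...], "columns": {name: [...]}}.
    Assumes index labels are unique (illustrative helper, not a cuDF API).
    """
    if sorted(left["index"]) != sorted(right["index"]):
        raise AssertionError("index labels differ")
    if set(left["columns"]) != set(right["columns"]):
        raise AssertionError("column labels differ")

    # Position of every index label in the right-hand frame
    position = {label: i for i, label in enumerate(right["index"])}
    row_order = [position[label] for label in left["index"]]

    return {
        "index": list(left["index"]),
        "columns": {
            name: [right["columns"][name][i] for i in row_order]
            for name in left["columns"]
        },
    }


expected = {"index": [0, 1], "columns": {"A": [1, 2], "B": [3, 4]}}
actual = {"index": [1, 0], "columns": {"B": [4, 3], "A": [2, 1]}}

aligned = align_like(expected, actual)
assert aligned == expected                     # same labels, same values
assert list(aligned["columns"]) == ["A", "B"]  # column order now matches
```

Note that each value stays attached to its row and column label; only the ordering is ignored, so frames with different values under the same labels still compare as unequal.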

## Series Equality Assertions

Detailed Series comparison with support for all cuDF data types.

```{ .api }
def assert_series_equal(
    left,
    right,
    check_dtype=True,
    check_index_type=True,
    check_series_type=True,
    check_names=True,
    check_exact=False,
    check_datetimelike_compat=False,
    check_categorical=True,
    check_category_order=True,
    rtol=1e-05,
    atol=1e-08,
    **kwargs
) -> None:
    """
    Assert Series equality with GPU-aware detailed comparison

    Comprehensive Series comparison that validates data values, dtype,
    index, name, and metadata. Handles cuDF-specific data types including
    nested types (lists, structs) and extension types (decimals).

    Parameters:
        left: Series
            Expected Series result
        right: Series
            Actual Series result
        check_dtype: bool, default True
            Whether to check dtype compatibility exactly
        check_index_type: bool, default True
            Whether to check index type compatibility
        check_series_type: bool, default True
            Whether to check that both objects are Series
        check_names: bool, default True
            Whether to check Series and index names match
        check_exact: bool, default False
            Whether to use exact equality (no floating-point tolerance)
        check_datetimelike_compat: bool, default False
            Whether to allow comparison of different datetime-like types
        check_categorical: bool, default True
            Whether to check categorical data consistency
        check_category_order: bool, default True
            Whether categorical category order must match
        rtol: float, default 1e-05
            Relative tolerance for floating-point comparison
        atol: float, default 1e-08
            Absolute tolerance for floating-point comparison
        **kwargs: additional arguments
            Additional comparison options

    Raises:
        AssertionError: If Series are not equal with detailed diff message

    Examples:
        # Basic Series comparison
        expected = cudf.Series([1, 2, 3, 4, 5])
        actual = cudf.Series([1, 2, 3, 4, 5])
        cudf.testing.assert_series_equal(expected, actual)

        # With custom index and name
        expected = cudf.Series([10, 20, 30],
                               index=['a', 'b', 'c'],
                               name='values')
        actual = cudf.Series([10, 20, 30],
                             index=['a', 'b', 'c'],
                             name='values')
        cudf.testing.assert_series_equal(expected, actual, check_names=True)

        # Floating-point data with tolerance
        expected = cudf.Series([1.1, 2.2, 3.3])
        actual = cudf.Series([1.100001, 2.200001, 3.300001])
        cudf.testing.assert_series_equal(expected, actual, rtol=1e-4)

        # String data
        expected = cudf.Series(['hello', 'world', 'cudf'])
        actual = cudf.Series(['hello', 'world', 'cudf'])
        cudf.testing.assert_series_equal(expected, actual)

        # Categorical data
        expected = cudf.Series(['red', 'blue', 'red'], dtype='category')
        actual = cudf.Series(['red', 'blue', 'red'], dtype='category')
        cudf.testing.assert_series_equal(expected, actual, check_categorical=True)

        # Datetime data
        dates = ['2023-01-01', '2023-01-02', '2023-01-03']
        expected = cudf.to_datetime(cudf.Series(dates))
        actual = cudf.to_datetime(cudf.Series(dates))
        cudf.testing.assert_series_equal(expected, actual)

        # List data (nested type)
        expected = cudf.Series([[1, 2], [3, 4, 5], [6]])
        actual = cudf.Series([[1, 2], [3, 4, 5], [6]])
        cudf.testing.assert_series_equal(expected, actual)

        # Decimal data
        decimal_dtype = cudf.Decimal64Dtype(10, 2)
        expected = cudf.Series([1.23, 4.56], dtype=decimal_dtype)
        actual = cudf.Series([1.23, 4.56], dtype=decimal_dtype)
        cudf.testing.assert_series_equal(expected, actual, check_exact=True)
    """
```

## Index Equality Assertions

Comprehensive Index comparison for all cuDF Index types.

```{ .api }
def assert_index_equal(
    left,
    right,
    exact='equiv',
    check_names=True,
    check_exact=False,
    check_categorical=True,
    check_order=True,
    rtol=1e-05,
    atol=1e-08,
    **kwargs
) -> None:
    """
    Assert Index equality with support for all cuDF Index types

    Detailed comparison of Index objects including RangeIndex, DatetimeIndex,
    CategoricalIndex, MultiIndex, and other specialized Index types.

    Parameters:
        left: Index
            Expected Index result
        right: Index
            Actual Index result
        exact: str or bool, default 'equiv'
            Level of exactness ('equiv' for equivalent, True for exact, False for basic)
        check_names: bool, default True
            Whether to check Index name compatibility
        check_exact: bool, default False
            Whether to use exact equality (no floating-point tolerance)
        check_categorical: bool, default True
            Whether to check categorical index data consistency
        check_order: bool, default True
            Whether to check that order of elements matches
        rtol: float, default 1e-05
            Relative tolerance for floating-point comparison
        atol: float, default 1e-08
            Absolute tolerance for floating-point comparison
        **kwargs: additional arguments
            Index-type specific comparison options

    Raises:
        AssertionError: If indexes are not equal with detailed diff message

    Examples:
        # Basic Index comparison
        expected = cudf.Index([1, 2, 3, 4, 5])
        actual = cudf.Index([1, 2, 3, 4, 5])
        cudf.testing.assert_index_equal(expected, actual)

        # Named Index
        expected = cudf.Index([10, 20, 30], name='values')
        actual = cudf.Index([10, 20, 30], name='values')
        cudf.testing.assert_index_equal(expected, actual, check_names=True)

        # RangeIndex comparison
        expected = cudf.RangeIndex(10)  # 0-9
        actual = cudf.RangeIndex(start=0, stop=10, step=1)
        cudf.testing.assert_index_equal(expected, actual)

        # DatetimeIndex comparison
        dates = ['2023-01-01', '2023-01-02', '2023-01-03']
        expected = cudf.DatetimeIndex(dates)
        actual = cudf.DatetimeIndex(dates)
        cudf.testing.assert_index_equal(expected, actual)

        # CategoricalIndex comparison
        categories = ['red', 'blue', 'green']
        expected = cudf.CategoricalIndex(['red', 'blue', 'red'])
        actual = cudf.CategoricalIndex(['red', 'blue', 'red'])
        cudf.testing.assert_index_equal(expected, actual, check_categorical=True)

        # MultiIndex comparison
        arrays = [['A', 'A', 'B', 'B'], [1, 2, 1, 2]]
        expected = cudf.MultiIndex.from_arrays(arrays, names=['letter', 'number'])
        actual = cudf.MultiIndex.from_arrays(arrays, names=['letter', 'number'])
        cudf.testing.assert_index_equal(expected, actual, check_names=True)

        # IntervalIndex comparison
        expected = cudf.interval_range(0, 10, periods=5)
        actual = cudf.interval_range(0, 10, periods=5)
        cudf.testing.assert_index_equal(expected, actual)

        # Float Index with tolerance
        expected = cudf.Index([1.1, 2.2, 3.3])
        actual = cudf.Index([1.100001, 2.200001, 3.300001])
        cudf.testing.assert_index_equal(expected, actual, rtol=1e-4)
    """
```

## Advanced Testing Patterns

### Parameterized Testing

```python
import pytest
import cudf
import cudf.testing

class TestDataFrameOperations:
    """Example test class using cuDF testing utilities"""

    @pytest.mark.parametrize("data", [
        {'A': [1, 2, 3], 'B': [4, 5, 6]},
        {'x': [1.1, 2.2], 'y': [3.3, 4.4]},
        {'str_col': ['a', 'b', 'c']}
    ])
    def test_dataframe_creation(self, data):
        """Test DataFrame creation with various data types"""
        df = cudf.DataFrame(data)
        expected = cudf.DataFrame(data)
        cudf.testing.assert_frame_equal(df, expected)

    @pytest.mark.parametrize("dtype", ['int32', 'int64', 'float32', 'float64'])
    def test_series_dtypes(self, dtype):
        """Test Series with different numeric dtypes"""
        data = [1, 2, 3, 4, 5]
        series = cudf.Series(data, dtype=dtype)
        expected = cudf.Series(data, dtype=dtype)
        cudf.testing.assert_series_equal(series, expected, check_dtype=True)
```

### GPU Memory Testing

```python
import cudf
import cudf.testing

def test_large_dataframe_operations():
    """Test operations on large DataFrames that require GPU memory management"""

    # Create large DataFrame
    n_rows = 1_000_000
    df = cudf.DataFrame({
        'A': range(n_rows),
        'B': range(n_rows, 2 * n_rows),
        'C': [f'str_{i}' for i in range(n_rows)]
    })

    # Perform operations and verify results
    grouped = df.groupby('A').sum()
    expected_b_sum = df['B'].sum()  # All B values summed

    # Use testing utilities to verify
    assert len(grouped) <= n_rows  # Sanity check
    cudf.testing.assert_eq(grouped['B'].sum(), expected_b_sum)

def test_memory_efficient_operations():
    """Test that operations don't unnecessarily copy GPU memory"""
    original_df = cudf.DataFrame({'x': range(100000)})

    # Operation that should not copy data
    view_df = original_df[['x']]  # Column selection

    # Verify data is shared (same underlying GPU memory)
    # Note: Actual memory sharing verification would require
    # more sophisticated GPU memory inspection
    cudf.testing.assert_series_equal(original_df['x'], view_df['x'])
```

### Error Condition Testing

```python
import pytest
import cudf
import cudf.testing

def test_assertion_errors():
    """Test that assertion functions properly raise errors for different data"""

    df1 = cudf.DataFrame({'A': [1, 2, 3]})
    df2 = cudf.DataFrame({'A': [4, 5, 6]})

    # This should raise AssertionError
    with pytest.raises(AssertionError):
        cudf.testing.assert_frame_equal(df1, df2)

    # Test dtype mismatch
    series1 = cudf.Series([1, 2, 3], dtype='int32')
    series2 = cudf.Series([1, 2, 3], dtype='int64')

    with pytest.raises(AssertionError):
        cudf.testing.assert_series_equal(series1, series2, check_dtype=True)

    # But should pass without dtype checking
    cudf.testing.assert_series_equal(series1, series2, check_dtype=False)

def test_tolerance_behavior():
    """Test floating-point tolerance behavior"""

    # Within tolerance - should pass
    series1 = cudf.Series([1.0, 2.0, 3.0])
    series2 = cudf.Series([1.0000001, 2.0000001, 3.0000001])
    cudf.testing.assert_series_equal(series1, series2, rtol=1e-6)

    # Outside tolerance - should fail
    series3 = cudf.Series([1.1, 2.1, 3.1])
    with pytest.raises(AssertionError):
        cudf.testing.assert_series_equal(series1, series3, rtol=1e-6)
```

### Cross-Platform Testing

```python
import cudf
import pandas as pd
import cudf.testing

def test_cudf_pandas_compatibility():
    """Test that cuDF and pandas produce equivalent results"""

    # Create equivalent data in both libraries
    data = {'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]}
    cudf_df = cudf.DataFrame(data)
    pandas_df = pd.DataFrame(data)

    # Perform same operation on both
    cudf_result = cudf_df.groupby('A').sum()
    pandas_result = pandas_df.groupby('A').sum()

    # Compare results. assert_eq is documented above as accepting a mix of
    # cuDF and pandas objects; assert_frame_equal checks the frame type by default.
    cudf.testing.assert_eq(cudf_result, pandas_result)

def test_round_trip_conversion():
    """Test cuDF -> pandas -> cuDF conversion preserves data"""

    original = cudf.DataFrame({
        'ints': [1, 2, 3],
        'floats': [1.1, 2.2, 3.3],
        'strings': ['a', 'b', 'c']
    })

    # Convert to pandas and back
    pandas_version = original.to_pandas()
    round_trip = cudf.from_pandas(pandas_version)

    # Should be identical
    cudf.testing.assert_frame_equal(original, round_trip)
```

## Performance Considerations

### GPU Testing Efficiency
- **Minimize Data Transfer**: Keep test data on the GPU when possible
- **Batch Assertions**: Combine multiple checks in a single test function
- **Memory Management**: Use fixed, modest data sizes so tests are reproducible
- **Parallel Testing**: Design tests to run independently so they can execute in parallel

### Best Practices
- **Use Appropriate Tolerances**: Set `rtol`/`atol` based on expected precision
- **Check Dtypes When Relevant**: Use `check_dtype=True` for type-sensitive tests
- **Test Edge Cases**: Include empty DataFrames, NaN values, and boundary conditions
- **Cross-Library Compatibility**: Test cuDF results against pandas equivalents
- **Memory Cleanup**: Ensure large test objects are properly garbage collected
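
When choosing `rtol` and `atol`, it helps to see how the two combine. The sketch below assumes the NumPy-style rule `|actual - expected| <= atol + rtol * |expected|` and treats NaNs in matching positions as equal. Whether cuDF applies exactly this rule in every code path is an assumption here, and `is_close` is a hypothetical helper written with the standard library only.

```python
import math


def is_close(actual, expected, rtol=1e-05, atol=1e-08):
    """NumPy-style closeness test (assumed semantics; illustrative helper)."""
    if math.isnan(actual) or math.isnan(expected):
        # NaN never equals itself, so matching NaNs are handled explicitly
        return math.isnan(actual) and math.isnan(expected)
    return abs(actual - expected) <= atol + rtol * abs(expected)


# A 1e-7 difference is within rtol=1e-6 for values near 1.0
assert is_close(1.0000001, 1.0, rtol=1e-6)

# A 0.1 difference is far outside the same tolerance
assert not is_close(1.1, 1.0, rtol=1e-6)

# Near zero the relative term vanishes, so atol decides the outcome
assert not is_close(1e-7, 0.0)         # 1e-7 exceeds the default atol of 1e-8
assert is_close(1e-7, 0.0, atol=1e-6)

# Edge case: NaN compared with NaN
assert float("nan") != float("nan")
assert is_close(float("nan"), float("nan"))
```

Under this rule, `rtol` governs values of large magnitude and `atol` governs values near zero, which is why tests on data that crosses zero usually need an explicit `atol`.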