# Testing Utilities

cuDF provides testing utilities for GPU-aware comparison of DataFrame, Series, and Index objects. These functions compare values held in GPU memory, apply floating-point tolerances, and understand cuDF-specific data types.

## Import Statements

```python
# Core testing functions
from cudf.testing import (
    assert_eq, assert_neq,
    assert_frame_equal, assert_series_equal, assert_index_equal
)

# For use in test suites
import cudf.testing as cudf_testing
```

## Generic Equality Assertions

`assert_eq` is a general-purpose equality assertion that handles all cuDF object types; `assert_neq` is its inverse.

```{ .api }
def assert_eq(
    left,
    right,
    check_dtype=True,
    check_exact=False,
    check_datetimelike_compat=False,
    check_categorical=True,
    check_category_order=True,
    rtol=1e-05,
    atol=1e-08,
    **kwargs
) -> None:
    """
    Generic equality assertion for cuDF objects with GPU-aware comparison

    Comprehensive equality testing that automatically detects object type
    and applies appropriate comparison logic. Handles DataFrames, Series,
    Index objects, and scalar values with GPU memory considerations.

    Parameters:
        left: cuDF object, pandas object, or scalar
            Expected result object
        right: cuDF object, pandas object, or scalar
            Actual result object
        check_dtype: bool, default True
            Whether to check dtype compatibility exactly
        check_exact: bool, default False
            Whether to check exact equality (no floating-point tolerance)
        check_datetimelike_compat: bool, default False
            Whether to compare datetime-like objects across types
        check_categorical: bool, default True
            Whether to check categorical data consistency
        check_category_order: bool, default True
            Whether categorical category order must match
        rtol: float, default 1e-05
            Relative tolerance for floating-point comparisons
        atol: float, default 1e-08
            Absolute tolerance for floating-point comparisons
        **kwargs: additional arguments
            Type-specific comparison options

    Raises:
        AssertionError: If objects are not equal according to specified criteria

    Examples:
        # DataFrame comparison
        expected = cudf.DataFrame({'A': [1, 2, 3], 'B': [4.0, 5.0, 6.0]})
        actual = cudf.DataFrame({'A': [1, 2, 3], 'B': [4.0, 5.0, 6.0]})
        cudf.testing.assert_eq(expected, actual)

        # Series comparison with tolerance
        expected = cudf.Series([1.1, 2.2, 3.3])
        actual = cudf.Series([1.100001, 2.200001, 3.300001])
        cudf.testing.assert_eq(expected, actual, rtol=1e-4)

        # Mixed cuDF/pandas comparison
        cudf_series = cudf.Series([1, 2, 3])
        pandas_series = cudf_series.to_pandas()
        cudf.testing.assert_eq(cudf_series, pandas_series)

        # Scalar comparison
        cudf.testing.assert_eq(5, 5)
        cudf.testing.assert_eq(3.14159, 3.14160, rtol=1e-4)

        # Categorical comparison
        cat1 = cudf.Series(['a', 'b', 'c'], dtype='category')
        cat2 = cudf.Series(['a', 'b', 'c'], dtype='category')
        cudf.testing.assert_eq(cat1, cat2, check_categorical=True)
    """

def assert_neq(
    left,
    right,
    **kwargs
) -> None:
    """
    Assert that two objects are not equal

    Inverse of assert_eq - ensures objects are different according to
    the same comparison criteria used by assert_eq.

    Parameters:
        left: cuDF object, pandas object, or scalar
            First object to compare
        right: cuDF object, pandas object, or scalar
            Second object to compare
        **kwargs: additional arguments
            Passed to underlying comparison functions

    Raises:
        AssertionError: If objects are equal according to comparison criteria

    Examples:
        # Different DataFrames
        df1 = cudf.DataFrame({'A': [1, 2, 3]})
        df2 = cudf.DataFrame({'A': [4, 5, 6]})
        cudf.testing.assert_neq(df1, df2)

        # Different dtypes
        series1 = cudf.Series([1, 2, 3], dtype='int32')
        series2 = cudf.Series([1, 2, 3], dtype='int64')
        cudf.testing.assert_neq(series1, series2, check_dtype=True)

        # Different values
        cudf.testing.assert_neq(5, 6)
        cudf.testing.assert_neq([1, 2, 3], [1, 2, 4])
    """
```
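
The "inverse of `assert_eq`" relationship can be illustrated without a GPU. The sketch below is a minimal pure-Python stand-in, not cuDF's implementation: `assert_eq_sketch` and `assert_neq_sketch` are hypothetical names, and the equality check is plain `==` rather than cuDF's type-aware comparison.

```python
def assert_eq_sketch(left, right):
    """Stand-in equality assertion (hypothetical; uses plain == only)."""
    assert left == right, f"{left!r} != {right!r}"


def assert_neq_sketch(left, right):
    """Pass only when the equality assertion fails for the same inputs."""
    try:
        assert_eq_sketch(left, right)
    except AssertionError:
        return  # the objects differ, which is the desired outcome
    raise AssertionError(f"{left!r} unexpectedly equals {right!r}")


# Different values: the inverse assertion passes silently
assert_neq_sketch([1, 2, 3], [1, 2, 4])
```

Because the inverse simply wraps the equality check, any keyword argument that loosens equality (for example a larger tolerance) makes the inequality assertion correspondingly stricter.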

## DataFrame Equality Assertions

Detailed DataFrame comparison with options for dtype, index, column, and ordering checks.

```{ .api }
def assert_frame_equal(
    left,
    right,
    check_dtype=True,
    check_index_type=True,
    check_column_type=True,
    check_frame_type=True,
    check_names=True,
    check_exact=False,
    check_datetimelike_compat=False,
    check_categorical=True,
    check_category_order=True,
    check_like=False,
    rtol=1e-05,
    atol=1e-08,
    **kwargs
) -> None:
    """
    Assert DataFrame equality with comprehensive GPU-aware comparison

    Detailed DataFrame comparison that checks data values, dtypes, indexes,
    column names, and metadata. Optimized for GPU DataFrames with support
    for floating-point tolerance and categorical data.

    Parameters:
        left: DataFrame
            Expected DataFrame result
        right: DataFrame
            Actual DataFrame result
        check_dtype: bool, default True
            Whether to check that dtypes match exactly
        check_index_type: bool, default True
            Whether to check index type compatibility
        check_column_type: bool, default True
            Whether to check column type compatibility
        check_frame_type: bool, default True
            Whether to check that both objects are DataFrames
        check_names: bool, default True
            Whether to check index and column names match
        check_exact: bool, default False
            Whether to use exact equality (no floating-point tolerance)
        check_datetimelike_compat: bool, default False
            Whether to allow comparison of different datetime-like types
        check_categorical: bool, default True
            Whether to check categorical data consistency
        check_category_order: bool, default True
            Whether categorical category order must match exactly
        check_like: bool, default False
            Whether to ignore order of index and columns
        rtol: float, default 1e-05
            Relative tolerance for floating-point comparison
        atol: float, default 1e-08
            Absolute tolerance for floating-point comparison
        **kwargs: additional arguments
            Additional comparison options

    Raises:
        AssertionError: If DataFrames are not equal with detailed diff message

    Examples:
        # Basic DataFrame comparison
        expected = cudf.DataFrame({
            'A': [1, 2, 3],
            'B': [4.0, 5.0, 6.0],
            'C': ['x', 'y', 'z']
        })
        actual = cudf.DataFrame({
            'A': [1, 2, 3],
            'B': [4.0, 5.0, 6.0],
            'C': ['x', 'y', 'z']
        })
        cudf.testing.assert_frame_equal(expected, actual)

        # With custom index
        expected.index = ['row1', 'row2', 'row3']
        actual.index = ['row1', 'row2', 'row3']
        cudf.testing.assert_frame_equal(expected, actual, check_names=True)

        # Floating-point tolerance
        expected = cudf.DataFrame({'vals': [1.1, 2.2, 3.3]})
        actual = cudf.DataFrame({'vals': [1.100001, 2.200001, 3.300001]})
        cudf.testing.assert_frame_equal(expected, actual, rtol=1e-4)

        # Ignore column/index order
        expected = cudf.DataFrame({'A': [1, 2], 'B': [3, 4]})
        actual = cudf.DataFrame({'B': [3, 4], 'A': [1, 2]})
        cudf.testing.assert_frame_equal(expected, actual, check_like=True)

        # Mixed cuDF/pandas comparison
        # (convert the pandas object first so both sides are cuDF DataFrames;
        # use assert_eq to compare cuDF and pandas objects directly)
        cudf_df = cudf.DataFrame({'x': [1, 2, 3]})
        pandas_df = cudf_df.to_pandas()
        cudf.testing.assert_frame_equal(cudf_df, cudf.from_pandas(pandas_df))

        # Categorical data
        cat_df1 = cudf.DataFrame({
            'cat_col': cudf.Series(['a', 'b', 'c'], dtype='category')
        })
        cat_df2 = cudf.DataFrame({
            'cat_col': cudf.Series(['a', 'b', 'c'], dtype='category')
        })
        cudf.testing.assert_frame_equal(cat_df1, cat_df2, check_categorical=True)
    """
```
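
To make the effect of `check_like` concrete, here is a toy sketch that models a DataFrame as a plain mapping from column name to a list of values. `frames_equal` is a hypothetical helper, and the sketch covers column order only; it does not model index reordering or cuDF's actual comparison logic.

```python
def frames_equal(left, right, check_like=False):
    """Compare two column-name -> values mappings (toy stand-ins for DataFrames)."""
    if check_like:
        # dict equality ignores insertion order, so columns are matched by label
        return left == right
    # Order-sensitive comparison: columns must appear in the same sequence
    return list(left.items()) == list(right.items())


expected = {'A': [1, 2], 'B': [3, 4]}
actual = {'B': [3, 4], 'A': [1, 2]}

strict = frames_equal(expected, actual)                    # column order differs
relaxed = frames_equal(expected, actual, check_like=True)  # order ignored
```

With these inputs the strict comparison reports a mismatch while the relaxed one does not, which mirrors the intent of the `check_like=True` example in the docstring above.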

## Series Equality Assertions

Detailed Series comparison with support for all cuDF data types.

```{ .api }
def assert_series_equal(
    left,
    right,
    check_dtype=True,
    check_index_type=True,
    check_series_type=True,
    check_names=True,
    check_exact=False,
    check_datetimelike_compat=False,
    check_categorical=True,
    check_category_order=True,
    rtol=1e-05,
    atol=1e-08,
    **kwargs
) -> None:
    """
    Assert Series equality with GPU-aware detailed comparison

    Comprehensive Series comparison that validates data values, dtype,
    index, name, and metadata. Handles cuDF-specific data types including
    nested types (lists, structs) and extension types (decimals).

    Parameters:
        left: Series
            Expected Series result
        right: Series
            Actual Series result
        check_dtype: bool, default True
            Whether to check dtype compatibility exactly
        check_index_type: bool, default True
            Whether to check index type compatibility
        check_series_type: bool, default True
            Whether to check that both objects are Series
        check_names: bool, default True
            Whether to check Series and index names match
        check_exact: bool, default False
            Whether to use exact equality (no floating-point tolerance)
        check_datetimelike_compat: bool, default False
            Whether to allow comparison of different datetime-like types
        check_categorical: bool, default True
            Whether to check categorical data consistency
        check_category_order: bool, default True
            Whether categorical category order must match
        rtol: float, default 1e-05
            Relative tolerance for floating-point comparison
        atol: float, default 1e-08
            Absolute tolerance for floating-point comparison
        **kwargs: additional arguments
            Additional comparison options

    Raises:
        AssertionError: If Series are not equal with detailed diff message

    Examples:
        # Basic Series comparison
        expected = cudf.Series([1, 2, 3, 4, 5])
        actual = cudf.Series([1, 2, 3, 4, 5])
        cudf.testing.assert_series_equal(expected, actual)

        # With custom index and name
        expected = cudf.Series([10, 20, 30],
                               index=['a', 'b', 'c'],
                               name='values')
        actual = cudf.Series([10, 20, 30],
                             index=['a', 'b', 'c'],
                             name='values')
        cudf.testing.assert_series_equal(expected, actual, check_names=True)

        # Floating-point data with tolerance
        expected = cudf.Series([1.1, 2.2, 3.3])
        actual = cudf.Series([1.100001, 2.200001, 3.300001])
        cudf.testing.assert_series_equal(expected, actual, rtol=1e-4)

        # String data
        expected = cudf.Series(['hello', 'world', 'cudf'])
        actual = cudf.Series(['hello', 'world', 'cudf'])
        cudf.testing.assert_series_equal(expected, actual)

        # Categorical data
        expected = cudf.Series(['red', 'blue', 'red'], dtype='category')
        actual = cudf.Series(['red', 'blue', 'red'], dtype='category')
        cudf.testing.assert_series_equal(expected, actual, check_categorical=True)

        # Datetime data
        dates = ['2023-01-01', '2023-01-02', '2023-01-03']
        expected = cudf.to_datetime(cudf.Series(dates))
        actual = cudf.to_datetime(cudf.Series(dates))
        cudf.testing.assert_series_equal(expected, actual)

        # List data (nested type)
        expected = cudf.Series([[1, 2], [3, 4, 5], [6]])
        actual = cudf.Series([[1, 2], [3, 4, 5], [6]])
        cudf.testing.assert_series_equal(expected, actual)

        # Decimal data
        decimal_dtype = cudf.Decimal64Dtype(10, 2)
        expected = cudf.Series([1.23, 4.56], dtype=decimal_dtype)
        actual = cudf.Series([1.23, 4.56], dtype=decimal_dtype)
        cudf.testing.assert_series_equal(expected, actual, check_exact=True)
    """
```
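
The difference `check_category_order` makes is easier to see with a toy model of categorical data. The sketch below represents a categorical as a `(categories, codes)` pair, where each code is a position in the category list. `decode` and `categoricals_equal` are hypothetical helpers written for illustration; this is an assumption about the intended semantics, not cuDF's implementation.

```python
def decode(categories, codes):
    """Turn integer codes back into the values they stand for."""
    return [categories[code] for code in codes]


def categoricals_equal(left, right, check_category_order=True):
    """Compare two (categories, codes) pairs (toy stand-ins for categoricals)."""
    left_cats, left_codes = left
    right_cats, right_codes = right
    if check_category_order:
        # Strict: same category list in the same order, and same codes
        return left_cats == right_cats and left_codes == right_codes
    # Relaxed: same set of categories and same decoded values
    return (sorted(left_cats) == sorted(right_cats)
            and decode(left_cats, left_codes) == decode(right_cats, right_codes))


# Both decode to ['red', 'blue', 'red'], but the category lists are ordered differently
a = (['blue', 'red'], [1, 0, 1])
b = (['red', 'blue'], [0, 1, 0])

same_with_order = categoricals_equal(a, b)
same_without_order = categoricals_equal(a, b, check_category_order=False)
```

The two objects hold identical values, so they compare equal only when the category order is ignored.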

## Index Equality Assertions

Comprehensive Index comparison for all cuDF Index types.

```{ .api }
def assert_index_equal(
    left,
    right,
    exact='equiv',
    check_names=True,
    check_exact=False,
    check_categorical=True,
    check_order=True,
    rtol=1e-05,
    atol=1e-08,
    **kwargs
) -> None:
    """
    Assert Index equality with support for all cuDF Index types

    Detailed comparison of Index objects including RangeIndex, DatetimeIndex,
    CategoricalIndex, MultiIndex, and other specialized Index types.

    Parameters:
        left: Index
            Expected Index result
        right: Index
            Actual Index result
        exact: str or bool, default 'equiv'
            Level of exactness ('equiv' for equivalent, True for exact, False for basic)
        check_names: bool, default True
            Whether to check Index name compatibility
        check_exact: bool, default False
            Whether to use exact equality (no floating-point tolerance)
        check_categorical: bool, default True
            Whether to check categorical index data consistency
        check_order: bool, default True
            Whether to check that order of elements matches
        rtol: float, default 1e-05
            Relative tolerance for floating-point comparison
        atol: float, default 1e-08
            Absolute tolerance for floating-point comparison
        **kwargs: additional arguments
            Index-type specific comparison options

    Raises:
        AssertionError: If indexes are not equal with detailed diff message

    Examples:
        # Basic Index comparison
        expected = cudf.Index([1, 2, 3, 4, 5])
        actual = cudf.Index([1, 2, 3, 4, 5])
        cudf.testing.assert_index_equal(expected, actual)

        # Named Index
        expected = cudf.Index([10, 20, 30], name='values')
        actual = cudf.Index([10, 20, 30], name='values')
        cudf.testing.assert_index_equal(expected, actual, check_names=True)

        # RangeIndex comparison
        expected = cudf.RangeIndex(10)  # 0-9
        actual = cudf.RangeIndex(start=0, stop=10, step=1)
        cudf.testing.assert_index_equal(expected, actual)

        # DatetimeIndex comparison
        dates = ['2023-01-01', '2023-01-02', '2023-01-03']
        expected = cudf.DatetimeIndex(dates)
        actual = cudf.DatetimeIndex(dates)
        cudf.testing.assert_index_equal(expected, actual)

        # CategoricalIndex comparison
        categories = ['red', 'blue', 'green']
        expected = cudf.CategoricalIndex(['red', 'blue', 'red'], categories=categories)
        actual = cudf.CategoricalIndex(['red', 'blue', 'red'], categories=categories)
        cudf.testing.assert_index_equal(expected, actual, check_categorical=True)

        # MultiIndex comparison
        arrays = [['A', 'A', 'B', 'B'], [1, 2, 1, 2]]
        expected = cudf.MultiIndex.from_arrays(arrays, names=['letter', 'number'])
        actual = cudf.MultiIndex.from_arrays(arrays, names=['letter', 'number'])
        cudf.testing.assert_index_equal(expected, actual, check_names=True)

        # IntervalIndex comparison
        expected = cudf.interval_range(0, 10, periods=5)
        actual = cudf.interval_range(0, 10, periods=5)
        cudf.testing.assert_index_equal(expected, actual)

        # Float Index with tolerance
        expected = cudf.Index([1.1, 2.2, 3.3])
        actual = cudf.Index([1.100001, 2.200001, 3.300001])
        cudf.testing.assert_index_equal(expected, actual, rtol=1e-4)
    """
```
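
As a rough illustration of `check_order`, the following sketch compares two sequences of index labels, sorting both sides first when order should be ignored. `index_values_equal` is a hypothetical helper over plain Python sequences; it assumes sortable labels and does not reflect how cuDF performs the comparison on the GPU.

```python
def index_values_equal(left, right, check_order=True):
    """Compare two sequences of index labels (toy stand-ins for Index objects)."""
    left, right = list(left), list(right)
    if not check_order:
        # Ignore ordering by comparing sorted copies of both sides
        left, right = sorted(left), sorted(right)
    return left == right


ordered = index_values_equal([3, 1, 2], [1, 2, 3])                       # order matters
unordered = index_values_equal([3, 1, 2], [1, 2, 3], check_order=False)  # order ignored
```

The same labels in a different order compare unequal by default and equal once ordering is disregarded.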

## Advanced Testing Patterns

### Parameterized Testing

```python
import pytest
import cudf
import cudf.testing

class TestDataFrameOperations:
    """Example test class using cuDF testing utilities"""

    @pytest.mark.parametrize("data", [
        {'A': [1, 2, 3], 'B': [4, 5, 6]},
        {'x': [1.1, 2.2], 'y': [3.3, 4.4]},
        {'str_col': ['a', 'b', 'c']}
    ])
    def test_dataframe_creation(self, data):
        """Test DataFrame creation with various data types"""
        df = cudf.DataFrame(data)
        expected = cudf.DataFrame(data)
        cudf.testing.assert_frame_equal(df, expected)

    @pytest.mark.parametrize("dtype", ['int32', 'int64', 'float32', 'float64'])
    def test_series_dtypes(self, dtype):
        """Test Series with different numeric dtypes"""
        data = [1, 2, 3, 4, 5]
        series = cudf.Series(data, dtype=dtype)
        expected = cudf.Series(data, dtype=dtype)
        cudf.testing.assert_series_equal(series, expected, check_dtype=True)
```

### GPU Memory Testing

```python
import cudf
import cudf.testing

def test_large_dataframe_operations():
    """Test operations on large DataFrames that require GPU memory management"""

    # Create large DataFrame
    n_rows = 1_000_000
    df = cudf.DataFrame({
        'A': range(n_rows),
        'B': range(n_rows, 2 * n_rows),
        'C': [f'str_{i}' for i in range(n_rows)]
    })

    # Perform operations and verify results
    # (select the numeric column so the string column 'C' is not summed)
    grouped = df.groupby('A')[['B']].sum()
    expected_b_sum = df['B'].sum()  # All B values summed

    # Use testing utilities to verify
    assert len(grouped) <= n_rows  # Sanity check
    cudf.testing.assert_eq(grouped['B'].sum(), expected_b_sum)

def test_memory_efficient_operations():
    """Test that column selection preserves data (does not verify memory sharing)"""
    original_df = cudf.DataFrame({'x': range(100000)})

    # Operation that should not copy data
    view_df = original_df[['x']]  # Column selection

    # This only checks that the values match.
    # Note: Actual memory sharing verification would require
    # more sophisticated GPU memory inspection
    cudf.testing.assert_series_equal(original_df['x'], view_df['x'])
```

### Error Condition Testing

```python
import pytest
import cudf
import cudf.testing

def test_assertion_errors():
    """Test that assertion functions properly raise errors for different data"""

    df1 = cudf.DataFrame({'A': [1, 2, 3]})
    df2 = cudf.DataFrame({'A': [4, 5, 6]})

    # This should raise AssertionError
    with pytest.raises(AssertionError):
        cudf.testing.assert_frame_equal(df1, df2)

    # Test dtype mismatch
    series1 = cudf.Series([1, 2, 3], dtype='int32')
    series2 = cudf.Series([1, 2, 3], dtype='int64')

    with pytest.raises(AssertionError):
        cudf.testing.assert_series_equal(series1, series2, check_dtype=True)

    # But should pass without dtype checking
    cudf.testing.assert_series_equal(series1, series2, check_dtype=False)

def test_tolerance_behavior():
    """Test floating-point tolerance behavior"""

    # Within tolerance - should pass
    series1 = cudf.Series([1.0, 2.0, 3.0])
    series2 = cudf.Series([1.0000001, 2.0000001, 3.0000001])
    cudf.testing.assert_series_equal(series1, series2, rtol=1e-6)

    # Outside tolerance - should fail
    series3 = cudf.Series([1.1, 2.1, 3.1])
    with pytest.raises(AssertionError):
        cudf.testing.assert_series_equal(series1, series3, rtol=1e-6)
```
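
To reason about which values fall inside a given `rtol`/`atol`, it can help to write the check out by hand. The sketch below assumes the NumPy-style criterion `|actual - expected| <= atol + rtol * |expected|`; whether cuDF applies exactly this formula is an assumption, and `within_tolerance` is a hypothetical helper, not part of `cudf.testing`.

```python
def within_tolerance(actual, expected, rtol=1e-05, atol=1e-08):
    """NumPy-style closeness check for a single pair of floats (assumed formula)."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


# Difference of about 1e-7 against an allowance of about 1e-6: inside tolerance
close = within_tolerance(1.0000001, 1.0, rtol=1e-6)

# Difference of 0.1 against the same allowance: outside tolerance
far = within_tolerance(1.1, 1.0, rtol=1e-6)
```

Under this criterion `atol` dominates for values near zero and `rtol` dominates for large magnitudes, which is why both are usually set together.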

### Cross-Library Testing

```python
import cudf
import pandas as pd
import cudf.testing

def test_cudf_pandas_compatibility():
    """Test that cuDF and pandas produce equivalent results"""

    # Create equivalent data in both libraries
    data = {'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]}
    cudf_df = cudf.DataFrame(data)
    pandas_df = pd.DataFrame(data)

    # Perform same operation on both
    cudf_result = cudf_df.groupby('A').sum()
    pandas_result = pandas_df.groupby('A').sum()

    # Compare results (assert_eq accepts a mix of cuDF and pandas objects)
    cudf.testing.assert_eq(cudf_result, pandas_result)

def test_round_trip_conversion():
    """Test cuDF -> pandas -> cuDF conversion preserves data"""

    original = cudf.DataFrame({
        'ints': [1, 2, 3],
        'floats': [1.1, 2.2, 3.3],
        'strings': ['a', 'b', 'c']
    })

    # Convert to pandas and back
    pandas_version = original.to_pandas()
    round_trip = cudf.from_pandas(pandas_version)

    # Should be identical
    cudf.testing.assert_frame_equal(original, round_trip)
```

## Performance Considerations

### GPU Testing Efficiency
- **Minimize Data Transfer**: Keep test data on the GPU when possible
- **Batch Assertions**: Combine related checks in a single test function
- **Memory Management**: Size test data so it fits in GPU memory and tests stay reproducible
- **Parallel Testing**: Design tests to run independently so they can execute in parallel

### Best Practices
- **Use Appropriate Tolerances**: Set `rtol`/`atol` based on the precision you expect from the computation
- **Check Dtypes When Relevant**: Use `check_dtype=True` for type-sensitive tests
- **Test Edge Cases**: Include empty DataFrames, NaN values, and boundary conditions
- **Cross-Library Compatibility**: Test cuDF results against pandas equivalents
- **Memory Cleanup**: Ensure large test objects are released (garbage collected) after each test
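
NaN values deserve explicit edge-case tests because IEEE 754 defines `NaN != NaN`, so a naive element-wise `==` reports a mismatch even when both sides hold NaN in the same position. The sketch below shows a NaN-aware comparison over plain Python lists; `values_equal_nan_aware` is a hypothetical helper for illustration, and how cuDF's assertions treat NaN and null values should be confirmed against its documentation.

```python
import math


def values_equal_nan_aware(left, right):
    """Element-wise equality that treats NaN in matching positions as equal."""
    if len(left) != len(right):
        return False
    for a, b in zip(left, right):
        both_nan = (isinstance(a, float) and isinstance(b, float)
                    and math.isnan(a) and math.isnan(b))
        if not (both_nan or a == b):
            return False
    return True


nan_self_equal = float('nan') == float('nan')  # plain == never matches NaN
matched = values_equal_nan_aware([1.0, float('nan')], [1.0, float('nan')])
empty_ok = values_equal_nan_aware([], [])      # empty inputs are trivially equal
```

Covering both the NaN case and the empty case in a test suite guards against comparisons that silently pass or fail for the wrong reason.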