Tessl Tile for pypi/cudf-cu12@25.8.0

# Testing Utilities

cuDF provides testing utilities for comparing DataFrames, Series, and Index objects that live in GPU memory. The assertion functions support floating-point tolerances and cuDF-specific data types.

## Import Statements

```python
# Core testing functions
from cudf.testing import (
    assert_eq, assert_neq,
    assert_frame_equal, assert_series_equal, assert_index_equal
)

# For use in test suites
import cudf.testing as cudf_testing
```

## Generic Equality Assertions

Universal equality testing function that handles all cuDF object types.

```{ .api }
def assert_eq(
    left,
    right,
    check_dtype=True,
    check_exact=False,
    check_datetimelike_compat=False,
    check_categorical=True,
    check_category_order=True,
    rtol=1e-05,
    atol=1e-08,
    **kwargs
) -> None:
    """
    Generic equality assertion for cuDF objects with GPU-aware comparison

    Comprehensive equality testing that automatically detects object type
    and applies appropriate comparison logic. Handles DataFrames, Series,
    Index objects, and scalar values with GPU memory considerations.

    Parameters:
        left: cuDF object, pandas object, or scalar
            Expected result object
        right: cuDF object, pandas object, or scalar
            Actual result object
        check_dtype: bool, default True
            Whether to check dtype compatibility exactly
        check_exact: bool, default False
            Whether to check exact equality (no floating-point tolerance)
        check_datetimelike_compat: bool, default False
            Whether to compare datetime-like objects across types
        check_categorical: bool, default True
            Whether to check categorical data consistency
        check_category_order: bool, default True
            Whether categorical category order must match
        rtol: float, default 1e-05
            Relative tolerance for floating-point comparisons
        atol: float, default 1e-08
            Absolute tolerance for floating-point comparisons
        **kwargs: additional arguments
            Type-specific comparison options

    Raises:
        AssertionError: If objects are not equal according to specified criteria

    Examples:
        # DataFrame comparison
        expected = cudf.DataFrame({'A': [1, 2, 3], 'B': [4.0, 5.0, 6.0]})
        actual = cudf.DataFrame({'A': [1, 2, 3], 'B': [4.0, 5.0, 6.0]})
        cudf.testing.assert_eq(expected, actual)

        # Series comparison with tolerance
        expected = cudf.Series([1.1, 2.2, 3.3])
        actual = cudf.Series([1.100001, 2.200001, 3.300001])
        cudf.testing.assert_eq(expected, actual, rtol=1e-4)

        # Mixed cuDF/pandas comparison
        cudf_series = cudf.Series([1, 2, 3])
        pandas_series = cudf_series.to_pandas()
        cudf.testing.assert_eq(cudf_series, pandas_series)

        # Scalar comparison
        cudf.testing.assert_eq(5, 5)
        cudf.testing.assert_eq(3.14159, 3.14160, rtol=1e-4)

        # Categorical comparison
        cat1 = cudf.Series(['a', 'b', 'c'], dtype='category')
        cat2 = cudf.Series(['a', 'b', 'c'], dtype='category')
        cudf.testing.assert_eq(cat1, cat2, check_categorical=True)
    """

def assert_neq(
    left,
    right,
    **kwargs
) -> None:
    """
    Assert that two objects are not equal

    Inverse of assert_eq - ensures objects are different according to
    the same comparison criteria used by assert_eq.

    Parameters:
        left: cuDF object, pandas object, or scalar
            First object to compare
        right: cuDF object, pandas object, or scalar
            Second object to compare
        **kwargs: additional arguments
            Passed to underlying comparison functions

    Raises:
        AssertionError: If objects are equal according to comparison criteria

    Examples:
        # Different DataFrames
        df1 = cudf.DataFrame({'A': [1, 2, 3]})
        df2 = cudf.DataFrame({'A': [4, 5, 6]})
        cudf.testing.assert_neq(df1, df2)

        # Different dtypes
        series1 = cudf.Series([1, 2, 3], dtype='int32')
        series2 = cudf.Series([1, 2, 3], dtype='int64')
        cudf.testing.assert_neq(series1, series2, check_dtype=True)

        # Different values
        cudf.testing.assert_neq(5, 6)
        cudf.testing.assert_neq([1, 2, 3], [1, 2, 4])
    """
```

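
The "inverse of `assert_eq`" relationship can be sketched in plain Python. This is a minimal illustration, not cuDF's implementation: `toy_assert_eq` and `toy_assert_neq` are hypothetical stand-ins that compare with `==`, so the snippet runs without a GPU.

```python
def toy_assert_eq(left, right):
    """Hypothetical stand-in for an equality assertion (plain == comparison)."""
    assert left == right, f"{left!r} != {right!r}"


def toy_assert_neq(left, right):
    """Pass only when the equality assertion fails for the same inputs."""
    try:
        toy_assert_eq(left, right)
    except AssertionError:
        return  # the objects differ, which is the expected outcome
    raise AssertionError(f"{left!r} unexpectedly equals {right!r}")


# Different values: the inverse assertion passes silently.
toy_assert_neq(5, 6)

# Equal values: the inverse assertion raises.
try:
    toy_assert_neq([1, 2, 3], [1, 2, 3])
    raised = False
except AssertionError:
    raised = True
```

Written this way, any option that loosens the equality check (for example a wider tolerance) makes the inverse check stricter, which is worth keeping in mind when passing keyword arguments to an inverse assertion.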

## DataFrame Equality Assertions

Detailed DataFrame comparison with comprehensive options for handling edge cases.

```{ .api }
def assert_frame_equal(
    left,
    right,
    check_dtype=True,
    check_index_type=True,
    check_column_type=True,
    check_frame_type=True,
    check_names=True,
    check_exact=False,
    check_datetimelike_compat=False,
    check_categorical=True,
    check_category_order=True,
    check_like=False,
    rtol=1e-05,
    atol=1e-08,
    **kwargs
) -> None:
    """
    Assert DataFrame equality with comprehensive GPU-aware comparison

    Detailed DataFrame comparison that checks data values, dtypes, indexes,
    column names, and metadata. Optimized for GPU DataFrames with support
    for floating-point tolerance and categorical data.

    Parameters:
        left: DataFrame
            Expected DataFrame result
        right: DataFrame
            Actual DataFrame result
        check_dtype: bool, default True
            Whether to check that dtypes match exactly
        check_index_type: bool, default True
            Whether to check index type compatibility
        check_column_type: bool, default True
            Whether to check column type compatibility
        check_frame_type: bool, default True
            Whether to check that both objects are DataFrames
        check_names: bool, default True
            Whether to check index and column names match
        check_exact: bool, default False
            Whether to use exact equality (no floating-point tolerance)
        check_datetimelike_compat: bool, default False
            Whether to allow comparison of different datetime-like types
        check_categorical: bool, default True
            Whether to check categorical data consistency
        check_category_order: bool, default True
            Whether categorical category order must match exactly
        check_like: bool, default False
            Whether to ignore order of index and columns
        rtol: float, default 1e-05
            Relative tolerance for floating-point comparison
        atol: float, default 1e-08
            Absolute tolerance for floating-point comparison
        **kwargs: additional arguments
            Additional comparison options

    Raises:
        AssertionError: If DataFrames are not equal with detailed diff message

    Examples:
        # Basic DataFrame comparison
        expected = cudf.DataFrame({
            'A': [1, 2, 3],
            'B': [4.0, 5.0, 6.0],
            'C': ['x', 'y', 'z']
        })
        actual = cudf.DataFrame({
            'A': [1, 2, 3],
            'B': [4.0, 5.0, 6.0],
            'C': ['x', 'y', 'z']
        })
        cudf.testing.assert_frame_equal(expected, actual)

        # With custom index
        expected.index = ['row1', 'row2', 'row3']
        actual.index = ['row1', 'row2', 'row3']
        cudf.testing.assert_frame_equal(expected, actual, check_names=True)

        # Floating-point tolerance
        expected = cudf.DataFrame({'vals': [1.1, 2.2, 3.3]})
        actual = cudf.DataFrame({'vals': [1.100001, 2.200001, 3.300001]})
        cudf.testing.assert_frame_equal(expected, actual, rtol=1e-4)

        # Ignore column/index order
        expected = cudf.DataFrame({'A': [1, 2], 'B': [3, 4]})
        actual = cudf.DataFrame({'B': [3, 4], 'A': [1, 2]})
        cudf.testing.assert_frame_equal(expected, actual, check_like=True)

        # Mixed cuDF/pandas comparison: with check_frame_type=True both
        # sides must be the same kind of DataFrame, so cross-library
        # checks go through assert_eq instead
        cudf_df = cudf.DataFrame({'x': [1, 2, 3]})
        pandas_df = cudf_df.to_pandas()
        cudf.testing.assert_eq(cudf_df, pandas_df)

        # Categorical data
        cat_df1 = cudf.DataFrame({
            'cat_col': cudf.Series(['a', 'b', 'c'], dtype='category')
        })
        cat_df2 = cudf.DataFrame({
            'cat_col': cudf.Series(['a', 'b', 'c'], dtype='category')
        })
        cudf.testing.assert_frame_equal(cat_df1, cat_df2, check_categorical=True)
    """
```

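
The effect of `check_like` on column order can be illustrated without a GPU. The sketch below is only an analogy under stated assumptions: `frames_equal` is a hypothetical helper, and plain dictionaries mapping column names to value lists stand in for DataFrames. It does not model index reordering or any cuDF internals.

```python
def frames_equal(left, right, check_like=False):
    """Compare two column-name -> values mappings (toy stand-ins for DataFrames)."""
    if check_like:
        # Ignore column order: both sides must still carry the same labels.
        if set(left) != set(right):
            return False
        # Reorder the right-hand columns to follow the left-hand order.
        right = {name: right[name] for name in left}
    # Strict comparison: column order and column contents must both match.
    return list(left.items()) == list(right.items())


expected = {"A": [1, 2], "B": [3, 4]}
actual = {"B": [3, 4], "A": [1, 2]}

strict_result = frames_equal(expected, actual)                    # order matters
relaxed_result = frames_equal(expected, actual, check_like=True)  # order ignored
```

Note that ignoring order does not relax the data check: each label must still map to the same values on both sides.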

## Series Equality Assertions

Detailed Series comparison with support for all cuDF data types.

```{ .api }
def assert_series_equal(
    left,
    right,
    check_dtype=True,
    check_index_type=True,
    check_series_type=True,
    check_names=True,
    check_exact=False,
    check_datetimelike_compat=False,
    check_categorical=True,
    check_category_order=True,
    rtol=1e-05,
    atol=1e-08,
    **kwargs
) -> None:
    """
    Assert Series equality with GPU-aware detailed comparison

    Comprehensive Series comparison that validates data values, dtype,
    index, name, and metadata. Handles cuDF-specific data types including
    nested types (lists, structs) and extension types (decimals).

    Parameters:
        left: Series
            Expected Series result
        right: Series
            Actual Series result
        check_dtype: bool, default True
            Whether to check dtype compatibility exactly
        check_index_type: bool, default True
            Whether to check index type compatibility
        check_series_type: bool, default True
            Whether to check that both objects are Series
        check_names: bool, default True
            Whether to check Series and index names match
        check_exact: bool, default False
            Whether to use exact equality (no floating-point tolerance)
        check_datetimelike_compat: bool, default False
            Whether to allow comparison of different datetime-like types
        check_categorical: bool, default True
            Whether to check categorical data consistency
        check_category_order: bool, default True
            Whether categorical category order must match
        rtol: float, default 1e-05
            Relative tolerance for floating-point comparison
        atol: float, default 1e-08
            Absolute tolerance for floating-point comparison
        **kwargs: additional arguments
            Additional comparison options

    Raises:
        AssertionError: If Series are not equal with detailed diff message

    Examples:
        # Basic Series comparison
        expected = cudf.Series([1, 2, 3, 4, 5])
        actual = cudf.Series([1, 2, 3, 4, 5])
        cudf.testing.assert_series_equal(expected, actual)

        # With custom index and name
        expected = cudf.Series([10, 20, 30],
                               index=['a', 'b', 'c'],
                               name='values')
        actual = cudf.Series([10, 20, 30],
                             index=['a', 'b', 'c'],
                             name='values')
        cudf.testing.assert_series_equal(expected, actual, check_names=True)

        # Floating-point data with tolerance
        expected = cudf.Series([1.1, 2.2, 3.3])
        actual = cudf.Series([1.100001, 2.200001, 3.300001])
        cudf.testing.assert_series_equal(expected, actual, rtol=1e-4)

        # String data
        expected = cudf.Series(['hello', 'world', 'cudf'])
        actual = cudf.Series(['hello', 'world', 'cudf'])
        cudf.testing.assert_series_equal(expected, actual)

        # Categorical data
        expected = cudf.Series(['red', 'blue', 'red'], dtype='category')
        actual = cudf.Series(['red', 'blue', 'red'], dtype='category')
        cudf.testing.assert_series_equal(expected, actual, check_categorical=True)

        # Datetime data
        dates = ['2023-01-01', '2023-01-02', '2023-01-03']
        expected = cudf.to_datetime(cudf.Series(dates))
        actual = cudf.to_datetime(cudf.Series(dates))
        cudf.testing.assert_series_equal(expected, actual)

        # List data (nested type)
        expected = cudf.Series([[1, 2], [3, 4, 5], [6]])
        actual = cudf.Series([[1, 2], [3, 4, 5], [6]])
        cudf.testing.assert_series_equal(expected, actual)

        # Decimal data
        decimal_dtype = cudf.Decimal64Dtype(10, 2)
        expected = cudf.Series([1.23, 4.56], dtype=decimal_dtype)
        actual = cudf.Series([1.23, 4.56], dtype=decimal_dtype)
        cudf.testing.assert_series_equal(expected, actual, check_exact=True)
    """
```

## Index Equality Assertions

Comprehensive Index comparison for all cuDF Index types.

```{ .api }
def assert_index_equal(
    left,
    right,
    exact='equiv',
    check_names=True,
    check_exact=False,
    check_categorical=True,
    check_order=True,
    rtol=1e-05,
    atol=1e-08,
    **kwargs
) -> None:
    """
    Assert Index equality with support for all cuDF Index types

    Detailed comparison of Index objects including RangeIndex, DatetimeIndex,
    CategoricalIndex, MultiIndex, and other specialized Index types.

    Parameters:
        left: Index
            Expected Index result
        right: Index
            Actual Index result
        exact: str or bool, default 'equiv'
            Level of exactness ('equiv' for equivalent, True for exact, False for basic)
        check_names: bool, default True
            Whether to check Index name compatibility
        check_exact: bool, default False
            Whether to use exact equality (no floating-point tolerance)
        check_categorical: bool, default True
            Whether to check categorical index data consistency
        check_order: bool, default True
            Whether to check that order of elements matches
        rtol: float, default 1e-05
            Relative tolerance for floating-point comparison
        atol: float, default 1e-08
            Absolute tolerance for floating-point comparison
        **kwargs: additional arguments
            Index-type specific comparison options

    Raises:
        AssertionError: If indexes are not equal with detailed diff message

    Examples:
        # Basic Index comparison
        expected = cudf.Index([1, 2, 3, 4, 5])
        actual = cudf.Index([1, 2, 3, 4, 5])
        cudf.testing.assert_index_equal(expected, actual)

        # Named Index
        expected = cudf.Index([10, 20, 30], name='values')
        actual = cudf.Index([10, 20, 30], name='values')
        cudf.testing.assert_index_equal(expected, actual, check_names=True)

        # RangeIndex comparison
        expected = cudf.RangeIndex(10)  # 0-9
        actual = cudf.RangeIndex(start=0, stop=10, step=1)
        cudf.testing.assert_index_equal(expected, actual)

        # DatetimeIndex comparison
        dates = ['2023-01-01', '2023-01-02', '2023-01-03']
        expected = cudf.DatetimeIndex(dates)
        actual = cudf.DatetimeIndex(dates)
        cudf.testing.assert_index_equal(expected, actual)

        # CategoricalIndex comparison
        categories = ['red', 'blue', 'green']
        expected = cudf.CategoricalIndex(['red', 'blue', 'red'])
        actual = cudf.CategoricalIndex(['red', 'blue', 'red'])
        cudf.testing.assert_index_equal(expected, actual, check_categorical=True)

        # MultiIndex comparison
        arrays = [['A', 'A', 'B', 'B'], [1, 2, 1, 2]]
        expected = cudf.MultiIndex.from_arrays(arrays, names=['letter', 'number'])
        actual = cudf.MultiIndex.from_arrays(arrays, names=['letter', 'number'])
        cudf.testing.assert_index_equal(expected, actual, check_names=True)

        # IntervalIndex comparison
        expected = cudf.interval_range(0, 10, periods=5)
        actual = cudf.interval_range(0, 10, periods=5)
        cudf.testing.assert_index_equal(expected, actual)

        # Float Index with tolerance
        expected = cudf.Index([1.1, 2.2, 3.3])
        actual = cudf.Index([1.100001, 2.200001, 3.300001])
        cudf.testing.assert_index_equal(expected, actual, rtol=1e-4)
    """
```

## Advanced Testing Patterns

### Parameterized Testing

```python
import pytest
import cudf
import cudf.testing

class TestDataFrameOperations:
    """Example test class using cuDF testing utilities"""

    @pytest.mark.parametrize("data", [
        {'A': [1, 2, 3], 'B': [4, 5, 6]},
        {'x': [1.1, 2.2], 'y': [3.3, 4.4]},
        {'str_col': ['a', 'b', 'c']}
    ])
    def test_dataframe_creation(self, data):
        """Test DataFrame creation with various data types"""
        df = cudf.DataFrame(data)
        expected = cudf.DataFrame(data)
        cudf.testing.assert_frame_equal(df, expected)

    @pytest.mark.parametrize("dtype", ['int32', 'int64', 'float32', 'float64'])
    def test_series_dtypes(self, dtype):
        """Test Series with different numeric dtypes"""
        data = [1, 2, 3, 4, 5]
        series = cudf.Series(data, dtype=dtype)
        expected = cudf.Series(data, dtype=dtype)
        cudf.testing.assert_series_equal(series, expected, check_dtype=True)
```

### GPU Memory Testing

```python
import cudf
import cudf.testing

def test_large_dataframe_operations():
    """Test operations on large DataFrames that require GPU memory management"""

    # Create large DataFrame
    n_rows = 1_000_000
    df = cudf.DataFrame({
        'A': range(n_rows),
        'B': range(n_rows, 2 * n_rows),
        'C': [f'str_{i}' for i in range(n_rows)]
    })

    # Aggregate only the numeric column; the string column 'C' is left out
    # because a sum over strings may not be supported in a groupby
    grouped = df.groupby('A')[['B']].sum()
    expected_b_sum = df['B'].sum()  # All B values summed

    # Use testing utilities to verify
    assert len(grouped) <= n_rows  # Sanity check
    cudf.testing.assert_eq(grouped['B'].sum(), expected_b_sum)

def test_memory_efficient_operations():
    """Test that operations don't unnecessarily copy GPU memory"""
    original_df = cudf.DataFrame({'x': range(100000)})

    # Operation that should not copy data
    view_df = original_df[['x']]  # Column selection

    # This only checks that the values match. Verifying that the data is
    # actually shared (same underlying GPU memory) would require
    # more sophisticated GPU memory inspection.
    cudf.testing.assert_series_equal(original_df['x'], view_df['x'])
```

### Error Condition Testing

```python
import pytest
import cudf
import cudf.testing

def test_assertion_errors():
    """Test that assertion functions properly raise errors for different data"""

    df1 = cudf.DataFrame({'A': [1, 2, 3]})
    df2 = cudf.DataFrame({'A': [4, 5, 6]})

    # This should raise AssertionError
    with pytest.raises(AssertionError):
        cudf.testing.assert_frame_equal(df1, df2)

    # Test dtype mismatch
    series1 = cudf.Series([1, 2, 3], dtype='int32')
    series2 = cudf.Series([1, 2, 3], dtype='int64')

    with pytest.raises(AssertionError):
        cudf.testing.assert_series_equal(series1, series2, check_dtype=True)

    # But should pass without dtype checking
    cudf.testing.assert_series_equal(series1, series2, check_dtype=False)

def test_tolerance_behavior():
    """Test floating-point tolerance behavior"""

    # Within tolerance - should pass
    series1 = cudf.Series([1.0, 2.0, 3.0])
    series2 = cudf.Series([1.0000001, 2.0000001, 3.0000001])
    cudf.testing.assert_series_equal(series1, series2, rtol=1e-6)

    # Outside tolerance - should fail
    series3 = cudf.Series([1.1, 2.1, 3.1])
    with pytest.raises(AssertionError):
        cudf.testing.assert_series_equal(series1, series3, rtol=1e-6)
```

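
How `rtol` and `atol` combine in a tolerance test like the one above can be made concrete with a small calculation. The sketch below assumes the NumPy-style rule `|actual - expected| <= atol + rtol * |expected|`; that rule is an assumption for illustration, since this document does not state the exact formula cuDF applies. `within_tolerance` is a hypothetical helper, not part of `cudf.testing`.

```python
def within_tolerance(expected, actual, rtol=1e-05, atol=1e-08):
    """Return True when actual is close to expected under the assumed rule."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


# A difference of about 1e-7 on a value near 1.0 fits inside rtol=1e-6.
close_enough = within_tolerance(1.0, 1.0000001, rtol=1e-6)

# A difference of 0.1 is far outside the same tolerance.
too_far = within_tolerance(1.0, 1.1, rtol=1e-6)

# Near zero the relative term vanishes, so only atol can admit a difference.
near_zero_default = within_tolerance(0.0, 1e-9)            # atol=1e-08 covers it
near_zero_no_atol = within_tolerance(0.0, 1e-9, atol=0.0)  # nothing covers it
```

Under this rule, `rtol` scales with the magnitude of the expected value while `atol` sets a fixed floor, which is why comparisons against values at or near zero usually need a deliberate `atol`.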

### Cross-Platform Testing

```python
import cudf
import pandas as pd
import cudf.testing

def test_cudf_pandas_compatibility():
    """Test that cuDF and pandas produce equivalent results"""

    # Create equivalent data in both libraries
    data = {'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]}
    cudf_df = cudf.DataFrame(data)
    pandas_df = pd.DataFrame(data)

    # Perform same operation on both
    cudf_result = cudf_df.groupby('A').sum()
    pandas_result = pandas_df.groupby('A').sum()

    # Compare results with assert_eq, which accepts a cuDF object on one
    # side and a pandas object on the other
    cudf.testing.assert_eq(cudf_result, pandas_result)

def test_round_trip_conversion():
    """Test cuDF -> pandas -> cuDF conversion preserves data"""

    original = cudf.DataFrame({
        'ints': [1, 2, 3],
        'floats': [1.1, 2.2, 3.3],
        'strings': ['a', 'b', 'c']
    })

    # Convert to pandas and back
    pandas_version = original.to_pandas()
    round_trip = cudf.from_pandas(pandas_version)

    # Should be identical
    cudf.testing.assert_frame_equal(original, round_trip)
```

## Performance Considerations

### GPU Testing Efficiency
- **Minimize Data Transfer**: Keep test data on GPU when possible
- **Batch Assertions**: Combine multiple checks in single test function
- **Memory Management**: Use appropriate data sizes for test reproducibility
- **Parallel Testing**: Design tests to run independently for parallel execution

### Best Practices
- **Use Appropriate Tolerances**: Set `rtol`/`atol` based on expected precision
- **Check Dtypes When Relevant**: Use `check_dtype=True` for type-sensitive tests
- **Test Edge Cases**: Include empty DataFrames, NaN values, and boundary conditions
- **Cross-Library Compatibility**: Test cuDF results against pandas equivalents
- **Memory Cleanup**: Ensure large test objects are properly garbage collected

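
The memory cleanup point can be checked at the Python level with the standard library. This is a minimal sketch under stated assumptions: `BigBuffer` is a hypothetical stand-in for a large test object, and the snippet only demonstrates Python object lifetime. Whether a GPU allocator then returns memory to the device is a separate question that this example does not address.

```python
import gc
import weakref


class BigBuffer:
    """Hypothetical stand-in for a large test object such as a DataFrame."""

    def __init__(self, n_bytes):
        self.data = bytearray(n_bytes)


def run_and_release():
    """Run a small 'test body', then drop the object and collect garbage."""
    buf = BigBuffer(10_000_000)
    probe = weakref.ref(buf)  # lets us observe the object without keeping it alive

    assert len(buf.data) == 10_000_000  # the test body itself

    del buf       # drop the only strong reference
    gc.collect()  # also clears reference cycles, if any were created
    return probe


probe = run_and_release()
released = probe() is None  # a dead weak reference returns None
```

In a pytest suite the same `del` plus `gc.collect()` step could live in a fixture's teardown so that every test touching large objects releases them before the next one starts.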
