Tessl Tile for pypi/cudf-cu12@25.8.0

# Core Data Structures

cuDF provides GPU-accelerated versions of pandas' core data structures with enhanced capabilities for handling large datasets and complex data types. All structures leverage GPU memory for optimal performance.

## DataFrame

The primary data structure for two-dimensional, tabular data with labeled axes.

```{ .api }
class DataFrame:
    """
    GPU-accelerated DataFrame with pandas-like API

    Two-dimensional, size-mutable, potentially heterogeneous tabular data structure
    with labeled axes (rows and columns). Stored in GPU memory with columnar layout
    for optimal performance.

    Parameters:
        data: dict, list, ndarray, Series, DataFrame, optional
            Data to initialize DataFrame from various sources
        index: Index or array-like, optional
            Index (row labels) for the DataFrame
        columns: Index or array-like, optional
            Column labels for the DataFrame
        dtype: dtype, optional
            Data type to force, otherwise infer
        copy: bool, default False
            Copy data if True

    Attributes:
        index: Index representing row labels
        columns: Index representing column labels
        dtypes: Series with column data types
        shape: tuple representing DataFrame dimensions
        size: int representing total number of elements
        ndim: int representing number of dimensions (always 2)
        empty: bool indicating if DataFrame is empty

    Examples:
        # Create from dictionary
        df = cudf.DataFrame({'A': [1, 2, 3], 'B': [4.0, 5.1, 6.2]})

        # Create with custom index
        df = cudf.DataFrame(
            {'x': [1, 2], 'y': [3, 4]},
            index=['row1', 'row2']
        )
    """
```

## Series

One-dimensional labeled array capable of holding any data type.

```{ .api }
class Series:
    """
    GPU-accelerated one-dimensional array with axis labels

    One-dimensional ndarray-like object containing an array of data and
    associated array of labels, called its index. Optimized for GPU computation
    with automatic memory management.

    Parameters:
        data: array-like, dict, scalar value
            Contains data stored in Series
        index: array-like or Index, optional
            Values must be hashable and same length as data
        dtype: dtype, optional
            Data type for the output Series
        name: str, optional
            Name to give to the Series
        copy: bool, default False
            Copy input data if True

    Attributes:
        index: Index representing the axis labels
        dtype: numpy.dtype representing data type
        shape: tuple representing Series dimensions
        size: int representing number of elements
        ndim: int representing number of dimensions (always 1)
        name: str or None representing Series name
        values: cupy.ndarray representing underlying data

    Examples:
        # Create from list
        s = cudf.Series([1, 2, 3, 4, 5])

        # Create with index and name
        s = cudf.Series([1.1, 2.2, 3.3],
                       index=['a', 'b', 'c'],
                       name='values')
    """
```

## Index Classes

Immutable sequences used for axis labels and data selection.

### Base Index

```{ .api }
class Index:
    """
    Immutable sequence used for axis labels and selection

    Base class for all index types in cuDF. Provides common functionality
    for indexing, selection, and alignment operations. GPU-accelerated for
    large-scale operations.

    Parameters:
        data: array-like (1-D)
            Data to create index from
        dtype: numpy.dtype, optional
            Data type for index
        copy: bool, default False
            Copy input data if True
        name: str, optional
            Name for the index

    Attributes:
        dtype: numpy.dtype representing data type
        shape: tuple representing index dimensions
        size: int representing number of elements
        ndim: int representing number of dimensions (always 1)
        name: str or None representing index name
        values: cupy.ndarray representing underlying data
        is_unique: bool indicating if all values are unique

    Examples:
        # Create from list
        idx = cudf.Index([1, 2, 3, 4])

        # Create with name
        idx = cudf.Index(['a', 'b', 'c'], name='letters')
    """
```

### RangeIndex

```{ .api }
class RangeIndex(Index):
    """
    Memory-efficient index representing a range of integers

    Immutable index implementing a monotonic integer range. Optimized for
    memory efficiency by storing only start, stop, and step values rather
    than materializing the entire range.

    Parameters:
        start: int, optional (default 0)
            Start value of the range
        stop: int, optional
            Stop value of the range (exclusive)
        step: int, optional (default 1)
            Step size of the range
        name: str, optional
            Name for the index

    Attributes:
        start: int representing range start
        stop: int representing range stop
        step: int representing range step

    Examples:
        # Create range index
        idx = cudf.RangeIndex(10)  # 0 to 9
        idx = cudf.RangeIndex(1, 11, 2)  # 1, 3, 5, 7, 9
    """
```
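
The memory saving described for `RangeIndex` can be illustrated without a GPU. The following is a minimal pure-Python sketch, not cuDF's implementation: it relies on the built-in `range`, which likewise keeps only start, stop, and step, and the helper name `describe_range` is hypothetical.

```python
import sys


def describe_range(start, stop, step=1):
    """Return (length, first, last) for an integer range without materializing it."""
    r = range(start, stop, step)
    if len(r) == 0:
        return 0, None, None
    return len(r), r[0], r[-1]


# Same bounds as the RangeIndex(1, 11, 2) example: 1, 3, 5, 7, 9
length, first, last = describe_range(1, 11, 2)

# A range object stays the same size however long it is,
# while a materialized list grows with the number of elements.
lazy = range(0, 1_000_000)
materialized = list(lazy)
lazy_is_smaller = sys.getsizeof(lazy) < sys.getsizeof(materialized)
```

Length, first, and last elements are all derived arithmetically from the three stored values, which is why such an index can describe millions of rows at a fixed cost.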

### CategoricalIndex

```{ .api }
class CategoricalIndex(Index):
    """
    Index for categorical data with GPU acceleration

    Immutable index for categorical data. Provides memory efficiency for
    repeated string or numeric values by storing categories and codes
    separately. GPU-accelerated for large categorical datasets.

    Parameters:
        data: array-like
            Categorical data for the index
        categories: array-like, optional
            Unique categories for the data
        ordered: bool, default False
            Whether categories have a meaningful order
        dtype: CategoricalDtype, optional
            Categorical data type
        name: str, optional
            Name for the index

    Attributes:
        categories: Index representing unique categories
        codes: cupy.ndarray representing category codes
        ordered: bool indicating if categories are ordered

    Examples:
        # Create categorical index
        idx = cudf.CategoricalIndex(['red', 'blue', 'red', 'green'])

        # With explicit categories
        idx = cudf.CategoricalIndex(
            ['small', 'large', 'medium'],
            categories=['small', 'medium', 'large'],
            ordered=True
        )
    """
```
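
To make the "categories and codes stored separately" idea concrete, here is a small pure-Python sketch of the encoding. It is an illustration only, not cuDF code: the function name `encode_categories` is hypothetical, and it assumes the pandas conventions of sorted unique values as default categories and `-1` as the code for values outside the categories.

```python
def encode_categories(values, categories=None):
    """Split values into (categories, codes), where each code indexes into categories."""
    if categories is None:
        # Assumed default: sorted unique values
        categories = sorted(set(values))
    position = {cat: i for i, cat in enumerate(categories)}
    # Values not found in categories receive the sentinel code -1
    codes = [position.get(v, -1) for v in values]
    return list(categories), codes


# Repeated strings collapse to small integer codes
categories, codes = encode_categories(['red', 'blue', 'red', 'green'])

# Explicit categories fix the code assigned to each label
sizes, size_codes = encode_categories(
    ['small', 'large', 'medium'],
    categories=['small', 'medium', 'large'],
)
```

With explicit, ordered categories the codes follow the declared order, so comparing codes compares sizes rather than alphabetical position.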

### DatetimeIndex

```{ .api }
class DatetimeIndex(Index):
    """
    Index for datetime values with GPU acceleration

    Immutable index containing datetime64 values. Provides fast temporal
    operations and date-based selection. GPU-accelerated for time series
    operations on large datasets.

    Parameters:
        data: array-like
            Datetime-like data for the index
        freq: str or DateOffset, optional
            Frequency of the datetime data
        tz: str or timezone, optional
            Timezone for localized datetime index
        normalize: bool, default False
            Normalize start/end dates to midnight
        name: str, optional
            Name for the index

    Attributes:
        freq: str or None representing frequency
        tz: timezone or None representing timezone
        year: Index representing year values
        month: Index representing month values
        day: Index representing day values
        hour: Index representing hour values
        minute: Index representing minute values
        second: Index representing second values

    Examples:
        # Create from date strings
        idx = cudf.DatetimeIndex(['2023-01-01', '2023-01-02', '2023-01-03'])

        # With timezone (localized after construction; passing tz= to the
        # constructor may raise NotImplementedError)
        idx = cudf.DatetimeIndex(
            ['2023-01-01', '2023-01-02']
        ).tz_localize('UTC')
    """
```

### TimedeltaIndex

```{ .api }
class TimedeltaIndex(Index):
    """
    Index for timedelta values with GPU acceleration

    Immutable index containing timedelta64 values. Represents durations
    and time differences. GPU-accelerated for temporal arithmetic operations.

    Parameters:
        data: array-like
            Timedelta-like data for the index
        unit: str, optional
            Unit of the timedelta data ('D', 'h', 'm', 's', etc.)
        freq: str or DateOffset, optional
            Frequency of the timedelta data
        name: str, optional
            Name for the index

    Attributes:
        freq: str or None representing frequency
        components: DataFrame with timedelta components
        days: Index representing days component
        seconds: Index representing seconds component
        microseconds: Index representing microseconds component
        nanoseconds: Index representing nanoseconds component

    Examples:
        # Create from timedelta strings
        idx = cudf.TimedeltaIndex(['1 day', '2 hours', '30 minutes'])

        # From numeric values with unit
        idx = cudf.TimedeltaIndex([1, 2, 3], unit='D')
    """
```
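
The component attributes are easy to misread: `seconds` is the remainder after whole days are removed, not the total duration in seconds. The sketch below shows the same decomposition with Python's standard `datetime.timedelta`, which follows the same convention for days, seconds, and microseconds (it has no nanosecond field). The helper `split_components` is hypothetical and does not use cuDF.

```python
from datetime import timedelta


def split_components(td):
    """Return (days, seconds, microseconds) for a timedelta."""
    return td.days, td.seconds, td.microseconds


# The same durations as the string example: 1 day, 2 hours, 30 minutes
durations = [timedelta(days=1), timedelta(hours=2), timedelta(minutes=30)]
components = [split_components(td) for td in durations]

# 25 hours is one whole day plus 3600 leftover seconds
day_and_hour = split_components(timedelta(hours=25))
```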

### IntervalIndex

```{ .api }
class IntervalIndex(Index):
    """
    Index for interval data with GPU acceleration

    Immutable index containing Interval objects. Represents closed, open,
    or half-open intervals. GPU-accelerated for interval-based operations
    and overlapping queries.

    Parameters:
        data: array-like
            Interval-like data for the index
        closed: str, default 'right'
            Whether intervals are closed ('left', 'right', 'both', 'neither')
        dtype: IntervalDtype, optional
            Interval data type
        name: str, optional
            Name for the index

    Attributes:
        closed: str representing interval closure type
        left: Index representing left bounds
        right: Index representing right bounds
        mid: Index representing interval midpoints
        length: Index representing interval lengths

    Examples:
        # Create from arrays
        left = [0, 1, 2]
        right = [1, 2, 3]
        idx = cudf.IntervalIndex.from_arrays(left, right)

        # From tuples
        intervals = [(0, 1), (1, 2), (2, 3)]
        idx = cudf.IntervalIndex.from_tuples(intervals)
    """
```
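
The four `closed` options differ only in whether each endpoint belongs to the interval. A minimal pure-Python sketch of that rule follows; the function `contains` is a hypothetical helper for illustration and is not part of cuDF.

```python
def contains(left, right, value, closed='right'):
    """Check whether value lies in the interval [left, right] under the given closure."""
    if closed not in ('left', 'right', 'both', 'neither'):
        raise ValueError(f"invalid closed value: {closed!r}")
    # An endpoint is included only when its side is closed
    left_ok = value >= left if closed in ('left', 'both') else value > left
    right_ok = value <= right if closed in ('right', 'both') else value < right
    return left_ok and right_ok


# For the interval from 0 to 1, test membership of both endpoints
endpoint_membership = {
    closed: (contains(0, 1, 0, closed), contains(0, 1, 1, closed))
    for closed in ('left', 'right', 'both', 'neither')
}
```

With the default `closed='right'`, adjacent intervals such as (0, 1] and (1, 2] share no points, so every value falls in at most one of them.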

### MultiIndex

```{ .api }
class MultiIndex(Index):
    """
    Multi-level/hierarchical index for GPU DataFrames

    Multi-level index object. Represents multiple levels of indexing
    on a single axis. GPU-accelerated for hierarchical data operations
    and multi-dimensional selections.

    Parameters:
        levels: sequence of arrays
            Unique labels for each level
        codes: sequence of arrays
            Integers for each level indicating label positions
        names: sequence of str, optional
            Names for each level

    Attributes:
        levels: list of Index objects representing each level
        codes: list of arrays representing level codes
        names: list of str representing level names
        nlevels: int representing number of levels

    Examples:
        # Create from arrays
        arrays = [
            ['A', 'A', 'B', 'B'],
            [1, 2, 1, 2]
        ]
        idx = cudf.MultiIndex.from_arrays(arrays, names=['letter', 'number'])

        # From tuples
        tuples = [('A', 1), ('A', 2), ('B', 1), ('B', 2)]
        idx = cudf.MultiIndex.from_tuples(tuples)
    """
```
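
The relationship between `levels` and `codes` can be shown with plain Python lists. This is a conceptual sketch under the assumption that each level holds the sorted unique labels of its position; the helper names `factorize_tuples` and `rebuild_tuples` are hypothetical, and the code does not reflect how cuDF stores a MultiIndex internally.

```python
def factorize_tuples(tuples):
    """Decompose row tuples into per-level unique labels and integer codes."""
    levels, codes = [], []
    for i in range(len(tuples[0])):
        column = [t[i] for t in tuples]
        level = sorted(set(column))
        position = {label: pos for pos, label in enumerate(level)}
        levels.append(level)
        codes.append([position[label] for label in column])
    return levels, codes


def rebuild_tuples(levels, codes):
    """Invert factorize_tuples: look each code up in its level."""
    columns = [[level[c] for c in level_codes]
               for level, level_codes in zip(levels, codes)]
    return list(zip(*columns))


tuples = [('A', 1), ('A', 2), ('B', 1), ('B', 2)]
levels, codes = factorize_tuples(tuples)
round_trip = rebuild_tuples(levels, codes)
```

Each level stores a label once, and the codes say which label appears in each row, so the original tuples can always be recovered.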

## Data Types

Extended data type system supporting nested and specialized types.

### CategoricalDtype

```{ .api }
class CategoricalDtype:
    """
    Extension dtype for categorical data

    Data type for categorical data with optional ordering. Provides memory
    efficiency for repeated values and supports ordered categorical operations.

    Parameters:
        categories: Index-like, optional
            Unique categories for the data
        ordered: bool, default False
            Whether categories have meaningful order

    Attributes:
        categories: Index representing unique categories
        ordered: bool indicating if categories are ordered

    Examples:
        # Create categorical dtype
        dtype = cudf.CategoricalDtype(['red', 'blue', 'green'])

        # With ordering
        dtype = cudf.CategoricalDtype(
            ['small', 'medium', 'large'],
            ordered=True
        )
    """
```

### Decimal Data Types

```{ .api }
class Decimal32Dtype:
    """
    32-bit fixed-point decimal data type

    Extension dtype for 32-bit decimal numbers with configurable precision
    and scale. Provides exact decimal arithmetic without floating-point errors.

    Parameters:
        precision: int (1-9)
            Total number of digits
        scale: int (0-precision)
            Number of digits after decimal point

    Examples:
        # Create decimal32 dtype
        dtype = cudf.Decimal32Dtype(precision=7, scale=2)  # 99999.99 max
    """

class Decimal64Dtype:
    """
    64-bit fixed-point decimal data type

    Extension dtype for 64-bit decimal numbers with configurable precision
    and scale. Provides exact decimal arithmetic for financial calculations.

    Parameters:
        precision: int (1-18)
            Total number of digits
        scale: int (0-precision)
            Number of digits after decimal point

    Examples:
        # Create decimal64 dtype
        dtype = cudf.Decimal64Dtype(precision=10, scale=4)  # 999999.9999 max
    """

class Decimal128Dtype:
    """
    128-bit fixed-point decimal data type

    Extension dtype for 128-bit decimal numbers with configurable precision
    and scale. Provides highest precision decimal arithmetic.

    Parameters:
        precision: int (1-38)
            Total number of digits
        scale: int (0-precision)
            Number of digits after decimal point

    Examples:
        # Create decimal128 dtype
        dtype = cudf.Decimal128Dtype(precision=20, scale=6)
    """
```
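
The precision limits (9, 18, 38) and the "max" comments in the examples can be derived with a few lines of standard-library Python. The sketch below assumes a fixed-point decimal is held as a scaled signed integer of the stated bit width; `max_precision` and `max_decimal` are hypothetical helper names, and nothing here calls cuDF.

```python
from decimal import Decimal


def max_precision(bits):
    """Most decimal digits whose all-nines value fits in a signed integer of `bits` bits."""
    limit = 2 ** (bits - 1) - 1
    digits = 0
    while 10 ** (digits + 1) - 1 <= limit:
        digits += 1
    return digits


def max_decimal(precision, scale):
    """Largest value with `precision` total digits, `scale` of them after the point."""
    if not 0 <= scale <= precision:
        raise ValueError("scale must satisfy 0 <= scale <= precision")
    # Build the value exactly from its digit tuple to avoid context rounding
    return Decimal((0, (9,) * precision, -scale))


limits = {bits: max_precision(bits) for bits in (32, 64, 128)}
largest_32 = max_decimal(7, 2)    # matches the Decimal32Dtype example
largest_64 = max_decimal(10, 4)   # matches the Decimal64Dtype example
```

Raising `scale` at a fixed `precision` trades integer digits for fractional digits, so the maximum value shrinks by a factor of ten per extra fractional digit.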

### Nested Data Types

```{ .api }
class ListDtype:
    """
    Extension dtype for nested list data

    Data type representing lists of elements where each row can contain
    a variable-length list. Supports nested operations and list processing
    on GPU.

    Parameters:
        element_type: dtype
            Data type of list elements

    Attributes:
        element_type: dtype representing element data type

    Examples:
        # Create list dtype
        dtype = cudf.ListDtype('int64')  # Lists of integers
        dtype = cudf.ListDtype('float32')  # Lists of floats
    """

class StructDtype:
    """
    Extension dtype for nested struct data

    Data type representing structured data where each row contains
    multiple named fields. Similar to database records or JSON objects.

    Parameters:
        fields: dict
            Mapping of field names to data types

    Attributes:
        fields: dict representing field name to dtype mapping

    Examples:
        # Create struct dtype
        fields = {'x': 'int64', 'y': 'float64', 'name': 'object'}
        dtype = cudf.StructDtype(fields)
    """
```
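
Variable-length lists fit a columnar layout because they can be stored as one flat values array plus an offsets array, which is how the Apache Arrow list layout works. The following pure-Python sketch illustrates that layout; the helpers `to_offsets_and_values` and `get_row` are hypothetical and are not cuDF functions.

```python
def to_offsets_and_values(rows):
    """Flatten variable-length lists into (offsets, values)."""
    offsets = [0]
    values = []
    for row in rows:
        values.extend(row)
        # Each offset marks where the next row starts in the flat values
        offsets.append(len(values))
    return offsets, values


def get_row(offsets, values, i):
    """Recover row i as the slice between two consecutive offsets."""
    return values[offsets[i]:offsets[i + 1]]


rows = [[1, 2, 3], [], [4, 5]]
offsets, values = to_offsets_and_values(rows)
recovered = [get_row(offsets, values, i) for i in range(len(rows))]
```

An empty row costs only a repeated offset, and all elements stay contiguous, which suits bulk processing of the flat values array.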

### IntervalDtype

```{ .api }
class IntervalDtype:
    """
    Extension dtype for interval data

    Data type for interval objects with configurable closure behavior
    and subtype. Used for representing ranges and interval-based operations.

    Parameters:
        subtype: dtype, optional (default 'float64')
            Data type for interval bounds
        closed: str, optional (default 'right')
            Whether intervals are closed ('left', 'right', 'both', 'neither')

    Attributes:
        subtype: dtype representing bounds data type
        closed: str representing closure behavior

    Examples:
        # Create interval dtype
        dtype = cudf.IntervalDtype('int64', closed='both')
        dtype = cudf.IntervalDtype('float32', closed='left')
    """
```

## Special Values

Constants for representing missing and special values.

```{ .api }
NA = cudf.NA
"""
Scalar representation of missing value

cuDF's representation of a missing value that is compatible across
all data types including nested types. Distinct from None and np.nan.

Examples:
    # Create Series with missing values
    s = cudf.Series([1, cudf.NA, 3])

    # Check for missing values
    mask = s.isna()  # Returns boolean mask
"""

NaT = cudf.NaT
"""
Not-a-Time representation for datetime/timedelta

Pandas-compatible representation of missing datetime or timedelta values.
Used specifically for temporal data types.

Examples:
    # Create datetime series with NaT
    dates = cudf.Series(['2023-01-01', cudf.NaT, '2023-01-03'])
    dates = cudf.to_datetime(dates)
"""
```

## Memory Management

cuDF data structures leverage RAPIDS Memory Manager (RMM) for optimal GPU memory usage:

- **Columnar Storage**: Apache Arrow format for cache efficiency
- **Memory Pools**: Reduces allocation overhead for frequent operations
- **Zero-Copy**: Minimal data movement between operations
- **Automatic Cleanup**: Garbage collection integration for GPU memory
- **Memory Mapping**: Support for memory-mapped files

## Type Conversions

```python
import cudf
import pandas as pd

# Sample CPU objects to convert
pandas_df = pd.DataFrame({'A': [1, 2, 3], 'B': [4.0, 5.1, 6.2]})
pandas_series = pd.Series([1, 2, 3], name='values')

# CPU to GPU conversion
cudf_df = cudf.from_pandas(pandas_df)
cudf_series = cudf.from_pandas(pandas_series)

# GPU to CPU conversion
df_pandas = cudf_df.to_pandas()
series_pandas = cudf_series.to_pandas()

# Arrow integration
arrow_table = cudf_df.to_arrow()
cudf_df = cudf.DataFrame.from_arrow(arrow_table)

# NumPy/CuPy arrays
cupy_array = cudf_series.values  # Get underlying CuPy array
cudf_series = cudf.Series(cupy_array)  # Create from CuPy array
```

## Performance Characteristics

- **Memory Bandwidth**: Speedups of roughly 10-100x over pandas are possible for large datasets, depending on the workload and hardware
- **Parallel Operations**: Leverages thousands of GPU cores
- **Cache Efficiency**: Columnar layout optimizes memory access patterns
- **Kernel Fusion**: Combines multiple operations into single GPU kernels where supported
- **Lazy Evaluation**: cuDF itself executes operations eagerly; deferred computation is available through companion libraries such as Dask-cuDF
