Tessl Tile for pypi/cudf-cu12@25.8.0

# Data Manipulation

cuDF provides GPU-accelerated operations for reshaping, joining, aggregating, and transforming data. The operations run in parallel on the GPU and are aimed at large datasets.

## Import Statements

```python
# Core manipulation functions
from cudf import concat, merge, pivot, pivot_table, melt, crosstab
from cudf import unstack, get_dummies

# Algorithm functions
from cudf import factorize, unique, cut

# Time/date operations
from cudf import date_range, to_datetime, interval_range, DateOffset
from cudf import to_numeric

# Groupby operations
from cudf import Grouper, NamedAgg
```

## Concatenation

Combine cuDF objects along axes with flexible alignment and indexing options.

```{ .api }
def concat(
    objs,
    axis=0,
    join='outer',
    ignore_index=False,
    keys=None,
    levels=None,
    names=None,
    verify_integrity=False,
    sort=False,
    copy=True
) -> Union[DataFrame, Series]:
    """
    Concatenate cuDF objects along a particular axis with GPU acceleration

    Efficiently combines multiple DataFrames or Series along rows or columns
    with flexible joining and indexing options. GPU-optimized for large datasets.

    Parameters:
        objs: sequence of DataFrame, Series, or dict
            Objects to concatenate (list, tuple, or dict of objects)
        axis: int or str, default 0
            Axis to concatenate along (0/'index' for rows, 1/'columns' for columns)
        join: str, default 'outer'
            How to handle indexes on other axis ('inner' or 'outer')
        ignore_index: bool, default False
            If True, reset index to default integer index
        keys: sequence, optional
            Construct hierarchical index using keys as outermost level
        levels: list of sequences, optional
            Specific levels to use for MultiIndex construction
        names: list, optional
            Names for levels in resulting hierarchical index
        verify_integrity: bool, default False
            Check whether new concatenated axis contains duplicates
        sort: bool, default False
            Sort non-concatenation axis if not already aligned
        copy: bool, default True
            If False, avoid copying data where possible

    Returns:
        Union[DataFrame, Series]: Concatenated result; a Series or a DataFrame
        depending on the input objects and the axis

    Examples:
        # Concatenate DataFrames vertically (rows)
        df1 = cudf.DataFrame({'A': [1, 2], 'B': [3, 4]})
        df2 = cudf.DataFrame({'A': [5, 6], 'B': [7, 8]})
        result = cudf.concat([df1, df2])  # 4 rows, 2 columns

        # Concatenate horizontally (columns)
        df3 = cudf.DataFrame({'C': [9, 10], 'D': [11, 12]})
        result = cudf.concat([df1, df3], axis=1)  # 2 rows, 4 columns

        # With hierarchical indexing
        result = cudf.concat([df1, df2], keys=['first', 'second'])

        # Ignore original indexes
        result = cudf.concat([df1, df2], ignore_index=True)
    """
```
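To make the `join` parameter concrete without a GPU, the following is a minimal plain-Python sketch of row-wise concatenation. It is not cuDF's implementation: the helper `concat_rows` and the dict-of-lists "frames" are hypothetical stand-ins, used only to show how `'outer'` keeps the union of columns and `'inner'` keeps the intersection.

```python
def concat_rows(frames, join="outer", fill=None):
    """Stack dict-of-lists 'frames' vertically, aligning columns by name.

    Hypothetical CPU reference for the semantics only, not cuDF code.
    """
    if join == "outer":
        # Union of column names, in first-seen order
        columns = list(dict.fromkeys(c for f in frames for c in f))
    elif join == "inner":
        # Only columns present in every frame, in the first frame's order
        columns = [c for c in frames[0] if all(c in f for f in frames)]
    else:
        raise ValueError("join must be 'outer' or 'inner'")

    out = {c: [] for c in columns}
    for f in frames:
        n_rows = len(next(iter(f.values()), []))
        for c in columns:
            # Columns missing from a frame are padded with the fill value
            out[c].extend(f.get(c, [fill] * n_rows))
    return out


df1 = {"A": [1, 2], "B": [3, 4]}
df2 = {"A": [5, 6], "C": [7, 8]}
outer = concat_rows([df1, df2], join="outer")
inner = concat_rows([df1, df2], join="inner")
```

With `join="outer"` the result has columns `A`, `B`, and `C`, padded with missing values; with `join="inner"` only `A` survives.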

## Merging and Joining

Database-style join operations with various merge strategies and optimizations.

```{ .api }
def merge(
    left,
    right,
    how='inner',
    on=None,
    left_on=None,
    right_on=None,
    left_index=False,
    right_index=False,
    sort=False,
    suffixes=('_x', '_y'),
    copy=True,
    indicator=False,
    validate=None,
    method='hash'
) -> DataFrame:
    """
    Merge DataFrame objects with database-style join operations

    High-performance GPU joins with automatic optimization and support
    for various join algorithms. Handles large datasets efficiently.

    Parameters:
        left: DataFrame
            Left DataFrame to merge
        right: DataFrame
            Right DataFrame to merge
        how: str, default 'inner'
            Type of merge ('left', 'right', 'outer', 'inner', 'cross')
        on: label or list, optional
            Column or index level names to join on (must exist in both objects)
        left_on: label or list, optional
            Column or index level names to join on in left DataFrame
        right_on: label or list, optional
            Column or index level names to join on in right DataFrame
        left_index: bool, default False
            Use left DataFrame's index as join key
        right_index: bool, default False
            Use right DataFrame's index as join key
        sort: bool, default False
            Sort join keys lexicographically in result
        suffixes: tuple of str, default ('_x', '_y')
            Suffixes to apply to overlapping column names
        copy: bool, default True
            Always copy data, set False to avoid copies when possible
        indicator: bool or str, default False
            Add column indicating source of each row
        validate: str, optional
            Check uniqueness of merge keys ('one_to_one', 'one_to_many', etc.)
        method: str, default 'hash'
            Join algorithm ('hash', 'sort')

    Returns:
        DataFrame: Merged DataFrame combining left and right

    Examples:
        # Inner join on common column
        left = cudf.DataFrame({'key': ['A', 'B', 'C'], 'value1': [1, 2, 3]})
        right = cudf.DataFrame({'key': ['A', 'B', 'D'], 'value2': [4, 5, 6]})
        result = cudf.merge(left, right, on='key')  # Returns A, B rows

        # Left join naming the key column on each side (the names may differ)
        result = cudf.merge(
            left, right,
            left_on='key', right_on='key',
            how='left'
        )

        # Multiple key join (assumes df1 and df2 both have 'key1' and 'key2' columns)
        result = cudf.merge(df1, df2, on=['key1', 'key2'], how='outer')

        # Index-based join
        result = cudf.merge(
            left, right,
            left_index=True, right_index=True,
            how='inner'
        )
    """
```
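The difference between `how='inner'` and `how='left'` can be illustrated with a small hash join in plain Python. This is a sketch under stated assumptions, not cuDF's GPU join: the function `hash_join` and the list-of-dicts row format are hypothetical, and only a single key column is handled.

```python
from collections import defaultdict


def hash_join(left, right, on, how="inner"):
    """Join two lists of row dicts on column `on` using a hash table.

    Hypothetical CPU reference for the semantics only, not cuDF code.
    """
    # Build phase: index the right-hand rows by key
    index = defaultdict(list)
    for row in right:
        index[row[on]].append(row)

    right_cols = [c for c in (right[0] if right else {}) if c != on]
    result = []
    # Probe phase: look up each left row's key in the hash table
    for lrow in left:
        matches = index.get(lrow[on], [])
        if matches:
            for rrow in matches:
                result.append({**lrow, **{c: rrow[c] for c in right_cols}})
        elif how == "left":
            # Unmatched left rows are kept, with missing right-hand values
            result.append({**lrow, **{c: None for c in right_cols}})
    return result


left = [{"key": "A", "value1": 1}, {"key": "B", "value1": 2}, {"key": "C", "value1": 3}]
right = [{"key": "A", "value2": 4}, {"key": "B", "value2": 5}, {"key": "D", "value2": 6}]
inner = hash_join(left, right, on="key")
left_join = hash_join(left, right, on="key", how="left")
```

The inner join returns only the `A` and `B` rows, matching the first docstring example; the left join also keeps `C` with a missing `value2`.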

## Reshaping Operations

Transform data layout between wide and long formats with pivoting and melting.

```{ .api }
def pivot(
    data,
    index=None,
    columns=None,
    values=None
) -> DataFrame:
    """
    Pivot data to reshape from long to wide format

    Reorganizes data by pivoting column values into new columns.
    GPU-accelerated for large pivot operations.

    Parameters:
        data: DataFrame
            Input DataFrame to pivot
        index: str, list, or array, optional
            Column(s) to use to make new DataFrame's index
        columns: str, list, or array
            Column(s) to use to make new DataFrame's columns
        values: str, list, or array, optional
            Column(s) to use for populating new DataFrame's values

    Returns:
        DataFrame: Pivoted DataFrame with reshaped data

    Examples:
        # Basic pivot
        df = cudf.DataFrame({
            'date': ['2023-01', '2023-01', '2023-02', '2023-02'],
            'variable': ['A', 'B', 'A', 'B'],
            'value': [1, 2, 3, 4]
        })
        result = cudf.pivot(df, index='date', columns='variable', values='value')

        # Multiple values columns (assumes df also has 'value1' and 'value2' columns)
        result = cudf.pivot(df, columns='variable', values=['value1', 'value2'])
    """

def pivot_table(
    data,
    values=None,
    index=None,
    columns=None,
    aggfunc='mean',
    fill_value=None,
    margins=False,
    dropna=True,
    margins_name='All',
    sort=True
) -> DataFrame:
    """
    Create pivot table with aggregation functions

    Generalized pivot operation that applies aggregation functions to
    grouped data. Supports multiple aggregation functions and fill values.

    Parameters:
        data: DataFrame
            Input DataFrame to create pivot table from
        values: str, list, or array, optional
            Column(s) to aggregate
        index: str, list, or array, optional
            Keys to group by on pivot table index
        columns: str, list, or array, optional
            Keys to group by on pivot table columns
        aggfunc: function, list, dict, default 'mean'
            Aggregation function(s) to apply ('mean', 'sum', 'count', etc.)
        fill_value: scalar, optional
            Value to replace missing values with
        margins: bool, default False
            Add row/column margins (subtotals)
        dropna: bool, default True
            Drop columns with all NaN values
        margins_name: str, default 'All'
            Name of margins row/column
        sort: bool, default True
            Sort resulting pivot table by index/columns

    Returns:
        DataFrame: Pivot table with aggregated values

    Examples:
        # Basic pivot table with aggregation
        df = cudf.DataFrame({
            'A': ['foo', 'foo', 'bar', 'bar'],
            'B': ['one', 'two', 'one', 'two'],
            'C': [1, 2, 3, 4],
            'D': [10, 20, 30, 40]
        })
        table = cudf.pivot_table(df, values='C', index='A', columns='B', aggfunc='sum')

        # Multiple aggregation functions
        table = cudf.pivot_table(
            df, values='C', index='A', columns='B',
            aggfunc=['sum', 'mean', 'count']
        )

        # With margins
        table = cudf.pivot_table(df, values='C', index='A', columns='B', margins=True)
    """

def melt(
    frame,
    id_vars=None,
    value_vars=None,
    var_name=None,
    value_name='value',
    col_level=None,
    ignore_index=True
) -> DataFrame:
    """
    Unpivot DataFrame from wide to long format (reverse of pivot)

    Transforms columns into rows by "melting" the DataFrame. Useful for
    converting wide-format data to long format for analysis.

    Parameters:
        frame: DataFrame
            DataFrame to melt
        id_vars: list of str, optional
            Column(s) to use as identifier variables
        value_vars: list of str, optional
            Column(s) to unpivot (default: all columns not in id_vars)
        var_name: str, optional
            Name for variable column (default: 'variable')
        value_name: str, default 'value'
            Name for value column
        col_level: int or str, optional
            Level to melt for MultiIndex columns
        ignore_index: bool, default True
            Reset index in result

    Returns:
        DataFrame: Melted DataFrame in long format

    Examples:
        # Basic melt
        df = cudf.DataFrame({
            'id': ['A', 'B'],
            'var1': [1, 3],
            'var2': [2, 4]
        })
        result = cudf.melt(df, id_vars=['id'])  # Long format

        # Specify columns to melt
        result = cudf.melt(
            df,
            id_vars=['id'],
            value_vars=['var1', 'var2'],
            var_name='variable',
            value_name='measurement'
        )
    """
```
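The wide-to-long and long-to-wide relationship between `melt` and `pivot` can be shown as a round trip in plain Python. This is an illustrative sketch only: `melt_rows` and `pivot_rows` are hypothetical helpers operating on lists of dicts, not cuDF functions, and they ignore indexes and MultiIndex columns.

```python
def melt_rows(rows, id_vars, var_name="variable", value_name="value"):
    """Wide -> long: one output row per (value column, input row).

    Hypothetical CPU reference for the semantics only, not cuDF code.
    """
    value_vars = [c for c in rows[0] if c not in id_vars]
    long_rows = []
    for col in value_vars:  # all rows for the first column, then the next
        for row in rows:
            rec = {k: row[k] for k in id_vars}
            rec[var_name] = col
            rec[value_name] = row[col]
            long_rows.append(rec)
    return long_rows


def pivot_rows(rows, index, columns, values):
    """Long -> wide: one output row per distinct `index` value."""
    wide = {}
    for row in rows:
        out = wide.setdefault(row[index], {index: row[index]})
        if row[columns] in out:
            # A plain pivot cannot resolve duplicates; aggregate first
            raise ValueError("duplicate (index, columns) pair")
        out[row[columns]] = row[values]
    return list(wide.values())


wide = [{"id": "A", "var1": 1, "var2": 2}, {"id": "B", "var1": 3, "var2": 4}]
long = melt_rows(wide, id_vars=["id"])
round_trip = pivot_rows(long, index="id", columns="variable", values="value")
```

Melting the two-row wide table yields four long rows, and pivoting those rows back recovers the original table. The duplicate check hints at why `pivot_table` takes an `aggfunc`: it has to combine several values that land in the same cell.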

## Cross-tabulation and Dummy Variables

Statistical cross-tabulation and categorical variable encoding.

```{ .api }
def crosstab(
    index,
    columns,
    values=None,
    rownames=None,
    colnames=None,
    aggfunc=None,
    margins=False,
    margins_name='All',
    dropna=True,
    normalize=False
) -> DataFrame:
    """
    Compute cross-tabulation of two or more factors

    Creates frequency table showing relationship between categorical variables.
    GPU-accelerated for large categorical datasets.

    Parameters:
        index: array-like, Series, or list of arrays/Series
            Values to group by in rows
        columns: array-like, Series, or list of arrays/Series
            Values to group by in columns
        values: array-like, optional
            Values to aggregate (default: frequency count)
        rownames: sequence, optional
            Names for row index levels
        colnames: sequence, optional
            Names for column index levels
        aggfunc: function, optional
            Aggregation function if values is specified
        margins: bool, default False
            Add row/column margins
        margins_name: str, default 'All'
            Name for margin row/column
        dropna: bool, default True
            Drop missing value combinations
        normalize: bool or str, default False
            Normalize by dividing by sum ('all', 'index', 'columns')

    Returns:
        DataFrame: Cross-tabulation table

    Examples:
        # Basic cross-tabulation
        a = cudf.Series(['foo', 'foo', 'bar', 'bar'])
        b = cudf.Series(['one', 'two', 'one', 'two'])
        result = cudf.crosstab(a, b)

        # With values and aggregation
        values = cudf.Series([1, 2, 3, 4])
        result = cudf.crosstab(a, b, values=values, aggfunc='sum')

        # Normalized
        result = cudf.crosstab(a, b, normalize=True)
    """

def get_dummies(
    data,
    prefix=None,
    prefix_sep='_',
    dummy_na=False,
    columns=None,
    sparse=False,
    drop_first=False,
    dtype=None
) -> DataFrame:
    """
    Convert categorical variables to dummy/indicator variables

    Creates binary columns for each category in categorical variables.
    Commonly used for machine learning feature encoding.

    Parameters:
        data: array-like, Series, or DataFrame
            Data to create dummy variables from
        prefix: str, list of str, or dict, optional
            Prefix for dummy column names
        prefix_sep: str, default '_'
            Separator between prefix and category name
        dummy_na: bool, default False
            Add column for missing values
        columns: list-like, optional
            Column names to encode (default: all categorical columns)
        sparse: bool, default False
            Return sparse matrix (not supported, included for compatibility)
        drop_first: bool, default False
            Drop first category to avoid multicollinearity
        dtype: numpy.dtype, optional
            Data type for dummy variables

    Returns:
        DataFrame: DataFrame with dummy variables

    Examples:
        # From Series
        s = cudf.Series(['a', 'b', 'c', 'a'])
        result = cudf.get_dummies(s)  # Creates 3 binary columns

        # From DataFrame with prefix
        df = cudf.DataFrame({'col': ['red', 'blue', 'red', 'green']})
        result = cudf.get_dummies(df, prefix='color')

        # Drop first category
        result = cudf.get_dummies(df, drop_first=True)
    """

def unstack(
    level=-1,
    fill_value=None
) -> DataFrame:
    """
    Pivot index level to columns (MultiIndex method)

    Transforms index level into columns, effectively pivoting the data.
    Used with MultiIndex DataFrames to reshape hierarchical data.

    Parameters:
        level: int, str, or list, default -1
            Level(s) of index to unstack
        fill_value: scalar, optional
            Value to use for missing combinations

    Returns:
        DataFrame: DataFrame with unstacked index level as columns

    Examples:
        # Create MultiIndex DataFrame
        arrays = [['A', 'A', 'B', 'B'], [1, 2, 1, 2]]
        index = cudf.MultiIndex.from_arrays(arrays, names=['letter', 'number'])
        df = cudf.DataFrame({'value': [10, 20, 30, 40]}, index=index)

        # Unstack inner level
        result = df.unstack()  # number level becomes columns

        # Unstack specific level
        result = df.unstack(level='letter')
    """
```
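As a rough picture of what a frequency cross-tabulation and a one-hot encoding compute, here is a plain-Python sketch. The helpers `crosstab_counts` and `one_hot` are hypothetical and cover only the simplest case (two factors, no `values`, no missing data); they are not how cuDF implements `crosstab` or `get_dummies`.

```python
from collections import Counter


def crosstab_counts(index_vals, column_vals):
    """Count co-occurrences of (row label, column label) pairs.

    Hypothetical CPU reference for the semantics only, not cuDF code.
    """
    counts = Counter(zip(index_vals, column_vals))
    row_labels = sorted(set(index_vals))
    col_labels = sorted(set(column_vals))
    # Combinations that never occur get a count of 0
    return {r: {c: counts[(r, c)] for c in col_labels} for r in row_labels}


def one_hot(values, prefix=None, prefix_sep="_"):
    """Build one boolean indicator column per distinct category."""
    categories = sorted(set(values))

    def name(cat):
        return f"{prefix}{prefix_sep}{cat}" if prefix else str(cat)

    return {name(cat): [v == cat for v in values] for cat in categories}


a = ["foo", "foo", "bar", "bar"]
b = ["one", "two", "one", "one"]
table = crosstab_counts(a, b)
dummies = one_hot(["red", "blue", "red", "green"], prefix="color")
```

Here `table["bar"]` is `{"one": 2, "two": 0}`, and `dummies` has the three columns `color_blue`, `color_green`, and `color_red`.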

## Algorithm Functions

Fundamental algorithms for data analysis and preprocessing.

```{ .api }
def factorize(
    values,
    sort=False,
    na_sentinel=-1,
    use_na_sentinel=True
) -> tuple[cupy.ndarray, Index]:
    """
    Encode input values as enumerated type or categorical variable

    Converts object array to integer codes and unique values. Useful for
    creating categorical encodings and memory-efficient representations.

    Parameters:
        values: array-like
            Sequence to factorize (Series, Index, or array-like)
        sort: bool, default False
            Sort unique values and codes
        na_sentinel: int, default -1
            Value to mark missing values with
        use_na_sentinel: bool, default True
            Whether to use sentinel value for missing data

    Returns:
        tuple: (codes, uniques)
            codes: cupy.ndarray of integer codes
            uniques: Index of unique values

    Examples:
        # Basic factorization
        values = cudf.Series(['red', 'blue', 'red', 'green'])
        codes, uniques = cudf.factorize(values)
        # codes: [0, 1, 0, 2], uniques: ['red', 'blue', 'green']

        # With sorting
        codes, uniques = cudf.factorize(values, sort=True)

        # Handle missing values
        values_na = cudf.Series(['a', None, 'b', 'a'])
        codes, uniques = cudf.factorize(values_na)
    """

def unique(values) -> Union[cupy.ndarray, Index]:
    """
    Return unique values from array-like object

    GPU-accelerated unique value extraction with automatic deduplication.
    Preserves data type and handles missing values appropriately.

    Parameters:
        values: array-like
            Input array, Series, or Index

    Returns:
        Union[cupy.ndarray, Index]: Unique values in same type as input

    Examples:
        # From Series
        s = cudf.Series([1, 2, 2, 3, 1, 4])
        unique_vals = cudf.unique(s)  # [1, 2, 3, 4]

        # From array with strings
        arr = ['a', 'b', 'a', 'c', 'b']
        unique_vals = cudf.unique(arr)  # ['a', 'b', 'c']

        # Preserves data type
        dates = cudf.Series(['2023-01-01', '2023-01-02', '2023-01-01'])
        dates = cudf.to_datetime(dates)
        unique_dates = cudf.unique(dates)
    """

def cut(
    x,
    bins,
    right=True,
    labels=None,
    retbins=False,
    precision=3,
    include_lowest=False,
    duplicates='raise'
) -> Union[Series, tuple]:
    """
    Bin continuous values into discrete intervals

    Segments and sorts data values into bins. Useful for creating categorical
    variables from continuous data and histogram-like operations.

    Parameters:
        x: array-like
            Input array to be binned (1-dimensional)
        bins: int, sequence, or IntervalIndex
            Criteria for binning (number of bins or bin edges)
        right: bool, default True
            Whether intervals include right edge
        labels: array-like or False, optional
            Labels for returned bins (length must match number of bins)
        retbins: bool, default False
            Whether to return bins array
        precision: int, default 3
            Precision for bin edge display
        include_lowest: bool, default False
            Whether first interval should be left-inclusive
        duplicates: str, default 'raise'
            Treatment of duplicate bin edges ('raise' or 'drop')

    Returns:
        Union[Series, tuple]: Categorical Series with bin assignments
            If retbins=True, returns (binned_series, bin_edges)

    Examples:
        # Equal-width bins
        values = cudf.Series([1, 7, 5, 4, 6, 3])
        result = cudf.cut(values, bins=3)  # 3 equal-width bins

        # Custom bin edges
        result = cudf.cut(values, bins=[0, 3, 6, 9])

        # With custom labels
        result = cudf.cut(
            values,
            bins=3,
            labels=['low', 'medium', 'high']
        )

        # Return bin edges
        result, bin_edges = cudf.cut(values, bins=4, retbins=True)
    """
```
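The two less obvious contracts in this section are the `(codes, uniques)` pair returned by `factorize` and the right-closed bins used by `cut` when `right=True`. The sketch below spells both out in plain Python. `factorize_ref` and `cut_ref` are hypothetical names, and the code is a CPU reference for the semantics rather than anything cuDF ships. It also assumes first-seen order for unsorted uniques, which mirrors the pandas convention.

```python
import bisect


def factorize_ref(values, sort=False):
    """Return (codes, uniques); missing values (None) get code -1.

    Hypothetical CPU reference for the semantics only, not cuDF code.
    """
    present = [v for v in values if v is not None]
    # Unsorted: first-seen order (assumed here); sorted: ascending order
    uniques = sorted(set(present)) if sort else list(dict.fromkeys(present))
    code_of = {v: i for i, v in enumerate(uniques)}
    codes = [code_of.get(v, -1) for v in values]
    return codes, uniques


def cut_ref(values, edges, labels):
    """Assign each value to a right-closed bin (edges[i], edges[i+1]]."""
    out = []
    for v in values:
        # bisect_left puts a value equal to an edge into the bin on its left
        i = bisect.bisect_left(edges, v) - 1
        out.append(labels[i] if 0 <= i < len(labels) else None)
    return out


codes, uniques = factorize_ref(["red", "blue", "red", "green"])
codes_sorted, uniques_sorted = factorize_ref(["red", "blue", "red", "green"], sort=True)
codes_na, uniques_na = factorize_ref(["a", None, "b", "a"])
binned = cut_ref([1, 7, 5, 4, 6, 3], edges=[0, 3, 6, 9], labels=["low", "medium", "high"])
```

Note how the values `3` and `6` fall into the bin that ends at them, not the one that starts there, and how a value outside all bins (or equal to the lowest edge) would be left unassigned.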

## Date and Time Operations

Comprehensive date/time functionality for temporal data analysis.

```{ .api }
def date_range(
    start=None,
    end=None,
    periods=None,
    freq=None,
    tz=None,
    normalize=False,
    name=None,
    closed=None
) -> DatetimeIndex:
    """
    Generate sequence of dates with GPU acceleration

    Creates DatetimeIndex with regular frequency between start and end dates.
    Supports various frequency specifications and timezone handling.

    Parameters:
        start: str or datetime-like, optional
            Left bound for generating dates
        end: str or datetime-like, optional
            Right bound for generating dates
        periods: int, optional
            Number of periods to generate
        freq: str or DateOffset, default 'D'
            Frequency string ('D', 'H', 'min', 'S', 'MS', etc.)
        tz: str or tzinfo, optional
            Timezone name for localized DatetimeIndex
        normalize: bool, default False
            Normalize start/end dates to midnight
        name: str, optional
            Name of resulting DatetimeIndex
        closed: str, optional
            Make interval closed ('left', 'right', or None)

    Returns:
        DatetimeIndex: Fixed frequency DatetimeIndex

    Examples:
        # Basic date range
        dates = cudf.date_range('2023-01-01', '2023-01-10', freq='D')

        # By number of periods
        dates = cudf.date_range('2023-01-01', periods=10, freq='D')

        # Hourly frequency
        dates = cudf.date_range('2023-01-01', periods=24, freq='H')

        # With timezone
        dates = cudf.date_range('2023-01-01', periods=5, freq='D', tz='UTC')

        # Business days only
        dates = cudf.date_range('2023-01-01', periods=10, freq='B')
    """

def to_datetime(
    arg,
    errors='raise',
    dayfirst=False,
    yearfirst=False,
    utc=None,
    format=None,
    exact=True,
    unit=None,
    infer_datetime_format=False,
    origin='unix',
    cache=True
) -> Union[datetime, Series, DatetimeIndex]:
    """
    Convert argument to datetime with GPU acceleration

    Flexible datetime parsing with automatic format detection and
    error handling. Optimized for large-scale datetime conversions.

    Parameters:
        arg: int, float, str, datetime, list, tuple, array, Series, DataFrame
            Object to convert to datetime
        errors: str, default 'raise'
            Error handling ('raise', 'coerce', 'ignore')
        dayfirst: bool, default False
            Interpret first value as day in ambiguous cases
        yearfirst: bool, default False
            Interpret first value as year in ambiguous cases
        utc: bool, optional
            Return UTC DatetimeIndex if True
        format: str, optional
            Strftime format to use for parsing
        exact: bool, default True
            Whether format must match exactly
        unit: str, optional
            Unit for numeric conversions ('D', 's', 'ms', 'us', 'ns')
        infer_datetime_format: bool, default False
            Attempt to infer format automatically
        origin: scalar, default 'unix'
            Define origin for numeric conversions
        cache: bool, default True
            Use cache for repeated conversion patterns

    Returns:
        Union[datetime, Series, DatetimeIndex]: Converted datetime object

    Examples:
        # String conversion
        dates = cudf.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03'])

        # With format specification
        dates = cudf.to_datetime(
            ['01/01/2023', '01/02/2023'],
            format='%m/%d/%Y'
        )

        # Numeric timestamps
        timestamps = [1609459200, 1609545600, 1609632000]  # Unix timestamps
        dates = cudf.to_datetime(timestamps, unit='s')

        # Error handling
        mixed = ['2023-01-01', 'invalid', '2023-01-03']
        dates = cudf.to_datetime(mixed, errors='coerce')  # Invalid -> NaT
    """

def interval_range(
    start=None,
    end=None,
    periods=None,
    freq=None,
    name=None,
    closed='right'
) -> IntervalIndex:
    """
    Generate sequence of intervals with fixed frequency

    Creates IntervalIndex with regular intervals between start and end.
    Useful for time-based and numeric interval operations.

    Parameters:
        start: numeric or datetime-like, optional
            Left bound for generating intervals
        end: numeric or datetime-like, optional
            Right bound for generating intervals
        periods: int, optional
            Number of intervals to generate
        freq: numeric, str, or DateOffset, optional
            Length of each interval
        name: str, optional
            Name of resulting IntervalIndex
        closed: str, default 'right'
            Which side of intervals is closed ('left', 'right', 'both', 'neither')

    Returns:
        IntervalIndex: Fixed frequency IntervalIndex

    Examples:
        # Numeric intervals
        intervals = cudf.interval_range(start=0, end=10, periods=5)

        # Date intervals
        intervals = cudf.interval_range(
            start='2023-01-01',
            end='2023-01-10',
            freq='2D'
        )

        # Custom frequency
        intervals = cudf.interval_range(start=0, periods=4, freq=2.5)
    """

class DateOffset:
    """
    Standard offset class for date arithmetic and frequency operations

    Base class for date offsets that can be added to datetime objects.
    Provides consistent interface for date manipulation operations.

    Parameters:
        n: int, default 1
            Number of offset periods

    Examples:
        # Create date offset
        offset = cudf.DateOffset(days=1)

        # Add to datetime
        date = cudf.to_datetime('2023-01-01')
        new_date = date + offset

        # Use in date_range
        dates = cudf.date_range('2023-01-01', periods=5, freq=offset)
    """

def to_numeric(
    arg,
    errors='raise',
    downcast=None
) -> Union[Series, scalar]:
    """
    Convert argument to numeric type with GPU acceleration

    Attempts to convert object to numeric type with flexible error handling
    and optional downcasting for memory efficiency.

    Parameters:
        arg: scalar, list, tuple, array, Series
            Object to convert to numeric type
        errors: str, default 'raise'
            Error handling ('raise', 'coerce', 'ignore')
        downcast: str, optional
            Downcast to smallest possible numeric type ('integer', 'signed', 'unsigned', 'float')

    Returns:
        Union[Series, scalar]: Converted numeric object

    Examples:
        # String to numeric conversion
        strings = cudf.Series(['1', '2', '3.5', '4'])
        numeric = cudf.to_numeric(strings)

        # Error handling
        mixed = cudf.Series(['1', '2', 'invalid', '4'])
        numeric = cudf.to_numeric(mixed, errors='coerce')  # Invalid -> NaN

        # Downcast for memory efficiency
        large_ints = cudf.Series([1, 2, 3, 4])  # Default int64
        small_ints = cudf.to_numeric(large_ints, downcast='integer')  # Smallest int type
    """
```
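Two behaviours described above are easy to check on the CPU with the standard library: generating a fixed-frequency run of dates from a start and a period count, and coercing unparseable values to NaN as with `errors='coerce'`. The helpers `daily_range` and `to_numeric_coerce` are hypothetical, cover only daily steps and float conversion, and are a sketch of the semantics, not of cuDF's parsing.

```python
import math
from datetime import datetime, timedelta


def daily_range(start, periods, step_days=1):
    """Return `periods` datetimes spaced `step_days` apart, from `start`.

    Hypothetical CPU reference for the semantics only, not cuDF code.
    """
    first = datetime.strptime(start, "%Y-%m-%d")
    return [first + timedelta(days=step_days * i) for i in range(periods)]


def to_numeric_coerce(values):
    """Convert each value to float; unparseable entries become NaN."""
    out = []
    for v in values:
        try:
            out.append(float(v))
        except (TypeError, ValueError):
            out.append(math.nan)
    return out


dates = daily_range("2023-01-01", periods=3)
every_other_day = daily_range("2023-01-01", periods=3, step_days=2)
numbers = to_numeric_coerce(["1", "2", "invalid", "4"])
```

The third entry of `numbers` is NaN, while the other entries convert normally. That is the trade-off of coercion: bad input no longer raises, so it has to be detected afterwards.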

## Groupby Operations

Flexible grouping utilities for split-apply-combine operations.

```{ .api }
class Grouper:
    """
    Groupby specification object for complex grouping operations

    Provides detailed control over groupby operations including time-based
    grouping, level selection, and custom key functions.

    Parameters:
        key: str, optional
            Grouping key (column name for DataFrame, None for Series)
        level: int, str, or list, optional
            Level name or number for MultiIndex grouping
        freq: str or DateOffset, optional
            Frequency for time-based grouping
        axis: int, default 0
            Axis to group along
        sort: bool, default True
            Sort group keys

    Examples:
        # Time-based grouping
        df = cudf.DataFrame({
            'date': cudf.date_range('2023-01-01', periods=10, freq='D'),
            'value': range(10)
        })
        monthly = df.groupby(cudf.Grouper(key='date', freq='M')).sum()

        # MultiIndex grouping (assumes df has an index level named 'category')
        grouper = cudf.Grouper(level='category')
        result = df.groupby(grouper).mean()
    """

class NamedAgg:
    """
    Named aggregation specification for groupby operations

    Provides clear naming for aggregation results when using multiple
    aggregation functions on the same column.

    Parameters:
        column: str
            Column name to aggregate
        aggfunc: str or callable
            Aggregation function name or function

    Examples:
        # Named aggregations
        df = cudf.DataFrame({
            'group': ['A', 'B', 'A', 'B'],
            'value': [1, 2, 3, 4]
        })

        result = df.groupby('group').agg(
            mean_value=cudf.NamedAgg('value', 'mean'),
            sum_value=cudf.NamedAgg('value', 'sum'),
            count_value=cudf.NamedAgg('value', 'count')
        )
    """
```
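Named aggregation maps each output column name to a `(column, aggfunc)` pair. A plain-Python sketch of that split-apply-combine step, using the same data as the `NamedAgg` example, might look like the following. The function `groupby_agg` and the `AGG_FUNCS` table are hypothetical, and `'count'` here simply counts rows, whereas a real groupby would typically skip missing values.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical lookup from aggregation name to a Python callable
AGG_FUNCS = {"mean": mean, "sum": sum, "count": len}


def groupby_agg(rows, by, **named_aggs):
    """Group row dicts by `by`, then apply each (column, aggfunc) pair.

    Hypothetical CPU reference for the semantics only, not cuDF code.
    """
    # Split: collect the rows belonging to each group key
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row)

    # Apply and combine: one output record per group, one field per named agg
    result = {}
    for key in sorted(groups):
        members = groups[key]
        result[key] = {
            out_name: AGG_FUNCS[func]([r[col] for r in members])
            for out_name, (col, func) in named_aggs.items()
        }
    return result


rows = [
    {"group": "A", "value": 1},
    {"group": "B", "value": 2},
    {"group": "A", "value": 3},
    {"group": "B", "value": 4},
]
result = groupby_agg(
    rows,
    "group",
    mean_value=("value", "mean"),
    sum_value=("value", "sum"),
    count_value=("value", "count"),
)
```

Each group ends up with exactly the three output names passed as keyword arguments, which is the point of named aggregation: the result columns are named by the caller rather than derived from the function names.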

## Performance Optimizations

### GPU Memory Management
- **Columnar Operations**: Optimized for columnar data layout
- **Memory Pooling**: Efficient memory allocation for operations
- **Zero-Copy**: Minimal data movement between manipulations
- **Automatic Broadcasting**: Efficient element-wise operations

### Parallel Algorithms
- **Hash-Based Joins**: GPU-optimized hash joins for merge operations
- **Parallel Sort**: Multi-key parallel sorting algorithms
- **Grouped Operations**: SIMD optimized groupby aggregations
- **Vectorized Functions**: GPU kernels for element-wise operations

### Query Optimization
- **Kernel Fusion**: Combine multiple operations into single GPU kernels
- **Lazy Evaluation**: Defer computation until results needed
- **Memory-Aware**: Automatically choose algorithms based on available memory
- **Cache Locality**: Optimize memory access patterns for GPU caches
