Tessl Tile for pypi/cudf-cu12@25.8.0

Describes: pkg:pypi/cudf-cu12@25.8.x

To install, run

npx @tessl/cli install tessl/pypi-cudf-cu12@25.8.0

# cuDF: GPU-Accelerated DataFrames

cuDF is a Python GPU DataFrame library (built on the Apache Arrow columnar memory format) for loading, joining, aggregating, filtering, and otherwise manipulating data. cuDF provides a pandas-like API that will be familiar to data engineers and data scientists, so they can use it to accelerate their workflows without going into the details of CUDA programming.

## Package Information

- **Package**: `cudf-cu12`
- **Import**: `cudf`
- **Version**: 25.8.0+
- **Installation**: `pip install cudf-cu12` or `conda install cudf`
- **Requirements**: NVIDIA GPU with CUDA support

## Core Imports

```python
# Main data structures
import cudf
from cudf import DataFrame, Series, Index

# I/O operations
from cudf import read_csv, read_parquet, read_json
from cudf.io import read_orc, read_avro, read_feather

# Data manipulation
from cudf import concat, merge, pivot_table
from cudf import cut, factorize, unique

# Type checking
from cudf.api.types import is_numeric_dtype, is_categorical_dtype
from cudf.api.types import dtype

# Configuration
from cudf.options import get_option, set_option

# Dataset generation
from cudf.datasets import timeseries, randomdata

# Version information (cudf is already imported above)
print(cudf.__version__)  # Package version
```

## Basic Usage

```{ .api }
import cudf

# Create DataFrame from dictionary
df = cudf.DataFrame({
    'x': [1, 2, 3, 4, 5],
    'y': [1.0, 2.5, 3.2, 4.1, 5.8],
    'z': ['red', 'green', 'blue', 'red', 'green']
})

# GPU-accelerated operations
result = df.groupby('z').agg({'x': 'sum', 'y': 'mean'})

# I/O operations read into and write from GPU memory
df_from_file = cudf.read_parquet('data.parquet')
df_from_file.to_csv('output.csv')

# Conversion to and from pandas
df_pandas = df.to_pandas()  # Copy to host (CPU) memory
df_cudf = cudf.from_pandas(df_pandas)  # Copy to device (GPU) memory
```

## Architecture

cuDF leverages the RAPIDS ecosystem to provide GPU-accelerated data processing:

- **GPU Memory Management**: Built on RAPIDS Memory Manager (RMM) for efficient GPU memory allocation
- **Columnar Storage**: Stores data in the Apache Arrow columnar format, which suits parallel GPU access
- **libcudf Backend**: C++/CUDA library provides the computational engine
- **Pandas API**: Maintains familiar pandas interface while delivering GPU performance
- **Zero-Copy Interop**: Shares device memory with GPU libraries such as Numba and CuPy without copying; conversion to and from PyArrow moves data between device and host memory

## Core Data Structures

cuDF provides GPU-accelerated versions of pandas' core data structures with enhanced capabilities.

```{ .api }
class DataFrame:
    """GPU-accelerated DataFrame with pandas-like API"""

class Series:
    """One-dimensional GPU array with axis labels"""

class Index:
    """Immutable sequence used for axis labels and selection"""

class RangeIndex(Index):
    """Memory-efficient index for integer ranges"""

class CategoricalIndex(Index):
    """Index for categorical data with GPU acceleration"""
```

**Key Features**: GPU memory efficiency, nested data types (lists, structs), decimal precision support.

[**→ Learn more about Core Data Structures**](./core-data-structures.md)
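
The nested and decimal types mentioned above can be illustrated with a minimal sketch. The helper name `build_nested_examples` is hypothetical, and the sketch assumes that `cudf.Series` infers list, struct, and decimal dtypes from Python lists, dicts, and `decimal.Decimal` values. `cudf` is imported lazily so the module can still be imported on a machine without a GPU.

```python
from decimal import Decimal


def build_nested_examples():
    """Return small Series showing list, struct, and decimal columns.

    Hypothetical helper; needs cuDF and an NVIDIA GPU when called.
    """
    import cudf  # lazy import: only required when the function runs

    # Each row is a variable-length list of integers (list dtype).
    lists = cudf.Series([[1, 2], [3], []])

    # Each row is a record with named fields (struct dtype).
    structs = cudf.Series([{"a": 1, "b": "x"}, {"a": 2, "b": "y"}])

    # Exact decimal values; the dtype is assumed to be inferred from Decimal.
    decimals = cudf.Series([Decimal("1.25"), Decimal("2.50")])

    return {
        "lists": lists,
        "list_lengths": lists.list.len(),        # per-row element count
        "struct_field_a": structs.struct.field("a"),  # extract one field
        "decimals": decimals,
    }
```

Calling `build_nested_examples()` on a GPU machine returns a dictionary of Series whose `dtype` attributes can be inspected to see how cuDF represents each nested type.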

## I/O Operations

High-performance GPU I/O for popular data formats with automatic memory management.

```{ .api }
def read_parquet(filepath_or_buffer, columns=None, **kwargs) -> DataFrame:
    """
    Read Apache Parquet file directly into GPU memory

    Parameters:
        filepath_or_buffer: File path, URL, or buffer-like object
        columns: List[str], optional column subset to read
        **kwargs: Additional parquet reading options

    Returns:
        DataFrame: GPU-accelerated DataFrame
    """

def read_csv(filepath_or_buffer, **kwargs) -> DataFrame:
    """
    Read CSV file with GPU acceleration

    Parameters:
        filepath_or_buffer: File path or buffer
        **kwargs: CSV parsing options (delimiter, header, etc.)

    Returns:
        DataFrame: GPU DataFrame with parsed CSV data
    """
```

**Supported Formats**: Parquet, ORC, CSV, JSON, Avro, Feather, HDF5, raw text files.

[**→ Learn more about I/O Operations**](./io-operations.md)
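
As a rough illustration of how the readers above might be combined, the sketch below dispatches on the file extension and forwards an optional column subset. `load_table` and its `readers` parameter are hypothetical names introduced here; the sketch assumes `read_parquet` and `read_orc` accept `columns`, and that `read_csv` accepts the pandas-style `usecols`.

```python
from pathlib import Path


def load_table(path, columns=None, readers=None):
    """Pick a reader from the file extension (hypothetical helper).

    `readers` is any object exposing read_parquet/read_orc/read_csv;
    it defaults to the cudf module, imported lazily.
    """
    if readers is None:
        import cudf as readers  # requires an NVIDIA GPU at call time

    suffix = Path(path).suffix.lower()
    if suffix == ".parquet":
        # Column projection avoids decoding columns that are not needed.
        return readers.read_parquet(path, columns=columns)
    if suffix == ".orc":
        return readers.read_orc(path, columns=columns)
    if suffix == ".csv":
        # CSV readers name the column-subset argument `usecols`.
        return readers.read_csv(path, usecols=columns)
    raise ValueError(f"unsupported file extension: {suffix!r}")
```

For example, `load_table("data.parquet", columns=["x", "y"])` would read only two columns into GPU memory. Passing a stand-in object as `readers` makes the dispatch logic testable without a GPU.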

## Data Manipulation

GPU-accelerated operations for reshaping, joining, and transforming data.

```{ .api }
def concat(objs, axis=0, ignore_index=False, **kwargs) -> Union[DataFrame, Series]:
    """
    Concatenate cuDF objects along a particular axis

    Parameters:
        objs: Sequence of DataFrame/Series objects
        axis: int, axis to concatenate along (0='index', 1='columns')
        ignore_index: bool, reset index if True

    Returns:
        Union[DataFrame, Series]: Concatenated result
    """

def merge(left, right, how='inner', on=None, **kwargs) -> DataFrame:
    """
    Merge DataFrame objects with database-style join operations

    Parameters:
        left: DataFrame, left object to merge
        right: DataFrame, right object to merge
        how: str, type of merge ('inner', 'outer', 'left', 'right')
        on: label or list, column names to join on

    Returns:
        DataFrame: Merged DataFrame
    """
```

**Operations**: Concatenation, merging, pivoting, melting, groupby, aggregation, sorting.

[**→ Learn more about Data Manipulation**](./data-manipulation.md)
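
A common pattern that combines `concat` and `merge` is to stack several partial tables and then enrich the result with a lookup table. The sketch below is one way to express that; `stack_and_join` and its `xdf` parameter are hypothetical names, and only the `concat` and `merge` parameters documented above are used.

```python
def stack_and_join(parts, lookup, key, xdf=None):
    """Stack DataFrames row-wise, then left-join a lookup table on `key`.

    `xdf` is the DataFrame library to use; it defaults to cudf,
    imported lazily so the function can be defined without a GPU.
    """
    if xdf is None:
        import cudf as xdf

    # ignore_index=True discards the per-part indexes and builds a fresh one.
    stacked = xdf.concat(parts, axis=0, ignore_index=True)

    # how="left" keeps every stacked row, even without a match in `lookup`.
    return xdf.merge(stacked, lookup, how="left", on=key)
```

Because the function only relies on the pandas-like `concat` and `merge` signatures, the same code should also run against pandas by passing `xdf=pandas`, which can be handy for comparing CPU and GPU results.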
170

171
## Type Checking & Validation
172

173
Comprehensive type checking system for GPU data types including nested types.
174

175
```{ .api }
176
def is_numeric_dtype(arr_or_dtype) -> bool:
177
    """
178
    Check whether the provided array or dtype is numeric
179
    
180
    Parameters:
181
        arr_or_dtype: Array-like or data type to check
182
        
183
    Returns:
184
        bool: True if numeric dtype
185
    """
186

187
def is_categorical_dtype(arr_or_dtype) -> bool:
188
    """
189
    Check whether the array or dtype is categorical
190
    
191
    Parameters:
192
        arr_or_dtype: Array-like or data type to check
193
        
194
    Returns:
195
        bool: True if categorical dtype  
196
    """
197
```
198

199
**Type Support**: Standard dtypes, categorical, decimal, list, struct, interval, datetime types.
200

201
[**→ Learn more about Type Checking**](./type-checking.md)
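
One typical use of these predicates is selecting columns by kind before an aggregation. The following is a minimal sketch under the assumption that a DataFrame exposes `columns` and that `df[name].dtype` returns the column dtype; `numeric_columns` is a hypothetical helper, not part of cuDF.

```python
def numeric_columns(df, is_numeric=None):
    """Return the names of the numeric columns of `df`, in column order.

    `is_numeric` defaults to cudf.api.types.is_numeric_dtype,
    imported lazily so the helper can be defined without a GPU.
    """
    if is_numeric is None:
        from cudf.api.types import is_numeric_dtype as is_numeric

    # Check each column's dtype rather than its values.
    return [name for name in df.columns if is_numeric(df[name].dtype)]
```

With the DataFrame from Basic Usage, this would be expected to keep `x` and `y` and skip the string column `z`.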
202

203
## Pandas Compatibility Layer
204

205
Drop-in acceleration for existing pandas code with cudf.pandas.
206

207
```{ .api }
208
def install() -> None:
209
    """
210
    Enable cuDF pandas accelerator mode
211
    
212
    Automatically accelerates pandas operations with GPU when beneficial,
213
    falls back to CPU pandas for unsupported operations.
214
    """
215

216
class Profiler:
217
    """
218
    Performance profiler for pandas acceleration opportunities
219
    
220
    Analyzes pandas code execution to identify GPU acceleration potential
221
    """
222
```
223

224
**Features**: Automatic fallback, transparent acceleration, performance profiling, IPython magic commands.
225

226
[**→ Learn more about Pandas Compatibility**](./pandas-compatibility.md)
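
The accelerator has to be installed before `pandas` is imported for the first time. A minimal sketch of a guarded setup follows; `enable_gpu_pandas` is a hypothetical helper that returns `False` when cuDF is not installed, so that the same script can still run on CPU-only machines.

```python
def enable_gpu_pandas() -> bool:
    """Try to switch on cudf.pandas; report whether it was enabled.

    Call this before the first `import pandas` in the process.
    """
    try:
        import cudf.pandas as accelerator
    except ImportError:
        # cuDF is not installed: keep using plain CPU pandas.
        return False

    accelerator.install()
    return True
```

A script would call `enable_gpu_pandas()` and only then `import pandas as pd`. The same mode can also be enabled without code changes by running `python -m cudf.pandas script.py`, or with `%load_ext cudf.pandas` in IPython and Jupyter.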
227

228
## Testing Utilities
229

230
GPU-aware testing framework with specialized assertions for cuDF objects.
231

232
```{ .api }
233
def assert_frame_equal(left, right, check_dtype=True, **kwargs) -> None:
234
    """
235
    Assert DataFrame equality with GPU-aware comparison
236
    
237
    Parameters:
238
        left: DataFrame, expected result
239
        right: DataFrame, actual result
240
        check_dtype: bool, whether to check dtype compatibility
241
        **kwargs: Additional comparison options
242
    """
243
```
244

245
**Capabilities**: DataFrame/Series/Index comparison, GPU memory validation, performance assertions.
246

247
[**→ Learn more about Testing Utilities**](./testing-utilities.md)
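
The sketch below shows how `assert_frame_equal` might be used to check a groupby result against a hand-built expectation. It assumes the function is importable from `cudf.testing`; `check_groupby_sum` is a hypothetical name, and the imports are deferred so that defining the function does not require a GPU.

```python
def check_groupby_sum():
    """Compare a grouped sum with an explicitly constructed expected frame.

    Requires cuDF and an NVIDIA GPU when called; raises AssertionError
    if the frames differ.
    """
    import cudf
    from cudf.testing import assert_frame_equal

    df = cudf.DataFrame({"key": ["a", "b", "a"], "val": [1, 2, 3]})

    # Sort the result so the comparison does not depend on group order.
    actual = df.groupby("key").agg({"val": "sum"}).sort_index()

    expected = cudf.DataFrame(
        {"val": [4, 2]},
        index=cudf.Index(["a", "b"], name="key"),
    )

    # Expected result goes on the left, actual on the right.
    assert_frame_equal(expected, actual)
```

In a pytest suite the body of this function would typically live in a `test_*` function so that it is collected automatically.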
248

249
## Configuration Management
250

251
Global configuration system for controlling GPU memory usage and behavior.
252

253
```{ .api }
254
def get_option(key: str) -> Any:
255
    """
256
    Get the value of a configuration option
257
    
258
    Parameters:
259
        key: str, configuration option key
260
        
261
    Returns:
262
        Any: Current option value
263
    """
264

265
def set_option(key: str, value: Any) -> None:
266
    """
267
    Set a configuration option value
268
    
269
    Parameters:  
270
        key: str, configuration option key
271
        value: Any, new option value
272
    """
273
```
274

275
**Options**: Memory management, display formatting, computation behavior, I/O settings.
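
Because options are global, a change made for one computation can leak into unrelated code. One way to limit that, sketched here with a hypothetical `temporary_option` context manager, is to pair `get_option` and `set_option` so the previous value is always restored. The `options` parameter defaults to the `cudf.options` module; the option key in the usage note is an assumption about which keys your cuDF version defines.

```python
from contextlib import contextmanager


@contextmanager
def temporary_option(key, value, options=None):
    """Set an option for the duration of a `with` block, then restore it.

    `options` is any object with get_option/set_option; it defaults
    to cudf.options, imported lazily.
    """
    if options is None:
        import cudf.options as options

    previous = options.get_option(key)
    options.set_option(key, value)
    try:
        yield previous
    finally:
        # Restore the old value even if the block raised an exception.
        options.set_option(key, previous)
```

Usage would look like `with temporary_option("default_integer_bitwidth", 32): ...`. Some cuDF releases also expose a built-in `cudf.option_context`; if your version has it, prefer that over a hand-written helper.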
276

277
## Error Handling
278

279
Specialized error types for GPU-specific issues and mixed-type operations.
280

281
```{ .api }
282
class UnsupportedCUDAError(Exception):
283
    """Raised when CUDA functionality is not supported"""
284

285
class MixedTypeError(Exception):
286
    """Raised when mixing incompatible GPU and CPU types"""
287
```
288

289
## Dataset Generation
290

291
Utilities for generating test data and benchmarking datasets directly in GPU memory.
292

293
```{ .api }
294
def timeseries(
295
    start='2000-01-01', 
296
    end='2000-01-31', 
297
    freq='1s', 
298
    dtypes=None, 
299
    nulls_frequency=0, 
300
    seed=None
301
) -> DataFrame:
302
    """
303
    Generate random timeseries data for testing and benchmarking
304
    
305
    Parameters:
306
        start: str or datetime-like, start date
307
        end: str or datetime-like, end date  
308
        freq: str, date frequency string (e.g., '1s', '1H', '1D')
309
        dtypes: dict, mapping of column names to types
310
        nulls_frequency: float, proportion of nulls to include (0-1)
311
        seed: int, random state seed for reproducibility
312
        
313
    Returns:
314
        DataFrame: GPU DataFrame with random timeseries data
315
    """
316

317
def randomdata(nrows=10, dtypes=None, seed=None) -> DataFrame:
318
    """
319
    Generate random data for testing and benchmarking
320
    
321
    Parameters:
322
        nrows: int, number of rows to generate
323
        dtypes: dict, mapping of column names to types
324
        seed: int, random state seed for reproducibility
325
        
326
    Returns:
327
        DataFrame: GPU DataFrame with random data
328
    """
329
```
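
To show how the `timeseries` parameters fit together, here is a small sketch that builds a reproducible one-day sample with a few nulls. `sample_timeseries` and its `datasets` parameter are hypothetical; the column names and types in `dtypes` are illustrative choices, not requirements.

```python
def sample_timeseries(seed=0, datasets=None):
    """Generate one day of per-minute random data with about 10% nulls.

    `datasets` defaults to the cudf.datasets module, imported lazily.
    """
    if datasets is None:
        import cudf.datasets as datasets

    return datasets.timeseries(
        start="2000-01-01",
        end="2000-01-02",
        freq="1min",                       # one row per minute
        dtypes={"name": "category", "id": int, "x": float},
        nulls_frequency=0.1,               # proportion of nulls, between 0 and 1
        seed=seed,                         # fixed seed for reproducible output
    )
```

Calling the function twice with the same `seed` is intended to give identical data, which keeps benchmarks and tests comparable across runs.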
330

331
## Performance Benefits
332

333
- **Memory Bandwidth**: 10-50x improvement over pandas for large datasets
334
- **Parallel Processing**: Leverages thousands of GPU cores for operations
335
- **Memory Efficiency**: Columnar storage reduces memory footprint  
336
- **Zero-Copy**: Minimal data movement between GPU operations
337
- **Automatic Optimization**: Query optimization and kernel fusion
338

339
## GPU Requirements
340

341
- NVIDIA GPU with Compute Capability 7.0+ (Volta architecture or newer)
342
- CUDA 11.2+ or CUDA 12.0+
343
- Sufficient GPU memory for dataset size
344
- Compatible NVIDIA drivers
345

346
## Version Information
347

348
Access package version and build information programmatically.
349

350
```{ .api }
351
import cudf
352

353
# Package version string  
354
__version__ = cudf.__version__  # e.g., "25.8.0"
355

356
# Git commit hash (if available)
357
__git_commit__ = cudf.__git_commit__  # e.g., "6cea3743b6"
358
```
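
When code needs to branch on the installed release, comparing version strings directly is fragile. The sketch below parses the leading `major.minor.patch` numbers into a tuple; `parse_release` and `at_least` are hypothetical helpers. Accepting zero-padded components such as `"25.08.00"` is an assumption about how some builds may report the version.

```python
import re


def parse_release(version: str) -> tuple:
    """Extract (major, minor, patch) integers from a version string.

    Leading zeros and any trailing suffix (e.g. a pre-release tag)
    are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def at_least(version: str, minimum: tuple) -> bool:
    """Return True if `version` is the same as or newer than `minimum`."""
    return parse_release(version) >= minimum
```

For instance, `at_least(cudf.__version__, (25, 8, 0))` could guard code that relies on behavior introduced in the release this document describes.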