# CuPy

CuPy is a NumPy/SciPy-compatible array library for GPU-accelerated computing with Python. It acts as a drop-in replacement for running existing NumPy/SciPy code on NVIDIA CUDA platforms, accelerating scientific computing, machine learning, and data analysis workflows while keeping a high degree of compatibility with existing NumPy-based codebases.

## Package Information

- **Package Name**: cupy-cuda101
- **Language**: Python
- **Installation**: `pip install cupy-cuda101`
- **CUDA Version**: 10.1

## Core Imports

```python
import cupy as cp
```

For SciPy-compatible and extended functionality:

```python
import cupyx
```

For CUDA-specific operations:

```python
import cupy.cuda
```

## Basic Usage

```python
import cupy as cp
import numpy as np

# Create arrays on GPU
x = cp.arange(6).reshape(2, 3).astype('f')
y = cp.ones((2, 3), dtype='float32')

# GPU array operations (NumPy-compatible API)
result = cp.dot(x, y.T)
print(result)

# Convert between CPU and GPU
gpu_array = cp.array([1, 2, 3, 4, 5])
cpu_array = cp.asnumpy(gpu_array)  # Transfer to CPU
gpu_array2 = cp.asarray(cpu_array)  # Transfer to GPU

# Memory management
mempool = cp.get_default_memory_pool()
print(f"Used bytes: {mempool.used_bytes()}")
print(f"Total bytes: {mempool.total_bytes()}")
```

## Architecture

CuPy provides a comprehensive GPU computing framework:

- **Core Arrays**: `cupy.ndarray` objects that mirror NumPy arrays but reside in GPU memory
- **NumPy Compatibility**: An API that closely mirrors NumPy's, so existing code often migrates with few changes
- **CUDA Integration**: Direct access to CUDA features including kernels, streams, and memory management
- **Extended Functionality**: `cupyx` module provides SciPy-compatible functions and advanced GPU optimizations
- **Memory Management**: Automatic memory pooling with customizable allocators
- **Kernel Fusion**: Opt-in fusion of element-wise operations into a single kernel via `cupy.fuse`

This design enables high-performance scientific computing by moving array operations to the GPU while keeping the familiar NumPy programming model.
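
One way to exploit the NumPy compatibility described above is to write functions that work with either library. The sketch below uses `cupy.get_array_module` to pick the library that owns the input. The `softplus` function and the NumPy fallback for machines without a usable GPU are illustrative assumptions, not part of this package's documentation.

```python
import numpy as np

try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # raises when no usable GPU is present
    get_xp = cp.get_array_module
except Exception:
    cp = None  # assumption: fall back to NumPy so the sketch still runs

    def get_xp(*arrays):
        """Stand-in for cupy.get_array_module on CPU-only machines."""
        return np


def softplus(x):
    """Compute log(1 + exp(x)) with whichever library owns `x`."""
    xp = get_xp(x)
    return xp.maximum(x, 0) + xp.log1p(xp.exp(-xp.abs(x)))


host = np.array([-1.0, 0.0, 1.0])
print(softplus(host))                   # runs on the CPU with NumPy
if cp is not None:
    print(softplus(cp.asarray(host)))   # same code, runs on the GPU
```

Because the function never names a library directly, the same body serves both CPU and GPU callers.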

## Capabilities

### Array Creation and Manipulation

Core array creation functions including basic arrays, ranges, matrices, and data conversion. These functions mirror NumPy's array creation API while creating arrays in GPU memory.

```python { .api }
def array(obj, dtype=None, copy=True, order='K', subok=False, ndmin=0): ...
def zeros(shape, dtype=float, order='C'): ...
def ones(shape, dtype=None, order='C'): ...
def empty(shape, dtype=float, order='C'): ...
def arange(start, stop=None, step=None, dtype=None): ...
def linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0): ...
```

[Array Creation and Manipulation](./array-creation.md)
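
A minimal sketch of the creation functions listed above. The alias `xp` and the NumPy fallback are assumptions added so the snippet also runs where no GPU is available; with CuPy present, every array below lives in GPU memory.

```python
import numpy as np

try:
    import cupy as xp
    xp.cuda.runtime.getDeviceCount()  # raises when no usable GPU is present
except Exception:
    xp = np  # assumption: CPU fallback for illustration only

grid = xp.linspace(0.0, 1.0, num=5)             # evenly spaced samples
steps = xp.arange(0, 10, 2)                     # start, stop, step
zeros = xp.zeros((2, 3), dtype=xp.float32)      # explicit dtype
from_list = xp.array([[1, 2], [3, 4]], dtype=xp.int32)

print(grid, steps, zeros.shape, from_list.dtype)
```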

### Mathematical Functions

Comprehensive mathematical operations including trigonometric, hyperbolic, exponential, logarithmic, arithmetic, and special functions. All operations are performed on GPU with NumPy-compatible interfaces.

```python { .api }
def sin(x, out=None, **kwargs): ...
def cos(x, out=None, **kwargs): ...
def exp(x, out=None, **kwargs): ...
def log(x, out=None, **kwargs): ...
def sqrt(x, out=None, **kwargs): ...
def add(x1, x2, out=None, **kwargs): ...
def multiply(x1, x2, out=None, **kwargs): ...
```

[Mathematical Functions](./math-functions.md)

### Linear Algebra

GPU-accelerated linear algebra operations including matrix operations, decompositions, eigenvalue problems, and solving linear systems. Powered by cuBLAS and cuSOLVER libraries.

```python { .api }
def dot(a, b, out=None): ...
def matmul(x1, x2, out=None, **kwargs): ...
def einsum(subscripts, *operands, out=None, **kwargs): ...
```

[Linear Algebra](./linalg.md)
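
The following sketch shows that `matmul` and an equivalent `einsum` contraction agree, and solves a small linear system with `linalg.solve`. The matrices are made-up illustration data, and the NumPy fallback is an assumption for machines without a GPU.

```python
import numpy as np

try:
    import cupy as xp
    xp.cuda.runtime.getDeviceCount()  # raises when no usable GPU is present
except Exception:
    xp = np  # assumption: CPU fallback for illustration only

a = xp.arange(6, dtype=xp.float64).reshape(2, 3)
b = xp.arange(12, dtype=xp.float64).reshape(3, 4)

via_matmul = xp.matmul(a, b)
via_einsum = xp.einsum('ij,jk->ik', a, b)   # same contraction, spelled out

# Solve 3x + y = 9, x + 2y = 8
coeffs = xp.array([[3.0, 1.0], [1.0, 2.0]])
rhs = xp.array([9.0, 8.0])
solution = xp.linalg.solve(coeffs, rhs)

print(via_matmul)
print(solution)
```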

### Fast Fourier Transform

GPU-accelerated FFT operations for 1D, 2D, and N-dimensional transforms, covering both real and complex inputs. These functions live in the `cupy.fft` namespace.

```python { .api }
def fft(a, n=None, axis=-1, norm=None): ...
def ifft(a, n=None, axis=-1, norm=None): ...
def fft2(a, s=None, axes=(-2, -1), norm=None): ...
def fftn(a, s=None, axes=None, norm=None): ...
```

[Fast Fourier Transform](./fft.md)
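
As a rough illustration, the snippet below transforms one cycle of a sine wave, locates the dominant frequency bin, and inverts the transform. The signal is invented for the example, and the NumPy fallback is an assumption so the code runs without a GPU.

```python
import numpy as np

try:
    import cupy as xp
    xp.cuda.runtime.getDeviceCount()  # raises when no usable GPU is present
except Exception:
    xp = np  # assumption: CPU fallback for illustration only

t = xp.arange(8)
signal = xp.sin(2 * np.pi * t / 8)        # one full cycle over 8 samples

spectrum = xp.fft.fft(signal)
peak = int(xp.argmax(xp.abs(spectrum[:4])))   # strongest non-negative frequency bin
recovered = xp.fft.ifft(spectrum).real        # inverse transform, drop zero imaginary part

print(peak)
```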

### Random Number Generation

GPU-based random number generation with comprehensive probability distributions. Supports both the legacy RandomState interface and the modern Generator API with various bit generators. The functions below live in the `cupy.random` namespace.

```python { .api }
def random(size=None): ...
def randn(*size): ...
def randint(low, high=None, size=None, dtype=int): ...
def normal(loc=0.0, scale=1.0, size=None): ...
def uniform(low=0.0, high=1.0, size=None): ...
```

[Random Number Generation](./random.md)
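
A small sketch of the legacy seeding interface: reseeding with the same value reproduces the same draws. The helper `draw` is a hypothetical name introduced here, and the NumPy fallback is an assumption for CPU-only machines.

```python
import numpy as np

try:
    import cupy as xp
    xp.cuda.runtime.getDeviceCount()  # raises when no usable GPU is present
except Exception:
    xp = np  # assumption: CPU fallback for illustration only


def draw(seed):
    """Seed the global generator, then draw 1000 standard-normal samples."""
    xp.random.seed(seed)
    return xp.random.normal(loc=0.0, scale=1.0, size=1000)


first = draw(42)
second = draw(42)                          # same seed, same samples
ints = xp.random.randint(0, 10, size=5)    # integers in [0, 10)

print(first.shape, ints)
```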

### CUDA Programming Interface

Direct access to CUDA features including custom kernels, memory management, streams, and device control. Enables low-level GPU programming within Python. The kernel classes are exposed at the top level of `cupy`, while `Stream` and `Device` live in `cupy.cuda`.

```python { .api }
class RawKernel: ...
class ElementwiseKernel: ...
class ReductionKernel: ...
class Stream: ...
class Device: ...
```

[CUDA Programming Interface](./cuda.md)
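
To make the kernel classes more concrete, here is a minimal sketch of an `ElementwiseKernel` that computes a squared difference. The wrapper function `squared_diff` and its NumPy branch are assumptions added so the example still runs when no GPU is present; only the `cp.ElementwiseKernel` call exercises the CUDA interface.

```python
import numpy as np

try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # raises when no usable GPU is present
except Exception:
    cp = None  # assumption: fall back to NumPy so the sketch still runs


def squared_diff(x, y):
    """Return (x - y)**2 element-wise for float32 host arrays."""
    if cp is None:
        return (x - y) ** 2
    kernel = cp.ElementwiseKernel(
        'float32 x, float32 y',    # input parameters
        'float32 z',               # output parameter
        'z = (x - y) * (x - y)',   # CUDA C expression applied per element
        'squared_diff',            # kernel name
    )
    return cp.asnumpy(kernel(cp.asarray(x), cp.asarray(y)))


x = np.arange(5, dtype=np.float32)
y = np.full(5, 2, dtype=np.float32)
result = squared_diff(x, y)
print(result)
```

The kernel body is written once per element; CuPy handles indexing and broadcasting around it.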

### Statistics and Aggregation

Statistical functions and array aggregation operations including descriptive statistics, histograms, and correlation analysis. All operations are GPU-accelerated with NumPy-compatible interfaces.

```python { .api }
def mean(a, axis=None, dtype=None, out=None, keepdims=False): ...
def std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False): ...
def histogram(a, bins=10, range=None, weights=None, density=None): ...
def corrcoef(x, y=None, rowvar=True, bias=False, ddof=None): ...
```

[Statistics and Aggregation](./statistics.md)
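
The `axis` argument controls which dimension is reduced. The sketch below uses a made-up 2x3 array; the NumPy fallback is an assumption for machines without a GPU.

```python
import numpy as np

try:
    import cupy as xp
    xp.cuda.runtime.getDeviceCount()  # raises when no usable GPU is present
except Exception:
    xp = np  # assumption: CPU fallback for illustration only

data = xp.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

col_mean = xp.mean(data, axis=0)   # reduce over rows: one mean per column
row_std = xp.std(data, axis=1)     # reduce over columns: one std per row (ddof=0)
overall = float(xp.mean(data))     # reduce over everything, copy scalar to host

print(col_mean, row_std, overall)
```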

### Memory Management and Performance

Memory management functions, performance optimization utilities, and kernel fusion capabilities for maximizing GPU performance and managing memory usage efficiently.

```python { .api }
def get_default_memory_pool(): ...
def get_default_pinned_memory_pool(): ...
def fuse(*args, **kwargs): ...
def asnumpy(a, stream=None, order='C'): ...
```

[Memory Management and Performance](./memory-performance.md)
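
As a hedged sketch of kernel fusion, the decorator `cupy.fuse()` below wraps a function whose element-wise steps can then be compiled into one kernel. The function `scale_and_shift`, the no-op `fuse` stand-in, and the NumPy fallback are all assumptions introduced so the snippet runs without a GPU.

```python
import numpy as np

try:
    import cupy as xp
    xp.cuda.runtime.getDeviceCount()  # raises when no usable GPU is present
    fuse = xp.fuse
except Exception:
    xp = np  # assumption: CPU fallback for illustration only

    def fuse():
        """No-op stand-in for cupy.fuse (hypothetical helper for CPU runs)."""
        return lambda func: func


@fuse()
def scale_and_shift(a, x, y):
    # Multiply and add are element-wise, so they are candidates for fusion
    return a * x + y


x = xp.arange(5, dtype=xp.float32)
y = xp.ones(5, dtype=xp.float32)
result = scale_and_shift(2.0, x, y)

if xp is not np:
    pool = xp.get_default_memory_pool()
    print(pool.used_bytes(), pool.total_bytes())  # bytes in use vs. bytes held by the pool
    result = xp.asnumpy(result)                   # copy the result back to host memory

print(result)
```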

### Array Manipulation

Array manipulation operations including reshaping, transposing, joining, splitting, and rearranging arrays.

```python { .api }
def reshape(a, newshape, order='C'): ...
def transpose(a, axes=None): ...
def concatenate(arrays, axis=0, out=None, dtype=None, casting="same_kind"): ...
def split(ary, indices_or_sections, axis=0): ...
def stack(arrays, axis=0, out=None): ...
def expand_dims(a, axis): ...
```

[Array Manipulation](./array-manipulation.md)

### Binary Operations

Bitwise operations for integer and boolean arrays including AND, OR, XOR, NOT operations and bit shifting. Essential for low-level data manipulation and boolean logic.

```python { .api }
def bitwise_and(x1, x2, out=None, **kwargs): ...
def bitwise_or(x1, x2, out=None, **kwargs): ...
def bitwise_xor(x1, x2, out=None, **kwargs): ...
def bitwise_not(x, out=None, **kwargs): ...
def left_shift(x1, x2, out=None, **kwargs): ...
def right_shift(x1, x2, out=None, **kwargs): ...
```

[Binary Operations](./binary-operations.md)

### Logic Functions

Logical operations and comparison functions including element-wise and array-wise logical operations, truth value testing, and type checking functions.

```python { .api }
def logical_and(x1, x2, out=None, **kwargs): ...
def logical_or(x1, x2, out=None, **kwargs): ...
def equal(x1, x2, out=None, **kwargs): ...
def greater(x1, x2, out=None, **kwargs): ...
def isfinite(x, out=None, **kwargs): ...
def all(a, axis=None, out=None, keepdims=False): ...
```

[Logic Functions](./logic-functions.md)

### Indexing and Searching

Advanced indexing, searching, and selection operations including multi-dimensional indexing, conditional selection, and array searching functions.

```python { .api }
def take(a, indices, axis=None, out=None, mode='raise'): ...
def choose(a, choices, out=None, mode='raise'): ...
def where(condition, x=None, y=None): ...
def nonzero(a): ...
def argmax(a, axis=None, out=None): ...
def searchsorted(a, v, side='left', sorter=None): ...
```

[Indexing and Searching](./indexing-searching.md)
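
A brief sketch of conditional selection and lookup with the functions above. The sample values are invented, and the NumPy fallback is an assumption so the example runs without a GPU.

```python
import numpy as np

try:
    import cupy as xp
    xp.cuda.runtime.getDeviceCount()  # raises when no usable GPU is present
except Exception:
    xp = np  # assumption: CPU fallback for illustration only

values = xp.array([3, -1, 4, -1, 5, -9])

negatives = xp.nonzero(values < 0)[0]                        # indices of negative entries
clipped = xp.where(values < 0, xp.zeros_like(values), values)  # replace negatives with 0
largest = int(xp.argmax(values))                             # index of the maximum
picked = xp.take(values, xp.array([0, 2, 4]))                # gather by index array

print(negatives, clipped, largest, picked)
```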

### Sorting and Counting

Sorting, partitioning, and counting functions for organizing and analyzing arrays.

```python { .api }
def sort(a, axis=-1, kind=None, order=None): ...
def argsort(a, axis=-1, kind=None, order=None): ...
def lexsort(keys, axis=-1): ...
def partition(a, kth, axis=-1, kind=None, order=None): ...
def count_nonzero(a, axis=None, keepdims=False): ...
```

[Sorting and Counting](./sorting-counting.md)

### Input and Output

File I/O operations for saving and loading arrays in NumPy's `.npy` and `.npz` formats. Arrays are copied to host memory when saved and returned as GPU arrays when loaded.

```python { .api }
def save(file, arr, allow_pickle=True, fix_imports=True): ...
def load(file, mmap_mode=None, allow_pickle=False, fix_imports=True, encoding='ASCII'): ...
def savez(file, *args, **kwds): ...
def savez_compressed(file, *args, **kwds): ...
```

[Input and Output](./io-operations.md)
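
The round trip below saves an array to a temporary `.npy` file and loads it back. The file name `example.npy` is a hypothetical placeholder, and the NumPy fallback is an assumption for machines without a GPU.

```python
import os
import tempfile

import numpy as np

try:
    import cupy as xp
    xp.cuda.runtime.getDeviceCount()  # raises when no usable GPU is present
except Exception:
    xp = np  # assumption: CPU fallback for illustration only

arr = xp.arange(6, dtype=xp.float32).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.npy")   # hypothetical file name
    xp.save(path, arr)                        # written in NumPy's .npy format
    restored = xp.load(path)                  # read back as an array of the same library

print(restored)
```

Because the on-disk format is NumPy's, files written this way can also be read with `numpy.load`.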

### Polynomial Functions

Polynomial operations including polynomial arithmetic, fitting, evaluation, and root finding, all executed on the GPU.

```python { .api }
def polyval(p, x): ...
def polyfit(x, y, deg, rcond=None, full=False, w=None, cov=False): ...
def polyadd(a1, a2): ...
def polymul(a1, a2): ...
def roots(p): ...
```

[Polynomial Functions](./polynomial-functions.md)

## Types

```python { .api }
class ndarray:
    """N-dimensional array object on GPU.

    CuPy's main data structure that mirrors NumPy's ndarray but resides in GPU memory.
    Supports most NumPy array operations and attributes.
    """
    def __init__(self, shape, dtype=float, order='C'): ...

    # Properties
    shape: tuple
    dtype: numpy.dtype
    size: int
    ndim: int
    data: cupy.cuda.MemoryPointer

    # Methods
    def get(self, stream=None, order='C'): ...  # Transfer to CPU
    def set(self, arr, stream=None): ...  # Transfer from CPU
    def copy(self, order='C'): ...
    def astype(self, dtype, order='K', casting='unsafe', subok=True, copy=True): ...

class ufunc:
    """Universal function object for element-wise array operations."""
    def __call__(self, *args, **kwargs): ...
    def reduce(self, a, axis=0, dtype=None, out=None, keepdims=False, initial=None, where=True): ...

# Memory and Device Types
class MemoryPointer:
    """Pointer to GPU memory location."""
    ptr: int
    size: int
    device: Device

class Device:
    """CUDA device representation."""
    id: int

class Stream:
    """CUDA stream for asynchronous operations."""
    def __init__(self, non_blocking=False): ...
    def synchronize(self): ...

# Custom Kernel Types
class RawKernel:
    """Raw CUDA kernel wrapper."""
    def __init__(self, code, name, options=(), backend='nvrtc', translate_cucomplex=False): ...
    def __call__(self, grid, block, args, **kwargs): ...

class ElementwiseKernel:
    """Element-wise operation kernel."""
    def __init__(self, in_params, out_params, operation, name='kernel', **kwargs): ...
    def __call__(self, *args, **kwargs): ...

class ReductionKernel:
    """Reduction operation kernel."""
    def __init__(self, in_params, out_params, map_expr, reduce_expr, post_map_expr, identity, name='reduce_kernel', **kwargs): ...
    def __call__(self, *args, **kwargs): ...
```