# CuPy

CuPy is a NumPy/SciPy-compatible array library for GPU-accelerated computing with Python. It is designed as a drop-in replacement for most NumPy and SciPy code, so existing CPU-based numerical computations can often be moved to GPU hardware by changing little more than the import. Speedups depend on the workload and are typically largest for big arrays and compute-heavy operations. CuPy supports both NVIDIA CUDA and AMD ROCm platforms.

## Package Information

- **Package Name**: cupy-rocm-4-3
- **Language**: Python
- **Installation**: `pip install cupy-rocm-4-3`
- **GPU Platform**: AMD ROCm 4.3
- **Compatibility**: NumPy 1.26+, Python 3.8+

## Core Imports

```python
import cupy as cp
```

For specific functionality:

```python
import cupy                # core NumPy-compatible API
from cupy import cuda      # devices, streams, memory pools
import cupy.linalg         # linear algebra
import cupy.random         # random number generation
import cupy.fft            # fast Fourier transforms
import cupyx               # extensions beyond the NumPy API
```

## Basic Usage

```python
import cupy as cp
import numpy as np

# Create arrays on GPU
x_gpu = cp.array([1, 2, 3, 4, 5])
y_gpu = cp.linspace(0, 10, 100)

# NumPy-compatible operations run on GPU
z_gpu = cp.sin(x_gpu) * 2
mean_val = cp.mean(y_gpu)

# Linear algebra operations
A = cp.random.rand(1000, 1000)
B = cp.random.rand(1000, 1000)
C = cp.dot(A, B)  # Matrix multiplication on GPU

# Convert back to CPU when needed
result = cp.asnumpy(C)  # Returns numpy array

# Memory management
mempool = cp.get_default_memory_pool()
print(f"Memory used: {mempool.used_bytes()} bytes")
```

## Architecture

CuPy provides GPU acceleration through several key architectural components:

- **GPU Arrays**: `cupy.ndarray` objects that mirror NumPy's ndarray API but execute on GPU
- **Memory Management**: Automatic memory pools for efficient GPU memory allocation and deallocation
- **CUDA/ROCm Integration**: Direct access to GPU runtime, streams, events, and kernel compilation
- **Kernel System**: Custom kernel creation through `ElementwiseKernel`, `ReductionKernel`, and `RawKernel`
- **Device Management**: Multi-GPU support with context switching and device selection

The library maintains NumPy API compatibility while providing GPU-specific extensions through the `cupy.cuda` and `cupyx` modules, enabling both easy migration and advanced GPU programming.

## Capabilities

### Array Creation and Manipulation

Core array creation functions, data type handling, and array manipulation operations that mirror NumPy's functionality. Includes basic array creation, shape manipulation, indexing, and element access.

```python { .api }
def array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0): ...
def zeros(shape, dtype=float, order='C'): ...
def ones(shape, dtype=None, order='C'): ...
def empty(shape, dtype=float, order='C'): ...
def arange(start, stop=None, step=1, dtype=None): ...
def linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0): ...
```

[Array Creation and Manipulation](./array-creation.md)
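
As a rough illustration of the creation and indexing functions listed above, the sketch below builds a few arrays and slices one of them. The `xp` alias and the NumPy fallback are assumptions made so the snippet also runs on a machine without a GPU; they are not part of CuPy itself.

```python
import numpy as np

try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # raises if no usable GPU is visible
    xp = cp
except Exception:
    xp = np  # illustrative CPU fallback, not part of CuPy

a = xp.arange(12, dtype=xp.float32).reshape(3, 4)  # 3x4 array holding 0..11
z = xp.zeros((3, 4), dtype=xp.float32)             # same shape, all zeros
grid = xp.linspace(0.0, 1.0, num=5)                # 5 evenly spaced points, endpoints included
second_column = a[:, 1]                            # basic slicing works as in NumPy

print(a.shape, z.dtype, second_column.tolist())
```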

### Mathematical Functions

Comprehensive mathematical operations including trigonometric, exponential, logarithmic, hyperbolic, arithmetic, and special functions. All operations are GPU-accelerated and maintain NumPy compatibility.

```python { .api }
def sin(x): ...
def cos(x): ...
def exp(x): ...
def log(x): ...
def sqrt(x): ...
def add(x1, x2): ...
def multiply(x1, x2): ...
def power(x1, x2): ...
```

[Mathematical Functions](./mathematical-functions.md)
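
The element-wise functions above compose exactly as their NumPy counterparts do. A small sketch, assuming the same illustrative `xp` alias with a NumPy fallback (not part of CuPy), checks a trigonometric identity and computes hypotenuse lengths:

```python
import numpy as np

try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # raises if no usable GPU is visible
    xp = cp
except Exception:
    xp = np  # illustrative CPU fallback, not part of CuPy

x = xp.linspace(0.0, xp.pi, 5)

# sin^2 + cos^2 should be 1 everywhere, up to floating-point error
identity = xp.add(xp.power(xp.sin(x), 2), xp.power(xp.cos(x), 2))

# Hypotenuse of (3, 4) and (5, 12) right triangles
legs_a = xp.array([3.0, 5.0])
legs_b = xp.array([4.0, 12.0])
hyp = xp.sqrt(xp.add(xp.multiply(legs_a, legs_a), xp.multiply(legs_b, legs_b)))

identity_holds = bool(xp.allclose(identity, 1.0))
print(identity_holds, hyp.tolist())
```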

### Linear Algebra

GPU-accelerated linear algebra operations including matrix multiplication, decompositions, eigenvalue computations, and equation solving. These are backed by the cuBLAS and cuSOLVER libraries on CUDA, and by the corresponding ROCm libraries on AMD GPUs.

```python { .api }
# dot and matmul live in the top-level cupy namespace;
# solve, inv, svd, and eigh live in cupy.linalg
def dot(a, b, out=None): ...
def matmul(x1, x2, out=None): ...
def solve(a, b): ...
def inv(a): ...
def svd(a, full_matrices=True, compute_uv=True, hermitian=False): ...
def eigh(a, UPLO='L'): ...
```

[Linear Algebra](./linear-algebra.md)
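
To make the equation-solving entry concrete, here is a hedged sketch that solves a small linear system and checks the residual. The matrix is made diagonally dominant purely so the example is well conditioned; the `xp` alias and NumPy fallback are illustrative assumptions, not part of CuPy.

```python
import numpy as np

try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # raises if no usable GPU is visible
    xp = cp
except Exception:
    xp = np  # illustrative CPU fallback, not part of CuPy

rs = xp.random.RandomState(0)
A = rs.rand(4, 4) + 4.0 * xp.eye(4)  # diagonally dominant, hence invertible
b = rs.rand(4)

x = xp.linalg.solve(A, b)            # solves A @ x = b

# Largest absolute component of A @ x - b; should be near machine precision
residual = float(xp.abs(xp.dot(A, x) - b).max())
print(f"max residual: {residual:.2e}")
```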

### Random Number Generation

GPU-accelerated random number generation supporting multiple distributions and random number generators. Provides both the legacy `RandomState` interface and the modern `Generator` interface, all under `cupy.random`.

```python { .api }
def random(size=None): ...
def normal(loc=0.0, scale=1.0, size=None): ...
def uniform(low=0.0, high=1.0, size=None): ...
def choice(a, size=None, replace=True, p=None): ...
class RandomState: ...
class Generator: ...
```

[Random Number Generation](./random.md)
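
A short sketch of the legacy `RandomState` interface, showing that two generators seeded identically produce the same samples. The helper name `sample` is hypothetical, and the `xp` alias with its NumPy fallback is an assumption for running without a GPU.

```python
import numpy as np

try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # raises if no usable GPU is visible
    xp = cp
except Exception:
    xp = np  # illustrative CPU fallback, not part of CuPy


def sample(seed, n=1000):
    """Draw n standard-normal samples from a freshly seeded RandomState."""
    rs = xp.random.RandomState(seed)
    return rs.normal(loc=0.0, scale=1.0, size=n)


first = sample(42)
second = sample(42)
reproducible = bool(xp.array_equal(first, second))

# Uniform samples stay inside the requested interval
u = xp.random.RandomState(0).uniform(low=-1.0, high=1.0, size=100)
print(reproducible, float(u.min()), float(u.max()))
```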

### Fast Fourier Transform

GPU-accelerated Fast Fourier Transform operations supporting 1D, 2D, and N-dimensional transforms for both complex and real data. The functions live in `cupy.fft` and are compatible with NumPy's FFT interface.

```python { .api }
def fft(a, n=None, axis=-1, norm=None): ...
def ifft(a, n=None, axis=-1, norm=None): ...
def rfft(a, n=None, axis=-1, norm=None): ...
def fft2(a, s=None, axes=(-2, -1), norm=None): ...
def fftn(a, s=None, axes=None, norm=None): ...
```

[Fast Fourier Transform](./fft.md)
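
As an example of the 1D transforms above, the following sketch locates the dominant frequency bin of a sine wave with `rfft` and verifies that `ifft` inverts `fft`. The signal parameters are arbitrary, and the `xp` alias with NumPy fallback is an illustrative assumption.

```python
import numpy as np

try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # raises if no usable GPU is visible
    xp = cp
except Exception:
    xp = np  # illustrative CPU fallback, not part of CuPy

n = 64
cycles = 4
t = xp.arange(n)
signal = xp.sin(2 * xp.pi * cycles * t / n)  # exactly 4 cycles over 64 samples

spectrum = xp.fft.rfft(signal)               # real-input transform, n // 2 + 1 bins
peak_bin = int(xp.argmax(xp.abs(spectrum)))  # bin with the largest magnitude

recovered = xp.fft.ifft(xp.fft.fft(signal)).real
roundtrip_ok = bool(xp.allclose(recovered, signal))
print(peak_bin, roundtrip_ok)
```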

### Statistics and Sorting

Statistical functions, sorting algorithms, and searching functions. Includes descriptive statistics, histograms, correlations, and efficient sorting operations.

```python { .api }
def mean(a, axis=None, dtype=None, out=None, keepdims=False): ...
def std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False): ...
def sort(a, axis=-1, kind=None, order=None): ...
def argsort(a, axis=-1, kind=None, order=None): ...
def histogram(a, bins=10, range=None, normed=None, weights=None, density=None): ...
```

[Statistics and Sorting](./statistics.md)
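
The sketch below exercises the statistics and sorting functions on a small hand-written sample. It assumes the illustrative `xp` alias with a NumPy fallback, which is not part of CuPy.

```python
import numpy as np

try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # raises if no usable GPU is visible
    xp = cp
except Exception:
    xp = np  # illustrative CPU fallback, not part of CuPy

data = xp.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])

mean = float(xp.mean(data))          # arithmetic mean
spread = float(xp.std(data))         # population standard deviation (ddof=0)
sorted_data = xp.sort(data)          # returns a sorted copy
order = xp.argsort(data)             # indices that would sort the array
counts, edges = xp.histogram(data, bins=3)

print(mean, sorted_data.tolist(), counts.tolist())
```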

### CUDA Integration

Direct CUDA/ROCm integration providing low-level GPU control including memory management, stream operations, kernel compilation, and device management.

```python { .api }
class Device: ...
class Stream: ...
class MemoryPool: ...
def get_device_id(): ...
def synchronize(): ...
def malloc(size): ...
```

[CUDA Integration](./cuda-integration.md)
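
A minimal sketch of stream and memory-pool usage through `cupy.cuda`, which is also the namespace used on ROCm builds. The helper names `pool_usage` and `sum_on_stream` and the `HAVE_GPU` flag are hypothetical; both helpers return `None` when no GPU is available so the snippet stays runnable anywhere.

```python
try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # raises if no usable GPU is visible
    HAVE_GPU = True
except Exception:
    cp = None
    HAVE_GPU = False  # hypothetical flag used only by this sketch


def pool_usage():
    """Return (used_bytes, total_bytes) of the default memory pool, or None without a GPU."""
    if not HAVE_GPU:
        return None
    pool = cp.get_default_memory_pool()
    return pool.used_bytes(), pool.total_bytes()


def sum_on_stream(n=1024):
    """Sum n ones on a non-default stream, or return None without a GPU."""
    if not HAVE_GPU:
        return None
    stream = cp.cuda.Stream(non_blocking=True)
    with stream:                       # work issued here is queued on `stream`
        total = cp.ones(n, dtype=cp.float32).sum()
    stream.synchronize()               # wait for the queued work to finish
    return float(total)


print(pool_usage(), sum_on_stream())
```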
174
175
### Custom Kernels
176
177
Create custom GPU kernels for specialized operations. Supports element-wise kernels, reduction kernels, and raw CUDA kernels with just-in-time compilation.
178
179
```python { .api }
180
class ElementwiseKernel: ...
181
class ReductionKernel: ...
182
class RawKernel: ...
183
def fuse(*args, **kwargs): ...
184
```
185
186
[Custom Kernels](./custom-kernels.md)
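
As a sketch of the element-wise kernel interface, the example below defines a squared-difference kernel from a short C-like expression. The pure-Python fallback for machines without a GPU is a hypothetical stand-in so the snippet remains runnable; it is not how CuPy executes kernels.

```python
import numpy as np

try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # raises if no usable GPU is visible
    HAVE_GPU = True
except Exception:
    cp = None
    HAVE_GPU = False  # hypothetical flag used only by this sketch

if HAVE_GPU:
    # Input params, output params, per-element body, kernel name
    squared_diff = cp.ElementwiseKernel(
        'float32 x, float32 y',
        'float32 z',
        'z = (x - y) * (x - y)',
        'squared_diff',
    )
    xp = cp
else:
    def squared_diff(x, y):
        """CPU stand-in computing the same expression with NumPy broadcasting."""
        return (x - y) * (x - y)
    xp = np

x = xp.arange(5, dtype=xp.float32)    # 0, 1, 2, 3, 4
y = xp.full(5, 2, dtype=xp.float32)   # 2, 2, 2, 2, 2
z = squared_diff(x, y)
print(z.tolist())
```

The kernel is compiled on first call and cached, so repeated calls with the same argument types avoid recompilation.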

### Extended Functionality (cupyx)

Extended functionality beyond NumPy compatibility including SciPy-compatible functions, JIT compilation, optimization utilities, and specialized GPU algorithms.

```python { .api }
def scatter_add(a, indices, updates, axis=None): ...
def rsqrt(x): ...
class GeneralizedUFunc: ...
def empty_pinned(shape, dtype=float, order='C'): ...
```

[Extended Functionality](./extended-functionality.md)

### Input/Output Functions

File input/output operations for saving and loading arrays in various formats. Supports NumPy-compatible binary formats (.npy, .npz) and text formats with automatic GPU-CPU data transfers.

```python { .api }
def save(file, arr, allow_pickle=True, fix_imports=True): ...
def load(file, mmap_mode=None, allow_pickle=False, fix_imports=True, encoding='ASCII'): ...
def savez(file, *args, **kwds): ...
def loadtxt(fname, dtype=float, comments='#', delimiter=None, converters=None): ...
def savetxt(fname, X, fmt='%.18e', delimiter=' ', newline='\n', header='', footer=''): ...
```

[Input/Output Functions](./io-functions.md)
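
A hedged sketch of a `.npy` round trip using `save` and `load`. The file name and temporary directory are arbitrary choices for illustration, and the `xp` alias with NumPy fallback is an assumption so the snippet runs without a GPU.

```python
import os
import tempfile

import numpy as np

try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # raises if no usable GPU is visible
    xp = cp
except Exception:
    xp = np  # illustrative CPU fallback, not part of CuPy

arr = xp.arange(6, dtype=xp.float64).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "arr.npy")  # arbitrary file name
    xp.save(path, arr)                   # written in NumPy's .npy format
    loaded = xp.load(path)               # read back as an array of the active module

roundtrip_ok = bool(xp.array_equal(arr, loaded))
print(roundtrip_ok, loaded.shape)
```

Because the on-disk format is NumPy's, a file written this way can also be opened with `numpy.load` on a machine without a GPU.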

### Polynomial Functions

Polynomial operations including fitting, evaluation, arithmetic, and root finding. Provides both functional interface and object-oriented poly1d class for polynomial manipulation.

```python { .api }
def poly(seq_of_zeros): ...
def polyval(p, x): ...
def polyfit(x, y, deg, rcond=None, full=False, w=None, cov=False): ...
def roots(p): ...
def polyadd(a1, a2): ...
def polymul(a1, a2): ...
class poly1d: ...
```

[Polynomial Functions](./polynomial.md)

### Data Type Functions

Data type utilities for type checking, conversion, and promotion. Essential functions for managing data types and ensuring compatibility between GPU array operations.

```python { .api }
def can_cast(from_, to, casting='safe'): ...
def result_type(*arrays_and_dtypes): ...
def common_type(*arrays): ...
def promote_types(type1, type2): ...
def finfo(dtype): ...
def iinfo(int_type): ...
```

[Data Type Functions](./data-types.md)

### Utility Functions

General utility functions for array inspection, memory management, and CuPy-specific operations. Includes functions for memory transfer, debugging, and functional programming patterns.

```python { .api }
def get_array_module(*args): ...
def asnumpy(a, stream=None, blocking=True): ...
def get_default_memory_pool(): ...
def vectorize(pyfunc, otypes=None, doc=None, excluded=None, cache=False): ...
def show_config(): ...
def who(vardict=None): ...
```

[Utility Functions](./utilities.md)
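
`get_array_module` is commonly used to write functions that accept either NumPy or CuPy arrays. The sketch below shows that pattern with a softmax; the wrapper `array_module` and the function `softmax` are hypothetical names, and the wrapper falls back to NumPy when CuPy cannot be imported.

```python
import numpy as np

try:
    import cupy as cp
except Exception:
    cp = None  # CuPy not installed; the sketch then works on NumPy arrays only


def array_module(*arrays):
    """Return cupy if any argument is a CuPy array, otherwise numpy."""
    if cp is not None:
        return cp.get_array_module(*arrays)
    return np


def softmax(x):
    """Numerically stable softmax that runs wherever its input lives."""
    xp = array_module(x)
    e = xp.exp(x - xp.max(x))  # subtract the max to avoid overflow
    return e / xp.sum(e)


probs = softmax(np.array([1.0, 2.0, 3.0]))  # NumPy in, NumPy out
print(type(probs).__name__, float(probs.sum()))
```

Passing a `cupy.ndarray` to the same function would keep the whole computation on the GPU, and `cupy.asnumpy` can bring the result back to the host when needed.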

### Logic Functions

Logical operations, comparisons, and truth value testing. Includes element-wise logical operations, array comparisons, content testing for special values, and set operations.

```python { .api }
def logical_and(x1, x2): ...
def logical_or(x1, x2): ...
def equal(x1, x2): ...
def less(x1, x2): ...
def all(a, axis=None, out=None, keepdims=False): ...
def isfinite(x): ...
def in1d(ar1, ar2, assume_unique=False, invert=False): ...
```

[Logic Functions](./logic-functions.md)
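
For instance, the logic functions can be combined to build masks that filter out special values. This is a small sketch under the same assumption as elsewhere: an illustrative `xp` alias that falls back to NumPy when no GPU is available.

```python
import numpy as np

try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # raises if no usable GPU is visible
    xp = cp
except Exception:
    xp = np  # illustrative CPU fallback, not part of CuPy

x = xp.array([1.0, xp.nan, xp.inf, -2.0])

finite = xp.isfinite(x)                          # False for NaN and infinities
positive_finite = xp.logical_and(finite, x > 0)  # finite AND strictly positive
all_finite = bool(xp.all(finite))                # truth test over the whole array

print(finite.tolist(), positive_finite.tolist(), all_finite)
```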

## Types

```python { .api }
class ndarray:
    """GPU array class compatible with numpy.ndarray"""
    def __init__(self, shape, dtype=float, memptr=None, strides=None, order='C'): ...
    def get(self, stream=None, order='C', out=None): ...
    def set(self, arr, stream=None): ...
    def copy(self, order='K'): ...
    def astype(self, dtype, order='K', casting='unsafe', subok=True, copy=True): ...

    # Properties
    shape: tuple
    dtype: numpy.dtype
    size: int
    ndim: int
    data: cupy.cuda.MemoryPointer

class ufunc:
    """Universal function class for element-wise operations"""
    def __call__(self, *args, **kwargs): ...
    def reduce(self, a, axis=0, dtype=None, out=None, keepdims=False): ...
    def accumulate(self, array, axis=0, dtype=None, out=None): ...
```