npx @tessl/cli install tessl/pypi-cupy@13.6.00
# CuPy

CuPy is a NumPy/SciPy-compatible array library for GPU-accelerated computing with Python. It acts as a drop-in replacement that lets existing NumPy/SciPy code run on NVIDIA CUDA or AMD ROCm platforms, and it can deliver significant speedups for mathematical computations, linear algebra, and scientific computing workloads.

## Package Information

- **Package Name**: cupy
- **Language**: Python
- **Installation**: `pip install cupy-cuda12x` or `pip install cupy-cuda11x` (prebuilt wheels matching the installed CUDA major version); `pip install cupy` builds from source

## Core Imports

```python
import cupy as cp
```

For CUDA-specific functionality:

```python
import cupy.cuda as cuda
```

For extended functionality:

```python
import cupyx
```

## Basic Usage

```python
import cupy as cp
import numpy as np

# Create arrays on GPU
x_gpu = cp.array([1, 2, 3, 4, 5])
y_gpu = cp.zeros((3, 3))

# Perform operations on GPU (same API as NumPy)
result_gpu = cp.sum(x_gpu)
z_gpu = cp.dot(x_gpu, x_gpu)

# Transfer data between CPU and GPU
x_cpu = cp.asnumpy(x_gpu)  # GPU to CPU
x_gpu_from_cpu = cp.asarray(x_cpu)  # CPU to GPU

# Linear algebra operations
A = cp.random.random((1000, 1000))
B = cp.random.random((1000, 1000))
C = cp.dot(A, B)  # Performed on GPU

# Element-wise operations (broadcasting rules match NumPy)
result = cp.sqrt(A) + cp.sin(B)
```

## Architecture

CuPy's architecture mirrors NumPy while enabling GPU acceleration:

- **ndarray**: GPU-accelerated equivalent of NumPy arrays, supporting the same interface and operations
- **CUDA Memory Management**: Automatic memory pooling and allocation on GPU devices
- **Universal Functions (ufuncs)**: Element-wise operations optimized for parallel GPU execution
- **Kernel System**: Custom CUDA kernels for specialized operations not covered by standard functions
- **Stream Management**: CUDA streams for asynchronous execution and memory operations
- **Multi-GPU Support**: Distribution of computations across multiple GPU devices

This design enables a smooth migration from NumPy to GPU computing: code keeps a high degree of API compatibility with NumPy, and CUDA-specific enhancements are available where extra performance is needed.
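
One practical consequence of this design is that a function can be written once and run on either NumPy or CuPy arrays. The sketch below illustrates the idea with `cp.get_array_module`. The `softmax` helper and the NumPy fallback for machines without CuPy or a GPU are assumptions added for illustration, not part of CuPy.

```python
import numpy as np

# Assumption for this sketch: fall back to NumPy when CuPy or a GPU is missing.
try:
    import cupy as cp
    gpu_ok = cp.is_available()
except ImportError:
    cp = None
    gpu_ok = False


def softmax(x):
    """Numerically stable softmax on whichever library owns `x` (hypothetical helper)."""
    xp = cp.get_array_module(x) if cp is not None else np
    shifted = x - x.max(axis=-1, keepdims=True)
    e = xp.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)


x = np.array([[1.0, 2.0, 3.0]])
if gpu_ok:
    x = cp.asarray(x)  # move the input to the GPU; softmax itself is unchanged

probs = softmax(x)
total = float(probs.sum())  # float() copies the scalar result back to the host
```

Because the array module is looked up from the input, the same function body serves both CPU and GPU callers.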

## Capabilities

### Core Array Class

The fundamental ndarray class providing GPU-accelerated multi-dimensional arrays.

```python { .api }
class ndarray:
    """
    GPU-accelerated multi-dimensional array object.

    Attributes:
    - shape: tuple, dimensions of the array
    - dtype: data type of array elements
    - size: int, total number of elements
    - ndim: int, number of dimensions
    - itemsize: int, size of each element in bytes
    - nbytes: int, total bytes consumed by elements
    - device: cupy.cuda.Device, GPU device where array resides
    """

    def __init__(self, shape, dtype=float, order='C'): ...
    def astype(self, dtype, order='K', casting='unsafe', subok=True, copy=True): ...
    def copy(self, order='C'): ...
    def flatten(self, order='C'): ...
    def ravel(self, order='C'): ...
    def reshape(self, *shape, order='C'): ...
    def squeeze(self, axis=None): ...
    def transpose(self, *axes): ...
    def swapaxes(self, axis1, axis2): ...
    def get(self, stream=None, order='C', out=None): ...
    def set(self, arr, stream=None): ...
    def sum(self, axis=None, dtype=None, out=None, keepdims=False): ...
    def mean(self, axis=None, dtype=None, out=None, keepdims=False): ...
    def std(self, axis=None, dtype=None, out=None, ddof=0, keepdims=False): ...
    def var(self, axis=None, dtype=None, out=None, ddof=0, keepdims=False): ...
    def max(self, axis=None, out=None, keepdims=False, initial=None, where=None): ...
    def min(self, axis=None, out=None, keepdims=False, initial=None, where=None): ...
    def dot(self, b, out=None): ...
    def sort(self, axis=-1, kind=None, order=None): ...
    def argsort(self, axis=-1, kind=None, order=None): ...
```

### Array Creation and Manipulation

Core functionality for creating, reshaping, and manipulating GPU arrays with the same interface as NumPy.

```python { .api }
def array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0): ...
def zeros(shape, dtype=None, order='C'): ...
def ones(shape, dtype=None, order='C'): ...
def empty(shape, dtype=float, order='C'): ...
def arange(start, stop=None, step=1, dtype=None): ...
def linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0): ...
def reshape(a, newshape, order='C'): ...
def concatenate(arrays, axis=0, out=None, dtype=None, casting='same_kind'): ...
```

[Array Creation and Manipulation](./array-creation.md)

### Array Manipulation and Reshaping

Shape manipulation, joining, splitting, and rearranging array operations.

```python { .api }
def reshape(a, newshape, order='C'): ...
def ravel(a, order='C'): ...
def transpose(a, axes=None): ...
def moveaxis(a, source, destination): ...
def swapaxes(a, axis1, axis2): ...
def squeeze(a, axis=None): ...
def expand_dims(a, axis): ...
def atleast_1d(*arys): ...
def atleast_2d(*arys): ...
def atleast_3d(*arys): ...
def stack(arrays, axis=0, out=None): ...
def vstack(tup): ...
def hstack(tup): ...
def dstack(tup): ...
def split(ary, indices_or_sections, axis=0): ...
def hsplit(ary, indices_or_sections): ...
def vsplit(ary, indices_or_sections): ...
def repeat(a, repeats, axis=None): ...
def tile(A, reps): ...
def flip(m, axis=None): ...
def roll(a, shift, axis=None): ...
```
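
A few of these functions in combination may make the shape semantics easier to see. The following is a minimal sketch; the `xp` alias and the NumPy fallback are assumptions so that it also runs on machines without CuPy or a GPU.

```python
import numpy as np

# Assumption for this sketch: fall back to NumPy when CuPy or a GPU is missing.
try:
    import cupy as cp
    xp = cp if cp.is_available() else np
except ImportError:
    xp = np

a = xp.arange(6).reshape(2, 3)            # [[0, 1, 2], [3, 4, 5]]

stacked = xp.stack([a, a + 10], axis=0)   # new leading axis: shape (2, 2, 3)
left, right = xp.split(a, [1], axis=1)    # split columns at index 1
rolled = xp.roll(a, 1, axis=1)            # rotate each row right by one
flipped = xp.flip(a, axis=0)              # reverse the row order
tiled = xp.tile(xp.array([1, 2]), 3)      # repeat the whole array three times
```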

### Mathematical Operations

Element-wise mathematical functions including trigonometric, logarithmic, arithmetic, and comparison operations.

```python { .api }
def add(x1, x2, /, out=None): ...
def multiply(x1, x2, /, out=None): ...
def sin(x, /, out=None): ...
def cos(x, /, out=None): ...
def exp(x, /, out=None): ...
def log(x, /, out=None): ...
def sqrt(x, /, out=None): ...
def maximum(x1, x2, /, out=None): ...
def sum(a, axis=None, dtype=None, out=None, keepdims=False, initial=None, where=None): ...
```

[Mathematical Operations](./math-functions.md)
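
The element-wise functions broadcast their operands and accept a preallocated `out` array, as the signatures above suggest. A short sketch of both behaviors follows; the `xp` alias and the NumPy fallback are assumptions added so the example runs without a GPU.

```python
import numpy as np

# Assumption for this sketch: fall back to NumPy when CuPy or a GPU is missing.
try:
    import cupy as cp
    xp = cp if cp.is_available() else np
except ImportError:
    xp = np

col = xp.arange(3, dtype=xp.float64).reshape(3, 1)  # shape (3, 1)
row = xp.arange(4, dtype=xp.float64)                # shape (4,)

grid = xp.add(col, row)           # broadcasts to shape (3, 4)

out = xp.empty_like(grid)
xp.multiply(grid, 2.0, out=out)   # result written into the existing buffer

capped = xp.maximum(out, 5.0)     # element-wise lower bound of 5.0
smallest = float(capped.min())
```

Reusing an `out` buffer avoids allocating a fresh result array on every call, which can matter inside tight loops.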

### Linear Algebra

GPU-accelerated linear algebra operations including matrix multiplication, decompositions, eigenvalue computation, and solving linear systems.

```python { .api }
def dot(a, b, out=None): ...
def matmul(x1, x2, /, out=None, *, casting='same_kind', order='K', dtype=None, subok=True): ...
def einsum(subscripts, *operands, out=None, dtype=None, order='K', casting='safe', optimize=False): ...
```

From `cupy.linalg`:

```python { .api }
def norm(x, ord=None, axis=None, keepdims=False): ...
def svd(a, full_matrices=True, compute_uv=True, hermitian=False): ...
def inv(a): ...
def solve(a, b): ...
def eigh(a, UPLO='L'): ...
```

[Linear Algebra](./linear-algebra.md)
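
As an illustration of `solve`, `inv`, and `norm` together, the sketch below solves a small 2x2 system and checks the residual. The matrix values are arbitrary, and the NumPy fallback is an assumption so the example runs without a GPU.

```python
import numpy as np

# Assumption for this sketch: fall back to NumPy when CuPy or a GPU is missing.
try:
    import cupy as cp
    xp = cp if cp.is_available() else np
except ImportError:
    xp = np

A = xp.array([[4.0, 1.0],
              [2.0, 3.0]])
b = xp.array([1.0, 2.0])

x = xp.linalg.solve(A, b)                              # solves A @ x = b
residual = float(xp.linalg.norm(xp.dot(A, x) - b))     # should be close to 0

A_inv = xp.linalg.inv(A)
identity_error = float(xp.linalg.norm(xp.matmul(A, A_inv) - xp.eye(2)))

solution = x.tolist()                                  # approximately [0.1, 0.6]
```

Calling `solve` directly is generally preferable to multiplying by an explicit inverse; `inv` appears here only to show the API.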

### Random Number Generation

GPU-accelerated random number generation with multiple generators and probability distributions.

From `cupy.random`:

```python { .api }
def rand(*args): ...
def randn(*args): ...
def randint(low, high=None, size=None, dtype=int): ...
def random_sample(size=None): ...
def normal(loc=0.0, scale=1.0, size=None): ...
def uniform(low=0.0, high=1.0, size=None): ...
def choice(a, size=None, replace=True, p=None): ...
```

Generator API:

```python { .api }
def default_rng(seed=None): ...

class Generator:
    def random(self, size=None, dtype=float64, out=None): ...
    def integers(self, low, high=None, size=None, dtype=int64, endpoint=False): ...
```

[Random Number Generation](./random.md)
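
A brief sketch of the Generator API follows. The seed value is arbitrary, and the NumPy fallback is an assumption so the example runs without a GPU; the two libraries are not expected to produce identical streams for the same seed, so only range properties are relied on here.

```python
import numpy as np

# Assumption for this sketch: fall back to NumPy when CuPy or a GPU is missing.
try:
    import cupy as cp
    xp = cp if cp.is_available() else np
except ImportError:
    xp = np

rng = xp.random.default_rng(42)

samples = rng.random((1000,))          # uniform floats in [0, 1)
dice = rng.integers(1, 7, size=10)     # integers in [1, 7), i.e. 1 through 6

lowest, highest = float(samples.min()), float(samples.max())
dice_lo, dice_hi = int(dice.min()), int(dice.max())
```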

### Fast Fourier Transform

GPU-accelerated discrete Fourier transforms for signal processing and frequency domain analysis.

From `cupy.fft`:

```python { .api }
def fft(a, n=None, axis=-1, norm=None): ...
def ifft(a, n=None, axis=-1, norm=None): ...
def fft2(a, s=None, axes=(-2, -1), norm=None): ...
def fftn(a, s=None, axes=None, norm=None): ...
def rfft(a, n=None, axis=-1, norm=None): ...
def fftshift(x, axes=None): ...
def fftfreq(n, d=1.0): ...
```

[Fast Fourier Transform](./fft.md)
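
To show how `fft` and `fftfreq` fit together, the sketch below recovers the frequency of a pure sine wave. The sample rate and tone frequency are arbitrary choices for illustration, and the NumPy fallback is an assumption so the example runs without a GPU.

```python
import numpy as np

# Assumption for this sketch: fall back to NumPy when CuPy or a GPU is missing.
try:
    import cupy as cp
    xp = cp if cp.is_available() else np
except ImportError:
    xp = np

n = 64          # number of samples
fs = 64.0       # sample rate in Hz, chosen so each FFT bin is 1 Hz wide
tone_hz = 5.0   # frequency of the test signal

t = xp.arange(n) / fs
signal = xp.sin(2 * xp.pi * tone_hz * t)

spectrum = xp.fft.fft(signal)
freqs = xp.fft.fftfreq(n, d=1.0 / fs)

# Search only the positive-frequency half of the spectrum for the peak.
peak_bin = int(xp.argmax(xp.abs(spectrum[: n // 2])))
peak_hz = float(freqs[peak_bin])
```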

### CUDA Memory and Device Management

Low-level CUDA functionality for memory allocation, device management, and stream operations.

```python { .api }
def get_default_memory_pool(): ...
def get_default_pinned_memory_pool(): ...
def is_available(): ...
def asnumpy(a, stream=None, order='C', out=None, *, blocking=True): ...
def get_array_module(*args): ...
```

From `cupy.cuda`:

```python { .api }
class Device:
    def __init__(self, device=None): ...
    def __enter__(self): ...
    def __exit__(self, *args): ...

class Stream:
    def __init__(self, null=False, non_blocking=False, priority=0): ...
    def synchronize(self): ...

class MemoryPool:
    def __init__(self, allocator=None): ...
    def malloc(self, size): ...
    def free_all_blocks(self): ...
```

[CUDA Memory and Device Management](./cuda-management.md)
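
The classes above are typically combined as context managers. The following sketch assumes a machine with at least one GPU (device index 0 is an assumption) and simply skips the GPU work otherwise; `used_bytes()` is a `MemoryPool` method not listed in the summary above.

```python
# Assumption for this sketch: the GPU section is skipped when CuPy or a GPU is missing.
try:
    import cupy as cp
    gpu_ok = cp.is_available()
except ImportError:
    gpu_ok = False

checksum = used_before = used_after = None

if gpu_ok:
    pool = cp.get_default_memory_pool()

    with cp.cuda.Device(0):                      # select the first GPU
        stream = cp.cuda.Stream(non_blocking=True)
        with stream:                             # work below is queued on this stream
            a = cp.ones((1024, 1024), dtype=cp.float32)
            total = a.sum()
        stream.synchronize()                     # wait for the queued work to finish

        checksum = float(total)                  # copy the scalar result to the host
        used_before = pool.used_bytes()

        del a, total                             # return the blocks to the pool
        pool.free_all_blocks()                   # release cached blocks to the driver
        used_after = pool.used_bytes()
```

Deleting an array returns its memory to the pool rather than to the driver, which is why `free_all_blocks()` is called separately when the memory should be released.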

### Custom Kernels and Performance

Tools for writing custom CUDA kernels and optimizing GPU performance.

```python { .api }
class ElementwiseKernel:
    def __init__(self, in_params, out_params, operation, name='kernel', **kwargs): ...
    def __call__(self, *args, **kwargs): ...

class ReductionKernel:
    def __init__(self, in_params, out_params, map_expr, reduce_expr, post_map_expr, identity, name='kernel', **kwargs): ...
    def __call__(self, *args, **kwargs): ...

class RawKernel:
    def __init__(self, code, name, **kwargs): ...
    def __call__(self, grid, block, args, *, shared_mem=0, stream=None): ...
```

[Custom Kernels and Performance](./kernels.md)
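
As a small illustration of `ElementwiseKernel`, the sketch below defines a squared-difference kernel. The kernel name and the NumPy reference branch (used when no GPU is present) are assumptions added for this example.

```python
import numpy as np

# Assumption for this sketch: use a NumPy reference when CuPy or a GPU is missing.
try:
    import cupy as cp
    gpu_ok = cp.is_available()
except ImportError:
    gpu_ok = False

if gpu_ok:
    # The operation string is CUDA C applied independently to each element.
    squared_diff = cp.ElementwiseKernel(
        'float32 x, float32 y',      # input parameters
        'float32 z',                 # output parameter
        'z = (x - y) * (x - y)',     # per-element operation
        'squared_diff',              # kernel name (hypothetical)
    )
    x = cp.arange(5, dtype=cp.float32)
    y = cp.full(5, 2, dtype=cp.float32)
    result = squared_diff(x, y).tolist()
else:
    x = np.arange(5, dtype=np.float32)
    y = np.full(5, 2, dtype=np.float32)
    result = ((x - y) ** 2).tolist()
```

The kernel is compiled on first call, so the initial invocation is slower than later ones.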

### Statistics and Data Analysis

Statistical functions for data analysis including descriptive statistics, correlations, and histograms.

```python { .api }
def mean(a, axis=None, dtype=None, out=None, keepdims=False): ...
def std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False): ...
def var(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False): ...
def median(a, axis=None, out=None, overwrite_input=False, keepdims=False): ...
def corrcoef(x, y=None, rowvar=True, bias=None, ddof=None): ...
def histogram(a, bins=10, range=None, weights=None, density=None): ...
def percentile(a, q, axis=None, out=None, overwrite_input=False, method='linear', keepdims=False): ...
```

[Statistics and Data Analysis](./statistics.md)
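
The effect of `ddof` and of the percentile interpolation is easiest to see on a tiny data set. This is a minimal sketch with made-up values; the NumPy fallback is an assumption so the example runs without a GPU.

```python
import numpy as np

# Assumption for this sketch: fall back to NumPy when CuPy or a GPU is missing.
try:
    import cupy as cp
    xp = cp if cp.is_available() else np
except ImportError:
    xp = np

data = xp.array([1.0, 2.0, 3.0, 4.0, 5.0])

mean = float(xp.mean(data))                  # 3.0
pop_std = float(xp.std(data))                # population std, sqrt(2)
sample_std = float(xp.std(data, ddof=1))     # sample std, sqrt(2.5)
median = float(xp.median(data))              # 3.0
p90 = float(xp.percentile(data, 90))         # linear interpolation gives 4.6
```

Each `float()` call copies a scalar from the GPU to the host, so in performance-sensitive code it may be better to keep results on the device until they are needed.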

### Indexing and Selection

Advanced indexing operations including multi-dimensional indexing, selection, and array generation utilities.

```python { .api }
def take(a, indices, axis=None, out=None, mode='raise'): ...
def take_along_axis(arr, indices, axis): ...
def choose(a, choices, out=None, mode='raise'): ...
def compress(condition, a, axis=None, out=None): ...
def extract(condition, arr): ...
def select(condlist, choicelist, default=0): ...
def indices(dimensions, dtype=int, sparse=False): ...
def ix_(*args): ...
def ravel_multi_index(multi_index, dims, mode='raise', order='C'): ...
def unravel_index(indices, shape, order='C'): ...
def diagonal(a, offset=0, axis1=0, axis2=1): ...
def diag_indices(n, ndim=2): ...
def triu_indices(n, k=0, m=None): ...
def tril_indices(n, k=0, m=None): ...
```
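
A short sketch of `take`, `diagonal`, and `triu_indices` on small arrays may help; the values are made up, and the NumPy fallback is an assumption so the example runs without a GPU.

```python
import numpy as np

# Assumption for this sketch: fall back to NumPy when CuPy or a GPU is missing.
try:
    import cupy as cp
    xp = cp if cp.is_available() else np
except ImportError:
    xp = np

a = xp.array([10, 20, 30, 40, 50])
picked = xp.take(a, xp.array([0, 2, 4]))   # elements at positions 0, 2, 4

m = xp.arange(9).reshape(3, 3)             # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
diag = xp.diagonal(m)                      # main diagonal

rows, cols = xp.triu_indices(3, k=1)       # strictly upper-triangular positions
upper = m[rows, cols]

large = a[a > 25]                          # boolean mask selection
```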

### Sparse Matrix Operations

GPU-accelerated sparse matrix operations for large-scale scientific computing.

From `cupyx.scipy.sparse`:

```python { .api }
class csr_matrix:
    def __init__(self, arg1, shape=None, dtype=None, copy=False): ...
    def dot(self, other): ...
    def transpose(self, axes=None, copy=False): ...

class csc_matrix:
    def __init__(self, arg1, shape=None, dtype=None, copy=False): ...

class coo_matrix:
    def __init__(self, arg1, shape=None, dtype=None, copy=False): ...
```

[Sparse Matrix Operations](./sparse.md)

### SciPy Compatibility Extensions

Extended scientific computing functions from cupyx.scipy for advanced mathematical operations.

From `cupyx.scipy`:

```python { .api }
# Signal processing (cupyx.scipy.signal)
def convolve(in1, in2, mode='full', method='auto'): ...
def correlate(in1, in2, mode='full', method='auto'): ...

# Image processing (cupyx.scipy.ndimage)
def gaussian_filter(input, sigma, order=0, output=None, mode='reflect', cval=0.0, truncate=4.0): ...
def sobel(input, axis=-1, output=None, mode='reflect', cval=0.0): ...

# Optimization
def minimize(fun, x0, args=(), method=None, jac=None, bounds=None, constraints=()): ...
```

[SciPy Extensions](./scipy-extensions.md)

### Input/Output Operations

File operations for loading and saving arrays in various formats.

```python { .api }
def load(file, mmap_mode=None, allow_pickle=True, fix_imports=True, encoding='ASCII'): ...
def save(file, arr, allow_pickle=True, fix_imports=True): ...
def savez(file, *args, **kwds): ...
def savez_compressed(file, *args, **kwds): ...
def savetxt(fname, X, fmt='%.18e', delimiter=' ', newline='\n', header='', footer='', comments='# ', encoding=None): ...
def loadtxt(fname, dtype=float, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0, encoding='bytes', max_rows=None): ...
```

### Logic and Comparison Functions

Element-wise logical operations, truth value testing, and array comparison functions.

```python { .api }
def allclose(a, b, rtol=1e-05, atol=1e-08, equal_nan=False): ...
def array_equal(a1, a2, equal_nan=False): ...
def array_equiv(a1, a2): ...
def isclose(a, b, rtol=1e-05, atol=1e-08, equal_nan=False): ...
def isfinite(x, /, out=None): ...
def isinf(x, /, out=None): ...
def isnan(x, /, out=None): ...
def isreal(x): ...
def iscomplex(x): ...
def in1d(ar1, ar2, assume_unique=False, invert=False): ...
def isin(element, test_elements, assume_unique=False, invert=False): ...
def intersect1d(ar1, ar2, assume_unique=False, return_indices=False): ...
def setdiff1d(ar1, ar2, assume_unique=False): ...
def union1d(ar1, ar2): ...
```
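
The tolerance and `equal_nan` parameters are the parts most often misread, so a small sketch follows. The values are made up, and the NumPy fallback is an assumption so the example runs without a GPU.

```python
import numpy as np

# Assumption for this sketch: fall back to NumPy when CuPy or a GPU is missing.
try:
    import cupy as cp
    xp = cp if cp.is_available() else np
except ImportError:
    xp = np

a = xp.array([1.0, 2.0, xp.nan])
b = a + 1e-9                      # tiny perturbation, well inside the default atol

close = xp.isclose(a, b).tolist()                      # NaN never equals NaN by default
close_nan = xp.isclose(a, b, equal_nan=True).tolist()  # treat NaN pairs as equal
nan_mask = xp.isnan(a).tolist()

# allclose returns an array-like scalar, so convert it explicitly to bool.
finite_match = bool(xp.allclose(a[:2], b[:2]))

members = xp.isin(xp.array([1, 2, 3, 4]), xp.array([2, 4])).tolist()
```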

### Binary Operations

Bitwise operations and binary representations.

```python { .api }
def bitwise_and(x1, x2, /, out=None): ...
def bitwise_or(x1, x2, /, out=None): ...
def bitwise_xor(x1, x2, /, out=None): ...
def bitwise_not(x, /, out=None): ...
def left_shift(x1, x2, /, out=None): ...
def right_shift(x1, x2, /, out=None): ...
def packbits(a, axis=None, bitorder='big'): ...
def unpackbits(a, axis=None, count=None, bitorder='big'): ...
```
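
A brief sketch of the bitwise functions and a `packbits`/`unpackbits` round trip follows. The bit patterns are arbitrary, and the NumPy fallback is an assumption so the example runs without a GPU.

```python
import numpy as np

# Assumption for this sketch: fall back to NumPy when CuPy or a GPU is missing.
try:
    import cupy as cp
    xp = cp if cp.is_available() else np
except ImportError:
    xp = np

flags = xp.array([0b1100])
mask = xp.array([0b1010])

both = xp.bitwise_and(flags, mask).tolist()      # bits set in both: 0b1000
either = xp.bitwise_or(flags, mask).tolist()     # bits set in either: 0b1110
differ = xp.bitwise_xor(flags, mask).tolist()    # bits that differ: 0b0110
shifted = xp.left_shift(xp.array([1]), 3).tolist()

# Pack eight 0/1 values into one byte (most significant bit first), then unpack.
bits = xp.array([1, 0, 1, 1, 0, 0, 0, 1], dtype=xp.uint8)
packed = xp.packbits(bits)
unpacked = xp.unpackbits(packed)
```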

## Error Handling

CuPy uses the same exception hierarchy as NumPy with additional CUDA-specific exceptions:

```python { .api }
class AxisError(Exception): ...
class ComplexWarning(Warning): ...
class TooHardError(Exception): ...
class VisibleDeprecationWarning(Warning): ...
```

CUDA-related failures surface as Python exceptions with descriptive messages, which helps when debugging GPU memory exhaustion, device compatibility, and kernel execution problems. Examples include `cupy.cuda.memory.OutOfMemoryError` and `cupy.cuda.runtime.CUDARuntimeError`.