CuPy: NumPy & SciPy for GPU - CUDA 11.x optimized distribution providing GPU-accelerated computing with Python
npx @tessl/cli install tessl/pypi-cupy-cuda11x@13.6.00
# CuPy

CuPy is a NumPy/SciPy-compatible array library that accelerates NumPy-based code on NVIDIA CUDA or AMD ROCm GPUs. This package, `cupy-cuda11x`, is the prebuilt distribution for CUDA 11.x. CuPy provides GPU-accelerated building blocks for scientific computing, machine learning, and data analysis. Its `cupy.ndarray` is a near drop-in replacement for `numpy.ndarray`, backed by mathematical operations, linear algebra, signal processing, and statistical functions.

## Package Information

- **Package Name**: cupy-cuda11x
- **Language**: Python
- **Installation**: `pip install cupy-cuda11x`
- **CUDA Compatibility**: CUDA 11.2 through 11.8
- **Platform Support**: Linux (x86_64, aarch64), Windows (x86_64)

## Core Imports

```python
import cupy as cp
```

For specific modules:

```python
import cupy
from cupy import fft, linalg, random
import cupyx
from cupyx import scipy
```

## Basic Usage

```python
import cupy as cp
import numpy as np

# Create arrays on GPU
gpu_array = cp.array([1, 2, 3, 4])
gpu_zeros = cp.zeros((1000, 1000))

# NumPy-compatible operations
result = cp.sum(gpu_array)  # 0-d cupy.ndarray; int(result) copies it to the host
matrix_mult = cp.dot(gpu_zeros, gpu_zeros.T)

# Transfer between CPU and GPU
cpu_array = cp.asnumpy(gpu_array)  # GPU to CPU, returns numpy.ndarray
gpu_from_cpu = cp.asarray(cpu_array)  # CPU to GPU

# Mathematical operations
x = cp.linspace(0, 2 * cp.pi, 1000)
y = cp.sin(x)
```

## Architecture

CuPy's architecture mirrors NumPy while providing GPU acceleration:

- **Core Array**: `cupy.ndarray` - GPU-resident N-dimensional arrays with a NumPy-compatible interface
- **Mathematical Functions**: Element-wise and reduction operations executed as CUDA kernels
- **Memory Management**: Automatic memory pooling with configurable allocators; freed device memory is cached and reused instead of being returned to the driver on every deallocation
- **Stream Processing**: Asynchronous execution through CUDA streams
- **Kernel Integration**: Custom CUDA kernel support via `RawKernel`, `ElementwiseKernel`, and `ReductionKernel`
- **SciPy Extensions**: `cupyx.scipy` provides GPU-accelerated SciPy-compatible functions
## Capabilities

### Array Operations

Core array creation, manipulation, and mathematical operations that form the foundation of GPU-accelerated, NumPy-compatible computing.

```python { .api }
def array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0): ...
def zeros(shape, dtype=float, order='C'): ...
def ones(shape, dtype=float, order='C'): ...
def empty(shape, dtype=float, order='C'): ...
def arange(start, stop=None, step=1, dtype=None): ...
def linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0): ...
```
[Array Operations](./array-operations.md)

### Mathematical Functions

Element-wise (ufunc) and reduction functions covering trigonometric, hyperbolic, exponential, logarithmic, and statistical operations. Each function is executed as a CUDA kernel on the GPU.

```python { .api }
def sin(x, out=None, **kwargs): ...
def cos(x, out=None, **kwargs): ...
def exp(x, out=None, **kwargs): ...
def log(x, out=None, **kwargs): ...
def sqrt(x, out=None, **kwargs): ...
def sum(a, axis=None, dtype=None, out=None, keepdims=False): ...
def mean(a, axis=None, dtype=None, out=None, keepdims=False): ...
```
[Mathematical Functions](./mathematical-functions.md)

### Linear Algebra

GPU-accelerated linear algebra built on cuBLAS and cuSOLVER: matrix products, decompositions, eigenvalue computation, and linear-system solvers. Products are available at the top level. Decompositions and solvers such as `solve`, `inv`, `svd`, `qr`, `cholesky`, and `eigh` live in `cupy.linalg`.

```python { .api }
def dot(a, b, out=None): ...
def matmul(x1, x2, out=None): ...
def einsum(subscripts, *operands, out=None, dtype=None, order='K', casting='safe', optimize=False): ...
```
[Linear Algebra](./linear-algebra.md)

### Fast Fourier Transform

GPU-accelerated FFTs in `cupy.fft` (backed by cuFFT), supporting 1-D, 2-D, and N-D transforms in both the forward and inverse directions.

```python { .api }
# cupy.fft
def fft(a, n=None, axis=-1, norm=None): ...
def ifft(a, n=None, axis=-1, norm=None): ...
def fft2(a, s=None, axes=(-2, -1), norm=None): ...
def fftn(a, s=None, axes=None, norm=None): ...
```
[Fast Fourier Transform](./fft.md)

### Random Number Generation

Random sampling from uniform, normal, and other distributions via `cupy.random`, with samples generated in parallel on the GPU. The module offers both the NumPy-style functions listed below and a `Generator` API created with `cupy.random.default_rng`.

```python { .api }
# cupy.random
def random(size=None, dtype=float, out=None): ...
def normal(loc=0.0, scale=1.0, size=None, dtype=float): ...
def uniform(low=0.0, high=1.0, size=None, dtype=float): ...
def choice(a, size=None, replace=True, p=None): ...
```
[Random Number Generation](./random.md)

### CUDA Integration

Direct CUDA device management, memory operations, kernel execution, and stream processing for advanced GPU programming, exposed through `cupy.cuda`.

```python { .api }
class Device:
    def __init__(self, device=None): ...
    def use(self): ...

def get_device_id(): ...
def synchronize(): ...
```
[CUDA Integration](./cuda-integration.md)

### SciPy Extensions

GPU-accelerated SciPy-compatible functions in `cupyx.scipy`, including sparse matrices, signal processing, image processing, special functions, and statistical operations.

```python { .api }
# Available through cupyx.scipy (e.g. cupyx.scipy.sparse, cupyx.scipy.ndimage)
import cupyx.scipy as scipy
```
[SciPy Extensions](./scipy-extensions.md)

### Custom Kernel Development

Kernel-authoring APIs for work that the built-in functions do not cover, or that a hand-written kernel performs better: `ElementwiseKernel` for custom element-wise operations, `ReductionKernel` for custom reductions, and `RawKernel` for raw CUDA C++ code.

```python { .api }
class ElementwiseKernel:
    def __init__(self, in_params, out_params, operation, name="kernel", **kwargs): ...
    def __call__(self, *args, **kwargs): ...

class ReductionKernel:
    def __init__(self, in_params, out_params, map_expr, reduce_expr, post_map_expr, identity, name="reduce_kernel", **kwargs): ...
    def __call__(self, *args, **kwargs): ...

class RawKernel:
    def __init__(self, code, name, **kwargs): ...
    def __call__(self, grid, block, args, *, shared_mem=0): ...
```
[Custom Kernel Development](./custom-kernels.md)

### JIT Compilation

Just-in-time compilation of Python functions into GPU kernels via `cupyx.jit`, so kernels can be written in Python syntax instead of CUDA C++.

```python { .api }
def rawkernel(device=False): ...
def kernel(grid=None, block=None, shared_mem=0): ...
def elementwise(signature): ...
def reduction(signature, identity=None): ...
```
[JIT Compilation](./jit-compilation.md)

### Performance Profiling

Tools for measuring CPU and GPU execution times (`benchmark`), annotating code regions with NVTX ranges that external profilers such as NVIDIA Nsight Systems display (`time_range`), and limiting CUDA profiler collection to a code region (`profile`).

```python { .api }
def benchmark(func, args=(), kwargs=None, **params): ...
def time_range(message=None, color_id=None, argb_color=None, sync=False): ...
def profile(): ...
def nvtx_push(message, color=None): ...
```
[Performance Profiling](./performance-profiling.md)

### Input/Output Operations

File I/O for arrays in NumPy's binary `.npy` and `.npz` formats (including compressed archives) and in text formats. Data is staged through host memory, so every save or load includes a GPU-CPU transfer.

```python { .api }
def save(file, arr): ...
def load(file, **kwargs): ...
def loadtxt(fname, **kwargs): ...
def savetxt(fname, X, **kwargs): ...
```
[Input/Output Operations](./io-operations.md)

### Polynomial Operations

Polynomial arithmetic, evaluation, least-squares fitting, and root finding through the NumPy-style `poly1d` class and functions such as `polyfit`, `polyval`, and `roots`.

```python { .api }
class poly1d:
    def __init__(self, c_or_r, r=False, variable=None): ...
    def __call__(self, val): ...

def polyfit(x, y, deg, **kwargs): ...
def polyval(p, x): ...
def roots(p): ...
```
[Polynomial Operations](./polynomial-operations.md)

## Types

```python { .api }
class ndarray:
    """
    GPU-resident N-dimensional array object compatible with NumPy arrays.

    Attributes:
        shape: tuple of ints - dimensions of the array
        dtype: numpy.dtype - data type of array elements
        size: int - total number of elements
        ndim: int - number of dimensions
        device: cupy.cuda.Device - GPU device containing the array
    """
    def __init__(self, shape, dtype=float, memptr=None, strides=None, order='C'): ...
    def get(self, stream=None, order='C', out=None): ...
    def set(self, arr, stream=None): ...
    def copy(self, order='C'): ...
    def astype(self, dtype, order='K', casting='unsafe', subok=True, copy=True): ...

class ufunc:
    """Universal function for element-wise operations on arrays."""
    def __call__(self, *args, **kwargs): ...
    def reduce(self, a, axis=0, dtype=None, out=None, keepdims=False): ...
    def accumulate(self, a, axis=0, dtype=None, out=None): ...

def asnumpy(a, stream=None, order='C', out=None, *, blocking=True) -> numpy.ndarray:
    """Convert CuPy array to NumPy array on CPU."""

def get_array_module(*args):
    """Return cupy if any argument is a CuPy array, otherwise numpy."""
```