# CuPy

CuPy is a NumPy- and SciPy-compatible array library for GPU-accelerated computing with Python. It provides a drop-in replacement for NumPy arrays and mathematical functions, enabling existing NumPy/SciPy code to run on NVIDIA CUDA GPUs with little or no modification while achieving significant performance improvements through GPU parallelization.

## Package Information

- **Package Name**: cupy-cuda113
- **Language**: Python
- **Installation**: `pip install cupy-cuda113`
- **License**: MIT
- **Documentation**: https://docs.cupy.dev/en/stable/

## Core Imports

```python
import cupy as cp
```

Common patterns for array creation and operations:

```python
import cupy as cp
import numpy as np

# Use CuPy as a drop-in replacement for NumPy
arr = cp.array([1, 2, 3])
result = cp.sum(arr)
```

For CPU/GPU generic code:

```python
import cupy as cp

# Automatically select NumPy or CuPy based on input arrays
def generic_function(x):
    xp = cp.get_array_module(x)  # Returns cp or np
    return xp.sum(x)
```

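Called with either array type, the same function runs on the CPU via NumPy or on the GPU via CuPy. A minimal sketch of that dispatch (the `generic_sum` name is illustrative; the CuPy call assumes a CUDA-capable GPU):

```python
import numpy as np
import cupy as cp

def generic_sum(x):
    xp = cp.get_array_module(x)   # np for NumPy input, cp for CuPy input
    return xp.sum(x)

print(generic_sum(np.arange(5)))  # computed on the CPU with NumPy
print(generic_sum(cp.arange(5)))  # computed on the GPU with CuPy
```
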
## Basic Usage

```python
import cupy as cp
import numpy as np

# Create CuPy arrays (stored in GPU memory)
gpu_array = cp.array([1, 2, 3, 4, 5])
gpu_zeros = cp.zeros((3, 4))
gpu_random = cp.random.random((1000, 1000))

# Perform GPU-accelerated operations
result = cp.sum(gpu_array)
matrix_mult = cp.dot(gpu_random, gpu_random.T)

# Transfer between GPU and CPU
cpu_array = cp.asnumpy(gpu_array)     # GPU -> CPU
gpu_from_cpu = cp.asarray(cpu_array)  # CPU -> GPU

# Use with existing NumPy code - just change np to cp
x = cp.linspace(0, 2 * cp.pi, 1000)
y = cp.sin(x)
fft_result = cp.fft.fft(y)

# Memory management
pool = cp.get_default_memory_pool()
print(f"Used: {pool.used_bytes()}, Total: {pool.total_bytes()}")
```

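GPU kernels launch asynchronously, so synchronize before timing or before depending on results on the host. A minimal timing sketch, assuming a CUDA-capable GPU (the array size is arbitrary):

```python
import time
import cupy as cp

x = cp.random.random((2048, 2048))

start = time.perf_counter()
y = x @ x                           # kernel launch returns immediately
cp.cuda.Stream.null.synchronize()   # wait for the GPU to finish before timing
print(f"matmul: {time.perf_counter() - start:.4f} s")
```
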
## Architecture

CuPy's architecture enables seamless GPU acceleration:

- **Core Arrays**: `cupy.ndarray` provides NumPy-compatible arrays in GPU memory
- **CUDA Integration**: Direct access to CUDA streams, memory management, and device control
- **Automatic Memory Management**: Built-in memory pools for efficient GPU memory allocation
- **Kernel System**: Custom CUDA kernels through ElementwiseKernel, ReductionKernel, and RawKernel
- **Library Integration**: GPU-accelerated routines backed by cuBLAS, cuFFT, cuRAND, cuSOLVER, and cuSPARSE
- **Extensions**: The `cupyx` namespace provides additional functionality, including SciPy-compatible routines and JIT compilation

This design allows CuPy to serve as a complete GPU computing platform while maintaining NumPy API compatibility.

## Capabilities

### Array Creation and Manipulation

Core array creation functions and array manipulation operations that mirror NumPy's interface but operate on GPU memory, including shape manipulation, joining, splitting, and element rearrangement.

```python { .api }
def array(obj, dtype=None, copy=True, order='K', subok=False, ndmin=0): ...
def zeros(shape, dtype=float64, order='C'): ...
def ones(shape, dtype=None, order='C'): ...
def empty(shape, dtype=float64, order='C'): ...
def arange(start, stop=None, step=1, dtype=None): ...
def linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0): ...
def reshape(a, newshape, order='C'): ...
def concatenate(arrays, axis=0, out=None): ...
def transpose(a, axes=None): ...
```

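A minimal sketch of GPU-side array creation and manipulation, assuming a CUDA-capable GPU (shapes and dtypes are arbitrary):

```python
import cupy as cp

a = cp.arange(12, dtype=cp.float32)          # allocated in GPU memory
m = cp.reshape(a, (3, 4))                    # shape manipulation
stacked = cp.concatenate([m, cp.ones((2, 4), dtype=cp.float32)], axis=0)
print(stacked.shape, cp.transpose(stacked).shape)   # (5, 4) (4, 5)
```
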
[Array Operations](./array-operations.md)

### Mathematical Functions

Comprehensive mathematical operations including trigonometric, hyperbolic, exponential, logarithmic, arithmetic, and complex number functions, all GPU-accelerated and compatible with NumPy's mathematical function interface.

```python { .api }
def sin(x, out=None): ...
def cos(x, out=None): ...
def exp(x, out=None): ...
def log(x, out=None): ...
def add(x1, x2, out=None): ...
def multiply(x1, x2, out=None): ...
def sqrt(x, out=None): ...
def power(x1, x2, out=None): ...
```

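A minimal sketch of elementwise math on GPU arrays, assuming a CUDA-capable GPU (values are arbitrary):

```python
import cupy as cp

x = cp.linspace(0.0, 1.0, 5)
y = cp.exp(x) + cp.sin(x) * cp.sqrt(x)   # elementwise, evaluated on the GPU
z = cp.power(cp.add(x, 1.0), 2)
print(cp.asnumpy(y), cp.asnumpy(z))
```
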
[Mathematical Functions](./math-functions.md)

### Linear Algebra

GPU-accelerated linear algebra operations including matrix products, decompositions, eigenvalue computations, and system solving through cuBLAS and cuSOLVER integration.

```python { .api }
def dot(a, b, out=None): ...
def matmul(x1, x2, out=None): ...
def einsum(subscripts, *operands, out=None, dtype=None, order='K', casting='safe', optimize=False): ...
```

From `cupy.linalg`:

```python { .api }
def norm(x, ord=None, axis=None, keepdims=False): ...
def svd(a, full_matrices=True, compute_uv=True, hermitian=False): ...
def solve(a, b): ...
def inv(a): ...
```

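A minimal sketch combining `cupy.linalg` routines, assuming a CUDA-capable GPU (the random system is purely illustrative):

```python
import cupy as cp

a = cp.random.random((200, 200))
b = cp.random.random((200,))

x = cp.linalg.solve(a, b)                   # dense solve on the GPU
print(float(cp.linalg.norm(a @ x - b)))     # residual, close to 0

u, s, vt = cp.linalg.svd(a, full_matrices=False)
print(s[:3])                                # largest singular values
```
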
[Linear Algebra](./linear-algebra.md)

### CUDA Integration

Direct CUDA functionality including device management, stream control, memory management, and custom kernel execution for advanced GPU programming and performance optimization.

```python { .api }
class Device:
    def __init__(self, device=None): ...
    def __enter__(self): ...
    def __exit__(self, *args): ...

class Stream:
    def __init__(self, null=False, non_blocking=False, ptds=False): ...
    def synchronize(self): ...

class MemoryPool:
    def malloc(self, size): ...
    def free_all_blocks(self): ...
    def used_bytes(self): ...
```

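A minimal sketch of explicit device, stream, and memory-pool control, assuming at least one CUDA device (device index 0 is an assumption):

```python
import cupy as cp

with cp.cuda.Device(0):                       # run the block on GPU 0
    stream = cp.cuda.Stream(non_blocking=True)
    with stream:                              # work below is queued on this stream
        x = cp.arange(1_000_000) ** 2
    stream.synchronize()                      # wait for queued kernels to finish

pool = cp.get_default_memory_pool()
print(pool.used_bytes(), pool.total_bytes())
pool.free_all_blocks()                        # return cached blocks to the driver
```
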
[CUDA Integration](./cuda-integration.md)

### FFT Operations

Fast Fourier Transform operations through cuFFT integration, providing GPU-accelerated 1D, 2D, and N-dimensional transforms for both real and complex data.

```python { .api }
def fft(a, n=None, axis=-1, norm=None): ...
def ifft(a, n=None, axis=-1, norm=None): ...
def rfft(a, n=None, axis=-1, norm=None): ...
def fft2(a, s=None, axes=(-2, -1), norm=None): ...
def fftn(a, s=None, axes=None, norm=None): ...
```

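A minimal sketch of a real-input FFT, assuming a CUDA-capable GPU; it also uses `cp.fft.rfftfreq` and `cp.argmax`, which are part of CuPy's NumPy-compatible API but not listed above:

```python
import cupy as cp

t = cp.linspace(0, 1, 1024, endpoint=False)
signal = cp.sin(2 * cp.pi * 50 * t) + 0.5 * cp.sin(2 * cp.pi * 120 * t)

spectrum = cp.fft.rfft(signal)                       # real-input FFT via cuFFT
freqs = cp.fft.rfftfreq(t.size, d=float(t[1] - t[0]))
print(float(freqs[cp.argmax(cp.abs(spectrum))]))     # dominant frequency, ~50.0
```
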
[FFT Operations](./fft-operations.md)

### Random Number Generation

GPU-accelerated random number generation through cuRAND integration, supporting various probability distributions and random sampling operations with high performance on GPU.

```python { .api }
def random(size=None): ...
def randn(*size): ...
def randint(low, high=None, size=None, dtype='l'): ...
def normal(loc=0.0, scale=1.0, size=None): ...
def uniform(low=0.0, high=1.0, size=None): ...
```

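A minimal sketch of seeded sampling from several distributions, assuming a CUDA-capable GPU; `cp.random.seed` is part of `cupy.random` though not listed above:

```python
import cupy as cp

cp.random.seed(0)                                        # reproducible GPU RNG
normals = cp.random.normal(loc=0.0, scale=1.0, size=1_000_000)
uniforms = cp.random.uniform(low=-1.0, high=1.0, size=(2, 3))
ints = cp.random.randint(0, 10, size=5)
print(float(normals.mean()), float(normals.std()))
print(uniforms.shape, ints)
```
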
[Random Number Generation](./random-generation.md)

### Custom Kernels

User-defined CUDA kernel creation through ElementwiseKernel, ReductionKernel, and RawKernel classes, enabling custom GPU operations and performance-critical computations.

```python { .api }
class ElementwiseKernel:
    def __init__(self, in_params, out_params, operation, name='kernel', **kwargs): ...
    def __call__(self, *args, **kwargs): ...

class ReductionKernel:
    def __init__(self, in_params, out_params, map_expr, reduce_expr, post_map_expr='', **kwargs): ...
    def __call__(self, *args, **kwargs): ...

class RawKernel:
    def __init__(self, code, name, **kwargs): ...
    def __call__(self, grid, block, args, **kwargs): ...
```

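A minimal `ElementwiseKernel` sketch, assuming a CUDA-capable GPU and a working CUDA toolchain for kernel compilation (the kernel name `squared_diff` is illustrative):

```python
import cupy as cp

# Elementwise kernel: the C expression is applied to every element pair
squared_diff = cp.ElementwiseKernel(
    'float32 x, float32 y',        # input parameters
    'float32 z',                   # output parameter
    'z = (x - y) * (x - y)',       # per-element operation (CUDA C)
    'squared_diff')

a = cp.arange(5, dtype=cp.float32)
b = cp.full(5, 2.0, dtype=cp.float32)
print(squared_diff(a, b))          # [4. 1. 0. 1. 4.]
```
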
[Custom Kernels](./custom-kernels.md)

### Statistical Functions

Statistical operations and analyses including descriptive statistics, correlations, histograms, and probability computations, all optimized for GPU execution.

```python { .api }
def mean(a, axis=None, dtype=None, out=None, keepdims=False): ...
def std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False): ...
def var(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False): ...
def corrcoef(x, y=None, rowvar=True, bias=None, ddof=None): ...
def histogram(a, bins=10, range=None, normed=None, weights=None, density=None): ...
```

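A minimal sketch of descriptive statistics and a histogram on GPU data, assuming a CUDA-capable GPU (sample sizes are arbitrary):

```python
import cupy as cp

data = cp.random.normal(loc=5.0, scale=2.0, size=100_000)
print(float(cp.mean(data)), float(cp.std(data)), float(cp.var(data)))

counts, edges = cp.histogram(data, bins=20)   # counts and bin edges, both on the GPU
print(int(counts.sum()), edges.shape)         # 100000 (21,)
```
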
[Statistical Functions](./statistical-functions.md)

### CuPy Extensions

Additional functionality through the `cupyx` namespace, including SciPy-compatible routines, JIT compilation, specialized operations, and advanced GPU programming features.

```python { .api }
def scatter_add(a, indices, b, axis=None): ...
def rsqrt(x, out=None): ...
```

From `cupyx.jit`:

```python { .api }
def rawkernel(mode='python', device=False): ...
```

From `cupyx.scipy.sparse`:

```python { .api }
class csr_matrix: ...
class csc_matrix: ...
```

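A minimal sketch of a SciPy-compatible sparse matrix from `cupyx.scipy.sparse`, assuming a CUDA-capable GPU (the small dense matrix is purely illustrative):

```python
import cupy as cp
import cupyx.scipy.sparse as sparse

# GPU sparse matrix with a SciPy-compatible interface
dense = cp.array([[1.0, 0.0, 0.0],
                  [0.0, 0.0, 2.0],
                  [0.0, 3.0, 0.0]])
mat = sparse.csr_matrix(dense)
v = cp.array([1.0, 2.0, 3.0])
print(mat.dot(v))                  # [1. 6. 6.]
print(mat.nnz, mat.shape)
```
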
[CuPy Extensions](./cupy-extensions.md)

## Types

```python { .api }
class ndarray:
    """
    CuPy's core N-dimensional array class, stored in GPU memory.
    Compatible with NumPy arrays but operations run on GPU.
    """
    def __init__(self, shape, dtype=float64, memptr=None, strides=None, order='C'): ...
    def get(self, stream=None, order='C'): ...  # Transfer to CPU
    def set(self, arr, stream=None): ...        # Transfer from CPU
    @property
    def device(self): ...
    @property
    def dtype(self): ...
    @property
    def shape(self): ...
    @property
    def size(self): ...

class ufunc:
    """Universal function class for element-wise operations on GPU arrays."""
    def __call__(self, *args, **kwargs): ...
    def reduce(self, a, axis=0, dtype=None, out=None, keepdims=False): ...
    def accumulate(self, a, axis=0, dtype=None, out=None): ...
```

```