# CuPy

CuPy is a NumPy & SciPy-compatible GPU-accelerated computing library that enables high-performance array operations on NVIDIA CUDA GPUs. It provides a drop-in replacement for NumPy, allowing existing NumPy/SciPy code to run on GPUs with minimal modifications while delivering significant performance improvements for large-scale numerical computations.

## Package Information

- **Package Name**: cupy-cuda112
- **Language**: Python
- **Installation**: `pip install cupy-cuda112`
- **GPU Requirements**: NVIDIA CUDA 11.2 or compatible
- **Homepage**: https://cupy.dev/
- **Documentation**: https://docs.cupy.dev/

## Core Imports

```python
import cupy as cp
```

For CUDA-specific functionality:

```python
import cupy.cuda
```

For SciPy-compatible extensions:

```python
import cupyx.scipy
```
## Basic Usage

```python
import cupy as cp
import numpy as np

# Create arrays on GPU
gpu_array = cp.array([1, 2, 3, 4, 5])
gpu_zeros = cp.zeros((3, 4))
gpu_random = cp.random.random((1000, 1000))

# Array operations (executed on GPU)
result = cp.sqrt(gpu_array)
matrix_mult = cp.dot(gpu_random, gpu_random.T)

# Convert back to NumPy for CPU operations
cpu_result = cp.asnumpy(result)

# Memory pool management
mempool = cp.get_default_memory_pool()
print(f"Used bytes: {mempool.used_bytes()}")
print(f"Total bytes: {mempool.total_bytes()}")

# Check GPU availability
if cp.cuda.is_available():
    print(f"CUDA devices available: {cp.cuda.runtime.getDeviceCount()}")
```
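
Because CuPy mirrors the NumPy API, the same function body can often serve both CPU and GPU data. The sketch below is illustrative rather than part of CuPy: it uses `cp.get_array_module` to dispatch to `numpy` or `cupy` depending on where the input array lives, and the `softmax` helper is a hypothetical example.

```python
import cupy as cp
import numpy as np

def softmax(x):
    # Resolve to numpy or cupy depending on whether `x` is a CPU or GPU array.
    xp = cp.get_array_module(x)
    e = xp.exp(x - x.max(axis=-1, keepdims=True))  # subtract the max for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

# The same function body runs on NumPy (CPU) and CuPy (GPU) arrays.
cpu_out = softmax(np.random.random((4, 8)))
gpu_out = softmax(cp.random.random((4, 8)))
print(type(cpu_out), type(gpu_out))
```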
## Architecture

CuPy's architecture mirrors NumPy while adding GPU-specific capabilities:

- **Core Arrays**: `cupy.ndarray` provides GPU-accelerated N-dimensional arrays with a NumPy-compatible interface
- **Universal Functions**: GPU-accelerated element-wise operations through `cupy.ufunc`
- **Memory Management**: Automatic memory pooling with configurable allocators for optimal GPU memory usage
- **CUDA Integration**: Direct access to CUDA streams, events, memory management, and custom kernel compilation
- **Custom Kernels**: Support for user-defined CUDA kernels through `RawKernel`, `ElementwiseKernel`, and `ReductionKernel`
- **Multi-GPU**: Support for multi-GPU computation and memory management (see the device-selection sketch below)
- **CuPy Extensions (cupyx)**: Additional functionality including SciPy compatibility, profiling, JIT compilation, and advanced linear algebra

This design enables seamless migration from NumPy-based code to GPU-accelerated computation while providing advanced CUDA programming capabilities for performance-critical applications.
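
As a rough illustration of the multi-GPU support listed above, the sketch below allocates and computes under different device contexts. It assumes at least two CUDA devices are visible; on a single-GPU machine, only the `Device(0)` block applies.

```python
import cupy as cp

# Allocate an array on GPU 0 (the default device).
with cp.cuda.Device(0):
    a = cp.arange(10)

# Assumption: a second GPU is present; compute on GPU 1.
with cp.cuda.Device(1):
    b = cp.arange(10)
    c = b * 2  # executes on device 1

# Each array remembers the device it lives on.
print(a.device, c.device)
```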
## Capabilities

### Array Creation and Manipulation

Core functionality for creating, reshaping, and manipulating N-dimensional arrays on GPU, providing NumPy-compatible array creation routines with GPU memory allocation.

```python { .api }
def array(obj, dtype=None, copy=True, order='K', subok=False, ndmin=0): ...
def zeros(shape, dtype=float, order='C'): ...
def ones(shape, dtype=float, order='C'): ...
def empty(shape, dtype=float, order='C'): ...
def arange(start, stop=None, step=1, dtype=None): ...
def linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None): ...
def reshape(a, newshape, order='C'): ...
def transpose(a, axes=None): ...
def concatenate(arrays, axis=0, out=None): ...
```

[Array Operations](./array-operations.md)
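
A brief, illustrative combination of the creation and manipulation routines listed above (the values and shapes are arbitrary):

```python
import cupy as cp

x = cp.arange(12, dtype=cp.float32)                      # 0..11 on the GPU
m = cp.reshape(x, (3, 4))                                # view as a 3x4 matrix
t = cp.transpose(m)                                      # 4x3 transpose
stacked = cp.concatenate([m, cp.ones((1, 4))], axis=0)   # append a row of ones
print(m.shape, t.shape, stacked.shape)                   # (3, 4) (4, 3) (4, 4)
```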
### Mathematical Functions

Comprehensive collection of mathematical operations including trigonometric, hyperbolic, exponential, logarithmic, and arithmetic functions optimized for GPU execution.

```python { .api }
def sin(x, out=None, **kwargs): ...
def cos(x, out=None, **kwargs): ...
def exp(x, out=None, **kwargs): ...
def log(x, out=None, **kwargs): ...
def sqrt(x, out=None, **kwargs): ...
def add(x1, x2, out=None, **kwargs): ...
def multiply(x1, x2, out=None, **kwargs): ...
def sum(a, axis=None, dtype=None, out=None, keepdims=False): ...
def mean(a, axis=None, dtype=None, out=None, keepdims=False): ...
```

[Mathematical Operations](./math-operations.md)
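
These functions compose just as their NumPy counterparts do; the short example below is illustrative only (the synthetic signal has no special meaning):

```python
import cupy as cp

theta = cp.linspace(0, 2 * cp.pi, 1000)
signal = cp.sin(theta) + 0.5 * cp.cos(3 * theta)   # element-wise trig on the GPU
energy = cp.sum(signal ** 2)                       # reduction over the whole array
avg = cp.mean(cp.exp(-signal ** 2))                # element-wise exp followed by a mean
print(float(energy), float(avg))                   # pull scalars back to the host
```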
### Linear Algebra

GPU-accelerated linear algebra operations including matrix multiplication, decompositions, eigenvalue computation, and equation solving using cuBLAS and cuSOLVER.

```python { .api }
def dot(a, b, out=None): ...
def matmul(x1, x2, out=None): ...
def linalg.svd(a, full_matrices=True, compute_uv=True, hermitian=False): ...
def linalg.eigh(a, UPLO='L'): ...
def linalg.solve(a, b): ...
def linalg.inv(a): ...
def linalg.norm(x, ord=None, axis=None, keepdims=False): ...
def einsum(subscripts, *operands, **kwargs): ...
```

[Linear Algebra](./linear-algebra.md)
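
A small illustrative example combining the solver, norm, and decomposition routines listed above (the random system is arbitrary):

```python
import cupy as cp

a = cp.random.random((4, 4))
b = cp.random.random((4,))

x = cp.linalg.solve(a, b)                    # solve a @ x = b on the GPU
residual = cp.linalg.norm(cp.dot(a, x) - b)  # should be ~0

u, s, vt = cp.linalg.svd(a)                  # singular value decomposition
print(float(residual), s.shape)
```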
### Random Number Generation

GPU-accelerated random number generation supporting multiple bit generators and probability distributions for statistical computing and simulation.

```python { .api }
def random.random(size=None, dtype=float): ...
def random.rand(*args): ...
def random.randn(*args): ...
def random.randint(low, high=None, size=None, dtype=int): ...
def random.normal(loc=0.0, scale=1.0, size=None): ...
def random.uniform(low=0.0, high=1.0, size=None): ...
class random.Generator: ...
def random.default_rng(seed=None): ...
```

[Random Number Generation](./random-generation.md)
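
An illustrative sketch using both the legacy module-level functions and the newer `Generator` interface; the specific `Generator` methods shown (`random`, `standard_normal`) are assumed to be available in this CuPy build:

```python
import cupy as cp

# Legacy, RandomState-style interface
samples = cp.random.normal(loc=0.0, scale=1.0, size=(1000,))
noise = cp.random.uniform(low=-1.0, high=1.0, size=(32, 32))

# Newer Generator interface
rng = cp.random.default_rng(seed=42)
uniform = rng.random((3, 3))            # floats in [0, 1)
gauss = rng.standard_normal((2, 4))     # standard normal draws
print(float(samples.mean()), uniform.shape, gauss.shape)
```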
### CUDA Integration

Direct interface to CUDA runtime, memory management, stream processing, and custom kernel development for advanced GPU programming.

```python { .api }
class cuda.Device: ...
def cuda.get_device_id(): ...
class cuda.MemoryPool: ...
class cuda.Stream: ...
class cuda.Event: ...
def cuda.compile_with_cache(source, options=(), **kwargs): ...
class ElementwiseKernel: ...
class RawKernel: ...
```

[CUDA Interface](./cuda-interface.md)
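
As a sketch of the custom-kernel and stream facilities listed above, the example below defines a small `ElementwiseKernel` (the kernel body and the name `squared_add` are arbitrary illustrative choices) and runs work on a non-default stream:

```python
import cupy as cp

# User-defined element-wise kernel computing z = x * x + y
squared_add = cp.ElementwiseKernel(
    'float32 x, float32 y',   # input parameter declarations
    'float32 z',              # output parameter declaration
    'z = x * x + y',          # per-element CUDA C snippet
    'squared_add')            # kernel name

x = cp.arange(5, dtype=cp.float32)
y = cp.ones(5, dtype=cp.float32)
print(squared_add(x, y))      # [ 1.  2.  5. 10. 17.]

# Launch work on a non-default stream and wait for it to finish.
stream = cp.cuda.Stream()
with stream:
    z = cp.sqrt(cp.arange(1000, dtype=cp.float32))
stream.synchronize()
```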
### Fast Fourier Transform

GPU-accelerated FFT operations for signal processing and frequency domain analysis using cuFFT library.

```python { .api }
def fft.fft(a, n=None, axis=-1, norm=None): ...
def fft.ifft(a, n=None, axis=-1, norm=None): ...
def fft.fft2(a, s=None, axes=(-2, -1), norm=None): ...
def fft.fftn(a, s=None, axes=None, norm=None): ...
def fft.rfft(a, n=None, axis=-1, norm=None): ...
def fft.fftfreq(n, d=1.0): ...
```

[FFT Operations](./fft-operations.md)
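
An illustrative example that locates the dominant frequency of a synthetic tone using the routines listed above (the 8 kHz sampling rate and 1 kHz tone are arbitrary):

```python
import cupy as cp

fs = 8000.0                                 # sampling rate in Hz
t = cp.arange(256) / fs
x = cp.sin(2 * cp.pi * 1000.0 * t)          # 1 kHz sine wave

spectrum = cp.fft.fft(x)                    # computed on the GPU via cuFFT
freqs = cp.fft.fftfreq(x.size, d=1.0 / fs)
idx = int(cp.argmax(cp.abs(spectrum)))      # index of the strongest bin
print(abs(float(freqs[idx])))               # ~1000.0
```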
### SciPy Compatibility

Extended functionality providing SciPy-compatible operations for sparse matrices, signal processing, image processing, and specialized mathematical functions.

```python { .api }
import cupyx.scipy.sparse
import cupyx.scipy.ndimage
import cupyx.scipy.signal
import cupyx.scipy.special
import cupyx.scipy.linalg
class cupyx.scipy.sparse.csr_matrix(arg1, shape=None, dtype=None, copy=False): ...
def cupyx.scipy.ndimage.gaussian_filter(input, sigma, **kwargs): ...
```

[SciPy Extensions](./scipy-extensions.md)
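
A short sketch of the SciPy-compatible layers, assuming the sparse constructor accepts a dense CuPy array and that `gaussian_filter` mirrors its SciPy counterpart:

```python
import cupy as cp
import cupyx.scipy.sparse as sparse
import cupyx.scipy.ndimage as ndimage

# Sparse matrix-vector product on the GPU
dense = cp.random.random((100, 100))
dense[dense < 0.95] = 0                  # zero out ~95% of the entries
csr = sparse.csr_matrix(dense)           # compressed sparse row format
v = cp.random.random(100)
result = csr.dot(v)

# Gaussian smoothing of a 2-D array (e.g. an image)
image = cp.random.random((256, 256))
smoothed = ndimage.gaussian_filter(image, sigma=2.0)
print(result.shape, smoothed.shape)
```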
### Input/Output Operations

File I/O operations for saving and loading arrays in various formats including NumPy's .npy and .npz formats.

```python { .api }
def save(file, arr, allow_pickle=True, fix_imports=True): ...
def load(file, mmap_mode=None, allow_pickle=False, fix_imports=True, encoding='ASCII'): ...
def savez(file, *args, **kwds): ...
def savez_compressed(file, *args, **kwds): ...
def savetxt(fname, X, fmt='%.18e', delimiter=' ', newline='\n', header='', footer='', comments='# ', encoding=None): ...
```

[Input/Output](./input-output.md)
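
An illustrative round trip through the NumPy-format I/O routines; the file names are arbitrary, and `cp.load` is assumed to return CuPy arrays (or an `NpzFile`-like wrapper for `.npz` archives):

```python
import cupy as cp

a = cp.arange(10)
b = cp.random.random((3, 3))

cp.save('a.npy', a)                           # single array -> .npy
cp.savez_compressed('arrays.npz', a=a, b=b)   # multiple arrays -> compressed .npz

a2 = cp.load('a.npy')                         # loaded back onto the GPU
archive = cp.load('arrays.npz')
print(bool(cp.allclose(a, a2)), archive['b'].shape)
```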
## Types

```python { .api }
class ndarray:
    """N-dimensional array object in GPU memory"""
    def __init__(self, shape, dtype=float, memptr=None, strides=None, order='C'): ...
    def get(self, stream=None, order='C', out=None): ...  # Transfer to CPU
    def set(self, arr, stream=None): ...  # Transfer from CPU
    @property
    def device(self): ...
    @property
    def data(self): ...
    @property
    def shape(self): ...
    @property
    def dtype(self): ...

class ufunc:
    """Universal function for element-wise operations"""
    def __call__(self, *args, **kwargs): ...
    def reduce(self, a, axis=0, dtype=None, out=None, keepdims=False): ...
    def accumulate(self, a, axis=0, dtype=None, out=None): ...

# Memory management types
class cuda.MemoryPointer: ...
class cuda.Memory: ...
class cuda.MemoryPool: ...
class cuda.PinnedMemory: ...

# Stream and event types
class cuda.Stream: ...
class cuda.Event: ...
class cuda.Device: ...

# Custom kernel types
class ElementwiseKernel: ...
class ReductionKernel: ...
class RawKernel: ...
class RawModule: ...
```
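
A minimal sketch of the `ndarray` transfer methods and properties listed above:

```python
import cupy as cp
import numpy as np

gpu = cp.zeros((2, 3), dtype=cp.float32)
print(gpu.device, gpu.shape, gpu.dtype)   # device the array lives on, plus metadata

# Explicit host <-> device transfers via ndarray methods
host = gpu.get()                                        # GPU -> CPU, returns numpy.ndarray
gpu.set(np.arange(6, dtype=np.float32).reshape(2, 3))   # CPU -> GPU copy, in place
print(type(host), cp.asnumpy(gpu))
```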