npx @tessl/cli install tessl/pypi-cupy-cuda12x@12.3.00
# CuPy

CuPy is a NumPy/SciPy-compatible array library for GPU-accelerated computing with Python. It acts as a drop-in replacement for running existing NumPy/SciPy code on NVIDIA CUDA and AMD ROCm platforms, so scientific code can exploit GPU parallelism with few or no changes to existing codebases.

## Package Information

- **Package Name**: cupy-cuda12x
- **Language**: Python
- **Installation**: `pip install cupy-cuda12x`
- **CUDA Requirement**: CUDA Toolkit 12.x
- **License**: MIT

## Core Imports

```python
import cupy as cp
```

For specific submodules:

```python
import cupy.cuda as cuda
import cupy.random as random
import cupy.linalg as linalg
import cupy.fft as fft
```

## Basic Usage

```python
import cupy as cp
import numpy as np

# Create arrays on GPU
gpu_array = cp.array([1, 2, 3, 4, 5])
gpu_zeros = cp.zeros((1000, 1000))
gpu_random = cp.random.random((100, 100))

# NumPy compatibility - same API
result = cp.sum(gpu_array)
matrix_mult = cp.dot(gpu_random, gpu_random.T)

# Transfer between GPU and CPU
cpu_array = cp.asnumpy(gpu_array)  # GPU to CPU
gpu_from_numpy = cp.asarray(np.array([1, 2, 3]))  # CPU to GPU

# Memory management
memory_pool = cp.get_default_memory_pool()
print(f"Used bytes: {memory_pool.used_bytes()}")

# Context management
with cp.cuda.Device(0):  # Use specific GPU
    data = cp.random.random((1000, 1000))
    result = cp.linalg.svd(data)
```

## Architecture

CuPy's architecture mirrors NumPy while leveraging GPU acceleration:

- **cupy.ndarray**: GPU-accelerated multi-dimensional arrays with NumPy-compatible interface
- **cupy.cuda**: Low-level CUDA interface for device management, memory allocation, and kernel execution
- **cupy.random**: GPU-accelerated random number generation compatible with numpy.random
- **cupy.linalg**: Linear algebra operations using cuBLAS and cuSOLVER
- **cupy.fft**: Fast Fourier Transform operations using cuFFT
- **Custom Kernels**: ElementwiseKernel, ReductionKernel, and RawKernel for custom GPU operations

The design provides seamless NumPy compatibility while offering direct access to CUDA features for performance optimization.

## Capabilities

### Array Creation and Manipulation

Comprehensive array creation functions matching NumPy's API, including basic creation (zeros, ones, empty), data conversion (array, asarray), ranges (arange, linspace), and matrix creation (eye, diag). All functions create arrays directly on GPU memory.

```python { .api }
def array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0): ...
def zeros(shape, dtype=float, order='C'): ...
def ones(shape, dtype=None, order='C'): ...
def empty(shape, dtype=float, order='C'): ...
def arange(start, stop=None, step=1, dtype=None): ...
def linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0): ...
```

[Array Operations](./array-operations.md)

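A minimal sketch of the creation functions listed above. The `xp` alias and the NumPy fallback are assumptions added here so the snippet also runs on a machine without a GPU; with CuPy and a CUDA device present, every array lives in GPU memory.

```python
import numpy as np

try:
    import cupy as cp
    # Assumption: fall back to NumPy when CuPy is installed but no GPU is usable.
    xp = cp if cp.cuda.is_available() else np
except Exception:
    xp = np

# Each call allocates on the selected backend (GPU memory under CuPy).
grid = xp.zeros((2, 3), dtype=xp.float32)   # basic creation
steps = xp.arange(0, 10, 2)                 # half-open integer range
ramp = xp.linspace(0.0, 1.0, num=5)         # evenly spaced, endpoint included
identity = xp.eye(3)                        # matrix creation

print(grid.shape, steps.tolist(), ramp.tolist())
```

Because the calls have the same names and arguments in both libraries, switching backends only changes which module `xp` refers to.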
### Mathematical Functions

Element-wise mathematical operations including trigonometric, hyperbolic, exponential, logarithmic, arithmetic, and special functions. All functions are GPU-accelerated and maintain NumPy compatibility.

```python { .api }
def sin(x, out=None, **kwargs): ...
def cos(x, out=None, **kwargs): ...
def exp(x, out=None, **kwargs): ...
def log(x, out=None, **kwargs): ...
def add(x1, x2, out=None, **kwargs): ...
def multiply(x1, x2, out=None, **kwargs): ...
```

[Mathematical Functions](./math-functions.md)

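As an illustration of the element-wise functions and their `out=` parameter, the sketch below checks the identity sin²x + cos²x = 1 while reusing one preallocated buffer. The `xp` alias and the NumPy fallback are assumptions for portability, not part of CuPy.

```python
import numpy as np

try:
    import cupy as cp
    # Assumption: fall back to NumPy when no GPU is usable.
    xp = cp if cp.cuda.is_available() else np
except Exception:
    xp = np

x = xp.linspace(0.0, np.pi, num=5)
y = xp.sin(x)

buf = xp.empty_like(x)
xp.multiply(y, y, out=buf)             # buf = sin(x)^2, no new allocation
xp.add(buf, xp.cos(x) ** 2, out=buf)   # buf = sin(x)^2 + cos(x)^2

print(bool(xp.allclose(buf, 1.0)))
```

Writing into `out=` avoids allocating a fresh result array for each step, which can matter when GPU memory is tight.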
### Linear Algebra

GPU-accelerated linear algebra operations using cuBLAS and cuSOLVER, including matrix multiplication, decompositions, eigenvalue problems, and system solving.

```python { .api }
def dot(a, b, out=None): ...
def matmul(x1, x2, out=None, **kwargs): ...
def solve(a, b): ...
def svd(a, full_matrices=True, compute_uv=True, hermitian=False): ...
def eigh(a, UPLO='L'): ...
```

[Linear Algebra](./linear-algebra.md)

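The following sketch solves a small symmetric system and checks the residual, then computes its eigenvalues with `eigh`. The matrix values are illustrative, and the `xp` NumPy fallback is an assumption so the snippet runs without a GPU.

```python
import numpy as np

try:
    import cupy as cp
    # Assumption: fall back to NumPy when no GPU is usable.
    xp = cp if cp.cuda.is_available() else np
except Exception:
    xp = np

a = xp.asarray([[3.0, 1.0],
                [1.0, 2.0]])
b = xp.asarray([9.0, 8.0])

x = xp.linalg.solve(a, b)                # solves a @ x = b
residual = xp.linalg.norm(a @ x - b)     # should be close to zero
eigvals, eigvecs = xp.linalg.eigh(a)     # a is symmetric, so eigh applies

print(x.tolist(), float(residual))
```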
### Random Number Generation

GPU-accelerated random number generation compatible with numpy.random, supporting multiple distributions and the modern Generator API with several bit generators (XORWOW, MRG32k3a, Philox).

```python { .api }
def random(size=None): ...
def normal(loc=0.0, scale=1.0, size=None): ...
def uniform(low=0.0, high=1.0, size=None): ...
def choice(a, size=None, replace=True, p=None): ...
def default_rng(seed=None): ...
```

[Random Numbers](./random-numbers.md)

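A short sketch of the Generator API via `default_rng`. The seed value is arbitrary and the `xp` NumPy fallback is an assumption; the same-seed comparison at the end is expected to hold on a single device, but treat it as an illustration rather than a guarantee across CuPy versions.

```python
import numpy as np

try:
    import cupy as cp
    # Assumption: fall back to NumPy when no GPU is usable.
    xp = cp if cp.cuda.is_available() else np
except Exception:
    xp = np

rng = xp.random.default_rng(42)
samples = rng.standard_normal((1000,))   # normally distributed values
u = rng.random((4,))                     # uniform values in [0, 1)

# Two generators created from the same seed should yield the same stream.
first = xp.random.default_rng(7).random((3,))
second = xp.random.default_rng(7).random((3,))

print(samples.shape, bool(xp.all(first == second)))
```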
### Fast Fourier Transform

GPU-accelerated FFT operations using cuFFT, including 1D, 2D, and N-D transforms for both complex-to-complex and real-to-complex transformations.

```python { .api }
def fft(a, n=None, axis=-1, norm=None): ...
def ifft(a, n=None, axis=-1, norm=None): ...
def rfft(a, n=None, axis=-1, norm=None): ...
def fft2(a, s=None, axes=(-2, -1), norm=None): ...
def fftn(a, s=None, axes=None, norm=None): ...
```

[FFT Operations](./fft-operations.md)

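To make the real-to-complex and complex-to-complex transforms concrete, this sketch locates the dominant frequency of a sine wave with `rfft` and verifies an `fft`/`ifft` round trip. The signal parameters are illustrative, and the `xp` NumPy fallback is an assumption for machines without a GPU.

```python
import numpy as np

try:
    import cupy as cp
    # Assumption: fall back to NumPy when no GPU is usable.
    xp = cp if cp.cuda.is_available() else np
except Exception:
    xp = np

n = 64
t = xp.arange(n) / n
signal = xp.sin(2 * np.pi * 4 * t)       # exactly 4 cycles in the window

spectrum = xp.fft.rfft(signal)           # real-to-complex transform
peak_bin = int(xp.argmax(xp.abs(spectrum)))

# Complex-to-complex round trip should recover the input.
roundtrip = xp.fft.ifft(xp.fft.fft(signal)).real

print(peak_bin, bool(xp.allclose(roundtrip, signal)))
```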
### CUDA Interface

Direct access to CUDA functionality including device management, memory allocation, streams, events, and custom kernel compilation. Enables fine-grained control over GPU resources and performance optimization.

```python { .api }
def is_available(): ...

class Device:
    def __init__(self, device=None): ...

class MemoryPool:
    def __init__(self, allocator=None): ...

class Stream:
    def __init__(self, null=False, non_blocking=False, ptds=False): ...
```

[CUDA Interface](./cuda-interface.md)

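A hedged sketch of how devices, streams, and the memory pool fit together. The helper name `scaled_sum_on_stream` is hypothetical, and returning `None` when no GPU is available is an assumption made so the snippet stays runnable everywhere.

```python
try:
    import cupy as cp
    gpu_ok = cp.cuda.is_available()
except Exception:
    cp, gpu_ok = None, False


def scaled_sum_on_stream(n=1000):
    """Hypothetical helper: sum 2*i for i < n on a non-default stream."""
    if not gpu_ok:
        return None  # assumption: skip quietly without a CUDA device
    with cp.cuda.Device(0):                     # select the first GPU
        stream = cp.cuda.Stream(non_blocking=True)
        with stream:                            # work below is queued on `stream`
            data = cp.arange(n, dtype=cp.float32)
            total = (data * 2).sum()
        stream.synchronize()                    # wait for the queued work
        pool = cp.get_default_memory_pool()
        return float(total), pool.used_bytes()


result = scaled_sum_on_stream()
print(result)
```

On a GPU machine the function returns the sum together with the number of bytes currently held by the default memory pool.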
### Custom Kernels

Framework for creating custom GPU kernels including ElementwiseKernel for element-wise operations, ReductionKernel for reduction operations, and RawKernel for arbitrary CUDA code.

```python { .api }
class ElementwiseKernel:
    def __init__(self, in_params, out_params, operation, name='kernel', **kwargs): ...

class ReductionKernel:
    def __init__(self, in_params, out_params, map_expr, reduce_expr, post_map_expr, identity, **kwargs): ...

class RawKernel:
    def __init__(self, code, name, options=(), **kwargs): ...
```

[Custom Kernels](./custom-kernels.md)

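The sketch below shows one way to define an element-wise kernel and a reduction kernel. The wrapper functions `squared_diff` and `l2norm` and their NumPy fallbacks are assumptions added so the snippet runs without a GPU; the kernel bodies themselves are small CUDA C expressions.

```python
import numpy as np

try:
    import cupy as cp
    gpu_ok = cp.cuda.is_available()
except Exception:
    cp, gpu_ok = None, False


def squared_diff(x, y):
    """Element-wise (x - y)^2, using a custom kernel when a GPU is present."""
    if gpu_ok:
        kernel = cp.ElementwiseKernel(
            'float32 x, float32 y',      # input parameters
            'float32 z',                 # output parameter
            'z = (x - y) * (x - y)',     # per-element operation
            'squared_diff')              # kernel name
        return cp.asnumpy(kernel(cp.asarray(x), cp.asarray(y)))
    return (x - y) ** 2                  # NumPy fallback (assumption)


def l2norm(x):
    """Euclidean norm via a map/reduce/post-map kernel."""
    if gpu_ok:
        kernel = cp.ReductionKernel(
            'T x', 'T y',
            'x * x',                     # map expression
            'a + b',                     # reduce expression
            'y = sqrt(a)',               # post-reduction map
            '0',                         # identity value
            'l2norm')
        return float(kernel(cp.asarray(x)))
    return float(np.sqrt((x * x).sum()))  # NumPy fallback (assumption)


a = np.array([1, 2, 3], dtype=np.float32)
b = np.array([3, 2, 1], dtype=np.float32)
print(squared_diff(a, b), l2norm(np.array([3, 4], dtype=np.float32)))
```

In real code the kernel objects would typically be created once at module level and reused, since constructing them inside the function repeats setup work on every call.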
### Statistics and Sorting

Statistical functions including descriptive statistics, correlations, histograms, and sorting operations. Functions support axis-specific operations, and NaN-aware variants such as `nanmean` are available for data with missing values.

```python { .api }
def mean(a, axis=None, dtype=None, out=None, keepdims=False): ...
def std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False): ...
def sort(a, axis=-1, kind=None, order=None): ...
def argsort(a, axis=-1, kind=None, order=None): ...
def histogram(a, bins=10, range=None, weights=None, density=None): ...
```

[Statistics and Sorting](./statistics-sorting.md)

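A small worked example of the statistics and sorting functions on a four-element array. The data values are illustrative and the `xp` NumPy fallback is an assumption for GPU-less environments.

```python
import numpy as np

try:
    import cupy as cp
    # Assumption: fall back to NumPy when no GPU is usable.
    xp = cp if cp.cuda.is_available() else np
except Exception:
    xp = np

data = xp.asarray([4.0, 1.0, 3.0, 2.0])

mean = float(xp.mean(data))               # 2.5
std = float(xp.std(data))                 # population std (ddof=0)
order = xp.argsort(data)                  # indices that sort the data
sorted_data = xp.sort(data)
hist, edges = xp.histogram(data, bins=2)  # two equal-width bins over [1, 4]

# NaN-aware mean skips the missing value.
nan_mean = float(xp.nanmean(xp.asarray([1.0, np.nan, 3.0])))

print(mean, order.tolist(), hist.tolist(), nan_mean)
```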
## Core Utilities

Essential utility functions for GPU/CPU data transfer and array module selection.

```python { .api }
def asnumpy(a, stream=None, order='C', out=None):
    """
    Convert CuPy array to NumPy array on CPU.

    Parameters:
    - a: input CuPy array or array-like
    - stream: CUDA stream for async transfer
    - order: memory layout ('C', 'F', 'A')
    - out: output NumPy array

    Returns:
    numpy.ndarray: array on CPU memory
    """

def get_array_module(*args):
    """
    Return array module (cupy or numpy) based on input types.

    Parameters:
    - args: values to determine module

    Returns:
    module: cupy or numpy module
    """

def is_available():
    """
    Check if CUDA is available.

    Returns:
    bool: True if CUDA devices are available
    """
```

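One common use of `get_array_module` is writing a function once and letting it run on either NumPy or CuPy inputs. The sketch below assumes hypothetical helpers `get_xp`, `softmax`, and `to_host`; they are not part of CuPy and fall back to NumPy when CuPy cannot be imported.

```python
import numpy as np

try:
    import cupy as cp
except Exception:
    cp = None  # assumption: run on NumPy only when CuPy is unavailable


def get_xp(*arrays):
    """Hypothetical helper: pick cupy or numpy based on the argument types."""
    return cp.get_array_module(*arrays) if cp is not None else np


def softmax(x):
    """Numerically stable softmax that works for NumPy and CuPy arrays."""
    xp = get_xp(x)
    e = xp.exp(x - xp.max(x))
    return e / xp.sum(e)


def to_host(a):
    """Hypothetical helper: return a NumPy array regardless of where `a` lives."""
    return cp.asnumpy(a) if cp is not None else np.asarray(a)


probs = softmax(np.array([1.0, 2.0, 3.0]))
print(to_host(probs))
```

Passing a `cupy.ndarray` to `softmax` would run the same code on the GPU, and `to_host` would copy the result back to CPU memory.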
## Types

```python { .api }
class ndarray:
    """GPU-accelerated multi-dimensional array."""
    def __init__(self): ...
    @property
    def shape(self): ...
    @property
    def dtype(self): ...
    @property
    def size(self): ...
    def get(self, stream=None, order='C', out=None): ...
    def set(self, arr, stream=None): ...

class ufunc:
    """Universal function for element-wise operations."""
    def __call__(self, *args, **kwargs): ...

# Data types (from NumPy)
bool_ = numpy.bool_
int8 = numpy.int8
int16 = numpy.int16
int32 = numpy.int32
int64 = numpy.int64
uint8 = numpy.uint8
uint16 = numpy.uint16
uint32 = numpy.uint32
uint64 = numpy.uint64
float16 = numpy.float16
float32 = numpy.float32
float64 = numpy.float64
complex64 = numpy.complex64
complex128 = numpy.complex128
```