# CuPy: NumPy & SciPy for GPU

CuPy is a NumPy/SciPy-compatible array library for GPU-accelerated computing with Python. It acts as a drop-in replacement to run existing NumPy/SciPy code on NVIDIA CUDA or AMD ROCm platforms, providing high-performance computing capabilities while maintaining familiar NumPy syntax and semantics.

## Package Information

- **Package Name**: cupy-cuda114
- **Package Type**: Library
- **Language**: Python
- **Installation**: `pip install cupy-cuda114`
- **CUDA Target**: CUDA 11.4

## Core Imports

```python
import cupy as cp
```

For compatibility with NumPy code:

```python
import cupy as np  # Drop-in replacement for numpy
```

Access specific modules:

```python
from cupy import cuda, random, linalg, fft
```

## Basic Usage

```python
import cupy as cp

# Create arrays on GPU
x = cp.arange(6).reshape(2, 3).astype('f')
print(x)
# [[0. 1. 2.]
#  [3. 4. 5.]]

# Perform computations on GPU
result = x.sum(axis=1)
print(result)
# [ 3. 12.]

# Convert to NumPy array on CPU
cpu_result = cp.asnumpy(result)

# Mathematical operations
y = cp.sin(x) * cp.cos(x)
z = cp.sqrt(x**2 + y**2)

# Linear algebra
A = cp.random.random((1000, 1000))
B = cp.random.random((1000, 1000))
C = cp.dot(A, B)  # Matrix multiplication on GPU

# Fast Fourier Transform
signal = cp.random.random(1024)
fft_result = cp.fft.fft(signal)
```

## Architecture

CuPy provides a comprehensive GPU computing ecosystem:

- **Core Arrays**: GPU-accelerated `ndarray` objects with NumPy-compatible API
- **Mathematical Functions**: Complete set of element-wise operations, reductions, and special functions
- **Linear Algebra**: High-performance matrix operations using cuBLAS and cuSOLVER
- **CUDA Integration**: Direct access to CUDA features including memory management, streams, and custom kernels
- **SciPy Extensions**: GPU implementations of SciPy functionality for sparse matrices, signal processing, and more
- **Performance Tools**: Memory pools, kernel fusion, JIT compilation, and profiling utilities

## Capabilities

### Array Creation and Manipulation

Comprehensive array creation functions and manipulation operations compatible with NumPy, enabling easy migration of existing code to GPU acceleration.

```python { .api }
def array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0): ...
def zeros(shape, dtype=float, order='C'): ...
def ones(shape, dtype=None, order='C'): ...
def empty(shape, dtype=float, order='C'): ...
def arange(start, stop=None, step=1, dtype=None): ...
def linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0): ...
def reshape(a, newshape, order='C'): ...
def transpose(a, axes=None): ...
def concatenate(arrays, axis=0, out=None, dtype=None, casting="same_kind"): ...
```

[Array Operations](./array-operations.md)
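
A minimal sketch of the creation and manipulation functions listed above. The `try`/`except` fallback to NumPy is an assumption added here so the snippet also runs on a machine without a GPU; it is not part of CuPy itself.

```python
try:
    import cupy as xp
    xp.zeros(1)  # touch the device; raises when no usable GPU is present
except Exception:
    import numpy as xp  # CPU fallback (assumption, for portability only)

# Build a 2x3 grid and a 3-element ramp directly on the active device.
grid = xp.arange(6, dtype=xp.float32).reshape(2, 3)
ramp = xp.linspace(0.0, 1.0, num=3, dtype=xp.float32)

# Append the ramp as a third row, then swap the axes.
stacked = xp.concatenate([grid, ramp.reshape(1, 3)], axis=0)
flipped = xp.transpose(stacked)

print(stacked.shape, flipped.shape)
print(stacked.tolist())
```

Because the calls are identical in both libraries, the same lines serve as a migration check: code that works with `xp` bound to NumPy should work unchanged with `xp` bound to CuPy.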
93
94
### Mathematical Functions
95
96
Complete set of mathematical operations including trigonometric, hyperbolic, exponential, logarithmic, arithmetic, and special functions, all optimized for GPU execution.
97
98
```python { .api }
99
def sin(x, out=None, **kwargs): ...
100
def cos(x, out=None, **kwargs): ...
101
def exp(x, out=None, **kwargs): ...
102
def log(x, out=None, **kwargs): ...
103
def sqrt(x, out=None, **kwargs): ...
104
def add(x1, x2, out=None, **kwargs): ...
105
def multiply(x1, x2, out=None, **kwargs): ...
106
def sum(a, axis=None, dtype=None, out=None, keepdims=False, initial=None, where=None): ...
107
def mean(a, axis=None, dtype=None, out=None, keepdims=False, where=None): ...
108
```
109
110
[Mathematical Functions](./mathematical-functions.md)
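
The following sketch shows how the element-wise functions and reductions above combine, including the `out=` argument and the `axis`/`keepdims` options. The NumPy fallback is an assumption for portability, not CuPy behavior.

```python
try:
    import cupy as xp
    xp.zeros(1)  # touch the device; raises when no usable GPU is present
except Exception:
    import numpy as xp  # CPU fallback (assumption, for portability only)

x = xp.linspace(0.0, xp.pi, num=5)

# Element-wise functions compose like their NumPy counterparts.
identity = xp.sin(x) ** 2 + xp.cos(x) ** 2  # all ones up to rounding

# `out=` writes into a preallocated array instead of allocating a new one.
buf = xp.empty_like(x)
xp.exp(x, out=buf)

# Reductions accept `axis` and `keepdims`.
m = xp.arange(6.0).reshape(2, 3)
row_sums = xp.sum(m, axis=1, keepdims=True)
col_means = xp.mean(m, axis=0)

print(row_sums.tolist(), col_means.tolist())
```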
111
112
### Linear Algebra
113
114
High-performance linear algebra operations leveraging cuBLAS and cuSOLVER libraries for matrix operations, decompositions, and solving linear systems.
115
116
```python { .api }
117
def dot(a, b, out=None): ...
118
def matmul(x1, x2, out=None, **kwargs): ...
119
def norm(x, ord=None, axis=None, keepdims=False): ...
120
def solve(a, b): ...
121
def inv(a): ...
122
def svd(a, full_matrices=True, compute_uv=True): ...
123
def eigh(a, UPLO='L'): ...
124
def cholesky(a): ...
125
```
126
127
[Linear Algebra](./linear-algebra.md)
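
As an illustration of the routines above, this sketch solves a small symmetric positive-definite system, checks the residual, and verifies a Cholesky factorization. The matrix values are made up for the example, and the NumPy fallback is an assumption for machines without a GPU.

```python
try:
    import cupy as xp
    xp.zeros(1)  # touch the device; raises when no usable GPU is present
except Exception:
    import numpy as xp  # CPU fallback (assumption, for portability only)

a = xp.array([[4.0, 1.0],
              [1.0, 3.0]])  # symmetric positive definite
b = xp.array([1.0, 2.0])

x = xp.linalg.solve(a, b)                     # solves a @ x = b
residual = float(xp.linalg.norm(a @ x - b))   # should be close to zero

lower = xp.linalg.cholesky(a)                 # a = lower @ lower.T
reconstructed = lower @ lower.T

print(x.tolist(), residual)
```

Prefer `solve` over `inv(a) @ b` when only the solution is needed; it avoids forming the inverse explicitly.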

### CUDA Integration

Direct access to CUDA features including device management, memory allocation, streams, events, and custom kernel compilation for advanced GPU programming.

```python { .api }
class Device:
    def __init__(self, device=None): ...

class Stream:
    def __init__(self, null=False, non_blocking=False, ptds=False): ...

class MemoryPool:
    def __init__(self, allocator=None): ...

def compile_with_cache(source, name, options=(), arch=None, cache_dir=None, prepend_cupy_headers=True, backend='nvcc', translate_cucomplex=True, enable_cooperative_groups=False, name_expressions=None, log_stream=None, cache_in_memory=False, jitify=False): ...
```

[CUDA Integration](./cuda-integration.md)
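
A hedged sketch of device selection, stream usage, and memory-pool inspection with the classes above. It assumes device index 0 exists; the `HAVE_GPU` guard is an addition for this example so that it exits cleanly where CuPy or a CUDA device is missing.

```python
try:
    import cupy as cp
    HAVE_GPU = cp.cuda.is_available()
except Exception:  # CuPy missing or CUDA libraries not loadable
    cp, HAVE_GPU = None, False

total = None
if HAVE_GPU:
    with cp.cuda.Device(0):                         # make GPU 0 current
        stream = cp.cuda.Stream(non_blocking=True)
        with stream:                                # enqueue work on this stream
            x = cp.arange(1000, dtype=cp.int64)
            total = x.sum()
        stream.synchronize()                        # wait for the queued work

    pool = cp.get_default_memory_pool()
    print("sum:", int(total))
    print("pool bytes in use:", pool.used_bytes())

    del x                                           # drop the reference first
    pool.free_all_blocks()                          # then release cached blocks
else:
    print("No CUDA device found; skipping.")
```

The pool caches freed allocations for reuse, so `free_all_blocks()` is mainly useful when another library needs the memory back.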
147
148
### Random Number Generation
149
150
Comprehensive random number generation capabilities with support for multiple algorithms, distributions, and GPU-accelerated sampling for scientific computing and simulation.
151
152
```python { .api }
153
def random(size=None): ...
154
def randn(*args): ...
155
def randint(low, high=None, size=None, dtype=int): ...
156
def normal(loc=0.0, scale=1.0, size=None): ...
157
def uniform(low=0.0, high=1.0, size=None): ...
158
def choice(a, size=None, replace=True, p=None): ...
159
160
class Generator:
161
def __init__(self, bit_generator): ...
162
```
163
164
[Random Number Generation](./random-generation.md)
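
To show the sampling functions in a simulation setting, here is a small Monte Carlo estimate of pi. The seed and sample count are arbitrary choices for the example, and the NumPy fallback is an assumption for portability. The random streams of CuPy and NumPy differ, so the exact estimate depends on the backend.

```python
try:
    import cupy as xp
    xp.zeros(1)  # touch the device; raises when no usable GPU is present
except Exception:
    import numpy as xp  # CPU fallback (assumption, for portability only)

xp.random.seed(42)  # reproducible within one backend
n = 100_000

# Sample points in the square [-1, 1)^2 and count those inside the unit circle.
points = xp.random.uniform(-1.0, 1.0, size=(n, 2))
inside = (points ** 2).sum(axis=1) <= 1.0
pi_estimate = 4.0 * int(xp.count_nonzero(inside)) / n

dice = xp.random.randint(1, 7, size=10)                  # integers in [1, 6]
noise = xp.random.normal(loc=0.0, scale=0.1, size=(3, 3))

print(pi_estimate)
```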
165
166
### Fast Fourier Transform
167
168
GPU-accelerated FFT operations supporting 1D, 2D, and N-dimensional transforms for both complex and real data with comprehensive frequency domain processing capabilities.
169
170
```python { .api }
171
def fft(a, n=None, axis=-1, norm=None): ...
172
def ifft(a, n=None, axis=-1, norm=None): ...
173
def rfft(a, n=None, axis=-1, norm=None): ...
174
def fft2(a, s=None, axes=(-2, -1), norm=None): ...
175
def fftn(a, s=None, axes=None, norm=None): ...
176
def fftfreq(n, d=1.0): ...
177
```
178
179
[Fast Fourier Transform](./fft.md)
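
A short sketch of how `fft`, `fftfreq`, and `ifft` fit together: it locates the dominant frequency of a synthetic 8 Hz tone and checks the inverse transform. The signal parameters are invented for the example, and the NumPy fallback is an assumption for portability.

```python
try:
    import cupy as xp
    xp.zeros(1)  # touch the device; raises when no usable GPU is present
except Exception:
    import numpy as xp  # CPU fallback (assumption, for portability only)

n, rate = 256, 256.0                      # 256 samples over one second
t = xp.arange(n) / rate
signal = xp.sin(2 * xp.pi * 8.0 * t)      # pure 8 Hz tone

spectrum = xp.fft.fft(signal)
freqs = xp.fft.fftfreq(n, d=1.0 / rate)   # frequency of each FFT bin in Hz

# Search only the non-negative half of the spectrum for the peak.
peak = int(xp.argmax(xp.abs(spectrum[: n // 2])))
peak_hz = float(freqs[peak])

recovered = xp.fft.ifft(spectrum).real    # inverse transform round trip

print(peak_hz)
```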

### SciPy Extensions

GPU implementations of SciPy functionality including sparse matrices, signal processing, special functions, statistics, and N-dimensional image processing.

```python { .api }
# Sparse matrices (cupyx.scipy.sparse)
class csr_matrix: ...
class csc_matrix: ...
class coo_matrix: ...

# Signal processing (cupyx.scipy.signal)
def convolve(in1, in2, mode='full'): ...
def correlate(in1, in2, mode='full'): ...

# Special functions (cupyx.scipy.special)
def gamma(z): ...
def erf(z): ...
```

[SciPy Extensions](./scipy-extensions.md)

### JIT Compilation and Custom Kernels

Just-in-time compilation capabilities and custom CUDA kernel creation for performance-critical applications requiring low-level GPU programming.

```python { .api }
class ElementwiseKernel:
    def __init__(self, in_params, out_params, operation, name='kernel', reduce_dims=True, **kwargs): ...

class ReductionKernel:
    def __init__(self, in_params, out_params, map_expr, reduce_expr, post_map_expr='', identity=None, name='kernel', reduce_type=None, reduce_dims=True, **kwargs): ...

class RawKernel:
    def __init__(self, code, name, **kwargs): ...

# Decorator from cupyx.jit
@rawkernel()
def my_kernel(x, y, size): ...
```

[JIT and Custom Kernels](./jit-kernels.md)
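
For a sense of how `ElementwiseKernel` is used, the sketch below defines a squared-difference kernel from a one-line C expression. The kernel name and inputs are illustrative, and the `HAVE_GPU` guard is an addition so the snippet exits cleanly without CuPy or a CUDA device.

```python
try:
    import cupy as cp
    HAVE_GPU = cp.cuda.is_available()
except Exception:  # CuPy missing or CUDA libraries not loadable
    cp, HAVE_GPU = None, False

z = None
if HAVE_GPU:
    # in_params, out_params, per-element operation, kernel name
    squared_diff = cp.ElementwiseKernel(
        'float32 x, float32 y',
        'float32 z',
        'z = (x - y) * (x - y)',
        'squared_diff',
    )

    x = cp.arange(5, dtype=cp.float32)      # [0, 1, 2, 3, 4]
    y = cp.full(5, 2, dtype=cp.float32)     # [2, 2, 2, 2, 2]
    z = squared_diff(x, y)                  # compiled on first call, then cached
    print(z)
else:
    print("No CUDA device found; skipping.")
```

The output array is allocated automatically when it is not passed in, and inputs are broadcast against each other as with built-in element-wise functions.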

### Statistics and Data Analysis

Statistical functions and data analysis tools including descriptive statistics, correlation analysis, and histogram computation, all optimized for large-scale GPU processing.

```python { .api }
def mean(a, axis=None, dtype=None, out=None, keepdims=False, where=None): ...
def std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False, where=None): ...
def var(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False, where=None): ...
def corrcoef(x, y=None, rowvar=True, bias=None, ddof=None, fweights=None, aweights=None): ...
def histogram(a, bins=10, range=None, normed=None, weights=None, density=None): ...
def percentile(a, q, axis=None, out=None, overwrite_input=False, interpolation='linear', keepdims=False): ...
```

[Statistics](./statistics.md)
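
The sketch below applies several of these functions to a tiny made-up data set: column means, per-row sample standard deviation, the correlation matrix, and a histogram. The NumPy fallback is an assumption for portability.

```python
try:
    import cupy as xp
    xp.zeros(1)  # touch the device; raises when no usable GPU is present
except Exception:
    import numpy as xp  # CPU fallback (assumption, for portability only)

# Two variables (rows) with four observations each; row 1 is twice row 0.
data = xp.array([[1.0, 2.0, 3.0, 4.0],
                 [2.0, 4.0, 6.0, 8.0]])

col_mean = xp.mean(data, axis=0)
sample_std = xp.std(data, axis=1, ddof=1)   # ddof=1 gives the sample estimate
corr = xp.corrcoef(data)                    # rows are treated as variables

# Histogram over all values with four equal-width bins on [0, 8].
counts, edges = xp.histogram(data.ravel(), bins=4, range=(0.0, 8.0))

print(col_mean.tolist(), counts.tolist())
```

Since the second row is an exact multiple of the first, every entry of `corr` is 1 up to rounding.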

### Logic and Comparison Operations

Comprehensive logical operations, element-wise comparisons, truth value testing, and content validation functions for array processing and conditional operations.

```python { .api }
def equal(x1, x2, out=None, **kwargs): ...
def not_equal(x1, x2, out=None, **kwargs): ...
def less(x1, x2, out=None, **kwargs): ...
def greater(x1, x2, out=None, **kwargs): ...
def logical_and(x1, x2, out=None, **kwargs): ...
def all(a, axis=None, out=None, keepdims=False, where=None): ...
def any(a, axis=None, out=None, keepdims=False, where=None): ...
def isfinite(x, out=None, **kwargs): ...
def isnan(x, out=None, **kwargs): ...
```

[Logic Operations](./logic-operations.md)

### Indexing and Selection

Advanced indexing, selection, and extraction operations including fancy indexing, boolean indexing, and element insertion for flexible array manipulation.

```python { .api }
def take(a, indices, axis=None, out=None, mode='raise'): ...
def choose(a, choices, out=None, mode='raise'): ...
def where(condition, x=None, y=None): ...
def extract(condition, arr): ...
def nonzero(a): ...
def argmax(a, axis=None, out=None, keepdims=False): ...
def argmin(a, axis=None, out=None, keepdims=False): ...
```

[Indexing and Selection](./indexing-selection.md)
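
A brief sketch of the selection functions above on a small made-up score matrix. The NumPy fallback is an assumption for portability.

```python
try:
    import cupy as xp
    xp.zeros(1)  # touch the device; raises when no usable GPU is present
except Exception:
    import numpy as xp  # CPU fallback (assumption, for portability only)

scores = xp.array([[0.2, 0.9, 0.4],
                   [0.7, 0.1, 0.8]])
mask = scores > 0.5

best = xp.argmax(scores, axis=1)                     # column of the max per row
kept = xp.where(mask, scores, xp.zeros_like(scores)) # zero out entries <= 0.5
rows, cols = xp.nonzero(mask)                        # coordinates of True entries
picked = xp.take(scores, xp.array([0, 2]), axis=1)   # columns 0 and 2
above = scores[mask]                                 # boolean indexing, 1-D result

print(best.tolist(), rows.tolist(), cols.tolist())
```

Boolean indexing and `nonzero` produce results whose size depends on the data, so on a GPU they require a synchronization with the host; keep them out of tight loops where possible.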

### Input/Output Operations

File I/O operations that read and write NumPy's binary formats (NPY and NPZ) and text formats, returning loaded data as GPU arrays.

```python { .api }
def save(file, arr, allow_pickle=True, fix_imports=True): ...
def load(file, mmap_mode=None, allow_pickle=False, fix_imports=True, encoding='ASCII'): ...
def savez(file, *args, **kwds): ...
def savetxt(fname, X, fmt='%.18e', delimiter=' ', newline='\n', header='', footer='', comments='# ', encoding=None): ...
def loadtxt(fname, dtype=float, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0, encoding='bytes', max_rows=None): ...
```

[Input/Output](./input-output.md)
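
A minimal round-trip sketch for `save`, `savez`, and `load`, writing to a temporary directory. The file names are arbitrary, and the NumPy fallback is an assumption for portability.

```python
import os
import tempfile

try:
    import cupy as xp
    xp.zeros(1)  # touch the device; raises when no usable GPU is present
except Exception:
    import numpy as xp  # CPU fallback (assumption, for portability only)

arr = xp.arange(6, dtype=xp.float32).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    # Single array in NPY format.
    npy_path = os.path.join(tmp, "arr.npy")
    xp.save(npy_path, arr)
    restored = xp.load(npy_path)

    # Several named arrays in one NPZ archive.
    npz_path = os.path.join(tmp, "bundle.npz")
    xp.savez(npz_path, a=arr, b=arr * 2)
    with xp.load(npz_path) as bundle:
        doubled = bundle["b"]

print(restored.tolist(), doubled.tolist())
```

Files written this way are ordinary NPY/NPZ files, so they can be opened with NumPy on a machine without a GPU.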

### Testing and Debugging

Comprehensive testing framework with NumPy comparison utilities, array assertions, and performance benchmarking tools for development and validation.

```python { .api }
def assert_allclose(actual, desired, rtol=1e-7, atol=0, err_msg='', verbose=True): ...
def assert_array_equal(x, y, err_msg='', verbose=True, strides_check=False): ...

@numpy_cupy_allclose()
def test_function(xp): ...

@for_all_dtypes()
def test_dtypes(dtype): ...
```

[Testing Framework](./testing.md)

## Core Classes

```python { .api }
class ndarray:
    """N-dimensional GPU array object.

    Primary data structure for GPU-accelerated computing with NumPy-compatible interface.
    """
    def __init__(self, shape, dtype=float, memptr=None, strides=None, order='C'): ...
    def get(self, stream=None, order='C', out=None): ...  # Copy to CPU
    def set(self, arr, stream=None): ...  # Copy from CPU
    def copy(self, order='C'): ...
    def astype(self, dtype, order='K', casting='unsafe', subok=True, copy=True): ...

class ufunc:
    """Universal function object for element-wise operations."""
    def __call__(self, *args, **kwargs): ...
    def reduce(self, a, axis=0, dtype=None, out=None, keepdims=False, initial=None, where=None): ...
    def accumulate(self, a, axis=0, dtype=None, out=None): ...
```

## Utility Functions

```python { .api }
def asnumpy(a, stream=None, order='C', out=None):
    """Convert CuPy array to NumPy array on CPU."""

def get_array_module(*args):
    """Get appropriate array module (cupy or numpy) based on input types."""

def is_available():
    """Check if CUDA is available."""

def get_default_memory_pool():
    """Get default GPU memory pool."""

def show_config():
    """Display CuPy configuration information."""

def binary_repr(num, width=None):
    """Return binary representation of input number as string."""

def base_repr(number, base=2, padding=0):
    """Return string representation of number in given base system."""

def ndim(a):
    """Return number of dimensions of an array."""

def isscalar(element):
    """Return True if the type of element is a scalar type."""

def fuse(*args, **kwargs):
    """Kernel fusion functionality for performance optimization."""

def clear_memo():
    """Clear memoization cache."""

def memoize(for_each_device=False):
    """Memoization decorator for caching function results."""
```
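
One common use of `get_array_module` is writing functions that accept either NumPy or CuPy arrays. The sketch below is illustrative: the `softmax` helper and the `HAVE_GPU` guard are hypothetical names introduced for this example, and the guard only exists so the snippet runs without CuPy installed.

```python
import numpy as np

try:
    import cupy as cp
    HAVE_GPU = cp.cuda.is_available()
except Exception:  # CuPy missing or CUDA libraries not loadable
    cp, HAVE_GPU = None, False


def softmax(x):
    """Row-wise softmax computed on whichever device holds `x`."""
    # get_array_module returns cupy for CuPy inputs and numpy otherwise.
    xp = cp.get_array_module(x) if cp is not None else np
    shifted = x - xp.max(x, axis=-1, keepdims=True)  # for numerical stability
    e = xp.exp(shifted)
    return e / xp.sum(e, axis=-1, keepdims=True)


host_in = np.array([[0.0, 0.0], [1.0, 1.0]])
host_out = softmax(host_in)               # NumPy in, NumPy out

if HAVE_GPU:
    dev_out = softmax(cp.asarray(host_in))  # CuPy in, CuPy out
    back_on_host = cp.asnumpy(dev_out)      # explicit copy to CPU memory

print(host_out.tolist())
```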

## Data Types

CuPy supports NumPy's boolean and numeric data types (non-numeric types such as strings and Python objects are not supported):

```python { .api }
# Boolean
bool_, bool8

# Integers
int8, int16, int32, int64
uint8, uint16, uint32, uint64

# Floating point
float16, float32, float64

# Complex
complex64, complex128

# Generic type hierarchy
number, integer, signedinteger, unsignedinteger
floating, complexfloating
```