# CuPy

CuPy is a NumPy/SciPy-compatible array library for GPU-accelerated computing with Python. It implements a large subset of the NumPy and SciPy APIs and runs them on NVIDIA CUDA GPUs, which can yield substantial speedups for large, data-parallel workloads. In many cases CuPy can act as a drop-in replacement for NumPy. It supports explicit CPU/GPU data transfer, custom CUDA kernels, and a broad set of mathematical operations, including linear algebra, FFT, sparse matrices, and random number generation.

## Package Information

- **Package Name**: cupy-cuda111
- **Language**: Python
- **Installation**: `pip install cupy-cuda111`
- **CUDA Version**: 11.1
- **Homepage**: https://cupy.dev/
- **Documentation**: https://docs.cupy.dev/en/stable/

## Core Imports

```python
import cupy as cp
```

Common imports for specific functionality:

```python
# Core array operations (main namespace)
import cupy as cp

# GPU memory management
import cupy.cuda as cuda

# Linear algebra
import cupy.linalg as linalg

# Random number generation
import cupy.random as random

# Fast Fourier Transform
import cupy.fft as fft

# SciPy-compatible functions
import cupyx.scipy as scipy

# Sparse matrices (updated path)
import cupyx.scipy.sparse as sparse

# Testing utilities
import cupy.testing as testing
```

## Basic Usage

```python
import cupy as cp
import numpy as np

# Create arrays on GPU
gpu_array = cp.array([1, 2, 3, 4, 5])
gpu_zeros = cp.zeros((3, 4))
gpu_random = cp.random.random((100, 100))

# NumPy-compatible operations on GPU
result = cp.sin(gpu_array) + cp.cos(gpu_array)
matrix_mult = cp.dot(gpu_random, gpu_random.T)

# Transfer between CPU and GPU
cpu_data = np.array([1, 2, 3, 4, 5])
gpu_data = cp.asarray(cpu_data)  # CPU to GPU
back_to_cpu = cp.asnumpy(gpu_data)  # GPU to CPU

# Memory management
mempool = cp.get_default_memory_pool()
print(f"Used bytes: {mempool.used_bytes()}")
print(f"Total bytes: {mempool.total_bytes()}")

# Check GPU availability
if cp.cuda.is_available():
    print(f"GPU device: {cp.cuda.Device().id}")
```

## Architecture

CuPy's architecture mirrors NumPy while adding GPU acceleration:

- **ndarray**: Core GPU array class providing a NumPy-compatible interface
- **CUDA Integration**: Direct access to the CUDA runtime, memory management, and custom kernels
- **Universal Functions (ufuncs)**: Element-wise operations optimized for GPU execution
- **Memory Pools**: Efficient GPU memory allocation and reuse
- **Stream Management**: Asynchronous execution and multi-stream operations
- **Custom Kernels**: Integration of user-defined CUDA kernels via ElementwiseKernel, ReductionKernel, and RawKernel

This design keeps the API close to NumPy's while exposing GPU performance for scientific computing, machine learning, and data analysis workloads.

## Capabilities

### Array Creation and Manipulation

Array creation functions, shape manipulation, indexing, and data type operations. Covers most NumPy array creation patterns, with the resulting arrays allocated on the GPU.

```python { .api }
# Basic creation
def zeros(shape, dtype=float, order='C'): ...
def ones(shape, dtype=None, order='C'): ...
def empty(shape, dtype=float, order='C'): ...
def full(shape, fill_value, dtype=None, order='C'): ...
def arange(start, stop=None, step=1, dtype=None): ...
def linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0): ...

# From data
def array(obj, dtype=None, copy=True, order='K', subok=False, ndmin=0): ...
def asarray(a, dtype=None, order=None): ...
def asanyarray(a, dtype=None, order=None): ...

# Shape manipulation
def reshape(a, newshape, order='C'): ...
def ravel(a, order='C'): ...
def transpose(a, axes=None): ...
def moveaxis(a, source, destination): ...
def expand_dims(a, axis): ...
def squeeze(a, axis=None): ...
```
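
As a rough illustration of the creation and shape functions listed above, the following sketch builds a few small arrays and reshapes them. The NumPy fallback (the `xp` alias) is an assumption added here so the snippet also runs on machines without a usable GPU; it is not part of the CuPy API.

```python
import numpy as np

# Assumption: fall back to NumPy when CuPy is missing or no GPU is usable.
try:
    import cupy as cp
    xp = cp if cp.cuda.is_available() else np
except Exception:
    xp = np

a = xp.arange(12, dtype=xp.float32)          # 1-D array holding 0..11
m = xp.reshape(a, (3, 4))                    # 3 rows x 4 columns
t = xp.transpose(m)                          # shape (4, 3)
e = xp.expand_dims(m, axis=0)                # add a leading axis: (1, 3, 4)
s = xp.squeeze(e, axis=0)                    # remove it again: (3, 4)
grid = xp.linspace(0.0, 1.0, num=5)          # 5 evenly spaced points, endpoints included
filled = xp.full((2, 2), 7, dtype=xp.int32)  # 2x2 array filled with 7

print(m.shape, t.shape, e.shape, s.shape)
```

Because the same calls exist in both libraries, swapping `xp` between `numpy` and `cupy` is usually enough to move this kind of code between CPU and GPU.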

[Array Operations](./array-operations.md)

### Mathematical Functions

Complete mathematical function library including trigonometric, hyperbolic, exponential, logarithmic, arithmetic, and special functions optimized for GPU execution.

```python { .api }
# Trigonometric
def sin(x, /, out=None, *, where=True, casting='same_kind', order='K', dtype=None, subok=True): ...
def cos(x, /, out=None, *, where=True, casting='same_kind', order='K', dtype=None, subok=True): ...
def tan(x, /, out=None, *, where=True, casting='same_kind', order='K', dtype=None, subok=True): ...

# Exponential and logarithmic
def exp(x, /, out=None, *, where=True, casting='same_kind', order='K', dtype=None, subok=True): ...
def log(x, /, out=None, *, where=True, casting='same_kind', order='K', dtype=None, subok=True): ...
def sqrt(x, /, out=None, *, where=True, casting='same_kind', order='K', dtype=None, subok=True): ...

# Arithmetic
def add(x1, x2, /, out=None, *, where=True, casting='same_kind', order='K', dtype=None, subok=True): ...
def multiply(x1, x2, /, out=None, *, where=True, casting='same_kind', order='K', dtype=None, subok=True): ...
def power(x1, x2, /, out=None, *, where=True, casting='same_kind', order='K', dtype=None, subok=True): ...
```
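
A minimal sketch of how these element-wise functions compose, checking the identity sin²(x) + cos²(x) = 1 and writing a result into a preallocated buffer via `out`. The `xp` fallback to NumPy is an assumption made so the example runs without a GPU.

```python
import numpy as np

# Assumption: fall back to NumPy when CuPy is missing or no GPU is usable.
try:
    import cupy as cp
    xp = cp if cp.cuda.is_available() else np
except Exception:
    xp = np

x = xp.linspace(0.0, 2.0 * np.pi, num=1000)

# Element-wise functions combine like ordinary arithmetic.
identity = xp.sin(x) ** 2 + xp.cos(x) ** 2
max_error = float(xp.max(xp.abs(identity - 1.0)))

# `out=` reuses an existing buffer instead of allocating a new array.
buf = xp.empty_like(x)
xp.multiply(x, 2.0, out=buf)

# exp and log are inverses on positive inputs.
roundtrip = xp.log(xp.exp(xp.ones(4)))

print(f"largest deviation from 1: {max_error:.2e}")
```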

[Mathematical Functions](./mathematical-functions.md)

### Linear Algebra

GPU-accelerated linear algebra operations including matrix multiplication, decompositions, eigenvalue problems, and solving linear systems using cuBLAS and cuSOLVER.

```python { .api }
# Matrix products
def dot(a, b, out=None): ...
def matmul(x1, x2, /, out=None, *, casting='same_kind', order='K', dtype=None, subok=True): ...
def einsum(subscripts, *operands, **kwargs): ...
def tensordot(a, b, axes=2): ...

# Decompositions
def svd(a, full_matrices=True, compute_uv=True, hermitian=False): ...
def qr(a, mode='reduced'): ...
def cholesky(a): ...

# Eigenvalues
def eigh(a, UPLO='L'): ...
def eigvalsh(a, UPLO='L'): ...

# Linear systems
def solve(a, b): ...
def inv(a): ...
def pinv(a, rcond=1e-15, hermitian=False): ...
```
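
The sketch below shows one way `solve` and `eigh` might be used together: it solves a small linear system, checks the residual, and rebuilds a symmetric matrix from its eigendecomposition. The test matrix, the seed, and the `xp` NumPy fallback are assumptions chosen for illustration.

```python
import numpy as np

# Assumption: fall back to NumPy when CuPy is missing or no GPU is usable.
try:
    import cupy as cp
    xp = cp if cp.cuda.is_available() else np
except Exception:
    xp = np

# Build a strictly diagonally dominant (hence invertible) 4x4 system on the host.
host_rng = np.random.default_rng(0)
a = xp.asarray(host_rng.random((4, 4)) + 4.0 * np.eye(4))
b = xp.asarray(host_rng.random(4))

# Solve a @ x = b and measure how well the solution fits.
x = xp.linalg.solve(a, b)
residual = float(xp.linalg.norm(a @ x - b))

# Symmetric eigendecomposition: sym = V diag(w) V^T.
sym = a + a.T
w, v = xp.linalg.eigh(sym)
recon = (v * w) @ v.T

print(f"residual norm: {residual:.2e}")
```

Solving with `solve` is generally preferable to forming `inv(a) @ b`, since it avoids computing the explicit inverse.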

[Linear Algebra](./linear-algebra.md)

### Random Number Generation

Comprehensive random number generation using GPU-optimized generators with support for various distributions, modern generator APIs, and advanced bit generators.

```python { .api }
# Modern generator API
def default_rng(seed=None): ...
class Generator:
    def random(self, size=None, dtype=float32, out=None): ...
    def integers(self, low, high=None, size=None, dtype=int64, endpoint=False): ...

class BitGenerator: ...
class XORWOW(BitGenerator): ...
class MRG32k3a(BitGenerator): ...
class Philox4x3210(BitGenerator): ...

# Legacy API
def seed(seed=None): ...
def get_random_state(): ...
class RandomState: ...

# Simple random data
def rand(*args): ...
def randn(*args): ...
def randint(low, high=None, size=None, dtype=int): ...
def random_sample(size=None): ...
def choice(a, size=None, replace=True, p=None): ...

# Distributions
def normal(loc=0.0, scale=1.0, size=None): ...
def uniform(low=0.0, high=1.0, size=None): ...
def exponential(scale=1.0, size=None): ...
def poisson(lam=1.0, size=None): ...
def gamma(shape, scale=1.0, size=None): ...
def beta(a, b, size=None): ...
def binomial(n, p, size=None): ...

# Multivariate distributions
def multivariate_normal(mean, cov, size=None, check_valid='warn', tol=1e-8): ...
def dirichlet(alpha, size=None): ...

# Permutations
def shuffle(x): ...
def permutation(x): ...
```
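
As a small, hedged example of the two API styles above, this sketch draws samples with the generator API (`default_rng`) and with the legacy module-level functions. The seed values, sample sizes, and the `xp` NumPy fallback are arbitrary assumptions for illustration.

```python
import numpy as np

# Assumption: fall back to NumPy when CuPy is missing or no GPU is usable.
try:
    import cupy as cp
    xp = cp if cp.cuda.is_available() else np
except Exception:
    xp = np

# Generator API: an explicit generator object carries its own state.
rng = xp.random.default_rng(42)
u = rng.random((1000,))              # uniform floats in [0, 1)
k = rng.integers(0, 10, size=1000)   # integers in [0, 10)

# Two generators created from the same seed produce the same stream.
first = xp.random.default_rng(42).random(5)
second = xp.random.default_rng(42).random(5)

# Legacy API: module-level functions share one global state.
xp.random.seed(0)
z = xp.random.normal(loc=0.0, scale=1.0, size=1000)

print(float(u.mean()), int(k.max()), float(z.std()))
```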

[Random Number Generation](./random-generation.md)

### Fast Fourier Transform

GPU-accelerated FFT operations using cuFFT for high-performance frequency domain analysis with comprehensive support for real and complex transforms in 1D, 2D, and N-dimensional cases.

```python { .api }
# 1D complex transforms
def fft(a, n=None, axis=-1, norm=None): ...
def ifft(a, n=None, axis=-1, norm=None): ...

# 1D real transforms (optimized for real input)
def rfft(a, n=None, axis=-1, norm=None): ...
def irfft(a, n=None, axis=-1, norm=None): ...

# 1D Hermitian transforms
def hfft(a, n=None, axis=-1, norm=None): ...
def ihfft(a, n=None, axis=-1, norm=None): ...

# 2D transforms
def fft2(a, s=None, axes=(-2, -1), norm=None): ...
def ifft2(a, s=None, axes=(-2, -1), norm=None): ...
def rfft2(a, s=None, axes=(-2, -1), norm=None): ...
def irfft2(a, s=None, axes=(-2, -1), norm=None): ...

# N-D transforms
def fftn(a, s=None, axes=None, norm=None): ...
def ifftn(a, s=None, axes=None, norm=None): ...
def rfftn(a, s=None, axes=None, norm=None): ...
def irfftn(a, s=None, axes=None, norm=None): ...

# Helper functions
def fftfreq(n, d=1.0): ...
def rfftfreq(n, d=1.0): ...
def fftshift(x, axes=None): ...
def ifftshift(x, axes=None): ...

# Configuration
import cupy.fft.config  # FFT planning and optimization
```
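
To make the real-transform functions concrete, the sketch below locates the dominant frequency of a pure tone with `rfft` and `rfftfreq`, then recovers the signal with `irfft`. The 8 Hz tone, the 256-sample window, and the `xp` NumPy fallback are assumptions picked for the example.

```python
import numpy as np

# Assumption: fall back to NumPy when CuPy is missing or no GPU is usable.
try:
    import cupy as cp
    xp = cp if cp.cuda.is_available() else np
except Exception:
    xp = np

n = 256          # number of samples
d = 1.0 / n      # sample spacing, so the window spans one second
t = xp.arange(n) * d
signal = xp.sin(2.0 * np.pi * 8.0 * t)   # 8 Hz sine wave

# Real-input FFT returns only the non-negative frequency bins.
spectrum = xp.fft.rfft(signal)
freqs = xp.fft.rfftfreq(n, d=d)

# The bin with the largest magnitude marks the dominant frequency.
peak_hz = float(freqs[int(xp.argmax(xp.abs(spectrum)))])

# Pass n explicitly so the inverse returns the original length.
recovered = xp.fft.irfft(spectrum, n=n)

print(f"dominant frequency: {peak_hz} Hz")
```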

[FFT Operations](./fft-operations.md)

### CUDA Integration

Direct CUDA functionality including memory management, device control, custom kernels, streams, and low-level GPU programming capabilities.

```python { .api }
# Device management
class Device:
    def __init__(self, device=None): ...
    def use(self): ...

def get_device_id(): ...
def is_available(): ...

# Memory management
def alloc(size): ...
class MemoryPool:
    def malloc(self, size): ...
    def free_all_blocks(self): ...
    def used_bytes(self): ...

# Stream management
class Stream:
    def __init__(self, null=False, non_blocking=False, ptds=False): ...
    def use(self): ...

# Custom kernels
class ElementwiseKernel:
    def __init__(self, in_params, out_params, operation, name='kernel'): ...

class RawKernel:
    def __init__(self, code, name, **kwargs): ...
```
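
One possible way to combine `ElementwiseKernel` and `Stream` is sketched below: a squared-difference kernel launched on a non-blocking stream. The function name `squared_diff` and the NumPy path taken when no GPU is usable are assumptions added so the snippet can run anywhere; only the GPU branch exercises the CUDA classes.

```python
import numpy as np

# Assumption: run the pure-NumPy branch when CuPy is missing or no GPU is usable.
try:
    import cupy as cp
    gpu_ok = cp.cuda.is_available()
except Exception:
    cp, gpu_ok = None, False


def squared_diff(x, y):
    """Return (x - y)**2 element-wise as a NumPy array (hypothetical helper)."""
    if not gpu_ok:
        return (np.asarray(x) - np.asarray(y)) ** 2

    kernel = cp.ElementwiseKernel(
        'float32 x, float32 y',   # input parameters
        'float32 z',              # output parameter
        'z = (x - y) * (x - y)',  # CUDA C++ body executed per element
        'squared_diff')           # kernel name

    # Work issued inside the `with` block is queued on this stream.
    stream = cp.cuda.Stream(non_blocking=True)
    with stream:
        z = kernel(cp.asarray(x, dtype=cp.float32),
                   cp.asarray(y, dtype=cp.float32))
    stream.synchronize()          # wait for the kernel before copying back
    return cp.asnumpy(z)


result = squared_diff(np.array([1, 2, 3], dtype=np.float32),
                      np.array([3, 2, 1], dtype=np.float32))
print(result)
```

The element-wise body only describes one element; CuPy handles indexing and broadcasting, which is why this form is often simpler than a `RawKernel` for per-element math.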

[CUDA Integration](./cuda-integration.md)

### Input/Output Operations

Comprehensive file I/O operations for loading, saving, and formatting array data with support for binary files, compressed archives, text files, and custom formatting.

```python { .api }
# Binary file operations
def save(file, arr, allow_pickle=True, fix_imports=True): ...
def load(file, mmap_mode=None, allow_pickle=False, fix_imports=True, encoding='ASCII'): ...
def savez(file, *args, **kwds): ...
def savez_compressed(file, *args, **kwds): ...

# Text file operations
def loadtxt(fname, dtype=float, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0, encoding='bytes', max_rows=None): ...
def savetxt(fname, X, fmt='%.18e', delimiter=' ', newline='\n', header='', footer='', comments='# ', encoding=None): ...
def genfromtxt(fname, dtype=float, comments='#', delimiter=None, skip_header=0, skip_footer=0, converters=None, missing_values=None, filling_values=None, usecols=None, names=None): ...

# Data conversion
def frombuffer(buffer, dtype=float, count=-1, offset=0): ...
def fromstring(string, dtype=float, count=-1, sep=''): ...
def fromfunction(func, shape, dtype=float, **kwargs): ...
def fromiter(iterable, dtype, count=-1): ...

# Array formatting
def array_repr(arr, max_line_width=None, precision=None, suppress_small=None): ...
def array_str(a, max_line_width=None, precision=None, suppress_small=None): ...
def array2string(a, max_line_width=None, precision=None, suppress_small=None, separator=' ', prefix='', formatter=None, threshold=None, edgeitems=None): ...
```
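
A minimal round-trip sketch for the binary functions above, saving one array to `.npy` and two arrays to `.npz` inside a temporary directory. The file names, the array contents, and the `xp` NumPy fallback are assumptions for illustration.

```python
import os
import tempfile

import numpy as np

# Assumption: fall back to NumPy when CuPy is missing or no GPU is usable.
try:
    import cupy as cp
    xp = cp if cp.cuda.is_available() else np
except Exception:
    xp = np

arr = xp.arange(6, dtype=xp.float64).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    # Single array in .npy format.
    npy_path = os.path.join(tmp, "arr.npy")
    xp.save(npy_path, arr)
    loaded = xp.load(npy_path)

    # Several named arrays in one .npz archive.
    npz_path = os.path.join(tmp, "bundle.npz")
    xp.savez(npz_path, a=arr, b=arr * 2)
    bundle = xp.load(npz_path)
    doubled = bundle["b"]
    bundle.close()   # release the archive before the directory is removed

print(loaded.shape, doubled.shape)
```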

[I/O Operations](./io-operations.md)

### Polynomial Functions

Comprehensive polynomial operations including fitting, evaluation, arithmetic, root finding, and advanced polynomial manipulation with support for various polynomial types.

```python { .api }
# Basic operations
def poly(seq_of_zeros): ...
def roots(p): ...
def polyval(p, x): ...
def polyder(p, m=1): ...
def polyint(p, m=1, k=None): ...

# Arithmetic operations
def polyadd(a1, a2): ...
def polysub(a1, a2): ...
def polymul(a1, a2): ...
def polydiv(u, v): ...

# Curve fitting
def polyfit(x, y, deg, rcond=None, full=False, w=None, cov=False): ...
def polyvander(x, deg): ...

# Object-oriented interface
class poly1d:
    def __init__(self, c_or_r, r=False, variable=None): ...
    def __call__(self, val): ...
    def deriv(self, m=1): ...
    def integ(self, m=1, k=0): ...
    @property
    def roots(self): ...

# Specialized polynomial types
class Chebyshev: ...
class Legendre: ...
class Hermite: ...
class Laguerre: ...
```
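
As an illustrative sketch of fitting and evaluation, the snippet below samples the quadratic 2x² − 3x + 1, recovers its coefficients with `polyfit`, and evaluates the fit with `polyval`. The chosen polynomial, the sample grid, and the `xp` NumPy fallback are assumptions; it uses only `polyfit` and `polyval` from the list above.

```python
import numpy as np

# Assumption: fall back to NumPy when CuPy is missing or no GPU is usable.
try:
    import cupy as cp
    xp = cp if cp.cuda.is_available() else np
except Exception:
    xp = np

# Sample a known quadratic on 21 points in [-1, 1].
x = xp.linspace(-1.0, 1.0, 21)
y = 2.0 * x ** 2 - 3.0 * x + 1.0

# Least-squares fit of degree 2; coefficients come highest power first.
coeffs = xp.polyfit(x, y, 2)

# Evaluate the fitted polynomial at the sample points.
fitted = xp.polyval(coeffs, x)
max_fit_error = float(xp.max(xp.abs(fitted - y)))

print(coeffs, max_fit_error)
```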

[Polynomial Functions](./polynomial-functions.md)

### Statistics and Aggregation

Statistical functions, sorting, searching, and data aggregation operations optimized for GPU computation.

```python { .api }
# Aggregation
def sum(a, axis=None, dtype=None, out=None, keepdims=False, initial=0, where=True): ...
def mean(a, axis=None, dtype=None, out=None, keepdims=False): ...
def std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False): ...
def var(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False): ...

# Order statistics
def max(a, axis=None, out=None, keepdims=False, initial=None, where=None): ...
def min(a, axis=None, out=None, keepdims=False, initial=None, where=None): ...
def percentile(a, q, axis=None, out=None, overwrite_input=False, interpolation='linear', keepdims=False): ...

# Sorting and searching
def sort(a, axis=-1, kind=None, order=None): ...
def argsort(a, axis=-1, kind=None, order=None): ...
def searchsorted(a, v, side='left', sorter=None): ...
```
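
The following sketch applies several of the aggregation and sorting functions to a tiny 2x3 array so the `axis` argument is easy to follow. The data values and the `xp` NumPy fallback are assumptions for illustration.

```python
import numpy as np

# Assumption: fall back to NumPy when CuPy is missing or no GPU is usable.
try:
    import cupy as cp
    xp = cp if cp.cuda.is_available() else np
except Exception:
    xp = np

data = xp.asarray([[3.0, 1.0, 2.0],
                   [9.0, 7.0, 8.0]])

col_mean = xp.mean(data, axis=0)      # reduce over rows: one mean per column
row_std = xp.std(data, axis=1)        # population std (ddof=0) per row
median = xp.percentile(data, 50)      # 50th percentile of all six values

sorted_rows = xp.sort(data, axis=1)   # sort each row independently
order = xp.argsort(data, axis=1)      # indices that would sort each row

# Insertion index of 2.5 in the sorted first row; the query is passed as an array.
idx = xp.searchsorted(sorted_rows[0], xp.asarray([2.5]))

print(col_mean, float(median), int(idx[0]))
```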

*Note: Statistical functions are available directly in the `cupy` namespace.*

### SciPy Compatibility

Comprehensive SciPy-compatible functions through `cupyx.scipy`, including sparse matrices, signal processing, image processing, special functions, statistics, and advanced linear algebra.

```python { .api }
# Sparse matrices (cupyx.scipy.sparse)
class csr_matrix:
    def __init__(self, arg1, shape=None, dtype=None, copy=False): ...
    def dot(self, other): ...

class csc_matrix:
    def __init__(self, arg1, shape=None, dtype=None, copy=False): ...

# Signal processing (cupyx.scipy.signal)
def convolve(in1, in2, mode='full', method='auto'): ...
def correlate(in1, in2, mode='full', method='auto'): ...

# Image processing (cupyx.scipy.ndimage)
def gaussian_filter(input, sigma, order=0, output=None, mode='reflect', cval=0.0, truncate=4.0): ...
def rotate(input, angle, axes=(1, 0), reshape=True, output=None, order=1, mode='constant', cval=0.0, prefilter=True): ...

# Special functions (cupyx.scipy.special)
def gamma(x): ...
def erf(x): ...
def betaln(a, b): ...

# Statistics (cupyx.scipy.stats)
def ttest_ind(a, b, axis=0, equal_var=True, nan_policy='propagate', alternative='two-sided'): ...
def pearsonr(x, y): ...
```

[SciPy Compatibility](./scipy-compatibility.md)

### Testing and Validation

Comprehensive testing utilities for GPU/CPU comparison, parameterized testing, and numerical accuracy validation with specialized decorators for scientific computing workflows.

```python { .api }
# Array comparison functions
def assert_allclose(actual, desired, rtol=1e-7, atol=0, err_msg='', verbose=True): ...
def assert_array_equal(x, y, err_msg='', verbose=True): ...
def assert_array_almost_equal(x, y, decimal=6, err_msg='', verbose=True): ...
def assert_array_less(x, y, err_msg='', verbose=True): ...

# Parameterized testing decorators
def parameterize(*params, **named_params): ...
def for_all_dtypes(name='dtype', no_bool=False, no_float16=False, no_complex=False): ...
def for_float_dtypes(name='dtype', no_float16=False): ...
def for_complex_dtypes(name='dtype'): ...
def for_signed_dtypes(name='dtype'): ...
def for_unsigned_dtypes(name='dtype'): ...

# NumPy compatibility testing
def numpy_cupy_allclose(rtol=1e-7, atol=0, err_msg='', verbose=True, name='xp', type_check=True, accept_error=False, contiguous_check=True, sp_name=None): ...
def numpy_cupy_array_equal(err_msg='', verbose=True, name='xp', type_check=True, accept_error=False, contiguous_check=True, sp_name=None): ...

# Test data generation
def shaped_random(shape, xp=None, dtype=float32, scale=1): ...
def shaped_arange(shape, xp=None, dtype=float32): ...
def generate_seed(): ...

# Error testing decorators
def numpy_cupy_raises(name='xp', sp_name=None, accept_error=Exception): ...
```
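
A hedged sketch of the GPU/CPU comparison idea: the same function is run once with NumPy as the reference and once with the active array module, and the results are compared with `assert_allclose`. The helper name `normalize`, the tolerance, and the fallback to `numpy.testing` when no GPU is usable are assumptions made for this example.

```python
import numpy as np
import numpy.testing

# Assumption: use NumPy and numpy.testing when CuPy is missing or no GPU is usable.
try:
    import cupy as cp
    import cupy.testing
    use_gpu = cp.cuda.is_available()
except Exception:
    cp, use_gpu = None, False

xp = cp if use_gpu else np
testing = cp.testing if use_gpu else np.testing


def normalize(x, xp):
    """Shift to zero mean and scale to unit standard deviation (hypothetical helper)."""
    return (x - xp.mean(x)) / xp.std(x)


host = np.linspace(0.0, 1.0, 11)

expected = normalize(host, np)            # CPU reference result
actual = normalize(xp.asarray(host), xp)  # result from the module under test

# Raises AssertionError if the two results differ beyond the tolerance.
testing.assert_allclose(actual, expected, rtol=1e-6)
print("results match")
```

Writing the function against an `xp` parameter mirrors the `name='xp'` convention used by the `numpy_cupy_*` decorators listed above.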

*Note: The full testing framework is available in the `cupy.testing` module.*

## Core Classes

```python { .api }
class ndarray:
    """GPU array class providing NumPy-compatible interface"""
    def __init__(self): ...
    @property
    def shape(self): ...
    @property
    def dtype(self): ...
    @property
    def size(self): ...
    def get(self, stream=None, order='C', out=None): ...
    def set(self, arr, stream=None): ...

class ufunc:
    """Universal function for element-wise operations"""
    def __call__(self, *args, **kwargs): ...
    def reduce(self, a, axis=0, dtype=None, out=None, keepdims=False, initial=None, where=True): ...
```

## Data Conversion

```python { .api }
def asnumpy(a, stream=None, order='C', out=None):
    """Convert CuPy array to NumPy array on CPU"""

def asarray(a, dtype=None, order=None):
    """Convert input to CuPy array"""

def get_array_module(*args):
    """Get appropriate array module (cupy/numpy) based on input types"""
```
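
One common use of `get_array_module` is writing a function that works on either NumPy or CuPy arrays. The sketch below shows that pattern; the helper names `array_module` and `l2_normalize` are hypothetical, and the branch that skips the GPU when none is usable is an assumption so the snippet runs anywhere.

```python
import numpy as np

# Assumption: CuPy may be missing or have no usable GPU; handle both cases.
try:
    import cupy as cp
    gpu_ok = cp.cuda.is_available()
except Exception:
    cp, gpu_ok = None, False


def array_module(x):
    """Return cupy for CuPy arrays and numpy otherwise (hypothetical helper)."""
    return cp.get_array_module(x) if cp is not None else np


def l2_normalize(x):
    """Scale x to unit Euclidean length, on whichever device holds it."""
    xp = array_module(x)
    return x / xp.sqrt(xp.sum(x * x))


host = np.array([3.0, 4.0])
host_result = l2_normalize(host)            # computed with NumPy on the CPU

if gpu_ok:
    gpu_input = cp.asarray(host)            # CPU to GPU
    device_result = cp.asnumpy(l2_normalize(gpu_input))  # GPU to CPU
else:
    device_result = host_result

print(host_result, device_result)
```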

## Memory Management

```python { .api }
def get_default_memory_pool():
    """Get the default GPU memory pool"""

def get_default_pinned_memory_pool():
    """Get the default pinned memory pool"""

class MemoryPool:
    def malloc(self, size): ...
    def free_all_blocks(self): ...
    def used_bytes(self): ...
    def total_bytes(self): ...
```

## Utilities

```python { .api }
def is_available():
    """Check if CUDA is available"""

def show_config(*, _full=False):
    """Print runtime configuration"""

def clear_memo():
    """Clear memoization cache"""
```