
tessl/pypi-cupy

NumPy & SciPy-compatible array library for GPU-accelerated computing with Python

- **Workspace**: tessl
- **Visibility**: Public
- **Describes**: pypipkg:pypi/cupy@13.6.x

To install, run

npx @tessl/cli install tessl/pypi-cupy@13.6.0

# CuPy

CuPy is a NumPy/SciPy-compatible array library for GPU-accelerated computing with Python. It is designed as a drop-in replacement that runs existing NumPy/SciPy code on NVIDIA CUDA or AMD ROCm platforms, and it can substantially speed up mathematical computations, linear algebra, and scientific computing workloads.

## Package Information

- **Package Name**: cupy
- **Language**: Python
- **Installation**: `pip install cupy` (builds from source), or a prebuilt wheel such as `cupy-cuda11x` or `cupy-cuda12x` that matches your CUDA version

## Core Imports

```python
import cupy as cp
```

For CUDA-specific functionality:

```python
import cupy.cuda as cuda
```

For extended functionality:

```python
import cupyx
```

## Basic Usage

```python
import cupy as cp
import numpy as np

# Create arrays on GPU
x_gpu = cp.array([1, 2, 3, 4, 5])
y_gpu = cp.zeros((3, 3))

# Perform operations on GPU (same API as NumPy)
result_gpu = cp.sum(x_gpu)
z_gpu = cp.dot(x_gpu, x_gpu)

# Transfer data between CPU and GPU
x_cpu = cp.asnumpy(x_gpu)  # GPU to CPU
x_gpu_from_cpu = cp.asarray(x_cpu)  # CPU to GPU

# Linear algebra operations
A = cp.random.random((1000, 1000))
B = cp.random.random((1000, 1000))
C = cp.dot(A, B)  # Performed on GPU

# Element-wise operations with broadcasting
result = cp.sqrt(A) + cp.sin(B)
```
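Code that should run on either NumPy or CuPy arrays can pick the array module at call time. The following is a minimal sketch of that pattern using `cupy.get_array_module`. The `softmax` and `to_host` helpers are illustrative names rather than part of CuPy, and the NumPy fallback is an assumption added so the sketch also runs on machines without CuPy or a CUDA device.

```python
import numpy as np

try:
    import cupy as cp
    xp = cp if cp.cuda.is_available() else np
except ImportError:
    cp = None
    xp = np  # CPU fallback so the sketch runs without CuPy


def softmax(x):
    """Numerically stable softmax for NumPy or CuPy input (illustrative helper)."""
    # get_array_module returns cupy for GPU arrays and numpy otherwise.
    mod = cp.get_array_module(x) if cp is not None else np
    shifted = x - mod.max(x, axis=-1, keepdims=True)
    e = mod.exp(shifted)
    return e / mod.sum(e, axis=-1, keepdims=True)


def to_host(a):
    """Return a NumPy array regardless of where `a` lives (illustrative helper)."""
    return a.get() if hasattr(a, "get") else np.asarray(a)


logits = xp.asarray([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]])
probs = to_host(softmax(logits))
print(probs.sum(axis=-1))  # each row sums to 1, up to rounding
```

Writing functions against the module returned by `get_array_module` keeps a single code path for CPU and GPU callers.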

## Architecture

CuPy's architecture mirrors NumPy while enabling GPU acceleration:

- **ndarray**: GPU-accelerated equivalent of NumPy arrays, supporting the same interface and operations
- **CUDA Memory Management**: Automatic memory pooling and allocation on GPU devices
- **Universal Functions (ufuncs)**: Element-wise operations optimized for parallel GPU execution
- **Kernel System**: Custom CUDA kernels for specialized operations not covered by standard functions
- **Stream Management**: CUDA streams for asynchronous execution and memory operations
- **Multi-GPU Support**: Distribution of computations across multiple GPU devices

This design eases migration from NumPy to GPU computing: most NumPy code runs after swapping the import, and CUDA-specific features remain available where extra performance is needed.
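The memory pool and stream components listed above can be exercised directly. The sketch below is one possible way to do so, assuming a CUDA device is present. The `report` dictionary is an illustrative name, and the whole GPU section is skipped when CuPy or a device is unavailable.

```python
try:
    import cupy as cp
    HAS_GPU = cp.cuda.is_available()
except ImportError:
    HAS_GPU = False

report = {}
if HAS_GPU:
    pool = cp.get_default_memory_pool()
    stream = cp.cuda.Stream(non_blocking=True)
    with stream:  # work issued in this block is queued on `stream`
        a = cp.arange(1_000_000, dtype=cp.float32)
        total = a.sum()
    stream.synchronize()  # wait for the queued work to finish
    report["total"] = float(total)
    report["used_bytes"] = pool.used_bytes()  # bytes currently held by live arrays
    del a, total
    pool.free_all_blocks()  # return cached, unused blocks to the driver
    report["used_bytes_after_free"] = pool.used_bytes()
else:
    report["skipped"] = "no CUDA device available"

print(report)
```

The pool caches freed blocks for reuse, so `free_all_blocks()` is mainly useful when GPU memory has to be handed back to other processes or libraries.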

## Capabilities

### Core Array Class

The fundamental `ndarray` class provides GPU-accelerated multi-dimensional arrays.

```python { .api }
class ndarray:
    """
    GPU-accelerated multi-dimensional array object.

    Attributes:
    - shape: tuple, dimensions of the array
    - dtype: data type of array elements
    - size: int, total number of elements
    - ndim: int, number of dimensions
    - itemsize: int, size of each element in bytes
    - nbytes: int, total bytes consumed by elements
    - device: cupy.cuda.Device, GPU device where array resides
    """

    def __init__(self, shape, dtype=float, order='C'): ...
    def astype(self, dtype, order='K', casting='unsafe', subok=True, copy=True): ...
    def copy(self, order='C'): ...
    def flatten(self, order='C'): ...
    def ravel(self, order='C'): ...
    def reshape(self, *shape, order='C'): ...
    def squeeze(self, axis=None): ...
    def transpose(self, *axes): ...
    def swapaxes(self, axis1, axis2): ...
    def get(self, stream=None, order='C', out=None): ...
    def set(self, arr, stream=None): ...
    def sum(self, axis=None, dtype=None, out=None, keepdims=False): ...
    def mean(self, axis=None, dtype=None, out=None, keepdims=False): ...
    def std(self, axis=None, dtype=None, out=None, ddof=0, keepdims=False): ...
    def var(self, axis=None, dtype=None, out=None, ddof=0, keepdims=False): ...
    def max(self, axis=None, out=None, keepdims=False, initial=None, where=None): ...
    def min(self, axis=None, out=None, keepdims=False, initial=None, where=None): ...
    def dot(self, b, out=None): ...
    def sort(self, axis=-1, kind=None, order=None): ...
    def argsort(self, axis=-1, kind=None, order=None): ...
```
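As a brief illustration of the attributes and methods listed above, the sketch below inspects a small array and copies it back to the host. It falls back to NumPy when CuPy or a CUDA device is unavailable (an assumption added for portability), which is why the `get()` call is guarded.

```python
import numpy as np

try:
    import cupy as cp
    xp = cp if cp.cuda.is_available() else np
except ImportError:
    xp = np  # CPU fallback so the sketch runs without CuPy

x = xp.arange(12, dtype=xp.float32).reshape(3, 4)

# Metadata lives on the host, so reading it does not copy array data.
info = (x.shape, x.ndim, x.size, x.itemsize, x.nbytes)
print(info)  # ((3, 4), 2, 12, 4, 48)

col_means = x.mean(axis=0)         # reduction along rows
as_int = x.astype(xp.int32)        # dtype conversion creates a new array
transposed_shape = x.transpose().shape

# ndarray.get() copies a CuPy array to a NumPy array; NumPy arrays have no get().
host = x.get() if hasattr(x, "get") else x
print(type(host).__name__, col_means.tolist())
```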

### Array Creation and Manipulation

Core functionality for creating, reshaping, and manipulating GPU arrays with the same interface as NumPy.

```python { .api }
def array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0): ...
def zeros(shape, dtype=float, order='C'): ...
def ones(shape, dtype=float, order='C'): ...
def empty(shape, dtype=float, order='C'): ...
def arange(start, stop=None, step=1, dtype=None): ...
def linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0): ...
def reshape(a, newshape, order='C'): ...
def concatenate(arrays, axis=0, out=None, dtype=None, casting='same_kind'): ...
```

[Array Creation and Manipulation](./array-creation.md)

### Array Manipulation and Reshaping

Shape manipulation, joining, splitting, and rearranging array operations.

```python { .api }
def reshape(a, newshape, order='C'): ...
def ravel(a, order='C'): ...
def transpose(a, axes=None): ...
def moveaxis(a, source, destination): ...
def swapaxes(a, axis1, axis2): ...
def squeeze(a, axis=None): ...
def expand_dims(a, axis): ...
def atleast_1d(*arys): ...
def atleast_2d(*arys): ...
def atleast_3d(*arys): ...
def stack(arrays, axis=0, out=None): ...
def vstack(tup): ...
def hstack(tup): ...
def dstack(tup): ...
def split(ary, indices_or_sections, axis=0): ...
def hsplit(ary, indices_or_sections): ...
def vsplit(ary, indices_or_sections): ...
def repeat(a, repeats, axis=None): ...
def tile(A, reps): ...
def flip(m, axis=None): ...
def roll(a, shift, axis=None): ...
```

### Mathematical Operations

Element-wise mathematical functions including trigonometric, logarithmic, arithmetic, and comparison operations.

```python { .api }
def add(x1, x2, /, out=None): ...
def multiply(x1, x2, /, out=None): ...
def sin(x, /, out=None): ...
def cos(x, /, out=None): ...
def exp(x, /, out=None): ...
def log(x, /, out=None): ...
def sqrt(x, /, out=None): ...
def maximum(x1, x2, /, out=None): ...
def sum(a, axis=None, dtype=None, out=None, keepdims=False, initial=None, where=None): ...
```

[Mathematical Operations](./math-functions.md)

### Linear Algebra

GPU-accelerated linear algebra operations including matrix multiplication, decompositions, eigenvalue computation, and solving linear systems.

```python { .api }
def dot(a, b, out=None): ...
def matmul(x1, x2, /, out=None, *, casting='same_kind', order='K', dtype=None, subok=True): ...
def einsum(subscripts, *operands, out=None, dtype=None, order='K', casting='safe', optimize=False): ...
```

From `cupy.linalg`:

```python { .api }
def norm(x, ord=None, axis=None, keepdims=False): ...
def svd(a, full_matrices=True, compute_uv=True, hermitian=False): ...
def inv(a): ...
def solve(a, b): ...
def eigh(a, UPLO='L'): ...
```

[Linear Algebra](./linear-algebra.md)
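A common use of these routines is solving a linear system and checking the residual. The sketch below does this with `solve` and `norm`. The test matrix is made diagonally dominant purely so that it is well conditioned, and the NumPy fallback is an assumption added so the code runs without a GPU.

```python
import numpy as np

try:
    import cupy as cp
    xp = cp if cp.cuda.is_available() else np
except ImportError:
    xp = np  # CPU fallback so the sketch runs without CuPy

# Build the system on the host with a fixed seed, then move it to the device.
rng = np.random.default_rng(0)
A_host = rng.random((4, 4)) + 4.0 * np.eye(4)  # diagonally dominant, hence nonsingular
b_host = rng.random(4)

A = xp.asarray(A_host)
b = xp.asarray(b_host)

x = xp.linalg.solve(A, b)                    # solves A @ x = b
residual = float(xp.linalg.norm(A @ x - b))  # should be close to zero
print(f"residual norm: {residual:.2e}")
```

Preferring `solve` over forming `inv(A) @ b` avoids an explicit inverse and is generally more accurate.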

### Random Number Generation

GPU-accelerated random number generation with multiple generators and probability distributions.

```python { .api }
def rand(*args): ...
def randn(*args): ...
def randint(low, high=None, size=None, dtype=int): ...
def random_sample(size=None): ...
def normal(loc=0.0, scale=1.0, size=None): ...
def uniform(low=0.0, high=1.0, size=None): ...
def choice(a, size=None, replace=True, p=None): ...
```

Generator API:

```python { .api }
def default_rng(seed=None): ...

class Generator:
    def random(self, size=None, dtype=float64, out=None): ...
    def integers(self, low, high=None, size=None, dtype=int64, endpoint=False): ...
```

[Random Number Generation](./random.md)
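The Generator API is convenient when results need to be reproducible. The sketch below seeds two generators identically and checks that they produce the same stream. The seed value and variable names are illustrative, and NumPy's `default_rng` stands in when CuPy or a CUDA device is unavailable.

```python
import numpy as np

try:
    import cupy as cp
    xp = cp if cp.cuda.is_available() else np
except ImportError:
    xp = np  # CPU fallback so the sketch runs without CuPy

rng1 = xp.random.default_rng(42)
rng2 = xp.random.default_rng(42)

a = rng1.random(5)  # uniform samples in [0, 1)
b = rng2.random(5)  # same seed, so the same values are expected
same = bool(xp.all(a == b))

# integers() excludes `high` by default, so this simulates a six-sided die.
dice = rng1.integers(1, 7, size=1000)
print(same, int(dice.min()), int(dice.max()))
```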

### Fast Fourier Transform

GPU-accelerated discrete Fourier transforms for signal processing and frequency domain analysis.

```python { .api }
def fft(a, n=None, axis=-1, norm=None): ...
def ifft(a, n=None, axis=-1, norm=None): ...
def fft2(a, s=None, axes=(-2, -1), norm=None): ...
def fftn(a, s=None, axes=None, norm=None): ...
def rfft(a, n=None, axis=-1, norm=None): ...
def fftshift(x, axes=None): ...
def fftfreq(n, d=1.0): ...
```

[Fast Fourier Transform](./fft.md)
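To illustrate frequency domain analysis, the sketch below finds the dominant frequency of a synthetic two-tone signal with `rfft`. The sample rate and tone frequencies are made-up values for the example. `rfftfreq` is used alongside the functions listed above, and NumPy is used when CuPy or a CUDA device is unavailable.

```python
import numpy as np

try:
    import cupy as cp
    xp = cp if cp.cuda.is_available() else np
except ImportError:
    xp = np  # CPU fallback so the sketch runs without CuPy

fs = 1000               # sample rate in Hz (illustrative)
t = xp.arange(fs) / fs  # one second of samples

# A 50 Hz tone plus a weaker 120 Hz tone.
signal = xp.sin(2 * np.pi * 50 * t) + 0.5 * xp.sin(2 * np.pi * 120 * t)

spectrum = xp.abs(xp.fft.rfft(signal))          # magnitude of the one-sided spectrum
freqs = xp.fft.rfftfreq(signal.size, d=1 / fs)  # matching frequency bins in Hz

peak_hz = float(freqs[int(xp.argmax(spectrum))])
print(peak_hz)  # 50.0
```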

### CUDA Memory and Device Management

Low-level CUDA functionality for memory allocation, device management, and stream operations.

```python { .api }
def get_default_memory_pool(): ...
def get_default_pinned_memory_pool(): ...
def is_available(): ...
def asnumpy(a, stream=None, order='C', out=None, *, blocking=True): ...
def get_array_module(*args): ...
```

From `cupy.cuda`:

```python { .api }
class Device:
    def __init__(self, device=None): ...
    def __enter__(self): ...
    def __exit__(self, *args): ...

class Stream:
    def __init__(self, null=False, non_blocking=False, priority=0): ...
    def synchronize(self): ...

class MemoryPool:
    def __init__(self, allocator=None): ...
    def malloc(self, size): ...
    def free_all_blocks(self): ...
```

[CUDA Memory and Device Management](./cuda-management.md)

### Custom Kernels and Performance

Tools for writing custom CUDA kernels and optimizing GPU performance.

```python { .api }
class ElementwiseKernel:
    def __init__(self, in_params, out_params, operation, name='kernel', **kwargs): ...
    def __call__(self, *args, **kwargs): ...

class ReductionKernel:
    def __init__(self, in_params, out_params, map_expr, reduce_expr, post_map_expr, identity, name='kernel', **kwargs): ...
    def __call__(self, *args, **kwargs): ...

class RawKernel:
    def __init__(self, code, name, **kwargs): ...
    def __call__(self, grid, block, args, *, shared_mem=0, stream=None): ...
```

[Custom Kernels and Performance](./kernels.md)
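As a small example of the kernel classes above, the sketch below defines an `ElementwiseKernel` that computes a squared difference. The kernel name `squared_diff` and the input values are illustrative. Compiling the kernel needs a working CUDA toolchain at runtime, so on machines without a usable device the sketch computes the same result with NumPy instead.

```python
import numpy as np

try:
    import cupy as cp
    HAS_GPU = cp.cuda.is_available()
except ImportError:
    HAS_GPU = False

x_host = np.arange(5, dtype=np.float32)      # [0, 1, 2, 3, 4]
y_host = np.full(5, 2.0, dtype=np.float32)   # [2, 2, 2, 2, 2]

if HAS_GPU:
    # in_params, out_params, per-element operation, kernel name
    squared_diff = cp.ElementwiseKernel(
        "float32 x, float32 y",
        "float32 z",
        "z = (x - y) * (x - y)",
        "squared_diff",
    )
    z = cp.asnumpy(squared_diff(cp.asarray(x_host), cp.asarray(y_host)))
else:
    z = (x_host - y_host) ** 2  # CPU reference producing the same values

print(z)  # [4. 1. 0. 1. 4.]
```

The kernel is compiled on first call and cached, so repeated calls with the same argument types avoid recompilation.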

### Statistics and Data Analysis

Statistical functions for data analysis including descriptive statistics, correlations, and histograms.

```python { .api }
def mean(a, axis=None, dtype=None, out=None, keepdims=False): ...
def std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False): ...
def var(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False): ...
def median(a, axis=None, out=None, overwrite_input=False, keepdims=False): ...
def corrcoef(x, y=None, rowvar=True, bias=None, ddof=None): ...
def histogram(a, bins=10, range=None, weights=None, density=None): ...
def percentile(a, q, axis=None, out=None, overwrite_input=False, method='linear', keepdims=False): ...
```

[Statistics and Data Analysis](./statistics.md)
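The sketch below applies a few of these functions to a small, made-up data set whose statistics are easy to verify by hand. It uses NumPy when CuPy or a CUDA device is unavailable, an assumption added so the example runs anywhere.

```python
import numpy as np

try:
    import cupy as cp
    xp = cp if cp.cuda.is_available() else np
except ImportError:
    xp = np  # CPU fallback so the sketch runs without CuPy

data = xp.asarray([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean = float(xp.mean(data))      # 5.0
std = float(xp.std(data))        # 2.0 (population standard deviation, ddof=0)
median = float(xp.median(data))  # 4.5

# Four equal-width bins over [1, 9]: edges at 1, 3, 5, 7, 9.
counts, edges = xp.histogram(data, bins=4, range=(1, 9))
print(mean, std, median, counts.tolist())  # 5.0 2.0 4.5 [1, 3, 2, 2]
```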

### Indexing and Selection

Advanced indexing operations including multi-dimensional indexing, selection, and array generation utilities.

```python { .api }
def take(a, indices, axis=None, out=None, mode='raise'): ...
def take_along_axis(arr, indices, axis): ...
def choose(a, choices, out=None, mode='raise'): ...
def compress(condition, a, axis=None, out=None): ...
def extract(condition, arr): ...
def select(condlist, choicelist, default=0): ...
def indices(dimensions, dtype=int, sparse=False): ...
def ix_(*args): ...
def ravel_multi_index(multi_index, dims, mode='raise', order='C'): ...
def unravel_index(indices, shape, order='C'): ...
def diagonal(a, offset=0, axis1=0, axis2=1): ...
def diag_indices(n, ndim=2): ...
def triu_indices(n, k=0, m=None): ...
def tril_indices(n, k=0, m=None): ...
```

### Sparse Matrix Operations

GPU-accelerated sparse matrix operations for large-scale scientific computing.

From `cupyx.scipy.sparse`:

```python { .api }
class csr_matrix:
    def __init__(self, arg1, shape=None, dtype=None, copy=False): ...
    def dot(self, other): ...
    def transpose(self, axes=None, copy=False): ...

class csc_matrix:
    def __init__(self, arg1, shape=None, dtype=None, copy=False): ...

class coo_matrix:
    def __init__(self, arg1, shape=None, dtype=None, copy=False): ...
```

[Sparse Matrix Operations](./sparse.md)

### SciPy Compatibility Extensions

Extended scientific computing functions from `cupyx.scipy` for advanced mathematical operations.

From `cupyx.scipy`:

```python { .api }
# Signal processing (cupyx.scipy.signal)
def convolve(in1, in2, mode='full', method='auto'): ...
def correlate(in1, in2, mode='full', method='auto'): ...

# Image processing (cupyx.scipy.ndimage)
def gaussian_filter(input, sigma, order=0, output=None, mode='reflect', cval=0.0, truncate=4.0): ...
def sobel(input, axis=-1, output=None, mode='reflect', cval=0.0): ...

# Optimization
def minimize(fun, x0, args=(), method=None, jac=None, bounds=None, constraints=()): ...
```

[SciPy Extensions](./scipy-extensions.md)

### Input/Output Operations

File operations for loading and saving arrays in various formats.

```python { .api }
def load(file, mmap_mode=None, allow_pickle=None): ...
def save(file, arr, allow_pickle=None): ...
def savez(file, *args, **kwds): ...
def savez_compressed(file, *args, **kwds): ...
def savetxt(fname, X, fmt='%.18e', delimiter=' ', newline='\n', header='', footer='', comments='# ', encoding=None): ...
def loadtxt(fname, dtype=float, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0, encoding='bytes', max_rows=None): ...
```
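A save and load round trip can be sketched as follows. The file name `arr.npy` and the temporary directory are illustrative choices, and NumPy's `save` and `load` are used when CuPy or a CUDA device is unavailable.

```python
import os
import tempfile

import numpy as np

try:
    import cupy as cp
    xp = cp if cp.cuda.is_available() else np
except ImportError:
    xp = np  # CPU fallback so the sketch runs without CuPy

arr = xp.arange(6).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "arr.npy")
    xp.save(path, arr)     # written in NumPy's .npy format
    loaded = xp.load(path) # read back as an array of the active module

roundtrip_ok = bool(xp.array_equal(arr, loaded))
print(roundtrip_ok, loaded.shape)
```

Because the on-disk format is NumPy's `.npy`, a file written from a GPU array can also be read with plain NumPy on a machine without a GPU.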

### Logic and Comparison Functions

Element-wise logical operations, truth value testing, and array comparison functions.

```python { .api }
def allclose(a, b, rtol=1e-05, atol=1e-08, equal_nan=False): ...
def array_equal(a1, a2, equal_nan=False): ...
def array_equiv(a1, a2): ...
def isclose(a, b, rtol=1e-05, atol=1e-08, equal_nan=False): ...
def isfinite(x, /, out=None): ...
def isinf(x, /, out=None): ...
def isnan(x, /, out=None): ...
def isreal(x): ...
def iscomplex(x): ...
def in1d(ar1, ar2, assume_unique=False, invert=False): ...
def isin(element, test_elements, assume_unique=False, invert=False): ...
def intersect1d(ar1, ar2, assume_unique=False, return_indices=False): ...
def setdiff1d(ar1, ar2, assume_unique=False): ...
def union1d(ar1, ar2): ...
```

### Binary Operations

Bitwise operations and binary representations.

```python { .api }
def bitwise_and(x1, x2, /, out=None): ...
def bitwise_or(x1, x2, /, out=None): ...
def bitwise_xor(x1, x2, /, out=None): ...
def bitwise_not(x, /, out=None): ...
def left_shift(x1, x2, /, out=None): ...
def right_shift(x1, x2, /, out=None): ...
def packbits(a, axis=None, bitorder='big'): ...
def unpackbits(a, axis=None, count=None, bitorder='big'): ...
```

## Error Handling

CuPy mirrors NumPy's exception and warning classes, and its CUDA layer raises additional CUDA-specific exceptions. The NumPy-compatible classes are:

```python { .api }
class AxisError(Exception): ...
class ComplexWarning(Warning): ...
class TooHardError(Exception): ...
class VisibleDeprecationWarning(Warning): ...
```

CUDA-related failures, such as GPU memory exhaustion, device incompatibility, and kernel execution problems, are raised as Python exceptions with descriptive messages to aid debugging.
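Handling an invalid axis might look like the sketch below. The `safe_sum` helper is a hypothetical name used only for illustration. It catches `ValueError` and `IndexError` because NumPy's `AxisError` derives from both, and the assumption here is that CuPy raises the same class for an out-of-range axis. NumPy is used when CuPy or a CUDA device is unavailable.

```python
import numpy as np

try:
    import cupy as cp
    xp = cp if cp.cuda.is_available() else np
except ImportError:
    xp = np  # CPU fallback so the sketch runs without CuPy


def safe_sum(a, axis):
    """Sum along `axis`, returning None if the axis is invalid (hypothetical helper)."""
    try:
        return xp.sum(a, axis=axis)
    except (ValueError, IndexError) as exc:  # AxisError subclasses both
        print(f"invalid axis {axis}: {exc}")
        return None


ok = safe_sum(xp.ones((2, 2)), 0)    # valid axis: column sums
bad = safe_sum(xp.zeros((2, 2)), 5)  # axis 5 does not exist for a 2-D array
print(ok.tolist(), bad)
```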