# CuPy: NumPy & SciPy for GPU

CuPy is a NumPy/SciPy-compatible array library for GPU-accelerated computing with Python. It acts as a drop-in replacement that runs existing NumPy/SciPy code on NVIDIA CUDA or AMD ROCm platforms while keeping familiar NumPy syntax and semantics.

## Package Information

- **Package Name**: cupy-cuda114
- **Package Type**: Library
- **Language**: Python
- **Installation**: `pip install cupy-cuda114`
- **CUDA Target**: CUDA 11.4

## Core Imports

```python
import cupy as cp
```

For compatibility with NumPy code:

```python
import cupy as np  # Drop-in replacement for numpy
```

Access specific modules:

```python
from cupy import cuda, random, linalg, fft
```

## Basic Usage

```python
import cupy as cp

# Create arrays on GPU
x = cp.arange(6).reshape(2, 3).astype('f')
print(x)
# [[0. 1. 2.]
#  [3. 4. 5.]]

# Perform computations on GPU
result = x.sum(axis=1)
print(result)
# [ 3. 12.]

# Convert to NumPy array on CPU
cpu_result = cp.asnumpy(result)

# Mathematical operations
y = cp.sin(x) * cp.cos(x)
z = cp.sqrt(x**2 + y**2)

# Linear algebra
A = cp.random.random((1000, 1000))
B = cp.random.random((1000, 1000))
C = cp.dot(A, B)  # Matrix multiplication on GPU

# Fast Fourier Transform
signal = cp.random.random(1024)
fft_result = cp.fft.fft(signal)
```

## Architecture

CuPy is organized into the following components:

- **Core Arrays**: GPU-accelerated `ndarray` objects with a NumPy-compatible API
- **Mathematical Functions**: Element-wise operations, reductions, and special functions
- **Linear Algebra**: High-performance matrix operations using cuBLAS and cuSOLVER
- **CUDA Integration**: Direct access to CUDA features including memory management, streams, and custom kernels
- **SciPy Extensions**: GPU implementations of SciPy functionality for sparse matrices, signal processing, and more
- **Performance Tools**: Memory pools, kernel fusion, JIT compilation, and profiling utilities

## Capabilities

### Array Creation and Manipulation

Array creation functions and manipulation operations compatible with NumPy, enabling easy migration of existing code to GPU acceleration.

```python { .api }
def array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0): ...
def zeros(shape, dtype=float, order='C'): ...
def ones(shape, dtype=float, order='C'): ...
def empty(shape, dtype=float, order='C'): ...
def arange(start, stop=None, step=1, dtype=None): ...
def linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0): ...
def reshape(a, newshape, order='C'): ...
def transpose(a, axes=None): ...
def concatenate(arrays, axis=0, out=None, dtype=None, casting="same_kind"): ...
```

[Array Operations](./array-operations.md)
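
As an illustrative sketch (not taken from the CuPy documentation), the snippet below exercises a few of the creation and manipulation functions listed above. The `xp` alias and the NumPy fallback are assumptions made here so the example also runs on a machine without a GPU; with a working CuPy installation the same calls execute on the device.

```python
import numpy as np

try:
    import cupy as cp
    # Use CuPy only when a CUDA device is actually usable.
    xp = cp if cp.cuda.is_available() else np
except Exception:  # CuPy not installed or driver unavailable
    xp = np

# Create a 2x3 float32 array and a zero block of the same shape.
x = xp.arange(6, dtype=xp.float32).reshape(2, 3)
pad = xp.zeros((2, 3), dtype=xp.float32)

# Stack along the first axis, then swap the axes.
stacked = xp.concatenate([x, pad], axis=0)
flipped = xp.transpose(stacked)

# Five evenly spaced points in [0, 1].
grid = xp.linspace(0.0, 1.0, num=5)

print(stacked.shape, flipped.shape)  # (4, 3) (3, 4)
```

Because every call goes through `xp`, the same code path can be checked on the CPU before moving it to the GPU.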

### Mathematical Functions

Mathematical operations including trigonometric, hyperbolic, exponential, logarithmic, arithmetic, and special functions, all optimized for GPU execution.

```python { .api }
def sin(x, out=None, **kwargs): ...
def cos(x, out=None, **kwargs): ...
def exp(x, out=None, **kwargs): ...
def log(x, out=None, **kwargs): ...
def sqrt(x, out=None, **kwargs): ...
def add(x1, x2, out=None, **kwargs): ...
def multiply(x1, x2, out=None, **kwargs): ...
def sum(a, axis=None, dtype=None, out=None, keepdims=False, initial=None, where=None): ...
def mean(a, axis=None, dtype=None, out=None, keepdims=False, where=None): ...
```

[Mathematical Functions](./mathematical-functions.md)

### Linear Algebra

High-performance linear algebra operations leveraging cuBLAS and cuSOLVER libraries for matrix operations, decompositions, and solving linear systems.

```python { .api }
def dot(a, b, out=None): ...
def matmul(x1, x2, out=None, **kwargs): ...
def norm(x, ord=None, axis=None, keepdims=False): ...
def solve(a, b): ...
def inv(a): ...
def svd(a, full_matrices=True, compute_uv=True): ...
def eigh(a, UPLO='L'): ...
def cholesky(a): ...
```

[Linear Algebra](./linear-algebra.md)
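
A minimal sketch of how `solve`, `norm`, and `cholesky` fit together, using a small hand-picked system. The matrix values and the `xp` fallback to NumPy are assumptions for illustration only; the functions are accessed through the `linalg` submodule.

```python
import numpy as np

try:
    import cupy as cp
    xp = cp if cp.cuda.is_available() else np
except Exception:  # CuPy not installed or driver unavailable
    xp = np

# Symmetric positive-definite system: 3x + y = 9, x + 2y = 8.
a = xp.array([[3.0, 1.0], [1.0, 2.0]])
b = xp.array([9.0, 8.0])

# Solve a @ sol = b and measure the residual norm.
sol = xp.linalg.solve(a, b)
residual = float(xp.linalg.norm(a @ sol - b))

# Cholesky factor L is lower triangular with L @ L.T == a.
chol = xp.linalg.cholesky(a)
reconstruction_error = float(xp.linalg.norm(chol @ chol.T - a))

print(sol, residual, reconstruction_error)
```

Checking the residual rather than comparing against a hard-coded answer is a cheap way to validate GPU results when the inputs are not known in advance.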

### CUDA Integration

Direct access to CUDA features including device management, memory allocation, streams, events, and custom kernel compilation for advanced GPU programming.

```python { .api }
class Device:
    def __init__(self, device=None): ...

class Stream:
    def __init__(self, null=False, non_blocking=False, ptds=False): ...

class MemoryPool:
    def __init__(self, allocator=None): ...

def compile_with_cache(source, name, options=(), arch=None, cache_dir=None, prepend_cupy_headers=True, backend='nvcc', translate_cucomplex=True, enable_cooperative_groups=False, name_expressions=None, log_stream=None, cache_in_memory=False, jitify=False): ...
```

[CUDA Integration](./cuda-integration.md)
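
The sketch below shows one plausible way to combine `Device`, `Stream`, and the default memory pool. It assumes device index `0` and is guarded so that it simply skips the GPU work when no CUDA device is usable; the variable names are illustrative.

```python
try:
    import cupy as cp
    gpu_ok = cp.cuda.is_available()
except Exception:  # CuPy not installed or driver unavailable
    cp, gpu_ok = None, False

total = None
if gpu_ok:
    with cp.cuda.Device(0):                      # select the first GPU
        stream = cp.cuda.Stream(non_blocking=True)
        with stream:                             # enqueue work on this stream
            x = cp.arange(1000, dtype=cp.float32)
            partial = x.sum()
        stream.synchronize()                     # wait for the stream to finish
        total = float(partial)                   # copy the scalar to the host

        pool = cp.get_default_memory_pool()
        print("bytes in use:", pool.used_bytes())
        del x, partial
        pool.free_all_blocks()                   # return cached blocks to the driver
else:
    print("No CUDA device found; skipping the GPU example")
```

Freeing pool blocks is rarely needed in normal use; it is shown here only to illustrate that CuPy caches device allocations instead of releasing them immediately.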

### Random Number Generation

Random number generation with support for multiple algorithms, distributions, and GPU-accelerated sampling for scientific computing and simulation.

```python { .api }
def random(size=None): ...
def randn(*args): ...
def randint(low, high=None, size=None, dtype=int): ...
def normal(loc=0.0, scale=1.0, size=None): ...
def uniform(low=0.0, high=1.0, size=None): ...
def choice(a, size=None, replace=True, p=None): ...

class Generator:
    def __init__(self, bit_generator): ...
```

[Random Number Generation](./random-generation.md)
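
To illustrate GPU-side sampling, here is a small Monte Carlo estimate of pi built on `uniform`. The sample count, the seed, and the NumPy fallback behind `xp` are assumptions chosen for this sketch.

```python
import numpy as np

try:
    import cupy as cp
    xp = cp if cp.cuda.is_available() else np
except Exception:  # CuPy not installed or driver unavailable
    xp = np

xp.random.seed(0)  # make the run repeatable on a given backend

# Draw points uniformly in the square [-1, 1] x [-1, 1].
points = xp.random.uniform(-1.0, 1.0, size=(100_000, 2))

# The fraction that lands inside the unit circle approximates pi / 4.
inside = (points ** 2).sum(axis=1) <= 1.0
pi_estimate = 4.0 * float(inside.mean())

print(pi_estimate)
```

Only the final scalar is transferred to the host, so the sampling and the reduction both stay on the device when CuPy is active.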

### Fast Fourier Transform

GPU-accelerated FFT operations supporting 1D, 2D, and N-dimensional transforms for both complex and real data.

```python { .api }
def fft(a, n=None, axis=-1, norm=None): ...
def ifft(a, n=None, axis=-1, norm=None): ...
def rfft(a, n=None, axis=-1, norm=None): ...
def fft2(a, s=None, axes=(-2, -1), norm=None): ...
def fftn(a, s=None, axes=None, norm=None): ...
def fftfreq(n, d=1.0): ...
```

[Fast Fourier Transform](./fft.md)
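
As a hedged example of the real-input transforms, the snippet below locates the dominant frequency of a synthetic 50 Hz sine wave and checks the inverse transform. The signal parameters are made up for illustration, and `rfftfreq` and `irfft` are used in addition to the functions listed above.

```python
import numpy as np

try:
    import cupy as cp
    xp = cp if cp.cuda.is_available() else np
except Exception:  # CuPy not installed or driver unavailable
    xp = np

n = 1024           # number of samples
d = 1.0 / n        # sample spacing: one second of data in total
t = xp.arange(n) * d
signal = xp.sin(2.0 * np.pi * 50.0 * t)   # 50 Hz tone

# Real-input FFT and the matching frequency axis.
spectrum = xp.fft.rfft(signal)
freqs = xp.fft.rfftfreq(n, d=d)

# Bin with the largest magnitude gives the dominant frequency.
peak_bin = int(xp.argmax(xp.abs(spectrum)))
peak_hz = float(freqs[peak_bin])

# The inverse transform should reproduce the input up to rounding.
roundtrip = xp.fft.irfft(spectrum, n=n)
max_err = float(xp.max(xp.abs(roundtrip - signal)))

print(peak_hz, max_err)
```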

### SciPy Extensions

GPU implementations of SciPy functionality including sparse matrices, signal processing, special functions, statistics, and N-dimensional image processing.

```python { .api }
# Sparse matrices
class csr_matrix: ...
class csc_matrix: ...
class coo_matrix: ...

# Signal processing
def convolve(in1, in2, mode='full'): ...
def correlate(in1, in2, mode='full'): ...

# Special functions
def gamma(z): ...
def erf(z): ...
```

[SciPy Extensions](./scipy-extensions.md)

### JIT Compilation and Custom Kernels

Just-in-time compilation capabilities and custom CUDA kernel creation for performance-critical applications requiring low-level GPU programming.

```python { .api }
class ElementwiseKernel:
    def __init__(self, in_params, out_params, operation, name='kernel', reduce_dims=True, **kwargs): ...

class ReductionKernel:
    def __init__(self, in_params, out_params, map_expr, reduce_expr, post_map_expr='', identity=None, name='kernel', reduce_type=None, reduce_dims=True, **kwargs): ...

class RawKernel:
    def __init__(self, code, name, **kwargs): ...

@rawkernel()
def my_kernel(x, y, size): ...
```

[JIT and Custom Kernels](./jit-kernels.md)
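
A short sketch of `ElementwiseKernel`, assuming a kernel named `squared_diff` that computes a squared difference per element. Custom kernels need a real GPU, so the example is guarded and leaves `result` as `None` when no CUDA device is available.

```python
try:
    import cupy as cp
    gpu_ok = cp.cuda.is_available()
except Exception:  # CuPy not installed or driver unavailable
    cp, gpu_ok = None, False

result = None
if gpu_ok:
    # in_params, out_params, per-element operation (CUDA C), kernel name
    squared_diff = cp.ElementwiseKernel(
        'float32 x, float32 y',
        'float32 z',
        'z = (x - y) * (x - y)',
        'squared_diff',
    )

    x = cp.arange(5, dtype=cp.float32)        # 0, 1, 2, 3, 4
    y = cp.full(5, 2.0, dtype=cp.float32)     # 2, 2, 2, 2, 2

    # The kernel is compiled on first call and cached afterwards.
    result = cp.asnumpy(squared_diff(x, y)).tolist()
    print(result)  # [4.0, 1.0, 0.0, 1.0, 4.0]
else:
    print("No CUDA device found; skipping the kernel example")
```

The operation string is ordinary CUDA C applied to one element at a time; CuPy generates the indexing and launch configuration around it.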

### Statistics and Data Analysis

Statistical functions and data analysis tools including descriptive statistics, correlation analysis, and histogram computation, all optimized for large-scale GPU processing.

```python { .api }
def mean(a, axis=None, dtype=None, out=None, keepdims=False, where=None): ...
def std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False, where=None): ...
def var(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False, where=None): ...
def corrcoef(x, y=None, rowvar=True, bias=None, ddof=None, fweights=None, aweights=None): ...
def histogram(a, bins=10, range=None, normed=None, weights=None, density=None): ...
def percentile(a, q, axis=None, out=None, overwrite_input=False, interpolation='linear', keepdims=False): ...
```

[Statistics](./statistics.md)
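
The following sketch, using a small made-up data set, shows the effect of the `ddof` parameter on `std` and a basic `histogram` call. The `xp` alias falls back to NumPy when no GPU is usable, which is an assumption made for portability.

```python
import numpy as np

try:
    import cupy as cp
    xp = cp if cp.cuda.is_available() else np
except Exception:  # CuPy not installed or driver unavailable
    xp = np

data = xp.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean = float(data.mean())               # arithmetic mean
std_pop = float(data.std())             # population std (ddof=0)
std_sample = float(data.std(ddof=1))    # sample std (divides by n - 1)

# Four equal-width bins spanning the data range.
counts, edges = xp.histogram(data, bins=4)

print(mean, std_pop, std_sample)
```

With `ddof=0` the variance is divided by `n`; `ddof=1` divides by `n - 1`, which gives the larger, unbiased sample estimate.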

### Logic and Comparison Operations

Logical operations, element-wise comparisons, truth value testing, and content validation functions for array processing and conditional operations.

```python { .api }
def equal(x1, x2, out=None, **kwargs): ...
def not_equal(x1, x2, out=None, **kwargs): ...
def less(x1, x2, out=None, **kwargs): ...
def greater(x1, x2, out=None, **kwargs): ...
def logical_and(x1, x2, out=None, **kwargs): ...
def all(a, axis=None, out=None, keepdims=False, where=None): ...
def any(a, axis=None, out=None, keepdims=False, where=None): ...
def isfinite(x, out=None, **kwargs): ...
def isnan(x, out=None, **kwargs): ...
```

[Logic Operations](./logic-operations.md)

### Indexing and Selection

Advanced indexing, selection, and extraction operations including fancy indexing, boolean indexing, and element insertion for flexible array manipulation.

```python { .api }
def take(a, indices, axis=None, out=None, mode='raise'): ...
def choose(a, choices, out=None, mode='raise'): ...
def where(condition, x=None, y=None): ...
def extract(condition, arr): ...
def nonzero(a): ...
def argmax(a, axis=None, out=None, keepdims=False): ...
def argmin(a, axis=None, out=None, keepdims=False): ...
```

[Indexing and Selection](./indexing-selection.md)
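
A minimal sketch of the selection functions above on a small integer array. The values are arbitrary, and the NumPy fallback behind `xp` is an assumption so that the snippet runs without a GPU.

```python
import numpy as np

try:
    import cupy as cp
    xp = cp if cp.cuda.is_available() else np
except Exception:  # CuPy not installed or driver unavailable
    xp = np

a = xp.array([[1, -2, 3], [-4, 5, -6]])

# Replace negative entries with zero, keeping the shape.
clipped = xp.where(a < 0, xp.zeros_like(a), a)

# Boolean indexing returns the selected elements as a 1-D array.
positives = a[a > 0]

# Row and column indices of the negative entries.
rows, cols = xp.nonzero(a < 0)

# Pick columns 0 and 2 from every row.
outer_cols = xp.take(a, xp.array([0, 2]), axis=1)

# Without an axis, argmax indexes into the flattened array.
flat_argmax = int(xp.argmax(a))

print(clipped.tolist(), positives.tolist(), flat_argmax)
```

Note that boolean indexing and `nonzero` produce results whose size depends on the data, so on a GPU they force a synchronization with the host.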

### Input/Output Operations

File I/O operations supporting NumPy's binary formats (NPY and NPZ) and text formats, with loading and saving of GPU arrays.

```python { .api }
def save(file, arr, allow_pickle=True, fix_imports=True): ...
def load(file, mmap_mode=None, allow_pickle=False, fix_imports=True, encoding='ASCII'): ...
def savez(file, *args, **kwds): ...
def savetxt(fname, X, fmt='%.18e', delimiter=' ', newline='\n', header='', footer='', comments='# ', encoding=None): ...
def loadtxt(fname, dtype=float, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0, encoding='bytes', max_rows=None): ...
```

[Input/Output](./input-output.md)
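
A small round-trip sketch for `save` and `load`. It writes to an in-memory `io.BytesIO` buffer instead of a file on disk, which is a choice made here to keep the example self-contained; a filename works the same way. The `xp` fallback to NumPy is likewise an assumption.

```python
import io

import numpy as np

try:
    import cupy as cp
    xp = cp if cp.cuda.is_available() else np
except Exception:  # CuPy not installed or driver unavailable
    xp = np

arr = xp.arange(12, dtype=xp.float32).reshape(3, 4)

# Serialize in NumPy's .npy format; a GPU array is copied to the host first.
buf = io.BytesIO()
xp.save(buf, arr)

# Rewind and read it back; with CuPy the result lives on the GPU again.
buf.seek(0)
restored = xp.load(buf)

same = bool(xp.array_equal(arr, restored))
print(restored.shape, restored.dtype, same)
```

Because the on-disk format is plain NPY, files written this way can be opened with NumPy on a machine that has no GPU.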

### Testing and Debugging

Testing framework with NumPy comparison utilities, array assertions, and performance benchmarking tools for development and validation.

```python { .api }
def assert_allclose(actual, desired, rtol=1e-7, atol=0, err_msg='', verbose=True): ...
def assert_array_equal(x, y, err_msg='', verbose=True, strides_check=False): ...

@numpy_cupy_allclose()
def test_function(xp): ...

@for_all_dtypes()
def test_dtypes(dtype): ...
```

[Testing Framework](./testing.md)

## Core Classes

```python { .api }
class ndarray:
    """N-dimensional GPU array object.

    Primary data structure for GPU-accelerated computing with NumPy-compatible interface.
    """
    def __init__(self, shape, dtype=float, buffer=None, offset=0, strides=None, order=None): ...
    def get(self, stream=None, order='C', out=None): ...  # Copy to CPU
    def set(self, arr, stream=None): ...  # Copy from CPU
    def copy(self, order='C'): ...
    def astype(self, dtype, order='K', casting='unsafe', subok=True, copy=True): ...

class ufunc:
    """Universal function object for element-wise operations."""
    def __call__(self, *args, **kwargs): ...
    def reduce(self, a, axis=0, dtype=None, out=None, keepdims=False, initial=None, where=None): ...
    def accumulate(self, a, axis=0, dtype=None, out=None): ...
```

## Utility Functions

```python { .api }
def asnumpy(a, stream=None, order='C', out=None):
    """Convert CuPy array to NumPy array on CPU."""

def get_array_module(*args):
    """Get appropriate array module (cupy or numpy) based on input types."""

def is_available():
    """Check if CUDA is available."""

def get_default_memory_pool():
    """Get default GPU memory pool."""

def show_config():
    """Display CuPy configuration information."""

def binary_repr(num, width=None):
    """Return binary representation of input number as string."""

def base_repr(number, base=2, padding=0):
    """Return string representation of number in given base system."""

def ndim(a):
    """Return number of dimensions of an array."""

def isscalar(element):
    """Return True if the type of element is a scalar type."""

def fuse(*args, **kwargs):
    """Kernel fusion functionality for performance optimization."""

def clear_memo():
    """Clear memoization cache."""

def memoize(for_each_device=False):
    """Memoization decorator for caching function results."""
```

## Data Types

CuPy supports NumPy's boolean, integer, floating-point, and complex data types:

```python { .api }
# Boolean
bool_, bool8

# Integers
int8, int16, int32, int64
uint8, uint16, uint32, uint64

# Floating point
float16, float32, float64

# Complex
complex64, complex128

# Generic type hierarchy
number, integer, signedinteger, unsignedinteger
floating, complexfloating
```