
# CuPy

CuPy is a NumPy- and SciPy-compatible array library for GPU-accelerated computing on NVIDIA CUDA GPUs. It is designed as a drop-in replacement for NumPy, so existing NumPy/SciPy code can run on GPUs with minimal modifications, with significant performance gains on large-scale numerical computations.

## Package Information

- **Package Name**: cupy-cuda112
- **Language**: Python
- **Installation**: `pip install cupy-cuda112`
- **GPU Requirements**: NVIDIA CUDA 11.2 or compatible
- **Homepage**: https://cupy.dev/
- **Documentation**: https://docs.cupy.dev/

## Core Imports

```python
import cupy as cp
```

For CUDA-specific functionality:

```python
import cupy.cuda
```

For SciPy-compatible extensions:

```python
import cupyx.scipy
```

## Basic Usage

```python
import cupy as cp
import numpy as np

# Create arrays on GPU
gpu_array = cp.array([1, 2, 3, 4, 5])
gpu_zeros = cp.zeros((3, 4))
gpu_random = cp.random.random((1000, 1000))

# Array operations (executed on GPU)
result = cp.sqrt(gpu_array)
matrix_mult = cp.dot(gpu_random, gpu_random.T)

# Convert back to NumPy for CPU operations
cpu_result = cp.asnumpy(result)
assert isinstance(cpu_result, np.ndarray)

# Memory pool management
mempool = cp.get_default_memory_pool()
print(f"Used bytes: {mempool.used_bytes()}")
print(f"Total bytes: {mempool.total_bytes()}")

# Check GPU availability
if cp.cuda.is_available():
    print(f"CUDA devices available: {cp.cuda.runtime.getDeviceCount()}")
```

## Architecture

CuPy's architecture mirrors NumPy while adding GPU-specific capabilities:

- **Core Arrays**: `cupy.ndarray` provides GPU-accelerated N-dimensional arrays with a NumPy-compatible interface
- **Universal Functions**: GPU-accelerated element-wise operations through `cupy.ufunc`
- **Memory Management**: Automatic memory pooling with configurable allocators for efficient GPU memory usage
- **CUDA Integration**: Direct access to CUDA streams, events, memory management, and custom kernel compilation
- **Custom Kernels**: Support for user-defined CUDA kernels through `RawKernel`, `ElementwiseKernel`, and `ReductionKernel`
- **Multi-GPU**: Support for multi-GPU computation and memory management
- **CuPy Extensions (cupyx)**: Additional functionality including SciPy compatibility, profiling, JIT compilation, and advanced linear algebra

This design enables seamless migration from NumPy-based code to GPU-accelerated computation while providing advanced CUDA programming capabilities for performance-critical applications.
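
As a rough illustration of that migration path, the sketch below writes one function that works on both NumPy and CuPy arrays by asking `cupy.get_array_module` which library owns the input. The NumPy-only fallback (the `HAVE_GPU` flag) and the `softmax` helper are assumptions added here so the example also runs on machines without CuPy or a CUDA device; they are not part of CuPy itself.

```python
import numpy as np

try:
    import cupy as cp
    HAVE_GPU = cp.cuda.is_available()
except Exception:  # CuPy not installed, or CUDA not usable
    cp, HAVE_GPU = None, False


def softmax(x):
    """Row-wise softmax that runs on whichever library owns `x` (hypothetical helper)."""
    # cupy.get_array_module returns cupy for GPU arrays and numpy otherwise.
    xp = cp.get_array_module(x) if HAVE_GPU else np
    shifted = x - xp.max(x, axis=-1, keepdims=True)  # subtract max for stability
    e = xp.exp(shifted)
    return e / xp.sum(e, axis=-1, keepdims=True)


cpu_probs = softmax(np.array([[1.0, 2.0, 3.0]]))
if HAVE_GPU:
    # Same function, GPU input: the computation stays on the device.
    gpu_probs = softmax(cp.array([[1.0, 2.0, 3.0]]))
    assert np.allclose(cp.asnumpy(gpu_probs), cpu_probs)
```

Dispatching on the input array rather than on a global flag keeps library code usable from both CPU and GPU callers.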

## Capabilities

### Array Creation and Manipulation

Core functionality for creating, reshaping, and manipulating N-dimensional arrays on GPU, providing NumPy-compatible array creation routines with GPU memory allocation.

```python { .api }
def array(obj, dtype=None, copy=True, order='K', subok=False, ndmin=0): ...
def zeros(shape, dtype=float, order='C'): ...
def ones(shape, dtype=float, order='C'): ...
def empty(shape, dtype=float, order='C'): ...
def arange(start, stop=None, step=1, dtype=None): ...
def linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None): ...
def reshape(a, newshape, order='C'): ...
def transpose(a, axes=None): ...
def concatenate(arrays, axis=0, out=None): ...
```

[Array Operations](./array-operations.md)
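
A minimal sketch of the creation and reshaping routines listed above. The `xp` alias and its NumPy fallback are assumptions made for this example so that it also runs on a CPU-only machine; with CuPy and a CUDA device present, every array below lives in GPU memory.

```python
import numpy as np

# Assumption for this sketch: fall back to NumPy when CuPy or CUDA is unavailable.
try:
    import cupy as cp
    xp = cp if cp.cuda.is_available() else np
except Exception:
    xp = np

a = xp.arange(12, dtype=xp.float32)        # 0, 1, ..., 11
m = xp.reshape(a, (3, 4))                  # 3 rows, 4 columns
t = xp.transpose(m)                        # 4 rows, 3 columns
extra_row = xp.ones((1, 4), dtype=xp.float32)
stacked = xp.concatenate([m, extra_row], axis=0)  # append one row
grid = xp.linspace(0.0, 1.0, num=5)        # 5 evenly spaced points, endpoints included

print(m.shape, t.shape, stacked.shape)
```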

### Mathematical Functions

Comprehensive collection of mathematical operations including trigonometric, hyperbolic, exponential, logarithmic, and arithmetic functions optimized for GPU execution.

```python { .api }
def sin(x, out=None, **kwargs): ...
def cos(x, out=None, **kwargs): ...
def exp(x, out=None, **kwargs): ...
def log(x, out=None, **kwargs): ...
def sqrt(x, out=None, **kwargs): ...
def add(x1, x2, out=None, **kwargs): ...
def multiply(x1, x2, out=None, **kwargs): ...
def sum(a, axis=None, dtype=None, out=None, keepdims=False): ...
def mean(a, axis=None, dtype=None, out=None, keepdims=False): ...
```

[Mathematical Operations](./math-operations.md)
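
The following sketch combines a few of the element-wise functions and reductions above. As before, the `xp` alias with a NumPy fallback is an assumption added so the example runs without a GPU.

```python
import numpy as np

# Assumption for this sketch: fall back to NumPy when CuPy or CUDA is unavailable.
try:
    import cupy as cp
    xp = cp if cp.cuda.is_available() else np
except Exception:
    xp = np

x = xp.array([[1.0, 4.0], [9.0, 16.0]])

roots = xp.sqrt(x)                                  # element-wise square root
col_sums = xp.sum(roots, axis=0)                    # reduce over rows
row_means = xp.mean(roots, axis=1, keepdims=True)   # keeps a (2, 1) shape
affine = xp.add(xp.multiply(x, 2.0), 1.0)           # 2 * x + 1, element-wise
identity = xp.sin(x) ** 2 + xp.cos(x) ** 2          # close to 1 everywhere

# float() copies a single scalar back to the host.
print(float(xp.sum(roots)))
```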

### Linear Algebra

GPU-accelerated linear algebra operations including matrix multiplication, decompositions, eigenvalue computation, and equation solving using cuBLAS and cuSOLVER.

```python { .api }
def dot(a, b, out=None): ...
def matmul(x1, x2, out=None): ...
def linalg.svd(a, full_matrices=True, compute_uv=True, hermitian=False): ...
def linalg.eigh(a, UPLO='L'): ...
def linalg.solve(a, b): ...
def linalg.inv(a): ...
def linalg.norm(x, ord=None, axis=None, keepdims=False): ...
def einsum(subscripts, *operands, **kwargs): ...
```

[Linear Algebra](./linear-algebra.md)
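
A small sketch of solving a linear system and checking the result, using only `solve`, `matmul`, `norm`, and `eigh` from the list above. The 2x2 system is made up for illustration, and the NumPy fallback behind `xp` is an assumption so the code runs without a GPU.

```python
import numpy as np

# Assumption for this sketch: fall back to NumPy when CuPy or CUDA is unavailable.
try:
    import cupy as cp
    xp = cp if cp.cuda.is_available() else np
except Exception:
    xp = np

a = xp.array([[3.0, 1.0],
              [1.0, 2.0]])          # symmetric, positive definite
b = xp.array([9.0, 8.0])

x = xp.linalg.solve(a, b)           # solves a @ x = b
residual = xp.linalg.norm(xp.matmul(a, x) - b)

# eigh is intended for symmetric/Hermitian input; eigenvalues come back ascending.
eigenvalues, eigenvectors = xp.linalg.eigh(a)

print(float(residual))
```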

### Random Number Generation

GPU-accelerated random number generation supporting multiple bit generators and probability distributions for statistical computing and simulation.

```python { .api }
def random.random(size=None, dtype=float): ...
def random.rand(*args): ...
def random.randn(*args): ...
def random.randint(low, high=None, size=None, dtype=int): ...
def random.normal(loc=0.0, scale=1.0, size=None): ...
def random.uniform(low=0.0, high=1.0, size=None): ...
class random.Generator: ...
def random.default_rng(seed=None): ...
```

[Random Number Generation](./random-generation.md)
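
The sketch below draws samples through a seeded `Generator` obtained from `default_rng`. The seed value and sample sizes are arbitrary, and the NumPy fallback behind `xp` is an assumption so the example runs without a GPU. Note that a given seed is not expected to produce the same stream in NumPy and CuPy.

```python
import numpy as np

# Assumption for this sketch: fall back to NumPy when CuPy or CUDA is unavailable.
try:
    import cupy as cp
    xp = cp if cp.cuda.is_available() else np
except Exception:
    xp = np

rng = xp.random.default_rng(seed=42)     # seeded Generator instance

uniform = rng.random((1000, 3))          # samples in [0, 1)
normal = rng.standard_normal(10_000)     # mean 0, standard deviation 1

# Module-level functions from the list above work as well.
legacy_ints = xp.random.randint(0, 10, size=5)   # integers in [0, 10)

print(float(uniform.mean()), float(normal.std()))
```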

### CUDA Integration

Direct interface to CUDA runtime, memory management, stream processing, and custom kernel development for advanced GPU programming.

```python { .api }
class cuda.Device: ...
def cuda.get_device_id(): ...
class cuda.MemoryPool: ...
class cuda.Stream: ...
class cuda.Event: ...
def cuda.compile_with_cache(source, options=(), **kwargs): ...
class ElementwiseKernel: ...
class RawKernel: ...
```

[CUDA Interface](./cuda-interface.md)
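
To show how these pieces can fit together, here is a hedged sketch that defines a small `ElementwiseKernel`, launches it on its own `Stream`, and times it with a pair of `Event` objects. The function name `timed_squared_diff` and the kernel name `squared_diff` are made up for this example. The code needs a CUDA device to do real work, so the function returns `None` when none is available.

```python
try:
    import cupy as cp
    HAVE_GPU = cp.cuda.is_available()
except Exception:  # CuPy not installed, or CUDA not usable
    cp, HAVE_GPU = None, False


def timed_squared_diff(n=1_000_000):
    """Return (first three results on host, elapsed ms), or None without a GPU."""
    if not HAVE_GPU:
        return None

    # The third argument is the per-element body, written in CUDA C.
    squared_diff = cp.ElementwiseKernel(
        'float32 x, float32 y',     # input arguments
        'float32 z',                # output argument
        'z = (x - y) * (x - y)',
        'squared_diff')

    stream = cp.cuda.Stream()
    start, stop = cp.cuda.Event(), cp.cuda.Event()
    with stream:                    # work below is queued on `stream`
        x = cp.arange(n, dtype=cp.float32)
        y = cp.ones(n, dtype=cp.float32)
        start.record()
        z = squared_diff(x, y)
        stop.record()
    stop.synchronize()              # wait until the kernel has finished

    elapsed_ms = cp.cuda.get_elapsed_time(start, stop)
    return cp.asnumpy(z[:3]), elapsed_ms


result = timed_squared_diff(1000)
print("no CUDA device" if result is None else result)
```

Timing with events measures GPU execution time on the stream, which host-side timers can miss because kernel launches are asynchronous.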

### Fast Fourier Transform

GPU-accelerated FFT operations for signal processing and frequency-domain analysis, backed by the cuFFT library.

```python { .api }
def fft.fft(a, n=None, axis=-1, norm=None): ...
def fft.ifft(a, n=None, axis=-1, norm=None): ...
def fft.fft2(a, s=None, axes=(-2, -1), norm=None): ...
def fft.fftn(a, s=None, axes=None, norm=None): ...
def fft.rfft(a, n=None, axis=-1, norm=None): ...
def fft.fftfreq(n, d=1.0): ...
```

[FFT Operations](./fft-operations.md)
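
As an illustration, the sketch below finds the dominant frequency of a synthetic 8 Hz tone with `rfft` and `fftfreq`, then checks an `fft`/`ifft` round trip. The signal parameters are invented for the example, and the NumPy fallback behind `xp` is an assumption so the code runs without a GPU.

```python
import numpy as np

# Assumption for this sketch: fall back to NumPy when CuPy or CUDA is unavailable.
try:
    import cupy as cp
    xp = cp if cp.cuda.is_available() else np
except Exception:
    xp = np

n = 256                 # number of samples
d = 1.0 / 256           # sample spacing in seconds (256 Hz sampling rate)
t = xp.arange(n) * d
signal = xp.sin(2 * np.pi * 8 * t)          # 8 Hz tone lasting one second

spectrum = xp.fft.rfft(signal)              # n // 2 + 1 non-negative-frequency bins
freqs = xp.fft.fftfreq(n, d=d)[: n // 2]    # matching non-negative frequencies
peak_bin = int(xp.argmax(xp.abs(spectrum[: n // 2])))
peak_hz = float(freqs[peak_bin])

# A forward transform followed by the inverse should reproduce the input.
roundtrip = xp.fft.ifft(xp.fft.fft(signal)).real
max_error = float(xp.max(xp.abs(roundtrip - signal)))

print(peak_hz, max_error)
```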

### SciPy Compatibility

Extended functionality providing SciPy-compatible operations for sparse matrices, signal processing, image processing, and specialized mathematical functions.

```python { .api }
import cupyx.scipy.sparse
import cupyx.scipy.ndimage
import cupyx.scipy.signal
import cupyx.scipy.special
import cupyx.scipy.linalg
def cupyx.scipy.sparse.csr_matrix(arg1, shape=None, dtype=None, copy=False): ...
def cupyx.scipy.ndimage.gaussian_filter(input, sigma, **kwargs): ...
```

[SciPy Extensions](./scipy-extensions.md)

### Input/Output Operations

File I/O operations for saving and loading arrays, including NumPy's `.npy` and `.npz` binary formats and plain-text output.

```python { .api }
def save(file, arr, allow_pickle=True, fix_imports=True): ...
def load(file, mmap_mode=None, allow_pickle=False, fix_imports=True, encoding='ASCII'): ...
def savez(file, *args, **kwds): ...
def savez_compressed(file, *args, **kwds): ...
def savetxt(fname, X, fmt='%.18e', delimiter=' ', newline='\n', header='', footer='', comments='# ', encoding=None): ...
```

[Input/Output](./input-output.md)
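
A short round-trip sketch for `save`, `load`, and `savez`. The file names and the temporary directory are arbitrary choices for this example, and the NumPy fallback behind `xp` is an assumption so the code runs without a GPU. With CuPy, `load` returns arrays in GPU memory.

```python
import os
import tempfile

import numpy as np

# Assumption for this sketch: fall back to NumPy when CuPy or CUDA is unavailable.
try:
    import cupy as cp
    xp = cp if cp.cuda.is_available() else np
except Exception:
    xp = np

arr = xp.arange(6, dtype=xp.float32).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    # Single array in .npy format.
    npy_path = os.path.join(tmp, "arr.npy")
    xp.save(npy_path, arr)
    loaded = xp.load(npy_path)

    # Several named arrays in one .npz archive.
    npz_path = os.path.join(tmp, "bundle.npz")
    xp.savez(npz_path, a=arr, b=arr * 2)
    with xp.load(npz_path) as bundle:
        b_sum = float(bundle["b"].sum())

print(loaded.shape, b_sum)
```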

## Types

```python { .api }
class ndarray:
    """N-dimensional array object on GPU memory"""
    def __init__(self, shape, dtype=float, buffer=None, offset=0, strides=None, order=None): ...
    def get(self, stream=None, order='C', out=None): ...  # Transfer to CPU
    def set(self, arr, stream=None): ...  # Transfer from CPU
    @property
    def device(self): ...
    @property
    def data(self): ...
    @property
    def shape(self): ...
    @property
    def dtype(self): ...

class ufunc:
    """Universal function for element-wise operations"""
    def __call__(self, *args, **kwargs): ...
    def reduce(self, a, axis=0, dtype=None, out=None, keepdims=False): ...
    def accumulate(self, a, axis=0, dtype=None, out=None): ...

# Memory management types
class cuda.MemoryPointer: ...
class cuda.Memory: ...
class cuda.MemoryPool: ...
class cuda.PinnedMemory: ...

# Stream and event types
class cuda.Stream: ...
class cuda.Event: ...
class cuda.Device: ...

# Custom kernel types
class ElementwiseKernel: ...
class ReductionKernel: ...
class RawKernel: ...
class RawModule: ...
```