
# CuPy


CuPy is a NumPy/SciPy-compatible array library that accelerates NumPy-based code using NVIDIA CUDA or AMD ROCm platforms. It provides a comprehensive GPU-accelerated computing framework for scientific computing, machine learning, and data analysis, serving as a drop-in replacement for NumPy arrays with extensive mathematical operations, linear algebra, signal processing, and statistical functions.


## Package Information

- **Package Name**: cupy-cuda11x
- **Language**: Python
- **Installation**: `pip install cupy-cuda11x`
- **CUDA Compatibility**: CUDA 11.2 through 11.8
- **Platform Support**: Linux (x86_64, aarch64), Windows (x86_64)

## Core Imports

```python
import cupy as cp
```

For specific modules:

```python
import cupy
from cupy import fft, linalg, random
import cupyx
from cupyx import scipy
```

## Basic Usage

```python
import cupy as cp
import numpy as np

# Create arrays on GPU
gpu_array = cp.array([1, 2, 3, 4])
gpu_zeros = cp.zeros((1000, 1000))

# NumPy-compatible operations
result = cp.sum(gpu_array)
matrix_mult = cp.dot(gpu_zeros, gpu_zeros.T)

# Transfer between CPU and GPU
cpu_array = cp.asnumpy(gpu_array)  # GPU to CPU
gpu_from_cpu = cp.asarray(cpu_array)  # CPU to GPU

# Mathematical operations
x = cp.linspace(0, 2*cp.pi, 1000)
y = cp.sin(x)
```

## Architecture

CuPy's architecture mirrors NumPy while providing GPU acceleration:

- **Core Array**: `cupy.ndarray` - GPU memory-resident array objects with NumPy-compatible interface
- **Mathematical Functions**: Element-wise and reduction operations leveraging CUDA kernels
- **Memory Management**: Automatic memory pooling with configurable allocators for optimal GPU memory usage
- **Stream Processing**: Asynchronous execution support through CUDA streams
- **Kernel Integration**: Custom CUDA kernel support via RawKernel and ElementwiseKernel
- **SciPy Extensions**: `cupyx.scipy` provides GPU-accelerated SciPy-compatible functions
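
The memory-pool and stream items above can be illustrated with a small sketch. It assumes the default memory pool is in use; the helper name `gpu_pool_demo` and the choice to return `None` on a machine without a usable GPU are illustrative and not part of CuPy.

```python
def gpu_pool_demo():
    """Return memory-pool byte counts, or None when no GPU is usable."""
    try:
        import cupy as cp
        cp.cuda.runtime.getDeviceCount()  # raises when no usable GPU is present
    except Exception:
        return None  # assumption: skip quietly on CPU-only machines

    pool = cp.get_default_memory_pool()
    stream = cp.cuda.Stream(non_blocking=True)

    with stream:  # work issued in this block is queued on `stream`
        a = cp.ones((1024, 1024), dtype=cp.float32)  # 4 MiB of float32
        total = a.sum()
    stream.synchronize()  # wait for the queued work to finish

    used = pool.used_bytes()  # bytes currently backing live arrays
    del a, total
    pool.free_all_blocks()  # hand cached, unused blocks back to the driver
    return {"used_while_alive": used, "total_after_free": pool.total_bytes()}


print(gpu_pool_demo())
```

Freeing blocks explicitly is rarely needed; it mainly helps when another library in the same process needs the GPU memory that the pool is caching.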

## Capabilities

### Array Operations

Core array creation, manipulation, and mathematical operations that form the foundation of GPU-accelerated NumPy-compatible computing.

```python { .api }
def array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0): ...
def zeros(shape, dtype=float, order='C'): ...
def ones(shape, dtype=float, order='C'): ...
def empty(shape, dtype=float, order='C'): ...
def arange(start, stop=None, step=1, dtype=None): ...
def linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0): ...
```

[Array Operations](./array-operations.md)
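
As a rough illustration of the creation functions listed above, the sketch below builds a few small arrays. The `xp` alias and the NumPy fallback are assumptions added here so the snippet also runs on a machine without a GPU; they are not part of CuPy.

```python
import numpy as np

try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # raises when no usable GPU is present
    xp = cp
except Exception:
    xp = np  # assumption: CPU fallback so the sketch runs anywhere

a = xp.array([[1, 2, 3], [4, 5, 6]], dtype=xp.float32)  # from a nested list
z = xp.zeros((2, 3))              # float64 by default
r = xp.arange(0, 10, 2)           # start, stop (exclusive), step
g = xp.linspace(0.0, 1.0, num=5)  # 5 evenly spaced points, endpoint included

print(a.shape, a.dtype)  # (2, 3) float32
print(z.dtype)           # float64
print(r.tolist())        # [0, 2, 4, 6, 8]
print(g.tolist())
```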

### Mathematical Functions

Comprehensive mathematical operations including trigonometric, hyperbolic, exponential, logarithmic, and statistical functions optimized for GPU execution.

```python { .api }
def sin(x, out=None, **kwargs): ...
def cos(x, out=None, **kwargs): ...
def exp(x, out=None, **kwargs): ...
def log(x, out=None, **kwargs): ...
def sqrt(x, out=None, **kwargs): ...
def sum(a, axis=None, dtype=None, out=None, keepdims=False): ...
def mean(a, axis=None, dtype=None, out=None, keepdims=False): ...
```

[Mathematical Functions](./mathematical-functions.md)
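
A short sketch of element-wise functions and axis reductions follows. As an assumption for portability, it falls back to NumPy under the `xp` alias when no GPU is available.

```python
import numpy as np

try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # raises when no usable GPU is present
    xp = cp
except Exception:
    xp = np  # assumption: CPU fallback so the sketch runs anywhere

# Element-wise: sin^2 + cos^2 should be ~1 at every sample point.
x = xp.linspace(0, 2 * xp.pi, 1000)
identity = xp.sin(x) ** 2 + xp.cos(x) ** 2

# Reductions along an axis of [[0, 1, 2], [3, 4, 5]].
m = xp.arange(6, dtype=xp.float64).reshape(2, 3)
col_sums = xp.sum(m, axis=0)                   # [3., 5., 7.]
row_means = xp.mean(m, axis=1, keepdims=True)  # [[1.], [4.]], shape (2, 1)

print(col_sums.tolist(), row_means.tolist())
```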

### Linear Algebra

GPU-accelerated linear algebra operations including matrix multiplication, decomposition, eigenvalue computation, and solving linear systems.

```python { .api }
def dot(a, b, out=None): ...
def matmul(x1, x2, out=None): ...
def einsum(subscripts, *operands, out=None, dtype=None, order='K', casting='safe', optimize=False): ...
```

[Linear Algebra](./linear-algebra.md)
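
For 2-D inputs the three functions above compute the same matrix product, which the following sketch checks. The `xp` alias with a NumPy fallback is an assumption made so the snippet runs without a GPU.

```python
import numpy as np

try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # raises when no usable GPU is present
    xp = cp
except Exception:
    xp = np  # assumption: CPU fallback so the sketch runs anywhere

a = xp.arange(6, dtype=xp.float64).reshape(2, 3)
b = xp.arange(12, dtype=xp.float64).reshape(3, 4)

c_dot = xp.dot(a, b)
c_matmul = xp.matmul(a, b)
c_einsum = xp.einsum("ij,jk->ik", a, b)  # sum over the shared index j

print(c_dot.shape)         # (2, 4)
print(float(c_dot[0, 0]))  # 20.0, i.e. 0*0 + 1*4 + 2*8
```

`dot` and `matmul` differ once inputs have more than two dimensions: `matmul` broadcasts over leading batch dimensions, while `dot` contracts the last axis of the first argument with the second-to-last axis of the second.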

### Fast Fourier Transform

GPU-accelerated FFT operations supporting 1D, 2D, and N-D transforms with both forward and inverse operations.

```python { .api }
def fft(a, n=None, axis=-1, norm=None): ...
def ifft(a, n=None, axis=-1, norm=None): ...
def fft2(a, s=None, axes=(-2, -1), norm=None): ...
def fftn(a, s=None, axes=None, norm=None): ...
```

[Fast Fourier Transform](./fft.md)
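
The sketch below runs a forward and inverse 1-D transform on a pure sine wave and locates its frequency bin. The signal parameters are illustrative, and the NumPy fallback behind `xp` is an assumption for machines without a GPU.

```python
import numpy as np

try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # raises when no usable GPU is present
    xp = cp
except Exception:
    xp = np  # assumption: CPU fallback so the sketch runs anywhere

n = 64
t = xp.arange(n)
signal = xp.sin(2 * xp.pi * 4 * t / n)  # exactly 4 cycles over the window

spectrum = xp.fft.fft(signal)
# Search only the positive-frequency half of the spectrum.
peak_bin = int(xp.argmax(xp.abs(spectrum[: n // 2])))
recovered = xp.fft.ifft(spectrum).real  # inverse transform restores the input

print(peak_bin)  # 4
```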

### Random Number Generation

Comprehensive random number generation including uniform, normal, and specialized distributions, all optimized for GPU parallel execution.

```python { .api }
def random(size=None, dtype=float, out=None): ...
def normal(loc=0.0, scale=1.0, size=None, dtype=float): ...
def uniform(low=0.0, high=1.0, size=None, dtype=float): ...
def choice(a, size=None, replace=True, p=None): ...
```

[Random Number Generation](./random.md)
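
A minimal sketch of seeded sampling, assuming the functions live under `xp.random` as they do in NumPy. The fallback to NumPy is an assumption for CPU-only machines; the same seed does not produce the same numbers in CuPy and NumPy.

```python
import numpy as np

try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # raises when no usable GPU is present
    xp = cp
except Exception:
    xp = np  # assumption: CPU fallback so the sketch runs anywhere

xp.random.seed(0)  # make the run repeatable within one library

samples = xp.random.normal(loc=5.0, scale=2.0, size=100_000)
u = xp.random.uniform(low=-1.0, high=1.0, size=(3, 4))

# Sample statistics should land close to the requested loc and scale.
print(float(samples.mean()), float(samples.std()))
print(u.shape)  # (3, 4)
```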

### CUDA Integration

Direct CUDA device management, memory operations, kernel execution, and stream processing for advanced GPU programming.

```python { .api }
class Device:
    def __init__(self, device=None): ...
    def use(self): ...

def get_device_id(): ...
def synchronize(): ...
```

[CUDA Integration](./cuda-integration.md)
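
Device selection can be sketched as below, assuming at least one GPU with index 0. The helper name `describe_current_device` and the `None` return on CPU-only machines are illustrative choices, not CuPy behavior.

```python
def describe_current_device():
    """Return basic device facts, or None when no GPU is usable."""
    try:
        import cupy as cp
        count = cp.cuda.runtime.getDeviceCount()  # raises when no usable GPU
    except Exception:
        return None  # assumption: skip quietly on CPU-only machines

    with cp.cuda.Device(0):  # device 0 is current inside this block
        x = cp.arange(10)               # allocated on device 0
        owner = x.device.id             # id of the device that holds x
        cp.cuda.Device().synchronize()  # block until the current device is idle
    return {"device_count": count, "array_device": owner}


print(describe_current_device())
```

Using `Device` as a context manager restores the previously selected device on exit, which tends to be safer than calling `use()` in code that other modules also call.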

### SciPy Extensions

GPU-accelerated SciPy-compatible functions including sparse matrices, signal processing, image processing, optimization, and statistical operations.

```python { .api }
# Available through cupyx.scipy
import cupyx.scipy as scipy
```

[SciPy Extensions](./scipy-extensions.md)

### Custom Kernel Development

Advanced CUDA kernel development enabling custom element-wise operations, reduction kernels, and raw CUDA programming for maximum performance and specialized computational tasks.

```python { .api }
class ElementwiseKernel:
    def __init__(self, in_params, out_params, operation, name="kernel", **kwargs): ...
    def __call__(self, *args, **kwargs): ...

class ReductionKernel:
    def __init__(self, in_params, out_params, map_expr, reduce_expr, **kwargs): ...
    def __call__(self, *args, **kwargs): ...

class RawKernel:
    def __init__(self, code, name, **kwargs): ...
    def __call__(self, grid, block, args=(), shared_mem=0, stream=None): ...
```

[Custom Kernel Development](./custom-kernels.md)
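
A minimal `ElementwiseKernel` sketch, computing a squared difference per element. The wrapper function `squared_diff_demo` and its `None` return on machines without a GPU are assumptions added for illustration; the kernel body is ordinary CUDA C++ applied to each element.

```python
def squared_diff_demo():
    """Run a custom element-wise kernel, or return None when no GPU is usable."""
    try:
        import cupy as cp
        cp.cuda.runtime.getDeviceCount()  # raises when no usable GPU is present
    except Exception:
        return None  # assumption: skip quietly on CPU-only machines

    squared_diff = cp.ElementwiseKernel(
        "float32 x, float32 y",   # input parameters
        "float32 z",              # output parameter
        "z = (x - y) * (x - y)",  # body evaluated once per element
        "squared_diff",           # kernel name
    )

    x = cp.arange(5, dtype=cp.float32)      # 0, 1, 2, 3, 4
    y = cp.full(5, 2.0, dtype=cp.float32)   # 2, 2, 2, 2, 2
    return cp.asnumpy(squared_diff(x, y)).tolist()


print(squared_diff_demo())  # [4.0, 1.0, 0.0, 1.0, 4.0] when a GPU is present
```

The kernel is compiled on first call and cached, so the first invocation is noticeably slower than later ones.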

### JIT Compilation

Just-in-time compilation of Python functions to GPU kernels, enabling high-performance GPU programming with Python syntax and automatic optimization.

```python { .api }
def rawkernel(device=False): ...
def kernel(grid=None, block=None, shared_mem=0): ...
def elementwise(signature): ...
def reduction(signature, identity=None): ...
```

[JIT Compilation](./jit-compilation.md)

### Performance Profiling

Comprehensive performance analysis tools for measuring execution times, analyzing GPU utilization, memory usage profiling, and identifying optimization opportunities.

```python { .api }
def benchmark(func, args=(), kwargs=None, **params): ...
def time_range(): ...
def profile(): ...
def nvtx_push(message, color=None): ...
```

[Performance Profiling](./performance-profiling.md)

### Input/Output Operations

File I/O operations supporting various formats including binary, text, and compressed data with efficient GPU-CPU data transfer and memory management.

```python { .api }
def save(file, arr): ...
def load(file, **kwargs): ...
def loadtxt(fname, **kwargs): ...
def savetxt(fname, X, **kwargs): ...
```

[Input/Output Operations](./io-operations.md)
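
A round trip through the `.npy` format can be sketched as follows. The temporary directory and file name are illustrative, and the NumPy fallback behind `xp` is an assumption for machines without a GPU.

```python
import os
import tempfile

import numpy as np

try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # raises when no usable GPU is present
    xp = cp
except Exception:
    xp = np  # assumption: CPU fallback so the sketch runs anywhere

arr = xp.arange(12, dtype=xp.float32).reshape(3, 4)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "arr.npy")  # hypothetical file name
    xp.save(path, arr)     # written in NumPy's .npy format
    loaded = xp.load(path) # read back as an array of the active library

print(loaded.shape, loaded.dtype)  # (3, 4) float32
```

Because the on-disk format is the same as NumPy's, a file written from a GPU array can be read by plain NumPy on a machine without a GPU.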

### Polynomial Operations

Mathematical operations with polynomials including arithmetic, evaluation, fitting, root finding, and advanced polynomial manipulations with support for various polynomial bases.

```python { .api }
class poly1d:
    def __init__(self, c_or_r, r=False, variable=None): ...
    def __call__(self, val): ...

def polyfit(x, y, deg, **kwargs): ...
def polyval(p, x): ...
def roots(p): ...
```

[Polynomial Operations](./polynomial-operations.md)
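
The sketch below fits a quadratic to noise-free samples and evaluates the result, using `polyfit` and `polyval` from the listing above. The sample polynomial is illustrative, and the NumPy fallback behind `xp` is an assumption for machines without a GPU.

```python
import numpy as np

try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # raises when no usable GPU is present
    xp = cp
except Exception:
    xp = np  # assumption: CPU fallback so the sketch runs anywhere

# Samples of y = 3x^2 - 2x + 1 with no noise.
x = xp.linspace(-2.0, 2.0, 21)
y = 3.0 * x**2 - 2.0 * x + 1.0

coeffs = xp.polyfit(x, y, 2)  # highest power first, close to [3, -2, 1]
y_half = xp.polyval(coeffs, xp.array([0.5]))  # 3*0.25 - 2*0.5 + 1 = 0.75

print(coeffs.tolist(), y_half.tolist())
```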

## Types

```python { .api }
class ndarray:
    """
    GPU-resident N-dimensional array object compatible with NumPy arrays.

    Attributes:
        shape: tuple of ints - dimensions of the array
        dtype: numpy.dtype - data type of array elements
        size: int - total number of elements
        ndim: int - number of dimensions
        device: cupy.cuda.Device - GPU device containing the array
    """
    def __init__(self, shape, dtype=float, memptr=None, strides=None, order='C'): ...
    def get(self, stream=None, order='C', out=None): ...
    def set(self, arr, stream=None): ...
    def copy(self, order='C'): ...
    def astype(self, dtype, order='K', casting='unsafe', subok=True, copy=True): ...

class ufunc:
    """Universal function for element-wise operations on arrays."""
    def __call__(self, *args, **kwargs): ...
    def reduce(self, a, axis=0, dtype=None, out=None, keepdims=False): ...
    def accumulate(self, a, axis=0, dtype=None, out=None): ...

def asnumpy(a, stream=None, order='C', out=None, *, blocking=True) -> numpy.ndarray:
    """Convert CuPy array to NumPy array on CPU."""

def get_array_module(*args):
    """Return cupy if any argument is a CuPy array, otherwise numpy."""
```
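
One common use of `get_array_module` is writing a function once and letting it run on whichever library owns its input. The sketch below assumes this pattern; the `normalize` helper and the stand-in used when CuPy is not installed are illustrative, not part of the package.

```python
import numpy as np

try:
    import cupy as cp
    get_array_module = cp.get_array_module
except ImportError:
    # Assumption: stand-in so the sketch runs where CuPy is not installed.
    def get_array_module(*args):
        return np


def normalize(x):
    """Scale x to zero mean and unit variance using the library that owns x."""
    xp = get_array_module(x)  # numpy for NumPy input, cupy for CuPy input
    return (x - xp.mean(x)) / xp.std(x)


host = np.array([1.0, 2.0, 3.0, 4.0])
out = normalize(host)
print(type(out).__module__)  # numpy
```

Passing a `cupy.ndarray` to the same function keeps the computation on the GPU, and `asnumpy` or `ndarray.get()` brings the result back to the host when needed.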