Tessl tile `tessl/pypi-cupy-cuda113`, describing `pypi/cupy-cuda113@9.6.x`. To install the tile, run `npx @tessl/cli install tessl/pypi-cupy-cuda113@9.6.0`.

# CuPy

CuPy is a NumPy- and SciPy-compatible array library for GPU-accelerated computing with Python. It is designed as a near drop-in replacement for NumPy arrays and mathematical functions. Much existing NumPy/SciPy code runs on NVIDIA CUDA GPUs after replacing `numpy` with `cupy`, and large, data-parallel workloads can run substantially faster because of GPU parallelism.

## Package Information

- **Package Name**: cupy-cuda113 (prebuilt wheel for CUDA 11.3)
- **Language**: Python
- **Installation**: `pip install cupy-cuda113`
- **License**: MIT
- **Documentation**: https://docs.cupy.dev/en/stable/

## Core Imports

```python
import cupy as cp
```

Common patterns for array creation and operations:

```python
import cupy as cp
import numpy as np

# Use CuPy as a drop-in replacement for NumPy
arr = cp.array([1, 2, 3])
result = cp.sum(arr)
```

For CPU/GPU generic code:

```python
import cupy as cp

# Automatically select NumPy or CuPy based on the input array's type
def generic_function(x):
    xp = cp.get_array_module(x)  # returns the cupy or numpy module
    return xp.sum(x)
```

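`cp.get_array_module` is only callable when CuPy itself imports successfully. On CPU-only machines a common variation is to fall back to NumPy entirely. The sketch below assumes that fallback policy; `get_xp` and `normalize` are hypothetical helper names, not CuPy APIs.

```python
import numpy as np

try:
    import cupy as cp
except ImportError:  # CuPy not installed, or its CUDA libraries were not found
    cp = None


def get_xp(x):
    """Return the array module (cupy or numpy) matching the type of x."""
    if cp is None:
        return np
    return cp.get_array_module(x)


def normalize(x):
    """Scale x to zero mean and unit (population) standard deviation."""
    xp = get_xp(x)
    return (x - xp.mean(x)) / xp.std(x)
```

Called with a NumPy array, `normalize` computes on the CPU and returns a NumPy array; called with a `cupy.ndarray`, the same code runs on the GPU and returns a `cupy.ndarray`.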
## Basic Usage

```python
import cupy as cp
import numpy as np

# Create CuPy arrays (stored in GPU memory)
gpu_array = cp.array([1, 2, 3, 4, 5])
gpu_zeros = cp.zeros((3, 4))
gpu_random = cp.random.random((1000, 1000))

# Perform GPU-accelerated operations
result = cp.sum(gpu_array)
matrix_mult = cp.dot(gpu_random, gpu_random.T)

# Transfer between GPU and CPU
cpu_array = cp.asnumpy(gpu_array)  # GPU -> CPU
gpu_from_cpu = cp.asarray(cpu_array)  # CPU -> GPU

# Use with existing NumPy code - just change np to cp
x = cp.linspace(0, 2 * cp.pi, 1000)
y = cp.sin(x)
fft_result = cp.fft.fft(y)

# Memory management
pool = cp.get_default_memory_pool()
print(f"Used: {pool.used_bytes()}, Total: {pool.total_bytes()}")
```

## Architecture

CuPy's architecture is built around a NumPy-compatible array type backed by CUDA:

- **Core Arrays**: `cupy.ndarray` provides NumPy-compatible arrays whose data lives in GPU memory
- **CUDA Integration**: Direct access to CUDA devices, streams, and memory management through `cupy.cuda`
- **Automatic Memory Management**: Built-in memory pools that cache GPU allocations to avoid repeated `cudaMalloc`/`cudaFree` calls
- **Kernel System**: Custom CUDA kernels through `ElementwiseKernel`, `ReductionKernel`, and `RawKernel`
- **Library Integration**: Bindings to NVIDIA's GPU libraries cuBLAS, cuFFT, cuRAND, cuSOLVER, and cuSPARSE, which back many CuPy routines
- **Extensions**: The `cupyx` package provides additional functionality, including SciPy-compatible routines (`cupyx.scipy`) and JIT kernel compilation (`cupyx.jit`)

This design lets CuPy cover both high-level, NumPy-style array programming and low-level CUDA control, with the NumPy API as the primary interface.

## Capabilities

### Array Creation and Manipulation

Core array creation functions and array manipulation operations that mirror NumPy's interface but operate on GPU memory, including shape manipulation, joining, splitting, and element rearrangement.

```python { .api }
def array(obj, dtype=None, copy=True, order='K', subok=False, ndmin=0): ...
def zeros(shape, dtype=float64, order='C'): ...
def ones(shape, dtype=None, order='C'): ...
def empty(shape, dtype=float64, order='C'): ...
def arange(start, stop=None, step=1, dtype=None): ...
def linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0): ...
def reshape(a, newshape, order='C'): ...
def concatenate(arrays, axis=0, out=None): ...
def transpose(a, axes=None): ...
```

[Array Operations](./array-operations.md)
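
Because these functions share NumPy's signatures, array-building code can be written once against a module argument. This is a minimal sketch, assuming the caller passes either `numpy` or `cupy` as `xp`. `make_blocks` is a hypothetical helper.

```python
import numpy as np


def make_blocks(xp):
    """Build a small array with arange/reshape/ones/concatenate/transpose."""
    a = xp.arange(6, dtype=xp.float32).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
    b = xp.ones((2, 3), dtype=xp.float32)  # [[1, 1, 1], [1, 1, 1]]
    stacked = xp.concatenate([a, b], axis=0)  # shape (4, 3)
    return xp.transpose(stacked)  # shape (3, 4)


blocks = make_blocks(np)  # make_blocks(cupy) builds the same array in GPU memory
```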

### Mathematical Functions

Comprehensive mathematical operations including trigonometric, hyperbolic, exponential, logarithmic, arithmetic, and complex number functions, all GPU-accelerated and compatible with NumPy's mathematical function interface.

```python { .api }
def sin(x, out=None): ...
def cos(x, out=None): ...
def exp(x, out=None): ...
def log(x, out=None): ...
def add(x1, x2, out=None): ...
def multiply(x1, x2, out=None): ...
def sqrt(x, out=None): ...
def power(x1, x2, out=None): ...
```

[Mathematical Functions](./math-functions.md)
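
The `out` parameter lets a chain of element-wise operations write into preallocated buffers instead of allocating a temporary at every step. This reduces allocation traffic on both NumPy and CuPy. The sketch below is written against an `xp` module argument (`numpy` or `cupy`). `hypot_inplace` is a hypothetical name.

```python
import numpy as np


def hypot_inplace(x, y, xp):
    """Compute sqrt(x**2 + y**2) using one output buffer and one scratch buffer."""
    out = xp.empty_like(x)
    tmp = xp.empty_like(y)
    xp.multiply(x, x, out=out)  # out = x * x
    xp.multiply(y, y, out=tmp)  # tmp = y * y
    xp.add(out, tmp, out=out)  # out = x*x + y*y
    xp.sqrt(out, out=out)  # take the square root in place
    return out


r = hypot_inplace(np.array([3.0, 5.0]), np.array([4.0, 12.0]), np)
```

NumPy and CuPy also provide `hypot` directly. The point of the sketch is buffer reuse, not the formula itself.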

### Linear Algebra

GPU-accelerated linear algebra operations, including matrix products, decompositions, eigenvalue computations, and linear-system solvers, backed by cuBLAS and cuSOLVER.

```python { .api }
def dot(a, b, out=None): ...
def matmul(x1, x2, out=None): ...
def einsum(subscripts, *operands, out=None, dtype=None, order='K', casting='safe', optimize=False): ...
```

From `cupy.linalg`:

```python { .api }
def norm(x, ord=None, axis=None, keepdims=False): ...
def svd(a, full_matrices=True, compute_uv=True, hermitian=False): ...
def solve(a, b): ...
def inv(a): ...
```

[Linear Algebra](./linear-algebra.md)
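
A typical linear-algebra workflow solves a system with `linalg.solve` and checks the relative residual with `linalg.norm`. The sketch assumes a well-conditioned square system. It is written against an `xp` argument so it runs with either `numpy` or `cupy`. `solve_and_check` is a hypothetical helper.

```python
import numpy as np


def solve_and_check(a, b, xp):
    """Solve a @ x = b and return x with the relative residual ||a x - b|| / ||b||."""
    x = xp.linalg.solve(a, b)
    residual = xp.linalg.norm(a @ x - b) / xp.linalg.norm(b)
    return x, float(residual)


a = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x, rel_res = solve_and_check(a, b, np)  # exact solution is x = [2, 3]
```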

### CUDA Integration

Direct CUDA functionality in the `cupy.cuda` module, including device management, stream control, memory management, and custom kernel execution, for advanced GPU programming and performance tuning.

```python { .api }
class Device:
    def __init__(self, device=None): ...
    def __enter__(self): ...
    def __exit__(self, *args): ...

class Stream:
    def __init__(self, null=False, non_blocking=False, ptds=False): ...
    def synchronize(self): ...

class MemoryPool:
    def malloc(self, size): ...
    def free_all_blocks(self): ...
    def used_bytes(self): ...
```

[CUDA Integration](./cuda-integration.md)
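
These classes are usually combined as context managers. You select a device, queue work on a non-default stream, synchronize, then return cached blocks from the memory pool if GPU memory is tight. The sketch below assumes device 0. It falls back to NumPy when no CUDA device is visible, so it also runs on CPU-only machines. `sum_of_squares` is a hypothetical helper.

```python
import numpy as np

try:
    import cupy as cp
    HAS_GPU = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy missing, or no usable CUDA driver/device
    cp = None
    HAS_GPU = False


def sum_of_squares(host_array, device_id=0):
    """Return sum(host_array**2), computed on a CUDA stream when a GPU exists."""
    if not HAS_GPU:
        return float(np.sum(np.square(host_array)))
    with cp.cuda.Device(device_id):
        stream = cp.cuda.Stream(non_blocking=True)
        with stream:  # copies and kernels in this block are queued on `stream`
            d = cp.asarray(host_array)
            total = cp.sum(d * d)
        stream.synchronize()  # wait until the queued work has finished
        value = float(total)  # copies the 0-d result back to the host
        del d, total
        # Hand unused cached blocks of the default pool back to the driver
        cp.get_default_memory_pool().free_all_blocks()
    return value
```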

### FFT Operations

Fast Fourier Transform operations in `cupy.fft`, backed by cuFFT, providing GPU-accelerated 1D, 2D, and N-dimensional transforms for both real and complex data.

```python { .api }
def fft(a, n=None, axis=-1, norm=None): ...
def ifft(a, n=None, axis=-1, norm=None): ...
def rfft(a, n=None, axis=-1, norm=None): ...
def fft2(a, s=None, axes=(-2, -1), norm=None): ...
def fftn(a, s=None, axes=None, norm=None): ...
```

[FFT Operations](./fft-operations.md)
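
For real-valued signals, `rfft` returns only the non-negative frequency bins, and `rfftfreq` gives the matching frequencies in Hz. This is a minimal peak-finding sketch written against an `xp` argument (`numpy` or `cupy`). `dominant_frequency` is a hypothetical helper, and the test signal is illustrative.

```python
import numpy as np


def dominant_frequency(signal, sample_rate, xp):
    """Return the frequency (Hz) of the largest non-DC peak in a real signal."""
    spectrum = xp.abs(xp.fft.rfft(signal))
    freqs = xp.fft.rfftfreq(signal.shape[0], d=1.0 / sample_rate)
    spectrum[0] = 0  # ignore the DC (mean) component
    return float(freqs[int(xp.argmax(spectrum))])


fs = 100.0  # samples per second
t = np.arange(100) / fs
signal = np.sin(2 * np.pi * 5.0 * t) + 0.5  # 5 Hz tone plus a DC offset
freq = dominant_frequency(signal, fs, np)
```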

### Random Number Generation

GPU-accelerated random number generation in `cupy.random`, backed by cuRAND, supporting a range of probability distributions and random sampling routines.

```python { .api }
def random(size=None): ...
def randn(*size): ...
def randint(low, high=None, size=None, dtype='l'): ...
def normal(loc=0.0, scale=1.0, size=None): ...
def uniform(low=0.0, high=1.0, size=None): ...
```

[Random Number Generation](./random-generation.md)
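
For reproducible experiments, a seeded `RandomState` is a common choice. Both `numpy.random` and `cupy.random` provide one. A fixed seed repeats results within one backend, but NumPy and CuPy do not generate identical sequences for the same seed. The Monte Carlo sketch below is written against an `xp` argument. `estimate_pi` is a hypothetical helper.

```python
import numpy as np


def estimate_pi(n, seed, xp):
    """Monte Carlo estimate of pi from n uniform points in the unit square."""
    rs = xp.random.RandomState(seed)  # independent, seeded generator
    pts = rs.uniform(0.0, 1.0, size=(n, 2))
    inside = xp.sum(pts ** 2, axis=1) <= 1.0  # points inside the quarter circle
    return 4.0 * float(xp.count_nonzero(inside)) / n


est = estimate_pi(200_000, seed=0, xp=np)
```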

### Custom Kernels

User-defined CUDA kernel creation through ElementwiseKernel, ReductionKernel, and RawKernel classes, enabling custom GPU operations and performance-critical computations.

```python { .api }
class ElementwiseKernel:
    def __init__(self, in_params, out_params, operation, name='kernel', **kwargs): ...
    def __call__(self, *args, **kwargs): ...

class ReductionKernel:
    def __init__(self, in_params, out_params, map_expr, reduce_expr, post_map_expr, identity, name='reduce_kernel', **kwargs): ...
    def __call__(self, *args, **kwargs): ...

class RawKernel:
    def __init__(self, code, name, **kwargs): ...
    def __call__(self, grid, block, args, **kwargs): ...
```

[Custom Kernels](./custom-kernels.md)
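
An `ElementwiseKernel` takes typed input and output parameter lists plus a CUDA C++ statement that is applied to each output element. This sketch defines a squared-difference kernel. It falls back to NumPy when no CUDA device is available, so the same function also runs on CPU-only machines. `squared_diff` and `squared_diff_kernel` are hypothetical names.

```python
import numpy as np

try:
    import cupy as cp
    HAS_GPU = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy missing, or no usable CUDA driver/device
    cp = None
    HAS_GPU = False

if HAS_GPU:
    # The operation string is evaluated once per output element on the GPU.
    squared_diff_kernel = cp.ElementwiseKernel(
        'float32 x, float32 y',   # input parameters
        'float32 z',              # output parameter
        'z = (x - y) * (x - y)',  # per-element CUDA C++ body
        'squared_diff',           # kernel name
    )


def squared_diff(x, y):
    """Return (x - y)**2 as a float32 NumPy array, using the GPU kernel if present."""
    if HAS_GPU:
        z = squared_diff_kernel(cp.asarray(x, dtype=cp.float32),
                                cp.asarray(y, dtype=cp.float32))
        return cp.asnumpy(z)
    x = np.asarray(x, dtype=np.float32)
    y = np.asarray(y, dtype=np.float32)
    return (x - y) * (x - y)
```

The kernel is compiled on its first call and cached afterwards, so defining it once at module level avoids paying that cost repeatedly.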

### Statistical Functions

Statistical operations and analyses including descriptive statistics, correlations, histograms, and probability computations, all optimized for GPU execution.

```python { .api }
def mean(a, axis=None, dtype=None, out=None, keepdims=False): ...
def std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False): ...
def var(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False): ...
def corrcoef(x, y=None, rowvar=True, bias=None, ddof=None): ...
def histogram(a, bins=10, range=None, normed=None, weights=None, density=None): ...
```

[Statistical Functions](./statistical-functions.md)
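
The `ddof` parameter of `std` and `var` selects the divisor `N - ddof`. The default, `ddof=0`, gives the population statistic, and `ddof=1` gives the sample estimate. This is a minimal sketch written against an `xp` argument (`numpy` or `cupy`). `describe` is a hypothetical helper.

```python
import numpy as np


def describe(a, xp):
    """Return mean, population std (ddof=0) and sample std (ddof=1) as floats."""
    a = xp.asarray(a, dtype=xp.float64)
    return (float(xp.mean(a)),
            float(xp.std(a)),          # divides the squared deviations by N
            float(xp.std(a, ddof=1)))  # divides by N - 1


mean, pop_std, sample_std = describe([2, 4, 4, 4, 5, 5, 7, 9], np)
```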

### CuPy Extensions

Additional functionality in the `cupyx` package, including SciPy-compatible routines, JIT kernel compilation, and specialized operations not found in NumPy. The following functions are from the top-level `cupyx` namespace:

```python { .api }
def scatter_add(a, slices, value): ...
def rsqrt(x, out=None): ...
```

From `cupyx.jit`:

```python { .api }
def rawkernel(mode='python', device=False): ...
```

From `cupyx.scipy.sparse`:

```python { .api }
class csr_matrix: ...
class csc_matrix: ...
```

[CuPy Extensions](./cupy-extensions.md)
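
`scatter_add` accumulates every contribution when indices repeat, whereas the fancy-index update `a[idx] += b` keeps only one of them. In that respect it behaves like NumPy's `np.add.at`. The sketch below uses `cupyx.scatter_add(a, slices, value)` on a GPU and falls back to `np.add.at` otherwise. `accumulate` is a hypothetical helper name.

```python
import numpy as np

try:
    import cupy as cp
    import cupyx
    HAS_GPU = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy missing, or no usable CUDA driver/device
    HAS_GPU = False


def accumulate(indices, values, length):
    """Sum values into a zero array of size `length`, honoring repeated indices."""
    if HAS_GPU:
        out = cp.zeros(length, dtype=cp.float64)
        cupyx.scatter_add(out, cp.asarray(indices),
                          cp.asarray(values, dtype=cp.float64))
        return cp.asnumpy(out)
    out = np.zeros(length, dtype=np.float64)
    np.add.at(out, np.asarray(indices), np.asarray(values, dtype=np.float64))
    return out


idx = np.array([0, 1, 1, 3])
vals = np.array([1.0, 2.0, 3.0, 4.0])
acc = accumulate(idx, vals, 5)  # index 1 receives 2.0 + 3.0
naive = np.zeros(5)
naive[idx] += vals  # index 1 keeps only one of its two contributions
```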

## Types

```python { .api }
class ndarray:
    """
    CuPy's core N-dimensional array class, stored in GPU memory.
    Compatible with NumPy arrays but operations run on GPU.
    """
    def __init__(self, shape, dtype=float64, memptr=None, strides=None, order='C'): ...
    def get(self, stream=None, order='C'): ...  # Transfer to CPU
    def set(self, arr, stream=None): ...  # Transfer from CPU
    @property
    def device(self): ...
    @property
    def dtype(self): ...
    @property
    def shape(self): ...
    @property
    def size(self): ...

class ufunc:
    """Universal function class for element-wise operations on GPU arrays."""
    def __call__(self, *args, **kwargs): ...
    def reduce(self, a, axis=0, dtype=None, out=None, keepdims=False): ...
    def accumulate(self, a, axis=0, dtype=None, out=None): ...
```