# CuPy

CuPy is a NumPy- and SciPy-compatible array library for GPU-accelerated computing with Python. It provides a drop-in replacement for NumPy arrays and mathematical functions, enabling existing NumPy/SciPy code to run on NVIDIA CUDA GPUs with little or no modification while achieving significant performance improvements through GPU parallelization.

## Package Information

- **Package Name**: cupy-cuda113
- **Language**: Python
- **Installation**: `pip install cupy-cuda113`
- **License**: MIT
- **Documentation**: https://docs.cupy.dev/en/stable/

## Core Imports

```python
import cupy as cp
```

Common patterns for array creation and operations:

```python
import cupy as cp
import numpy as np

# Use CuPy as a drop-in replacement for NumPy
arr = cp.array([1, 2, 3])
result = cp.sum(arr)
```

For CPU/GPU generic code:

```python
import cupy as cp

# Automatically select NumPy or CuPy based on input arrays
def generic_function(x):
    xp = cp.get_array_module(x)  # Returns cp or np
    return xp.sum(x)
```

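Called with either array type, the same function runs on the CPU via NumPy or on the GPU via CuPy. A minimal sketch of that dispatch (the `generic_sum` name is illustrative; the CuPy call assumes a CUDA-capable GPU):

```python
import numpy as np
import cupy as cp

def generic_sum(x):
    xp = cp.get_array_module(x)   # np for NumPy input, cp for CuPy input
    return xp.sum(x)

print(generic_sum(np.arange(5)))  # computed on the CPU with NumPy
print(generic_sum(cp.arange(5)))  # computed on the GPU with CuPy
```
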
## Basic Usage

```python
import cupy as cp
import numpy as np

# Create CuPy arrays (stored in GPU memory)
gpu_array = cp.array([1, 2, 3, 4, 5])
gpu_zeros = cp.zeros((3, 4))
gpu_random = cp.random.random((1000, 1000))

# Perform GPU-accelerated operations
result = cp.sum(gpu_array)
matrix_mult = cp.dot(gpu_random, gpu_random.T)

# Transfer between GPU and CPU
cpu_array = cp.asnumpy(gpu_array)     # GPU -> CPU
gpu_from_cpu = cp.asarray(cpu_array)  # CPU -> GPU

# Use with existing NumPy code - just change np to cp
x = cp.linspace(0, 2 * cp.pi, 1000)
y = cp.sin(x)
fft_result = cp.fft.fft(y)

# Memory management
pool = cp.get_default_memory_pool()
print(f"Used: {pool.used_bytes()}, Total: {pool.total_bytes()}")
```

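GPU kernels launch asynchronously, so synchronize before timing or before depending on results on the host. A minimal timing sketch, assuming a CUDA-capable GPU (the array size is arbitrary):

```python
import time
import cupy as cp

x = cp.random.random((2048, 2048))

start = time.perf_counter()
y = x @ x                           # kernel launch returns immediately
cp.cuda.Stream.null.synchronize()   # wait for the GPU to finish before timing
print(f"matmul: {time.perf_counter() - start:.4f} s")
```
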
## Architecture

CuPy's architecture enables seamless GPU acceleration:

- **Core Arrays**: `cupy.ndarray` provides NumPy-compatible arrays in GPU memory
- **CUDA Integration**: Direct access to CUDA streams, memory management, and device control
- **Automatic Memory Management**: Built-in memory pools for efficient GPU memory allocation
- **Kernel System**: Custom CUDA kernels through ElementwiseKernel, ReductionKernel, and RawKernel
- **Library Integration**: GPU-accelerated routines backed by cuBLAS, cuFFT, cuRAND, cuSOLVER, and cuSPARSE
- **Extensions**: The `cupyx` namespace provides additional functionality, including SciPy-compatible routines and JIT compilation

This design allows CuPy to serve as a complete GPU computing platform while maintaining NumPy API compatibility.

## Capabilities

### Array Creation and Manipulation

Core array creation functions and array manipulation operations that mirror NumPy's interface but operate on GPU memory, including shape manipulation, joining, splitting, and element rearrangement.

```python { .api }
def array(obj, dtype=None, copy=True, order='K', subok=False, ndmin=0): ...
def zeros(shape, dtype=float64, order='C'): ...
def ones(shape, dtype=None, order='C'): ...
def empty(shape, dtype=float64, order='C'): ...
def arange(start, stop=None, step=1, dtype=None): ...
def linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0): ...
def reshape(a, newshape, order='C'): ...
def concatenate(arrays, axis=0, out=None): ...
def transpose(a, axes=None): ...
```

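A minimal sketch of GPU-side array creation and manipulation, assuming a CUDA-capable GPU (shapes and dtypes are arbitrary):

```python
import cupy as cp

a = cp.arange(12, dtype=cp.float32)          # allocated in GPU memory
m = cp.reshape(a, (3, 4))                    # shape manipulation
stacked = cp.concatenate([m, cp.ones((2, 4), dtype=cp.float32)], axis=0)
print(stacked.shape, cp.transpose(stacked).shape)   # (5, 4) (4, 5)
```
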
[Array Operations](./array-operations.md)

### Mathematical Functions

Comprehensive mathematical operations including trigonometric, hyperbolic, exponential, logarithmic, arithmetic, and complex number functions, all GPU-accelerated and compatible with NumPy's mathematical function interface.

```python { .api }
def sin(x, out=None): ...
def cos(x, out=None): ...
def exp(x, out=None): ...
def log(x, out=None): ...
def add(x1, x2, out=None): ...
def multiply(x1, x2, out=None): ...
def sqrt(x, out=None): ...
def power(x1, x2, out=None): ...
```

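A minimal sketch of elementwise math on GPU arrays, assuming a CUDA-capable GPU (values are arbitrary):

```python
import cupy as cp

x = cp.linspace(0.0, 1.0, 5)
y = cp.exp(x) + cp.sin(x) * cp.sqrt(x)   # elementwise, evaluated on the GPU
z = cp.power(cp.add(x, 1.0), 2)
print(cp.asnumpy(y), cp.asnumpy(z))
```
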
[Mathematical Functions](./math-functions.md)

### Linear Algebra

GPU-accelerated linear algebra operations including matrix products, decompositions, eigenvalue computations, and system solving through cuBLAS and cuSOLVER integration.

```python { .api }
def dot(a, b, out=None): ...
def matmul(x1, x2, out=None): ...
def einsum(subscripts, *operands, out=None, dtype=None, order='K', casting='safe', optimize=False): ...
```

From `cupy.linalg`:

```python { .api }
def norm(x, ord=None, axis=None, keepdims=False): ...
def svd(a, full_matrices=True, compute_uv=True, hermitian=False): ...
def solve(a, b): ...
def inv(a): ...
```

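A minimal sketch combining `cupy.linalg` routines, assuming a CUDA-capable GPU (the random system is purely illustrative):

```python
import cupy as cp

a = cp.random.random((200, 200))
b = cp.random.random((200,))

x = cp.linalg.solve(a, b)                   # dense solve on the GPU
print(float(cp.linalg.norm(a @ x - b)))     # residual, close to 0

u, s, vt = cp.linalg.svd(a, full_matrices=False)
print(s[:3])                                # largest singular values
```
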
[Linear Algebra](./linear-algebra.md)

### CUDA Integration

Direct CUDA functionality including device management, stream control, memory management, and custom kernel execution for advanced GPU programming and performance optimization.

```python { .api }
class Device:
    def __init__(self, device=None): ...
    def __enter__(self): ...
    def __exit__(self, *args): ...

class Stream:
    def __init__(self, null=False, non_blocking=False, ptds=False): ...
    def synchronize(self): ...

class MemoryPool:
    def malloc(self, size): ...
    def free_all_blocks(self): ...
    def used_bytes(self): ...
```

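A minimal sketch of explicit device, stream, and memory-pool control, assuming at least one CUDA device (device index 0 is an assumption):

```python
import cupy as cp

with cp.cuda.Device(0):                       # run the block on GPU 0
    stream = cp.cuda.Stream(non_blocking=True)
    with stream:                              # work below is queued on this stream
        x = cp.arange(1_000_000) ** 2
    stream.synchronize()                      # wait for queued kernels to finish

pool = cp.get_default_memory_pool()
print(pool.used_bytes(), pool.total_bytes())
pool.free_all_blocks()                        # return cached blocks to the driver
```
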
[CUDA Integration](./cuda-integration.md)

### FFT Operations

Fast Fourier Transform operations through cuFFT integration, providing GPU-accelerated 1D, 2D, and N-dimensional transforms for both real and complex data.

```python { .api }
def fft(a, n=None, axis=-1, norm=None): ...
def ifft(a, n=None, axis=-1, norm=None): ...
def rfft(a, n=None, axis=-1, norm=None): ...
def fft2(a, s=None, axes=(-2, -1), norm=None): ...
def fftn(a, s=None, axes=None, norm=None): ...
```

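A minimal sketch of a real-input FFT, assuming a CUDA-capable GPU; it also uses `cp.fft.rfftfreq` and `cp.argmax`, which are part of CuPy's NumPy-compatible API but not listed above:

```python
import cupy as cp

t = cp.linspace(0, 1, 1024, endpoint=False)
signal = cp.sin(2 * cp.pi * 50 * t) + 0.5 * cp.sin(2 * cp.pi * 120 * t)

spectrum = cp.fft.rfft(signal)                       # real-input FFT via cuFFT
freqs = cp.fft.rfftfreq(t.size, d=float(t[1] - t[0]))
print(float(freqs[cp.argmax(cp.abs(spectrum))]))     # dominant frequency, ~50.0
```
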
[FFT Operations](./fft-operations.md)

### Random Number Generation

GPU-accelerated random number generation through cuRAND integration, supporting various probability distributions and random sampling operations with high performance on GPU.

```python { .api }
def random(size=None): ...
def randn(*size): ...
def randint(low, high=None, size=None, dtype='l'): ...
def normal(loc=0.0, scale=1.0, size=None): ...
def uniform(low=0.0, high=1.0, size=None): ...
```

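A minimal sketch of seeded sampling from several distributions, assuming a CUDA-capable GPU; `cp.random.seed` is part of `cupy.random` though not listed above:

```python
import cupy as cp

cp.random.seed(0)                                        # reproducible GPU RNG
normals = cp.random.normal(loc=0.0, scale=1.0, size=1_000_000)
uniforms = cp.random.uniform(low=-1.0, high=1.0, size=(2, 3))
ints = cp.random.randint(0, 10, size=5)
print(float(normals.mean()), float(normals.std()))
print(uniforms.shape, ints)
```
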
[Random Number Generation](./random-generation.md)

### Custom Kernels

User-defined CUDA kernel creation through ElementwiseKernel, ReductionKernel, and RawKernel classes, enabling custom GPU operations and performance-critical computations.

```python { .api }
class ElementwiseKernel:
    def __init__(self, in_params, out_params, operation, name='kernel', **kwargs): ...
    def __call__(self, *args, **kwargs): ...

class ReductionKernel:
    def __init__(self, in_params, out_params, map_expr, reduce_expr, post_map_expr='', **kwargs): ...
    def __call__(self, *args, **kwargs): ...

class RawKernel:
    def __init__(self, code, name, **kwargs): ...
    def __call__(self, grid, block, args, **kwargs): ...
```

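A minimal `ElementwiseKernel` sketch, assuming a CUDA-capable GPU and a working CUDA toolchain for kernel compilation (the kernel name `squared_diff` is illustrative):

```python
import cupy as cp

# Elementwise kernel: the C expression is applied to every element pair
squared_diff = cp.ElementwiseKernel(
    'float32 x, float32 y',        # input parameters
    'float32 z',                   # output parameter
    'z = (x - y) * (x - y)',       # per-element operation (CUDA C)
    'squared_diff')

a = cp.arange(5, dtype=cp.float32)
b = cp.full(5, 2.0, dtype=cp.float32)
print(squared_diff(a, b))          # [4. 1. 0. 1. 4.]
```
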
[Custom Kernels](./custom-kernels.md)

### Statistical Functions

Statistical operations and analyses including descriptive statistics, correlations, histograms, and probability computations, all optimized for GPU execution.

```python { .api }
def mean(a, axis=None, dtype=None, out=None, keepdims=False): ...
def std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False): ...
def var(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False): ...
def corrcoef(x, y=None, rowvar=True, bias=None, ddof=None): ...
def histogram(a, bins=10, range=None, normed=None, weights=None, density=None): ...
```

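A minimal sketch of descriptive statistics and a histogram on GPU data, assuming a CUDA-capable GPU (sample sizes are arbitrary):

```python
import cupy as cp

data = cp.random.normal(loc=5.0, scale=2.0, size=100_000)
print(float(cp.mean(data)), float(cp.std(data)), float(cp.var(data)))

counts, edges = cp.histogram(data, bins=20)   # counts and bin edges, both on the GPU
print(int(counts.sum()), edges.shape)         # 100000 (21,)
```
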
[Statistical Functions](./statistical-functions.md)

### CuPy Extensions

Additional functionality through the `cupyx` namespace, including SciPy-compatible routines, JIT compilation, specialized operations, and advanced GPU programming features.

```python { .api }
def scatter_add(a, indices, b, axis=None): ...
def rsqrt(x, out=None): ...
```

From `cupyx.jit`:

```python { .api }
def rawkernel(mode='python', device=False): ...
```

From `cupyx.scipy.sparse`:

```python { .api }
class csr_matrix: ...
class csc_matrix: ...
```

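A minimal sketch of a SciPy-compatible sparse matrix from `cupyx.scipy.sparse`, assuming a CUDA-capable GPU (the small dense matrix is purely illustrative):

```python
import cupy as cp
import cupyx.scipy.sparse as sparse

# GPU sparse matrix with a SciPy-compatible interface
dense = cp.array([[1.0, 0.0, 0.0],
                  [0.0, 0.0, 2.0],
                  [0.0, 3.0, 0.0]])
mat = sparse.csr_matrix(dense)
v = cp.array([1.0, 2.0, 3.0])
print(mat.dot(v))                  # [1. 6. 6.]
print(mat.nnz, mat.shape)
```
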
[CuPy Extensions](./cupy-extensions.md)

## Types

```python { .api }
class ndarray:
    """
    CuPy's core N-dimensional array class, stored in GPU memory.
    Compatible with NumPy arrays but operations run on GPU.
    """
    def __init__(self, shape, dtype=float64, memptr=None, strides=None, order='C'): ...
    def get(self, stream=None, order='C'): ...  # Transfer to CPU
    def set(self, arr, stream=None): ...        # Transfer from CPU
    @property
    def device(self): ...
    @property
    def dtype(self): ...
    @property
    def shape(self): ...
    @property
    def size(self): ...

class ufunc:
    """Universal function class for element-wise operations on GPU arrays."""
    def __call__(self, *args, **kwargs): ...
    def reduce(self, a, axis=0, dtype=None, out=None, keepdims=False): ...
    def accumulate(self, a, axis=0, dtype=None, out=None): ...
```

```