
# Device and Memory Management

Essential CUDA device enumeration, selection, and memory allocation operations, including unified memory, streams, and events, for efficient GPU resource management. This module provides the foundational operations for working with CUDA devices and managing memory across CPU and GPU address spaces.

## Capabilities

### Device Information and Selection

Query available CUDA devices, select the active device, and retrieve device properties for optimal resource allocation.

```python { .api }
def cudaGetDeviceCount() -> int:
    """
    Get the number of CUDA-capable devices.

    Returns:
        int: Number of available CUDA devices

    Raises:
        cudaError_t: If a CUDA driver/runtime error occurs
    """

def cudaSetDevice(device: int) -> None:
    """
    Set the current CUDA device for subsequent operations.

    Args:
        device (int): Device ID (0-based index)

    Raises:
        cudaError_t: If the device ID is invalid or the device is not available
    """

def cudaGetDevice() -> int:
    """
    Get the currently selected CUDA device.

    Returns:
        int: Currently active device ID
    """

def cudaDeviceReset() -> None:
    """
    Reset the current CUDA device and destroy all associated contexts.

    Note:
        Call this function to ensure a clean shutdown.
    """

def cudaDeviceSynchronize() -> None:
    """
    Wait for all operations on the current device to complete.

    Note:
        Blocks until all preceding operations complete.
    """

def cudaGetErrorString(error: cudaError_t) -> str:
    """
    Get a descriptive string for a CUDA error code.

    Args:
        error (cudaError_t): CUDA error code

    Returns:
        str: Human-readable error description
    """
```

### Device Properties and Attributes

Retrieve detailed device capabilities and specifications for performance optimization.

```python { .api }
def cudaGetDeviceProperties(device: int) -> cudaDeviceProp:
    """
    Get comprehensive properties of a CUDA device.

    Args:
        device (int): Device ID to query

    Returns:
        cudaDeviceProp: Device properties structure
    """

def cudaDeviceGetAttribute(attr: cudaDeviceAttr, device: int) -> int:
    """
    Get a specific attribute value for a CUDA device.

    Args:
        attr (cudaDeviceAttr): Attribute to query
        device (int): Device ID

    Returns:
        int: Attribute value
    """
```
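The `major`/`minor` pair returned by `cudaGetDeviceProperties` determines which CUDA features a device supports. As an illustrative sketch (the helper name is an assumption, not part of the bindings), a minimum-capability check can compare the pairs lexicographically:

```python
def meets_compute_capability(major: int, minor: int,
                             required: tuple[int, int] = (7, 0)) -> bool:
    """Check whether a device's compute capability meets a minimum.

    Tuples compare lexicographically, so (8, 0) >= (7, 5) but (7, 0) < (7, 5).
    """
    return (major, minor) >= required

# e.g. with props = cudaGetDeviceProperties(0):
#   meets_compute_capability(props.major, props.minor, required=(7, 0))
```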

### Memory Allocation

Allocate memory on both the device (GPU) and the host (CPU) with various allocation strategies.

```python { .api }
def cudaMalloc(size: int) -> int:
    """
    Allocate memory on the CUDA device.

    Args:
        size (int): Number of bytes to allocate

    Returns:
        int: Device memory pointer (as an integer address)

    Raises:
        cudaError_t: If allocation fails (e.g., out of memory)
    """

def cudaMallocHost(size: int) -> int:
    """
    Allocate page-locked (pinned) host memory.

    Args:
        size (int): Number of bytes to allocate

    Returns:
        int: Host memory pointer (as an integer address)

    Note:
        Pinned memory enables faster host-device transfers.
    """

def cudaMallocManaged(size: int, flags: int = 0) -> int:
    """
    Allocate unified memory accessible from both CPU and GPU.

    Args:
        size (int): Number of bytes to allocate
        flags (int): Allocation flags (optional)

    Returns:
        int: Unified memory pointer
    """

def cudaHostAlloc(size: int, flags: int) -> int:
    """
    Allocate host memory with specific allocation flags.

    Args:
        size (int): Number of bytes to allocate
        flags (int): Allocation flags (cudaHostAllocDefault, etc.)

    Returns:
        int: Host memory pointer
    """
```
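When carving one large `cudaMalloc` allocation into several sub-buffers, each region's starting offset should be padded up to an alignment boundary. A minimal sketch, assuming a 256-byte boundary (a common device allocation alignment; the helper itself is illustrative, not part of the API):

```python
def aligned_size(nbytes: int, alignment: int = 256) -> int:
    """Round nbytes up to the next multiple of alignment."""
    return (nbytes + alignment - 1) // alignment * alignment

# Padding each sub-buffer this way keeps every region's start aligned
# when packing several arrays into a single device allocation.
```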

### Memory Deallocation

Free allocated memory resources on both device and host.

```python { .api }
def cudaFree(devPtr: int) -> None:
    """
    Free device memory allocated with cudaMalloc.

    Args:
        devPtr (int): Device pointer to free
    """

def cudaFreeHost(ptr: int) -> None:
    """
    Free host memory allocated with cudaMallocHost or cudaHostAlloc.

    Args:
        ptr (int): Host pointer to free
    """

def cudaHostUnregister(ptr: int) -> None:
    """
    Unregister previously registered host memory.

    Args:
        ptr (int): Host pointer to unregister
    """
```

185

186

### Memory Transfer Operations

187

188

Copy data between host and device memory with various transfer directions and modes.

189

190

```python { .api }

191

def cudaMemcpy(dst, src, count: int, kind: cudaMemcpyKind) -> None:

192

"""

193

Copy memory between host and device synchronously.

194

195

Args:

196

dst: Destination pointer

197

src: Source pointer

198

count (int): Number of bytes to copy

199

kind (cudaMemcpyKind): Copy direction

200

201

Note:

202

Blocks until copy completes

203

"""

204

205

def cudaMemcpyAsync(dst, src, count: int, kind: cudaMemcpyKind, stream: int) -> None:

206

"""

207

Copy memory between host and device asynchronously.

208

209

Args:

210

dst: Destination pointer

211

src: Source pointer

212

count (int): Number of bytes to copy

213

kind (cudaMemcpyKind): Copy direction

214

stream (int): CUDA stream for asynchronous execution

215

"""

216

217

def cudaMemset(devPtr: int, value: int, count: int) -> None:

218

"""

219

Set device memory to a specific value.

220

221

Args:

222

devPtr (int): Device pointer

223

value (int): Value to set (0-255)

224

count (int): Number of bytes to set

225

"""

226

227

def cudaMemsetAsync(devPtr: int, value: int, count: int, stream: int) -> None:

228

"""

229

Set device memory to a specific value asynchronously.

230

231

Args:

232

devPtr (int): Device pointer

233

value (int): Value to set (0-255)

234

count (int): Number of bytes to set

235

stream (int): CUDA stream for asynchronous execution

236

"""

237

```
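`cudaMemcpyDefault` asks the runtime to infer the direction from the pointers themselves. The selection it performs can be sketched with a stand-in enumeration (not the real `cudaMemcpyKind`, though the four explicit directions mirror it):

```python
from enum import IntEnum

class MemcpyKind(IntEnum):
    """Stand-in for cudaMemcpyKind's four explicit directions."""
    HostToHost = 0
    HostToDevice = 1
    DeviceToHost = 2
    DeviceToDevice = 3

def infer_kind(src_on_device: bool, dst_on_device: bool) -> MemcpyKind:
    """Pick the explicit copy direction from where src and dst reside."""
    if src_on_device and dst_on_device:
        return MemcpyKind.DeviceToDevice
    if src_on_device:
        return MemcpyKind.DeviceToHost
    if dst_on_device:
        return MemcpyKind.HostToDevice
    return MemcpyKind.HostToHost
```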

### Memory Information

Query memory usage and availability on CUDA devices.

```python { .api }
def cudaMemGetInfo() -> tuple:
    """
    Get memory information for the current device.

    Returns:
        tuple[int, int]: (free_memory, total_memory) in bytes
    """

def cudaPointerGetAttributes(ptr: int) -> cudaPointerAttributes:
    """
    Get attributes of a memory pointer.

    Args:
        ptr (int): Memory pointer to query

    Returns:
        cudaPointerAttributes: Pointer attributes structure
    """
```
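`cudaMemGetInfo` returns raw byte counts; a small formatting helper (hypothetical, shown only to make the units concrete) turns the `(free, total)` pair into a readable summary:

```python
def format_mem_info(free: int, total: int) -> str:
    """Render a cudaMemGetInfo-style (free, total) byte pair in MB."""
    used = total - free
    mb = 1024 ** 2
    return f"{used // mb} MB used / {total // mb} MB total"

# e.g. free, total = cudaMemGetInfo(); print(format_mem_info(free, total))
```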

## Types

### Memory Copy Directions

```python { .api }
class cudaMemcpyKind:
    """Memory copy direction enumeration"""
    cudaMemcpyHostToHost: int
    cudaMemcpyHostToDevice: int
    cudaMemcpyDeviceToHost: int
    cudaMemcpyDeviceToDevice: int
    cudaMemcpyDefault: int  # Infer direction automatically
```

277

278

### Device Attributes

279

280

```python { .api }

281

class cudaDeviceAttr:

282

"""CUDA device attribute enumeration"""

283

cudaDevAttrMaxThreadsPerBlock: int

284

cudaDevAttrMaxBlockDimX: int

285

cudaDevAttrMaxBlockDimY: int

286

cudaDevAttrMaxBlockDimZ: int

287

cudaDevAttrMaxGridDimX: int

288

cudaDevAttrMaxGridDimY: int

289

cudaDevAttrMaxGridDimZ: int

290

cudaDevAttrMaxSharedMemoryPerBlock: int

291

cudaDevAttrTotalConstantMemory: int

292

cudaDevAttrWarpSize: int

293

cudaDevAttrMaxPitch: int

294

cudaDevAttrMultiProcessorCount: int

295

cudaDevAttrClockRate: int

296

cudaDevAttrMemoryClockRate: int

297

cudaDevAttrMemoryBusWidth: int

298

```
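`cudaDevAttrMemoryClockRate` (in kHz) and `cudaDevAttrMemoryBusWidth` (in bits) together bound memory throughput. A rough peak-bandwidth estimate, assuming double-data-rate memory (the factor of 2 below; memory technologies differ, so treat this as an approximation, and the helper itself is not part of the API):

```python
def theoretical_bandwidth_gb_s(mem_clock_khz: int, bus_width_bits: int) -> float:
    """Estimate peak memory bandwidth in GB/s for DDR-style memory."""
    # kHz -> Hz, x2 for double data rate, bits -> bytes, bytes/s -> GB/s
    return mem_clock_khz * 1e3 * 2 * (bus_width_bits / 8) / 1e9
```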

### Host Allocation Flags

```python { .api }
# Host allocation flag constants
cudaHostAllocDefault: int        # Default page-locked allocation
cudaHostAllocPortable: int       # Portable across CUDA contexts
cudaHostAllocMapped: int         # Map allocation into device address space
cudaHostAllocWriteCombined: int  # Write-combined memory
```

### Device Properties Structure

```python { .api }
class cudaDeviceProp:
    """CUDA device properties structure"""
    name: str                 # Device name
    totalGlobalMem: int       # Global memory size in bytes
    sharedMemPerBlock: int    # Shared memory per block
    regsPerBlock: int         # Registers per block
    warpSize: int             # Warp size
    memPitch: int             # Maximum pitch in bytes
    maxThreadsPerBlock: int   # Maximum threads per block
    maxThreadsDim: tuple      # Maximum block dimensions
    maxGridSize: tuple        # Maximum grid dimensions
    clockRate: int            # Clock frequency in kHz
    totalConstMem: int        # Constant memory size
    major: int                # Compute capability major version
    minor: int                # Compute capability minor version
    multiProcessorCount: int  # Number of SMs
```

### Pointer Attributes

```python { .api }
class cudaPointerAttributes:
    """Memory pointer attributes structure"""
    type: int           # Memory type (host, device, managed)
    device: int         # Device where the pointer resides
    devicePointer: int  # Device pointer value
    hostPointer: int    # Host pointer value
```

## Usage Examples

### Basic Device Management

```python
from cuda.bindings import runtime

# Check available devices
device_count = runtime.cudaGetDeviceCount()
print(f"Found {device_count} CUDA devices")

# Select and query device
runtime.cudaSetDevice(0)
current_device = runtime.cudaGetDevice()
print(f"Using device {current_device}")

# Get device properties
props = runtime.cudaGetDeviceProperties(0)
print(f"Device: {props.name}")
print(f"Compute Capability: {props.major}.{props.minor}")
print(f"Global Memory: {props.totalGlobalMem // (1024**3)} GB")
```

### Memory Operations

```python
from cuda.bindings import runtime

# Allocate memory
size = 1024 * 1024  # 1 MB
device_ptr = runtime.cudaMalloc(size)
host_ptr = runtime.cudaMallocHost(size)

# Transfer data
runtime.cudaMemcpy(
    device_ptr, host_ptr, size,
    runtime.cudaMemcpyKind.cudaMemcpyHostToDevice
)

# Check memory usage
free_mem, total_mem = runtime.cudaMemGetInfo()
print(f"Free: {free_mem // (1024**2)} MB")
print(f"Total: {total_mem // (1024**2)} MB")

# Cleanup
runtime.cudaFree(device_ptr)
runtime.cudaFreeHost(host_ptr)
```
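Overlapping transfers with `cudaMemcpyAsync` typically means splitting one large buffer across several streams. A planning helper (illustrative only; the name and even-split policy are assumptions) that yields contiguous `(offset, size)` chunks:

```python
def plan_chunks(total_bytes: int, n_streams: int) -> list[tuple[int, int]]:
    """Split total_bytes into n_streams contiguous (offset, size) chunks."""
    base, rem = divmod(total_bytes, n_streams)
    chunks = []
    offset = 0
    for i in range(n_streams):
        size = base + (1 if i < rem else 0)  # spread the remainder evenly
        chunks.append((offset, size))
        offset += size
    return chunks

# Each (offset, size) pair would feed one cudaMemcpyAsync call on its own stream.
```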