# Device and Memory Management

Essential CUDA device enumeration, selection, and memory allocation operations, including unified memory, streams, and events, for efficient GPU resource management. This module provides the foundational operations for working with CUDA devices and managing memory across the CPU and GPU address spaces.

## Capabilities

### Device Information and Selection

Query available CUDA devices, select the active device, and retrieve device properties for optimal resource allocation.

```python { .api }
def cudaGetDeviceCount() -> int:
    """
    Get the number of CUDA-capable devices.

    Returns:
        int: Number of available CUDA devices

    Raises:
        cudaError_t: If a CUDA driver/runtime error occurs
    """

def cudaSetDevice(device: int) -> None:
    """
    Set the current CUDA device for subsequent operations.

    Args:
        device (int): Device ID (0-based index)

    Raises:
        cudaError_t: If the device ID is invalid or the device is not available
    """

def cudaGetDevice() -> int:
    """
    Get the currently selected CUDA device.

    Returns:
        int: Currently active device ID
    """

def cudaDeviceReset() -> None:
    """
    Reset the current CUDA device and destroy all associated contexts.

    Note:
        Call this function to ensure a clean shutdown
    """

def cudaDeviceSynchronize() -> None:
    """
    Wait for all operations on the current device to complete.

    Note:
        Blocks until all preceding operations have completed
    """

def cudaGetErrorString(error: cudaError_t) -> str:
    """
    Get a descriptive string for a CUDA error code.

    Args:
        error (cudaError_t): CUDA error code

    Returns:
        str: Human-readable error description
    """
```

### Device Properties and Attributes

Retrieve detailed device capabilities and specifications for performance optimization.

```python { .api }
def cudaGetDeviceProperties(device: int) -> cudaDeviceProp:
    """
    Get comprehensive properties of a CUDA device.

    Args:
        device (int): Device ID to query

    Returns:
        cudaDeviceProp: Device properties structure
    """

def cudaDeviceGetAttribute(attr: cudaDeviceAttr, device: int) -> int:
    """
    Get a specific attribute value for a CUDA device.

    Args:
        attr (cudaDeviceAttr): Attribute to query
        device (int): Device ID

    Returns:
        int: Attribute value
    """
```
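
Attribute queries are often combined arithmetically. For example, a device's peak theoretical memory bandwidth can be estimated from `cudaDevAttrMemoryClockRate` (reported in kHz) and `cudaDevAttrMemoryBusWidth` (reported in bits). A minimal sketch as a pure helper, assuming a double-data-rate memory bus (the factor of 2, which holds for GDDR/HBM parts):

```python
def theoretical_bandwidth_gb_s(memory_clock_khz: int, bus_width_bits: int) -> float:
    """Estimate peak memory bandwidth in GB/s from device attributes.

    Assumes double data rate (two transfers per clock); adjust the
    factor for other memory technologies.
    """
    transfers_per_second = memory_clock_khz * 1000 * 2  # kHz -> Hz, x2 for DDR
    bytes_per_transfer = bus_width_bits // 8            # bits -> bytes
    return transfers_per_second * bytes_per_transfer / 1e9

# With V100-class figures (877 MHz memory clock, 4096-bit bus):
# theoretical_bandwidth_gb_s(877000, 4096) ~= 898 GB/s
```

On a live device, the two inputs would come from `cudaDeviceGetAttribute(cudaDevAttrMemoryClockRate, device)` and `cudaDeviceGetAttribute(cudaDevAttrMemoryBusWidth, device)`.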

### Memory Allocation

Allocate memory on both the device (GPU) and the host (CPU) with various allocation strategies.

```python { .api }
def cudaMalloc(size: int) -> int:
    """
    Allocate memory on the CUDA device.

    Args:
        size (int): Number of bytes to allocate

    Returns:
        int: Device memory pointer (as integer address)

    Raises:
        cudaError_t: If allocation fails (e.g., out of memory)
    """

def cudaMallocHost(size: int) -> int:
    """
    Allocate page-locked (pinned) host memory.

    Args:
        size (int): Number of bytes to allocate

    Returns:
        int: Host memory pointer (as integer address)

    Note:
        Pinned memory enables faster host-device transfers
    """

def cudaMallocManaged(size: int, flags: int = 0) -> int:
    """
    Allocate unified memory accessible from both CPU and GPU.

    Args:
        size (int): Number of bytes to allocate
        flags (int): Allocation flags (optional)

    Returns:
        int: Unified memory pointer
    """

def cudaHostAlloc(size: int, flags: int) -> int:
    """
    Allocate host memory with specific allocation flags.

    Args:
        size (int): Number of bytes to allocate
        flags (int): Allocation flags (cudaHostAllocDefault, etc.)

    Returns:
        int: Host memory pointer
    """
```

### Memory Deallocation

Free allocated memory resources on both the device and the host.

```python { .api }
def cudaFree(devPtr: int) -> None:
    """
    Free device memory allocated with cudaMalloc.

    Args:
        devPtr (int): Device pointer to free
    """

def cudaFreeHost(ptr: int) -> None:
    """
    Free host memory allocated with cudaMallocHost or cudaHostAlloc.

    Args:
        ptr (int): Host pointer to free
    """

def cudaHostUnregister(ptr: int) -> None:
    """
    Unregister previously registered host memory.

    Args:
        ptr (int): Host pointer to unregister
    """
```
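
Every allocation must be paired with the matching deallocation call (`cudaMalloc` with `cudaFree`, `cudaMallocHost`/`cudaHostAlloc` with `cudaFreeHost`), even when an error interrupts the work in between. A common pattern is a context manager that guarantees the free call on exit. A minimal sketch, written generically over an allocate/free pair so the shape is clear without a live CUDA runtime; with the bindings above you would pass, e.g., `runtime.cudaMalloc` and `runtime.cudaFree`:

```python
from contextlib import contextmanager
from typing import Callable, Iterator

@contextmanager
def cuda_allocation(alloc: Callable[[int], int],
                    free: Callable[[int], None],
                    size: int) -> Iterator[int]:
    """Yield a pointer from alloc(size), guaranteeing free(ptr) on exit."""
    ptr = alloc(size)
    try:
        yield ptr
    finally:
        free(ptr)  # runs even if the body raises

# Hypothetical session with the runtime bindings:
#   with cuda_allocation(runtime.cudaMalloc, runtime.cudaFree, 1024) as dev_ptr:
#       ...  # use dev_ptr; cudaFree is called automatically
```

The same wrapper works for pinned host memory by passing `runtime.cudaMallocHost` and `runtime.cudaFreeHost`.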

### Memory Transfer Operations

Copy data between host and device memory with various transfer directions and modes.

```python { .api }
def cudaMemcpy(dst, src, count: int, kind: cudaMemcpyKind) -> None:
    """
    Copy memory between host and device synchronously.

    Args:
        dst: Destination pointer
        src: Source pointer
        count (int): Number of bytes to copy
        kind (cudaMemcpyKind): Copy direction

    Note:
        Blocks until the copy completes
    """

def cudaMemcpyAsync(dst, src, count: int, kind: cudaMemcpyKind, stream: int) -> None:
    """
    Copy memory between host and device asynchronously.

    Args:
        dst: Destination pointer
        src: Source pointer
        count (int): Number of bytes to copy
        kind (cudaMemcpyKind): Copy direction
        stream (int): CUDA stream for asynchronous execution
    """

def cudaMemset(devPtr: int, value: int, count: int) -> None:
    """
    Set device memory to a specific value.

    Args:
        devPtr (int): Device pointer
        value (int): Value to set (0-255)
        count (int): Number of bytes to set
    """

def cudaMemsetAsync(devPtr: int, value: int, count: int, stream: int) -> None:
    """
    Set device memory to a specific value asynchronously.

    Args:
        devPtr (int): Device pointer
        value (int): Value to set (0-255)
        count (int): Number of bytes to set
        stream (int): CUDA stream for asynchronous execution
    """
```

### Memory Information

Query memory usage and availability on CUDA devices.

```python { .api }
def cudaMemGetInfo() -> tuple[int, int]:
    """
    Get memory information for the current device.

    Returns:
        tuple[int, int]: (free_memory, total_memory) in bytes
    """

def cudaPointerGetAttributes(ptr: int) -> cudaPointerAttributes:
    """
    Get attributes of a memory pointer.

    Args:
        ptr (int): Memory pointer to query

    Returns:
        cudaPointerAttributes: Pointer attributes structure
    """
```
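
The `(free, total)` pair returned by `cudaMemGetInfo` is in bytes; reporting code usually converts it to larger units. A small pure helper for that (a sketch, not part of the bindings):

```python
def format_mem_info(free_bytes: int, total_bytes: int) -> str:
    """Render a (free, total) byte pair as a usage summary in MiB."""
    mib = 1024 ** 2
    used = total_bytes - free_bytes
    return (f"{used // mib} MiB used / {total_bytes // mib} MiB total "
            f"({free_bytes // mib} MiB free)")

# On a live device: print(format_mem_info(*runtime.cudaMemGetInfo()))
```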

## Types

### Memory Copy Directions

```python { .api }
class cudaMemcpyKind:
    """Memory copy direction enumeration"""
    cudaMemcpyHostToHost: int
    cudaMemcpyHostToDevice: int
    cudaMemcpyDeviceToHost: int
    cudaMemcpyDeviceToDevice: int
    cudaMemcpyDefault: int  # Infer direction automatically
```

### Device Attributes

```python { .api }
class cudaDeviceAttr:
    """CUDA device attribute enumeration"""
    cudaDevAttrMaxThreadsPerBlock: int
    cudaDevAttrMaxBlockDimX: int
    cudaDevAttrMaxBlockDimY: int
    cudaDevAttrMaxBlockDimZ: int
    cudaDevAttrMaxGridDimX: int
    cudaDevAttrMaxGridDimY: int
    cudaDevAttrMaxGridDimZ: int
    cudaDevAttrMaxSharedMemoryPerBlock: int
    cudaDevAttrTotalConstantMemory: int
    cudaDevAttrWarpSize: int
    cudaDevAttrMaxPitch: int
    cudaDevAttrMultiProcessorCount: int
    cudaDevAttrClockRate: int
    cudaDevAttrMemoryClockRate: int
    cudaDevAttrMemoryBusWidth: int
```

### Host Allocation Flags

```python { .api }
# Host allocation flag constants
cudaHostAllocDefault: int        # Default page-locked allocation
cudaHostAllocPortable: int       # Portable across CUDA contexts
cudaHostAllocMapped: int         # Map allocation into device address space
cudaHostAllocWriteCombined: int  # Write-combined memory
```

### Device Properties Structure

```python { .api }
class cudaDeviceProp:
    """CUDA device properties structure"""
    name: str                 # Device name
    totalGlobalMem: int       # Global memory size in bytes
    sharedMemPerBlock: int    # Shared memory per block
    regsPerBlock: int         # Registers per block
    warpSize: int             # Warp size
    memPitch: int             # Maximum pitch in bytes
    maxThreadsPerBlock: int   # Maximum threads per block
    maxThreadsDim: tuple      # Maximum block dimensions
    maxGridSize: tuple        # Maximum grid dimensions
    clockRate: int            # Clock frequency in kHz
    totalConstMem: int        # Constant memory size
    major: int                # Compute capability major version
    minor: int                # Compute capability minor version
    multiProcessorCount: int  # Number of SMs
```

### Pointer Attributes

```python { .api }
class cudaPointerAttributes:
    """Memory pointer attributes structure"""
    type: int           # Memory type (host, device, managed)
    device: int         # Device where the pointer resides
    devicePointer: int  # Device pointer value
    hostPointer: int    # Host pointer value
```

## Usage Examples

### Basic Device Management

```python
from cuda.bindings import runtime

# Check available devices
device_count = runtime.cudaGetDeviceCount()
print(f"Found {device_count} CUDA devices")

# Select and query device
runtime.cudaSetDevice(0)
current_device = runtime.cudaGetDevice()
print(f"Using device {current_device}")

# Get device properties
props = runtime.cudaGetDeviceProperties(0)
print(f"Device: {props.name}")
print(f"Compute Capability: {props.major}.{props.minor}")
print(f"Global Memory: {props.totalGlobalMem // (1024**3)} GB")
```

### Memory Operations

```python
from cuda.bindings import runtime

# Allocate memory
size = 1024 * 1024  # 1 MB
device_ptr = runtime.cudaMalloc(size)
host_ptr = runtime.cudaMallocHost(size)

# Transfer data
runtime.cudaMemcpy(
    device_ptr, host_ptr, size,
    runtime.cudaMemcpyKind.cudaMemcpyHostToDevice
)

# Check memory usage
free_mem, total_mem = runtime.cudaMemGetInfo()
print(f"Free: {free_mem // (1024**2)} MB")
print(f"Total: {total_mem // (1024**2)} MB")

# Cleanup
runtime.cudaFree(device_ptr)
runtime.cudaFreeHost(host_ptr)
```
```