# Driver API

Low-level CUDA driver API access providing direct control over contexts, devices, memory, streams, and events. This forms the foundation for all GPU operations with Pythonic error handling and automatic resource management.

## Capabilities
### Initialization

Initialize the CUDA driver API. This must be called before any other CUDA operation.

```python { .api }
def init(flags: int = 0) -> None:
    """
    Initialize the CUDA driver API.

    Parameters:
    - flags: int, initialization flags (typically 0)

    Raises:
        CudaError: if the CUDA driver cannot be initialized
    """
```
### Device Management

Query and access CUDA-capable devices in the system.

```python { .api }
class Device:
    @staticmethod
    def count() -> int:
        """Return the number of CUDA-capable devices."""

    def __init__(self, device_no: int):
        """Create a device object for the given device number."""

    def name(self) -> str:
        """Return the device name."""

    def compute_capability(self) -> tuple[int, int]:
        """Return the compute capability as a (major, minor) version."""

    def total_memory(self) -> int:
        """Return total device memory in bytes."""

    def get_attribute(self, attr: int) -> int:
        """Get a device attribute."""

    def make_context(self, flags: int = 0) -> Context:
        """Create a CUDA context on this device."""
```
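
A minimal device-discovery sketch, assuming the API above is importable as a module named `driver` (a hypothetical name; substitute the actual package) and that at least one CUDA-capable device is installed:

```python
import driver  # hypothetical name for the module documented here

driver.init()  # required before any other CUDA call

for i in range(driver.Device.count()):
    dev = driver.Device(i)
    major, minor = dev.compute_capability()
    print(f"Device {i}: {dev.name()}, compute {major}.{minor}, "
          f"{dev.total_memory() // 2**20} MiB")
```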
### Context Management

Manage CUDA execution contexts, which maintain state for a particular device.

```python { .api }
class Context:
    def __init__(self, dev: Device, flags: int = 0):
        """Create a new CUDA context."""

    def push(self) -> None:
        """Push this context onto the current thread's context stack."""

    def pop(self) -> Context:
        """Pop this context from the current thread's context stack."""

    def get_device(self) -> Device:
        """Return the device associated with this context."""

    def synchronize(self) -> None:
        """Block until all operations in this context complete."""

    def detach(self) -> None:
        """Detach and destroy this context."""

    @staticmethod
    def get_current() -> Context:
        """Return the current context."""
```
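
A typical context lifecycle sketch (hypothetical `driver` module, CUDA-capable device assumed): create a context via `Device.make_context`, do work while it is current, then detach it so its resources are released:

```python
import driver  # hypothetical module name

driver.init()
ctx = driver.Device(0).make_context()  # create a context on device 0
try:
    # ... allocate memory, launch kernels ...
    ctx.synchronize()  # wait for all queued work to finish
finally:
    ctx.detach()  # destroy the context and release its resources
```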
### Memory Management

Allocate and manage GPU memory with automatic cleanup.

```python { .api }
def mem_alloc(size: int) -> DeviceAllocation:
    """
    Allocate GPU memory.

    Parameters:
    - size: int, size in bytes

    Returns:
        DeviceAllocation: GPU memory allocation
    """

def mem_get_info() -> tuple[int, int]:
    """
    Get device memory information.

    Returns:
        tuple: (free_memory, total_memory) in bytes
    """

def memcpy_htod(dest: DeviceAllocation, src) -> None:
    """
    Copy from host to device.

    Parameters:
    - dest: DeviceAllocation, destination GPU memory
    - src: host memory (numpy array, bytes, etc.)
    """

def memcpy_dtoh(dest, src: DeviceAllocation) -> None:
    """
    Copy from device to host.

    Parameters:
    - dest: host memory buffer
    - src: DeviceAllocation, source GPU memory
    """

def memcpy_dtod(dest: DeviceAllocation, src: DeviceAllocation, size: int) -> None:
    """
    Copy from device to device.

    Parameters:
    - dest: DeviceAllocation, destination GPU memory
    - src: DeviceAllocation, source GPU memory
    - size: int, number of bytes to copy
    """

class DeviceAllocation:
    """GPU memory allocation with automatic cleanup."""

    def __int__(self) -> int:
        """Return the memory address as an integer."""

    def __len__(self) -> int:
        """Return the size in bytes."""

    def free(self) -> None:
        """Explicitly free the GPU memory."""

def mem_host_alloc(size: int, flags: int = 0) -> HostAllocation:
    """
    Allocate page-locked host memory.

    Parameters:
    - size: int, size in bytes
    - flags: int, allocation flags

    Returns:
        HostAllocation: page-locked host memory
    """

class HostAllocation:
    """Page-locked host memory allocation."""

    def __len__(self) -> int:
        """Return the size in bytes."""

    def free(self) -> None:
        """Free the host memory."""
```
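
The allocation and copy functions above combine into the usual host → device → host round trip. A sketch, assuming a hypothetical `driver` module, an already-current context, and NumPy for host buffers:

```python
import numpy as np
import driver  # hypothetical module name; a context must already be current

host_in = np.arange(16, dtype=np.float32)
host_out = np.empty_like(host_in)

gpu_buf = driver.mem_alloc(host_in.nbytes)  # raw device allocation
driver.memcpy_htod(gpu_buf, host_in)        # host -> device
# ... a kernel would operate on gpu_buf here ...
driver.memcpy_dtoh(host_out, gpu_buf)       # device -> host
gpu_buf.free()                              # or rely on automatic cleanup
```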
### Stream Management

Manage CUDA streams for asynchronous operations and overlapping computation.

```python { .api }
class Stream:
    def __init__(self, flags: int = 0):
        """
        Create a new CUDA stream.

        Parameters:
        - flags: int, stream creation flags
        """

    def synchronize(self) -> None:
        """Block until all operations in the stream complete."""

    def is_done(self) -> bool:
        """Return True if all operations in the stream are complete."""

    def wait_for_event(self, event: Event) -> None:
        """Make the stream wait for the event."""
```
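
Operations issued to the same stream run in order, while work in different streams may overlap. A sketch of polling a stream without blocking (hypothetical `driver` module, active context assumed):

```python
import driver  # hypothetical module name

stream = driver.Stream()
# ... enqueue asynchronous copies / kernel launches on `stream` ...

if stream.is_done():
    print("stream already finished")
else:
    stream.synchronize()  # block until everything queued has completed
```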
### Event Management

Manage CUDA events for synchronization and timing measurements.

```python { .api }
class Event:
    def __init__(self, flags: int = 0):
        """
        Create a new CUDA event.

        Parameters:
        - flags: int, event creation flags
        """

    def record(self, stream: Stream = None) -> None:
        """
        Record the event in a stream.

        Parameters:
        - stream: Stream, stream to record in (default stream if None)
        """

    def synchronize(self) -> None:
        """Block until the event has occurred."""

    def query(self) -> bool:
        """Return True if the event has occurred."""

    def time_since(self, start_event: Event) -> float:
        """
        Get the time elapsed since start_event.

        Parameters:
        - start_event: Event, starting event

        Returns:
            float: elapsed time in milliseconds
        """

    def time_till(self, end_event: Event) -> float:
        """
        Get the time elapsed from this event until end_event.

        Parameters:
        - end_event: Event, ending event

        Returns:
            float: elapsed time in milliseconds
        """
```
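
Events bracket GPU work for timing; the elapsed-time queries are only meaningful once both events have occurred, hence the `synchronize` call. A sketch (hypothetical `driver` module, active context assumed):

```python
import driver  # hypothetical module name

start, end = driver.Event(), driver.Event()
start.record()
# ... launch kernels in the default stream ...
end.record()
end.synchronize()  # wait until `end` has actually occurred
print(f"elapsed: {start.time_till(end):.3f} ms")  # equivalently end.time_since(start)
```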
### Module and Function Loading

Load compiled CUDA modules and access kernel functions.

```python { .api }
class Module:
    def __init__(self, image: bytes):
        """
        Load a module from a compiled image.

        Parameters:
        - image: bytes, compiled CUDA module (cubin/PTX)
        """

    def get_function(self, name: str) -> Function:
        """
        Get a kernel function by name.

        Parameters:
        - name: str, function name

        Returns:
            Function: kernel function object
        """

    def get_global(self, name: str) -> tuple[DeviceAllocation, int]:
        """
        Get a global variable.

        Parameters:
        - name: str, variable name

        Returns:
            tuple: (device_ptr, size_in_bytes)
        """

class Function:
    """CUDA kernel function."""

    def __call__(self, *args, **kwargs) -> None:
        """
        Launch the kernel function.

        Parameters:
        - args: kernel arguments
        - block: tuple, block dimensions (x, y, z)
        - grid: tuple, grid dimensions (x, y, z)
        - stream: Stream, stream to launch in (optional)
        - shared: int, dynamic shared memory size in bytes (optional)
        """

    def prepare(self, arg_types: list) -> PreparedFunction:
        """
        Prepare the function with argument types for faster launches.

        Parameters:
        - arg_types: list, argument type strings

        Returns:
            PreparedFunction: prepared function object
        """

class PreparedFunction:
    """Kernel function prepared with fixed argument types for faster launches."""

    def __call__(self, *args, **kwargs) -> None:
        """Launch the prepared function."""

    def prepared_call(self, grid: tuple, block: tuple, *args) -> None:
        """Launch with explicit grid/block dimensions."""

    def prepared_async_call(self, grid: tuple, block: tuple, stream: Stream, *args) -> None:
        """Launch asynchronously in a stream."""
```
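
The `block` and `grid` keywords of `Function.__call__` must together cover the problem size; that arithmetic is plain host-side Python. A sketch for a 1-D problem (the helper name `launch_dims` is illustrative, not part of the API):

```python
def launch_dims(n: int, block_size: int = 256) -> tuple[tuple, tuple]:
    """Return (grid, block) triples covering n elements with 1-D blocks."""
    grid_x = (n + block_size - 1) // block_size  # ceiling division
    return (grid_x, 1, 1), (block_size, 1, 1)

grid, block = launch_dims(1000)
# func(arg, block=block, grid=grid) would launch 4 blocks of 256 threads.
```

Because the grid rounds up, kernels launched this way typically guard against indices past the end (e.g. `if (i < n)` in the last block).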
### Error Handling

All CUDA errors are automatically translated into Python exceptions.

```python { .api }
class CudaError(Exception):
    """Base class for CUDA errors."""

class CompileError(CudaError):
    """CUDA compilation error."""

class MemoryError(CudaError):
    """CUDA memory error."""

class LaunchError(CudaError):
    """CUDA kernel launch error."""
```
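
Because the specific errors subclass `CudaError`, handlers can catch a narrow failure first and fall back to the base class. A self-contained illustration: the classes mirror the declarations above, while `allocate` and its budget are made up for the example:

```python
class CudaError(Exception):
    """Base class for CUDA errors."""

class MemoryError(CudaError):  # shadows the builtin, as in the API above
    """CUDA memory error."""

def allocate(size: int, budget: int = 1024) -> int:
    # Toy stand-in for mem_alloc: fail once the pretend budget is exceeded.
    if size > budget:
        raise MemoryError(f"cannot allocate {size} bytes")
    return size

try:
    allocate(4096)
except MemoryError as exc:  # most specific handler first
    print("out of memory:", exc)
except CudaError:
    print("other CUDA error")
```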
## Constants

```python { .api }
from types import SimpleNamespace

# Context creation flags
ctx_flags = SimpleNamespace(
    SCHED_AUTO=0,
    SCHED_SPIN=1,
    SCHED_YIELD=2,
    SCHED_BLOCKING_SYNC=4,
    MAP_HOST=8,
    LMEM_RESIZE_TO_MAX=16,
)

# Host memory allocation flags
host_alloc_flags = SimpleNamespace(
    PORTABLE=1,
    DEVICE_MAP=2,
    WRITE_COMBINED=4,
)

# Event creation flags
event_flags = SimpleNamespace(
    DEFAULT=0,
    BLOCKING_SYNC=1,
    DISABLE_TIMING=2,
    INTERPROCESS=4,
)

# Stream creation flags
stream_flags = SimpleNamespace(
    DEFAULT=0,
    NON_BLOCKING=1,
)
```
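
The flag namespaces hold power-of-two values, so multiple flags combine with bitwise OR before being passed as a `flags` argument. A self-contained illustration (the namespace is rebuilt here with the values listed above):

```python
from types import SimpleNamespace

ctx_flags = SimpleNamespace(SCHED_AUTO=0, SCHED_BLOCKING_SYNC=4, MAP_HOST=8)

# e.g. for Device.make_context(flags=...): a blocking-sync, host-mapped context
flags = ctx_flags.SCHED_BLOCKING_SYNC | ctx_flags.MAP_HOST
print(flags)  # 12
```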