
# Driver API

Low-level CUDA driver API access providing direct control over contexts, devices, memory, streams, and events. This layer forms the foundation for all GPU operations, with Pythonic error handling and automatic resource management.

## Capabilities

### Initialization

Initialize the CUDA driver API. This must be called before any other CUDA operation.

```python { .api }
def init(flags: int = 0) -> None:
    """
    Initialize the CUDA driver API.

    Parameters:
    - flags: int, initialization flags (typically 0)

    Raises:
    CudaError: If the CUDA driver cannot be initialized
    """
```
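
A minimal usage sketch (not part of the specification above): it assumes the API is importable as `pycuda.driver` — the import path is not stated in this document — and is guarded so it degrades to a no-op on machines without the package or a CUDA driver.

```python
# Hypothetical usage sketch; the pycuda.driver import path is an
# assumption. Guarded so the script still runs without a GPU.
try:
    import pycuda.driver as cuda
    cuda.init()           # must precede any other driver-API call
    driver_ready = True
except Exception:         # ImportError, or CudaError if no driver present
    driver_ready = False

print("CUDA driver initialized:", driver_ready)
```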

### Device Management

Query and access the CUDA-capable devices in the system.

```python { .api }
class Device:
    @staticmethod
    def count() -> int:
        """Return the number of CUDA-capable devices."""

    def __init__(self, device_no: int):
        """Create a device object for the given device number."""

    def name(self) -> str:
        """Return the device name."""

    def compute_capability(self) -> tuple[int, int]:
        """Return the compute capability as a (major, minor) version."""

    def total_memory(self) -> int:
        """Return the total device memory in bytes."""

    def get_attribute(self, attr: int) -> int:
        """Get a device attribute."""

    def make_context(self, flags: int = 0) -> Context:
        """Create a CUDA context on this device."""
```
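
A sketch of device enumeration, again assuming a `pycuda.driver` import path and guarded for machines without CUDA:

```python
# Hypothetical sketch: enumerate devices and print basic properties.
try:
    import pycuda.driver as cuda
    cuda.init()
    for i in range(cuda.Device.count()):
        dev = cuda.Device(i)
        major, minor = dev.compute_capability()
        print(f"device {i}: {dev.name()}, "
              f"compute capability {major}.{minor}, "
              f"{dev.total_memory() // (1024 ** 2)} MiB")
except Exception:
    print("no usable CUDA driver found")
```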

### Context Management

Manage CUDA execution contexts, which maintain state for a particular device.

```python { .api }
class Context:
    def __init__(self, dev: Device, flags: int = 0):
        """Create a new CUDA context."""

    def push(self) -> None:
        """Push the context onto the current thread's context stack."""

    def pop(self) -> Context:
        """Pop the context from the current thread's context stack."""

    def get_device(self) -> Device:
        """Return the device associated with this context."""

    def synchronize(self) -> None:
        """Block until all operations complete."""

    def detach(self) -> None:
        """Detach and destroy the context."""

    @staticmethod
    def get_current() -> Context:
        """Get the current context."""
```
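
A sketch of the typical context lifecycle — create, use, then pop and detach. The `pycuda.driver` import path is an assumption, and the whole block is guarded so it is a no-op without CUDA hardware:

```python
# Hypothetical context-lifecycle sketch.
try:
    import pycuda.driver as cuda
    cuda.init()
    ctx = cuda.Device(0).make_context()   # new context becomes current
    try:
        ctx.synchronize()                 # wait for any pending work
        print("context device:", ctx.get_device().name())
    finally:
        ctx.pop()                         # remove from this thread's stack
        ctx.detach()                      # destroy the context
except Exception:
    print("no usable CUDA driver found")
```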

### Memory Management

Allocate and manage GPU memory with automatic cleanup.

```python { .api }
def mem_alloc(size: int) -> DeviceAllocation:
    """
    Allocate GPU memory.

    Parameters:
    - size: int, size in bytes

    Returns:
    DeviceAllocation: GPU memory allocation
    """

def mem_get_info() -> tuple[int, int]:
    """
    Get memory information.

    Returns:
    tuple: (free_memory, total_memory) in bytes
    """

def memcpy_htod(dest: DeviceAllocation, src) -> None:
    """
    Copy from host to device.

    Parameters:
    - dest: DeviceAllocation, destination GPU memory
    - src: host memory (numpy array, bytes, etc.)
    """

def memcpy_dtoh(dest, src: DeviceAllocation) -> None:
    """
    Copy from device to host.

    Parameters:
    - dest: host memory buffer
    - src: DeviceAllocation, source GPU memory
    """

def memcpy_dtod(dest: DeviceAllocation, src: DeviceAllocation, size: int) -> None:
    """
    Copy from device to device.

    Parameters:
    - dest: DeviceAllocation, destination GPU memory
    - src: DeviceAllocation, source GPU memory
    - size: int, number of bytes to copy
    """

class DeviceAllocation:
    """GPU memory allocation with automatic cleanup."""

    def __int__(self) -> int:
        """Return the memory address as an integer."""

    def __len__(self) -> int:
        """Return the size in bytes."""

    def free(self) -> None:
        """Explicitly free the GPU memory."""

def mem_host_alloc(size: int, flags: int = 0) -> HostAllocation:
    """
    Allocate page-locked host memory.

    Parameters:
    - size: int, size in bytes
    - flags: int, allocation flags

    Returns:
    HostAllocation: page-locked host memory
    """

class HostAllocation:
    """Page-locked host memory allocation."""

    def __len__(self) -> int:
        """Return the size in bytes."""

    def free(self) -> None:
        """Free the host memory."""
```
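
A host→device→host round-trip sketch. It assumes a `pycuda.driver` import path and uses numpy for the host buffers (an extra dependency, not mandated by this document); the guard keeps it a no-op without CUDA:

```python
# Hypothetical round-trip sketch: upload, download, compare.
try:
    import numpy as np
    import pycuda.driver as cuda
    cuda.init()
    ctx = cuda.Device(0).make_context()
    try:
        host_src = np.arange(16, dtype=np.float32)
        gpu_buf = cuda.mem_alloc(host_src.nbytes)   # raw device bytes
        cuda.memcpy_htod(gpu_buf, host_src)         # upload
        host_dst = np.empty_like(host_src)
        cuda.memcpy_dtoh(host_dst, gpu_buf)         # download
        assert (host_src == host_dst).all()
        free, total = cuda.mem_get_info()
        print(f"{free}/{total} bytes free")
    finally:
        ctx.pop()
        ctx.detach()
except Exception:
    print("no usable CUDA driver found")
```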

### Stream Management

Manage CUDA streams for asynchronous operations and overlapping computation.

```python { .api }
class Stream:
    def __init__(self, flags: int = 0):
        """
        Create a new CUDA stream.

        Parameters:
        - flags: int, stream creation flags
        """

    def synchronize(self) -> None:
        """Block until all operations in the stream complete."""

    def is_done(self) -> bool:
        """Check whether all operations in the stream are complete."""

    def wait_for_event(self, event: Event) -> None:
        """Make the stream wait for an event."""
```
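
A sketch of the poll-then-wait pattern on a stream, under the same assumptions as the earlier examples (`pycuda.driver` import path; guarded without CUDA):

```python
# Hypothetical sketch: create a stream, then poll or block for completion.
try:
    import pycuda.driver as cuda
    cuda.init()
    ctx = cuda.Device(0).make_context()
    try:
        stream = cuda.Stream()
        # ... enqueue async copies / kernel launches on `stream` here ...
        if not stream.is_done():      # non-blocking completion check
            stream.synchronize()      # block until the stream drains
        print("stream drained")
    finally:
        ctx.pop()
        ctx.detach()
except Exception:
    print("no usable CUDA driver found")
```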

### Event Management

Manage CUDA events for synchronization and timing measurements.

```python { .api }
class Event:
    def __init__(self, flags: int = 0):
        """
        Create a new CUDA event.

        Parameters:
        - flags: int, event creation flags
        """

    def record(self, stream: Stream = None) -> None:
        """
        Record the event in a stream.

        Parameters:
        - stream: Stream, stream to record in (the default stream if None)
        """

    def synchronize(self) -> None:
        """Block until the event has been recorded."""

    def query(self) -> bool:
        """Check whether the event has been recorded."""

    def time_since(self, start_event: Event) -> float:
        """
        Get the elapsed time since a start event.

        Parameters:
        - start_event: Event, starting event

        Returns:
        float: elapsed time in milliseconds
        """

    def time_till(self, end_event: Event) -> float:
        """
        Get the elapsed time until an end event.

        Parameters:
        - end_event: Event, ending event

        Returns:
        float: elapsed time in milliseconds
        """
```
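
A timing sketch using a pair of events bracketing the work to be measured (same assumptions and guard as above):

```python
# Hypothetical timing sketch with two events.
try:
    import pycuda.driver as cuda
    cuda.init()
    ctx = cuda.Device(0).make_context()
    try:
        start, end = cuda.Event(), cuda.Event()
        start.record()
        # ... work to be timed goes here ...
        end.record()
        end.synchronize()                  # wait until `end` is reached
        print(f"elapsed: {end.time_since(start):.3f} ms")
    finally:
        ctx.pop()
        ctx.detach()
except Exception:
    print("no usable CUDA driver found")
```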

### Module and Function Loading

Load compiled CUDA modules and access their kernel functions.

```python { .api }
class Module:
    def __init__(self, image: bytes):
        """
        Load a module from a compiled image.

        Parameters:
        - image: bytes, compiled CUDA module (cubin/PTX)
        """

    def get_function(self, name: str) -> Function:
        """
        Get a kernel function by name.

        Parameters:
        - name: str, function name

        Returns:
        Function: kernel function object
        """

    def get_global(self, name: str) -> tuple[DeviceAllocation, int]:
        """
        Get a global variable.

        Parameters:
        - name: str, variable name

        Returns:
        tuple: (device_ptr, size_in_bytes)
        """

class Function:
    """CUDA kernel function."""

    def __call__(self, *args, **kwargs) -> None:
        """
        Launch the kernel function.

        Parameters:
        - args: kernel arguments
        - block: tuple, block dimensions (x, y, z)
        - grid: tuple, grid dimensions (x, y, z)
        - stream: Stream, stream to launch in (optional)
        - shared: int, shared memory size in bytes (optional)
        """

    def prepare(self, arg_types: list) -> PreparedFunction:
        """
        Prepare the function with argument types for faster launches.

        Parameters:
        - arg_types: list, argument type strings

        Returns:
        PreparedFunction: prepared function object
        """

class PreparedFunction:
    """Prepared kernel function for faster launches."""

    def __call__(self, *args, **kwargs) -> None:
        """Launch the prepared function."""

    def prepared_call(self, grid: tuple, block: tuple, *args) -> None:
        """Launch with explicit grid/block dimensions."""

    def prepared_async_call(self, grid: tuple, block: tuple, stream: Stream, *args) -> None:
        """Launch asynchronously in a stream."""
```
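
A launch sketch. Both the kernel (`__global__ void scale(float *v, float a)`) and the file name `kernel.ptx` are made up for illustration — this document does not cover kernel compilation (see the kernel-compilation page). As before, `pycuda.driver` is an assumed import path and the block is guarded:

```python
# Hypothetical launch sketch; kernel name, signature, and PTX file
# are illustrative only.
try:
    import numpy as np
    import pycuda.driver as cuda
    cuda.init()
    ctx = cuda.Device(0).make_context()
    try:
        with open("kernel.ptx", "rb") as f:
            mod = cuda.Module(f.read())          # load compiled image
        scale = mod.get_function("scale")
        buf = cuda.mem_alloc(256 * 4)            # 256 float32 values
        scale(buf, np.float32(2.0),
              block=(256, 1, 1), grid=(1, 1, 1))
        ctx.synchronize()
    finally:
        ctx.pop()
        ctx.detach()
except Exception:
    print("no usable CUDA driver or kernel image found")
```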

### Error Handling

All CUDA errors are automatically translated into Python exceptions.

```python { .api }
class CudaError(Exception):
    """Base class for CUDA errors."""
    pass

class CompileError(CudaError):
    """CUDA compilation error."""
    pass

class MemoryError(CudaError):
    """CUDA memory error."""
    pass

class LaunchError(CudaError):
    """CUDA kernel launch error."""
    pass
```
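
A sketch of catching driver errors, assuming the exception classes live alongside the rest of the API in a `pycuda.driver`-style module (an assumption; the namespace is not stated here):

```python
# Hypothetical error-handling sketch.
try:
    import pycuda.driver as cuda
    cuda.init()
    ctx = cuda.Device(0).make_context()
    try:
        try:
            cuda.mem_alloc(1 << 60)            # absurdly large request
        except cuda.MemoryError:
            print("allocation failed, as expected")
        except cuda.CudaError as err:          # base class catches the rest
            print("other CUDA error:", err)
    finally:
        ctx.pop()
        ctx.detach()
except Exception:
    print("no usable CUDA driver found")
```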

## Constants

```python { .api }
from types import SimpleNamespace

# Context creation flags
ctx_flags = SimpleNamespace(
    SCHED_AUTO=0,
    SCHED_SPIN=1,
    SCHED_YIELD=2,
    SCHED_BLOCKING_SYNC=4,
    MAP_HOST=8,
    LMEM_RESIZE_TO_MAX=16,
)

# Host memory allocation flags
host_alloc_flags = SimpleNamespace(
    PORTABLE=1,
    DEVICE_MAP=2,
    WRITE_COMBINED=4,
)

# Event creation flags
event_flags = SimpleNamespace(
    DEFAULT=0,
    BLOCKING_SYNC=1,
    DISABLE_TIMING=2,
    INTERPROCESS=4,
)

# Stream creation flags
stream_flags = SimpleNamespace(
    DEFAULT=0,
    NON_BLOCKING=1,
)
```
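
Since the flag values above are distinct bits, they combine with bitwise OR. A sketch, assuming the namespaces are exposed from a `pycuda.driver`-style module and guarded without CUDA:

```python
# Hypothetical sketch: combine context flags with bitwise OR.
try:
    import pycuda.driver as cuda
    flags = cuda.ctx_flags.SCHED_BLOCKING_SYNC | cuda.ctx_flags.MAP_HOST
    cuda.init()
    ctx = cuda.Device(0).make_context(flags)
    ctx.pop()
    ctx.detach()
except Exception:
    print("no usable CUDA driver found")
```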