
# Driver API

Low-level CUDA driver API access providing direct control over contexts, devices, memory, streams, and events. This layer forms the foundation for all GPU operations, with Pythonic error handling and automatic resource management.

## Capabilities

### Initialization

Initialize the CUDA driver API. This must be called before any other CUDA operation.

```python { .api }
def init(flags: int = 0) -> None:
    """
    Initialize the CUDA driver API.

    Parameters:
    - flags: int, initialization flags (typically 0)

    Raises:
    CudaError: If the CUDA driver cannot be initialized
    """
```
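
A minimal usage sketch (not part of the specification above): it assumes the API is importable as `pycuda.driver` — the import path is not stated in this document — and is guarded so it degrades to a no-op on machines without the package or a CUDA driver.

```python
# Hypothetical usage sketch; the pycuda.driver import path is an
# assumption. Guarded so the script still runs without a GPU.
try:
    import pycuda.driver as cuda
    cuda.init()           # must precede any other driver-API call
    driver_ready = True
except Exception:         # ImportError, or CudaError if no driver present
    driver_ready = False

print("CUDA driver initialized:", driver_ready)
```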

### Device Management

Query and access the CUDA-capable devices in the system.

```python { .api }
class Device:
    @staticmethod
    def count() -> int:
        """Return the number of CUDA-capable devices."""

    def __init__(self, device_no: int):
        """Create a device object for the given device number."""

    def name(self) -> str:
        """Return the device name."""

    def compute_capability(self) -> tuple[int, int]:
        """Return the compute capability as a (major, minor) version."""

    def total_memory(self) -> int:
        """Return the total device memory in bytes."""

    def get_attribute(self, attr: int) -> int:
        """Get a device attribute."""

    def make_context(self, flags: int = 0) -> Context:
        """Create a CUDA context on this device."""
```
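
A sketch of device enumeration, again assuming a `pycuda.driver` import path and guarded for machines without CUDA:

```python
# Hypothetical sketch: enumerate devices and print basic properties.
try:
    import pycuda.driver as cuda
    cuda.init()
    for i in range(cuda.Device.count()):
        dev = cuda.Device(i)
        major, minor = dev.compute_capability()
        print(f"device {i}: {dev.name()}, "
              f"compute capability {major}.{minor}, "
              f"{dev.total_memory() // (1024 ** 2)} MiB")
except Exception:
    print("no usable CUDA driver found")
```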

### Context Management

Manage CUDA execution contexts, which maintain state for a particular device.

```python { .api }
class Context:
    def __init__(self, dev: Device, flags: int = 0):
        """Create a new CUDA context."""

    def push(self) -> None:
        """Push the context onto the current thread's context stack."""

    def pop(self) -> Context:
        """Pop the context from the current thread's context stack."""

    def get_device(self) -> Device:
        """Return the device associated with this context."""

    def synchronize(self) -> None:
        """Block until all operations complete."""

    def detach(self) -> None:
        """Detach and destroy the context."""

    @staticmethod
    def get_current() -> Context:
        """Get the current context."""
```
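
A sketch of the typical context lifecycle — create, use, then pop and detach. The `pycuda.driver` import path is an assumption, and the whole block is guarded so it is a no-op without CUDA hardware:

```python
# Hypothetical context-lifecycle sketch.
try:
    import pycuda.driver as cuda
    cuda.init()
    ctx = cuda.Device(0).make_context()   # new context becomes current
    try:
        ctx.synchronize()                 # wait for any pending work
        print("context device:", ctx.get_device().name())
    finally:
        ctx.pop()                         # remove from this thread's stack
        ctx.detach()                      # destroy the context
except Exception:
    print("no usable CUDA driver found")
```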

### Memory Management

Allocate and manage GPU memory with automatic cleanup.

```python { .api }
def mem_alloc(size: int) -> DeviceAllocation:
    """
    Allocate GPU memory.

    Parameters:
    - size: int, size in bytes

    Returns:
    DeviceAllocation: GPU memory allocation
    """

def mem_get_info() -> tuple[int, int]:
    """
    Get memory information.

    Returns:
    tuple: (free_memory, total_memory) in bytes
    """

def memcpy_htod(dest: DeviceAllocation, src) -> None:
    """
    Copy from host to device.

    Parameters:
    - dest: DeviceAllocation, destination GPU memory
    - src: host memory (numpy array, bytes, etc.)
    """

def memcpy_dtoh(dest, src: DeviceAllocation) -> None:
    """
    Copy from device to host.

    Parameters:
    - dest: host memory buffer
    - src: DeviceAllocation, source GPU memory
    """

def memcpy_dtod(dest: DeviceAllocation, src: DeviceAllocation, size: int) -> None:
    """
    Copy from device to device.

    Parameters:
    - dest: DeviceAllocation, destination GPU memory
    - src: DeviceAllocation, source GPU memory
    - size: int, number of bytes to copy
    """

class DeviceAllocation:
    """GPU memory allocation with automatic cleanup."""

    def __int__(self) -> int:
        """Return the memory address as an integer."""

    def __len__(self) -> int:
        """Return the size in bytes."""

    def free(self) -> None:
        """Explicitly free the GPU memory."""

def mem_host_alloc(size: int, flags: int = 0) -> HostAllocation:
    """
    Allocate page-locked host memory.

    Parameters:
    - size: int, size in bytes
    - flags: int, allocation flags

    Returns:
    HostAllocation: page-locked host memory
    """

class HostAllocation:
    """Page-locked host memory allocation."""

    def __len__(self) -> int:
        """Return the size in bytes."""

    def free(self) -> None:
        """Free the host memory."""
```
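
A host→device→host round-trip sketch. It assumes a `pycuda.driver` import path and uses numpy for the host buffers (an extra dependency, not mandated by this document); the guard keeps it a no-op without CUDA:

```python
# Hypothetical round-trip sketch: upload, download, compare.
try:
    import numpy as np
    import pycuda.driver as cuda
    cuda.init()
    ctx = cuda.Device(0).make_context()
    try:
        host_src = np.arange(16, dtype=np.float32)
        gpu_buf = cuda.mem_alloc(host_src.nbytes)   # raw device bytes
        cuda.memcpy_htod(gpu_buf, host_src)         # upload
        host_dst = np.empty_like(host_src)
        cuda.memcpy_dtoh(host_dst, gpu_buf)         # download
        assert (host_src == host_dst).all()
        free, total = cuda.mem_get_info()
        print(f"{free}/{total} bytes free")
    finally:
        ctx.pop()
        ctx.detach()
except Exception:
    print("no usable CUDA driver found")
```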

### Stream Management

Manage CUDA streams for asynchronous operations and overlapping computation.

```python { .api }
class Stream:
    def __init__(self, flags: int = 0):
        """
        Create a new CUDA stream.

        Parameters:
        - flags: int, stream creation flags
        """

    def synchronize(self) -> None:
        """Block until all operations in the stream complete."""

    def is_done(self) -> bool:
        """Check whether all operations in the stream are complete."""

    def wait_for_event(self, event: Event) -> None:
        """Make the stream wait for an event."""
```
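
A sketch of the poll-then-wait pattern on a stream, under the same assumptions as the earlier examples (`pycuda.driver` import path; guarded without CUDA):

```python
# Hypothetical sketch: create a stream, then poll or block for completion.
try:
    import pycuda.driver as cuda
    cuda.init()
    ctx = cuda.Device(0).make_context()
    try:
        stream = cuda.Stream()
        # ... enqueue async copies / kernel launches on `stream` here ...
        if not stream.is_done():      # non-blocking completion check
            stream.synchronize()      # block until the stream drains
        print("stream drained")
    finally:
        ctx.pop()
        ctx.detach()
except Exception:
    print("no usable CUDA driver found")
```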

### Event Management

Manage CUDA events for synchronization and timing measurements.

```python { .api }
class Event:
    def __init__(self, flags: int = 0):
        """
        Create a new CUDA event.

        Parameters:
        - flags: int, event creation flags
        """

    def record(self, stream: Stream = None) -> None:
        """
        Record the event in a stream.

        Parameters:
        - stream: Stream, stream to record in (the default stream if None)
        """

    def synchronize(self) -> None:
        """Block until the event has been recorded."""

    def query(self) -> bool:
        """Check whether the event has been recorded."""

    def time_since(self, start_event: Event) -> float:
        """
        Get the elapsed time since a start event.

        Parameters:
        - start_event: Event, starting event

        Returns:
        float: elapsed time in milliseconds
        """

    def time_till(self, end_event: Event) -> float:
        """
        Get the elapsed time until an end event.

        Parameters:
        - end_event: Event, ending event

        Returns:
        float: elapsed time in milliseconds
        """
```
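
A timing sketch using a pair of events bracketing the work to be measured (same assumptions and guard as above):

```python
# Hypothetical timing sketch with two events.
try:
    import pycuda.driver as cuda
    cuda.init()
    ctx = cuda.Device(0).make_context()
    try:
        start, end = cuda.Event(), cuda.Event()
        start.record()
        # ... work to be timed goes here ...
        end.record()
        end.synchronize()                  # wait until `end` is reached
        print(f"elapsed: {end.time_since(start):.3f} ms")
    finally:
        ctx.pop()
        ctx.detach()
except Exception:
    print("no usable CUDA driver found")
```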

### Module and Function Loading

Load compiled CUDA modules and access their kernel functions.

```python { .api }
class Module:
    def __init__(self, image: bytes):
        """
        Load a module from a compiled image.

        Parameters:
        - image: bytes, compiled CUDA module (cubin/PTX)
        """

    def get_function(self, name: str) -> Function:
        """
        Get a kernel function by name.

        Parameters:
        - name: str, function name

        Returns:
        Function: kernel function object
        """

    def get_global(self, name: str) -> tuple[DeviceAllocation, int]:
        """
        Get a global variable.

        Parameters:
        - name: str, variable name

        Returns:
        tuple: (device_ptr, size_in_bytes)
        """

class Function:
    """CUDA kernel function."""

    def __call__(self, *args, **kwargs) -> None:
        """
        Launch the kernel function.

        Parameters:
        - args: kernel arguments
        - block: tuple, block dimensions (x, y, z)
        - grid: tuple, grid dimensions (x, y, z)
        - stream: Stream, stream to launch in (optional)
        - shared: int, shared memory size in bytes (optional)
        """

    def prepare(self, arg_types: list) -> PreparedFunction:
        """
        Prepare the function with argument types for faster launches.

        Parameters:
        - arg_types: list, argument type strings

        Returns:
        PreparedFunction: prepared function object
        """

class PreparedFunction:
    """Prepared kernel function for faster launches."""

    def __call__(self, *args, **kwargs) -> None:
        """Launch the prepared function."""

    def prepared_call(self, grid: tuple, block: tuple, *args) -> None:
        """Launch with explicit grid/block dimensions."""

    def prepared_async_call(self, grid: tuple, block: tuple, stream: Stream, *args) -> None:
        """Launch asynchronously in a stream."""
```
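
A launch sketch. Both the kernel (`__global__ void scale(float *v, float a)`) and the file name `kernel.ptx` are made up for illustration — this document does not cover kernel compilation (see the kernel-compilation page). As before, `pycuda.driver` is an assumed import path and the block is guarded:

```python
# Hypothetical launch sketch; kernel name, signature, and PTX file
# are illustrative only.
try:
    import numpy as np
    import pycuda.driver as cuda
    cuda.init()
    ctx = cuda.Device(0).make_context()
    try:
        with open("kernel.ptx", "rb") as f:
            mod = cuda.Module(f.read())          # load compiled image
        scale = mod.get_function("scale")
        buf = cuda.mem_alloc(256 * 4)            # 256 float32 values
        scale(buf, np.float32(2.0),
              block=(256, 1, 1), grid=(1, 1, 1))
        ctx.synchronize()
    finally:
        ctx.pop()
        ctx.detach()
except Exception:
    print("no usable CUDA driver or kernel image found")
```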

### Error Handling

All CUDA errors are automatically translated into Python exceptions.

```python { .api }
class CudaError(Exception):
    """Base class for CUDA errors."""
    pass

class CompileError(CudaError):
    """CUDA compilation error."""
    pass

class MemoryError(CudaError):
    """CUDA memory error."""
    pass

class LaunchError(CudaError):
    """CUDA kernel launch error."""
    pass
```
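
A sketch of catching driver errors, assuming the exception classes live alongside the rest of the API in a `pycuda.driver`-style module (an assumption; the namespace is not stated here):

```python
# Hypothetical error-handling sketch.
try:
    import pycuda.driver as cuda
    cuda.init()
    ctx = cuda.Device(0).make_context()
    try:
        try:
            cuda.mem_alloc(1 << 60)            # absurdly large request
        except cuda.MemoryError:
            print("allocation failed, as expected")
        except cuda.CudaError as err:          # base class catches the rest
            print("other CUDA error:", err)
    finally:
        ctx.pop()
        ctx.detach()
except Exception:
    print("no usable CUDA driver found")
```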

## Constants

```python { .api }
from types import SimpleNamespace

# Context creation flags
ctx_flags = SimpleNamespace(
    SCHED_AUTO=0,
    SCHED_SPIN=1,
    SCHED_YIELD=2,
    SCHED_BLOCKING_SYNC=4,
    MAP_HOST=8,
    LMEM_RESIZE_TO_MAX=16,
)

# Host memory allocation flags
host_alloc_flags = SimpleNamespace(
    PORTABLE=1,
    DEVICE_MAP=2,
    WRITE_COMBINED=4,
)

# Event creation flags
event_flags = SimpleNamespace(
    DEFAULT=0,
    BLOCKING_SYNC=1,
    DISABLE_TIMING=2,
    INTERPROCESS=4,
)

# Stream creation flags
stream_flags = SimpleNamespace(
    DEFAULT=0,
    NON_BLOCKING=1,
)
```
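
Since the flag values above are distinct bits, they combine with bitwise OR. A sketch, assuming the namespaces are exposed from a `pycuda.driver`-style module and guarded without CUDA:

```python
# Hypothetical sketch: combine context flags with bitwise OR.
try:
    import pycuda.driver as cuda
    flags = cuda.ctx_flags.SCHED_BLOCKING_SYNC | cuda.ctx_flags.MAP_HOST
    cuda.init()
    ctx = cuda.Device(0).make_context(flags)
    ctx.pop()
    ctx.detach()
except Exception:
    print("no usable CUDA driver found")
```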