tessl/pypi-cuda-python

CUDA Python metapackage providing unified access to NVIDIA's CUDA platform from Python through comprehensive bindings and utilities.

Workspace: tessl
Visibility: Public
Describes: pypipkg:pypi/cuda-python@13.0.x

To install, run

npx @tessl/cli install tessl/pypi-cuda-python@13.0.0

# CUDA Python

CUDA Python provides comprehensive access to NVIDIA's CUDA platform from Python through a unified metapackage that combines low-level CUDA C API bindings with high-level Pythonic APIs and library-discovery utilities. It enables end-to-end GPU development entirely in Python while retaining access to the full breadth of CUDA functionality, serving as the primary entry point to NVIDIA's CUDA ecosystem for Python developers.

## Package Information

- **Package Name**: cuda-python
- **Package Type**: metapackage
- **Language**: Python
- **Installation**: `pip install cuda-python`
- **Complete Installation**: `pip install cuda-python[all]`
- **Components**:
  - `cuda.core@0.3.3a0` - High-level Pythonic CUDA APIs (experimental)
  - `cuda.bindings@13.0.1` - Low-level CUDA C API bindings
  - `cuda.pathfinder@1.1.1a0` - NVIDIA library discovery utilities

## Core Imports

High-level Pythonic CUDA APIs (recommended for most users):

```python
# High-level device and memory management
from cuda.core.experimental import Device, Stream, Event

# Memory resources and buffers
from cuda.core.experimental import Buffer, DeviceMemoryResource

# Program compilation and kernel execution
from cuda.core.experimental import Program, Kernel, launch

# CUDA graphs for optimization
from cuda.core.experimental import Graph, GraphBuilder
```

Low-level CUDA C API bindings:

```python
# CUDA Runtime API
from cuda.bindings import runtime

# CUDA Driver API
from cuda.bindings import driver

# Runtime compilation
from cuda.bindings import nvrtc

# Library loading utilities
from cuda.pathfinder import load_nvidia_dynamic_lib
```

Package version information:

```python
import cuda.core.experimental
import cuda.bindings
import cuda.pathfinder

print(cuda.core.experimental.__version__)  # "0.3.3a0"
print(cuda.bindings.__version__)  # "13.0.1"
print(cuda.pathfinder.__version__)  # "1.1.1a0"
```

## Basic Usage

Pythonic high-level approach (recommended):

```python
from cuda.core.experimental import Device, Stream, Buffer
import numpy as np

# Device management
device = Device(0)  # Use first CUDA device
print(f"Using device: {device.name}")

# Memory management with high-level Buffer
host_data = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)
buffer = Buffer.from_array(host_data, device=device)

# Stream management
stream = Stream(device)

# Synchronization
stream.synchronize()
```

Low-level approach for advanced users:

```python
from cuda.bindings import runtime
from cuda.pathfinder import load_nvidia_dynamic_lib

# Basic device management
device_count = runtime.cudaGetDeviceCount()
print(f"Available CUDA devices: {device_count}")

# Memory allocation and management
device_ptr = runtime.cudaMalloc(1024)  # Allocate 1KB on device
host_ptr = runtime.cudaMallocHost(1024)  # Allocate page-locked host memory

# Copy data between host and device
runtime.cudaMemcpy(
    device_ptr, host_ptr, 1024,
    runtime.cudaMemcpyKind.cudaMemcpyHostToDevice
)

# Synchronize and cleanup
runtime.cudaDeviceSynchronize()
runtime.cudaFree(device_ptr)
runtime.cudaFreeHost(host_ptr)

# Load NVIDIA libraries dynamically
cudart_lib = load_nvidia_dynamic_lib("cudart")
print(f"CUDA Runtime loaded from: {cudart_lib.abs_path}")
```

## Architecture

CUDA Python is structured as a metapackage that provides unified access to multiple specialized components:

### Core Components

- **cuda.core** (v0.3.3a0): Experimental high-level Pythonic APIs for idiomatic CUDA development
- **cuda.bindings** (v13.0.1): Low-level Python bindings to CUDA C APIs providing complete coverage of CUDA functionality
- **cuda.pathfinder** (v1.1.1a0): Utility library for discovering and loading NVIDIA CUDA libraries dynamically

### API Hierarchy

The package exposes APIs at multiple abstraction levels:

- **High-level Pythonic APIs** (`cuda.core.experimental`): Object-oriented CUDA interface with Device, Stream, Buffer, Program classes
- **Runtime API** (`cuda.bindings.runtime`): Direct bindings to CUDA Runtime C API
- **Driver API** (`cuda.bindings.driver`): Direct bindings to CUDA Driver C API
- **Compilation APIs**: Runtime compilation (NVRTC) and LLVM-based compilation (NVVM)
- **Utility APIs**: JIT linking, GPU Direct Storage, and library management

This layered approach allows developers to choose the appropriate abstraction level for their needs while maintaining interoperability between components.

## Capabilities

### High-Level Pythonic CUDA (cuda.core.experimental)

Object-oriented CUDA programming with automatic resource management and Pythonic interfaces for device management, memory allocation, stream handling, and kernel execution.

```python { .api }
# Device management
class Device:
    def __init__(self, device_id: int = 0): ...
    @property
    def name(self) -> str: ...
    @property
    def compute_capability(self) -> tuple[int, int]: ...
    def set_current(self) -> None: ...

# Memory management
class Buffer:
    @classmethod
    def from_array(cls, array, device: Device) -> Buffer: ...
    def to_array(self) -> np.ndarray: ...
    @property
    def device(self) -> Device: ...
    @property
    def size(self) -> int: ...

# Stream and event management
class Stream:
    def __init__(self, device: Device): ...
    def synchronize(self) -> None: ...
    def record(self, event: Event) -> None: ...

class Event:
    def __init__(self, device: Device): ...
    def synchronize(self) -> None: ...
    def elapsed_time(self, end_event: Event) -> float: ...

# Program compilation and kernel execution
class Program:
    def __init__(self, code: str, options: ProgramOptions): ...
    def compile(self) -> None: ...
    def get_kernel(self, name: str) -> Kernel: ...

class Kernel:
    def launch(self, config: LaunchConfig, *args) -> None: ...

def launch(kernel: Kernel, config: LaunchConfig, *args) -> None: ...
```

[High-Level CUDA Core APIs](./cuda-core.md)
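
For orientation, the pieces above compose into an end-to-end flow roughly like the following. This is a minimal sketch against the simplified signatures in this summary; the `ProgramOptions` and `LaunchConfig` constructor arguments and the `saxpy` kernel are assumptions, and the shipped `cuda.core.experimental` API may differ in detail.

```python
import numpy as np
from cuda.core.experimental import (
    Buffer, Device, LaunchConfig, Program, ProgramOptions, launch,
)

# CUDA C++ source compiled at runtime; "saxpy" is a placeholder kernel
saxpy_src = """
extern "C" __global__ void saxpy(float a, const float* x, float* y, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}
"""

device = Device(0)
device.set_current()

# Compile the source and look up the kernel (per the API summary above)
program = Program(saxpy_src, options=ProgramOptions())  # options assumed default-constructible
program.compile()
kernel = program.get_kernel("saxpy")

# Stage data on the device and launch over a 1D grid
n = 1024
x = Buffer.from_array(np.arange(n, dtype=np.float32), device=device)
y = Buffer.from_array(np.zeros(n, dtype=np.float32), device=device)
config = LaunchConfig(grid=(n + 127) // 128, block=128)  # constructor assumed
launch(kernel, config, np.float32(2.0), x, y, np.uint64(n))
```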

### Device and Memory Management (Low-Level)

Essential CUDA device enumeration, selection, and memory allocation operations including unified memory, streams, and events for efficient GPU resource management.

```python { .api }
# Device management
def cudaGetDeviceCount() -> int: ...
def cudaSetDevice(device: int) -> None: ...
def cudaGetDevice() -> int: ...

# Memory allocation
def cudaMalloc(size: int) -> int: ...
def cudaMallocHost(size: int) -> int: ...
def cudaMemcpy(dst, src, count: int, kind: cudaMemcpyKind) -> None: ...
def cudaFree(devPtr: int) -> None: ...
```

[Device and Memory Management](./device-memory.md)
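
A short sketch of device enumeration with these functions, following the simplified return conventions used throughout this document (the shipped bindings additionally return CUDA status codes):

```python
from cuda.bindings import runtime

# Enumerate devices and make each one current in turn
device_count = runtime.cudaGetDeviceCount()
for device_id in range(device_count):
    runtime.cudaSetDevice(device_id)
    assert runtime.cudaGetDevice() == device_id

# Allocate and release a small scratch buffer on the last selected device
scratch = runtime.cudaMalloc(4096)
runtime.cudaFree(scratch)
```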

### Kernel Execution and Streams

CUDA kernel launching, execution control, and asynchronous stream management for high GPU utilization and performance.

```python { .api }
# Stream management
def cudaStreamCreate() -> int: ...
def cudaStreamSynchronize(stream: int) -> None: ...
def cudaLaunchKernel(func, gridDim, blockDim, args, sharedMem: int, stream: int) -> None: ...

# Event management
def cudaEventCreate() -> int: ...
def cudaEventRecord(event: int, stream: int) -> None: ...
def cudaEventSynchronize(event: int) -> None: ...
```

[Kernel Execution and Streams](./kernels-streams.md)
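
As an example, a pair of events can bracket asynchronous work on a stream to time it. This sketch uses the simplified signatures above plus `cudaEventElapsedTime`, a Runtime API call not listed in this summary; the real bindings also return status codes.

```python
from cuda.bindings import runtime

# Create a stream and two timing events
stream = runtime.cudaStreamCreate()
start = runtime.cudaEventCreate()
stop = runtime.cudaEventCreate()

# Bracket asynchronous work on the stream with the two events
runtime.cudaEventRecord(start, stream)
# ... enqueue work here, e.g. cudaLaunchKernel(func, gridDim, blockDim, args, 0, stream)
runtime.cudaEventRecord(stop, stream)

# Wait for the recorded work to finish, then read the elapsed milliseconds
runtime.cudaEventSynchronize(stop)
elapsed_ms = runtime.cudaEventElapsedTime(start, stop)
print(f"GPU time: {elapsed_ms:.3f} ms")
```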

### Low-Level Driver API

Direct CUDA Driver API access for advanced GPU programming including context management, module loading, and fine-grained resource control.

```python { .api }
# Driver initialization and devices
def cuInit(flags: int) -> None: ...
def cuDeviceGet(ordinal: int) -> int: ...
def cuCtxCreate(flags: int, device: int) -> int: ...

# Module and function management
def cuModuleLoad(fname: str) -> int: ...
def cuModuleGetFunction(hmod: int, name: str) -> int: ...
def cuLaunchKernel(f, gridDimX, gridDimY, gridDimZ, blockDimX, blockDimY, blockDimZ, sharedMemBytes: int, hStream: int, kernelParams, extra) -> None: ...
```

[Low-Level Driver API](./driver-api.md)
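
A sketch of the classic driver-API flow (initialize, create a context, load a module, launch) against the simplified signatures above; `kernel.cubin` and `my_kernel` are placeholder names:

```python
from cuda.bindings import driver

# One-time driver initialization, then acquire a device and a context
driver.cuInit(0)
device = driver.cuDeviceGet(0)
context = driver.cuCtxCreate(0, device)

# Load a precompiled module and resolve a kernel by name (placeholders)
module = driver.cuModuleLoad("kernel.cubin")
func = driver.cuModuleGetFunction(module, "my_kernel")

# Launch a 256-block grid of 128 threads: no shared memory, default stream,
# and no kernel parameters in this skeleton
driver.cuLaunchKernel(func, 256, 1, 1, 128, 1, 1, 0, 0, None, None)
```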

### Runtime Compilation

NVRTC runtime compilation of CUDA C++ source code to PTX and CUBIN formats for dynamic kernel generation and deployment.

```python { .api }
# Program creation and compilation
def nvrtcCreateProgram(src: str, name: str, numHeaders: int, headers: List[bytes], includeNames: List[bytes]) -> int: ...
def nvrtcCompileProgram(prog: int, numOptions: int, options: List[bytes]) -> None: ...
def nvrtcGetPTX(prog: int, ptx: str) -> None: ...
def nvrtcGetCUBIN(prog: int, cubin: str) -> None: ...
```

[Runtime Compilation](./runtime-compilation.md)
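
As a sketch, compiling a snippet to PTX with these functions looks roughly as follows. The shipped `cuda.bindings.nvrtc` module mirrors the C API more closely (status-code returns, explicit output buffers, and a companion `nvrtcGetPTXSize` call for sizing), so treat this as illustrating the flow rather than exact signatures.

```python
from cuda.bindings import nvrtc

source = 'extern "C" __global__ void noop() {}'

# Create a program and compile it for an assumed target architecture
prog = nvrtc.nvrtcCreateProgram(source, "noop.cu", 0, [], [])
nvrtc.nvrtcCompileProgram(prog, 1, [b"--gpu-architecture=compute_80"])

# Size the output buffer first, then copy the PTX into it
ptx = bytearray(nvrtc.nvrtcGetPTXSize(prog))
nvrtc.nvrtcGetPTX(prog, ptx)
```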

### JIT Compilation and Linking

NVVM LLVM-based compilation and NVJitLink just-in-time linking for advanced code generation workflows.

```python { .api }
# NVVM compilation
def create_program() -> int: ...
def compile_program(prog: int, num_options: int, options) -> None: ...

# NVJitLink linking
def create(num_options: int, options) -> int: ...
def add_data(handle: int, input_type: int, data: bytes, size: int, name: str) -> None: ...
def complete(handle: int) -> None: ...
```

[JIT Compilation and Linking](./jit-compilation.md)
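
A sketch of a link step with these functions, assuming they live in `cuda.bindings.nvjitlink`; the `InputType.PTX` constant and the option string are illustrative names, not confirmed by this summary:

```python
from cuda.bindings import nvjitlink

# Create a linker for an assumed target architecture
handle = nvjitlink.create(1, ["-arch=sm_80"])

# Feed it a PTX blob (e.g. produced by NVRTC or NVVM) and finalize the link
ptx_blob = b"// PTX text produced earlier"
nvjitlink.add_data(handle, nvjitlink.InputType.PTX, ptx_blob, len(ptx_blob), "module.ptx")
nvjitlink.complete(handle)
```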

### GPU Direct Storage

cuFile GPU Direct Storage API for high-performance I/O directly between storage and GPU memory, bypassing CPU bounce buffers in system memory.

```python { .api }
# File handle management
def handle_register(descr: int) -> int: ...
def handle_deregister(fh: int) -> None: ...

# I/O operations
def read(fh: int, buf_ptr_base: int, size: int, file_offset: int, buf_ptr_offset: int) -> None: ...
def write(fh: int, buf_ptr_base: int, size: int, file_offset: int, buf_ptr_offset: int) -> None: ...
```

[GPU Direct Storage](./gpu-direct-storage.md)
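
A minimal sketch of the read path, assuming these functions live in `cuda.bindings.cufile` and that registration accepts a plain OS file descriptor (the real API wraps it in a descriptor struct); the device buffer comes from the simplified `cudaMalloc` above, and `data.bin` is a placeholder path:

```python
import os
from cuda.bindings import cufile, runtime

# GPU Direct Storage expects O_DIRECT file access (Linux-only flag)
fd = os.open("data.bin", os.O_RDONLY | os.O_DIRECT)

# Register the descriptor, read 1 MiB straight into device memory, clean up
device_ptr = runtime.cudaMalloc(1 << 20)
fh = cufile.handle_register(fd)
cufile.read(fh, device_ptr, 1 << 20, 0, 0)
cufile.handle_deregister(fh)
runtime.cudaFree(device_ptr)
os.close(fd)
```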

### Library Management

Dynamic NVIDIA library loading and discovery utilities for runtime library management and version compatibility.

```python { .api }
def load_nvidia_dynamic_lib(libname: str) -> LoadedDL: ...

class LoadedDL:
    abs_path: Optional[str]
    was_already_loaded_from_elsewhere: bool
    _handle_uint: int
```

[Library Management](./library-management.md)
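
Because discovery fails on machines without the corresponding NVIDIA component, lookups are typically wrapped in the `DynamicLibNotFoundError` exception listed under Types below:

```python
from cuda.pathfinder import DynamicLibNotFoundError, load_nvidia_dynamic_lib

# Try to locate NVRTC; degrade gracefully if it is not installed
try:
    nvrtc_lib = load_nvidia_dynamic_lib("nvrtc")
except DynamicLibNotFoundError:
    print("NVRTC not found; runtime compilation is unavailable")
else:
    print(f"NVRTC loaded from: {nvrtc_lib.abs_path}")
    if nvrtc_lib.was_already_loaded_from_elsewhere:
        print("Note: the library had already been loaded by another component")
```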

## Types

### Core Enumerations

```python { .api }
class cudaError_t:
    """CUDA Runtime API error codes"""
    cudaSuccess: int
    cudaErrorInvalidValue: int
    cudaErrorMemoryAllocation: int
    # ... additional error codes

class cudaMemcpyKind:
    """Memory copy direction types"""
    cudaMemcpyHostToHost: int
    cudaMemcpyHostToDevice: int
    cudaMemcpyDeviceToHost: int
    cudaMemcpyDeviceToDevice: int

class CUresult:
    """CUDA Driver API result codes"""
    CUDA_SUCCESS: int
    CUDA_ERROR_INVALID_VALUE: int
    CUDA_ERROR_OUT_OF_MEMORY: int
    # ... additional result codes
```

### Device Attributes

```python { .api }
class cudaDeviceAttr:
    """CUDA device attribute enumeration"""
    cudaDevAttrMaxThreadsPerBlock: int
    cudaDevAttrMaxBlockDimX: int
    cudaDevAttrMaxGridDimX: int
    cudaDevAttrMaxSharedMemoryPerBlock: int
    # ... additional device attributes

class CUdevice_attribute:
    """CUDA Driver API device attributes"""
    CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_BLOCK: int
    CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_X: int
    # ... additional attributes
```

### Exception Classes

```python { .api }
class nvvmError(Exception):
    """NVVM compilation exception"""
    pass

class nvJitLinkError(Exception):
    """NVJitLink exception"""
    pass

class cuFileError(Exception):
    """cuFile operation exception"""
    pass

class DynamicLibNotFoundError(Exception):
    """NVIDIA library not found exception"""
    pass
```