CUDA Python metapackage providing unified access to NVIDIA's CUDA platform from Python through comprehensive bindings and utilities
npx @tessl/cli install tessl/pypi-cuda-python@13.0.00
# CUDA Python

CUDA Python provides comprehensive access to NVIDIA's CUDA platform from Python through a unified metapackage that combines low-level CUDA C API bindings with high-level utilities. It enables end-to-end GPU development entirely in Python while maintaining access to the full breadth of CUDA functionality, serving as the authoritative entry point to NVIDIA's CUDA ecosystem for Python developers.

## Package Information

- **Package Name**: cuda-python
- **Package Type**: metapackage
- **Language**: Python
- **Installation**: `pip install cuda-python`
- **Complete Installation**: `pip install cuda-python[all]`
- **Components**:
  - `cuda.core@0.3.3a0` - High-level Pythonic CUDA APIs (experimental)
  - `cuda.bindings@13.0.1` - Low-level CUDA C API bindings
  - `cuda.pathfinder@1.1.1a0` - NVIDIA library discovery utilities
## Core Imports

High-level Pythonic CUDA APIs (recommended for most users):

```python
# High-level device and memory management
from cuda.core.experimental import Device, Stream, Event

# Memory resources and buffers
from cuda.core.experimental import Buffer, DeviceMemoryResource

# Program compilation and kernel execution
from cuda.core.experimental import Program, Kernel, launch

# CUDA graphs for optimization
from cuda.core.experimental import Graph, GraphBuilder
```
Low-level CUDA C API bindings:

```python
# CUDA Runtime API
from cuda.bindings import runtime

# CUDA Driver API
from cuda.bindings import driver

# Runtime compilation
from cuda.bindings import nvrtc

# Library loading utilities
from cuda.pathfinder import load_nvidia_dynamic_lib
```
Package version information:

```python
import cuda.core.experimental
import cuda.bindings
import cuda.pathfinder

print(cuda.core.experimental.__version__)  # "0.3.3a0"
print(cuda.bindings.__version__)           # "13.0.1"
print(cuda.pathfinder.__version__)         # "1.1.1a0"
```
## Basic Usage

Pythonic high-level approach (recommended):

```python
from cuda.core.experimental import Device, Stream, Buffer
import numpy as np

# Device management
device = Device(0)  # Use first CUDA device
print(f"Using device: {device.name}")

# Memory management with high-level Buffer
host_data = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)
buffer = Buffer.from_array(host_data, device=device)

# Stream management
stream = Stream(device)

# Synchronization
stream.synchronize()
```
Low-level approach for advanced users:

```python
from cuda.bindings import runtime
from cuda.pathfinder import load_nvidia_dynamic_lib

# Basic device management
device_count = runtime.cudaGetDeviceCount()
print(f"Available CUDA devices: {device_count}")

# Memory allocation and management
device_ptr = runtime.cudaMalloc(1024)    # Allocate 1KB on device
host_ptr = runtime.cudaMallocHost(1024)  # Allocate page-locked host memory

# Copy data between host and device
runtime.cudaMemcpy(
    device_ptr, host_ptr, 1024,
    runtime.cudaMemcpyKind.cudaMemcpyHostToDevice
)

# Synchronize and clean up
runtime.cudaDeviceSynchronize()
runtime.cudaFree(device_ptr)
runtime.cudaFreeHost(host_ptr)

# Load NVIDIA libraries dynamically
cudart_lib = load_nvidia_dynamic_lib("cudart")
print(f"CUDA Runtime loaded from: {cudart_lib.abs_path}")
```
## Architecture

CUDA Python is structured as a metapackage that provides unified access to multiple specialized components.

### Core Components

- **cuda.core** (v0.3.3a0): Experimental high-level Pythonic APIs for idiomatic CUDA development
- **cuda.bindings** (v13.0.1): Low-level Python bindings to the CUDA C APIs, providing complete coverage of CUDA functionality
- **cuda.pathfinder** (v1.1.1a0): Utility library for discovering and loading NVIDIA CUDA libraries dynamically

### API Hierarchy

The package exposes APIs at multiple abstraction levels:

- **High-level Pythonic APIs** (`cuda.core.experimental`): Object-oriented CUDA interface with Device, Stream, Buffer, and Program classes
- **Runtime API** (`cuda.bindings.runtime`): Direct bindings to the CUDA Runtime C API
- **Driver API** (`cuda.bindings.driver`): Direct bindings to the CUDA Driver C API
- **Compilation APIs**: Runtime compilation (NVRTC) and LLVM-based compilation (NVVM)
- **Utility APIs**: JIT linking, GPU Direct Storage, and library management

This layered approach allows developers to choose the appropriate abstraction level for their needs while maintaining interoperability between components, as sketched below.
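
As a quick illustration of that layering, the sketch below queries devices through the low-level `runtime` bindings and then wraps each one in the high-level `Device` class. It relies only on the imports shown earlier; note that the simplified signatures in this document omit the status codes the released bindings return alongside results.

```python
from cuda.core.experimental import Device
from cuda.bindings import runtime

# Low-level query through the Runtime API bindings
# (simplified return value, as documented in this spec)
device_count = runtime.cudaGetDeviceCount()

# High-level, object-oriented view of the same hardware
for device_id in range(device_count):
    device = Device(device_id)
    print(device_id, device.name, device.compute_capability)
```
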
## Capabilities
### High-Level Pythonic CUDA (cuda.core.experimental)

Object-oriented CUDA programming with automatic resource management and Pythonic interfaces for device management, memory allocation, stream handling, and kernel execution.

```python { .api }
# Device management
class Device:
    def __init__(self, device_id: int = 0): ...
    @property
    def name(self) -> str: ...
    @property
    def compute_capability(self) -> tuple[int, int]: ...
    def set_current(self) -> None: ...

# Memory management
class Buffer:
    @classmethod
    def from_array(cls, array, device: Device) -> Buffer: ...
    def to_array(self) -> np.ndarray: ...
    @property
    def device(self) -> Device: ...
    @property
    def size(self) -> int: ...

# Stream and event management
class Stream:
    def __init__(self, device: Device): ...
    def synchronize(self) -> None: ...
    def record(self, event: Event) -> None: ...

class Event:
    def __init__(self, device: Device): ...
    def synchronize(self) -> None: ...
    def elapsed_time(self, end_event: Event) -> float: ...

# Program compilation and kernel execution
class Program:
    def __init__(self, code: str, options: ProgramOptions): ...
    def compile(self) -> None: ...
    def get_kernel(self, name: str) -> Kernel: ...

class Kernel:
    def launch(self, config: LaunchConfig, *args) -> None: ...

def launch(kernel: Kernel, config: LaunchConfig, *args) -> None: ...
```

[High-Level CUDA Core APIs](./cuda-core.md)
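
A usage sketch against the class signatures above, compiling and launching a trivial kernel. The CUDA C++ source, the bare `ProgramOptions()` constructor, the `LaunchConfig(grid=..., block=...)` arguments, and the way kernel arguments are marshalled into `launch` are illustrative assumptions rather than guarantees of this spec:

```python
import numpy as np
from cuda.core.experimental import (
    Buffer, Device, LaunchConfig, Program, ProgramOptions, launch,
)

# Illustrative CUDA C++ source for an element-wise scaling kernel
code = r"""
extern "C" __global__ void scale(float *data, float factor, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}
"""

device = Device(0)
device.set_current()

# Stage data on the device using the Buffer class documented above
host = np.arange(8192, dtype=np.float32)
buf = Buffer.from_array(host, device=device)

# Compile the source and look up the kernel by name
program = Program(code, ProgramOptions())
program.compile()
kernel = program.get_kernel("scale")

# Launch with an assumed grid/block configuration (32 * 256 = 8192 threads)
config = LaunchConfig(grid=(32,), block=(256,))
launch(kernel, config, buf, np.float32(2.0), np.uint64(host.size))
```
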
### Device and Memory Management (Low-Level)

Essential CUDA device enumeration, selection, and memory allocation operations including unified memory, streams, and events for efficient GPU resource management.

```python { .api }
# Device management
def cudaGetDeviceCount() -> int: ...
def cudaSetDevice(device: int) -> None: ...
def cudaGetDevice() -> int: ...

# Memory allocation
def cudaMalloc(size: int) -> int: ...
def cudaMallocHost(size: int) -> int: ...
def cudaMemcpy(dst, src, count: int, kind: cudaMemcpyKind) -> None: ...
def cudaFree(devPtr: int) -> None: ...
```

[Device and Memory Management](./device-memory.md)
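
A brief device-selection sketch using the simplified signatures above (the released bindings also return a status code with each call, which is omitted here):

```python
from cuda.bindings import runtime

# Enumerate devices and make the last one current
# (simplified return values, per the signatures above)
count = runtime.cudaGetDeviceCount()
runtime.cudaSetDevice(count - 1)
print("Active device:", runtime.cudaGetDevice())
```
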
### Kernel Execution and Streams

CUDA kernel launching, execution control, and asynchronous stream management for optimal GPU utilization and performance.

```python { .api }
# Stream management
def cudaStreamCreate() -> int: ...
def cudaStreamSynchronize(stream: int) -> None: ...
def cudaLaunchKernel(func, gridDim, blockDim, args, sharedMem: int, stream: int) -> None: ...

# Event management
def cudaEventCreate() -> int: ...
def cudaEventRecord(event: int, stream: int) -> None: ...
def cudaEventSynchronize(event: int) -> None: ...
```

[Kernel Execution and Streams](./kernels-streams.md)
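
A minimal ordering sketch built only from the stream and event functions listed above; kernel launches and status codes are elided:

```python
from cuda.bindings import runtime

# Create a stream and an event (simplified return values)
stream = runtime.cudaStreamCreate()
event = runtime.cudaEventCreate()

# Work enqueued on the stream before this record is covered by the event
runtime.cudaEventRecord(event, stream)

# Block until everything recorded on the event has finished,
# then drain the stream as a whole
runtime.cudaEventSynchronize(event)
runtime.cudaStreamSynchronize(stream)
```
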
### Low-Level Driver API

Direct CUDA Driver API access for advanced GPU programming including context management, module loading, and fine-grained resource control.

```python { .api }
# Driver initialization and devices
def cuInit(flags: int) -> None: ...
def cuDeviceGet(ordinal: int) -> int: ...
def cuCtxCreate(flags: int, device: int) -> int: ...

# Module and function management
def cuModuleLoad(fname: str) -> int: ...
def cuModuleGetFunction(hmod: int, name: str) -> int: ...
def cuLaunchKernel(f, gridDimX, gridDimY, gridDimZ, blockDimX, blockDimY, blockDimZ, sharedMemBytes: int, hStream: int, kernelParams, extra) -> None: ...
```

[Low-Level Driver API](./driver-api.md)
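
A sketch of the usual driver-side workflow with the functions above. The module path `kernels.cubin` and kernel name `warmup` are placeholders, and the trailing `None` arguments stand in for kernel parameters that this illustrative kernel does not take:

```python
from cuda.bindings import driver

# Initialize the driver and create a context on device 0
driver.cuInit(0)
dev = driver.cuDeviceGet(0)
ctx = driver.cuCtxCreate(0, dev)

# Load a precompiled module and resolve a kernel by name
# ("kernels.cubin" and "warmup" are illustrative placeholders)
module = driver.cuModuleLoad("kernels.cubin")
func = driver.cuModuleGetFunction(module, "warmup")

# Launch 32 blocks of 256 threads on the default stream with no
# shared memory; this illustrative kernel has no parameters
driver.cuLaunchKernel(func, 32, 1, 1, 256, 1, 1, 0, 0, None, None)
```
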
### Runtime Compilation

NVRTC runtime compilation of CUDA C++ source code to PTX and CUBIN formats for dynamic kernel generation and deployment.

```python { .api }
# Program creation and compilation
def nvrtcCreateProgram(src: str, name: str, numHeaders: int, headers: List[bytes], includeNames: List[bytes]) -> int: ...
def nvrtcCompileProgram(prog: int, numOptions: int, options: List[bytes]) -> None: ...
def nvrtcGetPTX(prog: int, ptx: str) -> None: ...
def nvrtcGetCUBIN(prog: int, cubin: str) -> None: ...
```

[Runtime Compilation](./runtime-compilation.md)
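
A compile-to-PTX sketch following the simplified signatures above. The kernel source and the `--gpu-architecture` option are illustrative, and the released nvrtc bindings additionally require querying the PTX size (via `nvrtcGetPTXSize`) before filling an output buffer:

```python
from cuda.bindings import nvrtc

# Illustrative CUDA C++ source
src = """
extern "C" __global__ void axpy(float a, const float *x, float *y, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] += a * x[i];
}
"""

# Create and compile the program; the architecture option is an assumption
prog = nvrtc.nvrtcCreateProgram(src, "axpy.cu", 0, [], [])
nvrtc.nvrtcCompileProgram(prog, 1, [b"--gpu-architecture=compute_80"])

# Retrieve the generated PTX into a caller-provided buffer
ptx = bytearray(64 * 1024)  # assumed large enough for this kernel
nvrtc.nvrtcGetPTX(prog, ptx)
```
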
### JIT Compilation and Linking

NVVM LLVM-based compilation and NVJitLink just-in-time linking for advanced code generation workflows.

```python { .api }
# NVVM compilation
def create_program() -> int: ...
def compile_program(prog: int, num_options: int, options) -> None: ...

# NVJitLink linking
def create(num_options: int, options) -> int: ...
def add_data(handle: int, input_type: int, data: bytes, size: int, name: str) -> None: ...
def complete(handle: int) -> None: ...
```

[JIT Compilation and Linking](./jit-compilation.md)
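
A linking sketch built from the three NVJitLink functions listed above, assuming they are exposed as `cuda.bindings.nvjitlink`. The link option, the numeric input-type value, and the PTX bytes are placeholders, since this document does not define the input-type enumeration:

```python
from cuda.bindings import nvjitlink

# PTX produced elsewhere (e.g. by NVRTC); placeholder bytes here
ptx = b"// ptx generated earlier\n"

PTX_INPUT_TYPE = 1  # placeholder: use the real input-type enum from the bindings

# Create a link job targeting an assumed architecture, add one input, finish
handle = nvjitlink.create(1, [b"-arch=sm_80"])
nvjitlink.add_data(handle, PTX_INPUT_TYPE, ptx, len(ptx), "axpy.ptx")
nvjitlink.complete(handle)
```
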
### GPU Direct Storage

cuFile GPU Direct Storage API for high-performance direct GPU I/O operations bypassing CPU and system memory.

```python { .api }
# File handle management
def handle_register(descr: int) -> int: ...
def handle_deregister(fh: int) -> None: ...

# I/O operations
def read(fh: int, buf_ptr_base: int, size: int, file_offset: int, buf_ptr_offset: int) -> None: ...
def write(fh: int, buf_ptr_base: int, size: int, file_offset: int, buf_ptr_offset: int) -> None: ...
```

[GPU Direct Storage](./gpu-direct-storage.md)
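
A read sketch following the simplified signatures above. The module name `cuda.bindings.cufile`, the use of a raw `O_DIRECT` file descriptor as the integer descriptor, and sourcing the destination pointer from `cudaMalloc` are assumptions made for illustration only:

```python
import os
from cuda.bindings import cufile, runtime

# Open the data file for direct I/O and allocate a destination buffer on the GPU
fd = os.open("weights.bin", os.O_RDONLY | os.O_DIRECT)  # illustrative path
dev_ptr = runtime.cudaMalloc(1 << 20)

# Register the file with cuFile, read 1 MiB straight into device memory,
# then release the handle and the descriptor
fh = cufile.handle_register(fd)
cufile.read(fh, dev_ptr, 1 << 20, 0, 0)
cufile.handle_deregister(fh)
os.close(fd)
runtime.cudaFree(dev_ptr)
```
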
### Library Management

Dynamic NVIDIA library loading and discovery utilities for runtime library management and version compatibility.

```python { .api }
def load_nvidia_dynamic_lib(libname: str) -> LoadedDL: ...

class LoadedDL:
    abs_path: Optional[str]
    was_already_loaded_from_elsewhere: bool
    _handle_uint: int
```

[Library Management](./library-management.md)
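
A small discovery sketch with fallback handling, assuming `DynamicLibNotFoundError` is importable from `cuda.pathfinder` alongside the loader; the `"nvrtc"` library name is just an example:

```python
from cuda.pathfinder import load_nvidia_dynamic_lib, DynamicLibNotFoundError

try:
    lib = load_nvidia_dynamic_lib("nvrtc")  # example library name
except DynamicLibNotFoundError:
    print("NVRTC is not installed; runtime compilation will be unavailable")
else:
    if lib.was_already_loaded_from_elsewhere:
        print("NVRTC was already loaded by another component")
    print("Loaded from:", lib.abs_path)
```
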
## Types

### Core Enumerations

```python { .api }
class cudaError_t:
    """CUDA Runtime API error codes"""
    cudaSuccess: int
    cudaErrorInvalidValue: int
    cudaErrorMemoryAllocation: int
    # ... additional error codes

class cudaMemcpyKind:
    """Memory copy direction types"""
    cudaMemcpyHostToHost: int
    cudaMemcpyHostToDevice: int
    cudaMemcpyDeviceToHost: int
    cudaMemcpyDeviceToDevice: int

class CUresult:
    """CUDA Driver API result codes"""
    CUDA_SUCCESS: int
    CUDA_ERROR_INVALID_VALUE: int
    CUDA_ERROR_OUT_OF_MEMORY: int
    # ... additional result codes
```
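
A sketch of checking a status against these enumerations. It assumes the released low-level bindings return an error code alongside each result (a `(status, value)` pair), a detail the simplified signatures earlier in this document omit:

```python
from cuda.bindings import runtime

# Assumed (status, value) return convention for the low-level bindings
status, count = runtime.cudaGetDeviceCount()
if status != runtime.cudaError_t.cudaSuccess:
    raise RuntimeError(f"cudaGetDeviceCount failed: {status}")
print("Devices:", count)
```
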
### Device Attributes

```python { .api }
class cudaDeviceAttr:
    """CUDA device attribute enumeration"""
    cudaDevAttrMaxThreadsPerBlock: int
    cudaDevAttrMaxBlockDimX: int
    cudaDevAttrMaxGridDimX: int
    cudaDevAttrMaxSharedMemoryPerBlock: int
    # ... additional device attributes

class CUdevice_attribute:
    """CUDA Driver API device attributes"""
    CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_BLOCK: int
    CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_X: int
    # ... additional attributes
```
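
An attribute-query sketch; it assumes `cuda.bindings.runtime` exposes `cudaDeviceGetAttribute` (not listed in the simplified API above) and uses the same assumed `(status, value)` return convention as the previous example:

```python
from cuda.bindings import runtime

# Query the maximum threads per block on device 0
status, max_threads = runtime.cudaDeviceGetAttribute(
    runtime.cudaDeviceAttr.cudaDevAttrMaxThreadsPerBlock, 0
)
if status == runtime.cudaError_t.cudaSuccess:
    print("Max threads per block:", max_threads)
```
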
### Exception Classes

```python { .api }
class nvvmError(Exception):
    """NVVM compilation exception"""
    pass

class nvJitLinkError(Exception):
    """NVJitLink exception"""
    pass

class cuFileError(Exception):
    """cuFile operation exception"""
    pass

class DynamicLibNotFoundError(Exception):
    """NVIDIA library not found exception"""
    pass
```
```