tessl/pypi-cuda-python

CUDA Python metapackage providing unified access to NVIDIA's CUDA platform from Python through comprehensive bindings and utilities.

Workspace: tessl
Visibility: Public
Describes: pypipkg:pypi/cuda-python@13.0.x

To install, run

npx @tessl/cli install tessl/pypi-cuda-python@13.0.0

# CUDA Python

CUDA Python provides comprehensive access to NVIDIA's CUDA platform from Python through a unified metapackage that combines low-level CUDA C API bindings with high-level Pythonic APIs and library-discovery utilities. It enables end-to-end GPU development entirely in Python while retaining access to the full breadth of CUDA functionality, serving as the primary entry point to NVIDIA's CUDA ecosystem for Python developers.

## Package Information

- **Package Name**: cuda-python
- **Package Type**: metapackage
- **Language**: Python
- **Installation**: `pip install cuda-python`
- **Complete Installation**: `pip install cuda-python[all]`
- **Components**:
  - `cuda.core@0.3.3a0` - High-level Pythonic CUDA APIs (experimental)
  - `cuda.bindings@13.0.1` - Low-level CUDA C API bindings
  - `cuda.pathfinder@1.1.1a0` - NVIDIA library discovery utilities

## Core Imports

High-level Pythonic CUDA APIs (recommended for most users):

```python
# High-level device and memory management
from cuda.core.experimental import Device, Stream, Event

# Memory resources and buffers
from cuda.core.experimental import Buffer, DeviceMemoryResource

# Program compilation and kernel execution
from cuda.core.experimental import Program, Kernel, launch

# CUDA graphs for optimization
from cuda.core.experimental import Graph, GraphBuilder
```

Low-level CUDA C API bindings:

```python
# CUDA Runtime API
from cuda.bindings import runtime

# CUDA Driver API
from cuda.bindings import driver

# Runtime compilation
from cuda.bindings import nvrtc

# Library loading utilities
from cuda.pathfinder import load_nvidia_dynamic_lib
```

Package version information:

```python
import cuda.core.experimental
import cuda.bindings
import cuda.pathfinder

print(cuda.core.experimental.__version__)  # "0.3.3a0"
print(cuda.bindings.__version__)  # "13.0.1"
print(cuda.pathfinder.__version__)  # "1.1.1a0"
```

## Basic Usage

Pythonic high-level approach (recommended):

```python
from cuda.core.experimental import Device, Stream, Buffer
import numpy as np

# Device management
device = Device(0)  # Use first CUDA device
print(f"Using device: {device.name}")

# Memory management with high-level Buffer
host_data = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)
buffer = Buffer.from_array(host_data, device=device)

# Stream management
stream = Stream(device)

# Synchronization
stream.synchronize()
```

Low-level approach for advanced users:

```python
from cuda.bindings import runtime
from cuda.pathfinder import load_nvidia_dynamic_lib

# Basic device management
device_count = runtime.cudaGetDeviceCount()
print(f"Available CUDA devices: {device_count}")

# Memory allocation and management
device_ptr = runtime.cudaMalloc(1024)  # Allocate 1KB on device
host_ptr = runtime.cudaMallocHost(1024)  # Allocate page-locked host memory

# Copy data between host and device
runtime.cudaMemcpy(
    device_ptr, host_ptr, 1024,
    runtime.cudaMemcpyKind.cudaMemcpyHostToDevice
)

# Synchronize and cleanup
runtime.cudaDeviceSynchronize()
runtime.cudaFree(device_ptr)
runtime.cudaFreeHost(host_ptr)

# Load NVIDIA libraries dynamically
cudart_lib = load_nvidia_dynamic_lib("cudart")
print(f"CUDA Runtime loaded from: {cudart_lib.abs_path}")
```

## Architecture

CUDA Python is structured as a metapackage that provides unified access to multiple specialized components:

### Core Components

- **cuda.core** (v0.3.3a0): Experimental high-level Pythonic APIs for idiomatic CUDA development
- **cuda.bindings** (v13.0.1): Low-level Python bindings to CUDA C APIs providing complete coverage of CUDA functionality
- **cuda.pathfinder** (v1.1.1a0): Utility library for discovering and loading NVIDIA CUDA libraries dynamically

### API Hierarchy

The package exposes APIs at multiple abstraction levels:

- **High-level Pythonic APIs** (`cuda.core.experimental`): Object-oriented CUDA interface with Device, Stream, Buffer, Program classes
- **Runtime API** (`cuda.bindings.runtime`): Direct bindings to CUDA Runtime C API
- **Driver API** (`cuda.bindings.driver`): Direct bindings to CUDA Driver C API
- **Compilation APIs**: Runtime compilation (NVRTC) and LLVM-based compilation (NVVM)
- **Utility APIs**: JIT linking, GPU Direct Storage, and library management

This layered approach allows developers to choose the appropriate abstraction level for their needs while maintaining interoperability between components.

## Capabilities

### High-Level Pythonic CUDA (cuda.core.experimental)

Object-oriented CUDA programming with automatic resource management and Pythonic interfaces for device management, memory allocation, stream handling, and kernel execution.

```python { .api }
# Device management
class Device:
    def __init__(self, device_id: int = 0): ...
    @property
    def name(self) -> str: ...
    @property
    def compute_capability(self) -> tuple[int, int]: ...
    def set_current(self) -> None: ...

# Memory management
class Buffer:
    @classmethod
    def from_array(cls, array, device: Device) -> Buffer: ...
    def to_array(self) -> np.ndarray: ...
    @property
    def device(self) -> Device: ...
    @property
    def size(self) -> int: ...

# Stream and event management
class Stream:
    def __init__(self, device: Device): ...
    def synchronize(self) -> None: ...
    def record(self, event: Event) -> None: ...

class Event:
    def __init__(self, device: Device): ...
    def synchronize(self) -> None: ...
    def elapsed_time(self, end_event: Event) -> float: ...

# Program compilation and kernel execution
class Program:
    def __init__(self, code: str, options: ProgramOptions): ...
    def compile(self) -> None: ...
    def get_kernel(self, name: str) -> Kernel: ...

class Kernel:
    def launch(self, config: LaunchConfig, *args) -> None: ...

def launch(kernel: Kernel, config: LaunchConfig, *args) -> None: ...
```

[High-Level CUDA Core APIs](./cuda-core.md)
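
For orientation, the pieces above compose into an end-to-end flow roughly like the following. This is a minimal sketch against the simplified signatures in this summary; the `ProgramOptions` and `LaunchConfig` constructor arguments and the `saxpy` kernel are assumptions, and the shipped `cuda.core.experimental` API may differ in detail.

```python
import numpy as np
from cuda.core.experimental import (
    Buffer, Device, LaunchConfig, Program, ProgramOptions, launch,
)

# CUDA C++ source compiled at runtime; "saxpy" is a placeholder kernel
saxpy_src = """
extern "C" __global__ void saxpy(float a, const float* x, float* y, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}
"""

device = Device(0)
device.set_current()

# Compile the source and look up the kernel (per the API summary above)
program = Program(saxpy_src, options=ProgramOptions())  # options assumed default-constructible
program.compile()
kernel = program.get_kernel("saxpy")

# Stage data on the device and launch over a 1D grid
n = 1024
x = Buffer.from_array(np.arange(n, dtype=np.float32), device=device)
y = Buffer.from_array(np.zeros(n, dtype=np.float32), device=device)
config = LaunchConfig(grid=(n + 127) // 128, block=128)  # constructor assumed
launch(kernel, config, np.float32(2.0), x, y, np.uint64(n))
```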

### Device and Memory Management (Low-Level)

Essential CUDA device enumeration, selection, and memory allocation operations including unified memory, streams, and events for efficient GPU resource management.

```python { .api }
# Device management
def cudaGetDeviceCount() -> int: ...
def cudaSetDevice(device: int) -> None: ...
def cudaGetDevice() -> int: ...

# Memory allocation
def cudaMalloc(size: int) -> int: ...
def cudaMallocHost(size: int) -> int: ...
def cudaMemcpy(dst, src, count: int, kind: cudaMemcpyKind) -> None: ...
def cudaFree(devPtr: int) -> None: ...
```

[Device and Memory Management](./device-memory.md)
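
A short sketch of device enumeration with these functions, following the simplified return conventions used throughout this document (the shipped bindings additionally return CUDA status codes):

```python
from cuda.bindings import runtime

# Enumerate devices and make each one current in turn
device_count = runtime.cudaGetDeviceCount()
for device_id in range(device_count):
    runtime.cudaSetDevice(device_id)
    assert runtime.cudaGetDevice() == device_id

# Allocate and release a small scratch buffer on the last selected device
scratch = runtime.cudaMalloc(4096)
runtime.cudaFree(scratch)
```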

### Kernel Execution and Streams

CUDA kernel launching, execution control, and asynchronous stream management for high GPU utilization and performance.

```python { .api }
# Stream management
def cudaStreamCreate() -> int: ...
def cudaStreamSynchronize(stream: int) -> None: ...
def cudaLaunchKernel(func, gridDim, blockDim, args, sharedMem: int, stream: int) -> None: ...

# Event management
def cudaEventCreate() -> int: ...
def cudaEventRecord(event: int, stream: int) -> None: ...
def cudaEventSynchronize(event: int) -> None: ...
```

[Kernel Execution and Streams](./kernels-streams.md)
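
As an example, a pair of events can bracket asynchronous work on a stream to time it. This sketch uses the simplified signatures above plus `cudaEventElapsedTime`, a Runtime API call not listed in this summary; the real bindings also return status codes.

```python
from cuda.bindings import runtime

# Create a stream and two timing events
stream = runtime.cudaStreamCreate()
start = runtime.cudaEventCreate()
stop = runtime.cudaEventCreate()

# Bracket asynchronous work on the stream with the two events
runtime.cudaEventRecord(start, stream)
# ... enqueue work here, e.g. cudaLaunchKernel(func, gridDim, blockDim, args, 0, stream)
runtime.cudaEventRecord(stop, stream)

# Wait for the recorded work to finish, then read the elapsed milliseconds
runtime.cudaEventSynchronize(stop)
elapsed_ms = runtime.cudaEventElapsedTime(start, stop)
print(f"GPU time: {elapsed_ms:.3f} ms")
```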

### Low-Level Driver API

Direct CUDA Driver API access for advanced GPU programming including context management, module loading, and fine-grained resource control.

```python { .api }
# Driver initialization and devices
def cuInit(flags: int) -> None: ...
def cuDeviceGet(ordinal: int) -> int: ...
def cuCtxCreate(flags: int, device: int) -> int: ...

# Module and function management
def cuModuleLoad(fname: str) -> int: ...
def cuModuleGetFunction(hmod: int, name: str) -> int: ...
def cuLaunchKernel(f, gridDimX, gridDimY, gridDimZ, blockDimX, blockDimY, blockDimZ, sharedMemBytes: int, hStream: int, kernelParams, extra) -> None: ...
```

[Low-Level Driver API](./driver-api.md)
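
A sketch of the classic driver-API flow (initialize, create a context, load a module, launch) against the simplified signatures above; `kernel.cubin` and `my_kernel` are placeholder names:

```python
from cuda.bindings import driver

# One-time driver initialization, then acquire a device and a context
driver.cuInit(0)
device = driver.cuDeviceGet(0)
context = driver.cuCtxCreate(0, device)

# Load a precompiled module and resolve a kernel by name (placeholders)
module = driver.cuModuleLoad("kernel.cubin")
func = driver.cuModuleGetFunction(module, "my_kernel")

# Launch a 256-block grid of 128 threads: no shared memory, default stream,
# and no kernel parameters in this skeleton
driver.cuLaunchKernel(func, 256, 1, 1, 128, 1, 1, 0, 0, None, None)
```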

### Runtime Compilation

NVRTC runtime compilation of CUDA C++ source code to PTX and CUBIN formats for dynamic kernel generation and deployment.

```python { .api }
# Program creation and compilation
def nvrtcCreateProgram(src: str, name: str, numHeaders: int, headers: List[bytes], includeNames: List[bytes]) -> int: ...
def nvrtcCompileProgram(prog: int, numOptions: int, options: List[bytes]) -> None: ...
def nvrtcGetPTX(prog: int, ptx: str) -> None: ...
def nvrtcGetCUBIN(prog: int, cubin: str) -> None: ...
```

[Runtime Compilation](./runtime-compilation.md)
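
As a sketch, compiling a snippet to PTX with these functions looks roughly as follows. The shipped `cuda.bindings.nvrtc` module mirrors the C API more closely (status-code returns, explicit output buffers, and a companion `nvrtcGetPTXSize` call for sizing), so treat this as illustrating the flow rather than exact signatures.

```python
from cuda.bindings import nvrtc

source = 'extern "C" __global__ void noop() {}'

# Create a program and compile it for an assumed target architecture
prog = nvrtc.nvrtcCreateProgram(source, "noop.cu", 0, [], [])
nvrtc.nvrtcCompileProgram(prog, 1, [b"--gpu-architecture=compute_80"])

# Size the output buffer first, then copy the PTX into it
ptx = bytearray(nvrtc.nvrtcGetPTXSize(prog))
nvrtc.nvrtcGetPTX(prog, ptx)
```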

### JIT Compilation and Linking

NVVM LLVM-based compilation and NVJitLink just-in-time linking for advanced code generation workflows.

```python { .api }
# NVVM compilation
def create_program() -> int: ...
def compile_program(prog: int, num_options: int, options) -> None: ...

# NVJitLink linking
def create(num_options: int, options) -> int: ...
def add_data(handle: int, input_type: int, data: bytes, size: int, name: str) -> None: ...
def complete(handle: int) -> None: ...
```

[JIT Compilation and Linking](./jit-compilation.md)
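
A sketch of a link step with these functions, assuming they live in `cuda.bindings.nvjitlink`; the `InputType.PTX` constant and the option string are illustrative names, not confirmed by this summary:

```python
from cuda.bindings import nvjitlink

# Create a linker for an assumed target architecture
handle = nvjitlink.create(1, ["-arch=sm_80"])

# Feed it a PTX blob (e.g. produced by NVRTC or NVVM) and finalize the link
ptx_blob = b"// PTX text produced earlier"
nvjitlink.add_data(handle, nvjitlink.InputType.PTX, ptx_blob, len(ptx_blob), "module.ptx")
nvjitlink.complete(handle)
```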

### GPU Direct Storage

cuFile GPU Direct Storage API for high-performance I/O directly between storage and GPU memory, bypassing CPU bounce buffers in system memory.

```python { .api }
# File handle management
def handle_register(descr: int) -> int: ...
def handle_deregister(fh: int) -> None: ...

# I/O operations
def read(fh: int, buf_ptr_base: int, size: int, file_offset: int, buf_ptr_offset: int) -> None: ...
def write(fh: int, buf_ptr_base: int, size: int, file_offset: int, buf_ptr_offset: int) -> None: ...
```

[GPU Direct Storage](./gpu-direct-storage.md)
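
A minimal sketch of the read path, assuming these functions live in `cuda.bindings.cufile` and that registration accepts a plain OS file descriptor (the real API wraps it in a descriptor struct); the device buffer comes from the simplified `cudaMalloc` above, and `data.bin` is a placeholder path:

```python
import os
from cuda.bindings import cufile, runtime

# GPU Direct Storage expects O_DIRECT file access (Linux-only flag)
fd = os.open("data.bin", os.O_RDONLY | os.O_DIRECT)

# Register the descriptor, read 1 MiB straight into device memory, clean up
device_ptr = runtime.cudaMalloc(1 << 20)
fh = cufile.handle_register(fd)
cufile.read(fh, device_ptr, 1 << 20, 0, 0)
cufile.handle_deregister(fh)
runtime.cudaFree(device_ptr)
os.close(fd)
```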

### Library Management

Dynamic NVIDIA library loading and discovery utilities for runtime library management and version compatibility.

```python { .api }
def load_nvidia_dynamic_lib(libname: str) -> LoadedDL: ...

class LoadedDL:
    abs_path: Optional[str]
    was_already_loaded_from_elsewhere: bool
    _handle_uint: int
```

[Library Management](./library-management.md)
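
Because discovery fails on machines without the corresponding NVIDIA component, lookups are typically wrapped in the `DynamicLibNotFoundError` exception listed under Types below:

```python
from cuda.pathfinder import DynamicLibNotFoundError, load_nvidia_dynamic_lib

# Try to locate NVRTC; degrade gracefully if it is not installed
try:
    nvrtc_lib = load_nvidia_dynamic_lib("nvrtc")
except DynamicLibNotFoundError:
    print("NVRTC not found; runtime compilation is unavailable")
else:
    print(f"NVRTC loaded from: {nvrtc_lib.abs_path}")
    if nvrtc_lib.was_already_loaded_from_elsewhere:
        print("Note: the library had already been loaded by another component")
```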

## Types

### Core Enumerations

```python { .api }
class cudaError_t:
    """CUDA Runtime API error codes"""
    cudaSuccess: int
    cudaErrorInvalidValue: int
    cudaErrorMemoryAllocation: int
    # ... additional error codes

class cudaMemcpyKind:
    """Memory copy direction types"""
    cudaMemcpyHostToHost: int
    cudaMemcpyHostToDevice: int
    cudaMemcpyDeviceToHost: int
    cudaMemcpyDeviceToDevice: int

class CUresult:
    """CUDA Driver API result codes"""
    CUDA_SUCCESS: int
    CUDA_ERROR_INVALID_VALUE: int
    CUDA_ERROR_OUT_OF_MEMORY: int
    # ... additional result codes
```

### Device Attributes

```python { .api }
class cudaDeviceAttr:
    """CUDA device attribute enumeration"""
    cudaDevAttrMaxThreadsPerBlock: int
    cudaDevAttrMaxBlockDimX: int
    cudaDevAttrMaxGridDimX: int
    cudaDevAttrMaxSharedMemoryPerBlock: int
    # ... additional device attributes

class CUdevice_attribute:
    """CUDA Driver API device attributes"""
    CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_BLOCK: int
    CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_X: int
    # ... additional attributes
```

### Exception Classes

```python { .api }
class nvvmError(Exception):
    """NVVM compilation exception"""
    pass

class nvJitLinkError(Exception):
    """NVJitLink exception"""
    pass

class cuFileError(Exception):
    """cuFile operation exception"""
    pass

class DynamicLibNotFoundError(Exception):
    """NVIDIA library not found exception"""
    pass
```