A Python wrapper for Nvidia's CUDA parallel computation API, providing automatic object cleanup, automatic error checking, and convenient abstractions.
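For illustration only, a minimal PyCUDA session showing the conveniences named above (pycuda.autoinit handles context creation and teardown; the array contents are arbitrary):

import numpy as np
import pycuda.autoinit  # creates a CUDA context and registers automatic cleanup
import pycuda.gpuarray as gpuarray

a = gpuarray.to_gpu(np.random.randn(4, 4).astype(np.float32))  # copy data to the GPU
result = (2 * a).get()  # elementwise work runs on the device; .get() copies back a NumPy array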
A Python program that compiles and links multiple CUDA kernel modules together at runtime.
The program should be able to compile multiple separate CUDA source code strings into intermediate PTX (Parallel Thread Execution) format.
The program should link the compiled PTX modules into a single executable module that can be used to call functions from any of the linked modules.
After linking, the program should be able to retrieve and execute kernel functions that were defined in different source modules.
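As a usage sketch only, the workflow could look as follows in terms of the API stubs specified below; the kernel sources, names, and launch parameters are illustrative, and the launch syntax assumes the returned function objects behave like pycuda.driver.Function:

import numpy as np
import pycuda.driver as drv
import pycuda.autoinit  # initialize a CUDA context before compiling or linking

src_scale = 'extern "C" __global__ void scale(float *x) { x[threadIdx.x] *= 2.0f; }'
src_shift = 'extern "C" __global__ void shift(float *x) { x[threadIdx.x] += 1.0f; }'

ptx_parts = [compile_to_ptx(src_scale), compile_to_ptx(src_shift)]  # two separate sources
module = link_modules(ptx_parts)  # a single linked module exposing both kernels

scale = get_function_from_module(module, "scale")
shift = get_function_from_module(module, "shift")

data = np.arange(8, dtype=np.float32)
scale(drv.InOut(data), block=(8, 1, 1), grid=(1, 1))
shift(drv.InOut(data), block=(8, 1, 1), grid=(1, 1))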
@generates
def compile_to_ptx(source: str) -> bytes:
    """
    Compile CUDA source code to PTX intermediate representation.

    Parameters:
    - source: str, CUDA C/C++ source code

    Returns:
    bytes: PTX intermediate code
    """
def link_modules(ptx_modules: list) -> object:
    """
    Link multiple PTX modules into a single executable module.

    Parameters:
    - ptx_modules: list of bytes, PTX intermediate code from multiple sources

    Returns:
    Linked module object that can be used to access functions
    """
def get_function_from_module(module: object, function_name: str) -> object:
    """
    Retrieve a kernel function from a linked module.

    Parameters:
    - module: Linked module object
    - function_name: str, name of the kernel function to retrieve

    Returns:
    Function object that can be called to execute the kernel
    """

Provides GPU computing capabilities including module compilation, linking, and kernel execution.
@satisfied-by
tessl i tessl/pypi-pycuda@2025.1.0