Python wrapper for the NVIDIA CUDA parallel computation API, with automatic object cleanup, automatic error checking, and convenient abstractions.
Build a Python tool that loads a pre-compiled CUDA kernel module and reports its per-kernel resource usage (registers, shared memory, thread limits) to help optimize GPU performance.
@generates
class KernelAnalyzer:
    """
    Analyzes CUDA kernel functions from compiled modules.
    """

    def __init__(self, module_path: str):
        """
        Initialize the analyzer with a compiled CUDA module.

        Args:
            module_path: Path to the compiled CUDA module file (.cubin or .ptx)

        Raises:
            FileNotFoundError: If the module file does not exist
            RuntimeError: If the module cannot be loaded
        """
        pass

    def get_function(self, function_name: str):
        """
        Retrieve a kernel function from the loaded module.

        Args:
            function_name: Name of the kernel function to retrieve

        Returns:
            The kernel function object

        Raises:
            ValueError: If the function name does not exist in the module
        """
        pass

    def get_register_count(self, function) -> int:
        """
        Get the number of registers used by the kernel function.

        Args:
            function: The kernel function object

        Returns:
            Number of registers used per thread
        """
        pass

    def get_shared_memory_bytes(self, function) -> int:
        """
        Get the amount of shared memory used by the kernel function.

        Args:
            function: The kernel function object

        Returns:
            Size of shared memory in bytes
        """
        pass

    def get_max_threads_per_block(self, function) -> int:
        """
        Get the maximum number of threads per block for the kernel function.

        Args:
            function: The kernel function object

        Returns:
            Maximum threads per block
        """
        pass

Provides GPU computing support for loading compiled CUDA modules and retrieving kernel function attributes.
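A minimal implementation sketch against PyCUDA, the package named in the @satisfied-by line below. `module_from_file`, `get_function`, and `Function.get_attribute` with the `function_attribute` constants `NUM_REGS`, `SHARED_SIZE_BYTES`, and `MAX_THREADS_PER_BLOCK` are real PyCUDA driver APIs; the import guard and exception wrapping are illustrative choices, not part of the spec:

```python
# Sketch of KernelAnalyzer backed by PyCUDA (assumption: pycuda is the
# chosen backend). Error wrapping maps driver failures onto the
# exceptions named in the spec docstrings.
import os

try:
    import pycuda.autoinit  # creates a CUDA context on import
    import pycuda.driver as cuda
except ImportError:
    cuda = None  # lets the module be imported on machines without a GPU stack


class KernelAnalyzer:
    """Analyzes CUDA kernel functions from compiled modules."""

    def __init__(self, module_path: str):
        if not os.path.exists(module_path):
            raise FileNotFoundError(f"Module file not found: {module_path}")
        if cuda is None:
            raise RuntimeError("pycuda is not installed")
        try:
            # module_from_file accepts .cubin and .ptx images
            self._module = cuda.module_from_file(module_path)
        except Exception as exc:
            raise RuntimeError(f"Cannot load module {module_path}: {exc}") from exc

    def get_function(self, function_name: str):
        try:
            return self._module.get_function(function_name)
        except cuda.LogicError as exc:
            raise ValueError(
                f"Function not found in module: {function_name}"
            ) from exc

    def get_register_count(self, function) -> int:
        # Registers used per thread
        return function.get_attribute(cuda.function_attribute.NUM_REGS)

    def get_shared_memory_bytes(self, function) -> int:
        # Statically allocated shared memory, in bytes
        return function.get_attribute(cuda.function_attribute.SHARED_SIZE_BYTES)

    def get_max_threads_per_block(self, function) -> int:
        return function.get_attribute(cuda.function_attribute.MAX_THREADS_PER_BLOCK)
```

Usage (names are hypothetical examples): `KernelAnalyzer("kernel.cubin").get_register_count(analyzer.get_function("vector_add"))` returns the per-thread register count the driver reports for that kernel.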
@satisfied-by
tessl i tessl/pypi-pycuda@2025.1.0
docs
evals
scenario-1
scenario-2
scenario-3
scenario-4
scenario-5
scenario-6
scenario-7
scenario-8
scenario-9
scenario-10