Python wrapper for the NVIDIA CUDA parallel computation API, with automatic object cleanup, automatic error checking, and convenient abstractions.
Build a Python tool that loads a pre-compiled CUDA kernel module and reports its per-kernel resource usage (registers, shared memory, thread limits) to help optimize GPU performance.
@generates
class KernelAnalyzer:
    """
    Analyzes CUDA kernel functions from compiled modules.
    """

    def __init__(self, module_path: str):
        """
        Initialize the analyzer with a compiled CUDA module.

        Args:
            module_path: Path to the compiled CUDA module file (.cubin or .ptx)

        Raises:
            FileNotFoundError: If the module file does not exist
            RuntimeError: If the module cannot be loaded
        """
        pass

    def get_function(self, function_name: str):
        """
        Retrieve a kernel function from the loaded module.

        Args:
            function_name: Name of the kernel function to retrieve

        Returns:
            The kernel function object

        Raises:
            ValueError: If the function name does not exist in the module
        """
        pass

    def get_register_count(self, function) -> int:
        """
        Get the number of registers used by the kernel function.

        Args:
            function: The kernel function object

        Returns:
            Number of registers used per thread
        """
        pass

    def get_shared_memory_bytes(self, function) -> int:
        """
        Get the amount of shared memory used by the kernel function.

        Args:
            function: The kernel function object

        Returns:
            Size of shared memory in bytes
        """
        pass

    def get_max_threads_per_block(self, function) -> int:
        """
        Get the maximum number of threads per block for the kernel function.

        Args:
            function: The kernel function object

        Returns:
            Maximum threads per block
        """
        pass

Provides GPU computing support for loading compiled CUDA modules and retrieving kernel function attributes.
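A minimal implementation sketch against PyCUDA, the package named in the @satisfied-by line below. `module_from_file`, `get_function`, and `Function.get_attribute` with the `function_attribute` constants `NUM_REGS`, `SHARED_SIZE_BYTES`, and `MAX_THREADS_PER_BLOCK` are real PyCUDA driver APIs; the import guard and exception wrapping are illustrative choices, not part of the spec:

```python
# Sketch of KernelAnalyzer backed by PyCUDA (assumption: pycuda is the
# chosen backend). Error wrapping maps driver failures onto the
# exceptions named in the spec docstrings.
import os

try:
    import pycuda.autoinit  # creates a CUDA context on import
    import pycuda.driver as cuda
except ImportError:
    cuda = None  # lets the module be imported on machines without a GPU stack


class KernelAnalyzer:
    """Analyzes CUDA kernel functions from compiled modules."""

    def __init__(self, module_path: str):
        if not os.path.exists(module_path):
            raise FileNotFoundError(f"Module file not found: {module_path}")
        if cuda is None:
            raise RuntimeError("pycuda is not installed")
        try:
            # module_from_file accepts .cubin and .ptx images
            self._module = cuda.module_from_file(module_path)
        except Exception as exc:
            raise RuntimeError(f"Cannot load module {module_path}: {exc}") from exc

    def get_function(self, function_name: str):
        try:
            return self._module.get_function(function_name)
        except cuda.LogicError as exc:
            raise ValueError(
                f"Function not found in module: {function_name}"
            ) from exc

    def get_register_count(self, function) -> int:
        # Registers used per thread
        return function.get_attribute(cuda.function_attribute.NUM_REGS)

    def get_shared_memory_bytes(self, function) -> int:
        # Statically allocated shared memory, in bytes
        return function.get_attribute(cuda.function_attribute.SHARED_SIZE_BYTES)

    def get_max_threads_per_block(self, function) -> int:
        return function.get_attribute(cuda.function_attribute.MAX_THREADS_PER_BLOCK)
```

Usage (names are hypothetical examples): `KernelAnalyzer("kernel.cubin").get_register_count(analyzer.get_function("vector_add"))` returns the per-thread register count the driver reports for that kernel.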
@satisfied-by
tessl i tessl/pypi-pycuda@2025.1.0
docs
evals
scenario-1
scenario-2
scenario-3
scenario-4
scenario-5
scenario-6
scenario-7
scenario-8
scenario-9
scenario-10