tessl/pypi-pycuda

Python wrapper for Nvidia CUDA parallel computation API with object cleanup, automatic error checking, and convenient abstractions.

Evaluation: 62% overall · 0.94x agent success when using this tile

evals/scenario-3/task.md

GPU Vector Operations with Automatic Memory Marshaling

Build a Python module that performs element-wise mathematical operations on vectors using GPU acceleration with automatic memory management between host and device.

Requirements

The module should provide a function that takes host (CPU) arrays and performs GPU-accelerated operations with automatic data transfer handling. The implementation should:

  1. Accept input arrays from the host (CPU)
  2. Automatically transfer input data to GPU device memory
  3. Execute a custom CUDA kernel that performs element-wise multiplication and addition: result[i] = a[i] * b[i] + c[i] (a kernel sketch follows this list)
  4. Automatically transfer the result back to host memory
  5. Handle memory transfers efficiently without manual allocation
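
The kernel described in step 3 can be expressed as a few lines of CUDA compiled through PyCUDA's SourceModule. This is a minimal sketch, assuming a one-thread-per-element layout; the kernel name fma_kernel and the layout are illustrative choices, not mandated by this spec:

import pycuda.autoinit  # creates a CUDA context on import
from pycuda.compiler import SourceModule

_mod = SourceModule("""
__global__ void fma_kernel(const float *a, const float *b, const float *c,
                           float *result, int n)
{
    // One thread per element; the bounds check lets the same kernel
    // handle any array length within GPU memory limits (step 3 above).
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        result[i] = a[i] * b[i] + c[i];
}
""")
fma_kernel = _mod.get_function("fma_kernel")

A host-side wrapper covering steps 1, 2, 4, and 5 is sketched under the API section below.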

The function should have the following signature:

  • Input: three NumPy arrays of equal length (float32)
  • Output: a NumPy array containing the result
  • The kernel should process arrays of any size (within GPU memory limits)

Test Cases

  • Given three arrays [1.0, 2.0, 3.0], [4.0, 5.0, 6.0], and [7.0, 8.0, 9.0], the result is [11.0, 18.0, 27.0] @test
  • Given three arrays [2.5], [4.0], and [1.0], the result is [11.0] @test
  • Given three arrays of 1000 elements with all values as 1.0, 2.0, and 3.0 respectively, all result elements are 5.0 @test
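
These cases reduce to simple NumPy assertions. A sketch of how the first one might be exercised (the assertion style is an assumption; the spec does not prescribe a test harness):

import numpy as np
# fused_multiply_add is the function specified in the API section below

a = np.array([1.0, 2.0, 3.0], dtype=np.float32)
b = np.array([4.0, 5.0, 6.0], dtype=np.float32)
c = np.array([7.0, 8.0, 9.0], dtype=np.float32)

out = fused_multiply_add(a, b, c)            # 1*4+7, 2*5+8, 3*6+9
np.testing.assert_allclose(out, [11.0, 18.0, 27.0])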

Implementation

@generates

API

import numpy as np

def fused_multiply_add(a: np.ndarray, b: np.ndarray, c: np.ndarray) -> np.ndarray:
    """
    Performs element-wise fused multiply-add operation: result = a * b + c

    Args:
        a: First input array (float32)
        b: Second input array (float32)
        c: Third input array (float32)

    Returns:
        NumPy array containing the result of a * b + c

    Note:
        All input arrays must have the same length.
        The computation is performed on the GPU with automatic memory transfers.
    """
    pass
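
One way the stub above could be realized with PyCUDA is sketched below. It is an illustrative draft, not the mandated implementation: the kernel name, the 256-thread block size, and the use of pycuda.driver.In / pycuda.driver.Out argument handlers (which copy data to and from the device automatically, covering requirements 2, 4, and 5) are all assumptions.

import numpy as np
import pycuda.autoinit                       # sets up a CUDA context on import
import pycuda.driver as drv
from pycuda.compiler import SourceModule

# Element-wise multiply-add kernel, as sketched under Requirements.
_mod = SourceModule("""
__global__ void fma_kernel(const float *a, const float *b, const float *c,
                           float *result, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        result[i] = a[i] * b[i] + c[i];
}
""")
_fma_kernel = _mod.get_function("fma_kernel")

def fused_multiply_add(a: np.ndarray, b: np.ndarray, c: np.ndarray) -> np.ndarray:
    a = np.ascontiguousarray(a, dtype=np.float32)
    b = np.ascontiguousarray(b, dtype=np.float32)
    c = np.ascontiguousarray(c, dtype=np.float32)
    result = np.empty_like(a)

    n = a.size
    block = 256                              # threads per block (assumed)
    grid = (n + block - 1) // block          # enough blocks to cover n elements

    # drv.In / drv.Out wrap the NumPy arrays so PyCUDA allocates device
    # memory, copies the inputs in, and copies the result back automatically.
    _fma_kernel(drv.In(a), drv.In(b), drv.In(c), drv.Out(result),
                np.int32(n),
                block=(block, 1, 1), grid=(grid, 1))
    return result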

Dependencies { .dependencies }

pycuda { .dependency }

Provides GPU computing capabilities with automatic memory management features.

@satisfied-by

tessl i tessl/pypi-pycuda@2025.1.0
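
As a quick illustration of the automatic memory management mentioned above, PyCUDA's gpuarray module offers another route to the same behavior: GPUArray objects support element-wise arithmetic directly, so small experiments can skip the hand-written kernel entirely. A minimal sketch, using the values from the first test case:

import numpy as np
import pycuda.autoinit
import pycuda.gpuarray as gpuarray

a = gpuarray.to_gpu(np.array([1.0, 2.0, 3.0], dtype=np.float32))
b = gpuarray.to_gpu(np.array([4.0, 5.0, 6.0], dtype=np.float32))
c = gpuarray.to_gpu(np.array([7.0, 8.0, 9.0], dtype=np.float32))

result = (a * b + c).get()   # computed element-wise on the GPU, copied back to host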
