Python wrapper for the Nvidia CUDA parallel computation API, with automatic object cleanup, error checking, and convenient abstractions.
Build a utility that optimizes host-to-device and device-to-host data transfers for large numerical arrays using efficient memory allocation strategies. The utility should maximize transfer performance for workloads involving repeated or large-scale data movement between host and device memory.
Create a Python module transfer_optimizer.py that provides:
A function create_optimized_buffer(shape, dtype) that creates a host buffer optimized for GPU transfers with the specified shape and data type.
A function transfer_to_device(host_buffer) that transfers data from the optimized host buffer to GPU device memory and returns a device memory object.
A function transfer_from_device(device_memory, host_buffer) that transfers data from GPU device memory back to the host buffer.
A function measure_transfer_time(data_array, num_iterations=100) that performs the host-to-device transfer of data_array num_iterations times and returns the measured transfer time in milliseconds.
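A minimal sketch of how transfer_optimizer.py might be structured with PyCUDA, assuming a CUDA context has already been initialized (for example via import pycuda.autoinit); page-locked (pinned) host memory is the usual allocation strategy for fast transfers. The mean_ms helper is a hypothetical addition, not part of the required interface:

```python
import time


def create_optimized_buffer(shape, dtype):
    # Page-locked (pinned) host memory lets the GPU's DMA engine copy
    # directly, typically much faster than pageable memory.
    import pycuda.driver as cuda  # deferred so the module imports without a GPU
    return cuda.pagelocked_empty(shape, dtype)


def transfer_to_device(host_buffer):
    import pycuda.driver as cuda
    device_memory = cuda.mem_alloc(host_buffer.nbytes)
    cuda.memcpy_htod(device_memory, host_buffer)
    return device_memory


def transfer_from_device(device_memory, host_buffer):
    import pycuda.driver as cuda
    cuda.memcpy_dtoh(host_buffer, device_memory)


def mean_ms(total_seconds, num_iterations):
    # Hypothetical helper: average per-iteration time in milliseconds.
    return total_seconds / num_iterations * 1000.0


def measure_transfer_time(data_array, num_iterations=100):
    import pycuda.driver as cuda
    pinned = create_optimized_buffer(data_array.shape, data_array.dtype)
    pinned[...] = data_array
    device_memory = cuda.mem_alloc(pinned.nbytes)
    start = time.perf_counter()
    for _ in range(num_iterations):
        cuda.memcpy_htod(device_memory, pinned)
    cuda.Context.synchronize()  # memcpy_htod blocks, but be explicit
    return mean_ms(time.perf_counter() - start, num_iterations)
```

The deferred pycuda imports are only so the module can be imported on a machine without a GPU; the performance win comes from allocating one pinned buffer and reusing it across iterations instead of transferring from a pageable numpy array each time.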
Tests:
- create_optimized_buffer creates a suitable host buffer
- transfer_to_device successfully transfers the data to GPU memory
- transfer_from_device successfully transfers data back to the host
- measure_transfer_time returns a valid positive transfer time in milliseconds

Dependency: pycuda, installed with tessl i tessl/pypi-pycuda@2025.1.0. Provides GPU computing capabilities including memory management and data transfer operations.
evals
scenario-1
scenario-2
scenario-3
scenario-4
scenario-5
scenario-6
scenario-7
scenario-8
scenario-9
scenario-10