tessl install tessl/pypi-cupy-cuda101@9.6.0
CuPy: NumPy & SciPy for GPU (CUDA 10.1 version)
Agent success rate when using this tile: 87%
Improvement in agent success rate compared to baseline: 1.19x
Baseline agent success rate without this tile: 73%
Build a matrix computation pipeline that performs a sequence of operations repeatedly on GPU arrays. The pipeline should optimize performance by reducing kernel launch overhead for the repeated execution pattern.
When the same sequence of GPU operations is executed repeatedly with minimal variations, there are opportunities to reduce overhead. Your task is to implement a computation pipeline that can efficiently handle repeated execution of a fixed sequence of matrix operations.
Implement a function process_matrices(a, b, c, num_iterations) that:
- Takes three input matrices a, b, and c (all 1000x1000 float32 matrices)
- Performs the following operations num_iterations times:
  - Multiply matrix a by matrix b to get intermediate result temp1
  - Add matrix c to temp1 to get temp2
  - Square each element of temp2 to get temp3
  - Update a with the result: a = temp3
- Returns the final value of a after all iterations

The implementation should minimize kernel launch overhead when executing the same operation sequence repeatedly. The operations form a fixed computation pattern that does not change between iterations.
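One possible way to cut launches per iteration is sketched below: the add and the element-wise square are merged into a single kernel with cupy.fuse, so each iteration issues one matrix multiply plus one fused kernel instead of three separate launches. This is a minimal sketch, not the expected solution. The helper name add_then_square and the NumPy fallback (used only so the sketch can run on a machine without a GPU) are assumptions added for illustration. Newer CuPy releases also offer CUDA graph capture on streams; that is not assumed to be available in the 9.6.0 release targeted here.

```python
# Sketch only: fuse the add and square steps into one kernel.
# The NumPy fallback is an assumption for illustration, so the code
# can still be exercised where CuPy or a CUDA device is missing.
try:
    import cupy as xp
    xp.cuda.runtime.getDeviceCount()  # raises if no usable CUDA device
    fuse = xp.fuse
except Exception:
    import numpy as xp

    def fuse():
        # No-op stand-in for cupy.fuse on the CPU path.
        return lambda func: func


@fuse()
def add_then_square(x, y):
    # On the GPU path, both element-wise steps compile into one kernel.
    z = x + y
    return z * z


def process_matrices(a, b, c, num_iterations):
    """Repeat a = (a @ b + c) ** 2 for num_iterations iterations."""
    for _ in range(num_iterations):
        temp1 = xp.matmul(a, b)          # matrix multiply (one launch)
        a = add_then_square(temp1, c)    # temp2 and temp3 in one launch
    return a
```

The caller's array a is not modified in place; the loop rebinds the local name and returns the final matrix.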
The function must produce mathematically correct results - the final matrix should match what would be obtained by executing the operations in sequence.
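To check the correctness requirement, an optimized implementation can be compared against a plain sequential reference. The sketch below is one way to write such a reference in NumPy. The name reference_pipeline and the use of float64 are assumptions for illustration and are not part of the task. A rough estimate (not a measured result) suggests that uniform [0, 1) inputs at 1000x1000 grow past the float32 range within a handful of iterations, so a comparison like this is most informative on small or scaled-down inputs.

```python
import numpy as np


def reference_pipeline(a, b, c, num_iterations):
    """Sequential reference: a = (a @ b + c) ** 2, repeated.

    Inputs are converted to float64 so the reference is less affected
    by rounding than a float32 GPU result.
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    c = np.asarray(c, dtype=np.float64)
    for _ in range(num_iterations):
        temp1 = a @ b        # matrix multiply
        temp2 = temp1 + c    # element-wise add
        a = temp2 ** 2       # element-wise square
    return a


# Hypothetical usage with CuPy arrays (a0, b0, c0 are copies of the inputs):
#   expected = reference_pipeline(cp.asnumpy(a0), cp.asnumpy(b0), cp.asnumpy(c0), n)
#   np.testing.assert_allclose(cp.asnumpy(result), expected, rtol=1e-3)
```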
Provides GPU-accelerated array operations and CUDA programming capabilities.
File: test_matrix_pipeline.py { .test }
import cupy as cp
import numpy as np

def test_basic_pipeline():
    """Test that the pipeline produces correct results"""
    cp.random.seed(42)
    a = cp.random.rand(1000, 1000).astype(cp.float32)
    b = cp.random.rand(1000, 1000).astype(cp.float32)
    c = cp.random.rand(1000, 1000).astype(cp.float32)

    # Save initial values
    a_init = a.copy()
    b_init = b.copy()
    c_init = c.copy()

    result = process_matrices(a, b, c, num_iterations=5)

    # Verify result is not None
    assert result is not None

    # Verify result has correct shape
    assert result.shape == (1000, 1000)

    # Verify the computation by manually computing first iteration
    temp1 = cp.matmul(a_init, b_init)
    temp2 = temp1 + c_init
    temp3 = temp2 ** 2

    # Result after first iteration should have values derived from this
    # Just check that result contains finite values
    assert cp.all(cp.isfinite(result))

File: test_matrix_pipeline.py { .test }
def test_multiple_iterations():
    """Test that multiple iterations work correctly"""
    cp.random.seed(123)
    a = cp.random.rand(1000, 1000).astype(cp.float32)
    b = cp.random.rand(1000, 1000).astype(cp.float32)
    c = cp.random.rand(1000, 1000).astype(cp.float32)

    result = process_matrices(a, b, c, num_iterations=10)

    # Verify result shape and finiteness
    assert result.shape == (1000, 1000)
    assert cp.all(cp.isfinite(result))

    # Result should be different after 10 iterations
    # (values should have grown significantly)
    assert cp.max(result) > 1.0

- process_matrices() function
- test_matrix_pipeline.py with the provided test cases