
tessl/pypi-vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Overall score: 69% (Evaluation: 69%)

1.33x agent success when using this tile


Task: Build a Custom Request Manager with Step Control and Async Streaming

Overview

Build a request management system that can handle text generation requests using two different execution modes: a step-controlled mode and an async streaming mode.

Requirements

Part 1: Step-Controlled Request Manager

Implement a class StepControlledManager that:

  1. Initializes with a pre-trained language model suitable for text generation
  2. Provides a method to add text generation requests with unique request IDs
  3. Implements a method to process one step of computation at a time
  4. Returns generated outputs when requests complete
  5. Checks if there are any unfinished requests remaining
  6. Supports aborting specific requests by ID

The manager should allow fine-grained control over execution, processing work in discrete steps.
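One way to approach this part is a thin wrapper over vLLM's synchronous engine. The sketch below is a starting point, not a reference solution: it assumes vLLM's `LLMEngine`/`EngineArgs`/`SamplingParams` engine API (`add_request`, `step`, `has_unfinished_requests`, `abort_request`), whose exact signatures vary across vLLM versions. The vLLM imports are deferred into method bodies so the class can be inspected without vLLM installed.

```python
class StepControlledManager:
    """Step-controlled wrapper over vLLM's synchronous engine (API assumed)."""

    def __init__(self, model_name: str):
        # Deferred import: lets this sketch be read/loaded without vLLM installed.
        from vllm import EngineArgs, LLMEngine
        self._engine = LLMEngine.from_engine_args(EngineArgs(model=model_name))
        self._completed = {}  # request_id -> finished RequestOutput

    def add_request(self, request_id: str, prompt: str, max_tokens: int = 16):
        from vllm import SamplingParams
        params = SamplingParams(max_tokens=max_tokens)
        self._engine.add_request(request_id, prompt, params)

    def process_step(self):
        # One discrete engine step: schedules work and may finish some requests.
        for output in self._engine.step():
            if output.finished:
                self._completed[output.request_id] = output

    def has_unfinished(self) -> bool:
        return self._engine.has_unfinished_requests()

    def abort_request(self, request_id: str):
        # Drop the request from the engine and from any completed results.
        self._engine.abort_request(request_id)
        self._completed.pop(request_id, None)

    def get_completed_outputs(self):
        return dict(self._completed)
```

Keeping a `request_id -> output` dict makes the abort case simple: an aborted request never reaches `self._completed`, so it cannot appear in the final outputs.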

Part 2: Async Streaming Request Handler

Implement an async generator function async_stream_responses that:

  1. Initializes an async-capable language model instance
  2. Accepts a list of text prompts
  3. Streams generation results asynchronously as they become available
  4. Yields partial outputs during generation (streaming behavior)
  5. Supports concurrent processing of multiple prompts

The function should leverage async/await patterns and yield results incrementally.
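A possible shape for this part fans each prompt out to its own task and merges partial results through a queue. This is a hedged sketch: it assumes vLLM's `AsyncLLMEngine.from_engine_args` and that `engine.generate(prompt, sampling_params, request_id)` yields partial `RequestOutput` objects as tokens arrive; check the signature against your installed vLLM version. Imports of vLLM are deferred into the function body so the sketch stays loadable without vLLM.

```python
import asyncio

async def async_stream_responses(model_name: str, prompts, max_tokens: int = 16):
    # Deferred import: the async generator can be defined without vLLM installed.
    from vllm import AsyncEngineArgs, AsyncLLMEngine, SamplingParams

    engine = AsyncLLMEngine.from_engine_args(AsyncEngineArgs(model=model_name))
    params = SamplingParams(max_tokens=max_tokens)
    queue: asyncio.Queue = asyncio.Queue()

    async def stream_one(i: int, prompt: str):
        # engine.generate is assumed to yield partial outputs incrementally.
        async for partial in engine.generate(prompt, params, request_id=f"req-{i}"):
            await queue.put(partial)
        await queue.put(None)  # sentinel: this prompt finished

    # One task per prompt gives concurrent generation across prompts.
    tasks = [asyncio.create_task(stream_one(i, p)) for i, p in enumerate(prompts)]
    remaining = len(tasks)
    while remaining:
        item = await queue.get()
        if item is None:
            remaining -= 1
        else:
            yield item  # stream partial results as soon as any prompt produces one
    await asyncio.gather(*tasks)  # surface any task exceptions
```

The queue-plus-sentinel pattern interleaves partial outputs from all prompts in arrival order, which satisfies both the streaming and the concurrency requirements without blocking on any single request.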

Test Cases { .test-cases }

Test 1: Step-controlled execution { .test-case @test }

Input:

```python
# test_request_manager.py
manager = StepControlledManager(model_name="facebook/opt-125m")
manager.add_request("req1", "Hello, my name is", max_tokens=10)
manager.add_request("req2", "The capital of France is", max_tokens=8)

while manager.has_unfinished():
    manager.process_step()

outputs = manager.get_completed_outputs()
```

Expected Behavior:

  • Both requests should complete successfully
  • outputs should contain results for both "req1" and "req2"
  • Each output should have generated text with the specified token limits

Test 2: Aborting a request { .test-case @test }

Input:

```python
# test_request_manager.py
manager = StepControlledManager(model_name="facebook/opt-125m")
manager.add_request("req1", "Once upon a time", max_tokens=20)
manager.add_request("req2", "The quick brown fox", max_tokens=20)

manager.process_step()
manager.abort_request("req1")

while manager.has_unfinished():
    manager.process_step()

outputs = manager.get_completed_outputs()
```

Expected Behavior:

  • Request "req1" should be aborted and should not appear in the final outputs
  • Request "req2" should complete normally
  • outputs should contain only "req2" result

Test 3: Async streaming { .test-case @test }

Input:

```python
# test_async_handler.py
import asyncio

prompts = [
    "Artificial intelligence is",
    "The future of computing",
    "Machine learning enables"
]

async def test_streaming():
    results = []
    async for partial_result in async_stream_responses(
        model_name="facebook/opt-125m",
        prompts=prompts,
        max_tokens=15
    ):
        results.append(partial_result)

    return results

outputs = asyncio.run(test_streaming())
```

Expected Behavior:

  • Function should yield intermediate results during generation (streaming)
  • Final outputs should contain completions for all 3 prompts
  • Each completion should respect the max_tokens limit
  • Execution should be non-blocking using async/await

Constraints

  • Use appropriate model initialization parameters (e.g., trust_remote_code if needed)
  • Handle GPU memory efficiently
  • Implement proper error handling for invalid request IDs
  • The step-controlled manager should maintain request state across multiple steps

Dependencies { .dependencies }

vllm { .dependency }

Provides high-performance LLM inference capabilities.

Install with Tessl CLI:

```shell
npx tessl i tessl/pypi-vllm
```
