
tessl/pypi-vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Overall score: 69% (Evaluation: 69%)

1.33x agent success when using this tile


Task: Build a Custom Request Manager with Step Control and Async Streaming

Overview

Build a request management system that can handle text generation requests using two different execution modes: a step-controlled mode and an async streaming mode.

Requirements

Part 1: Step-Controlled Request Manager

Implement a class StepControlledManager that:

  1. Initializes with a pre-trained language model suitable for text generation
  2. Provides a method to add text generation requests with unique request IDs
  3. Implements a method to process one step of computation at a time
  4. Returns generated outputs when requests complete
  5. Checks if there are any unfinished requests remaining
  6. Supports aborting specific requests by ID

The manager should allow fine-grained control over execution, processing work in discrete steps.
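One way to approach this part is a thin wrapper over vLLM's synchronous engine. The sketch below is a starting point, not a reference solution: it assumes vLLM's `LLMEngine`/`EngineArgs`/`SamplingParams` engine API (`add_request`, `step`, `has_unfinished_requests`, `abort_request`), whose exact signatures vary across vLLM versions. The vLLM imports are deferred into method bodies so the class can be inspected without vLLM installed.

```python
class StepControlledManager:
    """Step-controlled wrapper over vLLM's synchronous engine (API assumed)."""

    def __init__(self, model_name: str):
        # Deferred import: lets this sketch be read/loaded without vLLM installed.
        from vllm import EngineArgs, LLMEngine
        self._engine = LLMEngine.from_engine_args(EngineArgs(model=model_name))
        self._completed = {}  # request_id -> finished RequestOutput

    def add_request(self, request_id: str, prompt: str, max_tokens: int = 16):
        from vllm import SamplingParams
        params = SamplingParams(max_tokens=max_tokens)
        self._engine.add_request(request_id, prompt, params)

    def process_step(self):
        # One discrete engine step: schedules work and may finish some requests.
        for output in self._engine.step():
            if output.finished:
                self._completed[output.request_id] = output

    def has_unfinished(self) -> bool:
        return self._engine.has_unfinished_requests()

    def abort_request(self, request_id: str):
        # Drop the request from the engine and from any completed results.
        self._engine.abort_request(request_id)
        self._completed.pop(request_id, None)

    def get_completed_outputs(self):
        return dict(self._completed)
```

Keeping a `request_id -> output` dict makes the abort case simple: an aborted request never reaches `self._completed`, so it cannot appear in the final outputs.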

Part 2: Async Streaming Request Handler

Implement an async generator function async_stream_responses that:

  1. Initializes an async-capable language model instance
  2. Accepts a list of text prompts
  3. Streams generation results asynchronously as they become available
  4. Yields partial outputs during generation (streaming behavior)
  5. Supports concurrent processing of multiple prompts

The function should leverage async/await patterns and yield results incrementally.
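A possible shape for this part fans each prompt out to its own task and merges partial results through a queue. This is a hedged sketch: it assumes vLLM's `AsyncLLMEngine.from_engine_args` and that `engine.generate(prompt, sampling_params, request_id)` yields partial `RequestOutput` objects as tokens arrive; check the signature against your installed vLLM version. Imports of vLLM are deferred into the function body so the sketch stays loadable without vLLM.

```python
import asyncio

async def async_stream_responses(model_name: str, prompts, max_tokens: int = 16):
    # Deferred import: the async generator can be defined without vLLM installed.
    from vllm import AsyncEngineArgs, AsyncLLMEngine, SamplingParams

    engine = AsyncLLMEngine.from_engine_args(AsyncEngineArgs(model=model_name))
    params = SamplingParams(max_tokens=max_tokens)
    queue: asyncio.Queue = asyncio.Queue()

    async def stream_one(i: int, prompt: str):
        # engine.generate is assumed to yield partial outputs incrementally.
        async for partial in engine.generate(prompt, params, request_id=f"req-{i}"):
            await queue.put(partial)
        await queue.put(None)  # sentinel: this prompt finished

    # One task per prompt gives concurrent generation across prompts.
    tasks = [asyncio.create_task(stream_one(i, p)) for i, p in enumerate(prompts)]
    remaining = len(tasks)
    while remaining:
        item = await queue.get()
        if item is None:
            remaining -= 1
        else:
            yield item  # stream partial results as soon as any prompt produces one
    await asyncio.gather(*tasks)  # surface any task exceptions
```

The queue-plus-sentinel pattern interleaves partial outputs from all prompts in arrival order, which satisfies both the streaming and the concurrency requirements without blocking on any single request.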

Test Cases { .test-cases }

Test 1: Step-controlled execution { .test-case @test }

Input:

```python
# test_request_manager.py
manager = StepControlledManager(model_name="facebook/opt-125m")
manager.add_request("req1", "Hello, my name is", max_tokens=10)
manager.add_request("req2", "The capital of France is", max_tokens=8)

while manager.has_unfinished():
    manager.process_step()

outputs = manager.get_completed_outputs()
```

Expected Behavior:

  • Both requests should complete successfully
  • outputs should contain results for both "req1" and "req2"
  • Each output should have generated text with the specified token limits

Test 2: Aborting a request { .test-case @test }

Input:

```python
# test_request_manager.py
manager = StepControlledManager(model_name="facebook/opt-125m")
manager.add_request("req1", "Once upon a time", max_tokens=20)
manager.add_request("req2", "The quick brown fox", max_tokens=20)

manager.process_step()
manager.abort_request("req1")

while manager.has_unfinished():
    manager.process_step()

outputs = manager.get_completed_outputs()
```

Expected Behavior:

  • Request "req1" should be aborted and should not appear in the final outputs
  • Request "req2" should complete normally
  • outputs should contain only "req2" result

Test 3: Async streaming { .test-case @test }

Input:

```python
# test_async_handler.py
import asyncio

prompts = [
    "Artificial intelligence is",
    "The future of computing",
    "Machine learning enables"
]

async def test_streaming():
    results = []
    async for partial_result in async_stream_responses(
        model_name="facebook/opt-125m",
        prompts=prompts,
        max_tokens=15
    ):
        results.append(partial_result)

    return results

outputs = asyncio.run(test_streaming())
```

Expected Behavior:

  • Function should yield intermediate results during generation (streaming)
  • Final outputs should contain completions for all 3 prompts
  • Each completion should respect the max_tokens limit
  • Execution should be non-blocking using async/await

Constraints

  • Use appropriate model initialization parameters (e.g., trust_remote_code if needed)
  • Handle GPU memory efficiently
  • Implement proper error handling for invalid request IDs
  • The step-controlled manager should maintain request state across multiple steps

Dependencies { .dependencies }

vllm { .dependency }

Provides high-performance LLM inference capabilities.

Install with Tessl CLI:

```shell
npx tessl i tessl/pypi-vllm
```
