
tessl/pypi-vllm

tessl install tessl/pypi-vllm@0.10.0

A high-throughput and memory-efficient inference and serving engine for LLMs

Agent Success: 69% (agent success rate when using this tile)

Improvement: 1.33x (agent success rate improvement when using this tile compared to baseline)

Baseline: 52% (agent success rate without this tile)

evals/scenario-1/task.md

Task: Build a Custom Request Manager with Step Control and Async Streaming

Overview

Build a request management system that can handle text generation requests using two different execution modes: a step-controlled mode and an async streaming mode.

Requirements

Part 1: Step-Controlled Request Manager

Implement a class StepControlledManager that:

  1. Initializes with a pre-trained language model suitable for text generation
  2. Provides a method to add text generation requests with unique request IDs
  3. Implements a method to process one step of computation at a time
  4. Returns generated outputs when requests complete
  5. Checks if there are any unfinished requests remaining
  6. Supports aborting specific requests by ID

The manager should allow fine-grained control over execution, processing work in discrete steps.
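A minimal sketch of one possible implementation, assuming vLLM's synchronous engine API (LLMEngine.from_engine_args, add_request, step, has_unfinished_requests, abort_request). The StepControlledManager class and its method names come from this task spec, not from vLLM itself.

```python
from vllm import EngineArgs, LLMEngine, SamplingParams


class StepControlledManager:
    """Thin wrapper over vLLM's LLMEngine exposing step-level control."""

    def __init__(self, model_name: str):
        self.engine = LLMEngine.from_engine_args(EngineArgs(model=model_name))
        self.completed: dict[str, object] = {}

    def add_request(self, request_id: str, prompt: str, max_tokens: int = 16) -> None:
        params = SamplingParams(max_tokens=max_tokens)
        self.engine.add_request(request_id, prompt, params)

    def process_step(self) -> None:
        # step() runs one scheduling/decoding iteration and returns
        # RequestOutput objects for the requests it touched.
        for output in self.engine.step():
            if output.finished:
                self.completed[output.request_id] = output

    def has_unfinished(self) -> bool:
        return self.engine.has_unfinished_requests()

    def abort_request(self, request_id: str) -> None:
        self.engine.abort_request([request_id])

    def get_completed_outputs(self) -> dict:
        return self.completed
```

The point of this design is that each process_step() call advances generation by exactly one engine iteration, so the caller decides when work happens and can abort requests between steps.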

Part 2: Async Streaming Request Handler

Implement an async generator function async_stream_responses that:

  1. Initializes an async-capable language model instance
  2. Accepts a list of text prompts
  3. Streams generation results asynchronously as they become available
  4. Yields partial outputs during generation (streaming behavior)
  5. Supports concurrent processing of multiple prompts

The function should leverage async/await patterns and yield results incrementally.
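A possible sketch of the async handler, assuming vLLM's AsyncLLMEngine (AsyncEngineArgs, from_engine_args, and the generate async generator). The per-prompt request IDs and the asyncio.Queue fan-in used to interleave the streams are illustrative choices, not part of vLLM's API.

```python
import asyncio

from vllm import AsyncEngineArgs, AsyncLLMEngine, SamplingParams


async def async_stream_responses(model_name: str, prompts: list[str], max_tokens: int = 16):
    """Yield partial RequestOutput objects from all prompts as they arrive."""
    engine = AsyncLLMEngine.from_engine_args(AsyncEngineArgs(model=model_name))
    params = SamplingParams(max_tokens=max_tokens)
    queue: asyncio.Queue = asyncio.Queue()

    async def stream_one(request_id: str, prompt: str) -> None:
        # generate() is an async generator that yields incremental outputs.
        async for output in engine.generate(prompt, params, request_id):
            await queue.put(output)
        await queue.put(None)  # sentinel: this prompt has finished

    tasks = [
        asyncio.create_task(stream_one(f"req-{i}", prompt))
        for i, prompt in enumerate(prompts)
    ]
    finished = 0
    while finished < len(prompts):
        item = await queue.get()
        if item is None:
            finished += 1
        else:
            yield item  # partial result, streamed as soon as it is available
    await asyncio.gather(*tasks)
```

Because all prompts stream into one shared queue, partial outputs from different requests interleave, which is what gives the concurrent, non-blocking behavior the streaming test below checks for.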

Test Cases { .test-cases }

Test 1: Step-controlled execution { .test-case @test }

Input:

```python
# test_request_manager.py
manager = StepControlledManager(model_name="facebook/opt-125m")
manager.add_request("req1", "Hello, my name is", max_tokens=10)
manager.add_request("req2", "The capital of France is", max_tokens=8)

while manager.has_unfinished():
    manager.process_step()

outputs = manager.get_completed_outputs()
```

Expected Behavior:

  • Both requests should complete successfully
  • outputs should contain results for both "req1" and "req2"
  • Each output should have generated text with the specified token limits

Test 2: Request abortion { .test-case @test }

Input:

```python
# test_request_manager.py
manager = StepControlledManager(model_name="facebook/opt-125m")
manager.add_request("req1", "Once upon a time", max_tokens=20)
manager.add_request("req2", "The quick brown fox", max_tokens=20)

manager.process_step()
manager.abort_request("req1")

while manager.has_unfinished():
    manager.process_step()

outputs = manager.get_completed_outputs()
```

Expected Behavior:

  • Request "req1" should be aborted and not appear in final outputs
  • Request "req2" should complete normally
  • outputs should contain only "req2" result

Test 3: Async streaming { .test-case @test }

Input:

```python
# test_async_handler.py
import asyncio

prompts = [
    "Artificial intelligence is",
    "The future of computing",
    "Machine learning enables"
]

async def test_streaming():
    results = []
    async for partial_result in async_stream_responses(
        model_name="facebook/opt-125m",
        prompts=prompts,
        max_tokens=15
    ):
        results.append(partial_result)

    return results

outputs = asyncio.run(test_streaming())
```

Expected Behavior:

  • Function should yield intermediate results during generation (streaming)
  • Final outputs should contain completions for all 3 prompts
  • Each completion should respect the max_tokens limit
  • Execution should be non-blocking using async/await

Constraints

  • Use appropriate model initialization parameters (e.g., trust_remote_code if needed; see the configuration sketch after this list)
  • Handle GPU memory efficiently
  • Implement proper error handling for invalid request IDs
  • The step-controlled manager should maintain request state across multiple steps
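As a hedged illustration of the first two constraints: vLLM's EngineArgs accepts trust_remote_code and gpu_memory_utilization; the values below are examples only, not requirements of this task.

```python
from vllm import EngineArgs

# Illustrative engine configuration (example values only)
engine_args = EngineArgs(
    model="facebook/opt-125m",
    trust_remote_code=True,       # only needed for models that ship custom code
    gpu_memory_utilization=0.80,  # fraction of GPU memory vLLM is allowed to reserve
)
```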

Dependencies { .dependencies }

vllm { .dependency }

Provides high-performance LLM inference capabilities.

Workspace: tessl
Visibility: Public
Describes: pkg:pypi/vllm@0.10.x (PyPI)
tile.json