tessl install tessl/pypi-vllm@0.10.0
A high-throughput and memory-efficient inference and serving engine for LLMs
Agent Success: 69% (agent success rate when using this tile)
Improvement: 1.33x (agent success rate improvement when using this tile compared to baseline)
Baseline: 52% (agent success rate without this tile)
Build a request management system that can handle text generation requests using two different execution modes: a step-controlled mode and an async streaming mode.
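Implement a class StepControlledManager that exposes the methods exercised by the test cases: add_request, process_step, has_unfinished, abort_request, and get_completed_outputs.
The manager should allow fine-grained control over execution, processing work in discrete steps.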
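One possible shape for the step-controlled mode is sketched below. It is an illustration under stated assumptions, not the reference solution: the manager wraps an engine object whose methods (add_request, step, has_unfinished_requests, abort_request) are assumed to mirror vLLM's LLMEngine, and the dict return type of get_completed_outputs is a guess at what the tests expect. FakeEngine and the make_params argument are hypothetical helpers so the sketch runs without a model or GPU; the vLLM branch is only taken when no engine is injected.

```python
from types import SimpleNamespace


class FakeEngine:
    """Hypothetical stand-in: each request finishes after max_tokens steps."""

    def __init__(self):
        self._left = {}  # request_id -> steps remaining

    def add_request(self, request_id, prompt, params):
        self._left[request_id] = params.max_tokens

    def abort_request(self, request_ids):
        for rid in request_ids:
            self._left.pop(rid, None)

    def has_unfinished_requests(self):
        return bool(self._left)

    def step(self):
        outputs = []
        for rid in list(self._left):
            self._left[rid] -= 1
            finished = self._left[rid] == 0
            if finished:
                del self._left[rid]
            outputs.append(SimpleNamespace(request_id=rid, finished=finished))
        return outputs


class StepControlledManager:
    def __init__(self, model_name, engine=None, make_params=None):
        if engine is None:
            # Assumed vLLM path; requires vllm installed and the model available.
            from vllm import EngineArgs, LLMEngine
            engine = LLMEngine.from_engine_args(EngineArgs(model=model_name))
        if make_params is None:
            from vllm import SamplingParams

            def make_params(max_tokens):
                return SamplingParams(max_tokens=max_tokens)
        self._engine = engine
        self._make_params = make_params
        self._completed = {}  # request_id -> finished output

    def add_request(self, request_id, prompt, max_tokens):
        self._engine.add_request(request_id, prompt, self._make_params(max_tokens))

    def process_step(self):
        # Run one engine iteration and keep only outputs flagged as finished.
        for out in self._engine.step():
            if out.finished:
                self._completed[out.request_id] = out

    def abort_request(self, request_id):
        # A list is passed on the assumption that the engine accepts an iterable of ids.
        self._engine.abort_request([request_id])
        self._completed.pop(request_id, None)

    def has_unfinished(self):
        return self._engine.has_unfinished_requests()

    def get_completed_outputs(self):
        return dict(self._completed)


def demo():
    manager = StepControlledManager(
        "facebook/opt-125m",
        engine=FakeEngine(),
        make_params=lambda max_tokens: SimpleNamespace(max_tokens=max_tokens),
    )
    manager.add_request("req1", "Once upon a time", max_tokens=20)
    manager.add_request("req2", "The quick brown fox", max_tokens=20)
    manager.process_step()
    manager.abort_request("req1")
    while manager.has_unfinished():
        manager.process_step()
    return manager.get_completed_outputs()
```

Injecting the engine keeps the stepping logic testable on its own; calling demo() walks through the abort scenario with the stand-in engine.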
Implement an async generator function async_stream_responses that accepts model_name, prompts, and max_tokens, as in the test case.
The function should leverage async/await patterns and yield results incrementally.
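The async streaming mode could be sketched as follows, purely as an illustration. Each prompt gets its own task that forwards partial results into a shared asyncio.Queue, and the generator yields them as they arrive. The generate parameter and fake_generate are hypothetical: in a real implementation the per-request stream would presumably come from vLLM's AsyncLLMEngine.generate, which is not exercised here. The dict shape of each partial result is also an assumption.

```python
import asyncio


async def fake_generate(prompt, max_tokens, request_id):
    """Hypothetical stand-in for an engine's per-request async stream."""
    text = prompt
    for i in range(max_tokens):
        await asyncio.sleep(0)  # hand control back to the event loop
        text += f" tok{i}"
        yield {
            "request_id": request_id,
            "text": text,
            "finished": i == max_tokens - 1,
        }


async def async_stream_responses(model_name, prompts, max_tokens,
                                 generate=fake_generate):
    # model_name would select the model in a real engine; the stand-in ignores it.
    queue = asyncio.Queue()
    done = object()  # sentinel marking the end of one prompt's stream

    async def pump(index, prompt):
        try:
            async for partial in generate(prompt, max_tokens, f"req{index}"):
                await queue.put(partial)
        finally:
            queue.put_nowait(done)

    tasks = [asyncio.create_task(pump(i, p)) for i, p in enumerate(prompts)]
    remaining = len(tasks)
    try:
        while remaining:
            item = await queue.get()
            if item is done:
                remaining -= 1
            else:
                yield item
        # Surface any exception raised inside a pump task.
        await asyncio.gather(*tasks)
    finally:
        # Stop any streams still running if the consumer exits early.
        for task in tasks:
            task.cancel()
```

The queue interleaves results from all prompts, so a slow prompt does not hold back partial results from the others.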
Input:
# test_request_manager.py
manager = StepControlledManager(model_name="facebook/opt-125m")
manager.add_request("req1", "Hello, my name is", max_tokens=10)
manager.add_request("req2", "The capital of France is", max_tokens=8)
while manager.has_unfinished():
    manager.process_step()
outputs = manager.get_completed_outputs()
Expected Behavior:
outputs should contain results for both "req1" and "req2"
Input:
# test_request_manager.py
manager = StepControlledManager(model_name="facebook/opt-125m")
manager.add_request("req1", "Once upon a time", max_tokens=20)
manager.add_request("req2", "The quick brown fox", max_tokens=20)
manager.process_step()
manager.abort_request("req1")
while manager.has_unfinished():
    manager.process_step()
outputs = manager.get_completed_outputs()
Expected Behavior:
outputs should contain only the "req2" result
Input:
# test_async_handler.py
import asyncio
prompts = [
    "Artificial intelligence is",
    "The future of computing",
    "Machine learning enables"
]
async def test_streaming():
    results = []
    async for partial_result in async_stream_responses(
        model_name="facebook/opt-125m",
        prompts=prompts,
        max_tokens=15
    ):
        results.append(partial_result)
    return results
outputs = asyncio.run(test_streaming())
Expected Behavior:
vLLM provides high-performance LLM inference capabilities.