A high-throughput and memory-efficient inference and serving engine for LLMs
Evaluation — 69%
↑ 1.33x agent success when using this tile
Build a request management system that can handle text generation requests using two different execution modes: a step-controlled mode and an async streaming mode.
Implement a class StepControlledManager that allows fine-grained control over execution, processing work in discrete steps.
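A minimal sketch of the step-controlled pattern described above. The class and method names come from the task description; token generation is stubbed out with placeholder strings, since a real implementation would wrap an inference engine (for example, driving vLLM's `LLMEngine.step()` loop):

```python
# Sketch only: a stub "model" that emits one placeholder token per step.
# A real implementation would delegate add_request/step/abort to an
# inference engine rather than generating tokens itself.

class StepControlledManager:
    def __init__(self, model_name):
        self.model_name = model_name
        self._active = {}    # request_id -> [prompt, tokens_so_far, max_tokens]
        self._finished = {}  # request_id -> generated text

    def add_request(self, request_id, prompt, max_tokens):
        self._active[request_id] = [prompt, [], max_tokens]

    def abort_request(self, request_id):
        # Aborted requests are dropped and never reach the finished set.
        self._active.pop(request_id, None)

    def has_unfinished(self):
        return bool(self._active)

    def process_step(self):
        # One decode step: append one token to every active request,
        # retiring requests that reach their token budget.
        done = []
        for request_id, state in self._active.items():
            prompt, tokens, max_tokens = state
            tokens.append(f"tok{len(tokens)}")  # stand-in for a sampled token
            if len(tokens) >= max_tokens:
                done.append(request_id)
        for request_id in done:
            prompt, tokens, _ = self._active.pop(request_id)
            self._finished[request_id] = " ".join(tokens)

    def get_completed_outputs(self):
        return dict(self._finished)
```

The key design point is that the caller owns the loop: nothing advances until `process_step()` is invoked, which is what makes abort-between-steps (scenario 2 below) possible.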
Implement an async generator function async_stream_responses that leverages async/await patterns and yields results incrementally.
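A minimal sketch of the async streaming pattern, again using the names from the task description with generation stubbed out. A real implementation would await an async engine (such as vLLM's AsyncLLMEngine) and yield its partial outputs as they arrive:

```python
import asyncio

# Sketch only: prompts are interleaved round-robin, yielding one partial
# result per generated placeholder token so callers see output
# incrementally rather than waiting for completion.

async def async_stream_responses(model_name, prompts, max_tokens):
    partial = {prompt: [] for prompt in prompts}
    for step in range(max_tokens):
        for prompt in prompts:
            await asyncio.sleep(0)  # yield control, as a real engine await would
            partial[prompt].append(f"tok{step}")
            yield {"prompt": prompt, "text": " ".join(partial[prompt])}
```

Because this is an async generator, consumers iterate it with `async for`, and each yielded dict reflects the cumulative text for one prompt at that point in the stream.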
Input:

```python
# test_request_manager.py
manager = StepControlledManager(model_name="facebook/opt-125m")
manager.add_request("req1", "Hello, my name is", max_tokens=10)
manager.add_request("req2", "The capital of France is", max_tokens=8)

while manager.has_unfinished():
    manager.process_step()

outputs = manager.get_completed_outputs()
```

Expected Behavior:

outputs should contain results for both "req1" and "req2"

Input:
```python
# test_request_manager.py
manager = StepControlledManager(model_name="facebook/opt-125m")
manager.add_request("req1", "Once upon a time", max_tokens=20)
manager.add_request("req2", "The quick brown fox", max_tokens=20)
manager.process_step()
manager.abort_request("req1")

while manager.has_unfinished():
    manager.process_step()

outputs = manager.get_completed_outputs()
```

Expected Behavior:

outputs should contain only "req2" result

Input:
```python
# test_async_handler.py
import asyncio

prompts = [
    "Artificial intelligence is",
    "The future of computing",
    "Machine learning enables"
]

async def test_streaming():
    results = []
    async for partial_result in async_stream_responses(
        model_name="facebook/opt-125m",
        prompts=prompts,
        max_tokens=15
    ):
        results.append(partial_result)
    return results

outputs = asyncio.run(test_streaming())
```

Expected Behavior:

outputs should contain incremental partial results for all three prompts
Provides high-performance LLM inference capabilities.
Install with Tessl CLI
npx tessl i tessl/pypi-vllm