tessl install tessl/pypi-vllm@0.10.0
A high-throughput and memory-efficient inference and serving engine for LLMs
Agent Success: 69% (agent success rate when using this tile)
Improvement: 1.33x (agent success rate improvement when using this tile compared to baseline)
Baseline: 52% (agent success rate without this tile)
{
"context": "This evaluation assesses how well the engineer uses vLLM's synchronous LLMEngine and asynchronous AsyncLLMEngine for fine-grained request management and streaming inference. The focus is on proper usage of engine-level APIs rather than high-level LLM class methods.",
"type": "weighted_checklist",
"checklist": [
{
"name": "LLMEngine Initialization",
"description": "Uses LLMEngine class (or EngineArgs with LLMEngine.from_engine_args()) to initialize the synchronous engine with appropriate model configuration",
"max_score": 15
},
{
"name": "add_request() Usage",
"description": "Correctly uses LLMEngine.add_request() method to add text generation requests with request IDs and sampling parameters",
"max_score": 15
},
{
"name": "step() Execution",
"description": "Implements step-by-step execution using LLMEngine.step() method in a loop to process requests incrementally",
"max_score": 15
},
{
"name": "has_unfinished_requests() Check",
"description": "Uses LLMEngine.has_unfinished_requests() method to determine when all requests are complete",
"max_score": 10
},
{
"name": "abort_request() Implementation",
"description": "Correctly calls LLMEngine.abort_request() to cancel a specific request by its ID",
"max_score": 10
},
{
"name": "AsyncLLMEngine Initialization",
"description": "Uses AsyncLLMEngine class (or AsyncEngineArgs with AsyncLLMEngine.from_engine_args()) to initialize the asynchronous engine",
"max_score": 10
},
{
"name": "Async Generator Pattern",
"description": "Implements async generator function using 'async def' and 'yield' to stream results incrementally from AsyncLLMEngine.generate()",
"max_score": 15
},
{
"name": "AsyncLLMEngine.generate() Streaming",
"description": "Uses AsyncLLMEngine.generate() method with async iteration (async for) to process streaming outputs from the async engine",
"max_score": 10
}
]
}
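The checklist above can be illustrated with a minimal sketch of both engine styles. This is only one possible shape, not the reference solution for the evaluation. The helper names (`build_sync_engine`, `run_requests`, `stream_text`) and the model name `facebook/opt-125m` are assumptions made for illustration. The vLLM import is deferred into `build_sync_engine` so the rest of the block can be read and exercised without a GPU or an installed model.

```python
def build_sync_engine(model="facebook/opt-125m"):
    """Create a synchronous engine. The model name is an illustrative assumption."""
    # Imported lazily so the helpers below do not require vLLM at import time.
    from vllm import EngineArgs, LLMEngine

    return LLMEngine.from_engine_args(EngineArgs(model=model))


def run_requests(engine, prompts, sampling_params, abort_ids=()):
    """Queue every prompt, cancel the ids in abort_ids, then step until drained."""
    for index, prompt in enumerate(prompts):
        # Request ids are plain strings chosen by the caller.
        engine.add_request(str(index), prompt, sampling_params)

    for request_id in abort_ids:
        engine.abort_request(request_id)

    finished = {}
    while engine.has_unfinished_requests():
        # Each step() call advances the scheduled requests by one iteration
        # and returns the outputs produced during that iteration.
        for output in engine.step():
            if output.finished:
                finished[output.request_id] = output.outputs[0].text
    return finished


async def stream_text(async_engine, prompt, sampling_params, request_id):
    """Async generator: yield the text carried by each partial output."""
    async for output in async_engine.generate(prompt, sampling_params, request_id):
        yield output.outputs[0].text
```

With vLLM installed, the synchronous path would presumably be driven as `run_requests(build_sync_engine(), ["Hello"], SamplingParams(max_tokens=16))`. The asynchronous path would pass an engine built with `AsyncLLMEngine.from_engine_args(AsyncEngineArgs(model=...))` to `stream_text` and consume it with `async for`. Whether each yielded text is cumulative or a delta depends on the sampling parameters, so callers should not assume one or the other.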