tessl install tessl/pypi-vllm@0.10.0

A high-throughput and memory-efficient inference and serving engine for LLMs
Agent Success: 69% (agent success rate when using this tile)
Improvement: 1.33x (agent success rate improvement when using this tile compared to baseline)
Baseline: 52% (agent success rate without this tile)
{
  "context": "This rubric evaluates how well the engineer uses vLLM's chat-based generation API to implement a multi-turn conversational system. The focus is on proper usage of the LLM class, the chat() method, message formatting, and sampling parameter configuration.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "LLM Initialization",
      "description": "Correctly initializes the vLLM LLM class with an appropriate chat/instruction-tuned model (e.g., using the model parameter).",
      "max_score": 15
    },
    {
      "name": "Chat Method Usage",
      "description": "Uses the LLM.chat() method (not generate()) to process conversational input, which is the appropriate method for chat-based interactions.",
      "max_score": 25
    },
    {
      "name": "Message Format",
      "description": "Correctly formats messages as a list of dictionaries with 'role' and 'content' keys, supporting system, user, and assistant roles.",
      "max_score": 20
    },
    {
      "name": "SamplingParams Configuration",
      "description": "Creates and uses a SamplingParams object to configure generation parameters (max_tokens, temperature), passing it to the chat() method.",
      "max_score": 20
    },
    {
      "name": "Response Extraction",
      "description": "Correctly extracts the generated text from the RequestOutput object returned by chat(), accessing the outputs attribute and text content.",
      "max_score": 15
    },
    {
      "name": "Multi-Turn Handling",
      "description": "Properly handles multi-turn conversations by passing the entire message history to chat(), allowing the model to maintain context.",
      "max_score": 5
    }
  ]
}
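
For reference, here is a minimal sketch of the usage pattern the checklist rewards. The model name is an arbitrary illustrative choice (any chat/instruction-tuned model would do), and the exact SamplingParams values are placeholders, not values the rubric prescribes.

from vllm import LLM, SamplingParams

# LLM Initialization: load a chat/instruction-tuned model.
# (Model name chosen for illustration only.)
llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct")

# SamplingParams Configuration: control randomness and generation length.
params = SamplingParams(temperature=0.7, max_tokens=256)

# Message Format: a list of role/content dicts, optionally starting
# with a system prompt.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize what vLLM does in one sentence."},
]

# Chat Method Usage: chat() (not generate()) applies the model's
# chat template to the message list.
outputs = llm.chat(messages, sampling_params=params)

# Response Extraction: each RequestOutput holds its completions in
# .outputs; the generated string is the .text of the first completion.
reply = outputs[0].outputs[0].text
print(reply)

# Multi-Turn Handling: append the assistant reply plus the next user
# turn, then pass the full history back to chat() so context is kept.
messages.append({"role": "assistant", "content": reply})
messages.append({"role": "user", "content": "Name one key technique it uses."})
outputs = llm.chat(messages, sampling_params=params)
print(outputs[0].outputs[0].text)

Each commented step maps onto one checklist item, so a submission following this shape would touch all six scored criteria.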