tessl install tessl/pypi-vllm@0.10.0
A high-throughput and memory-efficient inference and serving engine for LLMs
Agent Success: 69% (agent success rate when using this tile)
Improvement: 1.33x (agent success rate improvement when using this tile compared to baseline)
Baseline: 52% (agent success rate without this tile)
{
"context": "This evaluation assesses how well an engineer uses vLLM's multi-modal capabilities to implement an image description service. The focus is on proper initialization of vision-language models, correct formatting of multi-modal prompts, and appropriate use of vLLM's inference APIs for processing images with text.",
"type": "weighted_checklist",
"checklist": [
{
"name": "LLM initialization",
"description": "Correctly initializes an LLM instance with a vision-language model name (e.g., using vllm.LLM class with a model parameter set to a vision-language model like 'llava-hf/llava-1.5-7b-hf' or similar)",
"max_score": 20
},
{
"name": "Multi-modal prompt format",
"description": "Uses the correct multi-modal prompt format with both 'prompt' and 'multi_modal_data' keys in a dictionary structure (e.g., {'prompt': text, 'multi_modal_data': {'image': image_data}})",
"max_score": 25
},
{
"name": "Image loading",
"description": "Properly loads image data from file paths for use with vLLM (e.g., using PIL/Pillow to load images, or vLLM's MediaIO abstraction)",
"max_score": 15
},
{
"name": "Single image processing",
"description": "Correctly uses LLM.generate() or LLM.chat() method to process single image inputs with text prompts and returns the generated text description",
"max_score": 20
},
{
"name": "Multiple image handling",
"description": "Correctly formats and processes multiple images in a single request using vLLM's multi-modal capabilities (e.g., passing multiple images in multi_modal_data)",
"max_score": 15
},
{
"name": "Error handling",
"description": "Implements appropriate error handling for invalid image paths (e.g., checking file existence before processing, raising FileNotFoundError as specified in the API)",
"max_score": 5
}
]
}
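The checklist items above can be sketched roughly as follows. This is a minimal illustration, not a reference solution: the helper names `check_image_path`, `build_request`, and `describe_images` are hypothetical, the `USER: <image>\n...\nASSISTANT:` template is an assumption based on the LLaVA-1.5 convention (other vision-language models expect different templates), and the `llm` argument is assumed to be an already constructed `vllm.LLM` instance.

```python
from pathlib import Path
from typing import Any


def check_image_path(path: str) -> Path:
    """Raise FileNotFoundError early if the image file does not exist."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"Image not found: {path}")
    return p


def build_request(prompt: str, images: list[Any]) -> dict[str, Any]:
    """Build the dict with the 'prompt' and 'multi_modal_data' keys from the checklist.

    A single image is passed as-is; several images are passed as a list.
    """
    if not images:
        raise ValueError("at least one image is required")
    image_data = images[0] if len(images) == 1 else list(images)
    return {"prompt": prompt, "multi_modal_data": {"image": image_data}}


def describe_images(llm: Any, image_paths: list[str], question: str) -> str:
    """Load the images with Pillow and return the first generated text.

    `llm` is assumed to be a vllm.LLM instance created by the caller.
    """
    # Imported lazily so the helpers above work without Pillow installed.
    from PIL import Image

    # Validate every path before opening any file.
    checked = [check_image_path(p) for p in image_paths]
    images = [Image.open(p).convert("RGB") for p in checked]

    # Assumption: LLaVA-1.5 style template, one <image> placeholder per image.
    placeholders = "\n".join("<image>" for _ in images)
    prompt = f"USER: {placeholders}\n{question}\nASSISTANT:"

    outputs = llm.generate(build_request(prompt, images))
    return outputs[0].outputs[0].text
```

A caller would construct the engine once, for example `llm = LLM(model="llava-hf/llava-1.5-7b-hf", limit_mm_per_prompt={"image": 2})`, and then call `describe_images(llm, ["a.jpg"], "Describe this image.")`. The `limit_mm_per_prompt` setting is only needed when more than one image is sent per request, and whether a given model produces useful output for several images depends on how it was trained.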