tessl install tessl/pypi-vllm@0.10.0
A high-throughput and memory-efficient inference and serving engine for LLMs
Agent Success: 69% (agent success rate when using this tile)
Improvement: 1.33x (agent success rate improvement when using this tile compared to baseline)
Baseline: 52% (agent success rate without this tile)
{
"context": "These criteria evaluate how effectively the engineer uses vLLM's LoRA (Low-Rank Adaptation) capabilities to implement a multi-adapter text generation service. The focus is on correct usage of vLLM's adapter configuration, initialization parameters, and request-time adapter specification.",
"type": "weighted_checklist",
"checklist": [
{
"name": "LLM Initialization",
"description": "Uses vLLM's LLM class to initialize the model with the enable_lora parameter set to True so that LoRA support is enabled",
"max_score": 15
},
{
"name": "Max LoRAs Configuration",
"description": "Configures the max_loras parameter during LLM initialization to control the maximum number of LoRA adapters that can be active concurrently in a single batch",
"max_score": 15
},
{
"name": "Max LoRA Rank",
"description": "Configures the max_lora_rank parameter during LLM initialization to set the maximum rank supported for LoRA adapters",
"max_score": 15
},
{
"name": "LoRA Request Object",
"description": "Uses vLLM's LoRARequest class to create adapter request objects with lora_name and lora_path parameters",
"max_score": 25
},
{
"name": "Adapter in Generate",
"description": "Passes the LoRARequest object to the generate() method using the lora_request parameter to apply the adapter during generation",
"max_score": 20
},
{
"name": "Base Model Generation",
"description": "Correctly handles base model generation by calling generate() without a lora_request parameter when no adapter is specified",
"max_score": 10
}
]
}
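
The checklist above can be illustrated with a minimal Python sketch. It is one possible arrangement under stated assumptions, not a reference solution: `engine_kwargs`, `MultiAdapterService`, and the `request_factory` hook are hypothetical names introduced here, and the adapter names and paths a caller supplies are placeholders. The only vLLM pieces relied on are the `LLM` initialization parameters `enable_lora`, `max_loras`, and `max_lora_rank`, the `LoRARequest` class (fields `lora_name`, `lora_int_id`, `lora_path`), and the `lora_request` argument of `generate()`.

```python
from typing import Any, Callable, Dict, List, Optional


def engine_kwargs(model: str, max_loras: int = 2, max_lora_rank: int = 16) -> Dict[str, Any]:
    """Keyword arguments for vllm.LLM covering the three initialization criteria.

    The default values are illustrative assumptions, not recommendations.
    """
    return {
        "model": model,
        "enable_lora": True,             # LLM Initialization
        "max_loras": max_loras,          # Max LoRAs Configuration
        "max_lora_rank": max_lora_rank,  # Max LoRA Rank
    }


def _default_request_factory(**fields: Any) -> Any:
    # Imported lazily so this module can be loaded without vLLM installed.
    from vllm.lora.request import LoRARequest

    return LoRARequest(**fields)


class MultiAdapterService:
    """Hypothetical wrapper that routes prompts to the base model or to a named adapter."""

    def __init__(
        self,
        llm: Any,
        adapters: Dict[str, str],
        request_factory: Callable[..., Any] = _default_request_factory,
    ) -> None:
        self.llm = llm
        # One request object per adapter; lora_int_id is a positive integer
        # that is unique per adapter, assigned here by enumeration order.
        self.requests = {
            name: request_factory(lora_name=name, lora_int_id=idx, lora_path=path)
            for idx, (name, path) in enumerate(adapters.items(), start=1)
        }

    def generate(
        self,
        prompts: List[str],
        sampling_params: Optional[Any] = None,
        adapter: Optional[str] = None,
    ) -> Any:
        if adapter is None:
            # Base Model Generation: no lora_request argument is passed at all.
            return self.llm.generate(prompts, sampling_params=sampling_params)
        if adapter not in self.requests:
            raise KeyError(f"unknown adapter: {adapter}")
        # Adapter in Generate: pass the prepared request via lora_request.
        return self.llm.generate(
            prompts,
            sampling_params=sampling_params,
            lora_request=self.requests[adapter],
        )
```

With vLLM installed, the service would be constructed as `MultiAdapterService(LLM(**engine_kwargs(model_name)), adapters)`, where `model_name` and the `adapters` mapping of adapter name to local path come from the caller. Keeping the `LoRARequest` import inside the factory is a design choice made only so the routing logic can be exercised with a stub engine.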