A high-throughput and memory-efficient inference and serving engine for LLMs
Overall score: 69%
Evaluation — 69%
↑ 1.33x agent success when using this tile
{
"context": "This criterion evaluates how effectively the engineer uses vLLM's LoRA (Low-Rank Adaptation) capabilities to implement a multi-adapter text generation service. The focus is on correct usage of vLLM's adapter configuration, initialization parameters, and request-time adapter specification.",
"type": "weighted_checklist",
"checklist": [
{
"name": "LLM Initialization",
"description": "Uses vLLM's LLM class to initialize the model with enable_lora parameter set appropriately to enable LoRA support",
"max_score": 15
},
{
"name": "Max LoRAs Configuration",
"description": "Configures max_loras parameter during LLM initialization to control the maximum number of concurrent LoRA adapters",
"max_score": 15
},
{
"name": "Max LoRA Rank",
"description": "Configures max_lora_rank parameter during LLM initialization to set the maximum rank for LoRA adapters",
"max_score": 15
},
{
"name": "LoRA Request Object",
"description": "Uses vLLM's LoRARequest class to create adapter request objects with lora_name and lora_path parameters",
"max_score": 25
},
{
"name": "Adapter in Generate",
"description": "Passes the LoRARequest object to the generate() method using the lora_request parameter to apply the adapter during generation",
"max_score": 20
},
{
"name": "Base Model Generation",
"description": "Correctly handles base model generation by calling generate() without a lora_request parameter when no adapter is specified",
"max_score": 10
}
]
}

Install with Tessl CLI
npx tessl i tessl/pypi-vllmdocs
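The checklist above maps onto a small vLLM workflow. A minimal sketch is shown below; the base model name, adapter path, and prompts are placeholders, and the `LoRARequest` positional arguments are `(lora_name, lora_int_id, lora_path)` per vLLM's LoRA API:

```python
# Illustrative sketch of the LoRA workflow the checklist describes.
# Model and adapter paths are placeholders -- substitute your own.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Initialize the model with LoRA support enabled
# (checklist: LLM Initialization, Max LoRAs, Max LoRA Rank).
llm = LLM(
    model="meta-llama/Llama-2-7b-hf",  # placeholder base model
    enable_lora=True,                  # turn on LoRA support
    max_loras=2,                       # max concurrent LoRA adapters
    max_lora_rank=16,                  # max rank accepted for adapters
)

sampling_params = SamplingParams(temperature=0.7, max_tokens=128)

# Create an adapter request (checklist: LoRA Request Object).
# The second argument is a unique integer id for the adapter.
lora_request = LoRARequest("sql-adapter", 1, "/path/to/sql_lora_adapter")

# Generate with the adapter applied (checklist: Adapter in Generate).
adapted_outputs = llm.generate(
    ["Write a SQL query that lists all users."],
    sampling_params,
    lora_request=lora_request,
)

# Base-model generation: simply omit lora_request
# (checklist: Base Model Generation).
base_outputs = llm.generate(
    ["Write a SQL query that lists all users."],
    sampling_params,
)
```

Running this sketch requires a GPU and downloaded model weights, so it is shown for structure rather than as a copy-paste recipe.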
evals
scenario-1
scenario-2
scenario-3
scenario-4
scenario-5
scenario-6
scenario-7
scenario-8
scenario-9
scenario-10