tessl install tessl/pypi-vllm@0.10.0

A high-throughput and memory-efficient inference and serving engine for LLMs
Agent Success
Agent success rate when using this tile
69%
Improvement
Agent success rate improvement when using this tile compared to baseline
1.33x
Baseline
Agent success rate without this tile
52%
Build a text generation service that can dynamically serve multiple fine-tuned model variants using adapter modules. The service should accept generation requests that can optionally specify which adapter to use, allowing different users or applications to access specialized model behaviors without loading separate models.
The service must support:
Create a test file `test_lora_service.py` with the following test cases:
@generates

```python
class MultiAdapterService:
    """
    A text generation service supporting multiple adapter modules.
    """

    def __init__(
        self,
        model_name: str,
        enable_adapters: bool = True,
        max_adapters: int = 4,
        max_adapter_rank: int = 64
    ):
        """
        Initialize the multi-adapter service.

        Args:
            model_name: Name or path of the base model
            enable_adapters: Whether to enable adapter support
            max_adapters: Maximum number of concurrent adapters
            max_adapter_rank: Maximum rank for adapters
        """
        pass

    def generate(
        self,
        prompt: str,
        adapter_name: str | None = None,
        adapter_path: str | None = None,
        max_tokens: int = 100
    ) -> str:
        """
        Generate text completion for the given prompt.

        Args:
            prompt: Input text prompt
            adapter_name: Optional unique identifier for the adapter
            adapter_path: Optional filesystem path to the adapter module
            max_tokens: Maximum number of tokens to generate

        Returns:
            Generated text completion
        """
        pass
```

Provides high-performance LLM inference with adapter support.
@satisfied-by