A high-throughput and memory-efficient inference and serving engine for LLMs
Overall score: 69%
Evaluation — 69%
↑ 1.33× agent success when using this tile
Build a text generation service that can dynamically serve multiple fine-tuned model variants using adapter modules. The service should accept generation requests that can optionally specify which adapter to use, allowing different users or applications to access specialized model behaviors without loading separate models.
The service must support:
Create test file test_lora_service.py with the following test cases:
@generates
class MultiAdapterService:
    """
    A text generation service supporting multiple adapter modules.
    """

    def __init__(
        self,
        model_name: str,
        enable_adapters: bool = True,
        max_adapters: int = 4,
        max_adapter_rank: int = 64
    ):
        """
        Initialize the multi-adapter service.

        Args:
            model_name: Name or path of the base model
            enable_adapters: Whether to enable adapter support
            max_adapters: Maximum number of concurrent adapters
            max_adapter_rank: Maximum rank for adapters
        """
        pass

    def generate(
        self,
        prompt: str,
        adapter_name: str | None = None,
        adapter_path: str | None = None,
        max_tokens: int = 100
    ) -> str:
        """
        Generate text completion for the given prompt.

        Args:
            prompt: Input text prompt
            adapter_name: Optional unique identifier for the adapter
            adapter_path: Optional filesystem path to the adapter module
            max_tokens: Maximum number of tokens to generate

        Returns:
            Generated text completion
        """
        pass

Provides high-performance LLM inference with adapter support.
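As a rough shape for an implementation, the skeleton above can be filled in with an in-memory adapter registry that enforces `max_adapters`. The generation backend below is a hypothetical stub for illustration, not vLLM itself, and the eviction policy (drop the oldest registered adapter) is an assumption:

```python
class AdapterError(ValueError):
    """Raised when an adapter request violates service limits."""


class MultiAdapterService:
    """Sketch: routes generation requests to named adapter modules."""

    def __init__(self, model_name, enable_adapters=True,
                 max_adapters=4, max_adapter_rank=64):
        self.model_name = model_name
        self.enable_adapters = enable_adapters
        self.max_adapters = max_adapters
        self.max_adapter_rank = max_adapter_rank
        self._adapters = {}  # adapter_name -> adapter_path

    def _register(self, name, path):
        # Register the adapter on first use, evicting the oldest
        # registered adapter once the concurrency limit is reached.
        if name not in self._adapters:
            if len(self._adapters) >= self.max_adapters:
                oldest = next(iter(self._adapters))
                del self._adapters[oldest]
            self._adapters[name] = path

    def generate(self, prompt, adapter_name=None, adapter_path=None,
                 max_tokens=100):
        if adapter_name is not None:
            if not self.enable_adapters:
                raise AdapterError("adapter support is disabled")
            self._register(adapter_name, adapter_path)
        # Stub backend: a real service would invoke the model here,
        # passing the resolved adapter to the inference engine.
        tag = adapter_name or "base"
        return f"[{tag}] completion for: {prompt}"
```

In a real deployment these parameters map naturally onto vLLM's LoRA support (`LLM(..., enable_lora=True, max_loras=..., max_lora_rank=...)` with a per-request `LoRARequest`); the stub above only demonstrates the routing and limit-enforcement logic.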
@satisfied-by
Install with Tessl CLI
npx tessl i tessl/pypi-vllm