
tessl/pypi-vllm

tessl install tessl/pypi-vllm@0.10.0

A high-throughput and memory-efficient inference and serving engine for LLMs



Multi-Adapter Text Generation Service

Build a text generation service that can dynamically serve multiple fine-tuned model variants using adapter modules. The service should accept generation requests that can optionally specify which adapter to use, allowing different users or applications to access specialized model behaviors without loading separate models.

Requirements

Core Functionality

The service must support:

  1. Adapter Configuration: Initialize the service to support multiple adapters with configurable capacity and rank limits
  2. Dynamic Adapter Requests: Process text generation requests that can optionally specify an adapter to use
  3. Adapter Identification: Each adapter should be identifiable by a unique name and have an associated path
  4. Fallback Behavior: Requests without an adapter specification should use the base model
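One way to handle adapter identification and the base-model fallback is a small registry that maps each adapter name to a stable integer id and path. The sketch below is illustrative only: `AdapterRegistry` and `resolve` are hypothetical names that do not come from the task or from vLLM. Ids start at 1 on the assumption that the engine expects a positive integer id per adapter, as vLLM's `LoRARequest` does with `lora_int_id`.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class AdapterRegistry:
    """Hypothetical helper: maps adapter names to (integer id, path) pairs."""

    _entries: Dict[str, Tuple[int, str]] = field(default_factory=dict)

    def resolve(
        self, name: Optional[str], path: Optional[str]
    ) -> Optional[Tuple[str, int, str]]:
        # No adapter requested: the caller should fall back to the base model.
        if name is None and path is None:
            return None
        # A name without a path (or the reverse) cannot identify an adapter.
        if name is None or path is None:
            raise ValueError("adapter_name and adapter_path must be given together")
        if name in self._entries:
            adapter_id, known_path = self._entries[name]
            # Names are meant to be unique, so reject a conflicting path.
            if known_path != path:
                raise ValueError(f"adapter {name!r} is registered with another path")
            return name, adapter_id, known_path
        # First time this name is seen: assign the next id (ids start at 1).
        adapter_id = len(self._entries) + 1
        self._entries[name] = (adapter_id, path)
        return name, adapter_id, path
```

Returning `None` for requests without an adapter keeps the fallback explicit: the caller passes no adapter to the engine and the base model answers.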

Input/Output

  • Input: Text prompts (strings) with an optional adapter specification (name and path)
  • Output: Generated text completions

Configuration

The service should support:

  • Specifying the maximum number of concurrent adapters
  • Specifying the maximum adapter rank
  • Loading adapter modules from filesystem paths
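The configuration options above line up naturally with vLLM's LoRA engine arguments `enable_lora`, `max_loras`, and `max_lora_rank`. The following is a minimal sketch of that mapping; `build_engine_kwargs` is a hypothetical helper name, and treating `max_loras` as the "maximum number of concurrent adapters" is an interpretation of the task rather than something the task states.

```python
from typing import Any, Dict


def build_engine_kwargs(
    model_name: str,
    enable_adapters: bool = True,
    max_adapters: int = 4,
    max_adapter_rank: int = 64,
) -> Dict[str, Any]:
    """Translate service settings into keyword arguments for vllm.LLM (sketch)."""
    kwargs: Dict[str, Any] = {"model": model_name, "enable_lora": enable_adapters}
    if enable_adapters:
        # Only pass the LoRA limits when adapter support is switched on.
        kwargs["max_loras"] = max_adapters
        kwargs["max_lora_rank"] = max_adapter_rank
    return kwargs
```

The result would be passed as `LLM(**build_engine_kwargs(...))`. vLLM may only accept certain values for `max_lora_rank`, so it is worth checking the engine argument documentation for the installed version before choosing a limit.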

Test Cases

Create a test file, test_lora_service.py, with the following test cases:

  1. Service initializes successfully with adapter support enabled @test
  2. Service generates text using base model when no adapter is specified @test
  3. Service generates text using a specific adapter when adapter is provided @test
  4. Service handles multiple requests with different adapters in sequence @test

Implementation

@generates

API

class MultiAdapterService:
    """
    A text generation service supporting multiple adapter modules.
    """

    def __init__(
        self,
        model_name: str,
        enable_adapters: bool = True,
        max_adapters: int = 4,
        max_adapter_rank: int = 64
    ):
        """
        Initialize the multi-adapter service.

        Args:
            model_name: Name or path of the base model
            enable_adapters: Whether to enable adapter support
            max_adapters: Maximum number of concurrent adapters
            max_adapter_rank: Maximum rank for adapters
        """
        pass

    def generate(
        self,
        prompt: str,
        adapter_name: str | None = None,
        adapter_path: str | None = None,
        max_tokens: int = 100
    ) -> str:
        """
        Generate text completion for the given prompt.

        Args:
            prompt: Input text prompt
            adapter_name: Optional unique identifier for the adapter
            adapter_path: Optional filesystem path to the adapter module
            max_tokens: Maximum number of tokens to generate

        Returns:
            Generated text completion
        """
        pass
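A possible shape for an implementation of this API is sketched below. It is not a reference solution. It assumes vLLM's offline `LLM` class, `SamplingParams`, and `LoRARequest` (with `lora_name`, `lora_int_id`, and `lora_path`). The class name `MultiAdapterServiceSketch` and the extra `engine` parameter are additions for illustration: `engine` lets a test inject a stand-in object so the logic can be exercised without loading a model. Raising `ValueError` when only one of the two adapter arguments is given is likewise a design choice the task does not specify.

```python
from typing import Any, Dict, Optional


class MultiAdapterServiceSketch:
    """Illustrative take on the MultiAdapterService API; not a reference solution."""

    def __init__(self, model_name: str, enable_adapters: bool = True,
                 max_adapters: int = 4, max_adapter_rank: int = 64,
                 engine: Optional[Any] = None):
        self.enable_adapters = enable_adapters
        self._adapter_ids: Dict[str, int] = {}  # adapter name -> integer id
        if engine is None:
            # Imported lazily so this module loads even where vLLM is absent.
            from vllm import LLM
            lora_kwargs = ({"max_loras": max_adapters,
                            "max_lora_rank": max_adapter_rank}
                           if enable_adapters else {})
            engine = LLM(model=model_name, enable_lora=enable_adapters,
                         **lora_kwargs)
        self._engine = engine

    def generate(self, prompt: str, adapter_name: Optional[str] = None,
                 adapter_path: Optional[str] = None,
                 max_tokens: int = 100) -> str:
        from vllm import SamplingParams
        from vllm.lora.request import LoRARequest

        lora_request = None  # None means "use the base model"
        if adapter_name is not None or adapter_path is not None:
            if not self.enable_adapters:
                raise ValueError("adapter requested but adapter support is disabled")
            if adapter_name is None or adapter_path is None:
                raise ValueError("adapter_name and adapter_path must be given together")
            # Reuse the id of a known name; otherwise take the next one (from 1).
            adapter_id = self._adapter_ids.setdefault(
                adapter_name, len(self._adapter_ids) + 1)
            lora_request = LoRARequest(lora_name=adapter_name,
                                       lora_int_id=adapter_id,
                                       lora_path=adapter_path)

        outputs = self._engine.generate(
            [prompt], SamplingParams(max_tokens=max_tokens),
            lora_request=lora_request)
        # One prompt in, one request out; take its first completion.
        return outputs[0].outputs[0].text
```

Keeping a per-name integer id means repeated requests for the same adapter reuse one id, which is how the engine can recognize an adapter it has already loaded.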

Dependencies { .dependencies }

vLLM { .dependency }

Provides high-performance LLM inference with adapter support.

@satisfied-by
