tessl install tessl/pypi-vllm@0.10.0

A high-throughput and memory-efficient inference and serving engine for LLMs
Agent Success
Agent success rate when using this tile
69%
Improvement
Agent success rate improvement when using this tile compared to baseline
1.33x
Baseline
Agent success rate without this tile
52%
Build a text generation service that can dynamically serve multiple fine-tuned model variants using adapter modules. The service should accept generation requests that can optionally specify which adapter to use, allowing different users or applications to access specialized model behaviors without loading separate models.
The service must support:
Create a test file `test_lora_service.py` with the following test cases:
@generates

```python
class MultiAdapterService:
    """
    A text generation service supporting multiple adapter modules.
    """

    def __init__(
        self,
        model_name: str,
        enable_adapters: bool = True,
        max_adapters: int = 4,
        max_adapter_rank: int = 64
    ):
        """
        Initialize the multi-adapter service.

        Args:
            model_name: Name or path of the base model
            enable_adapters: Whether to enable adapter support
            max_adapters: Maximum number of concurrent adapters
            max_adapter_rank: Maximum rank for adapters
        """
        pass

    def generate(
        self,
        prompt: str,
        adapter_name: str | None = None,
        adapter_path: str | None = None,
        max_tokens: int = 100
    ) -> str:
        """
        Generate text completion for the given prompt.

        Args:
            prompt: Input text prompt
            adapter_name: Optional unique identifier for the adapter
            adapter_path: Optional filesystem path to the adapter module
            max_tokens: Maximum number of tokens to generate

        Returns:
            Generated text completion
        """
        pass
```

Provides high-performance LLM inference with adapter support.
@satisfied-by