
tessl/pypi-vllm

tessl install tessl/pypi-vllm@0.10.0

A high-throughput and memory-efficient inference and serving engine for LLMs

Agent Success: 69% (agent success rate when using this tile)
Improvement: 1.33x (improvement over the baseline below)
Baseline: 52% (agent success rate without this tile)

evals/scenario-4/task.md

Model Configuration Service

Build a service that initializes and configures language models for different deployment scenarios.

Requirements

Your service should support three deployment modes:

  1. Local Development Mode: Load a model from a local file path with minimal GPU memory usage
  2. Production Mode: Load a model from HuggingFace Hub with optimized settings for high throughput
  3. Testing Mode: Load a lightweight model suitable for unit testing with a specific load format

Functional Requirements

  • Create a function initialize_model(mode, model_path) that returns an initialized model instance based on the deployment mode
  • For local development mode:
    • Use 50% GPU memory utilization
    • Load from the provided local path
  • For production mode:
    • Use 90% GPU memory utilization
    • Load from a HuggingFace model identifier
    • Enable automatic model download
  • For testing mode:
    • Load using safetensors format
    • Use the provided model path or identifier

Test Cases

  • Calling initialize_model("local", "/models/llama-7b") returns a model instance configured with 50% GPU memory utilization @test
  • Calling initialize_model("production", "meta-llama/Llama-2-7b-hf") returns a model instance configured with 90% GPU memory utilization @test
  • Calling initialize_model("testing", "facebook/opt-125m") returns a model instance that uses safetensors load format @test
  • Calling initialize_model("invalid", "some-model") raises a ValueError with an appropriate error message @test
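
If the generated implementation constructs vllm.LLM directly, the configuration checks above could be expressed by patching the constructor and inspecting its keyword arguments. This is a sketch, not the tile's test harness: the my_service module name is hypothetical, and it assumes LLM is imported into that module's namespace.

from unittest.mock import patch

import pytest

from my_service import initialize_model  # hypothetical module name

def test_local_mode_uses_half_gpu_memory():
    # Patch LLM where my_service looks it up so no real model is loaded.
    with patch("my_service.LLM") as mock_llm:
        initialize_model("local", "/models/llama-7b")
        assert mock_llm.call_args.kwargs["gpu_memory_utilization"] == 0.5

def test_invalid_mode_raises_value_error():
    with pytest.raises(ValueError):
        initialize_model("invalid", "some-model")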

Implementation

@generates

API

def initialize_model(mode: str, model_path: str):
    """
    Initialize a language model based on the specified deployment mode.

    Args:
        mode: Deployment mode - one of "local", "production", or "testing"
        model_path: Path or identifier for the model to load

    Returns:
        An initialized model instance

    Raises:
        ValueError: If mode is not one of the supported values
    """
    pass
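
One possible shape for the generated implementation, sketched against vLLM's LLM entry point. The gpu_memory_utilization and load_format constructor arguments are real vLLM parameters, and vLLM downloads Hub-hosted weights automatically; the exact mode-to-settings mapping below simply follows the requirements above and should be read as a sketch rather than the tile's canonical output.

from vllm import LLM

def initialize_model(mode: str, model_path: str):
    """Initialize a language model based on the specified deployment mode."""
    if mode == "local":
        # Local development: load from a filesystem path, cap GPU memory at 50%.
        return LLM(model=model_path, gpu_memory_utilization=0.5)
    if mode == "production":
        # Production: HuggingFace identifier; vLLM fetches the weights from the
        # Hub automatically. Use 90% of GPU memory for throughput.
        return LLM(model=model_path, gpu_memory_utilization=0.9)
    if mode == "testing":
        # Testing: load the weights via the safetensors format.
        return LLM(model=model_path, load_format="safetensors")
    raise ValueError(f"Unsupported deployment mode: {mode!r}")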

Dependencies { .dependencies }

vLLM { .dependency }

Provides high-throughput inference engine for large language models with flexible model loading capabilities.
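
For context on what this dependency provides, a minimal offline-inference call might look like the following (using the small OPT checkpoint from the test cases; the prompt and sampling settings are illustrative):

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)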

@satisfied-by

Version: 0.10.0
Workspace: tessl
Visibility: Public
Describes: pkg:pypi/vllm@0.10.x (pypi)
tile.json