```
tessl install tessl/pypi-vllm@0.10.0
```

A high-throughput and memory-efficient inference and serving engine for LLMs.
| Metric | Value | Description |
| --- | --- | --- |
| Agent Success | 69% | Agent success rate when using this tile |
| Improvement | 1.33x | Agent success rate improvement when using this tile compared to baseline |
| Baseline | 52% | Agent success rate without this tile |
Build a service that initializes and configures language models for different deployment scenarios.
Your service should support three deployment modes:
`initialize_model(mode, model_path)` returns an initialized model instance based on the deployment mode:

- `initialize_model("local", "/models/llama-7b")` returns a model instance configured with 50% GPU memory utilization @test
- `initialize_model("production", "meta-llama/Llama-2-7b-hf")` returns a model instance configured with 90% GPU memory utilization @test
- `initialize_model("testing", "facebook/opt-125m")` returns a model instance that uses the safetensors load format @test
- `initialize_model("invalid", "some-model")` raises a ValueError with an appropriate error message @test

@generates
```python
def initialize_model(mode: str, model_path: str):
    """
    Initialize a language model based on the specified deployment mode.

    Args:
        mode: Deployment mode - one of "local", "production", or "testing"
        model_path: Path or identifier for the model to load

    Returns:
        An initialized model instance

    Raises:
        ValueError: If mode is not one of the supported values
    """
    pass
```

Provides a high-throughput inference engine for large language models with flexible model loading capabilities.
@satisfied-by
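
Below is a minimal sketch of one way to satisfy this spec using vLLM's `LLM` entry point. The `gpu_memory_utilization` and `load_format` constructor arguments are real vLLM parameters, but the specific mode-to-setting mapping is inferred from the test cases above and should be treated as an assumption, not part of the tile itself.

```python
# Sketch only: assumes vLLM's LLM class and its gpu_memory_utilization /
# load_format keyword arguments; the mode mapping is inferred from the
# @test cases in the spec above.
from vllm import LLM


def initialize_model(mode: str, model_path: str):
    """Initialize a vLLM model instance for the given deployment mode."""
    if mode == "local":
        # Shared workstation GPU: reserve only half of GPU memory.
        return LLM(model=model_path, gpu_memory_utilization=0.5)
    if mode == "production":
        # Dedicated GPU: give vLLM 90% of memory for weights and KV cache.
        return LLM(model=model_path, gpu_memory_utilization=0.9)
    if mode == "testing":
        # Small test model loaded via the safetensors weight format.
        return LLM(model=model_path, load_format="safetensors")
    raise ValueError(
        f"Unsupported deployment mode: {mode!r}; "
        'expected "local", "production", or "testing"'
    )
```

A quick smoke test against the smallest case might look like `llm = initialize_model("testing", "facebook/opt-125m")` followed by `llm.generate(["Hello"])`.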