```
tessl install tessl/pypi-vllm@0.10.0
```

A high-throughput and memory-efficient inference and serving engine for LLMs.
| Metric | Value | Description |
| --- | --- | --- |
| Agent Success | 69% | Agent success rate when using this tile |
| Improvement | 1.33x | Agent success rate improvement when using this tile compared to baseline |
| Baseline | 52% | Agent success rate without this tile |
Build a service that initializes and configures language models for different deployment scenarios.
Your service should support three deployment modes:
`initialize_model(mode, model_path)` returns an initialized model instance based on the deployment mode:

- `initialize_model("local", "/models/llama-7b")` returns a model instance configured with 50% GPU memory utilization @test
- `initialize_model("production", "meta-llama/Llama-2-7b-hf")` returns a model instance configured with 90% GPU memory utilization @test
- `initialize_model("testing", "facebook/opt-125m")` returns a model instance that uses the safetensors load format @test
- `initialize_model("invalid", "some-model")` raises a ValueError with an appropriate error message @test

@generates
```python
def initialize_model(mode: str, model_path: str):
    """
    Initialize a language model based on the specified deployment mode.

    Args:
        mode: Deployment mode - one of "local", "production", or "testing"
        model_path: Path or identifier for the model to load

    Returns:
        An initialized model instance

    Raises:
        ValueError: If mode is not one of the supported values
    """
    pass
```

Provides a high-throughput inference engine for large language models with flexible model loading capabilities.
@satisfied-by
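
Below is a minimal sketch of one way to satisfy this spec using vLLM's `LLM` entry point. The `gpu_memory_utilization` and `load_format` constructor arguments are real vLLM parameters, but the specific mode-to-setting mapping is inferred from the test cases above and should be treated as an assumption, not part of the tile itself.

```python
# Sketch only: assumes vLLM's LLM class and its gpu_memory_utilization /
# load_format keyword arguments; the mode mapping is inferred from the
# @test cases in the spec above.
from vllm import LLM


def initialize_model(mode: str, model_path: str):
    """Initialize a vLLM model instance for the given deployment mode."""
    if mode == "local":
        # Shared workstation GPU: reserve only half of GPU memory.
        return LLM(model=model_path, gpu_memory_utilization=0.5)
    if mode == "production":
        # Dedicated GPU: give vLLM 90% of memory for weights and KV cache.
        return LLM(model=model_path, gpu_memory_utilization=0.9)
    if mode == "testing":
        # Small test model loaded via the safetensors weight format.
        return LLM(model=model_path, load_format="safetensors")
    raise ValueError(
        f"Unsupported deployment mode: {mode!r}; "
        'expected "local", "production", or "testing"'
    )
```

A quick smoke test against the smallest case might look like `llm = initialize_model("testing", "facebook/opt-125m")` followed by `llm.generate(["Hello"])`.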