tessl/pypi-vllm

tessl install tessl/pypi-vllm@0.10.0

A high-throughput and memory-efficient inference and serving engine for LLMs

Agent Success: 69% (agent success rate when using this tile)

Improvement: 1.33x (success rate improvement over the baseline)

Baseline: 52% (agent success rate without this tile)

evals/scenario-5/task.md

Text Generation with Multiple Candidate Selection

Implement a Python program that generates text using a large language model with support for exploring multiple candidate sequences simultaneously and selecting the best outputs based on quality scoring.

Requirements

Your implementation should:

  1. Initialize a text generation system that loads a pre-trained language model (use "facebook/opt-125m" for testing)

  2. Generate multiple candidate sequences for a given prompt by exploring different paths through the model's prediction space (see the code sketch after this list):

    • Configure the system to explore multiple alternatives simultaneously (use 3 parallel paths)
    • Apply length normalization to prevent bias toward shorter sequences (use a penalty factor of 0.8)
    • Generate sequences up to a maximum of 50 tokens
  3. Handle multiple input prompts in a single batch operation:

    • Process at least 2 different prompts together
    • Return the best candidate sequence for each prompt
  4. Support advanced sampling configurations:

    • Allow controlling randomness in generation (temperature parameter)
    • Implement vocabulary filtering to restrict output to specific tokens when needed
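
As a starting point, here is a minimal sketch of these requirements against vLLM's beam search entry point. The names (LLM.beam_search, BeamSearchParams, cum_logprob) reflect the vLLM 0.10.x API, but verify them against the installed version:

from vllm import LLM
from vllm.sampling_params import BeamSearchParams

# Load the small test model; vLLM handles GPU/CPU placement itself.
llm = LLM(model="facebook/opt-125m")

# Requirement 2: 3 parallel beams, length penalty 0.8, up to 50 new tokens.
params = BeamSearchParams(beam_width=3, max_tokens=50, length_penalty=0.8)

# Requirement 3: both prompts are processed in one batched call.
outputs = llm.beam_search(
    [{"prompt": "The future of artificial intelligence"},
     {"prompt": "In a galaxy far away"}],
    params,
)

for out in outputs:
    best = out.sequences[0]  # beams come back sorted, best first
    print(best.text, best.cum_logprob)

BeamSearchParams also takes a temperature field (requirement 4); vocabulary filtering is sketched under Test 3 below.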

Input/Output Specifications

Input:

  • A list of text prompts (strings)
  • Configuration parameters for sequence exploration and selection

Output:

  • For each input prompt, return the highest-scoring generated text sequence
  • Include the score/probability of the selected sequence
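
For a two-prompt batch, the return value would be shaped like this (texts and scores are placeholders, not real model output):

[
    {"text": "The future of artificial intelligence is ...", "score": -14.2},
    {"text": "In a galaxy far away, a ...", "score": -11.7},
]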

Test Cases

Test 1: Basic Multiple Candidate Generation { .test }

@test

Input:

prompts = ["Once upon a time"]

Expected behavior:

  • System should explore 3 candidate paths
  • Return the single best completion
  • Output length should not exceed 50 tokens
  • Returned text should start with the input prompt

Test 2: Batch Processing { .test }

@test

Input:

prompts = [
    "The future of artificial intelligence",
    "In a galaxy far away"
]

Expected behavior:

  • Process both prompts in a single batch
  • Return 2 outputs (one per prompt)
  • Each output should be the best candidate for its respective prompt

Test 3: Vocabulary Restriction { .test }

@test

Input:

prompt = "The color of the sky is"
allowed_tokens = ["blue", "gray", "red", "orange"]

Expected behavior:

  • Generated text should only contain words from the allowed vocabulary
  • System should restrict token generation to specified subset
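
One way to satisfy this with vLLM is the allowed_token_ids field of SamplingParams, which masks the vocabulary to the given token ids. A sketch, with the caveats that a word can map to several token ids and that BeamSearchParams has no equivalent field, so restricted generation goes through llm.generate:

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
tok = llm.get_tokenizer()

# A word tokenizes differently with a leading space, so collect
# the token ids of every variant of every allowed word.
allowed_ids = set()
for word in ["blue", "gray", "red", "orange"]:
    for variant in (word, " " + word):
        allowed_ids.update(tok.encode(variant, add_special_tokens=False))

params = SamplingParams(max_tokens=10, temperature=0.8,
                        allowed_token_ids=sorted(allowed_ids))
result = llm.generate(["The color of the sky is"], params)
print(result[0].outputs[0].text)

Since the EOS token is excluded from the allowed set, generation runs to max_tokens.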

Implementation

@generates

API

class TextGenerator:
    """
    A text generation system that supports multiple candidate exploration
    and advanced sampling strategies.
    """

    def __init__(self, model_name: str):
        """
        Initialize the text generator with a pre-trained model.

        Args:
            model_name: Name or path of the pre-trained model
        """
        pass

    def generate_with_candidates(
        self,
        prompts: list[str],
        num_candidates: int = 3,
        max_tokens: int = 50,
        length_penalty: float = 1.0,
        temperature: float = 1.0,
        allowed_tokens: list[str] | None = None
    ) -> list[dict]:
        """
        Generate text exploring multiple candidate sequences.

        Args:
            prompts: List of input text prompts
            num_candidates: Number of parallel paths to explore
            max_tokens: Maximum length of generated sequences
            length_penalty: Penalty factor for length normalization
            temperature: Controls randomness (lower = more deterministic)
            allowed_tokens: Optional list of allowed vocabulary tokens

        Returns:
            List of dictionaries, one per prompt, containing:
                - 'text': The best generated sequence
                - 'score': The score/probability of the sequence
        """
        pass
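
For reference, a call against this API might look like the following (return values illustrative):

generator = TextGenerator("facebook/opt-125m")
results = generator.generate_with_candidates(
    prompts=["Once upon a time", "In a galaxy far away"],
    num_candidates=3,
    max_tokens=50,
    length_penalty=0.8,
)
for result in results:
    print(f"{result['score']:.3f}  {result['text']}")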

Dependencies { .dependencies }

vLLM { .dependency }

Provides high-performance LLM inference with beam search and advanced sampling capabilities.

Notes

  • The system should automatically handle model loading and GPU/CPU placement
  • Performance should be optimized for batch processing
  • Memory management should be efficient for the model size
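
A sketch of engine options that bear on these notes, assuming vLLM's standard LLM constructor keywords (dtype, gpu_memory_utilization, max_model_len); the values are illustrative, and the defaults are usually fine for a 125M-parameter model:

from vllm import LLM

llm = LLM(
    model="facebook/opt-125m",
    dtype="auto",                 # fp16/bf16 on GPU, fp32 on CPU
    gpu_memory_utilization=0.30,  # cap KV-cache share for a small model
    max_model_len=512,            # bound sequence length to save memory
)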

Workspace: tessl
Visibility: Public
Describes: pkg:pypi/vllm@0.10.x (tile.json)