tessl/pypi-vllm

tessl install tessl/pypi-vllm@0.10.0

A high-throughput and memory-efficient inference and serving engine for LLMs

Agent Success: 69% (agent success rate when using this tile)

Improvement: 1.33x (success rate improvement over the baseline)

Baseline: 52% (agent success rate without this tile)

evals/scenario-5/task.md

Text Generation with Multiple Candidate Selection

Implement a Python program that generates text using a large language model with support for exploring multiple candidate sequences simultaneously and selecting the best outputs based on quality scoring.

Requirements

Your implementation should:

  1. Initialize a text generation system that loads a pre-trained language model (use "facebook/opt-125m" for testing)

  2. Generate multiple candidate sequences for a given prompt by exploring different paths through the model's prediction space (see the code sketch after this list):

    • Configure the system to explore multiple alternatives simultaneously (use 3 parallel paths)
    • Apply length normalization to prevent bias toward shorter sequences (use a penalty factor of 0.8)
    • Generate sequences up to a maximum of 50 tokens
  3. Handle multiple input prompts in a single batch operation:

    • Process at least 2 different prompts together
    • Return the best candidate sequence for each prompt
  4. Support advanced sampling configurations:

    • Allow controlling randomness in generation (temperature parameter)
    • Implement vocabulary filtering to restrict output to specific tokens when needed
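
As a starting point, here is a minimal sketch of these requirements against vLLM's beam search entry point. The names (LLM.beam_search, BeamSearchParams, cum_logprob) reflect the vLLM 0.10.x API, but verify them against the installed version:

from vllm import LLM
from vllm.sampling_params import BeamSearchParams

# Load the small test model; vLLM handles GPU/CPU placement itself.
llm = LLM(model="facebook/opt-125m")

# Requirement 2: 3 parallel beams, length penalty 0.8, up to 50 new tokens.
params = BeamSearchParams(beam_width=3, max_tokens=50, length_penalty=0.8)

# Requirement 3: both prompts are processed in one batched call.
outputs = llm.beam_search(
    [{"prompt": "The future of artificial intelligence"},
     {"prompt": "In a galaxy far away"}],
    params,
)

for out in outputs:
    best = out.sequences[0]  # beams come back sorted, best first
    print(best.text, best.cum_logprob)

BeamSearchParams also takes a temperature field (requirement 4); vocabulary filtering is sketched under Test 3 below.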

Input/Output Specifications

Input:

  • A list of text prompts (strings)
  • Configuration parameters for sequence exploration and selection

Output:

  • For each input prompt, return the highest-scoring generated text sequence
  • Include the score/probability of the selected sequence
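
For a two-prompt batch, the return value would be shaped like this (texts and scores are placeholders, not real model output):

[
    {"text": "The future of artificial intelligence is ...", "score": -14.2},
    {"text": "In a galaxy far away, a ...", "score": -11.7},
]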

Test Cases

Test 1: Basic Multiple Candidate Generation { .test }

@test

Input:

prompts = ["Once upon a time"]

Expected behavior:

  • System should explore 3 candidate paths
  • Return the single best completion
  • Output length should not exceed 50 tokens
  • Returned text should start with the input prompt

Test 2: Batch Processing { .test }

@test

Input:

prompts = [
    "The future of artificial intelligence",
    "In a galaxy far away"
]

Expected behavior:

  • Process both prompts in a single batch
  • Return 2 outputs (one per prompt)
  • Each output should be the best candidate for its respective prompt

Test 3: Vocabulary Restriction { .test }

@test

Input:

prompt = "The color of the sky is"
allowed_tokens = ["blue", "gray", "red", "orange"]

Expected behavior:

  • Generated text should only contain words from the allowed vocabulary
  • System should restrict token generation to specified subset
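
One way to satisfy this with vLLM is the allowed_token_ids field of SamplingParams, which masks the vocabulary to the given token ids. A sketch, with the caveats that a word can map to several token ids and that BeamSearchParams has no equivalent field, so restricted generation goes through llm.generate:

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
tok = llm.get_tokenizer()

# A word tokenizes differently with a leading space, so collect
# the token ids of every variant of every allowed word.
allowed_ids = set()
for word in ["blue", "gray", "red", "orange"]:
    for variant in (word, " " + word):
        allowed_ids.update(tok.encode(variant, add_special_tokens=False))

params = SamplingParams(max_tokens=10, temperature=0.8,
                        allowed_token_ids=sorted(allowed_ids))
result = llm.generate(["The color of the sky is"], params)
print(result[0].outputs[0].text)

Since the EOS token is excluded from the allowed set, generation runs to max_tokens.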

Implementation

@generates

API

class TextGenerator:
    """
    A text generation system that supports multiple candidate exploration
    and advanced sampling strategies.
    """

    def __init__(self, model_name: str):
        """
        Initialize the text generator with a pre-trained model.

        Args:
            model_name: Name or path of the pre-trained model
        """
        pass

    def generate_with_candidates(
        self,
        prompts: list[str],
        num_candidates: int = 3,
        max_tokens: int = 50,
        length_penalty: float = 1.0,
        temperature: float = 1.0,
        allowed_tokens: list[str] | None = None
    ) -> list[dict]:
        """
        Generate text exploring multiple candidate sequences.

        Args:
            prompts: List of input text prompts
            num_candidates: Number of parallel paths to explore
            max_tokens: Maximum length of generated sequences
            length_penalty: Penalty factor for length normalization
            temperature: Controls randomness (lower = more deterministic)
            allowed_tokens: Optional list of allowed vocabulary tokens

        Returns:
            List of dictionaries, one per prompt, containing:
                - 'text': The best generated sequence
                - 'score': The score/probability of the sequence
        """
        pass
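
For reference, a call against this API might look like the following (return values illustrative):

generator = TextGenerator("facebook/opt-125m")
results = generator.generate_with_candidates(
    prompts=["Once upon a time", "In a galaxy far away"],
    num_candidates=3,
    max_tokens=50,
    length_penalty=0.8,
)
for result in results:
    print(f"{result['score']:.3f}  {result['text']}")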

Dependencies { .dependencies }

vLLM { .dependency }

Provides high-performance LLM inference with beam search and advanced sampling capabilities.

Notes

  • The system should automatically handle model loading and GPU/CPU placement
  • Performance should be optimized for batch processing
  • Memory management should be efficient for the model size
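
A sketch of engine options that bear on these notes, assuming vLLM's standard LLM constructor keywords (dtype, gpu_memory_utilization, max_model_len); the values are illustrative, and the defaults are usually fine for a 125M-parameter model:

from vllm import LLM

llm = LLM(
    model="facebook/opt-125m",
    dtype="auto",                 # fp16/bf16 on GPU, fp32 on CPU
    gpu_memory_utilization=0.30,  # cap KV-cache share for a small model
    max_model_len=512,            # bound sequence length to save memory
)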

Workspace: tessl
Visibility: Public
Describes: pkg:pypi/vllm@0.10.x (tile.json)