tessl install tessl/pypi-vllm@0.10.0

A high-throughput and memory-efficient inference and serving engine for LLMs.
Agent Success: 69% (agent success rate when using this tile)
Improvement: 1.33x (success rate improvement over the baseline)
Baseline: 52% (agent success rate without this tile)
Implement a Python program that generates text using a large language model, with support for exploring multiple candidate sequences simultaneously and selecting the best outputs based on quality scoring.

Your implementation should:

- Initialize a text generation system that loads a pre-trained language model (use "facebook/opt-125m" for testing)
- Generate multiple candidate sequences for a given prompt by exploring different paths through the model's prediction space
- Handle multiple input prompts in a single batch operation
- Support advanced sampling configurations such as temperature, length penalty, and restricted vocabularies (see the sketch after this list)
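A minimal sketch of the candidate-exploration flow, assuming vLLM's offline API (LLM and SamplingParams); the n parameter requests several sampled candidates per prompt, and passing a list of prompts batches them in one call:

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
# n candidates per prompt; requesting logprobs lets us score them later.
params = SamplingParams(n=3, temperature=1.0, max_tokens=50, logprobs=0)

# A single generate() call serves a whole batch of prompts; each
# RequestOutput carries the n candidate completions for its prompt.
for request in llm.generate(["Once upon a time", "In a galaxy far away"], params):
    for candidate in request.outputs:
        print(candidate.text, candidate.cumulative_logprob)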
@test
Input:
prompts = ["Once upon a time"]
Expected behavior:
Returns a list with one dict containing the best generated 'text' and its 'score'.
@test
Input:
prompts = [
    "The future of artificial intelligence",
    "In a galaxy far away"
]
Expected behavior:
Returns one result dict per prompt, preserving input order.
@test
Input:
prompt = "The color of the sky is"
allowed_tokens = ["blue", "gray", "red", "orange"]
Expected behavior:
The generated text is composed only of tokens from the allowed list.
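One way to honor allowed_tokens (a sketch, assuming SamplingParams in recent vLLM releases accepts allowed_token_ids, and using the model's own tokenizer to map words to ids):

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
tokenizer = llm.get_tokenizer()

# A word may split into several sub-word tokens; GPT-style BPE
# tokenizers may also need the leading-space variant (" blue").
allowed_ids = [
    tid
    for word in ["blue", "gray", "red", "orange"]
    for tid in tokenizer.encode(word, add_special_tokens=False)
]

params = SamplingParams(max_tokens=10, allowed_token_ids=allowed_ids)
outputs = llm.generate(["The color of the sky is"], params)
print(outputs[0].outputs[0].text)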
@generates
class TextGenerator:
    """
    A text generation system that supports multiple candidate exploration
    and advanced sampling strategies.
    """

    def __init__(self, model_name: str):
        """
        Initialize the text generator with a pre-trained model.

        Args:
            model_name: Name or path of the pre-trained model
        """
        pass

    def generate_with_candidates(
        self,
        prompts: list[str],
        num_candidates: int = 3,
        max_tokens: int = 50,
        length_penalty: float = 1.0,
        temperature: float = 1.0,
        allowed_tokens: list[str] | None = None
    ) -> list[dict]:
        """
        Generate text exploring multiple candidate sequences.

        Args:
            prompts: List of input text prompts
            num_candidates: Number of parallel paths to explore
            max_tokens: Maximum length of generated sequences
            length_penalty: Penalty factor for length normalization
            temperature: Controls randomness (lower = more deterministic)
            allowed_tokens: Optional list of allowed vocabulary tokens

        Returns:
            List of dictionaries, one per prompt, containing:
            - 'text': The best generated sequence
            - 'score': The score/probability of the sequence
        """
        pass

Provides high-performance LLM inference with beam search and advanced sampling capabilities.
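A minimal end-to-end sketch of the stub above, assuming vLLM's offline LLM/SamplingParams API; the scoring rule (cumulative log-probability divided by length**length_penalty) is an assumed length normalization, not something the spec mandates:

from vllm import LLM, SamplingParams


class TextGenerator:
    def __init__(self, model_name: str):
        # vLLM loads the model once and batches requests internally.
        self.llm = LLM(model=model_name)

    def generate_with_candidates(
        self,
        prompts: list[str],
        num_candidates: int = 3,
        max_tokens: int = 50,
        length_penalty: float = 1.0,
        temperature: float = 1.0,
        allowed_tokens: list[str] | None = None,
    ) -> list[dict]:
        kwargs = dict(
            n=num_candidates,   # candidate sequences explored per prompt
            temperature=temperature,
            max_tokens=max_tokens,
            logprobs=0,         # ask vLLM to track log-probabilities
        )
        if allowed_tokens is not None:
            tokenizer = self.llm.get_tokenizer()
            # Constrain decoding to the listed vocabulary entries.
            kwargs["allowed_token_ids"] = [
                tid
                for word in allowed_tokens
                for tid in tokenizer.encode(word, add_special_tokens=False)
            ]
        results = []
        for request in self.llm.generate(prompts, SamplingParams(**kwargs)):
            # Assumed scoring: length-normalized cumulative logprob.
            best = max(
                request.outputs,
                key=lambda c: (c.cumulative_logprob or 0.0)
                / (max(len(c.token_ids), 1) ** length_penalty),
            )
            results.append({"text": best.text, "score": best.cumulative_logprob})
        return results

For true beam search, recent vLLM releases also expose LLM.beam_search with BeamSearchParams (beam_width, max_tokens, length_penalty); the sampled-candidates approach above approximates it with independently drawn sequences.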