# Text Generation

Primary text generation functionality in vLLM, providing high-throughput inference with continuous batching and memory optimization. Supports various prompt formats, sampling strategies, and advanced features such as guided decoding and structured output generation.

## Capabilities

### Generate Text

Main method for generating text from prompts using the LLM. Supports batch processing, per-prompt sampling parameters, and advanced features such as LoRA adapters and guided decoding.

```python { .api }
def generate(
    self,
    prompts: Union[PromptType, Sequence[PromptType]],
    sampling_params: Optional[Union[SamplingParams, Sequence[SamplingParams]]] = None,
    *,
    use_tqdm: Union[bool, Callable[..., tqdm]] = True,
    lora_request: Optional[Union[List[LoRARequest], LoRARequest]] = None,
    priority: Optional[List[int]] = None
) -> List[RequestOutput]:
    """
    Generate text from prompts using the language model.

    Parameters:
    - prompts: Single prompt or sequence of prompts (str, TextPrompt, TokensPrompt, or EmbedsPrompt)
    - sampling_params: Parameters controlling generation behavior (temperature, top_p, etc.)
    - use_tqdm: Whether to show a progress bar for batch processing (keyword-only)
    - lora_request: LoRA adapter request for fine-tuned model variants (keyword-only)
    - priority: Priority levels for requests when priority scheduling is enabled (keyword-only)

    Returns:
    List of RequestOutput objects containing generated text and metadata
    """
```

### Beam Search Generation

Generate text with beam search, systematically exploring multiple generation paths to find high-quality outputs.

```python { .api }
def beam_search(
    self,
    prompts: Union[PromptType, Sequence[PromptType]],
    params: BeamSearchParams
) -> List[BeamSearchOutput]:
    """
    Generate text using the beam search algorithm.

    Parameters:
    - prompts: Input prompts for generation
    - params: Beam search parameters (beam_width, length_penalty, etc.)

    Returns:
    List of BeamSearchOutput objects with multiple candidate sequences
    """
```
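
Conceptually, the algorithm behind this API can be sketched in a few lines of self-contained Python (the "model" here is a hypothetical fixed transition table, not vLLM): each step expands every beam with its candidate next tokens and keeps only the `beam_width` sequences with the highest cumulative log probability.

```python
import math

def toy_next_token_logprobs(prefix):
    # Hypothetical toy "model": a fixed next-token distribution
    # conditioned only on the last token of the prefix.
    table = {
        "<s>": {"a": 0.6, "b": 0.4},
        "a": {"a": 0.1, "b": 0.7, "</s>": 0.2},
        "b": {"a": 0.5, "b": 0.1, "</s>": 0.4},
    }
    return {tok: math.log(p) for tok, p in table[prefix[-1]].items()}

def beam_search(beam_width=2, max_tokens=3):
    # Each beam is (tokens, cumulative_logprob), mirroring BeamSearchSequence.
    beams = [(["<s>"], 0.0)]
    finished = []
    for _ in range(max_tokens):
        candidates = []
        for tokens, score in beams:
            for tok, lp in toy_next_token_logprobs(tokens).items():
                if tok == "</s>":
                    finished.append((tokens, score + lp))
                else:
                    candidates.append((tokens + [tok], score + lp))
        # Keep only the beam_width highest-scoring partial sequences.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    finished.extend(beams)  # unfinished beams also compete at the end
    return sorted(finished, key=lambda c: c[1], reverse=True)

best = beam_search()
print(best[0])  # highest-cumulative-logprob sequence
```

Completed sequences are set aside and compete with the surviving beams at the end, which is why `cumulative_logprob` appears on `BeamSearchSequence` in the Types section below.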

### Guided Decoding

Generate structured output that follows a specific pattern, such as a JSON schema, regular expression, list of choices, or context-free grammar.

```python { .api }
# Used through SamplingParams.guided_decoding
class GuidedDecodingParams:
    json: Optional[Union[str, dict]] = None
    regex: Optional[str] = None
    choice: Optional[list[str]] = None
    grammar: Optional[str] = None
    json_object: Optional[bool] = None
    backend: Optional[str] = None
    whitespace_pattern: Optional[str] = None
```
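
Conceptually, guided decoding backends work by masking: at every step, tokens that cannot extend the output toward a valid result are excluded before sampling. A character-level, self-contained sketch of the idea behind the `choice` constraint (hypothetical scorer, no vLLM):

```python
def constrained_greedy_decode(score_fn, choices):
    # Greedy decoding where only characters that keep the output a
    # prefix of some allowed choice may be emitted (the "mask").
    out = ""
    while out not in choices:
        valid = {c[len(out)] for c in choices
                 if c.startswith(out) and len(c) > len(out)}
        # Greedy pick among unmasked characters only.
        out += max(valid, key=score_fn)
    return out

# Hypothetical scorer that prefers later letters in the alphabet.
result = constrained_greedy_decode(lambda ch: ord(ch), ["yes", "no"])
print(result)  # "yes"
```

Real backends apply the same idea per token against a compiled schema, regex, or grammar automaton rather than per character.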

## Usage Examples

### Basic Text Generation

```python
from vllm import LLM, SamplingParams

# Initialize model
llm = LLM(model="microsoft/DialoGPT-medium")

# Configure sampling
sampling_params = SamplingParams(
    temperature=0.8,
    top_p=0.95,
    max_tokens=100
)

# Generate text
prompts = ["The future of AI is", "Once upon a time"]
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(f"Prompt: {output.prompt}")
    print(f"Generated: {output.outputs[0].text}")
```

### Guided JSON Generation

```python
from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams

llm = LLM(model="microsoft/DialoGPT-medium")

# Define JSON schema
json_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "city": {"type": "string"}
    },
    "required": ["name", "age", "city"]
}

# Configure guided decoding
guided_params = GuidedDecodingParams(json=json_schema)
sampling_params = SamplingParams(
    temperature=0.7,
    max_tokens=150,
    guided_decoding=guided_params
)

prompt = "Generate a person's information:"
outputs = llm.generate(prompt, sampling_params)
print(outputs[0].outputs[0].text)  # Valid JSON output
```

### Batch Generation with Different Parameters

```python
from vllm import LLM, SamplingParams

llm = LLM(model="microsoft/DialoGPT-medium")

prompts = ["Creative story:", "Technical explanation:", "Casual conversation:"]

# One SamplingParams per prompt (list length must match the prompts)
sampling_params = [
    SamplingParams(temperature=1.2, top_p=0.9),   # Creative
    SamplingParams(temperature=0.3, top_p=0.95),  # Technical
    SamplingParams(temperature=0.8, top_p=0.9)    # Casual
]

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(f"{output.prompt} -> {output.outputs[0].text}")
```

### Using Pre-tokenized Input

```python
from vllm import LLM, SamplingParams
from vllm.inputs import TokensPrompt

llm = LLM(model="microsoft/DialoGPT-medium")

# Pass token IDs directly via TokensPrompt (useful for custom tokenization)
prompt = TokensPrompt(prompt_token_ids=[1, 2, 3, 4, 5])  # Your tokenized input
sampling_params = SamplingParams(temperature=0.8)

outputs = llm.generate(prompt, sampling_params)
print(outputs[0].outputs[0].text)
```

## Types

```python { .api }
class RequestOutput:
    request_id: str
    prompt: Optional[str]
    prompt_token_ids: list[int]
    prompt_logprobs: Optional[PromptLogprobs]
    outputs: list[CompletionOutput]
    finished: bool
    metrics: Optional[RequestMetrics]
    lora_request: Optional[LoRARequest]

class CompletionOutput:
    index: int
    text: str
    token_ids: list[int]
    cumulative_logprob: Optional[float]
    logprobs: Optional[SampleLogprobs]
    finish_reason: Optional[str]  # "stop", "length", "abort"
    stop_reason: Union[int, str, None]  # Specific stop token/string
    lora_request: Optional[LoRARequest]

class BeamSearchOutput:
    sequences: list[BeamSearchSequence]
    finished: bool

class BeamSearchSequence:
    text: str
    token_ids: list[int]
    cumulative_logprob: float
```
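
A small self-contained sketch showing how these types are typically consumed, e.g. picking the best completion from a request. The dataclasses below are mocks that mirror a subset of the fields above, not vLLM's actual classes:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CompletionOutput:
    index: int
    text: str
    cumulative_logprob: Optional[float]
    finish_reason: Optional[str]  # "stop", "length", or "abort"

@dataclass
class RequestOutput:
    request_id: str
    prompt: Optional[str]
    outputs: list
    finished: bool = True

def best_completion(req):
    # Prefer completions that stopped naturally; rank by cumulative logprob.
    stopped = [c for c in req.outputs if c.finish_reason == "stop"] or req.outputs
    return max(stopped, key=lambda c: c.cumulative_logprob or float("-inf"))

req = RequestOutput(
    request_id="req-0",
    prompt="Hello",
    outputs=[
        CompletionOutput(0, " world", -1.2, "stop"),
        CompletionOutput(1, " there, how are", -0.9, "length"),
    ],
)
print(best_completion(req).text)  # " world"
```

The same selection logic applies to real `RequestOutput` objects returned by `llm.generate`, since they expose the same field names.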