# Text Generation

Primary text generation functionality in vLLM, providing high-throughput inference with intelligent batching and memory optimization. Supports various prompt formats, sampling strategies, and advanced features like guided decoding and structured output generation.

## Capabilities

### Generate Text

Main method for generating text from prompts using the LLM. Supports batch processing, various sampling parameters, and advanced features like LoRA adapters and guided decoding.

```python { .api }
def generate(
    self,
    prompts: Union[PromptType, Sequence[PromptType]],
    sampling_params: Optional[Union[SamplingParams, Sequence[SamplingParams]]] = None,
    *,
    use_tqdm: Union[bool, Callable[..., tqdm]] = True,
    lora_request: Optional[Union[List[LoRARequest], LoRARequest]] = None,
    priority: Optional[List[int]] = None,
) -> List[RequestOutput]:
    """
    Generate text from prompts using the language model.

    Parameters:
    - prompts: Single prompt or sequence of prompts (str, TextPrompt, TokensPrompt, or EmbedsPrompt)
    - sampling_params: Parameters controlling generation behavior (temperature, top_p, etc.)
    - use_tqdm: Whether to show a progress bar for batch processing (keyword-only)
    - lora_request: LoRA adapter request for fine-tuned model variants (keyword-only)
    - priority: Priority levels for requests in batching (keyword-only)

    Returns:
    List of RequestOutput objects containing generated text and metadata
    """
```

### Beam Search Generation

Generate text using beam search for exploring multiple generation paths and finding high-quality outputs through systematic search.

```python { .api }
def beam_search(
    self,
    prompts: Union[PromptType, Sequence[PromptType]],
    params: BeamSearchParams,
) -> List[BeamSearchOutput]:
    """
    Generate text using the beam search algorithm.

    Parameters:
    - prompts: Input prompts for generation
    - params: Beam search parameters (beam_width, length_penalty, etc.)

    Returns:
    List of BeamSearchOutput objects with multiple candidate sequences
    """
```

### Guided Decoding

Generate structured output following specific patterns like JSON schemas, regular expressions, or context-free grammars.

```python { .api }
# Used through SamplingParams.guided_decoding
class GuidedDecodingParams:
    json: Optional[Union[str, dict]] = None
    regex: Optional[str] = None
    choice: Optional[list[str]] = None
    grammar: Optional[str] = None
    json_object: Optional[bool] = None
    backend: Optional[str] = None
    whitespace_pattern: Optional[str] = None
```

## Usage Examples

### Basic Text Generation

```python
from vllm import LLM, SamplingParams

# Initialize the model
llm = LLM(model="microsoft/DialoGPT-medium")

# Configure sampling
sampling_params = SamplingParams(
    temperature=0.8,
    top_p=0.95,
    max_tokens=100,
)

# Generate text for a batch of prompts
prompts = ["The future of AI is", "Once upon a time"]
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(f"Prompt: {output.prompt}")
    print(f"Generated: {output.outputs[0].text}")
```

### Guided JSON Generation

```python
from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams

llm = LLM(model="microsoft/DialoGPT-medium")

# Define a JSON schema for the output
json_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "city": {"type": "string"}
    },
    "required": ["name", "age", "city"]
}

# Configure guided decoding
guided_params = GuidedDecodingParams(json=json_schema)
sampling_params = SamplingParams(
    temperature=0.7,
    max_tokens=150,
    guided_decoding=guided_params,
)

prompt = "Generate a person's information:"
outputs = llm.generate(prompt, sampling_params)
print(outputs[0].outputs[0].text)  # Valid JSON output
```

### Batch Generation with Different Parameters

```python
from vllm import LLM, SamplingParams

llm = LLM(model="microsoft/DialoGPT-medium")

prompts = ["Creative story:", "Technical explanation:", "Casual conversation:"]

# A different SamplingParams object for each prompt
sampling_params = [
    SamplingParams(temperature=1.2, top_p=0.9),   # Creative
    SamplingParams(temperature=0.3, top_p=0.95),  # Technical
    SamplingParams(temperature=0.8, top_p=0.9),   # Casual
]

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(f"{output.prompt} -> {output.outputs[0].text}")
```

### Using Pre-tokenized Input

```python
from vllm import LLM, SamplingParams
from vllm.inputs import TokensPrompt

llm = LLM(model="microsoft/DialoGPT-medium")

# Wrap pre-tokenized input in a TokensPrompt (useful for custom tokenization)
prompt = TokensPrompt(prompt_token_ids=[1, 2, 3, 4, 5])
sampling_params = SamplingParams(temperature=0.8)

outputs = llm.generate(prompt, sampling_params)
print(outputs[0].outputs[0].text)
```

## Types

```python { .api }
class RequestOutput:
    request_id: str
    prompt: Optional[str]
    prompt_token_ids: list[int]
    prompt_logprobs: Optional[PromptLogprobs]
    outputs: list[CompletionOutput]
    finished: bool
    metrics: Optional[RequestMetrics]
    lora_request: Optional[LoRARequest]

class CompletionOutput:
    index: int
    text: str
    token_ids: list[int]
    cumulative_logprob: Optional[float]
    logprobs: Optional[SampleLogprobs]
    finish_reason: Optional[str]  # "stop", "length", "abort"
    stop_reason: Union[int, str, None]  # Specific stop token/string
    lora_request: Optional[LoRARequest]

class BeamSearchOutput:
    sequences: list[BeamSearchSequence]
    finished: bool

class BeamSearchSequence:
    text: str
    token_ids: list[int]
    cumulative_logprob: float
```