# Response Synthesis

Response generation strategies for combining retrieved context into coherent answers, covering several summarization approaches and synthesis modes.

## Capabilities

### Response Synthesizer Factory

Factory function for creating response synthesizers with different strategies and configurations.

```python { .api }
def get_response_synthesizer(
    response_mode="compact",
    service_context=None,
    text_qa_template=None,
    refine_template=None,
    summary_template=None,
    simple_template=None,
    use_async=False,
    streaming=False,
    structured_answer_filtering=False,
    **kwargs
):
    """
    Create a response synthesizer with the specified mode and configuration.

    Args:
        response_mode: Synthesis strategy ("compact", "refine", "tree_summarize",
            "simple_summarize", "accumulate", "generation")
        service_context: Service context (deprecated; use Settings instead)
        text_qa_template: Template for question answering
        refine_template: Template for iterative refinement
        summary_template: Template for summarization
        simple_template: Template for simple responses
        use_async: Enable asynchronous processing
        streaming: Enable streaming responses
        structured_answer_filtering: Filter responses for structured output

    Returns:
        BaseSynthesizer: Configured response synthesizer
    """
```

**Usage Example:**

```python
from llama_index.core import get_response_synthesizer

# Compact mode (default) - combines chunks efficiently
synthesizer = get_response_synthesizer(
    response_mode="compact",
    streaming=True
)

# Tree summarize mode - hierarchical summarization
tree_synthesizer = get_response_synthesizer(
    response_mode="tree_summarize",
    use_async=True
)

# Refine mode - iterative improvement
refine_synthesizer = get_response_synthesizer(
    response_mode="refine",
    structured_answer_filtering=True
)

# Use with a query engine
query_engine = index.as_query_engine(
    response_synthesizer=synthesizer
)
```

### Compact Response Synthesis

Efficient synthesis mode that packs retrieved chunks into larger contexts to minimize LLM calls.

```python { .api }
class CompactAndRefine:
    """
    Compact-and-refine synthesis strategy.

    Combines chunks into larger contexts, then applies refinement for the final answer.

    Args:
        text_qa_template: Template for initial question answering
        refine_template: Template for iterative refinement
        max_prompt_size: Maximum prompt size in tokens
        callback_manager: Callback manager for events
        use_async: Enable asynchronous processing
        streaming: Enable streaming responses
    """
    def __init__(
        self,
        text_qa_template=None,
        refine_template=None,
        max_prompt_size=None,
        callback_manager=None,
        use_async=False,
        streaming=False,
        **kwargs
    ): ...

    def synthesize(
        self,
        query,
        nodes,
        additional_source_nodes=None,
        **kwargs
    ):
        """
        Synthesize a response from the query and retrieved nodes.

        Args:
            query: User query or QueryBundle
            nodes: List of retrieved NodeWithScore objects
            additional_source_nodes: Extra context nodes

        Returns:
            Response: Synthesized response with sources
        """

    async def asynthesize(self, query, nodes, **kwargs):
        """Async version of synthesize."""
```
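
The packing step behind compact mode can be illustrated with a short, self-contained sketch. This is not the library's implementation: the helper name and the character budget are illustrative assumptions (the real synthesizer budgets by tokens against the model's context window), but the greedy merge idea is the same.

```python
# Illustrative sketch only: "compact" mode greedily merges chunk texts into
# the fewest contexts that fit a size budget, so fewer LLM calls are needed
# before the refine step.
def compact_chunks(texts, max_chars=1000):
    contexts, current = [], ""
    for text in texts:
        candidate = f"{current}\n\n{text}" if current else text
        if len(candidate) <= max_chars:
            current = candidate  # still fits: keep packing
        else:
            if current:
                contexts.append(current)  # flush the full context
            current = text
    if current:
        contexts.append(current)
    return contexts

packed = compact_chunks(["a" * 400, "b" * 400, "c" * 400], max_chars=1000)
print(len(packed))  # 2 contexts instead of 3 separate calls
```

With three 400-character chunks and a 1000-character budget, the first two chunks share one context and the third gets its own, halving the number of question-answering calls.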

### Tree Summarization

Hierarchical summarization strategy that builds responses bottom-up through a tree structure.

```python { .api }
class TreeSummarize:
    """
    Tree-based summarization synthesis.

    Recursively summarizes chunks in a tree structure to produce comprehensive responses.

    Args:
        summary_template: Template for summarization steps
        text_qa_template: Template for final question answering
        use_async: Enable asynchronous processing
        callback_manager: Callback manager for events
    """
    def __init__(
        self,
        summary_template=None,
        text_qa_template=None,
        use_async=False,
        callback_manager=None,
        **kwargs
    ): ...

    def synthesize(self, query, nodes, **kwargs):
        """Tree-based synthesis of the response."""

    async def asynthesize(self, query, nodes, **kwargs):
        """Async tree synthesis."""
```

**Usage Example:**

```python
from llama_index.core.response_synthesizers import TreeSummarize
from llama_index.core.prompts import PromptTemplate

# Custom summarization template
summary_template = PromptTemplate(
    "Context information is below:\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Summarize the key points relevant to: {query_str}\n"
    "Summary: "
)

tree_synthesizer = TreeSummarize(
    summary_template=summary_template,
    use_async=True
)

# Use with a query engine
query_engine = index.as_query_engine(
    response_synthesizer=tree_synthesizer,
    similarity_top_k=10  # More chunks for tree processing
)

response = query_engine.query("What are the main themes in the documents?")
```
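
The bottom-up recursion can be sketched in a few lines. This is an illustrative sketch, not the library's code: the stub `llm` callable and the fixed fan-in of 2 are assumptions (the real fan-in is determined by how many summaries fit in the context window).

```python
# Illustrative sketch only: tree_summarize summarizes groups of chunks, then
# summarizes the summaries, repeating until a single answer remains.
def tree_summarize_sketch(llm, query, texts, fan_in=2):
    if len(texts) == 1:
        return texts[0]
    next_level = [
        llm(f"Summarize for '{query}':\n" + "\n".join(texts[i:i + fan_in]))
        for i in range(0, len(texts), fan_in)
    ]
    return tree_summarize_sketch(llm, query, next_level, fan_in)

calls = []
def fake_llm(prompt):
    calls.append(prompt)
    return f"summary#{len(calls)}"

result = tree_summarize_sketch(fake_llm, "main themes", ["c1", "c2", "c3", "c4"])
print(result, len(calls))  # summary#3 after 3 calls: two leaf summaries, one root
```

Because each level's calls are independent, they parallelize well, which is why `use_async=True` pays off for this mode.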

### Iterative Refinement

Refine synthesis strategy that iteratively improves a response using additional context.

```python { .api }
class Refine:
    """
    Iterative refinement synthesis strategy.

    Starts with an initial response and refines it using additional retrieved chunks.

    Args:
        text_qa_template: Template for initial response
        refine_template: Template for refinement steps
        callback_manager: Callback manager for events
        streaming: Enable streaming responses
    """
    def __init__(
        self,
        text_qa_template=None,
        refine_template=None,
        callback_manager=None,
        streaming=False,
        **kwargs
    ): ...

    def synthesize(self, query, nodes, **kwargs):
        """Iteratively refine the response using retrieved nodes."""

    async def asynthesize(self, query, nodes, **kwargs):
        """Async iterative refinement."""
```

**Usage Example:**

```python
from llama_index.core.response_synthesizers import Refine
from llama_index.core.prompts import PromptTemplate

# Custom refinement template
refine_template = PromptTemplate(
    "The original query is as follows: {query_str}\n"
    "We have provided an existing answer: {existing_answer}\n"
    "We have the opportunity to refine the existing answer "
    "(only if needed) with some more context below.\n"
    "------------\n"
    "{context_msg}\n"
    "------------\n"
    "Given the new context, refine the original answer to better "
    "answer the query. If the context isn't useful, return the original answer.\n"
    "Refined Answer: "
)

refine_synthesizer = Refine(
    refine_template=refine_template,
    streaming=True
)

query_engine = index.as_query_engine(
    response_synthesizer=refine_synthesizer
)
```
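
The control flow behind the templates above can be sketched directly. This is an illustrative sketch, not the library's implementation; the stub `llm` callable and prompt wording are assumptions.

```python
# Illustrative sketch only: refine answers from the first chunk, then revises
# the answer once per remaining chunk, feeding each prior answer back in.
def refine_sketch(llm, query, texts):
    answer = llm(f"Context: {texts[0]}\nQuestion: {query}\nAnswer:")
    for text in texts[1:]:
        answer = llm(
            f"Query: {query}\nExisting answer: {answer}\n"
            f"New context: {text}\nRefined answer:"
        )
    return answer

calls = []
def fake_llm(prompt):
    calls.append(prompt)
    return f"draft#{len(calls)}"

final = refine_sketch(fake_llm, "policy?", ["chunk A", "chunk B", "chunk C"])
print(final, len(calls))  # draft#3 after 3 sequential calls
```

Note the calls are inherently sequential (each depends on the previous answer), so refine cannot be parallelized the way tree or accumulate modes can.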

### Simple Summarization

Direct summarization strategy for straightforward responses without complex processing.

```python { .api }
class SimpleSummarize:
    """
    Simple summarization synthesis.

    Directly summarizes all retrieved context in a single step.

    Args:
        text_qa_template: Template for question answering
        callback_manager: Callback manager for events
        streaming: Enable streaming responses
    """
    def __init__(
        self,
        text_qa_template=None,
        callback_manager=None,
        streaming=False,
        **kwargs
    ): ...

    def synthesize(self, query, nodes, **kwargs):
        """Simple one-step summarization."""
```
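
The "single step" can be made concrete with a sketch. This is illustrative only (the stub `llm` and prompt wording are assumptions): all retrieved text is concatenated and sent in exactly one LLM call, which is fast but bounded by the model's context window.

```python
# Illustrative sketch only: simple_summarize concatenates every chunk and
# makes exactly one LLM call, regardless of how many chunks were retrieved.
def simple_summarize_sketch(llm, query, texts):
    context = "\n\n".join(texts)
    return llm(f"Context information is below:\n{context}\n\nQuery: {query}\nAnswer:")

calls = []
def fake_llm(prompt):
    calls.append(prompt)
    return "single-shot answer"

answer = simple_summarize_sketch(fake_llm, "overview?", ["alpha", "beta", "gamma"])
print(answer, len(calls))  # one call regardless of chunk count
```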

### Accumulate Responses

Accumulation strategy that concatenates individual responses from each retrieved chunk.

```python { .api }
class Accumulate:
    """
    Accumulate synthesis strategy.

    Generates individual responses for each chunk and concatenates them.

    Args:
        text_qa_template: Template for individual chunk responses
        output_cls: Structured output class
        callback_manager: Callback manager for events
        use_async: Enable asynchronous processing
    """
    def __init__(
        self,
        text_qa_template=None,
        output_cls=None,
        callback_manager=None,
        use_async=False,
        **kwargs
    ): ...

    def synthesize(self, query, nodes, **kwargs):
        """Accumulate responses from individual chunks."""
```

**Usage Example:**

```python
from llama_index.core.response_synthesizers import Accumulate

accumulate_synthesizer = Accumulate(
    use_async=True  # Process chunks in parallel
)

# Good for gathering diverse perspectives
query_engine = index.as_query_engine(
    response_synthesizer=accumulate_synthesizer,
    similarity_top_k=5
)

response = query_engine.query("What are different opinions on this topic?")
print(response.response)  # Contains accumulated individual responses
```
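
The fan-out pattern can be sketched in two lines. This is illustrative only (the stub `llm` and the separator string are assumptions): each chunk is answered independently and the answers are joined, which is why the per-chunk calls parallelize cleanly under `use_async=True`.

```python
# Illustrative sketch only: accumulate answers each chunk independently,
# then joins the individual answers into one response string.
def accumulate_sketch(llm, query, texts, separator="\n---------------------\n"):
    answers = [llm(f"Context: {t}\nQuestion: {query}\nAnswer:") for t in texts]
    return separator.join(answers)

responses = accumulate_sketch(
    lambda prompt: f"answer to <{prompt.splitlines()[0]}>",
    "opinions?",
    ["source A", "source B"],
)
print(responses.count("answer to"))  # 2 independent answers in one string
```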

### Generation Strategy

Direct generation strategy that creates responses without using retrieved context.

```python { .api }
class Generation:
    """
    Generation synthesis strategy.

    Generates responses directly from the query without using retrieved context.

    Args:
        simple_template: Template for direct generation
        callback_manager: Callback manager for events
        streaming: Enable streaming responses
    """
    def __init__(
        self,
        simple_template=None,
        callback_manager=None,
        streaming=False,
        **kwargs
    ): ...

    def synthesize(self, query, nodes, **kwargs):
        """Generate a response directly from the query."""
```
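
What "without using retrieved context" means in practice: the nodes are accepted for interface compatibility but never reach the prompt. A minimal sketch (stub `llm` is an assumption, not the library's API) makes this useful as a no-retrieval baseline when evaluating a RAG pipeline:

```python
# Illustrative sketch only: generation mode prompts the LLM with the query
# alone; retrieved nodes are accepted but ignored.
def generation_sketch(llm, query, nodes=None):
    # `nodes` is kept for interface compatibility but never used
    return llm(f"Query: {query}\nAnswer:")

answer = generation_sketch(lambda prompt: f"echo:{prompt}", "What is RAG?", nodes=["ignored"])
print("ignored" in answer)  # False: the retrieved context never reaches the prompt
```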

### Base Synthesizer Interface

Base class for implementing custom response synthesis strategies.

```python { .api }
class BaseSynthesizer:
    """
    Base class for response synthesizers.

    Args:
        callback_manager: Callback manager for events
        streaming: Enable streaming responses
    """
    def __init__(
        self,
        callback_manager=None,
        streaming=False,
        **kwargs
    ): ...

    def synthesize(
        self,
        query,
        nodes,
        additional_source_nodes=None,
        **kwargs
    ):
        """
        Synthesize a response from the query and nodes.

        Args:
            query: User query string or QueryBundle
            nodes: List of NodeWithScore objects from retrieval
            additional_source_nodes: Extra source nodes for context

        Returns:
            Response: Generated response with metadata
        """

    async def asynthesize(self, query, nodes, **kwargs):
        """Async version of the synthesize method."""

    def get_prompts(self):
        """Get the prompt templates used by the synthesizer."""

    def update_prompts(self, prompts_dict):
        """Update prompt templates."""
```

### Structured Output Synthesis

Advanced synthesis with structured output generation for extracting information in specific formats.

```python { .api }
class StructuredResponseSynthesizer(BaseSynthesizer):
    """
    Structured response synthesizer for typed outputs.

    Args:
        output_cls: Pydantic model class for structured output
        llm: Language model for generation
        text_qa_template: Template for question answering
        streaming: Enable streaming (limited for structured output)
    """
    def __init__(
        self,
        output_cls,
        llm=None,
        text_qa_template=None,
        streaming=False,
        **kwargs
    ): ...

    def synthesize(self, query, nodes, **kwargs):
        """Generate a structured response matching the output_cls schema."""
```

**Structured Output Example:**

```python
from pydantic import BaseModel
from typing import List
from llama_index.core.response_synthesizers import get_response_synthesizer

class SummaryOutput(BaseModel):
    main_points: List[str]
    sentiment: str
    confidence_score: float

# Create structured synthesizer
structured_synthesizer = get_response_synthesizer(
    response_mode="compact",
    output_cls=SummaryOutput,
    structured_answer_filtering=True
)

query_engine = index.as_query_engine(
    response_synthesizer=structured_synthesizer
)

response = query_engine.query("Summarize the main points")
structured_data = response.metadata.get("structured_response")
# structured_data is now a SummaryOutput instance
```

### Custom Synthesis Strategies

Framework for implementing custom response synthesis logic with full control over the generation process.

```python { .api }
class CustomSynthesizer(BaseSynthesizer):
    """
    Custom response synthesizer implementation.

    Args:
        custom_prompt: Custom prompt template
        processing_fn: Custom processing function
        **kwargs: BaseSynthesizer arguments
    """
    def __init__(
        self,
        custom_prompt=None,
        processing_fn=None,
        **kwargs
    ): ...

    def synthesize(self, query, nodes, **kwargs):
        """Custom synthesis logic."""
        context_str = self._prepare_context(nodes)

        if self.processing_fn:
            return self.processing_fn(query, context_str, **kwargs)

        # Default processing
        return self._generate_response(query, context_str)

    def _prepare_context(self, nodes):
        """Prepare a context string from the nodes."""
        return "\n\n".join([node.node.get_content() for node in nodes])

    def _generate_response(self, query, context):
        """Generate a response using the LLM."""
        # Implementation details
        pass
```

**Custom Synthesizer Example:**

```python
from llama_index.core.response_synthesizers import BaseSynthesizer
from llama_index.core.base.response.schema import Response

class FactCheckSynthesizer(BaseSynthesizer):
    """Custom synthesizer that fact-checks responses."""

    def __init__(self, fact_check_threshold=0.8, **kwargs):
        super().__init__(**kwargs)
        self.fact_check_threshold = fact_check_threshold

    def synthesize(self, query, nodes, **kwargs):
        # Generate the initial response
        context_str = "\n\n".join([node.node.get_content() for node in nodes])

        initial_response = self._llm.complete(
            f"Context: {context_str}\n\nQuestion: {query}\n\nAnswer:"
        )

        # Fact-check the response
        fact_check_score = self._fact_check(initial_response.text, context_str)

        if fact_check_score < self.fact_check_threshold:
            # Generate a more conservative response
            refined_response = self._llm.complete(
                f"Based only on the provided context, answer: {query}\n"
                f"Context: {context_str}\n"
                f"Conservative Answer:"
            )
            response_text = refined_response.text
        else:
            response_text = initial_response.text

        return Response(
            response=response_text,
            source_nodes=nodes,
            metadata={"fact_check_score": fact_check_score}
        )

    def _fact_check(self, response_text, context_str):
        # Custom fact-checking logic
        # Return a confidence score in [0, 1]
        return 0.9  # Placeholder

# Use the custom synthesizer
fact_check_synthesizer = FactCheckSynthesizer(
    fact_check_threshold=0.85,
    streaming=False
)

query_engine = index.as_query_engine(
    response_synthesizer=fact_check_synthesizer
)
```

### Response Metadata and Source Tracking

Response objects carry the synthesized text along with comprehensive metadata and source attribution.

```python { .api }
class Response:
    """
    Response object with synthesis results and metadata.

    Attributes:
        response: Generated response text
        source_nodes: List of source nodes used
        metadata: Additional response metadata
    """
    response: str
    source_nodes: List[NodeWithScore]
    metadata: Dict[str, Any]

    def get_formatted_sources(self, length=100):
        """Get formatted source excerpts."""

    def __str__(self):
        """String representation of the response."""

class StreamingResponse:
    """
    Streaming response for real-time synthesis.

    Methods:
        response_gen: Generator yielding response tokens
        get_response: Get the complete response object
        print_response_stream: Print the streaming response
    """
    def response_gen(self):
        """Generate response tokens in real time."""

    def get_response(self):
        """Get the final complete response."""

    def print_response_stream(self):
        """Print the response as it is generated."""
```

**Response Usage Example:**

```python
# Regular response
response = query_engine.query("What is machine learning?")
print(f"Response: {response.response}")
print(f"Sources: {len(response.source_nodes)}")
print(f"Metadata: {response.metadata}")

# Streaming response
streaming_engine = index.as_query_engine(
    response_synthesizer=get_response_synthesizer(streaming=True)
)

streaming_response = streaming_engine.query("Explain neural networks")
streaming_response.print_response_stream()
```