# Postprocessors

Components that process and refine retrieved nodes: similarity filtering, keyword filtering, reranking, context enhancement, recency scoring, PII masking, and metadata replacement. Each postprocessor takes a list of scored nodes and returns a filtered, reordered, or modified list, which improves relevance and removes irrelevant content.

## Capabilities

### Base Postprocessor Interface

Foundation interface for all postprocessing operations. Subclasses implement `_postprocess_nodes`; callers use `postprocess_nodes`.

```python { .api }
class BaseNodePostprocessor:
    """
    Base interface for node postprocessing operations.

    Parameters:
    - callback_manager: Optional[CallbackManager], callback management system
    """
    def __init__(self, callback_manager: Optional[CallbackManager] = None): ...

    def postprocess_nodes(
        self,
        nodes: List[NodeWithScore],
        query_bundle: Optional[QueryBundle] = None
    ) -> List[NodeWithScore]:
        """
        Process and refine retrieved nodes.

        Parameters:
        - nodes: List[NodeWithScore], nodes to postprocess
        - query_bundle: Optional[QueryBundle], original query for context

        Returns:
        - List[NodeWithScore], processed and refined nodes
        """

    def _postprocess_nodes(
        self,
        nodes: List[NodeWithScore],
        query_bundle: Optional[QueryBundle] = None
    ) -> List[NodeWithScore]:
        """Internal postprocessing method to be implemented by subclasses."""
```
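
The split between the public `postprocess_nodes` method and the `_postprocess_nodes` hook can be sketched without the library. `ScoredNode`, `SketchPostprocessor`, and `TopKPostprocessor` below are hypothetical stand-ins, not llama-index classes, and the sketch leaves out callback handling.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScoredNode:
    """Hypothetical stand-in for NodeWithScore."""
    text: str
    score: float


class SketchPostprocessor:
    """Hypothetical base class: the public method delegates to a subclass hook."""

    def postprocess_nodes(
        self, nodes: List[ScoredNode], query: Optional[str] = None
    ) -> List[ScoredNode]:
        return self._postprocess_nodes(nodes, query)

    def _postprocess_nodes(
        self, nodes: List[ScoredNode], query: Optional[str] = None
    ) -> List[ScoredNode]:
        raise NotImplementedError


class TopKPostprocessor(SketchPostprocessor):
    """Illustrative subclass that keeps the k highest-scoring nodes."""

    def __init__(self, k: int = 2) -> None:
        self.k = k

    def _postprocess_nodes(self, nodes, query=None):
        return sorted(nodes, key=lambda n: n.score, reverse=True)[: self.k]


nodes = [ScoredNode("a", 0.2), ScoredNode("b", 0.9), ScoredNode("c", 0.5)]
top = TopKPostprocessor(k=2).postprocess_nodes(nodes)
print([n.text for n in top])
```

Callers only ever touch `postprocess_nodes`, so shared behavior can live in the base class while each subclass supplies its own refinement logic.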

### Similarity Filtering

Filters nodes based on relevance scores and similarity thresholds to remove low-quality results.

```python { .api }
class SimilarityPostprocessor(BaseNodePostprocessor):
    """
    Postprocessor that filters nodes based on similarity score thresholds.

    Parameters:
    - similarity_cutoff: Optional[float], minimum similarity score to retain nodes
    """
    def __init__(self, similarity_cutoff: Optional[float] = None): ...
```
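
As a rough, library-free illustration of threshold filtering, the sketch below works on plain `(text, score)` tuples. The function name `filter_by_cutoff` is hypothetical. Two choices are assumptions rather than documented behavior: a score equal to the cutoff is kept, and nodes without a score are dropped once a cutoff is set.

```python
from typing import List, Optional, Tuple

Scored = Tuple[str, Optional[float]]  # (text, score); hypothetical stand-in


def filter_by_cutoff(nodes: List[Scored], cutoff: Optional[float]) -> List[Scored]:
    """Keep nodes whose score is at least `cutoff`; no cutoff keeps everything."""
    if cutoff is None:
        return list(nodes)
    # Assumption: unscored nodes are dropped when a cutoff is active.
    return [(text, score) for text, score in nodes
            if score is not None and score >= cutoff]


scored = [
    ("Machine learning algorithms", 0.85),
    ("Deep learning techniques", 0.72),
    ("Unrelated content here", 0.45),
    ("Neural network architectures", 0.78),
]
kept = filter_by_cutoff(scored, cutoff=0.7)
print(len(kept))
```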

### Keyword Filtering

Filters nodes based on keyword inclusion or exclusion criteria for content-based filtering.

```python { .api }
class KeywordNodePostprocessor(BaseNodePostprocessor):
    """
    Postprocessor for keyword-based node filtering.

    Parameters:
    - required_keywords: Optional[List[str]], keywords that must be present
    - exclude_keywords: Optional[List[str]], keywords that must not be present
    - lang: str, language for keyword matching
    """
    def __init__(
        self,
        required_keywords: Optional[List[str]] = None,
        exclude_keywords: Optional[List[str]] = None,
        lang: str = "en"
    ): ...
```
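
The include/exclude logic can be approximated in a few lines of plain Python. This is a sketch under stated assumptions: `keyword_filter` and `tokenize` are hypothetical helpers, matching is done on lowercase word tokens rather than with language-aware processing, and every required keyword must appear for a text to be kept.

```python
import re
from typing import Iterable, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase word tokens; a simplification of language-aware matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_filter(
    texts: Iterable[str],
    required: Iterable[str] = (),
    excluded: Iterable[str] = (),
) -> List[str]:
    """Keep texts that contain all required words and no excluded words."""
    required = [k.lower() for k in required]
    excluded = [k.lower() for k in excluded]
    kept = []
    for text in texts:
        words = tokenize(text)
        if all(k in words for k in required) and not any(k in words for k in excluded):
            kept.append(text)
    return kept


texts = [
    "Machine learning algorithms",
    "Deep learning techniques",
    "Unrelated machine learning spam",
]
result = keyword_filter(texts, required=["machine", "learning"], excluded=["spam"])
print(result)
```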

### Context Enhancement

Enhances nodes by adding adjacent or related content for better context understanding.

```python { .api }
class PrevNextNodePostprocessor(BaseNodePostprocessor):
    """
    Postprocessor that adds previous and next nodes for enhanced context.

    Parameters:
    - docstore: BaseDocumentStore, document store for node relationships
    - num_nodes: int, number of previous/next nodes to include
    - mode: str, inclusion mode (previous, next, or both)
    """
    def __init__(
        self,
        docstore: BaseDocumentStore,
        num_nodes: int = 1,
        mode: str = "both"
    ): ...

class AutoPrevNextNodePostprocessor(BaseNodePostprocessor):
    """
    Automatic previous/next node inclusion with intelligent boundary detection.

    Parameters:
    - docstore: BaseDocumentStore, document store for node relationships
    - num_nodes: int, number of nodes to include in each direction
    """
    def __init__(
        self,
        docstore: BaseDocumentStore,
        num_nodes: int = 1
    ): ...
```
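
To make the `num_nodes` and `mode` parameters concrete, here is a minimal sketch that walks previous/next links in a plain dictionary. `LinkedNode` and `expand_with_neighbors` are hypothetical; a dictionary stands in for the document store, and the output ordering (neighbors placed around each hit, duplicates skipped) is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LinkedNode:
    """Hypothetical node with links to its neighbors in the source document."""
    node_id: str
    text: str
    prev_id: Optional[str] = None
    next_id: Optional[str] = None


def expand_with_neighbors(
    hit_ids: List[str],
    store: Dict[str, LinkedNode],
    num_nodes: int = 1,
    mode: str = "both",
) -> List[LinkedNode]:
    """Return each hit together with up to `num_nodes` neighbors per direction."""
    if mode not in {"previous", "next", "both"}:
        raise ValueError(f"Unsupported mode: {mode}")
    ordered: List[str] = []
    for hit in hit_ids:
        before: List[str] = []
        after: List[str] = []
        current = store[hit]
        if mode in ("previous", "both"):
            for _ in range(num_nodes):
                if current.prev_id is None:
                    break
                current = store[current.prev_id]
                before.insert(0, current.node_id)
        current = store[hit]
        if mode in ("next", "both"):
            for _ in range(num_nodes):
                if current.next_id is None:
                    break
                current = store[current.next_id]
                after.append(current.node_id)
        for node_id in before + [hit] + after:
            if node_id not in ordered:
                ordered.append(node_id)
    return [store[node_id] for node_id in ordered]


store = {
    "n1": LinkedNode("n1", "Intro", None, "n2"),
    "n2": LinkedNode("n2", "Method", "n1", "n3"),
    "n3": LinkedNode("n3", "Results", "n2", "n4"),
    "n4": LinkedNode("n4", "Conclusion", "n3", None),
}
both = [n.node_id for n in expand_with_neighbors(["n2"], store, 1, "both")]
only_next = [n.node_id for n in expand_with_neighbors(["n2"], store, 2, "next")]
print(both, only_next)
```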

### Long Context Optimization

Reorders and optimizes nodes for long context scenarios to improve model performance.

```python { .api }
class LongContextReorder(BaseNodePostprocessor):
    """
    Reorders nodes to optimize performance in long context scenarios.

    Long context reordering places the most relevant information at the beginning
    and end of the context window where language models pay more attention.
    """
    def __init__(self): ...
```
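
One way to realize the "most relevant at the beginning and end" ordering described above is to alternate ranked items between the front and the back of the list. The sketch below is an illustration of that idea, not the library's implementation; `reorder_for_long_context` is a hypothetical name and nodes are plain `(text, score)` tuples.

```python
from typing import List, Tuple

Scored = Tuple[str, float]  # (text, relevance score); hypothetical stand-in


def reorder_for_long_context(nodes: List[Scored]) -> List[Scored]:
    """Place the strongest items at both ends and the weakest in the middle."""
    ranked = sorted(nodes, key=lambda item: item[1], reverse=True)
    front: List[Scored] = []
    back: List[Scored] = []
    for rank, item in enumerate(ranked):
        # Even ranks fill the front, odd ranks fill the back.
        (front if rank % 2 == 0 else back).append(item)
    # Reverse the back half so the second-best item ends up last.
    return front + back[::-1]


docs = [("a", 0.9), ("b", 0.8), ("c", 0.7), ("d", 0.6), ("e", 0.5)]
reordered = reorder_for_long_context(docs)
print([text for text, _ in reordered])
```

With five items ranked a through e, the best item lands first, the second best lands last, and the lowest-ranked items sit in the middle.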

### Recency Processing

Applies recency-based scoring and filtering to prioritize recent or time-relevant content.

```python { .api }
class FixedRecencyPostprocessor(BaseNodePostprocessor):
    """
    Postprocessor that applies fixed recency scoring based on date metadata.

    Parameters:
    - top_k: int, number of top recent nodes to return
    - date_key: str, metadata key containing date information
    - service_context: Optional[ServiceContext], service context for processing
    """
    def __init__(
        self,
        top_k: int = 1,
        date_key: str = "date",
        service_context: Optional[ServiceContext] = None
    ): ...

class EmbeddingRecencyPostprocessor(BaseNodePostprocessor):
    """
    Recency postprocessor using embedding-based similarity for temporal relevance.

    Parameters:
    - embed_model: Optional[BaseEmbedding], embedding model for similarity computation
    - similarity_cutoff: float, minimum similarity threshold
    - date_key: str, metadata key containing date information
    - service_context: Optional[ServiceContext], service context for processing
    """
    def __init__(
        self,
        embed_model: Optional[BaseEmbedding] = None,
        similarity_cutoff: float = 0.7,
        date_key: str = "date",
        service_context: Optional[ServiceContext] = None
    ): ...

class TimeWeightedPostprocessor(BaseNodePostprocessor):
    """
    Time-weighted relevance scoring that balances content relevance with recency.

    Parameters:
    - time_decay: float, decay factor for time-based scoring
    - time_access_refresh: bool, whether to refresh access times
    - top_k: int, number of top nodes to return
    """
    def __init__(
        self,
        time_decay: float = 0.99,
        time_access_refresh: bool = True,
        top_k: int = 1
    ): ...
```
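
The exact scoring formula is not given here, so the following is only a sketch of how a decay factor could balance relevance against age. It assumes the relevance score is multiplied by `time_decay` raised to the number of hours since the node was last seen; `time_weighted_score` is a hypothetical helper, not a library function.

```python
from datetime import datetime, timedelta


def time_weighted_score(
    score: float, last_seen: datetime, now: datetime, time_decay: float = 0.99
) -> float:
    """Assumed formula: relevance shrinks by `time_decay` for every elapsed hour."""
    hours_elapsed = max((now - last_seen).total_seconds() / 3600.0, 0.0)
    return score * (time_decay ** hours_elapsed)


now = datetime(2024, 1, 3, 12, 0, 0)
candidates = [
    ("old", 0.80, now - timedelta(hours=48)),  # more relevant but two days old
    ("fresh", 0.75, now),                      # slightly less relevant, just seen
]
ranked = sorted(
    candidates,
    key=lambda c: time_weighted_score(c[1], c[2], now),
    reverse=True,
)
print([name for name, _, _ in ranked])
```

Under this assumed formula a decay factor close to 1.0 penalizes age gently, while smaller values make recency dominate the ranking.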

### Privacy & Security Processing

Removes or masks personally identifiable information (PII) and sensitive data from retrieved content.

```python { .api }
class PIINodePostprocessor(BaseNodePostprocessor):
    """
    Postprocessor for detecting and removing personally identifiable information.

    Parameters:
    - pii_node_info_key: str, metadata key for storing PII information
    - pii_str_tmpl: str, template for PII replacement strings
    - service_context: Optional[ServiceContext], service context for processing
    """
    def __init__(
        self,
        pii_node_info_key: str = "__pii_node_info__",
        pii_str_tmpl: str = "[PII_REMOVED]",
        service_context: Optional[ServiceContext] = None
    ): ...

class NERPIINodePostprocessor(BaseNodePostprocessor):
    """
    Named Entity Recognition-based PII detection and removal postprocessor.

    Parameters:
    - pii_node_info_key: str, metadata key for PII information
    - pii_str_tmpl: str, template for PII replacement
    - ner_model_name: str, name of NER model to use
    - service_context: Optional[ServiceContext], service context for processing
    """
    def __init__(
        self,
        pii_node_info_key: str = "__pii_node_info__",
        pii_str_tmpl: str = "[PII_REMOVED]",
        ner_model_name: str = "StanfordAIMI/stanford-deidentifier-base",
        service_context: Optional[ServiceContext] = None
    ): ...
```
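
For intuition about masking, the sketch below swaps in a much simpler technique than the classes above: two regular expressions for email addresses and US-style phone numbers. It does not use an LLM or an NER model, so it will miss names and many other PII types. `mask_pii` and both patterns are hypothetical; the returned list of matched values loosely mirrors the idea of keeping PII information alongside the node.

```python
import re
from typing import List, Tuple

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def mask_pii(text: str, placeholder: str = "[PII_REMOVED]") -> Tuple[str, List[str]]:
    """Replace matched emails and phone numbers; return masked text and matches."""
    found: List[str] = []

    def _replace(match: "re.Match[str]") -> str:
        found.append(match.group(0))
        return placeholder

    masked = text
    for pattern in (EMAIL, PHONE):
        masked = pattern.sub(_replace, masked)
    return masked, found


masked_email, email_hits = mask_pii("Contact John Doe at john.doe@email.com for more info")
masked_phone, phone_hits = mask_pii("The phone number is 555-123-4567")
print(masked_email)
print(masked_phone)
```

Note that the name "John Doe" survives in the first string, which is the kind of gap an NER-based approach is meant to close.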

### Reranking Systems

Advanced reranking using language models and specialized algorithms to improve result ordering.

```python { .api }
class LLMRerank(BaseNodePostprocessor):
    """
    LLM-based reranking postprocessor for improved result ordering.

    Parameters:
    - choice_batch_size: int, batch size for LLM processing
    - top_n: int, number of top nodes to return after reranking
    - service_context: Optional[ServiceContext], service context for LLM operations
    - choice_select_prompt: Optional[PromptTemplate], prompt for node selection
    - choice_batch_select_prompt: Optional[PromptTemplate], prompt for batch selection
    - llm: Optional[LLM], language model for reranking
    """
    def __init__(
        self,
        choice_batch_size: int = 10,
        top_n: int = 10,
        service_context: Optional[ServiceContext] = None,
        choice_select_prompt: Optional[PromptTemplate] = None,
        choice_batch_select_prompt: Optional[PromptTemplate] = None,
        llm: Optional[LLM] = None
    ): ...

class StructuredLLMRerank(BaseNodePostprocessor):
    """
    Structured LLM reranking with explicit scoring criteria and rationale.

    Parameters:
    - llm: Optional[LLM], language model for structured reranking
    - top_n: int, number of top nodes to return
    - choice_batch_size: int, batch size for processing
    """
    def __init__(
        self,
        llm: Optional[LLM] = None,
        top_n: int = 10,
        choice_batch_size: int = 10
    ): ...

class SentenceTransformerRerank(BaseNodePostprocessor):
    """
    Sentence transformer-based reranking for semantic similarity.

    Parameters:
    - model: str, sentence transformer model name
    - top_n: int, number of top nodes to return
    - device: Optional[str], device for model computation (cpu, cuda)
    - keep_retrieval_score: bool, whether to preserve original retrieval scores
    """
    def __init__(
        self,
        model: str = "cross-encoder/ms-marco-MiniLM-L-2-v2",
        top_n: int = 10,
        device: Optional[str] = None,
        keep_retrieval_score: bool = False
    ): ...
```
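
All three rerankers follow the same outline: score each candidate against the query, sort, and keep `top_n`. The sketch below shows that outline with a toy word-overlap scorer standing in for an LLM or cross-encoder. `rerank`, `overlap_score`, and the dictionary layout are hypothetical, and how `keep_retrieval_score` stores the original score is an assumption.

```python
import re
from typing import Callable, Dict, List, Tuple


def overlap_score(query: str, text: str) -> float:
    """Toy relevance: fraction of query words that also appear in the text."""
    query_words = set(re.findall(r"[a-z0-9]+", query.lower()))
    text_words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def rerank(
    query: str,
    candidates: List[Tuple[str, float]],
    score_fn: Callable[[str, str], float],
    top_n: int,
    keep_retrieval_score: bool = False,
) -> List[Dict[str, object]]:
    """Rescore every candidate, sort by the new score, and keep the best top_n."""
    rescored = []
    for text, retrieval_score in candidates:
        entry: Dict[str, object] = {"text": text, "score": score_fn(query, text)}
        if keep_retrieval_score:
            entry["retrieval_score"] = retrieval_score  # assumed storage location
        rescored.append(entry)
    rescored.sort(key=lambda e: e["score"], reverse=True)
    return rescored[:top_n]


candidates = [
    ("Deep learning techniques", 0.90),
    ("Machine learning algorithms", 0.50),
    ("Unrelated content here", 0.70),
]
best = rerank("machine learning algorithms", candidates, overlap_score,
              top_n=2, keep_retrieval_score=True)
print([entry["text"] for entry in best])
```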

### Embedding Optimization

Optimizes embedding-based operations and enhances semantic understanding of retrieved content.

```python { .api }
class SentenceEmbeddingOptimizer(BaseNodePostprocessor):
    """
    Optimizer for sentence embeddings to improve semantic retrieval quality.

    Parameters:
    - embed_model: Optional[BaseEmbedding], embedding model for optimization
    - percentile_cutoff: Optional[float], percentile cutoff for optimization
    - threshold_cutoff: Optional[float], absolute threshold for optimization
    - mode: str, optimization mode (percentile, threshold, or auto)
    """
    def __init__(
        self,
        embed_model: Optional[BaseEmbedding] = None,
        percentile_cutoff: Optional[float] = None,
        threshold_cutoff: Optional[float] = None,
        mode: str = "percentile"
    ): ...
```
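
A percentile cutoff can be read as "keep only the top fraction of sentences most similar to the query". The sketch below assumes that reading and replaces embeddings with a Jaccard word-overlap score so it runs without a model. `shorten_text`, `words`, and the sentence splitter are hypothetical simplifications.

```python
import re
from typing import List, Set


def words(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Word-overlap similarity; a stand-in for embedding similarity."""
    return len(a & b) / len(a | b) if a | b else 0.0


def shorten_text(query: str, text: str, percentile_cutoff: float = 0.5) -> str:
    """Keep the top fraction of sentences by similarity, in original order."""
    sentences: List[str] = [
        s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()
    ]
    query_words = words(query)
    scored = [(jaccard(query_words, words(s)), i, s) for i, s in enumerate(sentences)]
    keep = max(1, int(len(sentences) * percentile_cutoff))
    top = sorted(scored, key=lambda item: item[0], reverse=True)[:keep]
    top.sort(key=lambda item: item[1])  # restore document order
    return " ".join(sentence for _, _, sentence in top)


text = (
    "Postprocessors filter retrieved nodes. The weather was pleasant yesterday. "
    "Reranking reorders nodes by relevance. Lunch is served at noon."
)
shortened = shorten_text("how do postprocessors filter and rerank nodes", text, 0.5)
print(shortened)
```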

### Metadata Processing

Processes and transforms node metadata to enhance content understanding and presentation.

```python { .api }
class MetadataReplacementPostProcessor(BaseNodePostprocessor):
    """
    Postprocessor for replacing and transforming node metadata.

    Parameters:
    - target_metadata_key: str, metadata key to replace or transform
    - new_metadata_key: str, new key name for transformed metadata
    - replacement_function: Optional[Callable], function for metadata transformation
    """
    def __init__(
        self,
        target_metadata_key: str,
        new_metadata_key: str = "new_metadata",
        replacement_function: Optional[Callable] = None
    ): ...
```
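
The three parameters above suggest a "read one key, transform, write another key" operation. The sketch below is one possible reading, operating on a plain dictionary: it assumes the value is moved from the target key to the new key and that a missing target key leaves the metadata unchanged. `replace_metadata` is a hypothetical helper, not a library function.

```python
from typing import Any, Callable, Dict, Optional


def replace_metadata(
    metadata: Dict[str, Any],
    target_key: str,
    new_key: str = "new_metadata",
    fn: Optional[Callable[[Any], Any]] = None,
) -> Dict[str, Any]:
    """Return a copy with `target_key` moved to `new_key`, optionally transformed."""
    updated = dict(metadata)  # never mutate the caller's dictionary
    if target_key not in updated:
        return updated
    value = updated.pop(target_key)
    updated[new_key] = fn(value) if fn is not None else value
    return updated


original = {"window": "  surrounding text  ", "page": 3}
transformed = replace_metadata(original, "window", "context", str.strip)
print(transformed)
```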

### Document Relevance Processing

Advanced relevance scoring and document-level processing for improved result quality.

```python { .api }
class DocumentWithRelevance:
    """
    Document wrapper with relevance scoring for postprocessing operations.

    Parameters:
    - document: Document, the original document
    - relevance_score: float, computed relevance score
    - metadata: Optional[dict], additional relevance metadata
    """
    def __init__(
        self,
        document: Document,
        relevance_score: float,
        metadata: Optional[dict] = None
    ): ...

    @property
    def text(self) -> str:
        """Get document text content."""

    @property
    def doc_id(self) -> str:
        """Get document identifier."""
```

## Usage Examples

### Basic Similarity Filtering

```python
from llama_index.core.postprocessor import SimilarityPostprocessor
from llama_index.core.schema import NodeWithScore, TextNode

# Create test nodes with scores
nodes = [
    NodeWithScore(node=TextNode(text="Machine learning algorithms"), score=0.85),
    NodeWithScore(node=TextNode(text="Deep learning techniques"), score=0.72),
    NodeWithScore(node=TextNode(text="Unrelated content here"), score=0.45),
    NodeWithScore(node=TextNode(text="Neural network architectures"), score=0.78)
]

# Filter by similarity threshold
similarity_filter = SimilarityPostprocessor(similarity_cutoff=0.7)
filtered_nodes = similarity_filter.postprocess_nodes(nodes)

print(f"Original nodes: {len(nodes)}")
print(f"Filtered nodes: {len(filtered_nodes)}")
for node in filtered_nodes:
    print(f"Score: {node.score:.2f}, Text: {node.text}")
```

### Keyword-Based Filtering

```python
from llama_index.core.postprocessor import KeywordNodePostprocessor

# Keep nodes that contain the required keywords and none of the excluded ones
keyword_filter = KeywordNodePostprocessor(
    required_keywords=["machine", "learning"],
    exclude_keywords=["unrelated", "spam"]
)

# Reuses `nodes` from the similarity filtering example
filtered_by_keywords = keyword_filter.postprocess_nodes(nodes)
print(f"Keyword filtered nodes: {len(filtered_by_keywords)}")
```

### LLM-Based Reranking

```python
from llama_index.core.postprocessor import LLMRerank
from llama_index.core.llms import MockLLM
from llama_index.core.schema import QueryBundle

# Initialize LLM reranker
llm = MockLLM()
reranker = LLMRerank(
    llm=llm,
    top_n=3,
    choice_batch_size=5
)

# Rerank nodes based on relevance (reuses `nodes` from the first example)
reranked_nodes = reranker.postprocess_nodes(
    nodes,
    query_bundle=QueryBundle(query_str="What is machine learning?")
)

print("Reranked results:")
for i, node in enumerate(reranked_nodes):
    print(f"{i+1}. Score: {node.score:.2f}, Text: {node.text}")
```

### Context Enhancement with Previous/Next Nodes

```python
from llama_index.core.postprocessor import PrevNextNodePostprocessor
from llama_index.core.storage.docstore import SimpleDocumentStore

# Setup document store with node relationships
docstore = SimpleDocumentStore()
# Add nodes with relationships to docstore
# docstore.add_documents([...])

# Context enhancement postprocessor
context_enhancer = PrevNextNodePostprocessor(
    docstore=docstore,
    num_nodes=1,
    mode="both"
)

# Add context to retrieved nodes
enhanced_nodes = context_enhancer.postprocess_nodes(nodes)
print("Enhanced nodes with context:")
for node in enhanced_nodes:
    print(f"Enhanced text length: {len(node.text)}")
```

### Recency-Based Processing

```python
from llama_index.core.postprocessor import FixedRecencyPostprocessor
from datetime import datetime, timedelta

# Create nodes with date metadata
recent_nodes = [
    NodeWithScore(
        node=TextNode(
            text="Latest ML research findings",
            metadata={"date": datetime.now().isoformat()}
        ),
        score=0.75
    ),
    NodeWithScore(
        node=TextNode(
            text="Historical ML overview",
            metadata={"date": (datetime.now() - timedelta(days=365)).isoformat()}
        ),
        score=0.80
    )
]

# Prioritize recent content
recency_processor = FixedRecencyPostprocessor(
    top_k=1,
    date_key="date"
)

recent_filtered = recency_processor.postprocess_nodes(recent_nodes)
print("Most recent content:")
for node in recent_filtered:
    print(f"Date: {node.node.metadata['date']}")
    print(f"Text: {node.text}")
```

### PII Removal

```python
from llama_index.core.postprocessor import PIINodePostprocessor

# Nodes with potential PII
pii_nodes = [
    NodeWithScore(
        node=TextNode(text="Contact John Doe at john.doe@email.com for more info"),
        score=0.80
    ),
    NodeWithScore(
        node=TextNode(text="The phone number is 555-123-4567"),
        score=0.75
    )
]

# Remove PII from nodes
pii_remover = PIINodePostprocessor(pii_str_tmpl="[REDACTED]")
sanitized_nodes = pii_remover.postprocess_nodes(pii_nodes)

print("Sanitized content:")
for node in sanitized_nodes:
    print(f"Text: {node.text}")
```

### Long Context Optimization

```python
from llama_index.core.postprocessor import LongContextReorder

# Reorder for long context optimization
long_context_reorder = LongContextReorder()
reordered_nodes = long_context_reorder.postprocess_nodes(nodes)

print("Reordered for long context:")
for i, node in enumerate(reordered_nodes):
    print(f"Position {i}: {node.text[:50]}...")
```

### Sentence Transformer Reranking

```python
from llama_index.core.postprocessor import SentenceTransformerRerank

# Advanced semantic reranking
sentence_reranker = SentenceTransformerRerank(
    model="cross-encoder/ms-marco-MiniLM-L-2-v2",
    top_n=3,
    keep_retrieval_score=True
)

# Note: running this requires the sentence-transformers library to be installed
# reranked_semantic = sentence_reranker.postprocess_nodes(
#     nodes,
#     query_bundle=QueryBundle(query_str="machine learning algorithms")
# )
```

### Chaining Multiple Postprocessors

```python
from llama_index.core.schema import QueryBundle

# Chain multiple postprocessors
# (reuses `nodes`, `llm`, and the imports from the earlier examples)
postprocessors = [
    SimilarityPostprocessor(similarity_cutoff=0.6),
    KeywordNodePostprocessor(required_keywords=["machine", "learning"]),
    LLMRerank(llm=llm, top_n=2)
]

# The reranking step needs the query, so pass it to every stage
query_bundle = QueryBundle(query_str="What is machine learning?")

# Apply postprocessors in sequence
processed_nodes = nodes
for processor in postprocessors:
    processed_nodes = processor.postprocess_nodes(
        processed_nodes, query_bundle=query_bundle
    )

print(f"Final processed nodes: {len(processed_nodes)}")
for node in processed_nodes:
    print(f"Final result: {node.text}")
```
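
The sequential loop above generalizes to a small pipeline helper. The sketch below shows the pattern without the library: each step is a plain function over `(text, score)` tuples, and `run_pipeline`, `cutoff`, `require`, and `top_n` are hypothetical names. Stopping early on an empty list is a design choice of this sketch, not library behavior.

```python
from typing import Callable, List, Sequence, Tuple

Scored = Tuple[str, float]  # (text, score); hypothetical stand-in
Step = Callable[[List[Scored], str], List[Scored]]


def run_pipeline(nodes: Sequence[Scored], query: str, steps: Sequence[Step]) -> List[Scored]:
    """Feed the output of each step into the next, stopping if nothing is left."""
    current = list(nodes)
    for step in steps:
        current = step(current, query)
        if not current:
            break  # later steps have nothing to refine
    return current


def cutoff(min_score: float) -> Step:
    return lambda nodes, query: [n for n in nodes if n[1] >= min_score]


def require(word: str) -> Step:
    return lambda nodes, query: [n for n in nodes if word in n[0].lower()]


def top_n(n: int) -> Step:
    return lambda nodes, query: sorted(nodes, key=lambda x: x[1], reverse=True)[:n]


scored_nodes = [
    ("Machine learning algorithms", 0.85),
    ("Deep learning techniques", 0.72),
    ("Unrelated content here", 0.45),
    ("Neural network architectures", 0.78),
]
final = run_pipeline(scored_nodes, "What is machine learning?",
                     [cutoff(0.6), require("learning"), top_n(2)])
print(final)
```

Ordering matters: cheap filters such as the score cutoff run first so that expensive steps, such as an LLM reranker, see fewer nodes.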

## Configuration & Types

```python { .api }
# Postprocessor modes and configurations
class PostprocessorMode(str, Enum):
    SIMILARITY = "similarity"
    KEYWORD = "keyword"
    LLM_RERANK = "llm_rerank"
    RECENCY = "recency"
    PII_REMOVAL = "pii_removal"

# Default configuration values
DEFAULT_SIMILARITY_CUTOFF = 0.7
DEFAULT_TOP_N = 10
DEFAULT_BATCH_SIZE = 10
DEFAULT_PII_TEMPLATE = "[PII_REMOVED]"
DEFAULT_DATE_KEY = "date"
```