# Rerankers

Reranking models for scoring query-document pairs to improve retrieval accuracy. Rerankers take a query and a set of candidate documents and assign relevance scores to help identify the most relevant matches.

## Capabilities

### FlagReranker (Base Encoder Reranker)

Standard reranker for encoder-only models. Efficiently scores query-document pairs using a cross-encoder architecture, which jointly encodes query and document for high-accuracy relevance scoring.

```python { .api }
from typing import List, Optional, Union

class FlagReranker(AbsReranker):
    def __init__(
        self,
        model_name_or_path: str,
        use_fp16: bool = False,
        query_instruction_for_rerank: Optional[str] = None,
        query_instruction_format: str = "{}{}",
        passage_instruction_for_rerank: Optional[str] = None,
        passage_instruction_format: str = "{}{}",
        devices: Optional[Union[str, List[str]]] = None,
        batch_size: int = 128,
        query_max_length: Optional[int] = None,
        max_length: int = 512,
        normalize: bool = False,
        trust_remote_code: bool = False,
        cache_dir: Optional[str] = None,
        **kwargs
    ):
        """
        Initialize encoder-only reranker.

        Args:
            model_name_or_path: Path to reranker model
            use_fp16: Use half precision for inference
            query_instruction_for_rerank: Instruction prepended to queries
            query_instruction_format: Format string for query instructions
            passage_instruction_for_rerank: Instruction prepended to passages
            passage_instruction_format: Format string for passage instructions
            devices: Device or list of devices for (multi-GPU) inference
            batch_size: Default batch size for scoring
            query_max_length: Maximum query token length
            max_length: Maximum total sequence length
            normalize: Whether to normalize output scores
            trust_remote_code: Allow custom model code execution
            cache_dir: Directory for model cache
            **kwargs: Additional model parameters
        """
```
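
The `normalize` flag controls what comes back from scoring: raw relevance logits by default, or, per the FlagEmbedding documentation, values mapped into (0, 1) with a sigmoid when `normalize=True`. A minimal sketch (the model ID assumes the Hugging Face `BAAI` namespace):

```python
from FlagEmbedding import FlagReranker

# normalize=True applies a sigmoid so scores land in (0, 1)
reranker = FlagReranker('BAAI/bge-reranker-base', use_fp16=True, normalize=True)
scores = reranker.compute_score([("what is machine learning?", "ML learns patterns from data")])
print(scores)  # one score per pair, in (0, 1)
```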

### FlagLLMReranker (Base LLM Reranker)

Reranker built on a large language model for sophisticated relevance assessment. Leverages LLM reasoning capabilities for nuanced query-document relevance scoring, at the cost of slower inference.

```python { .api }
from typing import List, Optional, Union

class FlagLLMReranker(AbsReranker):
    def __init__(
        self,
        model_name_or_path: str,
        use_fp16: bool = False,
        query_instruction_for_rerank: Optional[str] = None,
        query_instruction_format: str = "{}{}",
        passage_instruction_for_rerank: Optional[str] = None,
        passage_instruction_format: str = "{}{}",
        devices: Optional[Union[str, List[str]]] = None,
        batch_size: int = 128,
        query_max_length: Optional[int] = None,
        max_length: int = 512,
        normalize: bool = False,
        **kwargs
    ):
        """
        Initialize LLM-based reranker.

        Args:
            model_name_or_path: Path to LLM reranker model
            use_fp16: Use half precision for inference
            query_instruction_for_rerank: Instruction prepended to queries
            query_instruction_format: Format string for query instructions
            passage_instruction_for_rerank: Instruction prepended to passages
            passage_instruction_format: Format string for passage instructions
            devices: Device or list of devices for (multi-GPU) inference
            batch_size: Default batch size for scoring
            query_max_length: Maximum query token length
            max_length: Maximum total sequence length
            normalize: Whether to normalize output scores
            **kwargs: Additional model parameters
        """
```

### LayerWiseFlagLLMReranker (Layer-wise LLM Reranker)

Specialized LLM reranker that uses layer-wise processing, allowing scores to be taken from selected intermediate layers for enhanced efficiency. Optimized for large-scale reranking tasks.

```python { .api }
from typing import List, Optional, Union

class LayerWiseFlagLLMReranker(AbsReranker):
    def __init__(
        self,
        model_name_or_path: str,
        use_fp16: bool = False,
        query_instruction_for_rerank: Optional[str] = None,
        query_instruction_format: str = "{}{}",
        passage_instruction_for_rerank: Optional[str] = None,
        passage_instruction_format: str = "{}{}",
        devices: Optional[Union[str, List[str]]] = None,
        batch_size: int = 128,
        query_max_length: Optional[int] = None,
        max_length: int = 512,
        normalize: bool = False,
        **kwargs
    ):
        """
        Initialize layer-wise LLM reranker for efficient processing.

        Args:
            model_name_or_path: Path to layer-wise reranker model
            use_fp16: Use half precision for inference
            query_instruction_for_rerank: Instruction prepended to queries
            query_instruction_format: Format string for query instructions
            passage_instruction_for_rerank: Instruction prepended to passages
            passage_instruction_format: Format string for passage instructions
            devices: Device or list of devices for (multi-GPU) inference
            batch_size: Default batch size for scoring
            query_max_length: Maximum query token length
            max_length: Maximum total sequence length
            normalize: Whether to normalize output scores
            **kwargs: Additional model parameters
        """
```
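
With a layer-wise model such as bge-reranker-v2-minicpm-layerwise, upstream FlagEmbedding examples pass a `cutoff_layers` argument to `compute_score` to choose which layer(s) produce scores. A sketch following those examples (treat the exact argument name and value as version-dependent):

```python
from FlagEmbedding import LayerWiseFlagLLMReranker

reranker = LayerWiseFlagLLMReranker('BAAI/bge-reranker-v2-minicpm-layerwise', use_fp16=True)

# cutoff_layers selects the scoring layer(s); [28] follows the upstream example
scores = reranker.compute_score(
    [("what is panda?", "The giant panda is a bear native to China.")],
    cutoff_layers=[28]
)
```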

### LightWeightFlagLLMReranker (Lightweight LLM Reranker)

Optimized lightweight LLM reranker for resource-constrained environments. Provides good reranking performance with reduced computational requirements.

```python { .api }
from typing import List, Optional, Union

class LightWeightFlagLLMReranker(AbsReranker):
    def __init__(
        self,
        model_name_or_path: str,
        use_fp16: bool = False,
        query_instruction_for_rerank: Optional[str] = None,
        query_instruction_format: str = "{}{}",
        passage_instruction_for_rerank: Optional[str] = None,
        passage_instruction_format: str = "{}{}",
        devices: Optional[Union[str, List[str]]] = None,
        batch_size: int = 128,
        query_max_length: Optional[int] = None,
        max_length: int = 512,
        normalize: bool = False,
        **kwargs
    ):
        """
        Initialize lightweight LLM reranker for efficient processing.

        Args:
            model_name_or_path: Path to lightweight reranker model
            use_fp16: Use half precision for inference
            query_instruction_for_rerank: Instruction prepended to queries
            query_instruction_format: Format string for query instructions
            passage_instruction_for_rerank: Instruction prepended to passages
            passage_instruction_format: Format string for passage instructions
            devices: Device or list of devices for (multi-GPU) inference
            batch_size: Default batch size for scoring
            query_max_length: Maximum query token length
            max_length: Maximum total sequence length
            normalize: Whether to normalize output scores
            **kwargs: Additional model parameters
        """
```
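
Upstream FlagEmbedding examples additionally pass compression controls to the lightweight model at scoring time. The argument names (`cutoff_layers`, `compress_ratio`, `compress_layer`) and values below mirror those examples and may vary by library version, so check the model card before relying on them:

```python
from FlagEmbedding import LightWeightFlagLLMReranker

reranker = LightWeightFlagLLMReranker('BAAI/bge-reranker-v2.5-gemma2-lightweight', use_fp16=True)

# Compression arguments follow the upstream README example; adjust per model card
scores = reranker.compute_score(
    [("what is panda?", "The giant panda is a bear native to China.")],
    cutoff_layers=[28], compress_ratio=2, compress_layer=[24, 40]
)
```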

## Usage Examples

### Basic Reranking

```python
from FlagEmbedding import FlagReranker

# Initialize reranker (model ID on the Hugging Face hub)
reranker = FlagReranker('BAAI/bge-reranker-base', use_fp16=True)

# Score query-document pairs
query = "What is machine learning?"
documents = [
    "Machine learning is a subset of artificial intelligence",
    "Cooking recipes for Italian pasta dishes",
    "ML algorithms learn patterns from data",
    "Weather forecast for next week"
]

# Create query-document pairs
pairs = [(query, doc) for doc in documents]

# Get relevance scores
scores = reranker.compute_score(pairs)

# Sort documents by relevance
ranked_docs = sorted(zip(documents, scores), key=lambda x: x[1], reverse=True)

for doc, score in ranked_docs:
    print(f"Score: {score:.4f} - {doc[:50]}...")
```
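
In practice, reranking is the second stage of a retrieve-then-rerank pipeline: a fast first-stage retriever produces candidates and the reranker reorders them. A minimal sketch of that pattern, where `first_stage_search` is a hypothetical placeholder for your own retriever:

```python
def rerank_results(reranker, query, first_stage_search, k=10):
    """Two-stage retrieval sketch: retrieve broadly, then rerank precisely.

    `first_stage_search` is a hypothetical callable returning candidate strings.
    """
    candidates = first_stage_search(query, top_n=100)  # cheap, recall-oriented
    pairs = [(query, doc) for doc in candidates]
    scores = reranker.compute_score(pairs)             # expensive, precision-oriented
    ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
    return ranked[:k]
```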

### Batch Processing with Custom Instructions

```python
from FlagEmbedding import FlagReranker

# Initialize with custom instructions
reranker = FlagReranker(
    'BAAI/bge-reranker-base',
    query_instruction_for_rerank="Query: ",
    passage_instruction_for_rerank="Passage: ",
    query_instruction_format="{}{}",
    passage_instruction_format="{}{}",
    use_fp16=True,
    batch_size=64
)

# Multiple queries
queries = [
    "Python programming tutorials",
    "Machine learning algorithms",
    "Data science techniques"
]

documents = [
    "Learn Python programming from scratch",
    "Advanced ML algorithms explained",
    "Data analysis with pandas and numpy",
    "Web development with Django",
    "Deep learning neural networks"
]

# Score all query-document combinations
all_pairs = [(q, d) for q in queries for d in documents]
scores = reranker.compute_score(all_pairs)

# Reshape scores for analysis
import numpy as np
score_matrix = np.array(scores).reshape(len(queries), len(documents))

for i, query in enumerate(queries):
    print(f"\nQuery: {query}")
    query_scores = score_matrix[i]
    ranked_indices = np.argsort(query_scores)[::-1]

    for j in ranked_indices[:3]:  # Top 3 documents
        print(f"  {query_scores[j]:.4f}: {documents[j]}")
```

### LLM Reranker Usage

```python
from FlagEmbedding import FlagLLMReranker

# Initialize LLM reranker for nuanced scoring
reranker = FlagLLMReranker(
    'BAAI/bge-reranker-v2-gemma',
    use_fp16=True,
    batch_size=32,   # Smaller batch for LLM
    max_length=1024  # Longer context for LLM
)

# Complex query requiring reasoning
query = "How can renewable energy help reduce climate change impacts?"

documents = [
    "Solar panels convert sunlight to electricity with zero emissions",
    "Climate change causes rising sea levels and extreme weather",
    "Wind turbines generate clean energy without carbon footprint",
    "Fossil fuels are the primary cause of greenhouse gas emissions",
    "Electric vehicles reduce transportation emissions significantly"
]

pairs = [(query, doc) for doc in documents]
scores = reranker.compute_score(pairs)

# LLM rerankers often provide more nuanced scoring
for doc, score in zip(documents, scores):
    print(f"{score:.4f}: {doc}")
```

### Multi-GPU Reranking

```python
import numpy as np

from FlagEmbedding import FlagReranker

# Use multiple GPUs for large-scale reranking
reranker = FlagReranker(
    'BAAI/bge-reranker-large',
    devices=['cuda:0', 'cuda:1', 'cuda:2'],
    batch_size=256,
    use_fp16=True
)

# Large-scale reranking scenario
query = "artificial intelligence applications"
large_document_set = [f"Document {i} about AI applications" for i in range(10000)]

# Create pairs (this can be memory intensive)
pairs = [(query, doc) for doc in large_document_set]

# Efficient batch processing across GPUs
scores = reranker.compute_score(pairs)

# Get top-k results
k = 100
top_indices = np.argsort(scores)[-k:][::-1]
top_documents = [large_document_set[i] for i in top_indices]
top_scores = [scores[i] for i in top_indices]
```
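
When the candidate set grows further, materializing every pair up front may not fit in memory. A minimal chunked-scoring sketch (the chunk size is an illustrative choice):

```python
def score_in_chunks(reranker, query, documents, chunk_size=2048):
    """Score documents chunk by chunk to bound peak memory for the pair list."""
    scores = []
    for start in range(0, len(documents), chunk_size):
        chunk = documents[start:start + chunk_size]
        scores.extend(reranker.compute_score([(query, doc) for doc in chunk]))
    return scores
```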

### Lightweight Reranker for Resource Constraints

```python
from FlagEmbedding import LightWeightFlagLLMReranker

# Use lightweight reranker for efficiency
reranker = LightWeightFlagLLMReranker(
    'BAAI/bge-reranker-v2.5-gemma2-lightweight',
    use_fp16=True,
    batch_size=128,
    normalize=True  # Normalize scores for consistency
)

# Efficient processing with good performance
query = "best practices for software development"
candidates = [
    "Code review processes improve software quality",
    "Unit testing prevents bugs in production",
    "Agile methodology enhances team collaboration",
    "Version control systems track code changes"
]

pairs = [(query, candidate) for candidate in candidates]
scores = reranker.compute_score(pairs)

# Normalized scores for easy interpretation
for candidate, score in zip(candidates, scores):
    print(f"Relevance: {score:.3f} - {candidate}")
```

### Layer-wise Processing

```python
from FlagEmbedding import LayerWiseFlagLLMReranker

# Layer-wise reranker for balanced performance-efficiency
reranker = LayerWiseFlagLLMReranker(
    'BAAI/bge-reranker-v2-minicpm-layerwise',
    use_fp16=True,
    batch_size=64
)

# Particularly effective for medium-scale tasks
query = "quantum computing applications"
documents = [
    "Quantum computers solve complex optimization problems",
    "Classical computers use binary logic gates",
    "Quantum algorithms leverage superposition and entanglement",
    "Cryptography applications of quantum computing",
    "Machine learning acceleration with quantum processors"
]

pairs = [(query, doc) for doc in documents]
scores = reranker.compute_score(pairs)

# Layer-wise processing often provides good relevance ranking
sorted_results = sorted(zip(documents, scores), key=lambda x: x[1], reverse=True)
for doc, score in sorted_results:
    print(f"{score:.4f}: {doc}")
```

## Supported Models

### Encoder-Only Rerankers (use with FlagReranker)
- bge-reranker-base (standard cross-encoder)
- bge-reranker-large (larger cross-encoder)
- bge-reranker-v2-m3 (multilingual cross-encoder built on bge-m3)

### LLM Rerankers (use with FlagLLMReranker)
- bge-reranker-v2-gemma (Gemma-based reranker)

### Specialized LLM Rerankers
- bge-reranker-v2-minicpm-layerwise (layer-wise processing; use with LayerWiseFlagLLMReranker)
- bge-reranker-v2.5-gemma2-lightweight (lightweight variant; use with LightWeightFlagLLMReranker)
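
A quick-reference pairing of model IDs (assuming the Hugging Face `BAAI` namespace) with their wrapper classes. The map and the `load_reranker` helper are conveniences for illustration, not part of the FlagEmbedding API:

```python
from FlagEmbedding import (FlagReranker, FlagLLMReranker,
                           LayerWiseFlagLLMReranker, LightWeightFlagLLMReranker)

# Which wrapper class loads which model (illustrative convenience map)
RERANKER_CLASSES = {
    'BAAI/bge-reranker-base': FlagReranker,
    'BAAI/bge-reranker-large': FlagReranker,
    'BAAI/bge-reranker-v2-m3': FlagReranker,
    'BAAI/bge-reranker-v2-gemma': FlagLLMReranker,
    'BAAI/bge-reranker-v2-minicpm-layerwise': LayerWiseFlagLLMReranker,
    'BAAI/bge-reranker-v2.5-gemma2-lightweight': LightWeightFlagLLMReranker,
}

def load_reranker(model_id, **kwargs):
    """Instantiate the matching wrapper class for a known model ID."""
    return RERANKER_CLASSES[model_id](model_id, **kwargs)
```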

## Model Selection Guidelines

### FlagReranker (Encoder-Only)
- **Best for**: Fast, efficient reranking
- **Use when**: Need high throughput, have shorter documents
- **Pros**: Fast inference, lower memory usage
- **Cons**: Limited context understanding

### FlagLLMReranker (LLM-Based)
- **Best for**: Complex reasoning, nuanced relevance
- **Use when**: Need sophisticated understanding, longer contexts
- **Pros**: Better understanding, contextual reasoning
- **Cons**: Slower inference, higher memory usage

### LayerWiseFlagLLMReranker
- **Best for**: Balanced performance and efficiency
- **Use when**: Medium-scale tasks, need LLM benefits with efficiency
- **Pros**: Good balance of speed and understanding
- **Cons**: Model-specific implementation

### LightWeightFlagLLMReranker
- **Best for**: Resource-constrained environments
- **Use when**: Limited compute, need reasonable LLM performance
- **Pros**: Lower resource usage, still provides LLM benefits
- **Cons**: May sacrifice some accuracy for efficiency

## Types

```python { .api }
from typing import List, Tuple
import numpy as np

# Core reranking types
QueryDocumentPair = Tuple[str, str]
RelevanceScore = float
BatchPairs = List[QueryDocumentPair]
BatchScores = np.ndarray

# Instruction formatting
InstructionFormat = str  # Format string with {} placeholders
```
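
A short illustration of these aliases in a typed helper (a sketch; `reranker` may be any of the classes above, and the aliases are the ones defined in the block just shown):

```python
from typing import List

def score_pairs(reranker, pairs: BatchPairs) -> List[RelevanceScore]:
    """Type-annotated wrapper: one relevance score per query-document pair."""
    return list(reranker.compute_score(pairs))
```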