# Maximum Marginal Relevance

Maximum marginal relevance (MMR) search optimizes for both similarity to the query and diversity among results. By penalizing redundancy among selected documents, MMR produces varied yet relevant result sets.

## Capabilities

### Text-Based MMR Search

Perform maximum marginal relevance search using text queries to find diverse, relevant results.

```python { .api }
def max_marginal_relevance_search(
    query: str,
    k: int = 4,
    fetch_k: int = 20,
    lambda_mult: float = 0.5,
    filter: Optional[dict[str, str]] = None,
    where_document: Optional[dict[str, str]] = None,
    **kwargs: Any,
) -> list[Document]:
    """
    Return documents selected using maximal marginal relevance.

    Optimizes for similarity to the query AND diversity among selected documents.

    Parameters:
    - query: Text query to search for
    - k: Number of documents to return (default: 4)
    - fetch_k: Number of documents to fetch for the MMR algorithm (default: 20)
    - lambda_mult: Diversity parameter (0-1):
      0 = maximum diversity, 1 = minimum diversity (default: 0.5)
    - filter: Metadata filter dictionary
    - where_document: Document content filter
    - **kwargs: Additional arguments passed to the ChromaDB query

    Returns:
    List of Document objects selected by maximal marginal relevance

    Raises:
    ValueError: If no embedding function is provided
    """
```

**Usage Example:**
```python
# Basic MMR search with balanced diversity
results = vector_store.max_marginal_relevance_search(
    query="machine learning algorithms",
    k=5,              # Return 5 diverse results
    fetch_k=50,       # Consider 50 candidates
    lambda_mult=0.5,  # Balanced relevance/diversity
)

# High-diversity search
diverse_results = vector_store.max_marginal_relevance_search(
    query="python programming",
    k=10,
    fetch_k=100,
    lambda_mult=0.2,  # Prioritize diversity
    filter={"category": "tutorial"},
)

# High-relevance search
relevant_results = vector_store.max_marginal_relevance_search(
    query="deep learning",
    k=5,
    lambda_mult=0.8,  # Prioritize relevance
    where_document={"$contains": "neural network"},
)
```

### Vector-Based MMR Search

Perform MMR search using pre-computed embedding vectors instead of text queries.

```python { .api }
def max_marginal_relevance_search_by_vector(
    embedding: list[float],
    k: int = 4,
    fetch_k: int = 20,
    lambda_mult: float = 0.5,
    filter: Optional[dict[str, str]] = None,
    where_document: Optional[dict[str, str]] = None,
    **kwargs: Any,
) -> list[Document]:
    """
    Return documents selected using MMR with a pre-computed embedding vector.

    Parameters:
    - embedding: Pre-computed embedding vector to search with
    - k: Number of documents to return (default: 4)
    - fetch_k: Number of documents to fetch for the MMR algorithm (default: 20)
    - lambda_mult: Diversity parameter (0-1, default: 0.5)
    - filter: Metadata filter dictionary
    - where_document: Document content filter
    - **kwargs: Additional arguments passed to the ChromaDB query

    Returns:
    List of Document objects selected by maximal marginal relevance
    """
```

**Usage Example:**
```python
# MMR search with a pre-computed vector
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
query_vector = embeddings.embed_query("artificial intelligence research")

results = vector_store.max_marginal_relevance_search_by_vector(
    embedding=query_vector,
    k=8,
    fetch_k=40,
    lambda_mult=0.3,  # Favor diversity
    filter={"domain": "research"},
)

for doc in results:
    print(f"Content: {doc.page_content[:100]}...")
    print(f"Metadata: {doc.metadata}")
```

## MMR Algorithm Details

### How MMR Works

1. **Initial Retrieval**: Fetch the `fetch_k` documents most similar to the query
2. **Iterative Selection**:
   - Select the most similar document first
   - For each subsequent selection, balance:
     - Similarity to the query (weighted by `lambda_mult`)
     - Dissimilarity to already-selected documents (weighted by `1 - lambda_mult`)
3. **Result**: Return `k` documents that are both relevant and diverse
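
The iterative step above is the standard MMR objective: from the fetched candidate set $R$, with $S$ the set already selected (initially empty), repeatedly pick

$$
d^{*} = \underset{d_i \in R \setminus S}{\operatorname{arg\,max}} \left[ \lambda \cdot \operatorname{sim}(d_i, q) \;-\; (1 - \lambda) \cdot \max_{d_j \in S} \operatorname{sim}(d_i, d_j) \right]
$$

where $q$ is the query, $\lambda$ is `lambda_mult`, and $\operatorname{sim}$ is typically cosine similarity between embedding vectors.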

### Lambda Multiplier Parameter

The `lambda_mult` parameter controls the trade-off between relevance and diversity:

- **λ = 1.0**: Pure relevance (equivalent to regular similarity search)
- **λ = 0.8**: High relevance, some diversity
- **λ = 0.5**: Balanced relevance and diversity (default)
- **λ = 0.2**: High diversity, some relevance
- **λ = 0.0**: Pure diversity (maximum dissimilarity among results)
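
To make the effect of `lambda_mult` concrete, here is a minimal, self-contained sketch of the selection loop using NumPy and cosine similarity. The name `mmr_select` and the toy vectors are illustrative; this is not the library's implementation.

```python
import numpy as np

def mmr_select(query_vec, candidate_vecs, k=4, lambda_mult=0.5):
    """Return indices of k candidates balancing query similarity and diversity."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-10))

    query_sims = [cos(query_vec, c) for c in candidate_vecs]
    selected = [int(np.argmax(query_sims))]  # most similar document first
    while len(selected) < min(k, len(candidate_vecs)):
        best_idx, best_score = None, float("-inf")
        for i in range(len(candidate_vecs)):
            if i in selected:
                continue
            # Penalize similarity to anything already selected
            redundancy = max(cos(candidate_vecs[i], candidate_vecs[j]) for j in selected)
            score = lambda_mult * query_sims[i] - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
    return selected

# High lambda_mult keeps the near-duplicate of the best match;
# low lambda_mult skips it in favor of a more different document.
query = np.array([1.0, 0.0])
docs = [np.array([1.0, 0.0]), np.array([0.9, 0.1]), np.array([0.5, 0.5])]
print(mmr_select(query, docs, k=2, lambda_mult=0.7))  # → [0, 1]
print(mmr_select(query, docs, k=2, lambda_mult=0.3))  # → [0, 2]
```

Note the trade-off in the toy run: with λ = 0.7 the redundant document (index 1) wins on relevance; with λ = 0.3 the diversity penalty pushes selection to the more distant document (index 2).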

### Recommended Parameters

**For balanced results (default):**
```python
k=4, fetch_k=20, lambda_mult=0.5
```

**For high diversity (research, exploration):**
```python
k=10, fetch_k=50, lambda_mult=0.2
```

**For high relevance (focused search):**
```python
k=5, fetch_k=15, lambda_mult=0.8
```

**For large result sets:**
```python
k=20, fetch_k=100, lambda_mult=0.4
```

## Integration with LangChain Retrievers

MMR search integrates seamlessly with LangChain's retriever interface for RAG applications.

**Usage Example:**
```python
# Create an MMR retriever
retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={
        "k": 6,
        "fetch_k": 30,
        "lambda_mult": 0.4,
    },
)

# Use in a RAG chain
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI

qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(),
    retriever=retriever,
    chain_type="stuff",
)

result = qa_chain.invoke({"query": "What are different machine learning approaches?"})
```

## Use Cases

### Research and Exploration
- **High diversity** (λ = 0.2-0.4): Discover varied perspectives on a topic
- **Large `fetch_k`**: Consider many candidates for maximum diversity

### Question Answering
- **Balanced approach** (λ = 0.4-0.6): Relevant but non-redundant context
- **Moderate `k`**: 5-10 documents for comprehensive coverage

### Content Recommendation
- **Moderate diversity** (λ = 0.3-0.5): Similar but varied recommendations
- **User preference filtering**: Combine with metadata filters

### Document Summarization
- **Lower diversity** (λ = 0.6-0.8): Focus on the most relevant content
- **Higher `k`**: More documents for comprehensive coverage
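
For convenience, the use cases above can be collected as a small table of `search_kwargs` dictionaries. The preset names and exact values here are illustrative choices within the ranges above, not part of any API:

```python
# Hypothetical search_kwargs presets matching the use cases above.
# Pass one to vector_store.as_retriever(search_type="mmr", search_kwargs=...).
MMR_PRESETS = {
    "exploration":    {"k": 10, "fetch_k": 50, "lambda_mult": 0.3},  # high diversity
    "qa":             {"k": 6,  "fetch_k": 30, "lambda_mult": 0.5},  # balanced
    "recommendation": {"k": 8,  "fetch_k": 40, "lambda_mult": 0.4},  # moderate diversity
    "summarization":  {"k": 8,  "fetch_k": 30, "lambda_mult": 0.7},  # high relevance
}
```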