
# Maximum Marginal Relevance

An advanced search algorithm that optimizes for both similarity to the query and diversity among results. MMR reduces redundancy by balancing relevance and diversity, making it well suited to generating varied search results.

## Capabilities


### Text-Based MMR Search


Perform maximum marginal relevance search using text queries to find diverse, relevant results.

```python { .api }
def max_marginal_relevance_search(
    query: str,
    k: int = 4,
    fetch_k: int = 20,
    lambda_mult: float = 0.5,
    filter: Optional[dict[str, str]] = None,
    where_document: Optional[dict[str, str]] = None,
    **kwargs: Any
) -> list[Document]:
    """
    Return documents selected using maximal marginal relevance.

    Optimizes for similarity to the query AND diversity among selected documents.

    Parameters:
    - query: Text query to search for
    - k: Number of documents to return (default: 4)
    - fetch_k: Number of documents to fetch for the MMR algorithm (default: 20)
    - lambda_mult: Diversity parameter (0-1):
      0 = maximum diversity, 1 = minimum diversity (default: 0.5)
    - filter: Metadata filter dictionary
    - where_document: Document content filter
    - **kwargs: Additional arguments passed to the ChromaDB query

    Returns:
    List of Document objects selected by maximal marginal relevance

    Raises:
    ValueError: If an embedding function is not provided
    """
```


**Usage Example:**

```python
# Basic MMR search with balanced diversity
results = vector_store.max_marginal_relevance_search(
    query="machine learning algorithms",
    k=5,             # Return 5 diverse results
    fetch_k=50,      # Consider 50 candidates
    lambda_mult=0.5  # Balanced relevance/diversity
)

# High diversity search
diverse_results = vector_store.max_marginal_relevance_search(
    query="python programming",
    k=10,
    fetch_k=100,
    lambda_mult=0.2,  # Prioritize diversity
    filter={"category": "tutorial"}
)

# High relevance search
relevant_results = vector_store.max_marginal_relevance_search(
    query="deep learning",
    k=5,
    lambda_mult=0.8,  # Prioritize relevance
    where_document={"$contains": "neural network"}
)
```


### Vector-Based MMR Search


Perform MMR search using pre-computed embedding vectors instead of text queries.

```python { .api }
def max_marginal_relevance_search_by_vector(
    embedding: list[float],
    k: int = 4,
    fetch_k: int = 20,
    lambda_mult: float = 0.5,
    filter: Optional[dict[str, str]] = None,
    where_document: Optional[dict[str, str]] = None,
    **kwargs: Any
) -> list[Document]:
    """
    Return documents selected using MMR with a pre-computed embedding vector.

    Parameters:
    - embedding: Pre-computed embedding vector to search with
    - k: Number of documents to return (default: 4)
    - fetch_k: Number of documents to fetch for the MMR algorithm (default: 20)
    - lambda_mult: Diversity parameter (0-1, default: 0.5)
    - filter: Metadata filter dictionary
    - where_document: Document content filter
    - **kwargs: Additional arguments passed to the ChromaDB query

    Returns:
    List of Document objects selected by maximal marginal relevance
    """
```


**Usage Example:**

```python
# MMR search with a pre-computed vector
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
query_vector = embeddings.embed_query("artificial intelligence research")

results = vector_store.max_marginal_relevance_search_by_vector(
    embedding=query_vector,
    k=8,
    fetch_k=40,
    lambda_mult=0.3,  # Favor diversity
    filter={"domain": "research"}
)

for doc in results:
    print(f"Content: {doc.page_content[:100]}...")
    print(f"Metadata: {doc.metadata}")
```


## MMR Algorithm Details


### How MMR Works

1. **Initial Retrieval**: Fetch the `fetch_k` documents most similar to the query
2. **Iterative Selection**:
   - Select the most similar document first
   - For each subsequent selection, balance:
     - Similarity to the query (weighted by `lambda_mult`)
     - Dissimilarity to already selected documents (weighted by `1 - lambda_mult`)
3. **Result**: Return `k` documents that are both relevant and diverse
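
The selection loop above can be sketched in a few lines of NumPy. This is a simplified reimplementation for illustration only — the `mmr_select` helper and its cosine-similarity scoring are assumptions of this sketch, not the library's internal code:

```python
import numpy as np

def mmr_select(query_vec, candidate_vecs, k=4, lambda_mult=0.5):
    """Pick k candidate indices, balancing query relevance and diversity."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    relevance = [cos(query_vec, c) for c in candidate_vecs]
    selected = [int(np.argmax(relevance))]  # most similar document first
    while len(selected) < min(k, len(candidate_vecs)):
        best_i, best_score = None, -np.inf
        for i in range(len(candidate_vecs)):
            if i in selected:
                continue
            # Penalize similarity to the closest already-selected document
            redundancy = max(cos(candidate_vecs[i], candidate_vecs[j]) for j in selected)
            score = lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_i, best_score = i, score
        selected.append(best_i)
    return selected
```

Lowering `lambda_mult` lets the redundancy penalty dominate, pushing each pick toward documents unlike those already chosen.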

### Lambda Multiplier Parameter


The `lambda_mult` parameter controls the trade-off between relevance and diversity:

- **λ = 1.0**: Pure relevance (equivalent to regular similarity search)
- **λ = 0.8**: High relevance, some diversity
- **λ = 0.5**: Balanced relevance and diversity (default)
- **λ = 0.2**: High diversity, some relevance
- **λ = 0.0**: Pure diversity (maximum dissimilarity among results)
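
As a quick numeric illustration of this trade-off (the relevance and redundancy values below are made up for the example), the per-candidate MMR score `λ · relevance − (1 − λ) · redundancy` shifts as follows:

```python
# Hypothetical candidate: cosine similarity 0.9 to the query,
# 0.8 to an already-selected document.
relevance, redundancy = 0.9, 0.8

scores = {
    lam: lam * relevance - (1 - lam) * redundancy
    for lam in (1.0, 0.5, 0.0)
}
# At lambda=1.0 the score is the relevance alone; at lambda=0.0 it is
# purely the redundancy penalty, so a near-duplicate scores worst.
print(scores)
```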

### Recommended Parameters


**For balanced results (default):**
```python
k=4, fetch_k=20, lambda_mult=0.5
```

**For high diversity (research, exploration):**
```python
k=10, fetch_k=50, lambda_mult=0.2
```

**For high relevance (focused search):**
```python
k=5, fetch_k=15, lambda_mult=0.8
```

**For large result sets:**
```python
k=20, fetch_k=100, lambda_mult=0.4
```


## Integration with LangChain Retrievers


MMR search integrates seamlessly with LangChain's retriever interface for RAG applications.


**Usage Example:**

```python
# Create an MMR retriever
retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={
        "k": 6,
        "fetch_k": 30,
        "lambda_mult": 0.4
    }
)

# Use in a RAG chain
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI

qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(),
    retriever=retriever,
    chain_type="stuff"
)

result = qa_chain.invoke({"query": "What are different machine learning approaches?"})
```


## Use Cases

### Research and Exploration
- **High diversity** (λ = 0.2-0.4): Discover varied perspectives on a topic
- **Large `fetch_k`**: Consider many candidates for maximum diversity

### Question Answering
- **Balanced approach** (λ = 0.4-0.6): Relevant but non-redundant context
- **Moderate `k`**: 5-10 documents for comprehensive coverage

### Content Recommendation
- **Moderate diversity** (λ = 0.3-0.5): Similar but varied recommendations
- **User preference filtering**: Combine with metadata filters

### Document Summarization
- **Lower diversity** (λ = 0.6-0.8): Focus on the most relevant content
- **Higher `k`**: More documents for comprehensive coverage