# Search Operations

Comprehensive search functionality for finding similar documents in the vector store. Supports text queries, vector queries, image queries, metadata filtering, and relevance scoring.

## Capabilities

### Text-Based Similarity Search

Search for documents similar to a text query using the configured embedding function.

```python { .api }
def similarity_search(
    query: str,
    k: int = 4,
    filter: Optional[dict[str, str]] = None,
    **kwargs: Any
) -> list[Document]:
    """
    Find documents most similar to the query text.

    Parameters:
    - query: Text query to search for
    - k: Number of results to return (default: 4)
    - filter: Metadata filter dictionary (e.g., {"category": "tech"})
    - **kwargs: Additional arguments passed to ChromaDB query

    Returns:
    List of Document objects most similar to the query
    """

def similarity_search_with_score(
    query: str,
    k: int = 4,
    filter: Optional[dict[str, str]] = None,
    where_document: Optional[dict[str, str]] = None,
    **kwargs: Any
) -> list[tuple[Document, float]]:
    """
    Find documents similar to query text with similarity scores.

    Parameters:
    - query: Text query to search for
    - k: Number of results to return (default: 4)
    - filter: Metadata filter dictionary
    - where_document: Document content filter (e.g., {"$contains": "python"})
    - **kwargs: Additional arguments passed to ChromaDB query

    Returns:
    List of tuples containing (Document, similarity_score)
    Lower scores indicate higher similarity
    """
```

**Usage Example:**
```python
# Basic similarity search
results = vector_store.similarity_search("machine learning", k=3)
for doc in results:
    print(f"Content: {doc.page_content}")

# Search with score and filtering
results_with_scores = vector_store.similarity_search_with_score(
    query="python programming",
    k=5,
    filter={"category": "tech"},
    where_document={"$contains": "code"}
)
for doc, score in results_with_scores:
    print(f"Score: {score:.3f}, Content: {doc.page_content}")
```
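
Because lower scores indicate higher similarity, a common follow-up step is to discard weak matches after the search. The following is a minimal sketch of that idea, assuming the results are `(document, score)` tuples; the helper name `keep_close_matches` and the threshold value are hypothetical and not part of the library.

```python
from typing import Any


def keep_close_matches(
    scored_results: list[tuple[Any, float]],
    max_distance: float,
) -> list[tuple[Any, float]]:
    """Keep results whose score is at most max_distance, closest first."""
    kept = [(doc, score) for doc, score in scored_results if score <= max_distance]
    # Sort ascending because a lower score means a closer match.
    return sorted(kept, key=lambda pair: pair[1])


# Plain strings stand in for Document objects in this illustration.
sample = [("doc-a", 0.82), ("doc-b", 0.15), ("doc-c", 0.40)]
close = keep_close_matches(sample, max_distance=0.5)
print(close)
```

A suitable threshold depends on the embedding model and the distance metric, so it is best chosen by inspecting scores on real queries.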

### Vector-Based Search

Search using pre-computed embedding vectors instead of text queries.

```python { .api }
def similarity_search_by_vector(
    embedding: list[float],
    k: int = 4,
    filter: Optional[dict[str, str]] = None,
    where_document: Optional[dict[str, str]] = None,
    **kwargs: Any
) -> list[Document]:
    """
    Find documents most similar to the provided embedding vector.

    Parameters:
    - embedding: Pre-computed embedding vector
    - k: Number of results to return (default: 4)
    - filter: Metadata filter dictionary
    - where_document: Document content filter
    - **kwargs: Additional arguments passed to ChromaDB query

    Returns:
    List of Document objects most similar to the embedding
    """

def similarity_search_by_vector_with_relevance_scores(
    embedding: list[float],
    k: int = 4,
    filter: Optional[dict[str, str]] = None,
    where_document: Optional[dict[str, str]] = None,
    **kwargs: Any
) -> list[tuple[Document, float]]:
    """
    Find documents similar to embedding vector with relevance scores.

    Parameters:
    - embedding: Pre-computed embedding vector
    - k: Number of results to return (default: 4)
    - filter: Metadata filter dictionary
    - where_document: Document content filter
    - **kwargs: Additional arguments passed to ChromaDB query

    Returns:
    List of tuples containing (Document, relevance_score)
    Despite the name, the score is a distance: lower scores indicate higher similarity
    """
```

**Usage Example:**
```python
# Search by pre-computed vector
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
query_vector = embeddings.embed_query("artificial intelligence")

results = vector_store.similarity_search_by_vector(query_vector, k=3)
for doc in results:
    print(f"Content: {doc.page_content}")

# Search with relevance scores
results_with_scores = vector_store.similarity_search_by_vector_with_relevance_scores(
    embedding=query_vector,
    k=5,
    filter={"domain": "AI"}
)
```
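
To make the idea of searching by vector concrete, here is a brute-force sketch in plain Python that ranks stored vectors by cosine distance to a query vector. It is a conceptual illustration only: the function names and sample data are hypothetical, and the vector store itself relies on an index rather than an exhaustive scan like this one.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity for two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def brute_force_top_k(
    query: list[float],
    stored: list[tuple[str, list[float]]],
    k: int = 4,
) -> list[str]:
    """Rank stored (label, vector) pairs by cosine distance to the query."""
    ranked = sorted(stored, key=lambda item: cosine_distance(query, item[1]))
    return [label for label, _ in ranked[:k]]


# Hypothetical two-dimensional vectors; real embeddings have far more dimensions.
stored_vectors = [
    ("north", [0.0, 1.0]),
    ("east", [1.0, 0.0]),
    ("north-east", [1.0, 1.0]),
]
nearest = brute_force_top_k([0.9, 0.1], stored_vectors, k=2)
print(nearest)
```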

### Search with Vector Embeddings

Search that returns both documents and their corresponding embedding vectors.

```python { .api }
def similarity_search_with_vectors(
    query: str,
    k: int = 4,
    filter: Optional[dict[str, str]] = None,
    where_document: Optional[dict[str, str]] = None,
    **kwargs: Any
) -> list[tuple[Document, np.ndarray]]:
    """
    Search for similar documents and return their embedding vectors.

    Parameters:
    - query: Text query to search for
    - k: Number of results to return (default: 4)
    - filter: Metadata filter dictionary
    - where_document: Document content filter
    - **kwargs: Additional arguments passed to ChromaDB query

    Returns:
    List of tuples containing (Document, embedding_vector)
    """
```

**Usage Example:**
```python
import numpy as np

# Search with vectors for further processing
results_with_vectors = vector_store.similarity_search_with_vectors(
    query="data science",
    k=3
)
for doc, vector in results_with_vectors:
    print(f"Content: {doc.page_content}")
    print(f"Vector shape: {vector.shape}")
```
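
One possible form of further processing is to average the returned vectors into a single centroid, which could then serve as the query for a vector-based search. The sketch below assumes the vectors all have the same length; the `centroid` helper and the sample values are hypothetical.

```python
from typing import Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> list[float]:
    """Average equally sized vectors component by component."""
    if len(vectors) == 0:
        raise ValueError("at least one vector is required")
    count = len(vectors)
    # zip(*vectors) groups the i-th component of every vector together.
    return [sum(column) / count for column in zip(*vectors)]


# Stand-in for [vector for _, vector in results_with_vectors].
result_vectors = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
print(centroid(result_vectors))
```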

### Image-Based Search

Search for similar documents using image queries. Requires an embedding function that supports image embeddings.

```python { .api }
def similarity_search_by_image(
    uri: str,
    k: int = 4,
    filter: Optional[dict[str, str]] = None,
    **kwargs: Any
) -> list[Document]:
    """
    Search for documents similar to the provided image.

    Parameters:
    - uri: File path to the query image
    - k: Number of results to return (default: 4)
    - filter: Metadata filter dictionary
    - **kwargs: Additional arguments passed to ChromaDB query

    Returns:
    List of Document objects most similar to the query image

    Raises:
    ValueError: If embedding function doesn't support image embeddings
    """

def similarity_search_by_image_with_relevance_score(
    uri: str,
    k: int = 4,
    filter: Optional[dict[str, str]] = None,
    **kwargs: Any
) -> list[tuple[Document, float]]:
    """
    Search for documents similar to image with relevance scores.

    Parameters:
    - uri: File path to the query image
    - k: Number of results to return (default: 4)
    - filter: Metadata filter dictionary
    - **kwargs: Additional arguments passed to ChromaDB query

    Returns:
    List of tuples containing (Document, relevance_score)

    Raises:
    ValueError: If embedding function doesn't support image embeddings
    """
```

**Usage Example:**
```python
# Search by image (requires multimodal embedding function)
image_results = vector_store.similarity_search_by_image(
    uri="/path/to/query_image.jpg",
    k=5,
    filter={"type": "visual"}
)

# Image search with scores
image_results_with_scores = vector_store.similarity_search_by_image_with_relevance_score(
    uri="/path/to/query_image.jpg",
    k=3
)
for doc, score in image_results_with_scores:
    print(f"Score: {score:.3f}, Metadata: {doc.metadata}")
```
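
Since both image search methods raise `ValueError` when the embedding function cannot embed images, callers may want a guarded wrapper. This is a minimal sketch under that assumption; `search_image_or_fallback` and the `text_only_search` stand-in are hypothetical names used only for illustration.

```python
from typing import Any, Callable


def search_image_or_fallback(
    image_search: Callable[..., list[Any]],
    uri: str,
    k: int = 4,
) -> list[Any]:
    """Run an image search, returning an empty list if images are unsupported."""
    try:
        return image_search(uri=uri, k=k)
    except ValueError:
        # Raised when the embedding function does not support image embeddings.
        return []


def text_only_search(uri: str, k: int = 4) -> list[Any]:
    """Hypothetical stand-in for a store whose embedding function is text-only."""
    raise ValueError("embedding function does not support image embeddings")


fallback = search_image_or_fallback(text_only_search, "/path/to/query_image.jpg", k=3)
print(fallback)
```

With a real store, the first argument would be the bound method, for example `vector_store.similarity_search_by_image`. Note that this wrapper treats every `ValueError` the same way, so it may hide unrelated argument errors.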

## Relevance Score Functions

The Chroma class automatically selects a relevance score function based on the collection's configured distance metric.

### Available Distance Metrics

- **Cosine**: Cosine similarity (space: "cosine")
- **Euclidean**: L2 distance (space: "l2")
- **Inner Product**: Maximum inner product (space: "ip")

**Usage Example:**
```python
# Configure distance metric during initialization
from chromadb.api import CreateCollectionConfiguration
from langchain_chroma import Chroma

# `embeddings` is assumed to be an embedding function created earlier,
# e.g. the OpenAIEmbeddings() instance from the vector search example.
vector_store = Chroma(
    collection_name="my_collection",
    embedding_function=embeddings,
    collection_configuration=CreateCollectionConfiguration({
        "hnsw": {"space": "cosine"}
    })
)
```
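
The selection by distance metric can be pictured as a lookup from the space name to a conversion function. The sketch below uses conversions that are commonly applied to each metric; whether the library uses exactly these formulas is an assumption, and all function names here are hypothetical.

```python
import math
from typing import Callable


def cosine_relevance(distance: float) -> float:
    """Map a cosine distance to a relevance score (higher is better)."""
    return 1.0 - distance


def euclidean_relevance(distance: float) -> float:
    """Map an L2 distance between unit-length vectors to a relevance score."""
    return 1.0 - distance / math.sqrt(2)


def inner_product_relevance(distance: float) -> float:
    """Map an inner-product distance to a relevance score."""
    if distance > 0:
        return 1.0 - distance
    return -1.0 * distance


RELEVANCE_FUNCTIONS: dict[str, Callable[[float], float]] = {
    "cosine": cosine_relevance,
    "l2": euclidean_relevance,
    "ip": inner_product_relevance,
}


def select_relevance_function(space: str) -> Callable[[float], float]:
    """Pick the conversion that matches the collection's distance metric."""
    try:
        return RELEVANCE_FUNCTIONS[space]
    except KeyError:
        raise ValueError(f"Unsupported distance metric: {space!r}") from None


print(select_relevance_function("cosine")(0.25))
```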

## Advanced Filtering

### Metadata Filtering

Filter results based on document metadata using dictionary conditions.

```python
# Simple equality filter on a single field
filter = {"category": "science"}

# Complex filters (ChromaDB-specific syntax): combine several conditions
# with $and. Comparison operators such as $gte expect numeric values.
filter = {
    "$and": [
        {"category": "science"},
        {"year": {"$gte": 2020}}
    ]
}
```
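
To illustrate how such filter dictionaries are meant to be read, here is a small plain-Python evaluator that checks a metadata dictionary against a filter. It is a simplified sketch of the semantics, not ChromaDB's implementation; the `matches` function is hypothetical, it covers only a few operators, and the sample uses a numeric year.

```python
from typing import Any

# Comparison operators supported by this sketch.
COMPARATORS = {
    "$eq": lambda value, operand: value == operand,
    "$gt": lambda value, operand: value > operand,
    "$gte": lambda value, operand: value >= operand,
    "$lt": lambda value, operand: value < operand,
    "$lte": lambda value, operand: value <= operand,
}


def matches(metadata: dict[str, Any], where: dict[str, Any]) -> bool:
    """Return True if metadata satisfies every condition in the filter."""
    for key, condition in where.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in condition):
                return False
        elif key not in metadata:
            return False
        elif isinstance(condition, dict):
            # Operator form, e.g. {"year": {"$gte": 2020}}.
            value = metadata[key]
            if not all(COMPARATORS[op](value, arg) for op, arg in condition.items()):
                return False
        elif metadata[key] != condition:
            # Shorthand equality, e.g. {"category": "science"}.
            return False
    return True


where = {"$and": [{"category": "science"}, {"year": {"$gte": 2020}}]}
print(matches({"category": "science", "year": 2023}, where))
```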

### Document Content Filtering

Filter based on the actual document content using ChromaDB's where_document parameter.

```python
# Content contains specific text
where_document = {"$contains": "machine learning"}

# Content matches pattern
where_document = {"$regex": "^Python.*tutorial$"}
```
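
The two operators shown above can be approximated in plain Python to see which documents they would keep. This sketch assumes `$contains` is a case-sensitive substring test and `$regex` behaves like a regular-expression search over the full text; `document_matches` is a hypothetical helper, not a ChromaDB function.

```python
import re


def document_matches(content: str, where_document: dict[str, str]) -> bool:
    """Return True if the document text satisfies every content condition."""
    for operator, operand in where_document.items():
        if operator == "$contains":
            if operand not in content:
                return False
        elif operator == "$regex":
            if re.search(operand, content) is None:
                return False
        else:
            raise ValueError(f"Unsupported operator in this sketch: {operator}")
    return True


print(document_matches("Intro to machine learning", {"$contains": "machine learning"}))
print(document_matches("Python beginner tutorial", {"$regex": "^Python.*tutorial$"}))
```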