# Queries and Filtering

ChromaDB supports vector similarity search, metadata filtering, and document text matching. Filters can be combined with logical operators, and the `include` parameter controls which fields are returned.

## Capabilities

### Vector Similarity Search

Find documents similar to query text, embeddings, or images using vector similarity metrics.

```python { .api }
def query(
    query_texts: Optional[Documents] = None,
    query_embeddings: Optional[Embeddings] = None,
    query_images: Optional[Images] = None,
    query_uris: Optional[URIs] = None,
    ids: Optional[IDs] = None,
    n_results: int = 10,
    where: Optional[Where] = None,
    where_document: Optional[WhereDocument] = None,
    include: Include = ["metadatas", "documents", "distances"]
) -> QueryResult:
    """
    Query the collection for similar documents using vector similarity.

    Args:
        query_texts: Text queries (will be embedded automatically)
        query_embeddings: Pre-computed embedding vectors
        query_images: Image arrays for similarity search
        query_uris: URIs to load and search with
        ids: Restrict search to specific document IDs
        n_results: Number of most similar results to return
        where: Metadata filter conditions
        where_document: Document text filter conditions
        include: Fields to include in results

    Returns:
        QueryResult: Search results with similarity scores and requested fields
    """
```

**Usage Examples:**

```python
import chromadb

client = chromadb.EphemeralClient()
# An ephemeral client starts empty, so create the collection if it is missing
collection = client.get_or_create_collection("my_documents")

# Text-based similarity search
results = collection.query(
    query_texts=["machine learning algorithms"],
    n_results=5,
    include=["documents", "metadatas", "distances"]
)

# Multi-query search
results = collection.query(
    query_texts=["deep learning", "neural networks", "artificial intelligence"],
    n_results=3  # 3 results per query
)

# Search with pre-computed embeddings
# Placeholder: supply a full vector with the collection's embedding dimension
custom_embedding = [0.1, 0.2, 0.3, ...]  # Your embedding vector
results = collection.query(
    query_embeddings=[custom_embedding],
    n_results=10
)
```

### Metadata Filtering

Filter documents based on metadata values using logical operators and comparison functions.

```python { .api }
# Where filter type definition
Where = Dict[Union[str, LogicalOperator], Union[LiteralValue, OperatorExpression, List[Where]]]

# Logical operators
LogicalOperator = Literal["$and", "$or"]

# Comparison operators
OperatorExpression = Dict[ComparisonOperator, Any]
ComparisonOperator = Literal["$eq", "$ne", "$gt", "$gte", "$lt", "$lte", "$in", "$nin"]

# Literal values
LiteralValue = Union[str, int, float, bool]
```

**Usage Examples:**

```python
# Simple equality filter
results = collection.query(
    query_texts=["search term"],
    where={"category": "science"}
)

# Comparison operators
results = collection.query(
    query_texts=["search term"],
    where={"year": {"$gte": 2020}}  # Documents from 2020 or later
)

# Multiple conditions: combine them with an explicit $and.
# A where dict with several top-level keys is typically rejected by validation.
results = collection.query(
    query_texts=["search term"],
    where={"$and": [
        {"category": "science"},
        {"year": {"$gte": 2020}}
    ]}
)

# $or operator
results = collection.query(
    query_texts=["search term"],
    where={"$or": [
        {"category": "science"},
        {"category": "technology"}
    ]}
)

# $in operator for multiple values
results = collection.query(
    query_texts=["search term"],
    where={"category": {"$in": ["science", "technology", "engineering"]}}
)

# Complex nested conditions
results = collection.query(
    query_texts=["search term"],
    where={
        "$and": [
            {"year": {"$gte": 2020}},
            {"$or": [
                {"category": "science"},
                {"category": "technology"}
            ]},
            {"priority": {"$in": ["high", "critical"]}}
        ]
    }
)
```

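To make the semantics of the operators above concrete, the following is a minimal sketch of how a `Where` filter could be evaluated against a single metadata dict in plain Python. It is not ChromaDB's implementation; `matches_where` is a hypothetical helper, and treating a missing metadata key as a non-match is an assumption of this sketch.

```python
from typing import Any, Dict

# Comparison operators mapped to plain Python predicates.
_COMPARATORS = {
    "$eq": lambda value, target: value == target,
    "$ne": lambda value, target: value != target,
    "$gt": lambda value, target: value > target,
    "$gte": lambda value, target: value >= target,
    "$lt": lambda value, target: value < target,
    "$lte": lambda value, target: value <= target,
    "$in": lambda value, target: value in target,
    "$nin": lambda value, target: value not in target,
}


def matches_where(metadata: Dict[str, Any], where: Dict[str, Any]) -> bool:
    """Return True if `metadata` satisfies `where` (illustrative only)."""
    for key, condition in where.items():
        if key == "$and":
            if not all(matches_where(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches_where(metadata, sub) for sub in condition):
                return False
        else:
            # Assumption of this sketch: a missing key never matches.
            if key not in metadata:
                return False
            # A bare literal is shorthand for {"$eq": literal}.
            if not isinstance(condition, dict):
                condition = {"$eq": condition}
            for operator, target in condition.items():
                if not _COMPARATORS[operator](metadata[key], target):
                    return False
    return True


nested_filter = {
    "$and": [
        {"year": {"$gte": 2020}},
        {"$or": [{"category": "science"}, {"category": "technology"}]},
        {"priority": {"$in": ["high", "critical"]}},
    ]
}
recent_paper = {"year": 2021, "category": "science", "priority": "high"}
old_paper = {"year": 2018, "category": "science", "priority": "high"}
print(matches_where(recent_paper, nested_filter))
print(matches_where(old_paper, nested_filter))
```

A helper like this can also be handy in unit tests, to check a filter against sample metadata before sending it to a collection.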
### Document Text Filtering

Filter documents based on their text content using substring matching.

```python { .api }
# WhereDocument filter type definition
WhereDocument = Dict[WhereDocumentOperator, Union[str, List[WhereDocument]]]

# Document text operators
WhereDocumentOperator = Literal["$contains", "$not_contains"]
```

**Usage Examples:**

```python
# Documents containing specific text
results = collection.query(
    query_texts=["search term"],
    where_document={"$contains": "machine learning"}
)

# Documents not containing specific text
results = collection.query(
    query_texts=["search term"],
    where_document={"$not_contains": "deprecated"}
)

# Complex document filtering (not supported - use simple contains/not_contains)
# For complex text search, retrieve documents and filter programmatically
```

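As a sketch of the "retrieve documents and filter programmatically" approach, the example below applies a regular expression to the documents of one query in a result-shaped dict. `filter_by_pattern` is a hypothetical helper, and `sample_results` is hand-written data standing in for the return value of `collection.query(...)`.

```python
import re
from typing import Any, Dict, List, Tuple


def filter_by_pattern(
    results: Dict[str, Any], pattern: str, query_idx: int = 0
) -> List[Tuple[str, str]]:
    """Return (id, document) pairs for one query whose text matches `pattern`."""
    regex = re.compile(pattern, re.IGNORECASE)
    ids = results["ids"][query_idx]
    documents = results["documents"][query_idx]
    return [
        (doc_id, doc)
        for doc_id, doc in zip(ids, documents)
        # Documents may be None, so skip those before matching.
        if doc is not None and regex.search(doc)
    ]


# Hand-written stand-in for a query result with a single query.
sample_results = {
    "ids": [["a", "b", "c"]],
    "documents": [["Deep learning for vision", None, "Classical statistics primer"]],
}
matched = filter_by_pattern(sample_results, r"\b(deep|machine) learning\b")
print(matched)
```

Because the filtering happens after retrieval, it can leave fewer than `n_results` matches; requesting a larger `n_results` up front is one way to compensate.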
### Result Field Selection

Control which fields are included in query results to optimize performance and reduce data transfer.

```python { .api }
# Include field specification
Include = List[IncludeField]
IncludeField = Literal["documents", "embeddings", "metadatas", "distances", "uris", "data"]
```

**Usage Examples:**

```python
# Include only documents and distances
results = collection.query(
    query_texts=["search term"],
    include=["documents", "distances"]
)

# Include all available fields
results = collection.query(
    query_texts=["search term"],
    include=["documents", "embeddings", "metadatas", "distances", "uris", "data"]
)

# Minimal result for performance
results = collection.query(
    query_texts=["search term"],
    include=["documents"]  # Only document text
)

print(f"Query returned {len(results['ids'][0])} results")
print(f"Available fields: {list(results.keys())}")
```

### Combined Filtering

Combine metadata filtering, document text filtering, and vector similarity for precise document retrieval.

**Usage Examples:**

```python
# Comprehensive search with all filter types
results = collection.query(
    query_texts=["machine learning research"],
    n_results=10,
    where={
        "$and": [
            {"category": "research"},
            {"year": {"$gte": 2020}},
            {"citations": {"$gt": 100}}
        ]
    },
    where_document={"$contains": "neural network"},
    include=["documents", "metadatas", "distances"]
)

# Process results (index 0 selects the results of the first and only query)
for i, (doc, metadata, distance) in enumerate(zip(
    results['documents'][0],
    results['metadatas'][0],
    results['distances'][0]
)):
    # Report the raw distance: its meaning depends on the collection's metric
    print(f"Result {i+1} (distance: {distance:.3f}):")
    print(f"  Title: {metadata.get('title', 'Unknown')}")
    print(f"  Year: {metadata.get('year', 'Unknown')}")
    print(f"  Excerpt: {doc[:200]}...")
    print()
```

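When the metadata conditions are assembled at runtime (for example from optional user input), the number of conditions is not known in advance. The sketch below builds a `where` value from a list of conditions. `combine_conditions` is a hypothetical helper, not part of ChromaDB, and it assumes that a logical operator needs at least two sub-conditions, which some ChromaDB releases enforce during validation.

```python
from typing import Any, Dict, List, Optional


def combine_conditions(
    conditions: List[Dict[str, Any]], operator: str = "$and"
) -> Optional[Dict[str, Any]]:
    """Build a where filter from zero or more single-key conditions."""
    if operator not in ("$and", "$or"):
        raise ValueError(f"Unsupported logical operator: {operator}")
    if not conditions:
        return None  # No filter at all; pass where=None to query()
    if len(conditions) == 1:
        return conditions[0]  # A single condition needs no logical wrapper
    return {operator: list(conditions)}


where = combine_conditions([
    {"category": "research"},
    {"year": {"$gte": 2020}},
    {"citations": {"$gt": 100}},
])
print(where)
```

The returned value can be passed directly as the `where` argument, since `where` is optional and accepts `None`.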
### Search Result Processing

Each field in a query result is a list of lists: the outer list has one entry per query, and each inner list holds that query's matches with their associated data and distances.

```python { .api }
QueryResult = TypedDict('QueryResult', {
    'ids': List[List[str]],                           # Document IDs per query
    'documents': List[List[Optional[str]]],           # Document text per query
    'metadatas': List[List[Optional[Dict]]],          # Metadata per query
    'embeddings': List[List[Optional[List[float]]]],  # Embeddings per query
    'distances': List[List[float]],                   # Similarity distances per query
    'uris': List[List[Optional[str]]],                # URIs per query
    'data': List[List[Optional[Any]]],                # Additional data per query
    'included': List[str]                             # Fields included in results
})
```

**Processing Examples:**

```python
results = collection.query(
    query_texts=["first query", "second query"],  # Multiple queries
    n_results=3
)

# Process results for each query
for query_idx, query_text in enumerate(["first query", "second query"]):
    print(f"Results for query '{query_text}':")

    query_ids = results['ids'][query_idx]
    query_docs = results['documents'][query_idx]
    query_distances = results['distances'][query_idx]

    for doc_idx, (doc_id, doc_text, distance) in enumerate(zip(
        query_ids, query_docs, query_distances
    )):
        # 1 - distance is a similarity score only for cosine distance;
        # with other metrics, report the distance itself
        similarity_score = 1 - distance
        print(f"  {doc_idx+1}. {doc_id} (similarity: {similarity_score:.3f})")
        print(f"     {doc_text[:100]}...")
    print()
```

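Working with parallel lists of lists can be error-prone, so it is sometimes convenient to regroup a result into one record per match. The following is a minimal sketch of that transformation; `flatten_results` is a hypothetical helper, `sample` is hand-written data in the shape of a `QueryResult`, and the sketch assumes that fields which were not requested are either absent or `None`.

```python
from typing import Any, Dict, List

# Result field name -> key used in each flattened record.
_FIELD_TO_KEY = {
    "documents": "document",
    "metadatas": "metadata",
    "distances": "distance",
}


def flatten_results(results: Dict[str, Any]) -> List[List[Dict[str, Any]]]:
    """Return one list of records per query, one record per match."""
    flattened = []
    for query_idx, ids in enumerate(results["ids"]):
        records = []
        for doc_idx, doc_id in enumerate(ids):
            record = {"id": doc_id}
            for field, key in _FIELD_TO_KEY.items():
                values = results.get(field)
                # Skip fields that were not included in the query.
                if values is not None:
                    record[key] = values[query_idx][doc_idx]
            records.append(record)
        flattened.append(records)
    return flattened


# Hand-written stand-in for a two-query result without documents.
sample = {
    "ids": [["a", "b"], ["c"]],
    "documents": None,
    "metadatas": [[{"year": 2021}, {"year": 2019}], [None]],
    "distances": [[0.12, 0.34], [0.56]],
}
per_query = flatten_results(sample)
print(per_query[0][0])
```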
### Distance Metrics

ChromaDB supports different distance metrics for vector similarity calculations.

```python { .api }
# Distance metric specification
Space = Literal["cosine", "l2", "ip"]

# cosine: Cosine distance (1 - cosine_similarity)
# l2: Squared Euclidean (L2) distance
# ip: Inner-product distance (1 - dot product)
```

The distance metric is configured per collection when the collection is created and cannot be changed at query time. Depending on the ChromaDB release, it comes from the collection's configuration or metadata, or from a default supplied by the embedding function.

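To show how the three metrics differ numerically, here is a plain-Python sketch of each distance. It assumes the definitions commonly documented for ChromaDB's index (squared L2, one minus the dot product, and one minus cosine similarity); the function names are hypothetical and the code is not taken from the library.

```python
import math
from typing import Sequence


def squared_l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared component differences (no square root)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """One minus the dot product; can be negative for unnormalized vectors."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """One minus the cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


u = [1.0, 0.0]
v = [0.0, 1.0]
print(squared_l2_distance(u, v), inner_product_distance(u, v), cosine_distance(u, v))
```

Only the cosine distance maps cleanly to a similarity via `1 - distance`; for the other two metrics, smaller values still mean closer vectors, but the values are not bounded to a fixed range.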
## Types

```python { .api }
from typing import Dict, List, Optional, Union, Any, Literal, TypedDict

# Query input types
Documents = List[str]
Embeddings = List[List[float]]
Images = List[Any]  # Image arrays
URIs = List[str]
IDs = List[str]

# Filter types
Where = Dict[Union[str, Literal["$and", "$or"]], Union[
    str, int, float, bool,  # Literal values
    Dict[Literal["$eq", "$ne", "$gt", "$gte", "$lt", "$lte", "$in", "$nin"], Any],  # Operators
    List["Where"]  # Nested conditions
]]

WhereDocument = Dict[Literal["$contains", "$not_contains"], Union[str, List["WhereDocument"]]]

# Result field selection
Include = List[Literal["documents", "embeddings", "metadatas", "distances", "uris", "data"]]

# Result types
QueryResult = TypedDict('QueryResult', {
    'ids': List[List[str]],
    'documents': List[List[Optional[str]]],
    'metadatas': List[List[Optional[Dict[str, Any]]]],
    'embeddings': List[List[Optional[List[float]]]],
    'distances': List[List[float]],
    'uris': List[List[Optional[str]]],
    'data': List[List[Optional[Any]]],
    'included': List[str]
})

GetResult = TypedDict('GetResult', {
    'ids': List[str],
    'documents': List[Optional[str]],
    'metadatas': List[Optional[Dict[str, Any]]],
    'embeddings': List[Optional[List[float]]],
    'uris': List[Optional[str]],
    'data': List[Optional[Any]],
    'included': List[str]
})
```