Tessl Tile for pypi/chromadb@1.0.0

# Queries and Filtering

ChromaDB's query API combines vector similarity search with metadata filtering and document text matching. Filters can be nested with the logical operators `$and` and `$or`, and the `include` parameter controls which fields each result carries.

## Capabilities

### Vector Similarity Search

Find the documents nearest to a set of query texts, embeddings, images, or URIs, measured by the collection's distance metric. Pass exactly one of `query_texts`, `query_embeddings`, `query_images`, or `query_uris`. Texts and images are embedded with the collection's embedding function, and URIs are first loaded through the collection's data loader.

```python { .api }
def query(
    query_texts: Optional[Documents] = None,
    query_embeddings: Optional[Embeddings] = None,
    query_images: Optional[Images] = None,
    query_uris: Optional[URIs] = None,
    ids: Optional[IDs] = None,
    n_results: int = 10,
    where: Optional[Where] = None,
    where_document: Optional[WhereDocument] = None,
    include: Include = ["metadatas", "documents", "distances"]
) -> QueryResult:
    """
    Query the collection for the documents most similar to each query.

    Args:
        query_texts: Text queries (embedded automatically)
        query_embeddings: Pre-computed embedding vectors
        query_images: Image arrays for similarity search
        query_uris: URIs to load and search with
        ids: Restrict the search to these document IDs
        n_results: Number of nearest results to return per query
        where: Metadata filter conditions
        where_document: Document text filter conditions
        include: Fields to include in results

    Returns:
        QueryResult: One list of matches per query, ordered by ascending distance
    """
```
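
**Usage Examples:**

```python
import chromadb

client = chromadb.EphemeralClient()
# An ephemeral (in-memory) client starts empty, so create the collection and add data first
collection = client.get_or_create_collection("my_documents")
collection.add(
    ids=["doc1", "doc2", "doc3"],
    documents=[
        "Gradient boosting and random forests are classic machine learning algorithms.",
        "Deep neural networks learn layered representations of data.",
        "Artificial intelligence research spans search, planning, and learning.",
    ],
)

# Text-based similarity search
results = collection.query(
    query_texts=["machine learning algorithms"],
    n_results=2,
    include=["documents", "metadatas", "distances"]
)

# Multi-query search
results = collection.query(
    query_texts=["deep learning", "neural networks", "artificial intelligence"],
    n_results=3  # 3 results per query
)

# Search with pre-computed embeddings; the vector length must match the
# collection's dimension (384 for the default all-MiniLM-L6-v2 embedding function)
custom_embedding = [0.1] * 384  # Replace with your own embedding vector
results = collection.query(
    query_embeddings=[custom_embedding],
    n_results=3
)
```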
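
Conceptually, `query` ranks the stored vectors by their distance to each query vector and keeps the `n_results` closest. The sketch below is a hypothetical, exact brute-force version in plain Python using squared L2 distance (ChromaDB's default space). ChromaDB itself searches an approximate HNSW index, so this only illustrates the ranking semantics; `brute_force_query` and the toy vectors are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def squared_l2(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance, the formula behind the "l2" space."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_query(
    store: Dict[str, List[float]],
    query_embeddings: List[List[float]],
    n_results: int = 10,
) -> Dict[str, List[List]]:
    """Hypothetical exact k-NN: one list of ids and distances per query, closest first."""
    all_ids: List[List[str]] = []
    all_distances: List[List[float]] = []
    for query in query_embeddings:
        # Score every stored vector, sort ascending by distance, keep the top n_results
        scored: List[Tuple[float, str]] = sorted(
            (squared_l2(query, vector), doc_id) for doc_id, vector in store.items()
        )[:n_results]
        all_ids.append([doc_id for _, doc_id in scored])
        all_distances.append([distance for distance, _ in scored])
    return {"ids": all_ids, "distances": all_distances}


store = {
    "a": [0.0, 0.0],
    "b": [1.0, 0.0],
    "c": [0.0, 3.0],
}
result = brute_force_query(store, [[0.9, 0.1], [0.0, 2.0]], n_results=2)
print(result["ids"])  # [['b', 'a'], ['c', 'a']]
print(result["distances"])
```

Like a real `QueryResult`, the output is nested one level per query, which is why single-query code indexes `[0]`.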

### Metadata Filtering

Filter documents by metadata values using comparison and logical operators. Each `where` dictionary must contain exactly one top-level key: a single field condition, or a single `$and`/`$or` whose list combines several conditions. The range operators `$gt`, `$gte`, `$lt`, and `$lte` accept numbers only, while `$in` and `$nin` take a list of values.

```python { .api }
# Where filter type definition
Where = Dict[Union[str, LogicalOperator], Union[LiteralValue, OperatorExpression, List[Where]]]

# Logical operators
LogicalOperator = Literal["$and", "$or"]

# Comparison operators
OperatorExpression = Dict[ComparisonOperator, Any]
ComparisonOperator = Literal["$eq", "$ne", "$gt", "$gte", "$lt", "$lte", "$in", "$nin"]

# Literal values
LiteralValue = Union[str, int, float, bool]
```

**Usage Examples:**

```python
# Simple equality filter (shorthand for {"category": {"$eq": "science"}})
results = collection.query(
    query_texts=["search term"],
    where={"category": "science"}
)

# Comparison operators
results = collection.query(
    query_texts=["search term"],
    where={"year": {"$gte": 2020}}  # Documents from 2020 or later
)

# Multiple conditions require an explicit $and; a dict with several top-level keys,
# such as {"category": "science", "year": {"$gte": 2020}}, raises a ValueError
results = collection.query(
    query_texts=["search term"],
    where={"$and": [
        {"category": "science"},
        {"year": {"$gte": 2020}}
    ]}
)

# $or operator
results = collection.query(
    query_texts=["search term"],
    where={"$or": [
        {"category": "science"},
        {"category": "technology"}
    ]}
)

# $in operator for multiple values
results = collection.query(
    query_texts=["search term"],
    where={"category": {"$in": ["science", "technology", "engineering"]}}
)

# Complex nested conditions
results = collection.query(
    query_texts=["search term"],
    where={
        "$and": [
            {"year": {"$gte": 2020}},
            {"$or": [
                {"category": "science"},
                {"category": "technology"}
            ]},
            {"priority": {"$in": ["high", "critical"]}}
        ]
    }
)
```
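
To make the operator semantics concrete, here is a hypothetical pure-Python evaluator that checks one metadata dictionary against a `where` clause. It is an illustrative sketch, not ChromaDB's implementation (the database applies filters itself), and `matches_where` is an assumed name. Like ChromaDB's validation, it requires exactly one top-level key per clause; treating a document that lacks the field as a non-match is an assumption of this sketch, and ChromaDB's handling of missing keys under `$ne`/`$nin` may differ.

```python
from typing import Any, Callable, Dict

Where = Dict[str, Any]

# Comparison operators mapped to Python predicates (value from metadata, operand from filter)
COMPARATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "$eq": lambda value, operand: value == operand,
    "$ne": lambda value, operand: value != operand,
    "$gt": lambda value, operand: value > operand,
    "$gte": lambda value, operand: value >= operand,
    "$lt": lambda value, operand: value < operand,
    "$lte": lambda value, operand: value <= operand,
    "$in": lambda value, operand: value in operand,
    "$nin": lambda value, operand: value not in operand,
}


def matches_where(metadata: Dict[str, Any], where: Where) -> bool:
    """Hypothetical evaluator for a Chroma-style where clause (illustration only)."""
    if len(where) != 1:
        raise ValueError("a where clause must have exactly one top-level key")
    key, condition = next(iter(where.items()))

    # Logical operators recurse into their list of sub-clauses
    if key == "$and":
        return all(matches_where(metadata, sub) for sub in condition)
    if key == "$or":
        return any(matches_where(metadata, sub) for sub in condition)

    # Assumption for this sketch: a document without the field never matches
    if key not in metadata:
        return False
    value = metadata[key]

    if isinstance(condition, dict):
        # Operator expression such as {"$gte": 2020}
        operator, operand = next(iter(condition.items()))
        return COMPARATORS[operator](value, operand)
    # A bare literal is shorthand for {"$eq": literal}
    return value == condition


paper = {"category": "science", "year": 2021, "priority": "high"}
print(matches_where(paper, {"year": {"$gte": 2020}}))  # True
print(matches_where(paper, {
    "$and": [
        {"year": {"$gte": 2020}},
        {"$or": [{"category": "science"}, {"category": "technology"}]},
        {"priority": {"$in": ["high", "critical"]}},
    ]
}))  # True
```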

### Document Text Filtering

Filter documents by their text content using substring matching. `$contains` and `$not_contains` take a string, while `$and` and `$or` take a list of nested document filters.

```python { .api }
# WhereDocument filter type definition
WhereDocument = Dict[WhereDocumentOperator, Union[str, List[WhereDocument]]]

# Document text operators (substring tests plus logical combinators)
WhereDocumentOperator = Literal["$contains", "$not_contains", "$and", "$or"]
```

**Usage Examples:**

```python
# Documents containing specific text
results = collection.query(
    query_texts=["search term"],
    where_document={"$contains": "machine learning"}
)

# Documents not containing specific text
results = collection.query(
    query_texts=["search term"],
    where_document={"$not_contains": "deprecated"}
)

# Combine substring conditions with $and / $or
results = collection.query(
    query_texts=["search term"],
    where_document={"$and": [
        {"$contains": "machine learning"},
        {"$not_contains": "deprecated"}
    ]}
)
```
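
The document filter grammar is small enough to evaluate by hand. The following hypothetical sketch shows the semantics: leaves test for a substring, and `$and`/`$or` combine nested filters. `matches_where_document` is an assumed name, and the sketch uses Python's case-sensitive `in` operator; it does not claim to reproduce ChromaDB's exact matching rules.

```python
from typing import Any, Dict


def matches_where_document(document: str, where_document: Dict[str, Any]) -> bool:
    """Hypothetical evaluator for a Chroma-style where_document filter (illustration only)."""
    if len(where_document) != 1:
        raise ValueError("a where_document filter must have exactly one top-level key")
    operator, operand = next(iter(where_document.items()))

    if operator == "$contains":
        return operand in document  # case-sensitive substring test
    if operator == "$not_contains":
        return operand not in document
    if operator == "$and":
        return all(matches_where_document(document, sub) for sub in operand)
    if operator == "$or":
        return any(matches_where_document(document, sub) for sub in operand)
    raise ValueError(f"unsupported where_document operator: {operator}")


text = "An introduction to machine learning with neural networks."
print(matches_where_document(text, {"$contains": "machine learning"}))  # True
print(matches_where_document(text, {"$and": [
    {"$contains": "neural"},
    {"$not_contains": "deprecated"},
]}))  # True
```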

### Result Field Selection

Control which fields are returned to reduce payload size and data transfer. IDs are always returned, and fields left out of `include` come back as `None`. The `uris` and `data` fields only carry values for collections that store URIs; `data` additionally requires a data loader.

```python { .api }
# Include field specification
Include = List[IncludeField]
IncludeField = Literal["documents", "embeddings", "metadatas", "distances", "uris", "data"]
```

**Usage Examples:**

```python
# Include only documents and distances
results = collection.query(
    query_texts=["search term"],
    include=["documents", "distances"]
)

# Include all available fields
results = collection.query(
    query_texts=["search term"],
    include=["documents", "embeddings", "metadatas", "distances", "uris", "data"]
)

# Minimal result for performance
results = collection.query(
    query_texts=["search term"],
    include=["documents"]  # Only document text (IDs are always returned)
)

print(f"Query returned {len(results['ids'][0])} results")
print(f"Included fields: {results['included']}")
```

### Combined Filtering

Metadata filters, document text filters, and vector similarity can be combined in a single call: the filters restrict which documents are eligible, and the eligible documents are ranked by distance to the query.

**Usage Examples:**

```python
# Comprehensive search with all filter types
results = collection.query(
    query_texts=["machine learning research"],
    n_results=10,
    where={
        "$and": [
            {"category": "research"},
            {"year": {"$gte": 2020}},
            {"citations": {"$gt": 100}}
        ]
    },
    where_document={"$contains": "neural network"},
    include=["documents", "metadatas", "distances"]
)

# Process results; distances are not similarities, and lower means closer
for i, (doc, metadata, distance) in enumerate(zip(
    results['documents'][0],
    results['metadatas'][0],
    results['distances'][0]
)):
    metadata = metadata or {}  # Metadata is None for documents stored without it
    print(f"Result {i+1} (distance: {distance:.3f}):")
    print(f"  Title: {metadata.get('title', 'Unknown')}")
    print(f"  Year: {metadata.get('year', 'Unknown')}")
    print(f"  Excerpt: {doc[:200]}...")
    print()
```
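
Because ChromaDB rejects a `where` dictionary with more than one top-level key, code that assembles filters dynamically has to wrap multiple conditions in `$and`. A small helper along these lines keeps that rule in one place. `all_of` is a hypothetical name, not part of the ChromaDB API. It returns `None` when there are no conditions (the same as passing no filter) and returns a lone condition unwrapped, since an `$and` over a single clause adds nothing and may be rejected by validation.

```python
from typing import Any, Dict, List, Optional

Where = Dict[str, Any]


def all_of(*conditions: Optional[Where]) -> Optional[Where]:
    """Hypothetical helper: combine single-key conditions into one valid where clause."""
    parts: List[Where] = [c for c in conditions if c]  # drop None / empty conditions
    if not parts:
        return None  # no filter at all
    if len(parts) == 1:
        return parts[0]  # a lone condition needs no $and wrapper
    return {"$and": parts}


min_year = 2020
category = None  # e.g. an optional user-supplied filter that was left blank

where = all_of(
    {"category": category} if category else None,
    {"year": {"$gte": min_year}},
    {"citations": {"$gt": 100}},
)
print(where)
# {'$and': [{'year': {'$gte': 2020}}, {'citations': {'$gt': 100}}]}
```

The resulting dictionary (or `None`) can be passed directly as `where=` to `collection.query`.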

### Search Result Processing

Query results are column-oriented: every field holds one inner list per query, and each inner list is ordered from the closest match to the farthest. Fields not requested through `include` are `None`.

```python { .api }
QueryResult = TypedDict('QueryResult', {
    'ids': List[List[str]],                                    # Document IDs per query (always returned)
    'documents': Optional[List[List[Optional[str]]]],          # Document text per query
    'metadatas': Optional[List[List[Optional[Dict]]]],         # Metadata per query
    'embeddings': Optional[List[List[Optional[List[float]]]]], # Embeddings per query
    'distances': Optional[List[List[float]]],                  # Distances per query (lower = closer)
    'uris': Optional[List[List[Optional[str]]]],               # URIs per query
    'data': Optional[List[List[Optional[Any]]]],               # Data loaded from URIs per query
    'included': List[str]                                      # Fields included in results
})
```

**Processing Examples:**

```python
queries = ["first query", "second query"]
results = collection.query(
    query_texts=queries,  # Multiple queries
    n_results=3
)

# Process results for each query
for query_idx, query_text in enumerate(queries):
    print(f"Results for query '{query_text}':")

    query_ids = results['ids'][query_idx]
    query_docs = results['documents'][query_idx]
    query_distances = results['distances'][query_idx]

    for doc_idx, (doc_id, doc_text, distance) in enumerate(zip(
        query_ids, query_docs, query_distances
    )):
        # Lower distance means a closer match; 1 - distance is a cosine
        # similarity only when the collection uses the "cosine" space
        print(f"  {doc_idx+1}. {doc_id} (distance: {distance:.3f})")
        print(f"     {(doc_text or '')[:100]}...")
    print()
```
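
Since query results are column-oriented, it is often convenient to pivot them into one record per match. A minimal sketch, assuming a result dictionary with the keys listed above in which fields left out of `include` are `None`. `to_records` is a hypothetical helper, not a ChromaDB function, and the `sample` dictionary stands in for a real `query` response.

```python
from typing import Any, Dict, List, Optional

FIELDS = ("documents", "metadatas", "distances", "embeddings", "uris")


def to_records(results: Dict[str, Optional[List[List[Any]]]]) -> List[List[Dict[str, Any]]]:
    """Hypothetical helper: turn column-per-field results into row-per-match records."""
    per_query: List[List[Dict[str, Any]]] = []
    for query_idx, ids in enumerate(results["ids"]):
        rows = []
        for rank, doc_id in enumerate(ids):
            row: Dict[str, Any] = {"id": doc_id}
            for field in FIELDS:
                column = results.get(field)
                if column is not None:  # skip fields that were not requested
                    row[field] = column[query_idx][rank]
            rows.append(row)
        per_query.append(rows)
    return per_query


sample = {
    "ids": [["doc1", "doc2"], ["doc3"]],
    "documents": [["first text", "second text"], ["third text"]],
    "metadatas": [[{"year": 2021}, None], [{"year": 2019}]],
    "distances": [[0.12, 0.48], [0.30]],
    "embeddings": None,
    "uris": None,
}
records = to_records(sample)
print(records[0][1])
# {'id': 'doc2', 'documents': 'second text', 'metadatas': None, 'distances': 0.48}
```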

### Distance Metrics

ChromaDB supports three distance metrics for vector similarity. Under each of them, a smaller distance means a closer match.

```python { .api }
# Distance metric specification
Space = Literal["cosine", "l2", "ip"]

# cosine: cosine distance, 1 - cosine_similarity(a, b)
# l2: squared Euclidean distance, sum((a_i - b_i) ** 2); the default
# ip: inner-product distance, 1 - dot(a, b)
```

The metric is fixed per collection at creation time, for example by passing `metadata={"hnsw:space": "cosine"}` to `create_collection`, and cannot be changed during queries.
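
The three spaces follow simple formulas: cosine distance is one minus the cosine similarity, `l2` is the squared Euclidean distance (no square root), and `ip` is one minus the dot product. This plain-Python sketch writes them out; the function names are illustrative assumptions, since ChromaDB computes distances inside its vector index rather than through helpers like these.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (no square root)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ip_distance(a: Vector, b: Vector) -> float:
    """Inner-product distance: 1 - a.b (most meaningful for normalized vectors)."""
    return 1.0 - dot(a, b)


def cosine_distance(a: Vector, b: Vector) -> float:
    """Cosine distance: 1 - cosine similarity."""
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [1.0, 0.0]
b = [0.6, 0.8]  # unit vector with cosine similarity 0.6 to a
for name, fn in [("l2", l2_distance), ("ip", ip_distance), ("cosine", cosine_distance)]:
    print(name, round(fn(a, b), 6))
# l2 0.8
# ip 0.4
# cosine 0.4
```

For unit-length vectors, `l2` equals twice the cosine distance and `ip` equals the cosine distance, so all three spaces rank results identically; they diverge only when embeddings are not normalized.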

## Types

```python { .api }
from typing import Dict, List, Optional, Union, Any, Literal, TypedDict

# Query input types
Documents = List[str]
Embeddings = List[List[float]]
Images = List[Any]  # Image arrays
URIs = List[str]
IDs = List[str]

# Filter types
Where = Dict[Union[str, Literal["$and", "$or"]], Union[
    str, int, float, bool,  # Literal values
    Dict[Literal["$eq", "$ne", "$gt", "$gte", "$lt", "$lte", "$in", "$nin"], Any],  # Operators
    List["Where"]  # Nested conditions
]]

WhereDocument = Dict[
    Literal["$contains", "$not_contains", "$and", "$or"],
    Union[str, List["WhereDocument"]]
]

# Result field selection
Include = List[Literal["documents", "embeddings", "metadatas", "distances", "uris", "data"]]

# Result types (fields not requested via include are None)
QueryResult = TypedDict('QueryResult', {
    'ids': List[List[str]],
    'documents': Optional[List[List[Optional[str]]]],
    'metadatas': Optional[List[List[Optional[Dict[str, Any]]]]],
    'embeddings': Optional[List[List[Optional[List[float]]]]],
    'distances': Optional[List[List[float]]],
    'uris': Optional[List[List[Optional[str]]]],
    'data': Optional[List[List[Optional[Any]]]],
    'included': List[str]
})

GetResult = TypedDict('GetResult', {
    'ids': List[str],
    'documents': Optional[List[Optional[str]]],
    'metadatas': Optional[List[Optional[Dict[str, Any]]]],
    'embeddings': Optional[List[Optional[List[float]]]],
    'uris': Optional[List[Optional[str]]],
    'data': Optional[List[Optional[Any]]],
    'included': List[str]
})
```
