# Queries and Filtering

ChromaDB queries combine vector similarity search with metadata filtering and document text matching. Filters can be nested with logical operators, and the `include` parameter controls which fields each result carries.

## Capabilities

### Vector Similarity Search

Find documents similar to query text, embeddings, or images using vector similarity metrics.

```python { .api }
def query(
    query_texts: Optional[Documents] = None,
    query_embeddings: Optional[Embeddings] = None,
    query_images: Optional[Images] = None,
    query_uris: Optional[URIs] = None,
    ids: Optional[IDs] = None,
    n_results: int = 10,
    where: Optional[Where] = None,
    where_document: Optional[WhereDocument] = None,
    include: Include = ["metadatas", "documents", "distances"]
) -> QueryResult:
    """
    Query the collection for similar documents using vector similarity.

    Args:
        query_texts: Text queries (will be embedded automatically)
        query_embeddings: Pre-computed embedding vectors
        query_images: Image arrays for similarity search
        query_uris: URIs to load and search with
        ids: Restrict search to specific document IDs
        n_results: Number of most similar results to return
        where: Metadata filter conditions
        where_document: Document text filter conditions
        include: Fields to include in results

    Returns:
        QueryResult: Search results with similarity scores and requested fields
    """
```

**Usage Examples:**

```python
import chromadb

client = chromadb.EphemeralClient()
# An ephemeral client starts empty, so create the collection if it does not exist yet
collection = client.get_or_create_collection("my_documents")

# Text-based similarity search
results = collection.query(
    query_texts=["machine learning algorithms"],
    n_results=5,
    include=["documents", "metadatas", "distances"]
)

# Multi-query search
results = collection.query(
    query_texts=["deep learning", "neural networks", "artificial intelligence"],
    n_results=3  # 3 results per query
)

# Search with pre-computed embeddings
custom_embedding = [0.1, 0.2, 0.3, ...]  # Your embedding vector (placeholder values)
results = collection.query(
    query_embeddings=[custom_embedding],
    n_results=10
)
```
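To make the shape of a similarity query concrete, the following is a minimal pure-Python sketch of a brute-force nearest-neighbour search using cosine distance. It is not how Chroma is implemented (Chroma relies on an index rather than a linear scan), and `cosine_distance`, `brute_force_query`, and the sample vectors are hypothetical names and data introduced only for illustration.

```python
import math
from typing import Dict, List


def cosine_distance(a: List[float], b: List[float]) -> float:
    """Cosine distance: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def brute_force_query(
    stored: Dict[str, List[float]],
    query_embeddings: List[List[float]],
    n_results: int = 10,
) -> Dict[str, List[list]]:
    """Hypothetical helper: rank every stored vector against each query vector."""
    ids, distances = [], []
    for query in query_embeddings:
        # Sort (distance, id) pairs so the closest vectors come first
        ranked = sorted(
            (cosine_distance(query, vec), doc_id) for doc_id, vec in stored.items()
        )[:n_results]
        ids.append([doc_id for _, doc_id in ranked])
        distances.append([dist for dist, _ in ranked])
    # One inner list per query, mirroring the nesting of a query result
    return {"ids": ids, "distances": distances}


stored = {
    "doc_a": [1.0, 0.0],
    "doc_b": [0.0, 1.0],
    "doc_c": [1.0, 1.0],
}
result = brute_force_query(stored, [[1.0, 0.1], [0.0, 2.0]], n_results=2)
print(result["ids"])
```

Each query vector gets its own inner list, which is why multi-query results are indexed first by query position and then by rank.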

### Metadata Filtering

Filter documents based on metadata values using logical operators and comparison functions.

```python { .api }
# Where filter type definition
Where = Dict[Union[str, LogicalOperator], Union[LiteralValue, OperatorExpression, List[Where]]]

# Logical operators
LogicalOperator = Literal["$and", "$or"]

# Comparison operators
OperatorExpression = Dict[ComparisonOperator, Any]
ComparisonOperator = Literal["$eq", "$ne", "$gt", "$gte", "$lt", "$lte", "$in", "$nin"]

# Literal values
LiteralValue = Union[str, int, float, bool]
```

**Usage Examples:**

```python
# Simple equality filter
results = collection.query(
    query_texts=["search term"],
    where={"category": "science"}
)

# Comparison operators
results = collection.query(
    query_texts=["search term"],
    where={"year": {"$gte": 2020}}  # Documents from 2020 or later
)

# Multiple conditions must be combined explicitly with $and.
# A where dict with several top-level keys, such as
# {"category": "science", "year": {"$gte": 2020}}, is rejected by Chroma,
# which expects exactly one top-level key or operator.
results = collection.query(
    query_texts=["search term"],
    where={"$and": [
        {"category": "science"},
        {"year": {"$gte": 2020}}
    ]}
)

# $or operator
results = collection.query(
    query_texts=["search term"],
    where={"$or": [
        {"category": "science"},
        {"category": "technology"}
    ]}
)

# $in operator for multiple values
results = collection.query(
    query_texts=["search term"],
    where={"category": {"$in": ["science", "technology", "engineering"]}}
)

# Complex nested conditions
results = collection.query(
    query_texts=["search term"],
    where={
        "$and": [
            {"year": {"$gte": 2020}},
            {"$or": [
                {"category": "science"},
                {"category": "technology"}
            ]},
            {"priority": {"$in": ["high", "critical"]}}
        ]
    }
)
```
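The semantics of these filters can be illustrated with a small pure-Python evaluator that checks a metadata dict against a `where` filter. This is only a sketch of the intended logic, not Chroma's implementation: `matches_where` and `COMPARATORS` are hypothetical names, and the rule that a missing metadata field never matches is an assumption made here for simplicity.

```python
from typing import Any, Dict

# Comparison operators from the Where type, mapped to plain Python checks
COMPARATORS = {
    "$eq": lambda value, target: value == target,
    "$ne": lambda value, target: value != target,
    "$gt": lambda value, target: value > target,
    "$gte": lambda value, target: value >= target,
    "$lt": lambda value, target: value < target,
    "$lte": lambda value, target: value <= target,
    "$in": lambda value, target: value in target,
    "$nin": lambda value, target: value not in target,
}


def matches_where(metadata: Dict[str, Any], where: Dict[str, Any]) -> bool:
    """Hypothetical helper: True if the metadata satisfies the where filter."""
    for key, condition in where.items():
        if key == "$and":
            if not all(matches_where(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches_where(metadata, sub) for sub in condition):
                return False
        elif key not in metadata:
            return False  # assumption: a missing field never matches
        elif isinstance(condition, dict):
            # Operator expression such as {"$gte": 2020}
            if not all(
                COMPARATORS[op](metadata[key], target)
                for op, target in condition.items()
            ):
                return False
        elif metadata[key] != condition:
            return False  # a bare literal means equality
    return True


where = {
    "$and": [
        {"year": {"$gte": 2020}},
        {"$or": [{"category": "science"}, {"category": "technology"}]},
        {"priority": {"$in": ["high", "critical"]}},
    ]
}
print(matches_where({"year": 2021, "category": "science", "priority": "high"}, where))
print(matches_where({"year": 2019, "category": "science", "priority": "high"}, where))
```

The first call satisfies every branch of the nested filter; the second fails the `year` condition, so the whole `$and` fails.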

### Document Text Filtering

Filter documents based on their text content using substring matching.

```python { .api }
# WhereDocument filter type definition
WhereDocument = Dict[WhereDocumentOperator, Union[str, List[WhereDocument]]]

# Document text operators ($and / $or take a list of nested WhereDocument filters)
WhereDocumentOperator = Literal["$contains", "$not_contains", "$and", "$or"]
```

**Usage Examples:**

```python
# Documents containing specific text
results = collection.query(
    query_texts=["search term"],
    where_document={"$contains": "machine learning"}
)

# Documents not containing specific text
results = collection.query(
    query_texts=["search term"],
    where_document={"$not_contains": "deprecated"}
)

# Combining text conditions with $and / $or
results = collection.query(
    query_texts=["search term"],
    where_document={"$and": [
        {"$contains": "machine learning"},
        {"$not_contains": "deprecated"}
    ]}
)

# For matching beyond substrings, retrieve documents and filter programmatically
```
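Filtering retrieved documents programmatically might look like the sketch below, which post-filters a list of document strings such as `results['documents'][0]`. The helper `filter_documents`, its parameters, and the sample texts are hypothetical; the case-insensitive option is included only to show something plain substring matching does not cover.

```python
from typing import Iterable, List


def filter_documents(
    documents: List[str],
    must_contain: Iterable[str] = (),
    must_not_contain: Iterable[str] = (),
    case_sensitive: bool = True,
) -> List[str]:
    """Hypothetical helper: keep documents that satisfy every substring condition."""
    required = list(must_contain)
    forbidden = list(must_not_contain)
    if not case_sensitive:
        required = [term.lower() for term in required]
        forbidden = [term.lower() for term in forbidden]

    kept = []
    for doc in documents:
        haystack = doc if case_sensitive else doc.lower()
        has_required = all(term in haystack for term in required)
        has_forbidden = any(term in haystack for term in forbidden)
        if has_required and not has_forbidden:
            kept.append(doc)
    return kept


docs = [
    "Machine learning with neural networks",
    "A deprecated machine learning API",
    "Cooking with cast iron",
]
kept = filter_documents(
    docs,
    must_contain=["machine learning"],
    must_not_contain=["deprecated"],
    case_sensitive=False,
)
print(kept)
```

Post-filtering shrinks the result set after ranking, so it can help to request a larger `n_results` than ultimately needed.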

### Result Field Selection

Control which fields are included in query results to optimize performance and reduce data transfer.

```python { .api }
# Include field specification
Include = List[IncludeField]
IncludeField = Literal["documents", "embeddings", "metadatas", "distances", "uris", "data"]
```

**Usage Examples:**

```python
# Include only documents and distances
results = collection.query(
    query_texts=["search term"],
    include=["documents", "distances"]
)

# Include all available fields
results = collection.query(
    query_texts=["search term"],
    include=["documents", "embeddings", "metadatas", "distances", "uris", "data"]
)

# Minimal result for performance
results = collection.query(
    query_texts=["search term"],
    include=["documents"]  # Only document text
)

print(f"Query returned {len(results['ids'][0])} results")
print(f"Available fields: {list(results.keys())}")
```
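When only some fields are requested, downstream code needs to know which ones actually carry data. The sketch below assumes that fields left out of `include` appear in the result with a value of `None` (an assumption worth checking against your Chroma version); `populated_fields` and `sample_result` are hypothetical and hand-written for illustration.

```python
from typing import Any, Dict, List


def populated_fields(result: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: names of result fields that carry data."""
    return [
        name
        for name, value in result.items()
        if name != "included" and value is not None
    ]


# Hand-written stand-in for a result requested with include=["documents", "distances"]
sample_result = {
    "ids": [["id1", "id2"]],
    "documents": [["first text", "second text"]],
    "metadatas": None,
    "embeddings": None,
    "distances": [[0.12, 0.34]],
}
print(populated_fields(sample_result))
```

Checking for populated fields before indexing avoids a `TypeError` from subscripting `None`.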

### Combined Filtering

Combine metadata filtering, document text filtering, and vector similarity for precise document retrieval.

**Usage Examples:**

```python
# Comprehensive search with all filter types
results = collection.query(
    query_texts=["machine learning research"],
    n_results=10,
    where={
        "$and": [
            {"category": "research"},
            {"year": {"$gte": 2020}},
            {"citations": {"$gt": 100}}
        ]
    },
    where_document={"$contains": "neural network"},
    include=["documents", "metadatas", "distances"]
)

# Process results
# Note: 1 - distance is a similarity score only for collections using cosine distance
for i, (doc, metadata, distance) in enumerate(zip(
    results['documents'][0],
    results['metadatas'][0],
    results['distances'][0]
)):
    print(f"Result {i+1} (similarity: {1-distance:.3f}):")
    print(f"  Title: {metadata.get('title', 'Unknown')}")
    print(f"  Year: {metadata.get('year', 'Unknown')}")
    print(f"  Excerpt: {doc[:200]}...")
    print()
```
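When filters like the one above are assembled from user input, it can help to build the `where` argument from a flat dict of conditions. The sketch below assumes that a `where` filter must have exactly one top-level key, so several conditions are wrapped in an explicit `$and`; `build_where` is a hypothetical helper, not part of the Chroma API.

```python
from typing import Any, Dict, Optional


def build_where(conditions: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Hypothetical helper: turn {field: condition} pairs into a where filter."""
    clauses = [{field: condition} for field, condition in conditions.items()]
    if not clauses:
        return None  # no metadata filter at all
    if len(clauses) == 1:
        return clauses[0]  # a single condition needs no logical operator
    return {"$and": clauses}  # several conditions are combined explicitly


where = build_where({
    "category": "research",
    "year": {"$gte": 2020},
    "citations": {"$gt": 100},
})
print(where)
```

The returned value can be passed directly as `where=` in `collection.query(...)`; a return value of `None` matches the parameter's default.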

### Search Result Processing

Query results are nested lists: each field holds one inner list per query, containing the matching documents' IDs, associated data, and distances in ranked order.

```python { .api }
QueryResult = TypedDict('QueryResult', {
    'ids': List[List[str]],                            # Document IDs per query
    'documents': List[List[Optional[str]]],            # Document text per query
    'metadatas': List[List[Optional[Dict]]],           # Metadata per query
    'embeddings': List[List[Optional[List[float]]]],   # Embeddings per query
    'distances': List[List[float]],                    # Similarity distances per query
    'uris': List[List[Optional[str]]],                 # URIs per query
    'data': List[List[Optional[Any]]],                 # Additional data per query
    'included': List[str]                              # Fields included in results
})
```

**Processing Examples:**

```python
results = collection.query(
    query_texts=["first query", "second query"],  # Multiple queries
    n_results=3
)

# Process results for each query
for query_idx, query_text in enumerate(["first query", "second query"]):
    print(f"Results for query '{query_text}':")

    query_ids = results['ids'][query_idx]
    query_docs = results['documents'][query_idx]
    query_distances = results['distances'][query_idx]

    for doc_idx, (doc_id, doc_text, distance) in enumerate(zip(
        query_ids, query_docs, query_distances
    )):
        # Convert distance to similarity (meaningful for cosine distance only)
        similarity_score = 1 - distance
        print(f"  {doc_idx+1}. {doc_id} (similarity: {similarity_score:.3f})")
        print(f"     {doc_text[:100]}...")
    print()
```
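The parallel nested lists can be awkward to pass around, so one option is to reshape them into one record per match. The following is a sketch on a hand-written result dict; `to_records`, `FIELD_NAMES`, and `sample` are hypothetical, and fields whose value is `None` are assumed to have been left out of `include`.

```python
from typing import Any, Dict, List

# Result field name -> key used in each flattened record
FIELD_NAMES = {
    "documents": "document",
    "metadatas": "metadata",
    "distances": "distance",
}


def to_records(result: Dict[str, Any]) -> List[List[Dict[str, Any]]]:
    """Hypothetical helper: one list of records per query."""
    present = [name for name in FIELD_NAMES if result.get(name) is not None]
    per_query = []
    for query_idx, ids in enumerate(result["ids"]):
        records = []
        for position, doc_id in enumerate(ids):
            record = {"id": doc_id}
            for name in present:
                record[FIELD_NAMES[name]] = result[name][query_idx][position]
            records.append(record)
        per_query.append(records)
    return per_query


# Hand-written stand-in for a two-query result without metadata
sample = {
    "ids": [["a", "b"], ["c"]],
    "documents": [["doc a", "doc b"], ["doc c"]],
    "metadatas": None,
    "distances": [[0.1, 0.4], [0.2]],
}
records = to_records(sample)
print(records[0])
```

Each record keeps the rank order of the original lists, so `records[q][0]` is still the closest match for query `q`.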

### Distance Metrics

ChromaDB supports different distance metrics for vector similarity calculations.

```python { .api }
# Distance metric specification
Space = Literal["cosine", "l2", "ip"]

# cosine: Cosine distance (1 - cosine_similarity)
# l2: Squared Euclidean (L2) distance
# ip: Inner product distance (1 - inner_product)
```

Distance metrics are configured per collection when the collection is created (for example through the `hnsw:space` metadata key) and cannot be changed at query time. The default is `l2`.
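To show how the three metrics differ on the same pair of vectors, here is a small pure-Python sketch. The formulas follow my reading of Chroma's documentation (squared L2, one minus the inner product, one minus cosine similarity) and should be treated as assumptions; the function names and sample vectors are hypothetical.

```python
import math
from typing import List


def squared_l2(a: List[float], b: List[float]) -> float:
    """'l2' space: sum of squared differences (assumed to be squared, not rooted)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product_distance(a: List[float], b: List[float]) -> float:
    """'ip' space: 1 - inner product (assumed formula)."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a: List[float], b: List[float]) -> float:
    """'cosine' space: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


a = [1.0, 0.0]
b = [0.6, 0.8]  # unit-length vector
print(f"l2:     {squared_l2(a, b):.3f}")
print(f"ip:     {inner_product_distance(a, b):.3f}")
print(f"cosine: {cosine_distance(a, b):.3f}")
```

For unit-length vectors the `ip` and `cosine` distances coincide, while `l2` grows without bound as vectors move apart, which is why `1 - distance` reads as a similarity score only in the cosine case.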

## Types

```python { .api }
from typing import Dict, List, Optional, Union, Any, Literal, TypedDict

# Query input types
Documents = List[str]
Embeddings = List[List[float]]
Images = List[Any]  # Image arrays
URIs = List[str]
IDs = List[str]

# Filter types
Where = Dict[Union[str, Literal["$and", "$or"]], Union[
    str, int, float, bool,  # Literal values
    Dict[Literal["$eq", "$ne", "$gt", "$gte", "$lt", "$lte", "$in", "$nin"], Any],  # Operators
    List["Where"]  # Nested conditions
]]

# $and / $or take a list of nested WhereDocument filters
WhereDocument = Dict[Literal["$contains", "$not_contains", "$and", "$or"], Union[str, List["WhereDocument"]]]

# Result field selection
Include = List[Literal["documents", "embeddings", "metadatas", "distances", "uris", "data"]]

# Result types
QueryResult = TypedDict('QueryResult', {
    'ids': List[List[str]],
    'documents': List[List[Optional[str]]],
    'metadatas': List[List[Optional[Dict[str, Any]]]],
    'embeddings': List[List[Optional[List[float]]]],
    'distances': List[List[float]],
    'uris': List[List[Optional[str]]],
    'data': List[List[Optional[Any]]],
    'included': List[str]
})

GetResult = TypedDict('GetResult', {
    'ids': List[str],
    'documents': List[Optional[str]],
    'metadatas': List[Optional[Dict[str, Any]]],
    'embeddings': List[Optional[List[float]]],
    'uris': List[Optional[str]],
    'data': List[Optional[Any]],
    'included': List[str]
})
```