
# Search Operations

Search functionality for finding documents in the vector store that are similar to a query. Supports text, vector, and image queries, metadata and document-content filtering, and optional similarity scores.

## Capabilities

### Text-Based Similarity Search

Search for documents similar to a text query. The query is embedded with the vector store's configured embedding function and then matched against the stored vectors.

```python { .api }
def similarity_search(
    query: str,
    k: int = 4,
    filter: Optional[dict[str, str]] = None,
    **kwargs: Any
) -> list[Document]:
    """
    Find documents most similar to the query text.

    Parameters:
    - query: Text query to search for
    - k: Number of results to return (default: 4)
    - filter: Metadata filter dictionary (e.g., {"category": "tech"})
    - **kwargs: Additional arguments passed to the ChromaDB collection query

    Returns:
    List of Document objects most similar to the query
    """

def similarity_search_with_score(
    query: str,
    k: int = 4,
    filter: Optional[dict[str, str]] = None,
    where_document: Optional[dict[str, str]] = None,
    **kwargs: Any
) -> list[tuple[Document, float]]:
    """
    Find documents similar to the query text, together with their scores.

    Parameters:
    - query: Text query to search for
    - k: Number of results to return (default: 4)
    - filter: Metadata filter dictionary
    - where_document: Document content filter (e.g., {"$contains": "python"})
    - **kwargs: Additional arguments passed to the ChromaDB collection query

    Returns:
    List of (Document, score) tuples. The score is the distance between the
    query and document embeddings, so lower scores indicate higher similarity.
    """
```

**Usage Example:**

```python
# Basic similarity search
results = vector_store.similarity_search("machine learning", k=3)
for doc in results:
    print(f"Content: {doc.page_content}")

# Search with score and filtering
results_with_scores = vector_store.similarity_search_with_score(
    query="python programming",
    k=5,
    filter={"category": "tech"},
    where_document={"$contains": "code"}
)
for doc, score in results_with_scores:
    print(f"Score: {score:.3f}, Content: {doc.page_content}")
```
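
To make the score semantics concrete, here is a minimal brute-force sketch of what a k-nearest-neighbour query computes, assuming Chroma's default squared-L2 distance. The names `squared_l2` and `brute_force_search` and the toy two-dimensional vectors are hypothetical; Chroma itself queries an approximate HNSW index rather than scanning every entry.

```python
# Hypothetical toy collection: (document text, embedding) pairs.
Entry = tuple[str, list[float]]

def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance, the metric used by Chroma's "l2" space."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def brute_force_search(
    query_vec: list[float], entries: list[Entry], k: int = 4
) -> list[tuple[str, float]]:
    """Score every entry and return the k closest as (text, distance) pairs."""
    scored = [(text, squared_l2(query_vec, vec)) for text, vec in entries]
    scored.sort(key=lambda pair: pair[1])  # ascending: lower distance = more similar
    return scored[:k]

entries = [
    ("intro to python", [1.0, 0.0]),
    ("python tutorial", [0.9, 0.1]),
    ("cooking pasta", [0.0, 1.0]),
]
results = brute_force_search([1.0, 0.0], entries, k=2)
for text, score in results:
    print(f"Score: {score:.3f}, Content: {text}")
```

The exact match scores 0.000 and the near match about 0.020, which is why the first tuple returned is the most similar one.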

### Vector-Based Search

Search using a pre-computed embedding vector instead of a text query. The vector is passed to the collection as-is, so no embedding call is made at query time.

```python { .api }
def similarity_search_by_vector(
    embedding: list[float],
    k: int = 4,
    filter: Optional[dict[str, str]] = None,
    where_document: Optional[dict[str, str]] = None,
    **kwargs: Any
) -> list[Document]:
    """
    Find documents most similar to the provided embedding vector.

    Parameters:
    - embedding: Pre-computed embedding vector
    - k: Number of results to return (default: 4)
    - filter: Metadata filter dictionary
    - where_document: Document content filter
    - **kwargs: Additional arguments passed to the ChromaDB collection query

    Returns:
    List of Document objects most similar to the embedding
    """

def similarity_search_by_vector_with_relevance_scores(
    embedding: list[float],
    k: int = 4,
    filter: Optional[dict[str, str]] = None,
    where_document: Optional[dict[str, str]] = None,
    **kwargs: Any
) -> list[tuple[Document, float]]:
    """
    Find documents similar to the embedding vector, together with their scores.

    Parameters:
    - embedding: Pre-computed embedding vector
    - k: Number of results to return (default: 4)
    - filter: Metadata filter dictionary
    - where_document: Document content filter
    - **kwargs: Additional arguments passed to the ChromaDB collection query

    Returns:
    List of (Document, score) tuples. Despite the method name, the score is
    a distance: lower scores indicate higher similarity.
    """
```

**Usage Example:**

```python
# Search by pre-computed vector
from langchain_openai import OpenAIEmbeddings

# Use the same embedding model the collection was built with
embeddings = OpenAIEmbeddings()
query_vector = embeddings.embed_query("artificial intelligence")

results = vector_store.similarity_search_by_vector(query_vector, k=3)
for doc in results:
    print(f"Content: {doc.page_content}")

# Search with relevance scores
results_with_scores = vector_store.similarity_search_by_vector_with_relevance_scores(
    embedding=query_vector,
    k=5,
    filter={"domain": "AI"}
)
```
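
Because these scores are distances, a cutoff keeps results whose score is at or below a threshold. The sketch below assumes `(item, score)` tuples shaped like the return value of `similarity_search_by_vector_with_relevance_scores`; the helper `within_distance` and the sample scores are hypothetical, and a sensible cutoff depends on the distance metric and embedding model.

```python
from typing import TypeVar

T = TypeVar("T")

def within_distance(
    results: list[tuple[T, float]], max_distance: float
) -> list[tuple[T, float]]:
    """Keep (item, score) pairs whose distance score is at most max_distance.

    Assumes lower scores mean more similar, so the comparison is <=.
    """
    return [(item, score) for item, score in results if score <= max_distance]

# Stand-in for a result list; strings replace Document objects.
raw = [("doc-a", 0.12), ("doc-b", 0.48), ("doc-c", 0.91)]
close_matches = within_distance(raw, max_distance=0.5)
print(close_matches)  # [('doc-a', 0.12), ('doc-b', 0.48)]
```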

### Search with Vector Embeddings

Search by text query and return each matching document together with its stored embedding vector.

```python { .api }
def similarity_search_with_vectors(
    query: str,
    k: int = 4,
    filter: Optional[dict[str, str]] = None,
    where_document: Optional[dict[str, str]] = None,
    **kwargs: Any
) -> list[tuple[Document, np.ndarray]]:
    """
    Search for similar documents and return their embedding vectors.

    Parameters:
    - query: Text query to search for
    - k: Number of results to return (default: 4)
    - filter: Metadata filter dictionary
    - where_document: Document content filter
    - **kwargs: Additional arguments passed to ChromaDB query

    Returns:
    List of tuples containing (Document, embedding_vector)
    """
```

**Usage Example:**

```python
import numpy as np

# Search with vectors for further processing
results_with_vectors = vector_store.similarity_search_with_vectors(
    query="data science",
    k=3
)
for doc, vector in results_with_vectors:
    print(f"Content: {doc.page_content}")
    print(f"Vector shape: {vector.shape}")
```
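
One use for the returned vectors is post-processing the result set, for example dropping near-duplicate documents. This is a sketch under stated assumptions: the helper names and the 0.95 cosine-similarity threshold are hypothetical, plain lists stand in for the NumPy arrays the method returns (the loop works with either), and zero vectors are not handled.

```python
import math
from typing import Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def drop_near_duplicates(results, threshold: float = 0.95):
    """Keep results in rank order, skipping any whose vector is at least
    `threshold` cosine-similar to a vector that was already kept."""
    kept = []
    for item, vec in results:
        if all(cosine_similarity(vec, kept_vec) < threshold for _, kept_vec in kept):
            kept.append((item, vec))
    return kept

results = [
    ("doc-1", [1.0, 0.0]),
    ("doc-1 copy", [0.99, 0.01]),
    ("doc-2", [0.0, 1.0]),
]
unique = drop_near_duplicates(results)
print([item for item, _ in unique])  # ['doc-1', 'doc-2']
```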

### Image-Based Search

Search for documents similar to a query image. This requires an embedding function that can embed images; if it cannot, these methods raise `ValueError`.

```python { .api }
def similarity_search_by_image(
    uri: str,
    k: int = 4,
    filter: Optional[dict[str, str]] = None,
    **kwargs: Any
) -> list[Document]:
    """
    Search for documents similar to the provided image.

    Parameters:
    - uri: File path to the query image
    - k: Number of results to return (default: 4)
    - filter: Metadata filter dictionary
    - **kwargs: Additional arguments passed to ChromaDB query

    Returns:
    List of Document objects most similar to the query image

    Raises:
    ValueError: If the embedding function doesn't support image embeddings
    """

def similarity_search_by_image_with_relevance_score(
    uri: str,
    k: int = 4,
    filter: Optional[dict[str, str]] = None,
    **kwargs: Any
) -> list[tuple[Document, float]]:
    """
    Search for documents similar to the image, with relevance scores.

    Parameters:
    - uri: File path to the query image
    - k: Number of results to return (default: 4)
    - filter: Metadata filter dictionary
    - **kwargs: Additional arguments passed to ChromaDB query

    Returns:
    List of tuples containing (Document, relevance_score)

    Raises:
    ValueError: If the embedding function doesn't support image embeddings
    """
```

**Usage Example:**

```python
# Search by image (requires multimodal embedding function)
image_results = vector_store.similarity_search_by_image(
    uri="/path/to/query_image.jpg",
    k=5,
    filter={"type": "visual"}
)

# Image search with scores
image_results_with_scores = vector_store.similarity_search_by_image_with_relevance_score(
    uri="/path/to/query_image.jpg",
    k=3
)
for doc, score in image_results_with_scores:
    print(f"Score: {score:.3f}, Metadata: {doc.metadata}")
```
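
The capability check behind that `ValueError` can be approximated with duck typing. This sketch assumes that image support means the embedding object exposes an `embed_image(uris)` method (check this against your installed langchain-chroma version); `supports_image_search`, `TextOnlyEmbeddings`, and `ToyImageEmbeddings` are hypothetical, and a real multimodal embedder would load and encode the image files.

```python
from typing import Any

def supports_image_search(embedding_function: Any) -> bool:
    """Duck-typed check: does the embedding object expose embed_image()?"""
    return embedding_function is not None and callable(
        getattr(embedding_function, "embed_image", None)
    )

class TextOnlyEmbeddings:
    """Hypothetical text-only embedder with no embed_image method."""

    def embed_query(self, text: str) -> list[float]:
        return [float(len(text))]

class ToyImageEmbeddings(TextOnlyEmbeddings):
    """Hypothetical multimodal embedder; returns dummy vectors per URI."""

    def embed_image(self, uris: list[str]) -> list[list[float]]:
        return [[float(len(uri))] for uri in uris]

print(supports_image_search(TextOnlyEmbeddings()))  # False
print(supports_image_search(ToyImageEmbeddings()))  # True
```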

## Relevance Score Functions

The Chroma class automatically selects a relevance score function based on the distance metric (the HNSW `space` setting) configured for the collection.

### Available Distance Metrics

- **Cosine**: Cosine distance (space: "cosine")
- **Euclidean**: Squared L2 distance (space: "l2"; Chroma's default)
- **Inner Product**: Inner-product distance (space: "ip")
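
For reference, Chroma's documentation defines the three spaces with the distance formulas shown in this plain-Python sketch: squared L2 (no square root), one minus the inner product, and one minus cosine similarity. The function names are hypothetical, and this is an illustration of the formulas, not Chroma's implementation.

```python
import math

def l2_distance(a: list[float], b: list[float]) -> float:
    """"l2" space: squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ip_distance(a: list[float], b: list[float]) -> float:
    """"ip" space: 1 minus the inner product."""
    return 1.0 - sum(x * y for x, y in zip(a, b))

def cosine_distance(a: list[float], b: list[float]) -> float:
    """"cosine" space: 1 minus cosine similarity (vectors must be non-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms

a, b = [1.0, 0.0], [0.0, 1.0]
print(l2_distance(a, b), ip_distance(a, b), cosine_distance(a, b))  # 2.0 1.0 1.0
```

Note that cosine distance ignores vector length while the other two do not, which is why the choice of space matters for embeddings that are not normalized.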

**Usage Example:**

```python
# Configure the distance metric during initialization
# (the space is fixed when the collection is first created)
from chromadb.api import CreateCollectionConfiguration
from langchain_chroma import Chroma

# `embeddings` is any LangChain Embeddings instance, e.g. OpenAIEmbeddings()
vector_store = Chroma(
    collection_name="my_collection",
    embedding_function=embeddings,
    collection_configuration=CreateCollectionConfiguration({
        "hnsw": {"space": "cosine"}
    })
)
```
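
The automatic selection can be pictured as a lookup from the collection's `space` to a function that converts a raw distance into a relevance score where higher means more relevant. The formulas below are modeled on the distance-to-relevance helpers in LangChain's `VectorStore` base class, and `select_relevance_fn` and `RELEVANCE_FNS` are hypothetical names; treat the exact formulas as an assumption and check them against your installed `langchain-core` version.

```python
import math
from typing import Callable

# Distance -> relevance conversions (assumed to mirror LangChain's helpers).
def euclidean_relevance(distance: float) -> float:
    return 1.0 - distance / math.sqrt(2)

def cosine_relevance(distance: float) -> float:
    return 1.0 - distance

def max_inner_product_relevance(distance: float) -> float:
    if distance > 0:
        return 1.0 - distance
    return -1.0 * distance

# Hypothetical lookup from the collection's HNSW space to a conversion.
RELEVANCE_FNS: dict[str, Callable[[float], float]] = {
    "l2": euclidean_relevance,
    "cosine": cosine_relevance,
    "ip": max_inner_product_relevance,
}

def select_relevance_fn(space: str = "l2") -> Callable[[float], float]:
    """Pick the conversion for a space; defaults to "l2", Chroma's default."""
    try:
        return RELEVANCE_FNS[space]
    except KeyError:
        raise ValueError(f"Unsupported distance space: {space!r}") from None

to_relevance = select_relevance_fn("cosine")
print(to_relevance(0.25))  # 0.75
```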

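## Advanced Filtering

### Metadata Filtering

Filter results on document metadata by passing a Chroma `where` dictionary as `filter`. Each filter dictionary must have exactly one top-level key: either a single field condition or a logical operator (`$and`, `$or`) that combines several conditions.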

```python
# Simple equality filter on a single field
filter = {"category": "science"}

# Multiple conditions must be combined with a logical operator
# (ChromaDB-specific syntax)
filter = {
    "$and": [
        {"category": "science"},
        {"year": {"$gte": 2020}}  # $gt/$gte/$lt/$lte require numeric values
    ]
}
```
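
To illustrate how such filters combine, here is a hypothetical in-memory evaluator for a subset of the `where` syntax (`$and`, `$or`, bare-value equality, and the comparison and membership operators). It is a sketch, not Chroma's implementation: Chroma evaluates filters inside the database, and treating a missing field as a non-match is an assumption.

```python
from typing import Any, Callable

# Operator table for field conditions such as {"year": {"$gte": 2020}}.
_OPS: dict[str, Callable[[Any, Any], bool]] = {
    "$eq": lambda v, x: v == x,
    "$ne": lambda v, x: v != x,
    "$gt": lambda v, x: v > x,
    "$gte": lambda v, x: v >= x,
    "$lt": lambda v, x: v < x,
    "$lte": lambda v, x: v <= x,
    "$in": lambda v, x: v in x,
    "$nin": lambda v, x: v not in x,
}

def matches(metadata: dict[str, Any], where: dict[str, Any]) -> bool:
    """Return True if one document's metadata satisfies a where filter."""
    if len(where) != 1:
        # Mirrors Chroma's rule: exactly one top-level key per filter dict.
        raise ValueError("Expected exactly one top-level key; combine with $and/$or")
    [(key, cond)] = where.items()
    if key == "$and":
        return all(matches(metadata, sub) for sub in cond)
    if key == "$or":
        return any(matches(metadata, sub) for sub in cond)
    if key not in metadata:
        return False  # assumption: a missing field never matches
    value = metadata[key]
    if isinstance(cond, dict):
        [(op, operand)] = cond.items()
        return _OPS[op](value, operand)
    return value == cond  # a bare value is shorthand for {"$eq": value}

doc_meta = {"category": "science", "year": 2021}
where = {"$and": [{"category": "science"}, {"year": {"$gte": 2020}}]}
print(matches(doc_meta, where))  # True
```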

### Document Content Filtering

Filter on the text of the documents themselves using ChromaDB's `where_document` parameter.

```python
# Content contains specific text
where_document = {"$contains": "machine learning"}

# Content matches a regular expression
# ($regex support depends on your ChromaDB version)
where_document = {"$regex": "^Python.*tutorial$"}
```
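
A similar hypothetical sketch for `where_document`, limited to `$contains`, `$not_contains`, `$regex`, `$and`, and `$or`. It uses plain substring tests and Python's `re.search`; Chroma's own matching (for example, its case handling) may differ, so treat this only as an illustration of how the operators compose.

```python
import re
from typing import Any

def matches_document(text: str, where_document: dict[str, Any]) -> bool:
    """Evaluate a where_document filter against one document's text."""
    [(op, operand)] = where_document.items()
    if op == "$and":
        return all(matches_document(text, sub) for sub in operand)
    if op == "$or":
        return any(matches_document(text, sub) for sub in operand)
    if op == "$contains":
        return operand in text  # plain substring test
    if op == "$not_contains":
        return operand not in text
    if op == "$regex":
        return re.search(operand, text) is not None
    raise ValueError(f"Unsupported where_document operator: {op}")

print(matches_document("Python basics: a tutorial", {"$regex": "^Python.*tutorial$"}))  # True
print(matches_document("Intro to machine learning", {"$contains": "machine learning"}))  # True
```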