Tessl Tile for pypi/langchain-chroma@0.2.0

# Search Operations

Comprehensive search functionality for finding similar documents in the vector store. Supports text queries, vector queries, image queries, metadata filtering, and relevance scoring.

## Capabilities

### Text-Based Similarity Search

Search for documents similar to a text query using the configured embedding function.

```python { .api }
def similarity_search(
    query: str,
    k: int = 4,
    filter: Optional[dict[str, str]] = None,
    **kwargs: Any
) -> list[Document]:
    """
    Find documents most similar to the query text.

    Parameters:
    - query: Text query to search for
    - k: Number of results to return (default: 4)
    - filter: Metadata filter dictionary (e.g., {"category": "tech"})
    - **kwargs: Additional arguments passed to ChromaDB query

    Returns:
    List of Document objects most similar to the query
    """

def similarity_search_with_score(
    query: str,
    k: int = 4,
    filter: Optional[dict[str, str]] = None,
    where_document: Optional[dict[str, str]] = None,
    **kwargs: Any
) -> list[tuple[Document, float]]:
    """
    Find documents similar to query text with similarity scores.

    Parameters:
    - query: Text query to search for
    - k: Number of results to return (default: 4)
    - filter: Metadata filter dictionary
    - where_document: Document content filter (e.g., {"$contains": "python"})
    - **kwargs: Additional arguments passed to ChromaDB query

    Returns:
    List of tuples containing (Document, similarity_score)
    Lower scores indicate higher similarity
    """
```

**Usage Example:**
```python
# Basic similarity search
results = vector_store.similarity_search("machine learning", k=3)
for doc in results:
    print(f"Content: {doc.page_content}")

# Search with score and filtering
results_with_scores = vector_store.similarity_search_with_score(
    query="python programming",
    k=5,
    filter={"category": "tech"},
    where_document={"$contains": "code"}
)
for doc, score in results_with_scores:
    print(f"Score: {score:.3f}, Content: {doc.page_content}")
```

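Because lower scores indicate higher similarity, a common follow-up step is to discard weak matches by distance. The sketch below is illustrative only: `Doc` is a hypothetical stand-in for the real `Document` class, `keep_close_matches` is not part of the library, and the scores are made-up values.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (assumption)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def keep_close_matches(results, max_distance):
    """Keep (doc, score) pairs with score <= max_distance, closest first."""
    kept = [(doc, score) for doc, score in results if score <= max_distance]
    return sorted(kept, key=lambda pair: pair[1])


# Made-up scores standing in for similarity_search_with_score output
sample = [
    (Doc("Intro to Python"), 0.21),
    (Doc("Cooking basics"), 0.93),
    (Doc("Python decorators"), 0.35),
]

close = keep_close_matches(sample, max_distance=0.5)
for doc, score in close:
    print(f"Score: {score:.3f}, Content: {doc.page_content}")
```

A suitable `max_distance` depends on the embedding model and the distance metric, so it is usually tuned on real data rather than fixed in advance.
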
### Vector-Based Search

Search using pre-computed embedding vectors instead of text queries.

```python { .api }
def similarity_search_by_vector(
    embedding: list[float],
    k: int = 4,
    filter: Optional[dict[str, str]] = None,
    where_document: Optional[dict[str, str]] = None,
    **kwargs: Any
) -> list[Document]:
    """
    Find documents most similar to the provided embedding vector.

    Parameters:
    - embedding: Pre-computed embedding vector
    - k: Number of results to return (default: 4)
    - filter: Metadata filter dictionary
    - where_document: Document content filter
    - **kwargs: Additional arguments passed to ChromaDB query

    Returns:
    List of Document objects most similar to the embedding
    """

def similarity_search_by_vector_with_relevance_scores(
    embedding: list[float],
    k: int = 4,
    filter: Optional[dict[str, str]] = None,
    where_document: Optional[dict[str, str]] = None,
    **kwargs: Any
) -> list[tuple[Document, float]]:
    """
    Find documents similar to embedding vector with relevance scores.

    Parameters:
    - embedding: Pre-computed embedding vector
    - k: Number of results to return (default: 4)
    - filter: Metadata filter dictionary
    - where_document: Document content filter
    - **kwargs: Additional arguments passed to ChromaDB query

    Returns:
    List of tuples containing (Document, relevance_score)
    Lower scores indicate higher similarity
    """
```

**Usage Example:**
```python
# Search by pre-computed vector
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
query_vector = embeddings.embed_query("artificial intelligence")

results = vector_store.similarity_search_by_vector(query_vector, k=3)
for doc in results:
    print(f"Content: {doc.page_content}")

# Search with relevance scores
results_with_scores = vector_store.similarity_search_by_vector_with_relevance_scores(
    embedding=query_vector,
    k=5,
    filter={"domain": "AI"}
)
```

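Conceptually, a vector search ranks stored embeddings by their distance to the query vector and returns the `k` closest. The following is a brute-force sketch of that idea in plain Python. It is not how the vector store works internally (an index is normally used instead of a full scan), and the helper names and the choice of squared L2 distance are assumptions made for illustration.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_by_vector(query, stored, k=4):
    """Return up to k (doc_id, distance) pairs, closest first.

    `stored` maps a document id to its embedding vector.
    """
    scored = [(doc_id, squared_l2(query, vec)) for doc_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Tiny made-up 2-dimensional "embeddings"
stored_vectors = {
    "a": [0.0, 0.0],
    "b": [1.0, 0.0],
    "c": [3.0, 4.0],
}

top_two = nearest_by_vector([0.9, 0.0], stored_vectors, k=2)
print([doc_id for doc_id, _ in top_two])
```

The query vector must have the same dimensionality as the stored vectors, which is why the same embedding model should be used for indexing and for querying.
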
### Search with Vector Embeddings

Search that returns both documents and their corresponding embedding vectors.

```python { .api }
def similarity_search_with_vectors(
    query: str,
    k: int = 4,
    filter: Optional[dict[str, str]] = None,
    where_document: Optional[dict[str, str]] = None,
    **kwargs: Any
) -> list[tuple[Document, np.ndarray]]:
    """
    Search for similar documents and return their embedding vectors.

    Parameters:
    - query: Text query to search for
    - k: Number of results to return (default: 4)
    - filter: Metadata filter dictionary
    - where_document: Document content filter
    - **kwargs: Additional arguments passed to ChromaDB query

    Returns:
    List of tuples containing (Document, embedding_vector)
    """
```

**Usage Example:**
```python
import numpy as np

# Search with vectors for further processing
results_with_vectors = vector_store.similarity_search_with_vectors(
    query="data science",
    k=3
)
for doc, vector in results_with_vectors:
    print(f"Content: {doc.page_content}")
    print(f"Vector shape: {vector.shape}")
```

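One example of "further processing" is dropping near-duplicate results by comparing the returned vectors with each other. The sketch below assumes each result is a `(document, vector)` pair whose vector is any sequence of floats (a plain list or a NumPy array both work when iterated). The function names and the `0.98` threshold are hypothetical choices, not part of the library.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def drop_near_duplicates(results, threshold=0.98):
    """Keep a result only if it is not too similar to one already kept."""
    kept = []
    for doc, vec in results:
        if all(cosine_similarity(vec, kept_vec) < threshold for _, kept_vec in kept):
            kept.append((doc, vec))
    return kept


# Made-up results: the second vector is almost identical to the first
sample_results = [
    ("doc a", [1.0, 0.0]),
    ("doc b", [0.999, 0.01]),
    ("doc c", [0.0, 1.0]),
]

unique = drop_near_duplicates(sample_results)
print([doc for doc, _ in unique])
```

Because results arrive ordered by similarity to the query, keeping the first of each near-duplicate group retains the better-ranked document.
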
### Image-Based Search

Search for similar documents using image queries. Requires an embedding function that supports image embeddings.

```python { .api }
def similarity_search_by_image(
    uri: str,
    k: int = 4,
    filter: Optional[dict[str, str]] = None,
    **kwargs: Any
) -> list[Document]:
    """
    Search for documents similar to the provided image.

    Parameters:
    - uri: File path to the query image
    - k: Number of results to return (default: 4)
    - filter: Metadata filter dictionary
    - **kwargs: Additional arguments passed to ChromaDB query

    Returns:
    List of Document objects most similar to the query image

    Raises:
    ValueError: If embedding function doesn't support image embeddings
    """

def similarity_search_by_image_with_relevance_score(
    uri: str,
    k: int = 4,
    filter: Optional[dict[str, str]] = None,
    **kwargs: Any
) -> list[tuple[Document, float]]:
    """
    Search for documents similar to image with relevance scores.

    Parameters:
    - uri: File path to the query image
    - k: Number of results to return (default: 4)
    - filter: Metadata filter dictionary
    - **kwargs: Additional arguments passed to ChromaDB query

    Returns:
    List of tuples containing (Document, relevance_score)

    Raises:
    ValueError: If embedding function doesn't support image embeddings
    """
```

**Usage Example:**
```python
# Search by image (requires multimodal embedding function)
image_results = vector_store.similarity_search_by_image(
    uri="/path/to/query_image.jpg",
    k=5,
    filter={"type": "visual"}
)

# Image search with scores
image_results_with_scores = vector_store.similarity_search_by_image_with_relevance_score(
    uri="/path/to/query_image.jpg",
    k=3
)
for doc, score in image_results_with_scores:
    print(f"Score: {score:.3f}, Metadata: {doc.metadata}")
```

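Since image search raises `ValueError` when the embedding function cannot embed images, it can help to check for that capability before calling it. The sketch below assumes the capability is exposed as a callable `embed_image` method on the embedding function. That method name, and both example classes, are assumptions for illustration rather than confirmed details of the library.

```python
def supports_image_embeddings(embedding_function) -> bool:
    """Return True if the object exposes a callable `embed_image` (assumed name)."""
    return callable(getattr(embedding_function, "embed_image", None))


class TextOnlyEmbeddings:
    """Hypothetical text-only embedding function."""

    def embed_query(self, text):
        return [float(len(text))]


class MultimodalEmbeddings(TextOnlyEmbeddings):
    """Hypothetical multimodal embedding function."""

    def embed_image(self, uris):
        return [[0.0] for _ in uris]


for candidate in (TextOnlyEmbeddings(), MultimodalEmbeddings()):
    name = type(candidate).__name__
    print(f"{name}: image search available = {supports_image_embeddings(candidate)}")
```

Checking up front lets an application fall back to text search instead of handling the exception at query time.
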
## Relevance Score Functions

The Chroma class automatically selects relevance score functions based on the collection's distance metric configuration.

### Available Distance Metrics

- **Cosine**: Cosine similarity (space: "cosine")
- **Euclidean**: L2 distance (space: "l2")
- **Inner Product**: Maximum inner product (space: "ip")

**Usage Example:**
```python
# Configure distance metric during initialization
from chromadb.api import CreateCollectionConfiguration
from langchain_chroma import Chroma

# `embeddings` is an embedding function instance, e.g. OpenAIEmbeddings()
vector_store = Chroma(
    collection_name="my_collection",
    embedding_function=embeddings,
    collection_configuration=CreateCollectionConfiguration({
        "hnsw": {"space": "cosine"}
    })
)
```

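The idea of selecting a relevance score function from the configured space can be sketched as a simple lookup. The conversion formulas below follow the pattern commonly used in LangChain vector stores to map a distance to a relevance value, but treat them as an assumption and check the installed version before relying on exact numbers. The function and dictionary names are hypothetical.

```python
import math


def cosine_relevance(distance: float) -> float:
    """Map a cosine distance to a relevance score (assumed formula)."""
    return 1.0 - distance


def euclidean_relevance(distance: float) -> float:
    """Map an L2 distance to a relevance score (assumed formula)."""
    return 1.0 - distance / math.sqrt(2)


def inner_product_relevance(distance: float) -> float:
    """Map an inner-product distance to a relevance score (assumed formula)."""
    if distance > 0:
        return 1.0 - distance
    return -1.0 * distance


RELEVANCE_BY_SPACE = {
    "cosine": cosine_relevance,
    "l2": euclidean_relevance,
    "ip": inner_product_relevance,
}


def select_relevance_fn(space: str):
    """Pick the conversion function for a configured space name."""
    try:
        return RELEVANCE_BY_SPACE[space]
    except KeyError:
        raise ValueError(f"Unsupported distance space: {space!r}") from None


to_relevance = select_relevance_fn("cosine")
print(to_relevance(0.25))
```

Unlike raw distances, where lower means closer, a relevance score produced this way is higher for better matches.
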
## Advanced Filtering

### Metadata Filtering

Filter results based on document metadata using dictionary conditions.

```python
# Simple equality filter (one field per filter; combine several with $and)
filter = {"category": "science"}

# Complex filters (ChromaDB-specific syntax)
filter = {
    "$and": [
        {"category": "science"},
        {"year": {"$gte": 2020}}  # range operators expect numeric values
    ]
}
```

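To make the filter semantics concrete, here is a small evaluator that checks a metadata dictionary against a filter. It is an illustrative sketch covering only equality, the comparison operators, `$and`, and `$or`. It is not ChromaDB's implementation, `matches` is a hypothetical name, and it assumes `year` is stored as an integer.

```python
import operator

# Comparison operators handled by this sketch
_COMPARATORS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def matches(metadata: dict, where: dict) -> bool:
    """Return True if `metadata` satisfies the `where` filter."""
    for key, condition in where.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in condition):
                return False
        elif isinstance(condition, dict):
            # Operator form, e.g. {"year": {"$gte": 2020}}
            if key not in metadata:
                return False
            for op, operand in condition.items():
                if not _COMPARATORS[op](metadata[key], operand):
                    return False
        elif metadata.get(key) != condition:
            # Shorthand equality, e.g. {"category": "science"}
            return False
    return True


where = {"$and": [{"category": "science"}, {"year": {"$gte": 2020}}]}
print(matches({"category": "science", "year": 2023}, where))
print(matches({"category": "science", "year": 2019}, where))
```

In this sketch a document that lacks the filtered field never matches, which is one reason to keep metadata keys consistent across documents.
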
### Document Content Filtering

Filter based on the actual document content using ChromaDB's where_document parameter.

```python
# Content contains specific text
where_document = {"$contains": "machine learning"}

# Content matches pattern
where_document = {"$regex": "^Python.*tutorial$"}
```

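The content filters above can be read as predicates over the document text. The sketch below mimics that behaviour locally so filters can be tried out on sample strings. It is an approximation and not ChromaDB's implementation: `document_matches` is a hypothetical helper, matching is assumed to be case-sensitive, and `$regex` is assumed to behave like Python's `re.search`.

```python
import re


def document_matches(text: str, where_document: dict) -> bool:
    """Return True if `text` satisfies every operator in `where_document`."""
    for op, operand in where_document.items():
        if op == "$contains":
            ok = operand in text
        elif op == "$not_contains":
            ok = operand not in text
        elif op == "$regex":
            ok = re.search(operand, text) is not None
        else:
            raise ValueError(f"Unsupported operator: {op}")
        if not ok:
            return False
    return True


print(document_matches("Intro to machine learning", {"$contains": "machine learning"}))
print(document_matches("Python basics tutorial", {"$regex": "^Python.*tutorial$"}))
print(document_matches("A Python tutorial", {"$regex": "^Python.*tutorial$"}))
```

The third call shows the effect of the `^` anchor: the text contains "Python" and ends with "tutorial", but it does not start with "Python", so the pattern does not match.
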
