Tessl Tile for pypi/langchain-chroma@0.2.0

# Vector Store Construction

Class methods and a static utility for creating `Chroma` vector store instances from raw texts or LangChain `Document` objects. The instances can be backed by an in-memory, persistent, remote-server, or Chroma Cloud client, and the factory methods cover the common initialization patterns in a single call.

## Capabilities

### Creating from Text Lists

Factory method that creates a `Chroma` instance and populates it with a list of texts in a single operation.

```python { .api }
@classmethod
def from_texts(
    cls: type[Chroma],
    texts: list[str],
    embedding: Optional[Embeddings] = None,
    metadatas: Optional[list[dict]] = None,
    ids: Optional[list[str]] = None,
    collection_name: str = "langchain",
    persist_directory: Optional[str] = None,
    host: Optional[str] = None,
    port: Optional[int] = None,
    headers: Optional[dict[str, str]] = None,
    chroma_cloud_api_key: Optional[str] = None,
    tenant: Optional[str] = None,
    database: Optional[str] = None,
    client_settings: Optional[chromadb.config.Settings] = None,
    client: Optional[chromadb.ClientAPI] = None,
    collection_metadata: Optional[dict] = None,
    collection_configuration: Optional[CreateCollectionConfiguration] = None,
    *,
    ssl: bool = False,
    **kwargs: Any,
) -> Chroma:
    """
    Create a Chroma vector store from a list of texts.

    Creates the vector store instance and adds all provided texts in batch
    operations for efficient initialization.

    Parameters:
    - texts: List of text strings to add to the vector store
    - embedding: Embedding function for vectorizing texts
    - metadatas: Optional list of metadata dictionaries, one per text
    - ids: Optional list of custom IDs, one per text (UUIDs generated if not provided)
    - collection_name: Name of the collection (default: "langchain")
    - persist_directory: Directory in which to persist the collection
    - host: Hostname of a deployed Chroma server
    - port: Connection port for the Chroma server (8000 if not set)
    - ssl: Whether to use an SSL connection (default: False)
    - headers: HTTP headers to send to the Chroma server
    - chroma_cloud_api_key: API key for Chroma Cloud
    - tenant: Tenant ID for Chroma Cloud
    - database: Database name for Chroma Cloud
    - client_settings: Custom ChromaDB client settings
    - client: Pre-configured ChromaDB client
    - collection_metadata: Metadata for the collection
    - collection_configuration: Index configuration for the collection
    - **kwargs: Additional keyword arguments passed to the Chroma constructor

    Returns:
    Chroma instance populated with the provided texts
    """
```

**Usage Example:**
```python
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

# Basic usage with texts
texts = [
    "The quick brown fox jumps over the lazy dog",
    "Python is a powerful programming language",
    "Machine learning is transforming technology"
]

vector_store = Chroma.from_texts(
    texts=texts,
    embedding=OpenAIEmbeddings(),
    collection_name="my_documents"
)

# With metadata and persistence
texts = ["Document 1", "Document 2", "Document 3"]
metadatas = [
    {"source": "file1.txt", "author": "Alice"},
    {"source": "file2.txt", "author": "Bob"},
    {"source": "file3.txt", "author": "Charlie"}
]
ids = ["doc_1", "doc_2", "doc_3"]

persistent_store = Chroma.from_texts(
    texts=texts,
    embedding=OpenAIEmbeddings(),
    metadatas=metadatas,
    ids=ids,
    collection_name="persistent_docs",
    persist_directory="./chroma_db"
)

# With Chroma Cloud
cloud_store = Chroma.from_texts(
    texts=texts,
    embedding=OpenAIEmbeddings(),
    collection_name="cloud_collection",
    chroma_cloud_api_key="your-api-key",
    tenant="your-tenant",
    database="your-database"
)
```
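
The `ids` and `metadatas` arguments are parallel to `texts`: entry *i* of each list describes text *i*. The stdlib-only sketch below makes that contract concrete, including the documented fallback to generated UUIDs when `ids` is omitted. `align_inputs` is a hypothetical helper written for illustration, not part of `langchain_chroma`; its length checks are an assumption about what a careful caller should verify, not a description of the library's own validation.

```python
import uuid
from typing import Optional


def align_inputs(
    texts: list[str],
    metadatas: Optional[list[dict]] = None,
    ids: Optional[list[str]] = None,
) -> list[tuple[str, str, Optional[dict]]]:
    """Hypothetical helper: pair each text with its ID and metadata.

    Mirrors the documented default of from_texts: when `ids` is None,
    a random UUID4 string is generated for every text.
    """
    if ids is None:
        ids = [str(uuid.uuid4()) for _ in texts]
    # Assumed sanity checks: every list must describe the same number of items.
    if len(ids) != len(texts):
        raise ValueError("ids must have the same length as texts")
    if metadatas is not None and len(metadatas) != len(texts):
        raise ValueError("metadatas must have the same length as texts")
    metas: list[Optional[dict]] = (
        list(metadatas) if metadatas is not None else [None] * len(texts)
    )
    return list(zip(ids, texts, metas))


rows = align_inputs(
    ["Document 1", "Document 2"],
    metadatas=[{"author": "Alice"}, {"author": "Bob"}],
)
for doc_id, text, meta in rows:
    print(doc_id, text, meta)  # doc_id is a freshly generated UUID string
```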

### Creating from Document Objects

Factory method that creates a `Chroma` instance from LangChain `Document` objects, using each document's `page_content` as the text and its `metadata` as the metadata.

```python { .api }
@classmethod
def from_documents(
    cls: type[Chroma],
    documents: list[Document],
    embedding: Optional[Embeddings] = None,
    ids: Optional[list[str]] = None,
    collection_name: str = "langchain",
    persist_directory: Optional[str] = None,
    host: Optional[str] = None,
    port: Optional[int] = None,
    headers: Optional[dict[str, str]] = None,
    chroma_cloud_api_key: Optional[str] = None,
    tenant: Optional[str] = None,
    database: Optional[str] = None,
    client_settings: Optional[chromadb.config.Settings] = None,
    client: Optional[chromadb.ClientAPI] = None,
    collection_metadata: Optional[dict] = None,
    collection_configuration: Optional[CreateCollectionConfiguration] = None,
    *,
    ssl: bool = False,
    **kwargs: Any,
) -> Chroma:
    """
    Create a Chroma vector store from a list of Document objects.

    Extracts text content and metadata from Document objects and creates
    a vector store with efficient batch operations.

    Parameters:
    - documents: List of Document objects to add to the vector store
    - embedding: Embedding function for vectorizing document content
    - ids: Optional list of custom IDs (defaults to each document's id, or a
      generated UUID when that is unset)
    - collection_name: Name of the collection (default: "langchain")
    - persist_directory: Directory in which to persist the collection
    - host: Hostname of a deployed Chroma server
    - port: Connection port (8000 if not set)
    - ssl: Whether to use an SSL connection (default: False)
    - headers: HTTP headers for the server connection
    - chroma_cloud_api_key: API key for Chroma Cloud
    - tenant: Tenant ID for Chroma Cloud
    - database: Database name for Chroma Cloud
    - client_settings: Custom ChromaDB client settings
    - client: Pre-configured ChromaDB client
    - collection_metadata: Metadata for the collection
    - collection_configuration: Index configuration for the collection
    - **kwargs: Additional keyword arguments passed to the Chroma constructor

    Returns:
    Chroma instance populated with the provided documents
    """
```

**Usage Example:**
```python
from langchain_core.documents import Document
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

# Create documents
documents = [
    Document(
        page_content="First document content",
        metadata={"source": "doc1", "category": "general"},
        id="custom_id_1"
    ),
    Document(
        page_content="Second document content",
        metadata={"source": "doc2", "category": "technical"}
    ),
    Document(
        page_content="Third document content",
        metadata={"source": "doc3", "category": "general"}
    )
]

# Create vector store from documents
vector_store = Chroma.from_documents(
    documents=documents,
    embedding=OpenAIEmbeddings(),
    collection_name="document_collection",
    persist_directory="./my_vector_db"
)

# With custom configuration (ChromaDB 1.x configuration keys)
from chromadb.api import CreateCollectionConfiguration

configured_store = Chroma.from_documents(
    documents=documents,
    embedding=OpenAIEmbeddings(),
    collection_name="configured_collection",
    collection_configuration=CreateCollectionConfiguration({
        "hnsw": {"space": "cosine", "max_neighbors": 16}
    }),
    collection_metadata={"version": "1.0", "description": "My documents"}
)
```
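
Conceptually, `from_documents` unpacks each `Document` into a text, a metadata dictionary, and an ID before the data is written. The sketch below illustrates that unpacking with the documented ID rule: explicit `ids` first, then each document's own `id`, then a generated UUID. `Doc` is a hypothetical stand-in dataclass for `langchain_core.documents.Document`, and `split_documents` is a hypothetical helper; neither is part of `langchain_chroma`.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Doc:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)
    id: Optional[str] = None


def split_documents(
    documents: list[Doc], ids: Optional[list[str]] = None
) -> tuple[list[str], list[dict], list[str]]:
    """Hypothetical helper: unpack documents into parallel lists.

    Explicit `ids` win; otherwise each document's own `id` is used,
    falling back to a random UUID4 string when it is unset.
    """
    texts = [doc.page_content for doc in documents]
    metadatas = [doc.metadata for doc in documents]
    if ids is None:
        ids = [doc.id if doc.id else str(uuid.uuid4()) for doc in documents]
    return texts, metadatas, ids


docs = [
    Doc("First document content", {"category": "general"}, id="custom_id_1"),
    Doc("Second document content", {"category": "technical"}),
]
texts, metadatas, ids = split_documents(docs)
print(ids[0])  # → custom_id_1 (taken from the document itself)
```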

### Image Encoding Utility

Static utility method for encoding images to base64 strings for storage or processing.

```python { .api }
@staticmethod
def encode_image(uri: str) -> str:
    """
    Encode an image file to a base64 string.

    Utility function for preparing images for storage in the vector store
    or for processing with multimodal embedding functions.

    Parameters:
    - uri: File path to the image file

    Returns:
    Base64 encoded string representation of the image

    Raises:
    FileNotFoundError: If the image file doesn't exist
    IOError: If the file cannot be read
    """
```

**Usage Example:**
```python
from langchain_chroma import Chroma
from langchain_core.documents import Document

# Encode image for storage or processing
image_path = "/path/to/image.jpg"
encoded_image = Chroma.encode_image(image_path)

# Store the encoded image as document content
image_document = Document(
    page_content=encoded_image,
    metadata={"type": "image", "format": "jpg", "source": image_path}
)

# vector_store: an existing Chroma instance (see the examples above).
# add_documents embeds page_content as text; to embed the image itself with a
# multimodal embedding function, use vector_store.add_images(uris=[image_path]).
vector_store.add_documents([image_document])
```
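
Assuming `encode_image` simply reads the file in binary mode and base64-encodes the bytes (which matches its documented inputs, output, and exceptions), an equivalent stdlib-only sketch looks like this. `encode_image_sketch` is a hypothetical name, not a library function. The round trip uses a temporary file holding only the 8-byte PNG signature, so no real image is needed.

```python
import base64
import os
import tempfile


def encode_image_sketch(uri: str) -> str:
    """Hypothetical equivalent of encode_image: file bytes -> base64 text."""
    # open() raises FileNotFoundError for a missing path and OSError for other read failures.
    with open(uri, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


# Round trip with a tiny temporary file instead of a real image.
sample = b"\x89PNG\r\n\x1a\n"  # the 8-byte PNG signature, used as dummy content
with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

try:
    encoded = encode_image_sketch(sample_path)
finally:
    os.remove(sample_path)

print(encoded)                              # → iVBORw0KGgo=
print(base64.b64decode(encoded) == sample)  # → True
```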
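
## Configuration Options

### Client Types and Configuration

`Chroma` selects the ChromaDB client type from the connection arguments you pass, covering local, persistent, self-hosted, and Chroma Cloud deployments. The snippets below assume `texts` (a list of strings) and `embeddings` (an `Embeddings` instance) are already defined.

**In-Memory Client (Default):**
```python
# No persist_directory, host, client, or Cloud credentials: data is kept in memory only
vector_store = Chroma.from_texts(
    texts=texts,
    embedding=embeddings,
    collection_name="memory_collection"
)
```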

**Persistent Client:**
```python
vector_store = Chroma.from_texts(
    texts=texts,
    embedding=embeddings,
    collection_name="persistent_collection",
    persist_directory="/path/to/chroma/db"
)
```

**HTTP Client (Remote Server):**
```python
vector_store = Chroma.from_texts(
    texts=texts,
    embedding=embeddings,
    collection_name="remote_collection",
    host="chroma-server.example.com",
    port=8000,
    ssl=True,
    headers={"Authorization": "Bearer token"}
)
```

**Chroma Cloud Client:**
```python
vector_store = Chroma.from_texts(
    texts=texts,
    embedding=embeddings,
    collection_name="cloud_collection",
    chroma_cloud_api_key="your-api-key",
    tenant="your-tenant",
    database="your-database"
)
```

### Collection Configuration

Pass a `CreateCollectionConfiguration` to tune the collection's HNSW index: the distance function, plus the graph parameters that trade recall against index build time and memory.

```python
from chromadb.api import CreateCollectionConfiguration

# HNSW index configuration (ChromaDB 1.x key names)
hnsw_config = CreateCollectionConfiguration({
    "hnsw": {
        "space": "cosine",       # Distance function: "cosine", "l2", or "ip"
        "max_neighbors": 16,     # Max graph links per node (HNSW's M parameter)
        "ef_construction": 200,  # Candidate-list size while building the index
    }
})

vector_store = Chroma.from_texts(
    texts=texts,
    embedding=embeddings,
    collection_configuration=hnsw_config
)
```
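
To build intuition for the `space` setting, here is a pure-Python sketch of the three distance functions as Chroma's documentation defines them: squared L2 for `"l2"`, one minus the dot product for `"ip"`, and one minus cosine similarity for `"cosine"`. In every space a lower value means "more similar". The function names are illustrative only; Chroma computes these inside its index, not through code like this.

```python
import math
from typing import Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """'l2' space: sum of squared differences (no square root is taken)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """'ip' space: 1 minus the dot product."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """'cosine' space: 1 minus the cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


a, b = [1.0, 0.0], [2.0, 0.0]
print(squared_l2(a, b))              # → 1.0
print(inner_product_distance(a, b))  # → -1.0
print(cosine_distance(a, b))         # → 0.0
```

Because `a` and `b` point in the same direction, cosine treats them as identical while L2 does not. That is one reason `"cosine"` is a common choice when an embedding's magnitude carries little meaning.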

### Batch Processing

The factory methods split large inputs into batches automatically, so you do not need to chunk data yourself. When the ChromaDB client reports a maximum batch size, texts, metadata, and IDs are added in chunks no larger than that limit.

```python
# Large dataset: no manual chunking needed
large_texts = [f"Text {i}" for i in range(10000)]
large_metadatas = [{"index": i} for i in range(10000)]

# Added in batches sized to the client's maximum batch size
vector_store = Chroma.from_texts(
    texts=large_texts,
    metadatas=large_metadatas,
    embedding=embeddings,
    collection_name="large_collection"
)
```
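
The sketch below shows the idea behind that batching: slicing the parallel `ids`, `texts`, and `metadatas` lists into aligned chunks. `iter_batches` is a hypothetical helper, and the limit of 4096 is an arbitrary illustrative value. In practice the real limit is reported by the ChromaDB client, and as far as we know the library delegates the actual splitting to ChromaDB's own batching utilities rather than to code like this.

```python
from typing import Iterator, Optional


def iter_batches(
    ids: list[str],
    texts: list[str],
    metadatas: Optional[list[dict]],
    max_batch_size: int,
) -> Iterator[tuple[list[str], list[str], Optional[list[dict]]]]:
    """Hypothetical helper: yield aligned slices, each at most max_batch_size long."""
    if max_batch_size <= 0:
        raise ValueError("max_batch_size must be positive")
    for start in range(0, len(texts), max_batch_size):
        end = start + max_batch_size
        yield (
            ids[start:end],
            texts[start:end],
            metadatas[start:end] if metadatas is not None else None,
        )


large_texts = [f"Text {i}" for i in range(10000)]
large_metadatas = [{"index": i} for i in range(10000)]
large_ids = [f"id_{i}" for i in range(10000)]

# 4096 is an arbitrary illustrative limit; the real one comes from the client.
batches = list(iter_batches(large_ids, large_texts, large_metadatas, 4096))
print([len(batch_texts) for _, batch_texts, _ in batches])  # → [4096, 4096, 1808]
```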

## Error Handling

Construction errors mostly come from the underlying ChromaDB client (for example, a `persist_directory` that cannot be created or written, or a server that cannot be reached) and from the embedding model. Wrap construction in `try`/`except` where such failures are expected.

```python
try:
    vector_store = Chroma.from_texts(
        texts=texts,
        embedding=embeddings,
        persist_directory="/invalid/path"  # e.g. a location the process cannot write to
    )
except ValueError as e:
    print(f"Configuration error: {e}")
except Exception as e:
    print(f"Unexpected error during construction: {e}")

# Validate inputs before construction
if texts and embeddings is not None:
    vector_store = Chroma.from_texts(texts=texts, embedding=embeddings)
else:
    print("Missing required texts or embeddings")
```
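
Catching exceptions after the fact tends to produce vague messages. A cheap pre-flight check on the inputs can catch common mistakes earlier. `check_inputs` is a hypothetical helper, not part of `langchain_chroma`. The duplicate-ID check reflects the assumption that ChromaDB rejects repeated IDs within a single add call.

```python
from typing import Optional


def check_inputs(
    texts: list[str],
    metadatas: Optional[list[dict]] = None,
    ids: Optional[list[str]] = None,
) -> list[str]:
    """Hypothetical pre-flight check; returns a list of problems (empty if none)."""
    problems: list[str] = []
    if not texts:
        problems.append("texts is empty")
    # Parallel lists must line up one-to-one with texts.
    if metadatas is not None and len(metadatas) != len(texts):
        problems.append(f"{len(metadatas)} metadatas for {len(texts)} texts")
    if ids is not None:
        if len(ids) != len(texts):
            problems.append(f"{len(ids)} ids for {len(texts)} texts")
        if len(set(ids)) != len(ids):
            problems.append("ids contain duplicates")
    return problems


print(check_inputs(["a", "b"], ids=["x", "x"]))  # → ['ids contain duplicates']
```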
