# Vector Store Construction

Class methods and utilities for creating `Chroma` vector store instances from texts, `Document` objects, and different client configurations. The factory methods cover the common initialization patterns.

## Capabilities

### Creating from Text Lists

Factory method that creates a `Chroma` instance and populates it with a list of texts in a single operation.

```python { .api }
@classmethod
def from_texts(
    cls: type[Chroma],
    texts: list[str],
    embedding: Optional[Embeddings] = None,
    metadatas: Optional[list[dict]] = None,
    ids: Optional[list[str]] = None,
    collection_name: str = "langchain",
    persist_directory: Optional[str] = None,
    host: Optional[str] = None,
    port: Optional[int] = None,
    headers: Optional[dict[str, str]] = None,
    chroma_cloud_api_key: Optional[str] = None,
    tenant: Optional[str] = None,
    database: Optional[str] = None,
    client_settings: Optional[chromadb.config.Settings] = None,
    client: Optional[chromadb.ClientAPI] = None,
    collection_metadata: Optional[dict] = None,
    collection_configuration: Optional[CreateCollectionConfiguration] = None,
    *,
    ssl: bool = False,
    **kwargs: Any,
) -> Chroma:
    """
    Create a Chroma vector store from a list of texts.

    Creates the vector store instance and adds all provided texts in batch operations
    for efficient initialization.

    Parameters:
    - texts: List of text strings to add to the vector store
    - embedding: Embedding function for vectorizing texts
    - metadatas: Optional list of metadata dictionaries for each text
    - ids: Optional list of custom IDs (UUIDs generated if not provided)
    - collection_name: Name for the new collection (default: "langchain")
    - persist_directory: Directory to persist the collection
    - host: Hostname of deployed Chroma server
    - port: Connection port for Chroma server (default: 8000)
    - ssl: Whether to use SSL connection (default: False)
    - headers: HTTP headers for Chroma server
    - chroma_cloud_api_key: API key for Chroma Cloud
    - tenant: Tenant ID for Chroma Cloud
    - database: Database name for Chroma Cloud
    - client_settings: Custom ChromaDB client settings
    - client: Pre-configured ChromaDB client
    - collection_metadata: Metadata for the collection
    - collection_configuration: Index configuration for the collection
    - **kwargs: Additional arguments for Chroma client initialization

    Returns:
    Chroma instance populated with the provided texts
    """
```

**Usage Example:**
```python
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

# Basic usage with texts
texts = [
    "The quick brown fox jumps over the lazy dog",
    "Python is a powerful programming language",
    "Machine learning is transforming technology"
]

vector_store = Chroma.from_texts(
    texts=texts,
    embedding=OpenAIEmbeddings(),
    collection_name="my_documents"
)

# With metadata and persistence
texts = ["Document 1", "Document 2", "Document 3"]
metadatas = [
    {"source": "file1.txt", "author": "Alice"},
    {"source": "file2.txt", "author": "Bob"},
    {"source": "file3.txt", "author": "Charlie"}
]
ids = ["doc_1", "doc_2", "doc_3"]

persistent_store = Chroma.from_texts(
    texts=texts,
    embedding=OpenAIEmbeddings(),
    metadatas=metadatas,
    ids=ids,
    collection_name="persistent_docs",
    persist_directory="./chroma_db"
)

# With Chroma Cloud
cloud_store = Chroma.from_texts(
    texts=texts,
    embedding=OpenAIEmbeddings(),
    collection_name="cloud_collection",
    chroma_cloud_api_key="your-api-key",
    tenant="your-tenant",
    database="your-database"
)
```

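The `ids` parameter is documented as falling back to generated UUIDs when no IDs are supplied. The sketch below illustrates that fallback with a hypothetical helper, `resolve_ids`, which is not part of `langchain_chroma`. The UUID4 format and the length check are assumptions made for illustration; the library's own logic may differ.

```python
import uuid
from typing import Optional


def resolve_ids(texts: list[str], ids: Optional[list[str]] = None) -> list[str]:
    """Return one ID per text, generating random UUID strings when none are given.

    Hypothetical helper for illustration; not part of langchain_chroma.
    """
    if ids is None:
        # No IDs supplied: create one random UUID string per text
        return [str(uuid.uuid4()) for _ in texts]
    if len(ids) != len(texts):
        raise ValueError(f"Got {len(ids)} ids for {len(texts)} texts")
    return list(ids)


generated = resolve_ids(["Document 1", "Document 2"])
explicit = resolve_ids(["Document 1", "Document 2"], ids=["doc_1", "doc_2"])
```

Supplying explicit IDs, as in the persistence example above, makes it possible to refer to the same records again later, whereas generated IDs are different on every run.
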
### Creating from Document Objects

Factory method to create a Chroma instance from LangChain Document objects.

```python { .api }
@classmethod
def from_documents(
    cls: type[Chroma],
    documents: list[Document],
    embedding: Optional[Embeddings] = None,
    ids: Optional[list[str]] = None,
    collection_name: str = "langchain",
    persist_directory: Optional[str] = None,
    host: Optional[str] = None,
    port: Optional[int] = None,
    headers: Optional[dict[str, str]] = None,
    chroma_cloud_api_key: Optional[str] = None,
    tenant: Optional[str] = None,
    database: Optional[str] = None,
    client_settings: Optional[chromadb.config.Settings] = None,
    client: Optional[chromadb.ClientAPI] = None,
    collection_metadata: Optional[dict] = None,
    collection_configuration: Optional[CreateCollectionConfiguration] = None,
    *,
    ssl: bool = False,
    **kwargs: Any,
) -> Chroma:
    """
    Create a Chroma vector store from a list of Document objects.

    Extracts text content and metadata from Document objects and creates
    a vector store with efficient batch operations.

    Parameters:
    - documents: List of Document objects to add to the vector store
    - embedding: Embedding function for vectorizing document content
    - ids: Optional list of custom IDs (uses document.id or generates UUIDs)
    - collection_name: Name for the new collection (default: "langchain")
    - persist_directory: Directory to persist the collection
    - host: Hostname of deployed Chroma server
    - port: Connection port (default: 8000)
    - ssl: Whether to use SSL connection (default: False)
    - headers: HTTP headers for server connection
    - chroma_cloud_api_key: API key for Chroma Cloud
    - tenant: Tenant ID for Chroma Cloud
    - database: Database name for Chroma Cloud
    - client_settings: Custom ChromaDB client settings
    - client: Pre-configured ChromaDB client
    - collection_metadata: Metadata for the collection
    - collection_configuration: Index configuration
    - **kwargs: Additional client initialization arguments

    Returns:
    Chroma instance populated with the provided documents
    """
```

**Usage Example:**
```python
from langchain_core.documents import Document
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

# Create documents
documents = [
    Document(
        page_content="First document content",
        metadata={"source": "doc1", "category": "general"},
        id="custom_id_1"
    ),
    Document(
        page_content="Second document content",
        metadata={"source": "doc2", "category": "technical"}
    ),
    Document(
        page_content="Third document content",
        metadata={"source": "doc3", "category": "general"}
    )
]

# Create vector store from documents
vector_store = Chroma.from_documents(
    documents=documents,
    embedding=OpenAIEmbeddings(),
    collection_name="document_collection",
    persist_directory="./my_vector_db"
)

# With custom configuration
from chromadb.api import CreateCollectionConfiguration

configured_store = Chroma.from_documents(
    documents=documents,
    embedding=OpenAIEmbeddings(),
    collection_name="configured_collection",
    # Chroma's HNSW configuration names the link count "max_neighbors"
    collection_configuration=CreateCollectionConfiguration({
        "hnsw": {"space": "cosine", "max_neighbors": 16}
    }),
    collection_metadata={"version": "1.0", "description": "My documents"}
)
```

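The docstring states that `from_documents` extracts text and metadata from each document and uses `document.id` or a generated UUID when `ids` is not passed. A minimal sketch of that mapping follows. `SimpleDocument` is a stand-in for the LangChain `Document` class so the snippet runs without extra packages, and `split_documents` is a hypothetical helper; neither is the library's implementation.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SimpleDocument:
    """Stand-in for langchain_core.documents.Document (illustration only)."""

    page_content: str
    metadata: dict = field(default_factory=dict)
    id: Optional[str] = None


def split_documents(
    documents: list[SimpleDocument], ids: Optional[list[str]] = None
) -> tuple[list[str], list[dict], list[str]]:
    """Split documents into parallel lists of texts, metadatas, and IDs."""
    texts = [doc.page_content for doc in documents]
    metadatas = [doc.metadata for doc in documents]
    if ids is None:
        # Prefer the document's own ID; otherwise generate a UUID string
        ids = [doc.id if doc.id else str(uuid.uuid4()) for doc in documents]
    return texts, metadatas, ids


docs = [
    SimpleDocument("First document content", {"source": "doc1"}, id="custom_id_1"),
    SimpleDocument("Second document content", {"source": "doc2"}),
]
texts, metadatas, ids = split_documents(docs)
```

In this sketch the first document keeps `custom_id_1`, while the second receives a freshly generated ID.
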
### Image Encoding Utility

Static utility method for encoding images to base64 strings for storage or processing.

```python { .api }
@staticmethod
def encode_image(uri: str) -> str:
    """
    Encode an image file to a base64 string.

    Utility function for preparing images for storage in the vector store
    or for processing with multimodal embedding functions.

    Parameters:
    - uri: File path to the image file

    Returns:
    Base64 encoded string representation of the image

    Raises:
    FileNotFoundError: If the image file doesn't exist
    IOError: If the file cannot be read
    """
```

**Usage Example:**
```python
from langchain_chroma import Chroma
from langchain_core.documents import Document

# Encode image for storage or processing
image_path = "/path/to/image.jpg"
encoded_image = Chroma.encode_image(image_path)

# Use encoded image with documents
image_document = Document(
    page_content=encoded_image,
    metadata={"type": "image", "format": "jpg", "source": image_path}
)

# Add to an existing vector store (requires multimodal embeddings);
# `vector_store` is assumed to have been created earlier
vector_store.add_documents([image_document])
```

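To make the encoding step concrete, the sketch below reads a file's bytes and base64-encodes them with the standard library only. The function name `encode_file_to_base64` and the temporary sample file are illustrative assumptions; this is one plausible way to produce such a string, not a copy of `Chroma.encode_image`.

```python
import base64
import tempfile
from pathlib import Path


def encode_file_to_base64(path: str) -> str:
    """Read a file as raw bytes and return them as a base64 text string."""
    with open(path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


# Write a small placeholder file so the example runs without a real image
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "sample.bin"
    sample_path.write_bytes(b"\x89PNG fake image bytes")
    encoded = encode_file_to_base64(str(sample_path))

# Decoding recovers the original bytes
decoded = base64.b64decode(encoded)
```

Opening a missing path raises `FileNotFoundError`, which matches the failure mode listed in the docstring above.
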
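## Configuration Options

### Client Types and Configuration

The connection arguments passed to a factory method determine which ChromaDB client backs the store, so the same call can target different deployment scenarios. The snippets below assume `texts` and `embeddings` are already defined.
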
**In-Memory Client (Default):**
```python
vector_store = Chroma.from_texts(
    texts=texts,
    embedding=embeddings,
    collection_name="memory_collection"
)
```

**Persistent Client:**
```python
vector_store = Chroma.from_texts(
    texts=texts,
    embedding=embeddings,
    collection_name="persistent_collection",
    persist_directory="/path/to/chroma/db"
)
```

**HTTP Client (Remote Server):**
```python
vector_store = Chroma.from_texts(
    texts=texts,
    embedding=embeddings,
    collection_name="remote_collection",
    host="chroma-server.example.com",
    port=8000,
    ssl=True,
    headers={"Authorization": "Bearer token"}
)
```

**Chroma Cloud Client:**
```python
vector_store = Chroma.from_texts(
    texts=texts,
    embedding=embeddings,
    collection_name="cloud_collection",
    chroma_cloud_api_key="your-api-key",
    tenant="your-tenant",
    database="your-database"
)
```

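The four snippets differ only in which connection argument is set. The sketch below shows one plausible way such arguments could be mapped to a client type. `pick_client_kind` is a hypothetical helper, and both the precedence order and the rule that the three connection targets are mutually exclusive are assumptions; the library's actual behavior may differ.

```python
from typing import Optional


def pick_client_kind(
    persist_directory: Optional[str] = None,
    host: Optional[str] = None,
    chroma_cloud_api_key: Optional[str] = None,
    client: Optional[object] = None,
) -> str:
    """Return a label for the client type implied by the arguments.

    Hypothetical helper; the selection rules here are assumptions.
    """
    targets = {
        "persist_directory": persist_directory,
        "host": host,
        "chroma_cloud_api_key": chroma_cloud_api_key,
    }
    provided = [name for name, value in targets.items() if value is not None]
    if len(provided) > 1:
        # Assumed rule: only one connection target may be given at a time
        raise ValueError(f"Conflicting connection arguments: {', '.join(provided)}")
    if client is not None:
        return "provided"
    if persist_directory is not None:
        return "persistent"
    if host is not None:
        return "http"
    if chroma_cloud_api_key is not None:
        return "cloud"
    return "in-memory"


kind = pick_client_kind(host="chroma-server.example.com")
```

Checking for conflicting arguments up front gives a clearer error than silently preferring one target over another.
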
### Collection Configuration

The `collection_configuration` argument passes index settings, such as HNSW parameters, to the new collection for performance and behavior tuning.

```python
from chromadb.api import CreateCollectionConfiguration

# HNSW index configuration
hnsw_config = CreateCollectionConfiguration({
    "hnsw": {
        "space": "cosine",         # cosine, l2, or ip
        "max_neighbors": 16,       # Number of bi-directional links (HNSW "M")
        "ef_construction": 200     # Size of dynamic candidate list
    }
})

vector_store = Chroma.from_texts(
    texts=texts,
    embedding=embeddings,
    collection_configuration=hnsw_config
)
```

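The `space` setting selects the distance function used to compare embeddings. As a rough illustration, the sketch below computes the three options in plain Python, assuming the definitions Chroma documents for them: squared L2 for `l2`, one minus the inner product for `ip`, and one minus cosine similarity for `cosine`. The function names are made up for this example.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance (assumed meaning of space="l2")."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ip_distance(a: list[float], b: list[float]) -> float:
    """One minus the inner product (assumed meaning of space="ip")."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a: list[float], b: list[float]) -> float:
    """One minus the cosine similarity (assumed meaning of space="cosine")."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


# Two orthogonal unit vectors
a = [1.0, 0.0]
b = [0.0, 1.0]
distances = {
    "l2": l2_distance(a, b),
    "ip": ip_distance(a, b),
    "cosine": cosine_distance(a, b),
}
```

Cosine distance ignores vector length, so it is a common choice when embeddings are not normalized.
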
### Batch Processing

The factory methods split large inputs into batches automatically, so a large dataset can be passed in a single call.

```python
# Large dataset - automatically batched
large_texts = [f"Text {i}" for i in range(10000)]
large_metadatas = [{"index": i} for i in range(10000)]

# Efficiently processes in batches
vector_store = Chroma.from_texts(
    texts=large_texts,
    metadatas=large_metadatas,
    embedding=embeddings,
    collection_name="large_collection"
)
```

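For intuition about what batching involves, the sketch below slices a list into fixed-size chunks. The helper `iter_batches` and the batch size of 4096 are illustrative assumptions; the real batch size is determined by the Chroma client, not by this code.

```python
from collections.abc import Iterator, Sequence


def iter_batches(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


large_texts = [f"Text {i}" for i in range(10000)]

# Hypothetical batch size, chosen only for this example
batch_sizes = [len(batch) for batch in iter_batches(large_texts, batch_size=4096)]
```

Every batch except possibly the last is full, and the batch lengths add up to the number of input texts.
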
## Error Handling

Construction can fail, for example on an invalid configuration. Catch errors around the factory call and validate the inputs beforehand.

```python
try:
    vector_store = Chroma.from_texts(
        texts=texts,
        embedding=embeddings,
        persist_directory="/invalid/path"
    )
except ValueError as e:
    print(f"Configuration error: {e}")
except Exception as e:
    print(f"Unexpected error during construction: {e}")

# Validate before construction
if texts and embeddings:
    vector_store = Chroma.from_texts(texts=texts, embedding=embeddings)
else:
    print("Missing required texts or embeddings")
```
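
The truthiness check above can be extended into a slightly more thorough pre-flight check. The sketch below uses a hypothetical helper, `find_input_problems`, that reports mismatched lengths and duplicate IDs before any factory call is made. Which conditions are worth checking is an assumption made here, not a list of what the library itself validates.

```python
from typing import Optional


def find_input_problems(
    texts: list[str],
    metadatas: Optional[list[dict]] = None,
    ids: Optional[list[str]] = None,
) -> list[str]:
    """Return human-readable problems found in the inputs (empty list if none)."""
    problems = []
    if not texts:
        problems.append("texts is empty")
    if metadatas is not None and len(metadatas) != len(texts):
        problems.append("metadatas and texts differ in length")
    if ids is not None:
        if len(ids) != len(texts):
            problems.append("ids and texts differ in length")
        if len(set(ids)) != len(ids):
            problems.append("ids contains duplicates")
    return problems


problems = find_input_problems(
    texts=["Document 1", "Document 2"],
    metadatas=[{"source": "file1.txt"}],
    ids=["doc_1", "doc_1"],
)
```

An empty result means the inputs passed these checks, and the factory method can then be called inside the `try` block shown above.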