# Vector Store Construction

Class methods and utilities for creating Chroma vector store instances from various data sources and configurations. Provides convenient factory methods for common initialization patterns.

## Capabilities

### Creating from Text Lists

Factory method to create a Chroma instance and populate it with a list of texts in a single operation.

```python { .api }
@classmethod
def from_texts(
    cls: type[Chroma],
    texts: list[str],
    embedding: Optional[Embeddings] = None,
    metadatas: Optional[list[dict]] = None,
    ids: Optional[list[str]] = None,
    collection_name: str = "langchain",
    persist_directory: Optional[str] = None,
    host: Optional[str] = None,
    port: Optional[int] = None,
    headers: Optional[dict[str, str]] = None,
    chroma_cloud_api_key: Optional[str] = None,
    tenant: Optional[str] = None,
    database: Optional[str] = None,
    client_settings: Optional[chromadb.config.Settings] = None,
    client: Optional[chromadb.ClientAPI] = None,
    collection_metadata: Optional[dict] = None,
    collection_configuration: Optional[CreateCollectionConfiguration] = None,
    *,
    ssl: bool = False,
    **kwargs: Any,
) -> Chroma:
    """
    Create a Chroma vector store from a list of texts.

    Creates the vector store instance and adds all provided texts in batch operations
    for efficient initialization.

    Parameters:
    - texts: List of text strings to add to the vector store
    - embedding: Embedding function for vectorizing texts
    - metadatas: Optional list of metadata dictionaries for each text
    - ids: Optional list of custom IDs (UUIDs generated if not provided)
    - collection_name: Name for the new collection (default: "langchain")
    - persist_directory: Directory to persist the collection
    - host: Hostname of deployed Chroma server
    - port: Connection port for Chroma server (default: 8000)
    - ssl: Whether to use SSL connection (default: False)
    - headers: HTTP headers for Chroma server
    - chroma_cloud_api_key: API key for Chroma Cloud
    - tenant: Tenant ID for Chroma Cloud
    - database: Database name for Chroma Cloud
    - client_settings: Custom ChromaDB client settings
    - client: Pre-configured ChromaDB client
    - collection_metadata: Metadata for the collection
    - collection_configuration: Index configuration for the collection
    - **kwargs: Additional arguments for Chroma client initialization

    Returns:
    Chroma instance populated with the provided texts
    """
```

**Usage Example:**

```python
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

# Basic usage with texts
texts = [
    "The quick brown fox jumps over the lazy dog",
    "Python is a powerful programming language",
    "Machine learning is transforming technology"
]

vector_store = Chroma.from_texts(
    texts=texts,
    embedding=OpenAIEmbeddings(),
    collection_name="my_documents"
)

# With metadata and persistence
texts = ["Document 1", "Document 2", "Document 3"]
metadatas = [
    {"source": "file1.txt", "author": "Alice"},
    {"source": "file2.txt", "author": "Bob"},
    {"source": "file3.txt", "author": "Charlie"}
]
ids = ["doc_1", "doc_2", "doc_3"]

persistent_store = Chroma.from_texts(
    texts=texts,
    embedding=OpenAIEmbeddings(),
    metadatas=metadatas,
    ids=ids,
    collection_name="persistent_docs",
    persist_directory="./chroma_db"
)

# With Chroma Cloud
cloud_store = Chroma.from_texts(
    texts=texts,
    embedding=OpenAIEmbeddings(),
    collection_name="cloud_collection",
    chroma_cloud_api_key="your-api-key",
    tenant="your-tenant",
    database="your-database"
)
```
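
The `ids` parameter is optional, and the docstring above says UUIDs are generated when it is omitted. The following is a minimal sketch of that defaulting behavior, not the library's own code. `resolve_ids` is a hypothetical helper introduced here for illustration, and the length check is an assumption of the sketch.

```python
import uuid
from typing import Optional


def resolve_ids(texts: list[str], ids: Optional[list[str]] = None) -> list[str]:
    """Hypothetical helper: return caller-supplied IDs, or one UUID string per text."""
    if ids is None:
        # No IDs supplied: generate a fresh random UUID for every text.
        return [str(uuid.uuid4()) for _ in texts]
    if len(ids) != len(texts):
        # Assumption of this sketch: IDs and texts must pair up one-to-one.
        raise ValueError(f"Got {len(ids)} ids for {len(texts)} texts")
    return ids


generated = resolve_ids(["Document 1", "Document 2", "Document 3"])
explicit = resolve_ids(["Document 1", "Document 2"], ids=["doc_1", "doc_2"])
```

Supplying stable IDs such as `doc_1` makes it easier to refer to the same entries later, whereas generated UUIDs differ on every run.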

### Creating from Document Objects

Factory method to create a Chroma instance from LangChain Document objects.

```python { .api }
@classmethod
def from_documents(
    cls: type[Chroma],
    documents: list[Document],
    embedding: Optional[Embeddings] = None,
    ids: Optional[list[str]] = None,
    collection_name: str = "langchain",
    persist_directory: Optional[str] = None,
    host: Optional[str] = None,
    port: Optional[int] = None,
    headers: Optional[dict[str, str]] = None,
    chroma_cloud_api_key: Optional[str] = None,
    tenant: Optional[str] = None,
    database: Optional[str] = None,
    client_settings: Optional[chromadb.config.Settings] = None,
    client: Optional[chromadb.ClientAPI] = None,
    collection_metadata: Optional[dict] = None,
    collection_configuration: Optional[CreateCollectionConfiguration] = None,
    *,
    ssl: bool = False,
    **kwargs: Any,
) -> Chroma:
    """
    Create a Chroma vector store from a list of Document objects.

    Extracts text content and metadata from Document objects and creates
    a vector store with efficient batch operations.

    Parameters:
    - documents: List of Document objects to add to the vector store
    - embedding: Embedding function for vectorizing document content
    - ids: Optional list of custom IDs (uses document.id or generates UUIDs)
    - collection_name: Name for the new collection (default: "langchain")
    - persist_directory: Directory to persist the collection
    - host: Hostname of deployed Chroma server
    - port: Connection port (default: 8000)
    - ssl: Whether to use SSL connection (default: False)
    - headers: HTTP headers for server connection
    - chroma_cloud_api_key: API key for Chroma Cloud
    - tenant: Tenant ID for Chroma Cloud
    - database: Database name for Chroma Cloud
    - client_settings: Custom ChromaDB client settings
    - client: Pre-configured ChromaDB client
    - collection_metadata: Metadata for the collection
    - collection_configuration: Index configuration
    - **kwargs: Additional client initialization arguments

    Returns:
    Chroma instance populated with the provided documents
    """
```

**Usage Example:**

```python
from langchain_core.documents import Document
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

# Create documents
documents = [
    Document(
        page_content="First document content",
        metadata={"source": "doc1", "category": "general"},
        id="custom_id_1"
    ),
    Document(
        page_content="Second document content",
        metadata={"source": "doc2", "category": "technical"}
    ),
    Document(
        page_content="Third document content",
        metadata={"source": "doc3", "category": "general"}
    )
]

# Create vector store from documents
vector_store = Chroma.from_documents(
    documents=documents,
    embedding=OpenAIEmbeddings(),
    collection_name="document_collection",
    persist_directory="./my_vector_db"
)

# With custom configuration
from chromadb.api import CreateCollectionConfiguration

configured_store = Chroma.from_documents(
    documents=documents,
    embedding=OpenAIEmbeddings(),
    collection_name="configured_collection",
    collection_configuration=CreateCollectionConfiguration({
        # "max_neighbors" is the HNSW parameter often called M
        "hnsw": {"space": "cosine", "max_neighbors": 16}
    }),
    collection_metadata={"version": "1.0", "description": "My documents"}
)
```
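
The `from_documents` docstring says the method extracts text content and metadata from each document and falls back to `document.id` or a generated UUID when no `ids` are passed. That unpacking step can be sketched as follows. `Doc` is a stand-in dataclass used so the snippet runs without LangChain installed, and `unpack_documents` is a hypothetical helper, not a library function.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Doc:
    """Stand-in for langchain_core.documents.Document (illustration only)."""
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)
    id: Optional[str] = None


def unpack_documents(documents: list[Doc], ids: Optional[list[str]] = None):
    """Hypothetical helper: split documents into texts, metadatas, and ids."""
    texts = [doc.page_content for doc in documents]
    metadatas = [doc.metadata for doc in documents]
    if ids is None:
        # Prefer the document's own id; otherwise generate a UUID string.
        ids = [doc.id if doc.id else str(uuid.uuid4()) for doc in documents]
    return texts, metadatas, ids


docs = [
    Doc("First document content", {"source": "doc1"}, id="custom_id_1"),
    Doc("Second document content", {"source": "doc2"}),
]
texts, metadatas, ids = unpack_documents(docs)
```

In this sketch the first document keeps `custom_id_1`, while the second receives a random UUID because it has no `id`.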

### Image Encoding Utility

Static utility method for encoding images to base64 strings for storage or processing.

```python { .api }
@staticmethod
def encode_image(uri: str) -> str:
    """
    Encode an image file to a base64 string.

    Utility function for preparing images for storage in the vector store
    or for processing with multimodal embedding functions.

    Parameters:
    - uri: File path to the image file

    Returns:
    Base64 encoded string representation of the image

    Raises:
    FileNotFoundError: If the image file doesn't exist
    IOError: If the file cannot be read
    """
```

**Usage Example:**

```python
from langchain_chroma import Chroma
from langchain_core.documents import Document

# Encode image for storage or processing
image_path = "/path/to/image.jpg"
encoded_image = Chroma.encode_image(image_path)

# Use encoded image with documents
image_document = Document(
    page_content=encoded_image,
    metadata={"type": "image", "format": "jpg", "source": image_path}
)

# Add to an existing vector store (requires multimodal embeddings)
vector_store.add_documents([image_document])
```
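
To make the encoding step concrete, here is a minimal standard-library sketch of base64-encoding a file, which is the behavior the `encode_image` docstring describes. `encode_image_sketch` is a hypothetical name and the temporary file is a placeholder rather than a real image. The library's own implementation may differ in detail.

```python
import base64
import os
import tempfile


def encode_image_sketch(uri: str) -> str:
    """Read the file at `uri` and return its bytes as a base64 string."""
    with open(uri, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


# Round-trip check with a throwaway file standing in for an image.
payload = b"\xff\xd8\xff\xe0 placeholder bytes, not a real JPEG"
with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as tmp:
    tmp.write(payload)
    tmp_path = tmp.name

encoded = encode_image_sketch(tmp_path)
decoded = base64.b64decode(encoded)
os.remove(tmp_path)  # Clean up the temporary file
```

Because `open` is called directly, a missing path surfaces as `FileNotFoundError`, matching the `Raises` section above.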

## Configuration Options

### Client Types and Configuration

Different ChromaDB client configurations for various deployment scenarios.

**In-Memory Client (Default):**

```python
vector_store = Chroma.from_texts(
    texts=texts,
    embedding=embeddings,
    collection_name="memory_collection"
)
```

**Persistent Client:**

```python
vector_store = Chroma.from_texts(
    texts=texts,
    embedding=embeddings,
    collection_name="persistent_collection",
    persist_directory="/path/to/chroma/db"
)
```

**HTTP Client (Remote Server):**

```python
vector_store = Chroma.from_texts(
    texts=texts,
    embedding=embeddings,
    collection_name="remote_collection",
    host="chroma-server.example.com",
    port=8000,
    ssl=True,
    headers={"Authorization": "Bearer token"}
)
```

**Chroma Cloud Client:**

```python
vector_store = Chroma.from_texts(
    texts=texts,
    embedding=embeddings,
    collection_name="cloud_collection",
    chroma_cloud_api_key="your-api-key",
    tenant="your-tenant",
    database="your-database"
)
```
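
The four scenarios above differ only in which connection argument is supplied. One way to read that mapping is sketched below. `pick_client_kind` is a hypothetical helper, and rejecting conflicting options is an assumption of this sketch rather than documented behavior.

```python
from typing import Optional


def pick_client_kind(
    persist_directory: Optional[str] = None,
    host: Optional[str] = None,
    chroma_cloud_api_key: Optional[str] = None,
) -> str:
    """Hypothetical helper: name the client scenario implied by the arguments."""
    options = {
        "persistent": persist_directory,
        "http": host,
        "cloud": chroma_cloud_api_key,
    }
    selected = [kind for kind, value in options.items() if value is not None]
    if len(selected) > 1:
        # Assumption of this sketch: mixing deployment targets is a mistake.
        raise ValueError(f"Conflicting client options: {selected}")
    # With no connection argument at all, fall back to the in-memory default.
    return selected[0] if selected else "in-memory"


kinds = [
    pick_client_kind(),
    pick_client_kind(persist_directory="/path/to/chroma/db"),
    pick_client_kind(host="chroma-server.example.com"),
    pick_client_kind(chroma_cloud_api_key="your-api-key"),
]
```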

### Collection Configuration

Advanced collection settings for performance and behavior tuning.

```python
from chromadb.api import CreateCollectionConfiguration

# HNSW index configuration
hnsw_config = CreateCollectionConfiguration({
    "hnsw": {
        "space": "cosine",       # cosine, l2, or ip
        "max_neighbors": 16,     # Number of bi-directional links (HNSW "M")
        "ef_construction": 200,  # Size of dynamic candidate list
    }
})

vector_store = Chroma.from_texts(
    texts=texts,
    embedding=embeddings,
    collection_configuration=hnsw_config
)
```
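
The `space` setting selects the distance function used to compare vectors. The following pure-Python sketch shows the three options, assuming the definitions commonly given for Chroma: squared L2 distance, one minus the inner product, and one minus cosine similarity. The function names are illustrative and not part of any library.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance (assumed meaning of space="l2")."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ip_distance(a: list[float], b: list[float]) -> float:
    """One minus the inner product (assumed meaning of space="ip")."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a: list[float], b: list[float]) -> float:
    """One minus cosine similarity (assumed meaning of space="cosine")."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


a = [1.0, 0.0]
b = [0.0, 2.0]
distances = (l2_distance(a, b), ip_distance(a, b), cosine_distance(a, b))
```

Cosine distance ignores vector length, so it is a common choice when embeddings are not normalized. The other two are sensitive to magnitude.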

### Batch Processing

Factory methods automatically handle batch processing for large datasets.

```python
# Large dataset - automatically batched
large_texts = [f"Text {i}" for i in range(10000)]
large_metadatas = [{"index": i} for i in range(10000)]

# Efficiently processes in batches
vector_store = Chroma.from_texts(
    texts=large_texts,
    metadatas=large_metadatas,
    embedding=embeddings,
    collection_name="large_collection"
)
```
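
Batching here means splitting the inputs into fixed-size chunks and adding one chunk at a time. The sketch below illustrates the idea with a hypothetical `iter_batches` helper. The batch size of 4000 is an arbitrary value chosen for the example, not a limit taken from Chroma.

```python
from typing import Iterator, Sequence


def iter_batches(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Hypothetical helper: yield consecutive slices of at most `batch_size` items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


large_texts = [f"Text {i}" for i in range(10000)]

# Each batch would be embedded and written in its own call.
batch_sizes = [len(batch) for batch in iter_batches(large_texts, batch_size=4000)]
```

With 10,000 texts and a batch size of 4000, the helper yields two full batches and a final batch of 2000.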

## Error Handling

Construction can fail, for example when the configuration is invalid or the storage location is unusable. Wrap factory calls in `try`/`except`, or validate the inputs before constructing the store.

```python
try:
    vector_store = Chroma.from_texts(
        texts=texts,
        embedding=embeddings,
        persist_directory="/invalid/path"
    )
except ValueError as e:
    print(f"Configuration error: {e}")
except Exception as e:
    print(f"Unexpected error during construction: {e}")

# Validate before construction
if texts and embeddings:
    vector_store = Chroma.from_texts(texts=texts, embedding=embeddings)
else:
    print("Missing required texts or embeddings")
```
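
The pre-construction check can be extended to cover the optional parallel lists as well. The sketch below is one possible approach. `find_input_problems` is a hypothetical helper, and the rules it enforces (non-empty texts, matching lengths) are assumptions chosen for illustration.

```python
from typing import Optional


def find_input_problems(
    texts: list[str],
    metadatas: Optional[list[dict]] = None,
    ids: Optional[list[str]] = None,
) -> list[str]:
    """Hypothetical helper: describe input problems; an empty list means OK."""
    problems = []
    if not texts:
        problems.append("texts is empty")
    if metadatas is not None and len(metadatas) != len(texts):
        problems.append(f"{len(metadatas)} metadatas for {len(texts)} texts")
    if ids is not None and len(ids) != len(texts):
        problems.append(f"{len(ids)} ids for {len(texts)} texts")
    return problems


ok = find_input_problems(["Document 1", "Document 2"], ids=["doc_1", "doc_2"])
bad = find_input_problems(["Document 1"], metadatas=[{}, {}], ids=[])
```

Collecting every problem before reporting lets a caller fix all of them in one pass instead of hitting them one exception at a time.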