# LangChain Chroma

An integration package that connects Chroma and LangChain for vector database operations. It provides a LangChain-compatible interface to ChromaDB, so ChromaDB can serve as a vector store for embedding-based search and retrieval in AI applications, particularly semantic search, question-answering systems, and retrieval-augmented generation (RAG) pipelines.

## Package Information

- **Package Name**: langchain-chroma
- **Language**: Python
- **Installation**: `pip install langchain-chroma`
- **Dependencies**: `chromadb>=1.0.9`, `langchain-core>=0.3.70`, `numpy>=1.26.0`

## Core Imports

```python
from langchain_chroma import Chroma
```

## Basic Usage

```python
from langchain_chroma import Chroma
from langchain_core.documents import Document
# Requires `pip install langchain-openai` and the OPENAI_API_KEY environment variable
from langchain_openai import OpenAIEmbeddings

# Initialize the vector store
embeddings = OpenAIEmbeddings()
vector_store = Chroma(
    collection_name="my_collection",
    embedding_function=embeddings,
    persist_directory="./chroma_db",  # store the collection on disk
)

# Add documents
documents = [
    Document(page_content="Hello world", metadata={"source": "greeting"}),
    Document(page_content="Python is great", metadata={"source": "programming"}),
]
vector_store.add_documents(documents)

# Perform similarity search
results = vector_store.similarity_search("programming language", k=2)
for doc in results:
    print(f"Content: {doc.page_content}")
    print(f"Metadata: {doc.metadata}")
```
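
The example above calls the OpenAI API to compute embeddings. For local experiments without network access, a stand-in embedding object can be handy. The following is a minimal sketch under stated assumptions: `ToyHashEmbeddings` is a hypothetical class (not part of either package) that exposes the `embed_documents` and `embed_query` methods of the LangChain `Embeddings` interface. Its vectors are deterministic but carry no semantic meaning, so search quality will not resemble a real model.

```python
import hashlib
import math


class ToyHashEmbeddings:
    """Hypothetical stand-in: deterministic bag-of-words hashing vectors."""

    def __init__(self, dim: int = 16) -> None:
        self.dim = dim

    def _embed(self, text: str) -> list[float]:
        vec = [0.0] * self.dim
        for token in text.lower().split():
            # Hash each token to a stable bucket index.
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            vec[int.from_bytes(digest[:4], "big") % self.dim] += 1.0
        norm = math.sqrt(sum(v * v for v in vec))
        # L2-normalize non-empty vectors so dot products act like cosine similarity.
        return [v / norm for v in vec] if norm else vec

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return [self._embed(t) for t in texts]

    def embed_query(self, text: str) -> list[float]:
        return self._embed(text)


toy = ToyHashEmbeddings(dim=16)
doc_vectors = toy.embed_documents(["Hello world", "Python is great"])
query_vector = toy.embed_query("hello world")
```

In a real project, subclassing `langchain_core.embeddings.Embeddings` would likely be the more conventional choice; the duck-typed class here only keeps the sketch dependency-free.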

## Architecture

The langchain-chroma package implements the LangChain `VectorStore` interface with ChromaDB as the backend:

- **Chroma Class**: The main vector store class, which handles all vector operations
- **Client Management**: Supports multiple ChromaDB client types (local in-memory, persistent, HTTP, cloud)
- **Embedding Integration**: Works with any LangChain-compatible embedding function
- **Document Management**: Full CRUD operations for documents, with metadata support
- **Search Operations**: Multiple search modes, including similarity search, MMR, and image search
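
The client types listed above appear to correspond to groups of `Chroma` constructor parameters. The sketch below illustrates that grouping. `chroma_connection_kwargs` and its mode names are hypothetical, the mapping from parameter group to client type is an assumption inferred from the parameter names, and the default port of 8000 is likewise assumed.

```python
from typing import Any, Optional


def chroma_connection_kwargs(
    mode: str,
    *,
    path: Optional[str] = None,
    host: Optional[str] = None,
    port: int = 8000,  # assumed default server port
    ssl: bool = False,
    api_key: Optional[str] = None,
    tenant: Optional[str] = None,
    database: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: choose Chroma(...) keyword arguments per deployment mode."""
    if mode == "memory":
        # No connection arguments: data lives only for the process lifetime.
        return {}
    if mode == "persistent":
        if path is None:
            raise ValueError("persistent mode needs a directory path")
        return {"persist_directory": path}
    if mode == "http":
        if host is None:
            raise ValueError("http mode needs a host")
        return {"host": host, "port": port, "ssl": ssl}
    if mode == "cloud":
        if api_key is None:
            raise ValueError("cloud mode needs an API key")
        return {"chroma_cloud_api_key": api_key, "tenant": tenant, "database": database}
    raise ValueError(f"unknown mode: {mode!r}")


persistent_kwargs = chroma_connection_kwargs("persistent", path="./chroma_db")
http_kwargs = chroma_connection_kwargs("http", host="localhost")

# The result could then be unpacked into the constructor, for example:
# vector_store = Chroma(collection_name="my_collection",
#                       embedding_function=embeddings, **persistent_kwargs)
```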

## Capabilities

### Document Management

Core document operations: adding, updating, and deleting documents in the vector store. Batch operations are supported, and IDs are generated automatically when none are supplied.

```python { .api }
def add_texts(texts: Iterable[str], metadatas: Optional[list[dict]] = None, ids: Optional[list[str]] = None, **kwargs: Any) -> list[str]
def add_documents(documents: list[Document], ids: Optional[list[str]] = None, **kwargs: Any) -> list[str]
def add_images(uris: list[str], metadatas: Optional[list[dict]] = None, ids: Optional[list[str]] = None) -> list[str]
def update_document(document_id: str, document: Document) -> None
def update_documents(ids: list[str], documents: list[Document]) -> None
def delete(ids: Optional[list[str]] = None, **kwargs: Any) -> None
```

[Document Management](./document-management.md)
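
When IDs are generated automatically, adding the same text twice may produce two separate entries. One possible way to avoid that is to derive IDs from the content and pass them through the `ids` parameter. The sketch below assumes this approach; `stable_ids` and `DOC_NAMESPACE` are hypothetical names, and the namespace URL is a placeholder.

```python
import uuid

# Hypothetical namespace; any fixed UUID works as long as it never changes.
DOC_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/my-docs")


def stable_ids(texts: list[str]) -> list[str]:
    """Derive a deterministic ID for each text from its content."""
    return [str(uuid.uuid5(DOC_NAMESPACE, text)) for text in texts]


texts = ["Hello world", "Python is great"]
ids = stable_ids(texts)

# The IDs could then be supplied explicitly, for example:
# vector_store.add_texts(texts, ids=ids)
# vector_store.delete(ids=ids[:1])
```

Because the IDs depend only on the text, the same list can be recomputed later to update or delete the matching entries without storing the IDs elsewhere.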

### Search Operations

Search functionality covering similarity search by text query, by embedding vector, and by image, with optional scores. Results can be filtered by metadata (`filter`) and by document content (`where_document`).

```python { .api }
def similarity_search(query: str, k: int = 4, filter: Optional[dict[str, str]] = None, **kwargs: Any) -> list[Document]
def similarity_search_with_score(query: str, k: int = 4, filter: Optional[dict[str, str]] = None, where_document: Optional[dict[str, str]] = None, **kwargs: Any) -> list[tuple[Document, float]]
def similarity_search_by_vector(embedding: list[float], k: int = 4, filter: Optional[dict[str, str]] = None, where_document: Optional[dict[str, str]] = None, **kwargs: Any) -> list[Document]
def similarity_search_by_image(uri: str, k: int = 4, filter: Optional[dict[str, str]] = None, **kwargs: Any) -> list[Document]
```

[Search Operations](./search-operations.md)
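
To make the filtering parameters concrete, here is a small sketch of how filter dictionaries might be assembled. `metadata_filter` is a hypothetical helper. The nested `$and` form and the `$contains` operator follow Chroma's filter syntax; since the signatures above annotate `filter` as `dict[str, str]`, treat the nested form as an assumption to verify against your installed version.

```python
from typing import Any


def metadata_filter(**equals: Any) -> dict[str, Any]:
    """Hypothetical helper: build an equality filter over metadata fields."""
    conditions = [{key: value} for key, value in equals.items()]
    if not conditions:
        raise ValueError("at least one condition is required")
    if len(conditions) == 1:
        return conditions[0]
    # Several conditions are combined under a single "$and" operator.
    return {"$and": conditions}


single = metadata_filter(source="programming")
combined = metadata_filter(source="programming", lang="en")
contains_python = {"$contains": "Python"}  # document-content filter

# These could be passed to a search call, for example:
# results = vector_store.similarity_search_with_score(
#     "programming language", k=2, filter=single, where_document=contains_python
# )
```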

### Maximum Marginal Relevance

Maximum marginal relevance (MMR) search optimizes for both similarity to the query and diversity among the selected results, which reduces redundancy. It fetches `fetch_k` candidates, returns `k` of them, and uses `lambda_mult` to control the trade-off between relevance and diversity.

```python { .api }
def max_marginal_relevance_search(query: str, k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, filter: Optional[dict[str, str]] = None, where_document: Optional[dict[str, str]] = None, **kwargs: Any) -> list[Document]
def max_marginal_relevance_search_by_vector(embedding: list[float], k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, filter: Optional[dict[str, str]] = None, where_document: Optional[dict[str, str]] = None, **kwargs: Any) -> list[Document]
```

[Maximum Marginal Relevance](./mmr.md)
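
The selection idea behind MMR can be sketched in plain Python. This is an illustrative greedy version, not the package's implementation: `cosine` and `mmr_select` are hypothetical helpers, and the example assumes that a `lambda_mult` of 1.0 means pure relevance while lower values favor diversity.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query: list[float],
    candidates: list[list[float]],
    k: int = 4,
    lambda_mult: float = 0.5,
) -> list[int]:
    """Return indices of up to k candidates, chosen greedily by MMR score."""
    selected: list[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best_index, best_score = remaining[0], -math.inf
        for i in remaining:
            relevance = cosine(query, candidates[i])
            # Redundancy: similarity to the closest already-selected result.
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_index, best_score = i, score
        selected.append(best_index)
        remaining.remove(best_index)
    return selected


query = [1.0, 0.0]
candidates = [[0.9, 0.1], [0.9, 0.12], [0.7, -0.7]]
diverse = mmr_select(query, candidates, k=2, lambda_mult=0.5)
relevant_only = mmr_select(query, candidates, k=2, lambda_mult=1.0)
```

With these toy vectors, the balanced setting skips the second candidate because it nearly duplicates the first, whereas `lambda_mult=1.0` simply takes the two most similar candidates.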

### Collection Management

Collection-level operations on the underlying ChromaDB collection: retrieving stored entries, resetting the collection, and deleting it.

```python { .api }
def get(ids: Optional[Union[str, list[str]]] = None, where: Optional[Where] = None, limit: Optional[int] = None, offset: Optional[int] = None, where_document: Optional[WhereDocument] = None, include: Optional[list[str]] = None) -> dict[str, Any]
def get_by_ids(ids: Sequence[str], /) -> list[Document]
def reset_collection() -> None
def delete_collection() -> None
```

[Collection Management](./collection-management.md)

### Vector Store Construction

Class methods and utilities for creating `Chroma` instances from raw texts or `Document` objects.

```python { .api }
@classmethod
def from_texts(cls: type[Chroma], texts: list[str], embedding: Optional[Embeddings] = None, metadatas: Optional[list[dict]] = None, ids: Optional[list[str]] = None, collection_name: str = "langchain", **kwargs: Any) -> Chroma

@classmethod
def from_documents(cls: type[Chroma], documents: list[Document], embedding: Optional[Embeddings] = None, ids: Optional[list[str]] = None, collection_name: str = "langchain", **kwargs: Any) -> Chroma

@staticmethod
def encode_image(uri: str) -> str
```

[Vector Store Construction](./construction.md)
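
The `encode_image` signature takes a URI and returns a string. A plausible reading, which is an assumption here rather than something the signature confirms, is that it returns the file's bytes as base64 text. The sketch below shows that idea with a hypothetical `encode_file_base64` function and a tiny temporary file standing in for an image.

```python
import base64
import tempfile
from pathlib import Path


def encode_file_base64(path: str) -> str:
    """Read a file and return its bytes as a base64-encoded string."""
    return base64.b64encode(Path(path).read_bytes()).decode("utf-8")


# Write the first four bytes of a PNG signature to a temporary file.
with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as tmp:
    tmp.write(b"\x89PNG")

encoded = encode_file_base64(tmp.name)
Path(tmp.name).unlink()  # clean up the temporary file
```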

## Types

```python { .api }
from typing import Union, Optional, Any, Callable, Iterable
from collections.abc import Sequence
import numpy as np
from langchain_core.documents import Document
from langchain_core.embeddings import Embeddings
from langchain_core.vectorstores import VectorStore
from chromadb.api.types import Where, WhereDocument
from chromadb.api import CreateCollectionConfiguration
import chromadb

Matrix = Union[list[list[float]], list[np.ndarray], np.ndarray]

class Chroma(VectorStore):
    """
    Chroma vector store integration for LangChain.

    Provides a LangChain-compatible interface to ChromaDB for vector storage,
    similarity search, and document retrieval operations.
    """

    def __init__(
        self,
        collection_name: str = "langchain",
        embedding_function: Optional[Embeddings] = None,
        persist_directory: Optional[str] = None,
        host: Optional[str] = None,
        port: Optional[int] = None,
        headers: Optional[dict[str, str]] = None,
        chroma_cloud_api_key: Optional[str] = None,
        tenant: Optional[str] = None,
        database: Optional[str] = None,
        client_settings: Optional[chromadb.config.Settings] = None,
        collection_metadata: Optional[dict] = None,
        collection_configuration: Optional[CreateCollectionConfiguration] = None,
        client: Optional[chromadb.ClientAPI] = None,
        relevance_score_fn: Optional[Callable[[float], float]] = None,
        create_collection_if_not_exists: Optional[bool] = True,
        *,
        ssl: bool = False,
    ) -> None
```
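
The `relevance_score_fn` parameter accepts any `Callable[[float], float]`. As a hedged illustration, the function below maps a non-negative distance to a score in (0, 1], with smaller distances scoring higher. `inverse_distance_relevance` is a hypothetical name, and the `1 / (1 + d)` mapping is just one possible choice, not the package's default.

```python
def inverse_distance_relevance(distance: float) -> float:
    """Map a distance to a relevance score in (0, 1]; closer means higher."""
    # Guard against tiny negative values caused by floating-point error.
    clipped = max(distance, 0.0)
    return 1.0 / (1.0 + clipped)


scores = [inverse_distance_relevance(d) for d in (0.0, 1.0, 3.0)]

# The function could then be supplied at construction time, for example:
# vector_store = Chroma(
#     collection_name="my_collection",
#     embedding_function=embeddings,
#     relevance_score_fn=inverse_distance_relevance,
# )
```

A mapping like this avoids assumptions about the distance metric's upper bound, at the cost of scores that are not directly comparable across metrics.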