# Document Management

Core document operations for managing text and image documents in the Chroma vector store. Supports adding, updating, and deleting documents with metadata and automatic ID generation.

## Capabilities

### Adding Text Documents

Add text documents to the vector store with optional metadata and custom IDs. Documents are automatically embedded using the configured embedding function.

```python { .api }
def add_texts(
    texts: Iterable[str],
    metadatas: Optional[list[dict]] = None,
    ids: Optional[list[str]] = None,
    **kwargs: Any
) -> list[str]:
    """
    Add texts to the vector store.

    Parameters:
    - texts: Iterable of text strings to add
    - metadatas: Optional list of metadata dictionaries for each text
    - ids: Optional list of custom IDs (UUIDs generated if not provided)
    - **kwargs: Additional keyword arguments

    Returns:
    List of document IDs that were added

    Raises:
    ValueError: When metadata format is incorrect
    """

def add_documents(
    documents: list[Document],
    ids: Optional[list[str]] = None,
    **kwargs: Any
) -> list[str]:
    """
    Add Document objects to the vector store.

    Parameters:
    - documents: List of Document objects to add
    - ids: Optional list of custom IDs (uses document.id or generates UUIDs)
    - **kwargs: Additional keyword arguments

    Returns:
    List of document IDs that were added
    """
```

**Usage Example:**
```python
from langchain_core.documents import Document

# `vector_store` is an already-initialized Chroma vector store

# Add texts with metadata
texts = ["Hello world", "Python is great", "AI is fascinating"]
metadatas = [
    {"source": "greeting", "category": "social"},
    {"source": "programming", "category": "tech"},
    {"source": "ai", "category": "tech"}
]
ids = vector_store.add_texts(texts, metadatas=metadatas)

# Add Document objects
documents = [
    Document(page_content="Machine Learning", metadata={"topic": "AI"}),
    Document(page_content="Deep Learning", metadata={"topic": "AI"})
]
doc_ids = vector_store.add_documents(documents)
```
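
The ID rules stated in the docstrings above (explicit `ids` first, then `document.id`, then a generated UUID) can be sketched in plain Python. This is an illustration of those rules, not the library's implementation: `Doc` and `resolve_ids` are hypothetical stand-ins, and the length check is an assumption about reasonable input validation.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a Document that may carry its own id."""
    page_content: str
    metadata: dict = field(default_factory=dict)
    id: Optional[str] = None


def resolve_ids(docs: list[Doc], ids: Optional[list[str]] = None) -> list[str]:
    """Pick one ID per document: explicit ids, else doc.id, else a new UUID."""
    if ids is not None:
        # Assumed validation: one explicit ID per document.
        if len(ids) != len(docs):
            raise ValueError("ids and documents must have the same length")
        return list(ids)
    return [doc.id if doc.id is not None else str(uuid.uuid4()) for doc in docs]


docs = [Doc("Machine Learning", id="ml-1"), Doc("Deep Learning")]
resolved = resolve_ids(docs)  # first ID comes from the document, second is generated
```

Keeping the returned IDs is what makes later `update_document` and `delete` calls possible, since both operate on IDs.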

### Adding Image Documents

Add images to the vector store using file URIs. Requires an embedding function that supports image embeddings.

```python { .api }
def add_images(
    uris: list[str],
    metadatas: Optional[list[dict]] = None,
    ids: Optional[list[str]] = None
) -> list[str]:
    """
    Add images to the vector store.

    Parameters:
    - uris: List of file paths to images
    - metadatas: Optional list of metadata dictionaries for each image
    - ids: Optional list of custom IDs (UUIDs generated if not provided)

    Returns:
    List of document IDs that were added

    Raises:
    ValueError: When metadata format is incorrect or embedding function doesn't support images
    """
```

**Usage Example:**
```python
# Add images (requires embedding function with image support)
image_paths = ["/path/to/image1.jpg", "/path/to/image2.png"]
metadatas = [{"type": "photo"}, {"type": "diagram"}]
image_ids = vector_store.add_images(image_paths, metadatas=metadatas)
```
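
Because `add_images` takes file paths, it can help to check the paths before calling it. The helper below is a hypothetical pre-check and not part of the Chroma API: it keeps only paths that exist on disk and whose extension maps to an `image/*` MIME type. It does not inspect file contents.

```python
import mimetypes
import tempfile
from pathlib import Path


def existing_image_paths(paths: list[str]) -> list[str]:
    """Return the paths that exist and look like images by file extension."""
    kept = []
    for p in paths:
        mime, _ = mimetypes.guess_type(p)
        if Path(p).is_file() and mime is not None and mime.startswith("image/"):
            kept.append(p)
    return kept


# Demonstration with throwaway files; the file contents are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    photo = Path(tmp) / "photo.png"
    photo.write_bytes(b"placeholder")
    notes = Path(tmp) / "notes.txt"
    notes.write_text("not an image")
    candidates = [str(photo), str(notes), str(Path(tmp) / "missing.jpg")]
    usable = existing_image_paths(candidates)
```

The filtered list could then be passed as `uris`, with `metadatas` filtered the same way so the two lists stay aligned.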

### Updating Documents

Update existing documents in the vector store by their IDs.

```python { .api }
def update_document(document_id: str, document: Document) -> None:
    """
    Update a single document in the collection.

    Parameters:
    - document_id: ID of the document to update
    - document: New Document object to replace the existing one

    Raises:
    ValueError: If embedding function is not provided
    """

def update_documents(ids: list[str], documents: list[Document]) -> None:
    """
    Update multiple documents in the collection.

    Parameters:
    - ids: List of document IDs to update
    - documents: List of new Document objects

    Raises:
    ValueError: If embedding function is not provided
    """
```

**Usage Example:**
```python
from langchain_core.documents import Document

# Update a single document
updated_doc = Document(
    page_content="Updated content",
    metadata={"status": "revised"}
)
vector_store.update_document("doc_id_123", updated_doc)

# Update multiple documents (ids and documents are matched by position)
updated_docs = [
    Document(page_content="New content 1", metadata={"version": 2}),
    Document(page_content="New content 2", metadata={"version": 2})
]
vector_store.update_documents(["id_1", "id_2"], updated_docs)
```
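
`update_documents` takes two parallel lists, so it is easy to pass IDs and documents that no longer line up. One way to avoid that, sketched here under the assumption that updates are tracked as an `{id: new_text}` mapping, is to derive both lists from a single source. `split_updates` is a hypothetical helper, not a library function.

```python
def split_updates(updates: dict[str, str]) -> tuple[list[str], list[str]]:
    """Turn {id: new_text} into parallel (ids, texts) lists in insertion order."""
    ids = list(updates)
    texts = [updates[doc_id] for doc_id in ids]
    return ids, texts


ids, texts = split_updates({"id_1": "New content 1", "id_2": "New content 2"})
# Each text would then be wrapped in a Document before calling
# vector_store.update_documents(ids, documents).
```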

### Deleting Documents

Remove documents from the vector store by their IDs.

```python { .api }
def delete(ids: Optional[list[str]] = None, **kwargs: Any) -> None:
    """
    Delete documents from the vector store.

    Parameters:
    - ids: List of document IDs to delete
    - **kwargs: Additional keyword arguments passed to ChromaDB
    """
```

**Usage Example:**
```python
# Delete specific documents
vector_store.delete(ids=["doc_id_1", "doc_id_2"])

# Pass additional keyword arguments (here a `where` metadata filter) through to ChromaDB
vector_store.delete(ids=["doc_id_3"], where={"category": "obsolete"})
```
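
When many IDs need to be removed, one option is to split them into batches and call `delete` once per batch. The sketch below assumes only that the store exposes `delete(ids=...)` as documented above. `chunked`, `delete_in_batches`, the `batch_size` default, and the `RecordingStore` stand-in are all hypothetical and exist only to make the example runnable without a Chroma instance.

```python
from typing import Iterator


def chunked(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def delete_in_batches(store, ids: list[str], batch_size: int = 100) -> int:
    """Call store.delete(ids=...) once per batch and return the number of calls."""
    calls = 0
    for batch in chunked(ids, batch_size):
        store.delete(ids=batch)
        calls += 1
    return calls


class RecordingStore:
    """Hypothetical stand-in that records deleted IDs instead of calling Chroma."""

    def __init__(self) -> None:
        self.deleted: list[str] = []

    def delete(self, ids=None, **kwargs) -> None:
        self.deleted.extend(ids or [])


store = RecordingStore()
n_calls = delete_in_batches(store, [f"doc_{i}" for i in range(250)], batch_size=100)
```

With a real vector store, `store` would be the Chroma instance and the recording class would not be needed.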

## Utility Functions

### Image Encoding

Static method for encoding images to base64 strings.

```python { .api }
@staticmethod
def encode_image(uri: str) -> str:
    """
    Encode an image file to base64 string.

    Parameters:
    - uri: File path to the image

    Returns:
    Base64 encoded string representation of the image
    """
```

**Usage Example:**
```python
# Encode image for manual processing
encoded_image = Chroma.encode_image("/path/to/image.jpg")
```
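
For reference, base64-encoding a file with the standard library looks like the sketch below. It is assumed to be similar in spirit to what `encode_image` returns, but it is not taken from the library source, and `encode_file_base64` is a hypothetical name. The demonstration writes a few placeholder bytes rather than a real image.

```python
import base64
import tempfile
from pathlib import Path


def encode_file_base64(path: str) -> str:
    """Read a file as bytes and return its base64 encoding as a UTF-8 string."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"\x89PNG")  # placeholder bytes, not a valid image
    encoded = encode_file_base64(str(sample))
    round_trip = base64.b64decode(encoded)  # decoding recovers the original bytes
```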