# Collection Management

Collections are the primary containers for documents and embeddings in ChromaDB. Each collection keeps its records separate from other collections and carries its own configuration, metadata, and embedding function, so every document stored in it is embedded the same way.

## Capabilities

### Creating Collections

Create new collections with optional configuration, metadata, and custom embedding functions.

```python { .api }
def create_collection(
    name: str,
    configuration: Optional[CollectionConfiguration] = None,
    metadata: Optional[CollectionMetadata] = None,
    embedding_function: Optional[EmbeddingFunction] = None,
    data_loader: Optional[DataLoader] = None,
    get_or_create: bool = False
) -> Collection:
    """
    Create a new collection.

    Args:
        name: The name of the collection
        configuration: Optional configuration for the collection
        metadata: Optional metadata for the collection
        embedding_function: Function to generate embeddings
        data_loader: Function to load data from URIs
        get_or_create: If True, return the existing collection instead of raising an error

    Returns:
        Collection: The created collection object

    Raises:
        ValueError: If the collection already exists and get_or_create=False
    """
```

**Usage Example:**

```python
import chromadb
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

client = chromadb.EphemeralClient()

# Create a simple collection
collection = client.create_collection(name="my_documents")

# Create with a custom embedding function
openai_ef = OpenAIEmbeddingFunction(api_key="your-api-key")
collection = client.create_collection(
    name="openai_collection",
    embedding_function=openai_ef,
    metadata={"description": "Documents with OpenAI embeddings"}
)
```

### Getting Collections

Retrieve an existing collection by name, optionally specifying the embedding function to use with it.

```python { .api }
def get_collection(
    name: str,
    embedding_function: Optional[EmbeddingFunction] = None,
    data_loader: Optional[DataLoader] = None
) -> Collection:
    """
    Get an existing collection by name.

    Args:
        name: The name of the collection
        embedding_function: Function to generate embeddings; should be the same
            function the collection was created with
        data_loader: Function to load data from URIs

    Returns:
        Collection: The retrieved collection object

    Raises:
        ValueError: If the collection does not exist
    """
```

**Usage Example:**

```python
# Assumes `client` and `OpenAIEmbeddingFunction` from the previous example

# Get an existing collection
collection = client.get_collection("my_documents")

# Get with a specific embedding function
collection = client.get_collection(
    "openai_collection",
    embedding_function=OpenAIEmbeddingFunction(api_key="your-api-key")
)
```
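
Because `get_collection` raises when the name is unknown, code that only wants to check whether a collection exists can wrap the call. The sketch below is a hypothetical helper, not part of the Chroma API. It catches `ValueError` as documented above; newer Chroma releases may raise a different exception class for a missing collection, so the exception types are a parameter. `_StubClient` is a stand-in so the sketch runs without Chroma; in practice you would pass a real client such as `chromadb.EphemeralClient()`.

```python
from typing import Any, Optional, Tuple, Type


def get_collection_or_none(
    client: Any,
    name: str,
    not_found: Tuple[Type[BaseException], ...] = (ValueError,),
) -> Optional[Any]:
    """Return the named collection, or None if the client reports it missing.

    Hypothetical helper: `not_found` lists the exception types treated as
    "collection does not exist"; adjust it for your Chroma version.
    """
    try:
        return client.get_collection(name)
    except not_found:
        return None


class _StubClient:
    """Hypothetical stand-in for a Chroma client, for demonstration only."""

    def __init__(self, names):
        self._names = set(names)

    def get_collection(self, name):
        if name not in self._names:
            raise ValueError(f"Collection {name} does not exist.")
        return name  # a real client returns a Collection object


stub_client = _StubClient(["my_documents"])
found = get_collection_or_none(stub_client, "my_documents")
missing = get_collection_or_none(stub_client, "no_such_collection")
```

Catching only the listed exception types, rather than every `Exception`, keeps connection or authentication failures visible instead of silently reporting them as a missing collection.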

### Get or Create Collections

Return an existing collection, or create it if it does not exist. Because repeated calls with the same name return the same collection, this is convenient for idempotent setup code.

```python { .api }
def get_or_create_collection(
    name: str,
    configuration: Optional[CollectionConfiguration] = None,
    metadata: Optional[CollectionMetadata] = None,
    embedding_function: Optional[EmbeddingFunction] = None,
    data_loader: Optional[DataLoader] = None
) -> Collection:
    """
    Get an existing collection or create it if it doesn't exist.

    Args:
        name: The name of the collection
        configuration: Optional configuration for the collection
        metadata: Optional metadata for the collection
        embedding_function: Function to generate embeddings
        data_loader: Function to load data from URIs

    Returns:
        Collection: The retrieved or created collection object
    """
```

### Listing Collections

Get a list of all collections with optional pagination support.

```python { .api }
def list_collections(
    limit: Optional[int] = None,
    offset: Optional[int] = None
) -> Sequence[Collection]:
    """
    List all collections.

    Args:
        limit: Maximum number of collections to return
        offset: Number of collections to skip

    Returns:
        Sequence[Collection]: List of collection objects
    """
```

**Usage Example:**

```python
# List all collections
collections = client.list_collections()
for collection in collections:
    print(f"Collection: {collection.name}")

# List with pagination
first_10 = client.list_collections(limit=10)
next_10 = client.list_collections(limit=10, offset=10)
```
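
To walk every collection without fetching them all in one call, you can request pages with `limit` and `offset` until a page comes back shorter than requested. The helper and `_StubClient` below are hypothetical: a sketch that assumes `list_collections` behaves as documented above. With a real client, pass it in place of the stub.

```python
from typing import Any, Iterator, List


def iter_collections(client: Any, page_size: int = 100) -> Iterator[Any]:
    """Yield every collection by paging through list_collections (hypothetical helper)."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = client.list_collections(limit=page_size, offset=offset)
        yield from page
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < page_size:
            return
        offset += page_size


class _StubClient:
    """Hypothetical stand-in that serves collection names from a list."""

    def __init__(self, names: List[str]):
        self._names = names
        self.calls = 0

    def list_collections(self, limit=None, offset=None):
        self.calls += 1
        start = offset or 0
        end = None if limit is None else start + limit
        return self._names[start:end]


stub_client = _StubClient([f"collection_{i}" for i in range(25)])
all_names = list(iter_collections(stub_client, page_size=10))
```

With 25 collections and a page size of 10, the helper makes three requests (offsets 0, 10, and 20) and stops after the third page returns only 5 items.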

### Counting Collections

Get the total number of collections in the database.

```python { .api }
def count_collections() -> int:
    """
    Count the number of collections.

    Returns:
        int: The number of collections in the database
    """
```
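
`count_collections` can be used to plan paginated `list_collections` requests up front, for example to report progress or to fetch pages concurrently. The sketch below uses a hypothetical helper and stub client to compute the `offset` for each page. Collections created or deleted between the count and the listing are not reflected, so treat the plan as a snapshot.

```python
from typing import Any, List


def page_offsets(client: Any, page_size: int) -> List[int]:
    """Return the list_collections offsets needed to cover every collection."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    total = client.count_collections()
    # One offset per page: 0, page_size, 2 * page_size, ... while below total.
    return list(range(0, total, page_size))


class _StubClient:
    """Hypothetical stand-in reporting a fixed number of collections."""

    def __init__(self, total: int):
        self._total = total

    def count_collections(self) -> int:
        return self._total


offsets = page_offsets(_StubClient(25), page_size=10)
empty = page_offsets(_StubClient(0), page_size=10)
```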

### Deleting Collections

Delete a collection together with all of the documents, embeddings, and metadata it contains. Deletion is permanent.

```python { .api }
def delete_collection(name: str) -> None:
    """
    Delete a collection by name.

    Args:
        name: The name of the collection to delete

    Raises:
        ValueError: If the collection does not exist
    """
```

**Usage Example:**

```python
# Delete a collection
client.delete_collection("old_collection")
```
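
Cleanup scripts often need deletion to be idempotent, so that running them twice does not fail. The sketch below is a hypothetical helper (not a Chroma API) that reports whether anything was deleted instead of raising. As with lookups, the exception type for a missing collection may differ between Chroma versions, so it is a parameter; `_StubClient` stands in for a real client.

```python
from typing import Any, Tuple, Type


def delete_if_exists(
    client: Any,
    name: str,
    not_found: Tuple[Type[BaseException], ...] = (ValueError,),
) -> bool:
    """Delete the named collection; return False instead of raising if it is missing.

    Hypothetical helper: `not_found` lists the exception types your Chroma
    version raises for an unknown collection.
    """
    try:
        client.delete_collection(name)
    except not_found:
        return False
    return True


class _StubClient:
    """Hypothetical stand-in that tracks collection names in a set."""

    def __init__(self, names):
        self.names = set(names)

    def delete_collection(self, name):
        if name not in self.names:
            raise ValueError(f"Collection {name} does not exist.")
        self.names.remove(name)


stub_client = _StubClient(["old_collection"])
first = delete_if_exists(stub_client, "old_collection")
second = delete_if_exists(stub_client, "old_collection")
```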

### Collection Properties

Access collection metadata and properties.

```python { .api }
class Collection:
    @property
    def name(self) -> str:
        """The name of the collection."""

    @property
    def id(self) -> UUID:
        """The unique identifier of the collection."""

    @property
    def metadata(self) -> Optional[CollectionMetadata]:
        """The metadata associated with the collection."""
```
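
Since every collection exposes `name` and `id`, a name-to-ID lookup can be built from any sequence of collections, such as the result of `list_collections()`. This is a sketch: the helper name is hypothetical, and `SimpleNamespace` objects stand in for real `Collection` objects so it runs without Chroma.

```python
import uuid
from types import SimpleNamespace
from typing import Any, Dict, Iterable
from uuid import UUID


def ids_by_name(collections: Iterable[Any]) -> Dict[str, UUID]:
    """Map each collection's name to its unique ID (hypothetical helper)."""
    return {c.name: c.id for c in collections}


# Stand-ins exposing the same `name`, `id`, and `metadata` attributes as Collection.
docs_coll = SimpleNamespace(name="my_documents", id=uuid.uuid4(), metadata=None)
openai_coll = SimpleNamespace(
    name="openai_collection",
    id=uuid.uuid4(),
    metadata={"description": "Documents with OpenAI embeddings"},
)

lookup = ids_by_name([docs_coll, openai_coll])
```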

### Modifying Collections

Update collection properties including name, metadata, and configuration.

```python { .api }
def modify(
    self,
    name: Optional[str] = None,
    metadata: Optional[CollectionMetadata] = None,
    configuration: Optional[CollectionConfiguration] = None
) -> None:
    """
    Modify collection properties. Arguments left as None are not changed.

    Args:
        name: New name for the collection
        metadata: New metadata for the collection
        configuration: New configuration for the collection
    """
```

**Usage Example:**

```python
collection = client.get_collection("my_collection")

# Update metadata
collection.modify(metadata={"version": "2.0", "updated": "2024-01-01"})

# Rename the collection
collection.modify(name="renamed_collection")
```
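
`modify(metadata=...)` takes a complete metadata dict. In the Chroma versions we are aware of, the new dict replaces the stored one rather than merging into it (check this against your version). To change one key while keeping the others, you can merge on the client side first. The helper and `_StubCollection` below are hypothetical; the stub mimics replace-on-modify so the sketch runs without Chroma.

```python
from typing import Any, Dict


def update_metadata(collection: Any, **updates: Any) -> Dict[str, Any]:
    """Merge `updates` into the collection's current metadata and write it back.

    Hypothetical helper: assumes `collection.metadata` is a dict or None and
    that `collection.modify(metadata=...)` stores the dict it is given.
    """
    merged = {**(collection.metadata or {}), **updates}
    collection.modify(metadata=merged)
    return merged


class _StubCollection:
    """Hypothetical stand-in whose modify() replaces metadata wholesale."""

    def __init__(self, metadata=None):
        self.metadata = metadata

    def modify(self, name=None, metadata=None, configuration=None):
        if metadata is not None:
            self.metadata = dict(metadata)


stub = _StubCollection({"version": "1.0", "owner": "search-team"})
update_metadata(stub, version="2.0", updated="2024-01-01")
```

Merging explicitly is safe whether `modify` replaces or merges, because the dict passed in already contains every key that should survive.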

### Collection Counting

Get the number of records (entries) stored in a collection.

```python { .api }
def count(self) -> int:
    """
    Count the number of records in the collection.

    Returns:
        int: The number of records in the collection
    """
```

### Forking Collections

Create a copy of a collection with a new name while preserving all documents and embeddings.

```python { .api }
def fork(self, new_name: str) -> Collection:
    """
    Create a copy of the collection with a new name.

    Args:
        new_name: The name for the new collection

    Returns:
        Collection: The newly created collection copy
    """
```

**Usage Example:**

```python
original = client.get_collection("original_collection")
# Named `backup` to avoid shadowing the standard-library `copy` module
backup = original.fork("collection_backup")
```

## Types

```python { .api }
from typing import Dict, Any, Optional, Sequence
from uuid import UUID

CollectionMetadata = Dict[str, Any]
CollectionConfiguration = Dict[str, Any]

class Collection:
    name: str
    id: UUID
    metadata: Optional[CollectionMetadata]
```