Chroma - the open-source embedding database
npx @tessl/cli install tessl/pypi-chromadb@1.0.00
# ChromaDB

ChromaDB is an open-source embedding database for building Python or JavaScript LLM applications with memory. It provides a simple API for storing, querying, and managing document embeddings, and it handles tokenization, embedding generation, and indexing automatically.

## Package Information

- **Package Name**: chromadb
- **Language**: Python
- **Installation**: `pip install chromadb`

## Core Imports

```python
import chromadb
```

For specific client types:

```python
from chromadb import EphemeralClient, PersistentClient, HttpClient
```

For types and embedding functions:

```python
from chromadb.api.types import Documents, Embeddings, Metadatas, Where, Include
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction
```

## Basic Usage

```python
import chromadb

# Create a client (in-memory for testing)
client = chromadb.EphemeralClient()

# Create a collection
collection = client.create_collection(name="my_collection")

# Add documents with metadata
collection.add(
    documents=["This is a document", "This is another document"],
    metadatas=[{"source": "my_source"}, {"source": "my_source"}],
    ids=["id1", "id2"]
)

# Query for similar documents
results = collection.query(
    query_texts=["This is a query document"],
    n_results=2
)

print(results)
```

## Architecture

ChromaDB operates on a client-server architecture with multiple deployment options:

- **Client Types**: Multiple client implementations for different deployment scenarios (in-memory, persistent, remote HTTP, cloud)
- **Collections**: Named containers for documents and embeddings with configurable metadata and embedding functions
- **Embedding Functions**: Pluggable components for generating embeddings from various providers (OpenAI, Cohere, HuggingFace, etc.)
- **Query System**: Vector similarity search with metadata filtering and document text matching
- **Multi-tenancy**: Support for tenants and databases for data isolation

## Capabilities

### Client Creation

Factory functions for creating different types of ChromaDB clients depending on deployment needs. Supports in-memory for testing, persistent storage for local development, and remote connections for production.

```python { .api }
def EphemeralClient(settings=None, tenant="default_tenant", database="default_database"): ...
def PersistentClient(path="./chroma", settings=None, tenant="default_tenant", database="default_database"): ...
def HttpClient(host="localhost", port=8000, ssl=False, headers=None, settings=None, tenant="default_tenant", database="default_database"): ...
def CloudClient(tenant=None, database=None, api_key=None, settings=None): ...
```

[Client Creation](./clients.md)

### Collection Operations

Core collection management including creation, retrieval, modification, and deletion. Collections are the primary containers for documents and embeddings with configurable metadata and embedding functions.

```python { .api }
class ClientAPI:
    def create_collection(self, name: str, **kwargs) -> Collection: ...
    def get_collection(self, name: str, **kwargs) -> Collection: ...
    def delete_collection(self, name: str) -> None: ...
    def list_collections(self, limit=None, offset=None) -> Sequence[Collection]: ...
```

[Collection Management](./collections.md)

### Document Operations

Adding, updating, querying, and deleting documents within collections. Supports embeddings, metadata, images, and URIs with flexible data formats and automatic embedding generation.

```python { .api }
class Collection:
    def add(self, ids, documents=None, embeddings=None, metadatas=None, images=None, uris=None): ...
    def query(self, query_texts=None, query_embeddings=None, n_results=10, where=None, **kwargs): ...
    def get(self, ids=None, where=None, limit=None, **kwargs): ...
    def update(self, ids, documents=None, embeddings=None, metadatas=None, **kwargs): ...
    def delete(self, ids=None, where=None, where_document=None): ...
```

[Document Operations](./documents.md)

### Embedding Functions

Pre-built and configurable embedding functions for generating vector embeddings from text, supporting major AI providers and embedding models with consistent interfaces.

```python { .api }
class EmbeddingFunction:
    def __call__(self, input): ...
    def embed_with_retries(self, input, **retry_kwargs): ...
```

[Embedding Functions](./embedding-functions.md)

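To make the `__call__(self, input)` shape concrete, here is a toy callable that maps each input string to a fixed-length vector. `HashingEmbeddingFunction` is a hypothetical name invented for this sketch; it hashes tokens into buckets and is not semantically meaningful, so it only illustrates the interface. Whether a plain class like this can be passed as a collection's embedding function without subclassing Chroma's `EmbeddingFunction` may depend on the installed version.

```python
import hashlib
import math
from typing import List


class HashingEmbeddingFunction:
    """Toy embedding function: hashed bag-of-words, L2-normalised."""

    def __init__(self, dimensions: int = 16) -> None:
        self.dimensions = dimensions

    def __call__(self, input: List[str]) -> List[List[float]]:
        vectors = []
        for text in input:
            vector = [0.0] * self.dimensions
            for token in text.lower().split():
                # Hash each token into one of `dimensions` buckets.
                digest = hashlib.sha256(token.encode("utf-8")).digest()
                bucket = int.from_bytes(digest[:4], "big") % self.dimensions
                vector[bucket] += 1.0
            # Normalise to unit length; empty text stays all zeros.
            norm = math.sqrt(sum(v * v for v in vector)) or 1.0
            vectors.append([v / norm for v in vector])
        return vectors


embed = HashingEmbeddingFunction(dimensions=16)
vectors = embed(["This is a document", "This is another document"])
print(len(vectors), len(vectors[0]))
```

The output is one vector per input string, which is the contract the listed `__call__` signature implies.
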
### Query and Filtering

Advanced query capabilities including vector similarity search, metadata filtering, and document text matching with logical operators and flexible result formatting.

```python { .api }
Where = Dict[Union[str, LogicalOperator], Union[LiteralValue, OperatorExpression, List[Where]]]
WhereDocument = Dict[WhereDocumentOperator, Union[str, List[WhereDocument]]]
Include = List[Literal["documents", "embeddings", "metadatas", "distances", "uris", "data"]]
```

[Queries and Filtering](./queries.md)

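The filter types above are plain dictionaries, so they can be built with ordinary Python. The sketch below combines two metadata conditions with `$and` and adds a `$contains` document filter. `all_of` is a hypothetical helper written for this example, and the `year` metadata key is an assumed field, not one used elsewhere in this document.

```python
from typing import Any, Dict, List


def all_of(*conditions: Dict[str, Any]) -> Dict[str, Any]:
    """Combine metadata conditions with $and (hypothetical helper)."""
    if not conditions:
        raise ValueError("at least one condition is required")
    if len(conditions) == 1:
        # $and expects two or more expressions, so pass a single one through.
        return conditions[0]
    return {"$and": list(conditions)}


# Metadata filter: source equals "my_source" AND year >= 2020.
where = all_of(
    {"source": {"$eq": "my_source"}},
    {"year": {"$gte": 2020}},
)

# Document filter: the stored text must contain this substring.
where_document = {"$contains": "document"}

# Fields to return with each hit.
include: List[str] = ["documents", "metadatas", "distances"]

# These would be passed to a collection like so (not executed here):
# collection.query(query_texts=["..."], where=where,
#                  where_document=where_document, include=include)
print(where)
```
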
### Configuration and Settings

Comprehensive configuration system for customizing ChromaDB behavior including authentication, server settings, telemetry, and storage options.

```python { .api }
class Settings:
    def __init__(self, **kwargs): ...

def configure(**kwargs) -> None: ...
def get_settings() -> Settings: ...
```

[Configuration](./configuration.md)

## Types

### Core Types

```python { .api }
# Document and metadata types
Documents = List[str]
Metadatas = List[Dict[str, Union[str, int, float, bool]]]
IDs = List[str]
Embeddings = List[List[float]]

# Query result types

# get() returns one flat list per field and has no distances
GetResult = TypedDict('GetResult', {
    'ids': List[str],
    'documents': List[Optional[str]],
    'metadatas': List[Optional[Dict]],
    'embeddings': List[Optional[List[float]]],
    'uris': List[Optional[str]],
    'data': List[Optional[Any]],
    'included': List[str]
})

# query() returns one inner list per query
QueryResult = TypedDict('QueryResult', {
    'ids': List[List[str]],
    'documents': List[List[Optional[str]]],
    'metadatas': List[List[Optional[Dict]]],
    'embeddings': List[List[Optional[List[float]]]],
    'distances': List[List[float]],
    'uris': List[List[Optional[str]]],
    'data': List[List[Optional[Any]]],
    'included': List[str]
})
```
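
Because `QueryResult` holds one inner list per query, reading hits usually means walking two levels. The sketch below does that over a hand-written dictionary shaped like a `QueryResult`. `iter_hits` is a hypothetical helper, and the sample distances are placeholder values chosen for illustration, not output from a real query.

```python
from typing import Any, Dict, Iterator, Optional, Tuple

Hit = Tuple[int, str, Optional[str], float]


def iter_hits(result: Dict[str, Any]) -> Iterator[Hit]:
    """Yield (query_index, id, document, distance) for every hit."""
    for query_index, ids in enumerate(result["ids"]):
        documents = result["documents"][query_index]
        distances = result["distances"][query_index]
        for doc_id, document, distance in zip(ids, documents, distances):
            yield query_index, doc_id, document, distance


# Hand-written sample with the QueryResult shape (placeholder distances).
sample = {
    "ids": [["id1", "id2"]],
    "documents": [["This is a document", "This is another document"]],
    "distances": [[0.25, 0.5]],
}

hits = list(iter_hits(sample))
for query_index, doc_id, document, distance in hits:
    print(query_index, doc_id, document, distance)
```

The helper assumes `documents` and `distances` were included in the result; fields that were not requested would need a guard before indexing.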