# ChromaDB

ChromaDB is an open-source embedding database for building Python or JavaScript LLM applications with memory. It provides a simple API for storing, querying, and managing document embeddings, with automatic tokenization, embedding generation, and indexing.

## Package Information

- **Package Name**: chromadb
- **Language**: Python
- **Installation**: `pip install chromadb`

## Core Imports

```python
import chromadb
```

For specific client types:

```python
from chromadb import EphemeralClient, PersistentClient, HttpClient
```

For types and embedding functions:

```python
from chromadb.api.types import Documents, Embeddings, Metadatas, Where, Include
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction
```

## Basic Usage

```python
import chromadb

# Create a client (in-memory for testing)
client = chromadb.EphemeralClient()

# Create a collection
collection = client.create_collection(name="my_collection")

# Add documents with metadata
collection.add(
    documents=["This is a document", "This is another document"],
    metadatas=[{"source": "my_source"}, {"source": "my_source"}],
    ids=["id1", "id2"]
)

# Query for similar documents
results = collection.query(
    query_texts=["This is a query document"],
    n_results=2
)

print(results)
```

## Architecture

ChromaDB operates on a client-server architecture with multiple deployment options:

- **Client Types**: Multiple client implementations for different deployment scenarios (in-memory, persistent, remote HTTP, cloud)
- **Collections**: Named containers for documents and embeddings with configurable metadata and embedding functions
- **Embedding Functions**: Pluggable components for generating embeddings from various providers (OpenAI, Cohere, HuggingFace, etc.)
- **Query System**: Vector similarity search with metadata filtering and document text matching
- **Multi-tenancy**: Support for tenants and databases for data isolation

## Capabilities

### Client Creation

Factory functions for creating different types of ChromaDB clients depending on deployment needs. Supports in-memory for testing, persistent storage for local development, and remote connections for production.

```python { .api }
def EphemeralClient(settings=None, tenant="default_tenant", database="default_database"): ...
def PersistentClient(path="./chroma", settings=None, tenant="default_tenant", database="default_database"): ...
def HttpClient(host="localhost", port=8000, ssl=False, headers=None, settings=None, tenant="default_tenant", database="default_database"): ...
def CloudClient(tenant=None, database=None, api_key=None, settings=None): ...
```

[Client Creation](./clients.md)
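
As a minimal sketch of choosing between these factories, the helper below maps a deployment setting to a factory name and the keyword arguments from the signatures above. The helper `client_spec` and the environment variable names `CHROMA_HOST`, `CHROMA_PORT`, and `CHROMA_PATH` are hypothetical, chosen only for illustration; they are not part of ChromaDB.

```python
import os
from typing import Any, Dict, Mapping, Optional, Tuple


def client_spec(env: Optional[Mapping[str, str]] = None) -> Tuple[str, Dict[str, Any]]:
    """Pick a client factory name and its kwargs from environment-style settings.

    The variable names are assumptions for this sketch, not ChromaDB settings.
    """
    env = os.environ if env is None else env

    host = env.get("CHROMA_HOST")
    if host:
        # Remote server: HttpClient(host=..., port=...)
        port = int(env.get("CHROMA_PORT", "8000"))
        return "HttpClient", {"host": host, "port": port}

    path = env.get("CHROMA_PATH")
    if path:
        # Local on-disk storage: PersistentClient(path=...)
        return "PersistentClient", {"path": path}

    # Default: in-memory client, suitable for tests
    return "EphemeralClient", {}
```

With `chromadb` installed, the result could be applied as `name, kwargs = client_spec()` followed by `client = getattr(chromadb, name)(**kwargs)`.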

### Collection Operations

Core collection management including creation, retrieval, modification, and deletion. Collections are the primary containers for documents and embeddings with configurable metadata and embedding functions.

```python { .api }
class ClientAPI:
    def create_collection(self, name: str, **kwargs) -> Collection: ...
    def get_collection(self, name: str, **kwargs) -> Collection: ...
    def delete_collection(self, name: str) -> None: ...
    def list_collections(self, limit=None, offset=None) -> Sequence[Collection]: ...
```

[Collection Management](./collections.md)

### Document Operations

Adding, updating, querying, and deleting documents within collections. Supports embeddings, metadata, images, and URIs with flexible data formats and automatic embedding generation.

```python { .api }
class Collection:
    def add(self, ids, documents=None, embeddings=None, metadatas=None, images=None, uris=None): ...
    def query(self, query_texts=None, query_embeddings=None, n_results=10, where=None, **kwargs): ...
    def get(self, ids=None, where=None, limit=None, **kwargs): ...
    def update(self, ids, documents=None, embeddings=None, metadatas=None, **kwargs): ...
    def delete(self, ids=None, where=None, where_document=None): ...
```

[Document Operations](./documents.md)
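
When adding many documents, it can help to split them across several `add` calls. The sketch below assumes a hypothetical helper `add_batches` and an arbitrary default `batch_size` of 100, which is not a documented ChromaDB limit. It only builds the keyword arguments for `Collection.add` and does not call ChromaDB itself.

```python
from typing import Any, Dict, Iterator, List, Optional


def add_batches(
    ids: List[str],
    documents: List[str],
    metadatas: Optional[List[Dict[str, Any]]] = None,
    batch_size: int = 100,  # arbitrary value for this sketch
) -> Iterator[Dict[str, Any]]:
    """Yield keyword-argument dicts, one per batch, for Collection.add."""
    if len(ids) != len(documents):
        raise ValueError("ids and documents must have the same length")
    if metadatas is not None and len(metadatas) != len(ids):
        raise ValueError("metadatas must match ids in length")

    for start in range(0, len(ids), batch_size):
        end = start + batch_size
        kwargs: Dict[str, Any] = {
            "ids": ids[start:end],
            "documents": documents[start:end],
        }
        if metadatas is not None:
            kwargs["metadatas"] = metadatas[start:end]
        yield kwargs
```

Each yielded dict could then be passed on with `collection.add(**kwargs)`.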

### Embedding Functions

Pre-built and configurable embedding functions for generating vector embeddings from text, supporting major AI providers and embedding models with consistent interfaces.

```python { .api }
class EmbeddingFunction:
    def __call__(self, input): ...
    def embed_with_retries(self, input, **retry_kwargs): ...
```

[Embedding Functions](./embedding-functions.md)
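
To illustrate the `__call__(self, input)` contract, here is a toy, duck-typed stand-in that turns each text into a fixed-length vector derived from a SHA-256 digest. The class `HashEmbeddingFunction` is hypothetical and its vectors carry no semantic meaning. Whether a plain callable like this is accepted, or whether it must subclass `EmbeddingFunction`, depends on the installed ChromaDB version, so treat that as an assumption.

```python
import hashlib
from typing import List


class HashEmbeddingFunction:
    """Toy embedding: deterministic, but not semantically meaningful."""

    def __init__(self, dim: int = 8) -> None:
        # A SHA-256 digest has 32 bytes, so at most 32 dimensions are available
        if not 1 <= dim <= 32:
            raise ValueError("dim must be between 1 and 32")
        self.dim = dim

    def __call__(self, input: List[str]) -> List[List[float]]:
        vectors = []
        for text in input:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            # Scale the first `dim` bytes into the range [0.0, 1.0]
            vectors.append([digest[i] / 255.0 for i in range(self.dim)])
        return vectors
```

A stand-in like this is mainly useful in tests, where deterministic vectors matter more than retrieval quality.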

### Query and Filtering

Advanced query capabilities including vector similarity search, metadata filtering, and document text matching with logical operators and flexible result formatting.

```python { .api }
Where = Dict[Union[str, LogicalOperator], Union[LiteralValue, OperatorExpression, List[Where]]]
WhereDocument = Dict[WhereDocumentOperator, Union[str, List[WhereDocument]]]
Include = List[Literal["documents", "embeddings", "metadatas", "distances", "uris", "data"]]
```

[Queries and Filtering](./queries.md)
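
The `Where` and `WhereDocument` types are plain nested dictionaries, so filters can be built with ordinary Python. A minimal sketch, assuming the `$eq`, `$gte`, `$and`, and `$contains` operators; the helpers `field` and `all_of` and the `page` metadata key are hypothetical names used only for illustration.

```python
from typing import Any, Dict


def field(name: str, op: str, value: Any) -> Dict[str, Any]:
    """Build a single metadata condition such as {"page": {"$gte": 2}}."""
    return {name: {op: value}}


def all_of(*clauses: Dict[str, Any]) -> Dict[str, Any]:
    """Combine two or more conditions with a logical AND."""
    if len(clauses) < 2:
        raise ValueError("all_of needs at least two clauses")
    return {"$and": list(clauses)}


# Metadata filter: source equals "my_source" AND page is at least 2
where = all_of(
    field("source", "$eq", "my_source"),
    field("page", "$gte", 2),
)

# Document text filter: the document must contain the substring "query"
where_document = {"$contains": "query"}
```

These dictionaries could then be passed as `collection.query(query_texts=[...], where=where, where_document=where_document)`.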

### Configuration and Settings

Comprehensive configuration system for customizing ChromaDB behavior including authentication, server settings, telemetry, and storage options.

```python { .api }
class Settings:
    def __init__(self, **kwargs): ...

def configure(**kwargs) -> None: ...
def get_settings() -> Settings: ...
```

[Configuration](./configuration.md)

## Types

### Core Types

```python { .api }
# Document and metadata types
Documents = List[str]
Metadatas = List[Dict[str, Union[str, int, float, bool]]]
IDs = List[str]
Embeddings = List[List[float]]

# Query result types

# Returned by Collection.get(): one flat list per field, and no distances
GetResult = TypedDict('GetResult', {
    'ids': List[str],
    'documents': List[Optional[str]],
    'metadatas': List[Optional[Dict]],
    'embeddings': List[Optional[List[float]]],
    'uris': List[Optional[str]],
    'data': List[Optional[Any]],
    'included': List[str]
})

# Returned by Collection.query(): one inner list per query
QueryResult = TypedDict('QueryResult', {
    'ids': List[List[str]],
    'documents': List[List[Optional[str]]],
    'metadatas': List[List[Optional[Dict]]],
    'embeddings': List[List[Optional[List[float]]]],
    'distances': List[List[float]],
    'uris': List[List[Optional[str]]],
    'data': List[List[Optional[Any]]],
    'included': List[str]
})
```
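
Because `QueryResult` nests one inner list per query, reading results usually means indexing by query first. The sketch below uses a hypothetical helper `iter_hits` and a hand-written `sample` dictionary; the distance values are made up for illustration and are not real ChromaDB output.

```python
from typing import Any, Dict, Iterator, Optional, Tuple


def iter_hits(
    result: Dict[str, Any], query_index: int = 0
) -> Iterator[Tuple[str, Optional[str], Optional[float]]]:
    """Yield (id, document, distance) for one query in a QueryResult-shaped dict."""
    ids = result["ids"][query_index]
    documents = result.get("documents")  # may be absent if not included
    distances = result.get("distances")  # may be absent if not included
    for i, doc_id in enumerate(ids):
        document = documents[query_index][i] if documents else None
        distance = distances[query_index][i] if distances else None
        yield doc_id, document, distance


# Hand-written sample in the QueryResult shape (values are illustrative only)
sample = {
    "ids": [["id1", "id2"]],
    "documents": [["This is a document", "This is another document"]],
    "distances": [[0.1, 0.4]],
}

for doc_id, document, distance in iter_hits(sample):
    print(doc_id, document, distance)
```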