# Embedding Functions

ChromaDB ships a library of embedding functions that generate vector embeddings from text and cover the major AI providers and embedding models. Embedding functions are pluggable components: each one converts text into numerical vectors that ChromaDB uses for similarity search.

## Capabilities

### Default Embedding Function

ChromaDB includes a default ONNX-based embedding function that works out of the box and requires no API key.

```python { .api }
class DefaultEmbeddingFunction:
    """Default ONNX-based embedding function using all-MiniLM-L6-v2 model."""

    def __call__(self, input: Documents) -> Embeddings:
        """Generate embeddings for input documents."""

    def embed_with_retries(self, input: Documents, **retry_kwargs) -> Embeddings:
        """Generate embeddings with retry logic."""
```

**Usage Example:**

```python
import chromadb
from chromadb.utils.embedding_functions import DefaultEmbeddingFunction

client = chromadb.EphemeralClient()

# Implicit: a collection uses DefaultEmbeddingFunction when none is specified
collection = client.create_collection("default_embeddings")

# Explicit: pass the embedding function yourself
ef = DefaultEmbeddingFunction()
collection = client.create_collection("explicit_default", embedding_function=ef)
```

### OpenAI Embeddings

Generate embeddings using OpenAI's embedding models with API key authentication.

```python { .api }
class OpenAIEmbeddingFunction:
    """OpenAI embedding function using text-embedding-ada-002 or newer models."""

    def __init__(
        self,
        api_key: str,
        model_name: str = "text-embedding-ada-002",
        api_base: Optional[str] = None,
        api_type: Optional[str] = None,
        api_version: Optional[str] = None,
        deployment_id: Optional[str] = None
    ):
        """
        Initialize OpenAI embedding function.

        Args:
            api_key: OpenAI API key
            model_name: Model to use for embeddings
            api_base: Custom API base URL
            api_type: API type (e.g., 'azure')
            api_version: API version for Azure
            deployment_id: Deployment ID for Azure
        """
```

**Usage Example:**

```python
import chromadb
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

client = chromadb.EphemeralClient()

openai_ef = OpenAIEmbeddingFunction(
    api_key="your-openai-api-key",
    model_name="text-embedding-3-small"
)

collection = client.create_collection(
    "openai_embeddings",
    embedding_function=openai_ef
)
```
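
The usage example passes the API key as a string literal. A minimal sketch of reading it from the environment instead, assuming the key is stored in a variable named `OPENAI_API_KEY`; the helper `require_api_key` is hypothetical and not part of ChromaDB:

```python
import os


def require_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in env_var, or raise a descriptive error."""
    value = os.environ.get(env_var, "").strip()
    if not value:
        raise RuntimeError(
            f"Set the {env_var} environment variable before creating the embedding function"
        )
    return value


# Demonstration only: a real deployment sets the variable outside the program.
os.environ.setdefault("OPENAI_API_KEY", "your-openai-api-key")
api_key = require_api_key()
```

The returned string can then be passed as `api_key` to `OpenAIEmbeddingFunction`, which keeps the secret out of source control.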
81
82
### Cohere Embeddings
83
84
Generate embeddings using Cohere's embedding models with support for different model types.
85
86
```python { .api }
87
class CohereEmbeddingFunction:
88
"""Cohere embedding function supporting various Cohere models."""
89
90
def __init__(
91
self,
92
api_key: str,
93
model_name: str = "embed-english-v2.0"
94
):
95
"""
96
Initialize Cohere embedding function.
97
98
Args:
99
api_key: Cohere API key
100
model_name: Cohere model to use for embeddings
101
"""
102
```

### HuggingFace Embeddings

Generate embeddings using HuggingFace Transformers models with local or remote execution.

```python { .api }
class HuggingFaceEmbeddingFunction:
    """HuggingFace Transformers embedding function."""

    def __init__(
        self,
        model_name: str = "sentence-transformers/all-MiniLM-L6-v2",
        device: str = "cpu",
        normalize_embeddings: bool = True
    ):
        """
        Initialize HuggingFace embedding function.

        Args:
            model_name: HuggingFace model identifier
            device: Device to run model on ('cpu' or 'cuda')
            normalize_embeddings: Whether to normalize embeddings
        """
```

**Usage Example:**

```python
import chromadb
import torch
from chromadb.utils.embedding_functions import HuggingFaceEmbeddingFunction

client = chromadb.EphemeralClient()

# Run on the GPU when one is available, otherwise fall back to the CPU
hf_ef = HuggingFaceEmbeddingFunction(
    model_name="sentence-transformers/all-mpnet-base-v2",
    device="cuda" if torch.cuda.is_available() else "cpu"
)

collection = client.create_collection(
    "huggingface_embeddings",
    embedding_function=hf_ef
)
```

### Sentence Transformers

Specialized interface for Sentence Transformers models optimized for semantic similarity.

```python { .api }
class SentenceTransformerEmbeddingFunction:
    """Sentence Transformers embedding function."""

    def __init__(
        self,
        model_name: str = "all-MiniLM-L6-v2",
        device: str = "cpu",
        normalize_embeddings: bool = True
    ):
        """
        Initialize Sentence Transformers embedding function.

        Args:
            model_name: Sentence Transformers model name
            device: Device to run model on
            normalize_embeddings: Whether to normalize embeddings
        """
```
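
The `normalize_embeddings` flag is described only briefly. As a rough illustration, assuming it means scaling each vector to unit L2 length (the usual meaning in Sentence Transformers), the sketch below shows the effect in plain Python; `l2_normalize` and `dot` are hypothetical helpers, not ChromaDB functions:

```python
import math
from typing import List


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def dot(a: List[float], b: List[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


a = l2_normalize([3.0, 4.0])
b = l2_normalize([4.0, 3.0])

# For unit-length vectors, the inner product equals the cosine similarity.
print(a, b, dot(a, b))
```

Once vectors are normalized, inner-product and cosine comparisons rank results identically, which is one reason normalization is commonly enabled by default.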

### Google AI Embeddings

Generate embeddings using Google's models through the PaLM API or Vertex AI.

```python { .api }
class GooglePalmEmbeddingFunction:
    """Google PaLM embedding function."""

    def __init__(self, api_key: str, model_name: str = "models/embedding-gecko-001"):
        """Initialize Google PaLM embedding function."""

class GoogleVertexEmbeddingFunction:
    """Google Vertex AI embedding function."""

    def __init__(
        self,
        project_id: str,
        region: str = "us-central1",
        model_name: str = "textembedding-gecko"
    ):
        """Initialize Google Vertex AI embedding function."""
```

### Specialized Embedding Functions

ChromaDB also includes embedding functions for other providers and for specific use cases such as local inference and multimodal (image and text) embeddings:

```python { .api }
class OllamaEmbeddingFunction:
    """Ollama local embedding function."""

class JinaEmbeddingFunction:
    """Jina AI embedding function."""

class VoyageAIEmbeddingFunction:
    """Voyage AI embedding function."""

class InstructorEmbeddingFunction:
    """Instructor embedding function."""

class OpenCLIPEmbeddingFunction:
    """OpenCLIP embedding function for images and text."""

class AmazonBedrockEmbeddingFunction:
    """Amazon Bedrock embedding function."""

class MistralEmbeddingFunction:
    """Mistral AI embedding function."""
```

### Custom Embedding Functions

Create custom embedding functions by implementing the `EmbeddingFunction` protocol.

```python { .api }
class EmbeddingFunction:
    """Protocol for embedding functions."""

    def __call__(self, input: Documents) -> Embeddings:
        """Generate embeddings for input documents."""

    def embed_with_retries(self, input: Documents, **retry_kwargs) -> Embeddings:
        """Generate embeddings with retry logic."""

    @staticmethod
    def name() -> str:
        """Return the name of the embedding function."""

    @staticmethod
    def build_from_config(config: Dict[str, Any]) -> "EmbeddingFunction":
        """Build embedding function from configuration."""

    def get_config(self) -> Dict[str, Any]:
        """Get configuration for the embedding function."""

    def default_space(self) -> str:
        """Return default distance metric ('cosine', 'l2', 'ip')."""

    def supported_spaces(self) -> List[str]:
        """Return list of supported distance metrics."""
```
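
The protocol names three distance metrics without defining them. The sketch below computes each one in plain Python, assuming the conventions common to HNSW-style indexes: squared Euclidean distance for `l2`, one minus the inner product for `ip`, and one minus cosine similarity for `cosine`. These formulas are an assumption here and should be checked against the ChromaDB documentation; `distance` is a hypothetical helper.

```python
import math
from typing import List


def distance(a: List[float], b: List[float], space: str) -> float:
    """Distance between two equal-length vectors; smaller means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    if space == "l2":
        # Assumed convention: squared Euclidean distance (no square root)
        return sum((x - y) ** 2 for x, y in zip(a, b))
    if space == "ip":
        # Assumed convention: one minus the inner product
        return 1.0 - dot
    if space == "cosine":
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        if norm_a == 0.0 or norm_b == 0.0:
            raise ValueError("cosine distance is undefined for a zero vector")
        return 1.0 - dot / (norm_a * norm_b)
    raise ValueError(f"unknown space: {space!r}")


a, b = [1.0, 0.0], [0.0, 1.0]
for space in ("cosine", "l2", "ip"):
    print(space, distance(a, b, space))
```

An embedding function whose vectors are not normalized would typically prefer `cosine` or `l2`, since `ip` is sensitive to vector magnitude.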

**Custom Implementation Example:**

```python
import chromadb
import requests
from chromadb.api.types import Documents, EmbeddingFunction, Embeddings


class CustomAPIEmbeddingFunction(EmbeddingFunction):
    def __init__(self, api_url: str, api_key: str):
        self.api_url = api_url
        self.api_key = api_key

    def __call__(self, input: Documents) -> Embeddings:
        # POST all documents to the embedding service in a single request
        response = requests.post(
            self.api_url,
            headers={"Authorization": f"Bearer {self.api_key}"},
            json={"texts": input},
            timeout=30,
        )
        # Fail on HTTP errors instead of trying to parse an error payload
        response.raise_for_status()
        return response.json()["embeddings"]

    def embed_with_retries(self, input: Documents, **retry_kwargs) -> Embeddings:
        # Placeholder: delegates to __call__ without retrying; add retry logic here
        return self.__call__(input)


# Use the custom embedding function
client = chromadb.EphemeralClient()
custom_ef = CustomAPIEmbeddingFunction("https://api.example.com/embed", "your-key")
collection = client.create_collection("custom_embeddings", embedding_function=custom_ef)
```
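
The `embed_with_retries` method in the custom example leaves its retry logic unimplemented. One possible shape for that logic is retrying with exponential backoff, sketched here with the standard library only. The helper `call_with_retries`, its parameters, and the choice of retryable exceptions are all assumptions for illustration, not ChromaDB behavior:

```python
import time
from typing import Callable, List, Tuple, Type

Documents = List[str]
Embeddings = List[List[float]]


def call_with_retries(
    embed: Callable[[Documents], Embeddings],
    documents: Documents,
    max_attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> Embeddings:
    """Call embed(documents), retrying transient failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return embed(documents)
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Simulated backend that fails twice before succeeding
attempts = {"count": 0}


def flaky_embed(documents: Documents) -> Embeddings:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient failure")
    return [[float(len(doc))] for doc in documents]


delays: List[float] = []  # record delays instead of actually sleeping
result = call_with_retries(flaky_embed, ["hi", "hello"], sleep=delays.append)
print(result, delays)
```

With `requests`, the `retry_on` tuple would more likely hold `requests.exceptions.ConnectionError` and `requests.exceptions.Timeout`, because those classes do not derive from the built-in exceptions of the same name.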

### Embedding Function Utilities

Utility functions for working with embedding functions.

```python { .api }
def register_embedding_function(ef_class: type) -> None:
    """Register a custom embedding function class."""

def config_to_embedding_function(config: Dict[str, Any]) -> EmbeddingFunction:
    """Create embedding function from configuration dictionary."""

known_embedding_functions: Dict[str, type] = {
    # Dictionary of all available embedding functions
}
```

**Usage Example:**

```python
from chromadb.utils.embedding_functions import (
    config_to_embedding_function,
    known_embedding_functions
)

# Create from config
config = {
    "name": "OpenAIEmbeddingFunction",
    "api_key": "your-key",
    "model_name": "text-embedding-3-small"
}
ef = config_to_embedding_function(config)

# List available functions
print("Available embedding functions:")
for name in known_embedding_functions:
    print(f"  - {name}")
```
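
To make the relationship between `name()`, `get_config()`, `build_from_config()`, and a registry concrete, here is a self-contained sketch of a configuration round trip. Everything in it is hypothetical: the toy `LengthEmbeddingFunction`, the `registry` dictionary, the `register` and `from_config` helpers, and the `{"name": ..., "config": ...}` layout are illustrative assumptions and do not reproduce ChromaDB's internal schema.

```python
from typing import Any, Dict, List, Type

Documents = List[str]
Embeddings = List[List[float]]


class LengthEmbeddingFunction:
    """Toy embedding: maps each document to a 1-d vector [len(doc) * scale]."""

    def __init__(self, scale: float = 1.0) -> None:
        self.scale = scale

    def __call__(self, input: Documents) -> Embeddings:
        return [[len(doc) * self.scale] for doc in input]

    @staticmethod
    def name() -> str:
        return "length"

    @staticmethod
    def build_from_config(config: Dict[str, Any]) -> "LengthEmbeddingFunction":
        return LengthEmbeddingFunction(scale=config["scale"])

    def get_config(self) -> Dict[str, Any]:
        return {"scale": self.scale}


# Hypothetical registry keyed by the value each class returns from name()
registry: Dict[str, Type[Any]] = {}


def register(ef_class: Type[Any]) -> None:
    registry[ef_class.name()] = ef_class


def from_config(saved: Dict[str, Any]) -> Any:
    """Look up the class by name and rebuild the instance from its config."""
    if saved["name"] not in registry:
        raise ValueError(f"unknown embedding function: {saved['name']!r}")
    return registry[saved["name"]].build_from_config(saved["config"])


register(LengthEmbeddingFunction)
original = LengthEmbeddingFunction(scale=0.5)
saved = {"name": original.name(), "config": original.get_config()}
restored = from_config(saved)
print(restored(["abcd"]))
```

Keeping `get_config()` and `build_from_config()` symmetric is what lets a stored collection recreate the same embedding function later without the caller passing it again.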

## Types

```python { .api }
from typing import List, Dict, Any, Optional, Protocol
from abc import ABC, abstractmethod

Documents = List[str]
Embeddings = List[List[float]]

class EmbeddingFunction(Protocol):
    """Protocol that all embedding functions must implement."""

    def __call__(self, input: Documents) -> Embeddings: ...
    def embed_with_retries(self, input: Documents, **retry_kwargs) -> Embeddings: ...

    @staticmethod
    def name() -> str: ...

    @staticmethod
    def build_from_config(config: Dict[str, Any]) -> "EmbeddingFunction": ...

    def get_config(self) -> Dict[str, Any]: ...
    def default_space(self) -> str: ...
    def supported_spaces(self) -> List[str]: ...
```
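
The `Documents` and `Embeddings` aliases imply a shape contract: one vector per document, with every vector the same length. A custom embedding function can check that contract before returning. The sketch below is one way to do so; `validate_embeddings` is a hypothetical helper, not part of ChromaDB:

```python
from typing import List

Documents = List[str]
Embeddings = List[List[float]]


def validate_embeddings(documents: Documents, embeddings: Embeddings) -> int:
    """Check the shape contract and return the embedding dimension."""
    if len(embeddings) != len(documents):
        raise ValueError(
            f"expected {len(documents)} embeddings, got {len(embeddings)}"
        )
    if not embeddings:
        raise ValueError("no embeddings to validate")
    dimension = len(embeddings[0])
    if dimension == 0:
        raise ValueError("embedding vectors must not be empty")
    for index, vector in enumerate(embeddings):
        if len(vector) != dimension:
            raise ValueError(
                f"embedding {index} has length {len(vector)}, expected {dimension}"
            )
        for value in vector:
            # bool is a subclass of int, so reject it explicitly
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise TypeError(f"embedding {index} contains a non-numeric value")
    return dimension


docs = ["first document", "second document"]
vectors = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
print(validate_embeddings(docs, vectors))
```

Calling a check like this at the end of `__call__` turns a malformed API response into an immediate, descriptive error rather than a failure deep inside the index.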