# Embedding Functions

ChromaDB ships a library of embedding functions that generate vector embeddings from text, with integrations for major AI providers and embedding models. Embedding functions are pluggable components that convert text into numerical vectors used for similarity search.
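
To make the idea of converting text into vectors for similarity search concrete, here is a minimal, dependency-free sketch. `toy_embed` and `cosine_similarity` are hypothetical helpers written for illustration and are not part of ChromaDB; a real embedding function would call a trained model instead of hashing tokens.

```python
import hashlib
import math
from typing import List

Documents = List[str]
Embeddings = List[List[float]]


def toy_embed(texts: Documents, dim: int = 16) -> Embeddings:
    """Hypothetical stand-in for a model: count tokens hashed into `dim` buckets."""
    vectors = []
    for text in texts:
        vec = [0.0] * dim
        for token in text.lower().split():
            digest = hashlib.md5(token.encode("utf-8")).digest()
            vec[digest[0] % dim] += 1.0
        vectors.append(vec)
    return vectors


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = ["the cat sat on the mat", "stock prices fell sharply"]
doc_vecs = toy_embed(docs)
query_vec = toy_embed(["the cat sat"])[0]

# Score every document against the query and report the closest one
scores = [cosine_similarity(query_vec, v) for v in doc_vecs]
print(docs[scores.index(max(scores))])
```

A vector database performs the same comparison, only with model-generated vectors and an index that avoids scanning every document.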

## Capabilities

### Default Embedding Function

ChromaDB includes a default ONNX-based embedding function that works out of the box, with no API key required.

```python { .api }
class DefaultEmbeddingFunction:
    """Default ONNX-based embedding function using all-MiniLM-L6-v2 model."""

    def __call__(self, input: Documents) -> Embeddings:
        """Generate embeddings for input documents."""

    def embed_with_retries(self, input: Documents, **retry_kwargs) -> Embeddings:
        """Generate embeddings with retry logic."""
```

**Usage Example:**

```python
import chromadb
from chromadb.utils.embedding_functions import DefaultEmbeddingFunction

# Uses DefaultEmbeddingFunction automatically
client = chromadb.EphemeralClient()
collection = client.create_collection("default_embeddings")

# Explicit usage
ef = DefaultEmbeddingFunction()
collection = client.create_collection("explicit_default", embedding_function=ef)
```

### OpenAI Embeddings

Generate embeddings using OpenAI's embedding models with API key authentication.

```python { .api }
class OpenAIEmbeddingFunction:
    """OpenAI embedding function using text-embedding-ada-002 or newer models."""

    def __init__(
        self,
        api_key: str,
        model_name: str = "text-embedding-ada-002",
        api_base: Optional[str] = None,
        api_type: Optional[str] = None,
        api_version: Optional[str] = None,
        deployment_id: Optional[str] = None
    ):
        """
        Initialize OpenAI embedding function.

        Args:
            api_key: OpenAI API key
            model_name: Model to use for embeddings
            api_base: Custom API base URL
            api_type: API type (e.g., 'azure')
            api_version: API version for Azure
            deployment_id: Deployment ID for Azure
        """
```

**Usage Example:**

```python
import chromadb
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

# In-memory client, as in the default example
client = chromadb.EphemeralClient()

openai_ef = OpenAIEmbeddingFunction(
    api_key="your-openai-api-key",
    model_name="text-embedding-3-small"
)

collection = client.create_collection(
    "openai_embeddings",
    embedding_function=openai_ef
)
```
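
Hardcoding an API key, as the placeholder string above does, is fine for a demo but risky in shared code. One common alternative is to read the key from an environment variable. The sketch below uses only the standard library; `load_api_key` is a hypothetical helper and the variable name `OPENAI_API_KEY` is an assumption, not something this document specifies.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the key stored in `env_var`, failing early if it is unset or blank."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

The returned string can then be passed as the `api_key` argument, so the secret never appears in source control.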

### Cohere Embeddings

Generate embeddings using Cohere's embedding models with support for different model types.

```python { .api }
class CohereEmbeddingFunction:
    """Cohere embedding function supporting various Cohere models."""

    def __init__(
        self,
        api_key: str,
        model_name: str = "embed-english-v2.0"
    ):
        """
        Initialize Cohere embedding function.

        Args:
            api_key: Cohere API key
            model_name: Cohere model to use for embeddings
        """
```

### HuggingFace Embeddings

Generate embeddings using HuggingFace Transformers models with local or remote execution.

```python { .api }
class HuggingFaceEmbeddingFunction:
    """HuggingFace Transformers embedding function."""

    def __init__(
        self,
        model_name: str = "sentence-transformers/all-MiniLM-L6-v2",
        device: str = "cpu",
        normalize_embeddings: bool = True
    ):
        """
        Initialize HuggingFace embedding function.

        Args:
            model_name: HuggingFace model identifier
            device: Device to run model on ('cpu' or 'cuda')
            normalize_embeddings: Whether to normalize embeddings
        """
```

**Usage Example:**

```python
import chromadb
import torch  # needed for the CUDA availability check below
from chromadb.utils.embedding_functions import HuggingFaceEmbeddingFunction

# In-memory client, as in the default example
client = chromadb.EphemeralClient()

hf_ef = HuggingFaceEmbeddingFunction(
    model_name="sentence-transformers/all-mpnet-base-v2",
    device="cuda" if torch.cuda.is_available() else "cpu"
)

collection = client.create_collection(
    "huggingface_embeddings",
    embedding_function=hf_ef
)
```

### Sentence Transformers

Specialized interface for Sentence Transformers models optimized for semantic similarity.

```python { .api }
class SentenceTransformerEmbeddingFunction:
    """Sentence Transformers embedding function."""

    def __init__(
        self,
        model_name: str = "all-MiniLM-L6-v2",
        device: str = "cpu",
        normalize_embeddings: bool = True
    ):
        """
        Initialize Sentence Transformers embedding function.

        Args:
            model_name: Sentence Transformers model name
            device: Device to run model on
            normalize_embeddings: Whether to normalize embeddings
        """
```
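
The `normalize_embeddings` flag is easier to reason about with a small worked example. The sketch below assumes that "normalize" means scaling each vector to unit L2 length, which is the usual convention but is not spelled out in this document. The helper names are hypothetical and the code uses only the standard library.

```python
import math
from typing import List


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def dot(a: List[float], b: List[float]) -> float:
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [4.0, 3.0]
a_unit, b_unit = l2_normalize(a), l2_normalize(b)

# After normalization the inner product of the unit vectors matches the
# cosine similarity of the originals, up to floating-point rounding.
print(round(dot(a_unit, b_unit), 6), round(cosine_similarity(a, b), 6))
```

Under that assumption, normalized embeddings rank neighbours the same way whether the index compares them by inner product or by cosine similarity.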

### Google AI Embeddings

Generate embeddings using Google's AI models including PaLM and Vertex AI.

```python { .api }
class GooglePalmEmbeddingFunction:
    """Google PaLM embedding function."""

    def __init__(self, api_key: str, model_name: str = "models/embedding-gecko-001"):
        """Initialize Google PaLM embedding function."""

class GoogleVertexEmbeddingFunction:
    """Google Vertex AI embedding function."""

    def __init__(
        self,
        project_id: str,
        region: str = "us-central1",
        model_name: str = "textembedding-gecko"
    ):
        """Initialize Google Vertex AI embedding function."""
```

### Specialized Embedding Functions

ChromaDB includes many specialized embedding functions for specific use cases:

```python { .api }
class OllamaEmbeddingFunction:
    """Ollama local embedding function."""

class JinaEmbeddingFunction:
    """Jina AI embedding function."""

class VoyageAIEmbeddingFunction:
    """Voyage AI embedding function."""

class InstructorEmbeddingFunction:
    """Instructor embedding function."""

class OpenCLIPEmbeddingFunction:
    """OpenCLIP embedding function for images and text."""

class AmazonBedrockEmbeddingFunction:
    """Amazon Bedrock embedding function."""

class MistralEmbeddingFunction:
    """Mistral AI embedding function."""
```

### Custom Embedding Functions

Create custom embedding functions by implementing the EmbeddingFunction protocol.

```python { .api }
class EmbeddingFunction:
    """Protocol for embedding functions."""

    def __call__(self, input: Documents) -> Embeddings:
        """Generate embeddings for input documents."""

    def embed_with_retries(self, input: Documents, **retry_kwargs) -> Embeddings:
        """Generate embeddings with retry logic."""

    @staticmethod
    def name() -> str:
        """Return the name of the embedding function."""

    @staticmethod
    def build_from_config(config: Dict[str, Any]) -> "EmbeddingFunction":
        """Build embedding function from configuration."""

    def get_config(self) -> Dict[str, Any]:
        """Get configuration for the embedding function."""

    def default_space(self) -> str:
        """Return default distance metric ('cosine', 'l2', 'ip')."""

    def supported_spaces(self) -> List[str]:
        """Return list of supported distance metrics."""
```
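
The `default_space` and `supported_spaces` methods only declare metric preferences; how a caller consumes them is not described here. The following sketch shows one plausible way to use them when choosing a distance metric. Both `UnitVectorEmbedder` and `resolve_space` are hypothetical names introduced for illustration and are not ChromaDB APIs.

```python
from typing import List, Optional


class UnitVectorEmbedder:
    """Hypothetical embedding function that only declares its metric preferences."""

    def default_space(self) -> str:
        return "cosine"

    def supported_spaces(self) -> List[str]:
        return ["cosine", "l2", "ip"]


def resolve_space(ef, requested: Optional[str] = None) -> str:
    """Return the function's default metric when none is requested;
    otherwise return the requested metric, rejecting unsupported ones."""
    if requested is None:
        return ef.default_space()
    if requested not in ef.supported_spaces():
        raise ValueError(
            f"{requested!r} is not supported; choose from {ef.supported_spaces()}"
        )
    return requested


ef = UnitVectorEmbedder()
print(resolve_space(ef))        # no preference given, so the default is used
print(resolve_space(ef, "l2"))  # an explicitly requested, supported metric
```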

**Custom Implementation Example:**

```python
import chromadb
import requests
from chromadb.api.types import EmbeddingFunction, Documents, Embeddings

class CustomAPIEmbeddingFunction(EmbeddingFunction):
    def __init__(self, api_url: str, api_key: str):
        self.api_url = api_url
        self.api_key = api_key

    def __call__(self, input: Documents) -> Embeddings:
        # Send all documents in one request to the (example) embedding endpoint
        response = requests.post(
            self.api_url,
            headers={"Authorization": f"Bearer {self.api_key}"},
            json={"texts": input},
            timeout=30
        )
        # Surface HTTP errors instead of failing later on a missing key
        response.raise_for_status()
        return response.json()["embeddings"]

    def embed_with_retries(self, input: Documents, **retry_kwargs) -> Embeddings:
        # Implement retry logic
        return self.__call__(input)

# Use custom embedding function
client = chromadb.EphemeralClient()
custom_ef = CustomAPIEmbeddingFunction("https://api.example.com/embed", "your-key")
collection = client.create_collection("custom_embeddings", embedding_function=custom_ef)
```
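
The `# Implement retry logic` placeholder in the example above can be filled in several ways. The sketch below shows one option, a plain exponential backoff built on the standard library. `call_with_retries` and its parameters are hypothetical, and this is not a description of how ChromaDB's own `embed_with_retries` works.

```python
import time
from typing import Callable, List

Documents = List[str]
Embeddings = List[List[float]]


def call_with_retries(
    embed: Callable[[Documents], Embeddings],
    input: Documents,
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Embeddings:
    """Call `embed`, retrying on any exception with delays of
    base_delay, 2 * base_delay, 4 * base_delay, and so on."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return embed(input)
        except Exception:
            # Out of attempts: re-raise the last error to the caller
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")


# Demo: a fake embedder that fails once, then succeeds
state = {"calls": 0}

def flaky_embed(texts: Documents) -> Embeddings:
    state["calls"] += 1
    if state["calls"] == 1:
        raise ConnectionError("temporary failure")
    return [[float(len(t))] for t in texts]

print(call_with_retries(flaky_embed, ["hello"], sleep=lambda seconds: None))
```

Inside a custom class, `embed_with_retries` could delegate to such a helper, for example by passing `self.__call__` as `embed`. Injecting `sleep` keeps the helper easy to test without real delays.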

### Embedding Function Utilities

Utility functions for working with embedding functions.

```python { .api }
def register_embedding_function(ef_class: type) -> None:
    """Register a custom embedding function class."""

def config_to_embedding_function(config: Dict[str, Any]) -> EmbeddingFunction:
    """Create embedding function from configuration dictionary."""

known_embedding_functions: Dict[str, type] = {
    # Dictionary of all available embedding functions
}
```

**Usage Example:**

```python
from chromadb.utils.embedding_functions import (
    config_to_embedding_function,
    known_embedding_functions
)

# Create from config
config = {
    "name": "OpenAIEmbeddingFunction",
    "api_key": "your-key",
    "model_name": "text-embedding-3-small"
}
ef = config_to_embedding_function(config)

# List available functions
print("Available embedding functions:")
for name in known_embedding_functions:
    print(f"  - {name}")
```
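
The interplay between registration, `name()`, `build_from_config`, and `get_config` can be illustrated with a small registry. This is a sketch of the general pattern only; `_REGISTRY`, `register`, `from_config`, and `ConstantEmbedder` are hypothetical names, and the flat config shape simply mirrors the usage example above rather than any confirmed ChromaDB internals.

```python
from typing import Any, Dict, List

Documents = List[str]
Embeddings = List[List[float]]

# Hypothetical registry mapping a function's name to its class
_REGISTRY: Dict[str, type] = {}


def register(cls: type) -> type:
    """Class decorator that records `cls` under the name it reports."""
    _REGISTRY[cls.name()] = cls
    return cls


@register
class ConstantEmbedder:
    """Toy embedding function: every document maps to the same vector."""

    def __init__(self, value: float = 0.0, dim: int = 3):
        self.value = value
        self.dim = dim

    def __call__(self, input: Documents) -> Embeddings:
        return [[self.value] * self.dim for _ in input]

    @staticmethod
    def name() -> str:
        return "constant"

    @staticmethod
    def build_from_config(config: Dict[str, Any]) -> "ConstantEmbedder":
        return ConstantEmbedder(**config)

    def get_config(self) -> Dict[str, Any]:
        return {"value": self.value, "dim": self.dim}


def from_config(config: Dict[str, Any]):
    """Look up the class named in `config` and build it from the other keys."""
    params = dict(config)  # copy so the caller's dict is not mutated
    name = params.pop("name")
    if name not in _REGISTRY:
        raise KeyError(f"Unknown embedding function: {name!r}")
    return _REGISTRY[name].build_from_config(params)


ef = from_config({"name": "constant", "value": 1.5, "dim": 2})
print(ef(["a", "b"]), ef.get_config())
```

Because `get_config` returns exactly what `build_from_config` accepts, a function can be serialized alongside a collection and rebuilt later from its name and config.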

## Types

```python { .api }
from typing import List, Dict, Any, Optional, Protocol
from abc import ABC, abstractmethod

Documents = List[str]
Embeddings = List[List[float]]

class EmbeddingFunction(Protocol):
    """Protocol that all embedding functions must implement."""

    def __call__(self, input: Documents) -> Embeddings: ...
    def embed_with_retries(self, input: Documents, **retry_kwargs) -> Embeddings: ...

    @staticmethod
    def name() -> str: ...

    @staticmethod
    def build_from_config(config: Dict[str, Any]) -> "EmbeddingFunction": ...

    def get_config(self) -> Dict[str, Any]: ...
    def default_space(self) -> str: ...
    def supported_spaces(self) -> List[str]: ...
```
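
Since the interface is expressed as a `Protocol`, conformance is structural: a class qualifies by having the right methods, not by inheriting from a base class. The sketch below demonstrates this with a reduced, hypothetical protocol (`SupportsEmbedding`) and a hypothetical `validate_embeddings` helper that checks the `Documents` to `Embeddings` shape contract. It uses only the standard library.

```python
from typing import List, Protocol, runtime_checkable

Documents = List[str]
Embeddings = List[List[float]]


@runtime_checkable
class SupportsEmbedding(Protocol):
    """Reduced, hypothetical protocol: only the call signature is required."""

    def __call__(self, input: Documents) -> Embeddings: ...


class LengthEmbedder:
    """Toy embedder: [character count, word count] for each document."""

    def __call__(self, input: Documents) -> Embeddings:
        return [[float(len(t)), float(len(t.split()))] for t in input]


def validate_embeddings(docs: Documents, embeddings: Embeddings) -> int:
    """Require one vector per document and a single shared dimension
    (so empty input is rejected); return that dimension."""
    if len(docs) != len(embeddings):
        raise ValueError("expected one embedding per document")
    dims = {len(vec) for vec in embeddings}
    if len(dims) != 1:
        raise ValueError("embeddings must be non-empty and share one dimension")
    return dims.pop()


ef = LengthEmbedder()
docs = ["hello world", "hi"]

# isinstance only checks that the method exists, not its signature
print(isinstance(ef, SupportsEmbedding))
print(validate_embeddings(docs, ef(docs)))
```

Static type checkers verify the full signatures; the runtime check is a coarse guard that is mainly useful for early error messages.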