tessl/pypi-langchain-community

VoyageAI embeddings integration for LangChain providing cutting-edge embedding models through the Voyage AI API

- **Workspace**: tessl
- **Visibility**: Public
- **Describes**: pypipkg:pypi/langchain-community@0.0.x

To install, run

npx @tessl/cli install tessl/pypi-langchain-community@0.0.0

# VoyageAI Embeddings

VoyageAI embeddings integration for LangChain, providing access to Voyage AI embedding models through the Voyage AI API. The integration implements the LangChain `Embeddings` interface, so it can generate text embeddings for semantic search, document similarity, and vector-based retrieval.

## Package Information

- **Package Name**: langchain-community
- **Component**: VoyageEmbeddings
- **Language**: Python
- **Installation**: `pip install langchain-community`

## Core Imports

```python
from langchain_community.embeddings import VoyageEmbeddings
```

Alternative import from the submodule:

```python
from langchain_community.embeddings.voyageai import VoyageEmbeddings
```

From the main `langchain` package (re-exported from `langchain_community`):

```python
from langchain.embeddings import VoyageEmbeddings
```

## Basic Usage

```python
from langchain_community.embeddings import VoyageEmbeddings

# Initialize with API key from environment (VOYAGE_API_KEY)
embeddings = VoyageEmbeddings()

# Or provide API key explicitly
embeddings = VoyageEmbeddings(voyage_api_key="your-api-key-here")

# Embed a single query
query = "What is machine learning?"
query_embedding = embeddings.embed_query(query)
print(f"Query embedding dimension: {len(query_embedding)}")

# Embed multiple documents
documents = [
    "Machine learning is a subset of artificial intelligence.",
    "Deep learning uses neural networks with multiple layers.",
    "Natural language processing deals with text and speech."
]
doc_embeddings = embeddings.embed_documents(documents)
print(f"Document embeddings: {len(doc_embeddings)} vectors")
```

## Capabilities

### VoyageEmbeddings Class

Main class for generating embeddings with Voyage AI models. It supports batched requests, retries with exponential backoff, a configurable request timeout, and an optional progress bar.

```python { .api }
class VoyageEmbeddings(BaseModel, Embeddings):
    """
    Voyage embedding models integration for LangChain.

    Inherits from:
        BaseModel: Pydantic model for configuration and validation
        Embeddings: LangChain embeddings interface

    Attributes:
        model (str): Voyage AI model name (default: "voyage-01")
        voyage_api_base (str): API endpoint URL (default: "https://api.voyageai.com/v1/embeddings")
        voyage_api_key (Optional[SecretStr]): API key (loaded from VOYAGE_API_KEY env var if not provided)
        batch_size (int): Maximum texts per API request (default: 8)
        max_retries (int): Maximum retry attempts (default: 6)
        request_timeout (Optional[Union[float, Tuple[float, float]]]): Request timeout in seconds
        show_progress_bar (bool): Show progress for large batches (default: False, requires tqdm)
    """
```
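
To illustrate what `batch_size` controls, here is a minimal sketch of splitting a list of texts into request-sized batches. The helper name `iter_batches` is hypothetical and is not part of `langchain-community`; the library's internal batching may differ in detail.

```python
from typing import Iterator, List


def iter_batches(texts: List[str], batch_size: int = 8) -> Iterator[List[str]]:
    """Yield consecutive slices of `texts`, each with at most `batch_size` items.

    Hypothetical helper for illustration only; not a langchain-community API.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


# 20 texts with batch_size=8 produce three batches.
batches = list(iter_batches([f"doc {i}" for i in range(20)], batch_size=8))
print([len(batch) for batch in batches])  # [8, 8, 4]
```

Under this model, a larger `batch_size` means fewer requests for the same input, at the cost of larger payloads per request.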

### Query Embedding

Embeds a single text query using the "query" input type, optimized for search and retrieval scenarios.

```python { .api }
def embed_query(self, text: str) -> List[float]:
    """
    Embed a single query text.

    Args:
        text (str): The text to embed

    Returns:
        List[float]: Embedding vector
    """
```

Usage example:

```python { .api }
# Embed a search query
query = "python machine learning libraries"
query_vector = embeddings.embed_query(query)
```

### Document Embedding

Embeds multiple documents using the "document" input type, optimized for indexing and storage scenarios with automatic batching.

```python { .api }
def embed_documents(self, texts: List[str]) -> List[List[float]]:
    """
    Embed multiple documents.

    Args:
        texts (List[str]): List of texts to embed

    Returns:
        List[List[float]]: List of embedding vectors
    """
```

Usage example:

```python { .api }
# Embed documents for indexing
documents = [
    "Python is a versatile programming language",
    "Machine learning requires large datasets",
    "Neural networks process information in layers"
]
doc_vectors = embeddings.embed_documents(documents)
```

### General Text Embedding

Embeds texts with a configurable input type, for use cases that do not fit the query/document distinction.

```python { .api }
def embed_general_texts(
    self,
    texts: List[str],
    *,
    input_type: Optional[str] = None
) -> List[List[float]]:
    """
    Embed texts with configurable input type.

    Args:
        texts (List[str]): List of texts to embed
        input_type (str, optional): "query", "document", or None for unspecified

    Returns:
        List[List[float]]: List of embedding vectors

    Raises:
        ValueError: If input_type is not None, "query", or "document"
    """
```

Usage example:

```python { .api }
# Embed with explicit input type
texts = ["text classification", "sentiment analysis"]
vectors = embeddings.embed_general_texts(texts, input_type="query")

# Embed without specifying type
vectors = embeddings.embed_general_texts(texts)
```
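
The `input_type` rule from the `Raises` clause above can be sketched as a small standalone check. `validate_input_type` and `ALLOWED_INPUT_TYPES` are hypothetical names used for illustration; they mirror the documented behaviour rather than reproduce the library's source.

```python
from typing import Optional

# Values the documentation lists as accepted for input_type.
ALLOWED_INPUT_TYPES = (None, "query", "document")


def validate_input_type(input_type: Optional[str]) -> Optional[str]:
    """Return `input_type` unchanged if it is allowed, else raise ValueError.

    Hypothetical helper for illustration only; not a langchain-community API.
    """
    if input_type not in ALLOWED_INPUT_TYPES:
        raise ValueError(
            f"input_type {input_type!r} is invalid. "
            'Options: None, "query", "document".'
        )
    return input_type


print(validate_input_type("query"))  # query
```

Running a check like this before building a large request list can surface a typo in `input_type` without spending an API call.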

### Retry Functionality

Utility function that wraps the embedding request in retry logic with exponential backoff.

```python { .api }
def embed_with_retry(embeddings: VoyageEmbeddings, **kwargs: Any) -> Any:
    """
    Execute embedding with retry logic using exponential backoff.

    Args:
        embeddings (VoyageEmbeddings): Embeddings instance (used for max_retries config)
        **kwargs: Additional arguments passed to requests.post()

    Returns:
        dict: API response data containing "data" field with embeddings

    Raises:
        RuntimeError: If API response lacks "data" field
    """
```
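
As a rough illustration of exponential backoff, the sketch below retries a callable with delays that double after each failure. It is not the library's implementation: the names `retry_with_backoff`, `base_delay`, `max_delay`, and `sleep` are assumptions, `max_retries` is treated here as the total number of attempts, and the real delay bounds and retried exception types may differ.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_retries: int = 6,
    base_delay: float = 1.0,
    max_delay: float = 16.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func`, retrying on any exception with exponentially growing delays.

    Hypothetical helper for illustration only; not a langchain-community API.
    """
    for attempt in range(max_retries):
        try:
            return func()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (base, 2*base, 4*base, ...), capped at max_delay.
            sleep(min(base_delay * 2 ** attempt, max_delay))
    raise ValueError("max_retries must be at least 1")


# Demo: a callable that fails twice, then succeeds. Delays are recorded
# instead of slept so the example runs instantly.
calls = {"count": 0}


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


delays: List[float] = []
result = retry_with_backoff(flaky, sleep=delays.append)
print(result, delays)  # ok [1.0, 2.0]
```

Injecting `sleep` as a parameter is a design choice made for this sketch so the backoff schedule can be inspected without waiting.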

## Configuration Options

### API Configuration

```python { .api }
# Custom API endpoint and model
embeddings = VoyageEmbeddings(
    model="voyage-01",
    voyage_api_base="https://api.voyageai.com/v1/embeddings",
    voyage_api_key="your-key"
)
```

### Performance Configuration

```python { .api }
# Configure batching and retries
embeddings = VoyageEmbeddings(
    batch_size=16,           # Larger batches for better throughput
    max_retries=10,          # More retries for unstable connections
    request_timeout=30.0,    # 30-second timeout
    show_progress_bar=True   # Show progress for large datasets
)
```

### Progress Tracking

For large embedding tasks, enable progress tracking:

```python
# Requires: pip install tqdm
embeddings = VoyageEmbeddings(show_progress_bar=True)

# Process large document set with progress bar
large_documents = ["doc " + str(i) for i in range(1000)]
embeddings_result = embeddings.embed_documents(large_documents)
```

## Types

### Input Types

Supported input type values for `embed_general_texts`:

- `"query"`: Optimized for search queries and questions
- `"document"`: Optimized for documents and content to be indexed
- `None`: Unspecified input type (default behavior)

### Return Types

All embedding methods return vectors with consistent dimensionality:

- **Embedding Vector**: `List[float]` (dimensions depend on model used)
- **Batch Embeddings**: `List[List[float]]` where each inner list contains the same number of dimensions
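
A quick sanity check of the consistent-dimensionality property might look like the sketch below. `embedding_dimension` is a hypothetical helper, and the vectors are small stand-ins rather than real Voyage AI output.

```python
from typing import List


def embedding_dimension(vectors: List[List[float]]) -> int:
    """Return the dimension shared by all `vectors`; raise if lengths differ.

    Hypothetical helper for illustration only; not a langchain-community API.
    """
    if not vectors:
        raise ValueError("no vectors given")
    dims = {len(vector) for vector in vectors}
    if len(dims) != 1:
        raise ValueError(f"inconsistent dimensions: {sorted(dims)}")
    return dims.pop()


# Stand-in batch: two vectors of length 3.
print(embedding_dimension([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]))  # 3
```

A check like this is mainly useful before writing vectors to a store that was created with a fixed dimension.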

## Error Handling

### Common Exceptions

```python
from langchain_community.embeddings import VoyageEmbeddings

try:
    embeddings = VoyageEmbeddings(show_progress_bar=True)
    result = embeddings.embed_general_texts(["test"], input_type="invalid")
except ImportError:
    # tqdm not installed but show_progress_bar=True
    print("Install tqdm: pip install tqdm")
except RuntimeError as e:
    # API error or malformed response (missing "data" field)
    print(f"API error: {e}")
except ValueError as e:
    # Invalid input_type parameter (must be None, "query", or "document")
    print(f"Invalid parameter: {e}")
```

### API Key Management

```python
import os

from langchain_community.embeddings import VoyageEmbeddings

# Set API key via environment variable (recommended)
os.environ["VOYAGE_API_KEY"] = "your-api-key"
embeddings = VoyageEmbeddings()

# Or pass directly (less secure)
embeddings = VoyageEmbeddings(voyage_api_key="your-api-key")
```
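
The lookup order (an explicit key first, then the `VOYAGE_API_KEY` environment variable) can be sketched as below. `resolve_api_key` is a hypothetical helper, and raising `ValueError` when no key is found is an assumption of this sketch, not a statement about the library's exact error.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    explicit_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the explicit key if given, else the VOYAGE_API_KEY value from `env`.

    Hypothetical helper for illustration only; not a langchain-community API.
    """
    env = os.environ if env is None else env
    key = explicit_key or env.get("VOYAGE_API_KEY")
    if not key:
        raise ValueError("No API key: pass one explicitly or set VOYAGE_API_KEY")
    return key


# A plain dict stands in for the process environment in this demo.
print(resolve_api_key(env={"VOYAGE_API_KEY": "from-env"}))  # from-env
```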

## Integration Examples

### Semantic Search

```python
from langchain_community.embeddings import VoyageEmbeddings
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Initialize embeddings
embeddings = VoyageEmbeddings()

# Create document embeddings
documents = [
    "Python is a programming language",
    "Machine learning uses algorithms",
    "Data science involves statistics"
]
doc_embeddings = embeddings.embed_documents(documents)

# Search with a query
query = "programming languages"
query_embedding = embeddings.embed_query(query)

# Calculate similarities
similarities = cosine_similarity([query_embedding], doc_embeddings)[0]
best_match_idx = np.argmax(similarities)

print(f"Best match: {documents[best_match_idx]}")
print(f"Similarity: {similarities[best_match_idx]:.3f}")
```
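
If scikit-learn is not available, the similarity step can be approximated with the standard library alone. The sketch below uses small stand-in vectors instead of real embeddings; `cosine` and `rank_by_similarity` are hypothetical helper names, and returning 0.0 for a zero-length vector is a choice made here for simplicity.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
) -> List[Tuple[int, float]]:
    """Return (document index, similarity) pairs, most similar first."""
    scores = [(i, cosine(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Stand-in vectors: document 1 points the same way as the query.
toy_query = [1.0, 0.0]
toy_docs = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
ranking = rank_by_similarity(toy_query, toy_docs)
print(ranking[0][0])  # 1
```

With real data, `toy_query` and `toy_docs` would be replaced by the outputs of `embed_query` and `embed_documents`.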

### LangChain Integration

```python
from langchain.retrievers import KNNRetriever
from langchain_community.embeddings import VoyageEmbeddings

# Create retriever with VoyageAI embeddings
embeddings = VoyageEmbeddings()
documents = ["doc1", "doc2", "doc3"]

retriever = KNNRetriever.from_texts(documents, embeddings)
results = retriever.get_relevant_documents("search query")
```