VoyageAI embeddings integration for LangChain providing cutting-edge embedding models through the Voyage AI API
npx @tessl/cli install tessl/pypi-langchain-community@0.0.00
# VoyageAI Embeddings

VoyageAI embeddings integration for LangChain, providing access to Voyage AI embedding models through the Voyage AI API. The integration implements the LangChain `Embeddings` interface and generates text embeddings for semantic search, document similarity, and vector-based retrieval systems.

## Package Information

- **Package Name**: langchain-community
- **Component**: VoyageEmbeddings
- **Language**: Python
- **Installation**: `pip install langchain-community`

## Core Imports

```python
from langchain_community.embeddings import VoyageEmbeddings
```

Alternative imports:

```python
from langchain_community.embeddings.voyageai import VoyageEmbeddings
```

From main langchain package (re-exports from community):

```python
from langchain.embeddings import VoyageEmbeddings
```

## Basic Usage

```python
from langchain_community.embeddings import VoyageEmbeddings

# Initialize with API key from environment (VOYAGE_API_KEY)
embeddings = VoyageEmbeddings()

# Or provide API key explicitly
embeddings = VoyageEmbeddings(voyage_api_key="your-api-key-here")

# Embed a single query
query = "What is machine learning?"
query_embedding = embeddings.embed_query(query)
print(f"Query embedding dimension: {len(query_embedding)}")

# Embed multiple documents
documents = [
    "Machine learning is a subset of artificial intelligence.",
    "Deep learning uses neural networks with multiple layers.",
    "Natural language processing deals with text and speech."
]
doc_embeddings = embeddings.embed_documents(documents)
print(f"Document embeddings: {len(doc_embeddings)} vectors")
```

## Capabilities

### VoyageEmbeddings Class

Main class for generating embeddings using Voyage AI models. Supports batch processing, retry logic, and configurable batch size, retry count, and request timeout.

```python { .api }
class VoyageEmbeddings(BaseModel, Embeddings):
    """
    Voyage embedding models integration for LangChain.

    Inherits from:
        BaseModel: Pydantic model for configuration and validation
        Embeddings: LangChain embeddings interface

    Attributes:
        model (str): Voyage AI model name (default: "voyage-01")
        voyage_api_base (str): API endpoint URL (default: "https://api.voyageai.com/v1/embeddings")
        voyage_api_key (Optional[SecretStr]): API key (loaded from VOYAGE_API_KEY env var if not provided)
        batch_size (int): Maximum texts per API request (default: 8)
        max_retries (int): Maximum retry attempts (default: 6)
        request_timeout (Optional[Union[float, Tuple[float, float]]]): Request timeout in seconds
        show_progress_bar (bool): Show progress for large batches (default: False, requires tqdm)
    """
```

### Query Embedding

Embeds a single text query using the "query" input type, optimized for search and retrieval scenarios.

```python { .api }
def embed_query(self, text: str) -> List[float]:
    """
    Embed a single query text.

    Args:
        text (str): The text to embed

    Returns:
        List[float]: Embedding vector
    """
```

Usage example:

```python { .api }
# Embed a search query
query = "python machine learning libraries"
query_vector = embeddings.embed_query(query)
```

### Document Embedding

Embeds multiple documents using the "document" input type, optimized for indexing and storage scenarios with automatic batching.

```python { .api }
def embed_documents(self, texts: List[str]) -> List[List[float]]:
    """
    Embed multiple documents.

    Args:
        texts (List[str]): List of texts to embed

    Returns:
        List[List[float]]: List of embedding vectors
    """
```

Usage example:

```python { .api }
# Embed documents for indexing
documents = [
    "Python is a versatile programming language",
    "Machine learning requires large datasets",
    "Neural networks process information in layers"
]
doc_vectors = embeddings.embed_documents(documents)
```

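The automatic batching mentioned for document embedding can be pictured as splitting the input list into consecutive chunks of at most `batch_size` texts, one chunk per API request. The following is a minimal sketch of that idea only; `chunk_texts` is a hypothetical helper written for illustration and is not part of the `VoyageEmbeddings` API, and the library's internal batching may differ.

```python
from typing import Iterator, List


def chunk_texts(texts: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `texts`, each holding at most `batch_size` items.

    Hypothetical helper for illustration; not part of langchain-community.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


# With the documented default batch_size of 8, 20 texts need three requests.
texts = [f"doc {i}" for i in range(20)]
batch_lengths = [len(batch) for batch in chunk_texts(texts, batch_size=8)]
print(batch_lengths)  # [8, 8, 4]
```

Under this picture, a larger `batch_size` means fewer requests for the same number of texts.
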
### General Text Embedding

Embeds texts with configurable input type for flexible use cases beyond query/document distinction.

```python { .api }
def embed_general_texts(
    self,
    texts: List[str],
    *,
    input_type: Optional[str] = None
) -> List[List[float]]:
    """
    Embed texts with configurable input type.

    Args:
        texts (List[str]): List of texts to embed
        input_type (str, optional): "query", "document", or None for unspecified

    Returns:
        List[List[float]]: List of embedding vectors

    Raises:
        ValueError: If input_type is not None, "query", or "document"
    """
```

Usage example:

```python { .api }
# Embed with explicit input type
texts = ["text classification", "sentiment analysis"]
vectors = embeddings.embed_general_texts(texts, input_type="query")

# Embed without specifying type
vectors = embeddings.embed_general_texts(texts)
```

### Retry Functionality

Utility function that wraps the embedding request in retry logic with exponential backoff, so transient API failures are retried up to `max_retries` times.

```python { .api }
def embed_with_retry(embeddings: VoyageEmbeddings, **kwargs: Any) -> Any:
    """
    Execute embedding with retry logic using exponential backoff.

    Args:
        embeddings (VoyageEmbeddings): Embeddings instance (used for max_retries config)
        **kwargs: Additional arguments passed to requests.post()

    Returns:
        dict: API response data containing "data" field with embeddings

    Raises:
        RuntimeError: If API response lacks "data" field
    """
```

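To make the retry behaviour concrete, here is a plain-Python sketch of exponential backoff combined with the "data" field check described above. It is an illustration under stated assumptions, not the library's implementation: `call_with_backoff`, `check_response`, `base_delay`, `max_delay`, and the injectable `sleep` parameter are all hypothetical names, and the actual delays used by `embed_with_retry` are not documented here.

```python
import time
from typing import Any, Callable, Dict, List


def check_response(response: Dict[str, Any]) -> Dict[str, Any]:
    """Raise RuntimeError when the payload has no "data" field."""
    if "data" not in response:
        raise RuntimeError(f"Unexpected API response: {response}")
    return response


def call_with_backoff(
    request: Callable[[], Dict[str, Any]],
    max_retries: int = 6,
    base_delay: float = 1.0,   # assumed starting delay, in seconds
    max_delay: float = 16.0,   # assumed upper bound on the delay
    sleep: Callable[[float], Any] = time.sleep,
) -> Dict[str, Any]:
    """Call `request` up to `max_retries` times, doubling the wait after each failure."""
    for attempt in range(max_retries):
        try:
            return check_response(request())
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error
            sleep(min(base_delay * 2 ** attempt, max_delay))
    raise ValueError("max_retries must be at least 1")


# Simulated request that fails twice before succeeding (no network involved).
attempts = {"count": 0}


def flaky_request() -> Dict[str, Any]:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return {"data": [{"embedding": [0.1, 0.2]}]}


# Record the delays instead of actually sleeping.
delays: List[float] = []
result = call_with_backoff(flaky_request, sleep=delays.append)
print(attempts["count"], delays)  # 3 [1.0, 2.0]
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a production version would normally leave it at `time.sleep`.
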
## Configuration Options

### API Configuration

```python { .api }
# Custom API endpoint and model
embeddings = VoyageEmbeddings(
    model="voyage-01",
    voyage_api_base="https://api.voyageai.com/v1/embeddings",
    voyage_api_key="your-key"
)
```

### Performance Configuration

```python { .api }
# Configure batching and retries
embeddings = VoyageEmbeddings(
    batch_size=16,           # Larger batches for better throughput
    max_retries=10,          # More retries for unstable connections
    request_timeout=30.0,    # 30-second timeout
    show_progress_bar=True   # Show progress for large datasets
)
```

### Progress Tracking

For large embedding tasks, enable progress tracking:

```python
# Requires: pip install tqdm
embeddings = VoyageEmbeddings(show_progress_bar=True)

# Process large document set with progress bar
large_documents = ["doc " + str(i) for i in range(1000)]
embeddings_result = embeddings.embed_documents(large_documents)
```

## Types

### Input Types

Supported input type values for `embed_general_texts`:

- `"query"`: Optimized for search queries and questions
- `"document"`: Optimized for documents and content to be indexed
- `None`: Unspecified input type (default behavior)

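The rule for accepted values can be sketched as a small check that runs before any request is sent. `validate_input_type` and `ALLOWED_INPUT_TYPES` are hypothetical names used only for this illustration; the library performs its own validation inside `embed_general_texts`.

```python
from typing import Optional

# Values listed above; None means "unspecified".
ALLOWED_INPUT_TYPES = ("query", "document")


def validate_input_type(input_type: Optional[str]) -> Optional[str]:
    """Return `input_type` unchanged if it is None, "query", or "document".

    Hypothetical helper mirroring the documented ValueError behaviour.
    """
    if input_type is not None and input_type not in ALLOWED_INPUT_TYPES:
        raise ValueError(
            f"input_type {input_type!r} is invalid; "
            f"expected None or one of {ALLOWED_INPUT_TYPES}"
        )
    return input_type


print(validate_input_type("query"))  # query
print(validate_input_type(None))     # None
```
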
### Return Types

All embedding methods return vectors with consistent dimensionality:

- **Embedding Vector**: `List[float]` (dimensions depend on model used)
- **Batch Embeddings**: `List[List[float]]` where each inner list contains the same number of dimensions

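When storing embeddings in a vector index, it can be useful to confirm the consistent-dimension property before indexing. A minimal sketch follows, assuming a hypothetical helper `embedding_dimension` and toy vectors that stand in for real model output.

```python
from typing import List


def embedding_dimension(vectors: List[List[float]]) -> int:
    """Return the dimension shared by all vectors, or raise ValueError if they differ."""
    if not vectors:
        raise ValueError("no vectors supplied")
    dimensions = {len(vector) for vector in vectors}
    if len(dimensions) != 1:
        raise ValueError(f"inconsistent dimensions: {sorted(dimensions)}")
    return dimensions.pop()


# Toy 3-dimensional vectors for illustration; real embeddings are much longer.
toy_vectors = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
print(embedding_dimension(toy_vectors))  # 3
```
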
## Error Handling

### Common Exceptions

```python
from langchain_community.embeddings import VoyageEmbeddings

try:
    embeddings = VoyageEmbeddings(show_progress_bar=True)
    result = embeddings.embed_general_texts(["test"], input_type="invalid")
except ImportError:
    # tqdm not installed but show_progress_bar=True
    print("Install tqdm: pip install tqdm")
except RuntimeError as e:
    # API error or malformed response (missing "data" field)
    print(f"API error: {e}")
except ValueError as e:
    # Invalid input_type parameter (must be None, "query", or "document")
    print(f"Invalid parameter: {e}")
```

### API Key Management

```python
import os

from langchain_community.embeddings import VoyageEmbeddings

# Set API key via environment variable (recommended)
os.environ["VOYAGE_API_KEY"] = "your-api-key"
embeddings = VoyageEmbeddings()

# Or pass directly (less secure)
embeddings = VoyageEmbeddings(voyage_api_key="your-api-key")
```

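The lookup order described in this document (an explicitly passed key first, otherwise the `VOYAGE_API_KEY` environment variable) can be sketched as follows. `resolve_api_key` and its `env` parameter are hypothetical and exist only to illustrate the precedence; the class resolves the key itself during initialization.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    explicit_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Prefer an explicitly passed key, then fall back to VOYAGE_API_KEY.

    Hypothetical helper; `env` defaults to os.environ and is injectable for testing.
    """
    environment = os.environ if env is None else env
    key = explicit_key or environment.get("VOYAGE_API_KEY")
    if not key:
        raise ValueError(
            "No API key found: pass one explicitly or set VOYAGE_API_KEY"
        )
    return key


# Placeholder values, not real credentials.
fake_env = {"VOYAGE_API_KEY": "key-from-env"}
print(resolve_api_key(env=fake_env))                  # key-from-env
print(resolve_api_key("key-from-arg", env=fake_env))  # key-from-arg
```
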
## Integration Examples

### Semantic Search

```python
from langchain_community.embeddings import VoyageEmbeddings
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Initialize embeddings
embeddings = VoyageEmbeddings()

# Create document embeddings
documents = [
    "Python is a programming language",
    "Machine learning uses algorithms",
    "Data science involves statistics"
]
doc_embeddings = embeddings.embed_documents(documents)

# Search with a query
query = "programming languages"
query_embedding = embeddings.embed_query(query)

# Calculate similarities
similarities = cosine_similarity([query_embedding], doc_embeddings)[0]
best_match_idx = np.argmax(similarities)

print(f"Best match: {documents[best_match_idx]}")
print(f"Similarity: {similarities[best_match_idx]:.3f}")
```

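For readers who want to see what the similarity step computes, the ranking can also be sketched without NumPy or scikit-learn. The helpers `cosine_similarity` and `rank_by_similarity` below are local illustrations (not the scikit-learn function of the same name), and the 2-dimensional vectors are toy stand-ins for real embeddings.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between vectors `a` and `b`."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], docs: Sequence[Sequence[float]]
) -> List[int]:
    """Return document indices ordered from most to least similar to `query`."""
    scores = [cosine_similarity(query, doc) for doc in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


# Toy vectors: the third document points in the same direction as the query.
toy_query = [1.0, 0.0]
toy_docs = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
ranking = rank_by_similarity(toy_query, toy_docs)
print(ranking)  # [2, 1, 0]
```
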
### LangChain Integration

```python
from langchain.retrievers import KNNRetriever
from langchain_community.embeddings import VoyageEmbeddings

# Create retriever with VoyageAI embeddings
embeddings = VoyageEmbeddings()
documents = ["doc1", "doc2", "doc3"]

retriever = KNNRetriever.from_texts(documents, embeddings)
results = retriever.get_relevant_documents("search query")
```