Tessl Tile for pypi/langchain-community@0.0.0

tessl/pypi-langchain-community

VoyageAI embeddings integration for LangChain providing cutting-edge embedding models through the Voyage AI API

Workspace: tessl
Visibility: Public
Created: 3 months ago
Last updated: 3 months ago
Describes: pkg:pypi/langchain-community@0.0.x

To install, run

npx @tessl/cli install tessl/pypi-langchain-community@0.0.0

# VoyageAI Embeddings

VoyageAI embeddings integration for LangChain, providing access to Voyage AI embedding models through the Voyage AI API. It implements the LangChain `Embeddings` interface so that text embeddings can be generated for semantic search, document similarity, and vector-based retrieval.

## Package Information

- **Package Name**: langchain-community
- **Component**: VoyageEmbeddings
- **Language**: Python
- **Installation**: `pip install langchain-community`

## Core Imports

```python
from langchain_community.embeddings import VoyageEmbeddings
```

Alternative import from the submodule:

```python
from langchain_community.embeddings.voyageai import VoyageEmbeddings
```

From the main `langchain` package (a re-export of the community class; recent releases may emit a deprecation warning):

```python
from langchain.embeddings import VoyageEmbeddings
```

## Basic Usage

```python
from langchain_community.embeddings import VoyageEmbeddings

# Initialize with API key from environment (VOYAGE_API_KEY)
embeddings = VoyageEmbeddings()

# Or provide API key explicitly
embeddings = VoyageEmbeddings(voyage_api_key="your-api-key-here")

# Embed a single query
query = "What is machine learning?"
query_embedding = embeddings.embed_query(query)
print(f"Query embedding dimension: {len(query_embedding)}")

# Embed multiple documents
documents = [
    "Machine learning is a subset of artificial intelligence.",
    "Deep learning uses neural networks with multiple layers.",
    "Natural language processing deals with text and speech."
]
doc_embeddings = embeddings.embed_documents(documents)
print(f"Document embeddings: {len(doc_embeddings)} vectors")
```

## Capabilities

### VoyageEmbeddings Class

Main class for generating embeddings with Voyage AI models. Texts are sent to the API in batches of at most `batch_size` per request, failed requests are retried up to `max_retries` times, and the request timeout and progress bar are configurable.

```python { .api }
class VoyageEmbeddings(BaseModel, Embeddings):
    """
    Voyage embedding models integration for LangChain.

    Inherits from:
        BaseModel: Pydantic model for configuration and validation
        Embeddings: LangChain embeddings interface

    Attributes:
        model (str): Voyage AI model name (default: "voyage-01")
        voyage_api_base (str): API endpoint URL (default: "https://api.voyageai.com/v1/embeddings")
        voyage_api_key (Optional[SecretStr]): API key (loaded from VOYAGE_API_KEY env var if not provided)
        batch_size (int): Maximum texts per API request (default: 8)
        max_retries (int): Maximum retry attempts (default: 6)
        request_timeout (Optional[Union[float, Tuple[float, float]]]): Request timeout in seconds
        show_progress_bar (bool): Show progress for large batches (default: False, requires tqdm)
    """
```
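
To illustrate how `batch_size` bounds the number of texts per request, here is a minimal sketch of splitting a list into batches. The helper `iter_batches` is hypothetical and not part of `langchain-community`; the library's internal batching may differ in detail.

```python
from typing import Iterator, List


def iter_batches(texts: List[str], batch_size: int = 8) -> Iterator[List[str]]:
    """Yield consecutive slices of `texts`, each holding at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


# 20 texts with a batch_size of 8 are split into three batches.
batches = list(iter_batches([f"doc {i}" for i in range(20)]))
batch_sizes = [len(batch) for batch in batches]
print(batch_sizes)
```

Under this scheme, 1000 texts with `batch_size=8` would need 125 requests, which is one reason a larger batch size can improve throughput.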

### Query Embedding

Embeds a single text query using the "query" input type, optimized for search and retrieval scenarios.

```python { .api }
def embed_query(self, text: str) -> List[float]:
    """
    Embed a single query text.

    Args:
        text (str): The text to embed

    Returns:
        List[float]: Embedding vector
    """
```

Usage example:

```python { .api }
# Embed a search query
query = "python machine learning libraries"
query_vector = embeddings.embed_query(query)
```

### Document Embedding

Embeds multiple documents using the "document" input type, optimized for indexing and storage scenarios with automatic batching.

```python { .api }
def embed_documents(self, texts: List[str]) -> List[List[float]]:
    """
    Embed multiple documents.

    Args:
        texts (List[str]): List of texts to embed

    Returns:
        List[List[float]]: List of embedding vectors
    """
```

Usage example:

```python { .api }
# Embed documents for indexing
documents = [
    "Python is a versatile programming language",
    "Machine learning requires large datasets",
    "Neural networks process information in layers"
]
doc_vectors = embeddings.embed_documents(documents)
```

### General Text Embedding

Embeds texts with a configurable input type, for use cases that do not fit the query/document distinction.

```python { .api }
def embed_general_texts(
    self,
    texts: List[str],
    *,
    input_type: Optional[str] = None
) -> List[List[float]]:
    """
    Embed texts with configurable input type.

    Args:
        texts (List[str]): List of texts to embed
        input_type (str, optional): "query", "document", or None for unspecified

    Returns:
        List[List[float]]: List of embedding vectors

    Raises:
        ValueError: If input_type is not None, "query", or "document"
    """
```

Usage example:

```python { .api }
# Embed with explicit input type
texts = ["text classification", "sentiment analysis"]
vectors = embeddings.embed_general_texts(texts, input_type="query")

# Embed without specifying type
vectors = embeddings.embed_general_texts(texts)
```
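
The `ValueError` documented for `embed_general_texts` can be pictured as a simple membership check. This is only a sketch of the documented behavior; `validate_input_type` and `ALLOWED_INPUT_TYPES` are hypothetical names, not something the library exports.

```python
from typing import Optional

# The three values the documentation lists as accepted.
ALLOWED_INPUT_TYPES = (None, "query", "document")


def validate_input_type(input_type: Optional[str]) -> Optional[str]:
    """Return `input_type` unchanged if it is allowed, otherwise raise ValueError."""
    if input_type not in ALLOWED_INPUT_TYPES:
        raise ValueError(
            f"input_type must be None, 'query', or 'document'; got {input_type!r}"
        )
    return input_type
```

Running such a check before any network call means a typo in `input_type` fails immediately instead of costing an API request.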

### Retry Functionality

Utility function providing exponential backoff retry logic for robust API interaction.

```python { .api }
def embed_with_retry(embeddings: VoyageEmbeddings, **kwargs: Any) -> Any:
    """
    Execute embedding with retry logic using exponential backoff.

    Args:
        embeddings (VoyageEmbeddings): Embeddings instance (used for max_retries config)
        **kwargs: Additional arguments passed to requests.post()

    Returns:
        dict: API response data containing "data" field with embeddings

    Raises:
        RuntimeError: If API response lacks "data" field
    """
```
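
As a rough illustration of exponential backoff in general, the standard-library sketch below doubles the wait after each failed attempt. The function name `retry_with_backoff`, the delay values, and the injectable `sleep` parameter are assumptions made for this example; the wait schedule actually used by `embed_with_retry` is not stated here and may differ.

```python
import time
from typing import Any, Callable


def retry_with_backoff(
    func: Callable[[], Any],
    max_retries: int = 6,
    base_delay: float = 1.0,
    max_delay: float = 16.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call `func` up to `max_retries` times, doubling the wait after each failure."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    for attempt in range(max_retries):
        try:
            return func()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error
            # Waits grow as base_delay * 2**attempt, capped at max_delay.
            sleep(min(base_delay * (2 ** attempt), max_delay))
```

Passing `sleep` as a parameter keeps the sketch easy to test without real waiting; a production version would typically also restrict which exception types are retried.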

## Configuration Options

### API Configuration

```python { .api }
# Custom API endpoint and model
embeddings = VoyageEmbeddings(
    model="voyage-01",
    voyage_api_base="https://api.voyageai.com/v1/embeddings",
    voyage_api_key="your-key"
)
```

### Performance Configuration

```python { .api }
# Configure batching and retries
embeddings = VoyageEmbeddings(
    batch_size=16,          # Larger batches for better throughput
    max_retries=10,         # More retries for unstable connections
    request_timeout=30.0,   # 30-second timeout
    show_progress_bar=True  # Show progress for large datasets
)
```

### Progress Tracking

For large embedding tasks, enable progress tracking:

```python
# Requires: pip install tqdm
embeddings = VoyageEmbeddings(show_progress_bar=True)

# Process large document set with progress bar
large_documents = ["doc " + str(i) for i in range(1000)]
embeddings_result = embeddings.embed_documents(large_documents)
```

## Types

### Input Types

Supported input type values for `embed_general_texts`:

- `"query"`: Optimized for search queries and questions
- `"document"`: Optimized for documents and content to be indexed
- `None`: Unspecified input type (default behavior)

### Return Types

All embedding methods return vectors with consistent dimensionality:

- **Embedding Vector**: `List[float]` (dimensions depend on model used)
- **Batch Embeddings**: `List[List[float]]` where each inner list contains the same number of dimensions

## Error Handling

### Common Exceptions

```python
from langchain_community.embeddings import VoyageEmbeddings

try:
    embeddings = VoyageEmbeddings(show_progress_bar=True)
    result = embeddings.embed_general_texts(["test"], input_type="invalid")
except ImportError:
    # tqdm not installed but show_progress_bar=True
    print("Install tqdm: pip install tqdm")
except RuntimeError as e:
    # API error or malformed response (missing "data" field)
    print(f"API error: {e}")
except ValueError as e:
    # Invalid input_type parameter (must be None, "query", or "document")
    print(f"Invalid parameter: {e}")
```

### API Key Management

```python
import os

from langchain_community.embeddings import VoyageEmbeddings

# Recommended: supply the key through the VOYAGE_API_KEY environment variable.
# It is assigned in code here only for illustration; normally export it in your shell.
os.environ["VOYAGE_API_KEY"] = "your-api-key"
embeddings = VoyageEmbeddings()

# Or pass it directly (less secure, since the key can end up in source control)
embeddings = VoyageEmbeddings(voyage_api_key="your-api-key")
```
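
When relying on the environment variable, it can help to fail early with a clear message if the key is missing. The sketch below shows one way to do that; `require_voyage_api_key` is a hypothetical helper written for this example and is not part of the library, and only the variable name `VOYAGE_API_KEY` comes from the documentation above.

```python
import os
from typing import Mapping, Optional


def require_voyage_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the value of VOYAGE_API_KEY, or raise if it is missing or blank."""
    env = os.environ if env is None else env
    key = env.get("VOYAGE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "VOYAGE_API_KEY is not set; export it before creating embeddings"
        )
    return key
```

The returned string could then be passed as `voyage_api_key=...`, or the call could be used purely as a startup check.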

## Integration Examples

### Semantic Search

```python
from langchain_community.embeddings import VoyageEmbeddings
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Initialize embeddings
embeddings = VoyageEmbeddings()

# Create document embeddings
documents = [
    "Python is a programming language",
    "Machine learning uses algorithms",
    "Data science involves statistics"
]
doc_embeddings = embeddings.embed_documents(documents)

# Search with a query
query = "programming languages"
query_embedding = embeddings.embed_query(query)

# Calculate similarities
similarities = cosine_similarity([query_embedding], doc_embeddings)[0]
best_match_idx = np.argmax(similarities)

print(f"Best match: {documents[best_match_idx]}")
print(f"Similarity: {similarities[best_match_idx]:.3f}")
```
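
For reference, cosine similarity is the dot product of two vectors divided by the product of their norms. The dependency-free sketch below computes it for toy 2-dimensional vectors that stand in for real embeddings; the helper name `cosine` and the sample values are made up for illustration.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy vectors: orthogonal, 45 degrees apart, and parallel to the query.
query_vec = [1.0, 0.0]
doc_vecs = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
scores = [cosine(query_vec, d) for d in doc_vecs]
best_idx = max(range(len(scores)), key=scores.__getitem__)
print(best_idx, scores)
```

Because the score depends only on direction, the third vector ranks highest even though it is twice as long as the query.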

### LangChain Integration

```python
from langchain_community.embeddings import VoyageEmbeddings
from langchain_community.retrievers import KNNRetriever

# Create retriever with VoyageAI embeddings
embeddings = VoyageEmbeddings()
documents = ["doc1", "doc2", "doc3"]

# Embeds the texts once and builds an in-memory k-nearest-neighbor index
retriever = KNNRetriever.from_texts(documents, embeddings)
results = retriever.get_relevant_documents("search query")
```