Tessl Tile for npm/@langchain/aws@0.1.0

# Text Embeddings

Generate high-quality text embeddings using AWS Bedrock embedding models with automatic batching, text preprocessing, and comprehensive error handling for vector search and similarity applications.

## Capabilities

### BedrockEmbeddings Class

Primary embeddings class for generating text embeddings using AWS Bedrock embedding models, extending LangChain's Embeddings base class.

```typescript { .api }
/**
 * Generate text embeddings using AWS Bedrock embedding models
 */
class BedrockEmbeddings extends Embeddings implements BedrockEmbeddingsParams {
  constructor(fields?: BedrockEmbeddingsParams);

  /** Generate embedding for a single document/query */
  embedQuery(document: string): Promise<number[]>;

  /** Generate embeddings for multiple documents in parallel */
  embedDocuments(documents: string[]): Promise<number[][]>;

  /** Internal method for embedding text with retry logic */
  protected _embedText(text: string): Promise<number[]>;
}
```

**Usage Examples:**

```typescript
import { BedrockEmbeddings } from "@langchain/aws";

// Basic initialization
const embeddings = new BedrockEmbeddings({
  region: "us-east-1",
  model: "amazon.titan-embed-text-v1",
  credentials: {
    // process.env values are typed `string | undefined`; the non-null
    // assertions assume both variables are set in the environment
    accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!
  }
});

// Embed a single query
const queryEmbedding = await embeddings.embedQuery(
  "What is the capital of France?"
);
console.log(queryEmbedding.length); // 1536 for Titan v1

// Embed multiple documents
const documents = [
  "Paris is the capital of France.",
  "London is the capital of England.",
  "Berlin is the capital of Germany."
];

const documentEmbeddings = await embeddings.embedDocuments(documents);
console.log(documentEmbeddings.length); // 3
console.log(documentEmbeddings[0].length); // 1536 for each embedding
```

### Constructor Configuration

Configuration options for initializing BedrockEmbeddings instances.

```typescript { .api }
interface BedrockEmbeddingsParams extends EmbeddingsParams {
  /** Embedding model to use (default: "amazon.titan-embed-text-v1") */
  model?: string;

  /** Custom BedrockRuntimeClient instance */
  client?: BedrockRuntimeClient;

  /** Configuration options for BedrockRuntimeClient */
  clientOptions?: BedrockRuntimeClientConfig;

  /** AWS region for API calls */
  region?: string;

  /** AWS credentials for authentication */
  credentials?: CredentialType;
}
```
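
The options above interact: `model` has a documented default, and a custom `client` is an alternative to `clientOptions`, `region`, and `credentials`. The sketch below illustrates one plausible way such options could be resolved. It is not the library's implementation: `SketchParams`, `ResolvedConfig`, and `resolveConfig` are hypothetical names, and the precedence rules (an explicit client wins; top-level `region` and `credentials` override same-named keys in `clientOptions`) are assumptions made for illustration.

```typescript
// Hypothetical, simplified stand-ins for the real parameter types.
interface SketchParams {
  model?: string;
  client?: object;
  clientOptions?: Record<string, unknown>;
  region?: string;
  credentials?: { accessKeyId: string; secretAccessKey: string };
}

interface ResolvedConfig {
  model: string;
  usesCustomClient: boolean;
  clientConfig: Record<string, unknown>;
}

// Default taken from the interface documentation above.
const DEFAULT_MODEL = "amazon.titan-embed-text-v1";

function resolveConfig(fields: SketchParams = {}): ResolvedConfig {
  const model = fields.model ?? DEFAULT_MODEL;

  // Assumption: a caller-supplied client is used as-is.
  if (fields.client !== undefined) {
    return { model, usesCustomClient: true, clientConfig: {} };
  }

  // Assumption: top-level region/credentials override clientOptions keys.
  const clientConfig: Record<string, unknown> = { ...fields.clientOptions };
  if (fields.region !== undefined) clientConfig["region"] = fields.region;
  if (fields.credentials !== undefined) {
    clientConfig["credentials"] = fields.credentials;
  }
  return { model, usesCustomClient: false, clientConfig };
}

console.log(resolveConfig({ region: "us-east-1" }).model);
// prints: amazon.titan-embed-text-v1
```

If the precedence matters for your setup, passing a fully configured `client` avoids relying on how the other options are merged.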

### Supported Models

Popular embedding models available through AWS Bedrock:

**Amazon Titan Models:**
- `amazon.titan-embed-text-v1` - 1536 dimensions, optimized for English text
- `amazon.titan-embed-text-v2:0` - 1024 dimensions, improved multilingual support

**Cohere Models:**
- `cohere.embed-english-v3` - English-optimized embeddings
- `cohere.embed-multilingual-v3` - Multilingual embeddings

**Usage Examples:**

```typescript
// Using Titan v2
const titanV2 = new BedrockEmbeddings({
  region: "us-east-1",
  model: "amazon.titan-embed-text-v2:0"
});

// Using Cohere multilingual
const cohereMulti = new BedrockEmbeddings({
  region: "us-east-1",
  model: "cohere.embed-multilingual-v3"
});

// Test different models
const text = "Machine learning is transforming industries";
const titanEmbedding = await titanV2.embedQuery(text);
const cohereEmbedding = await cohereMulti.embedQuery(text);

console.log("Titan dimensions:", titanEmbedding.length);
console.log("Cohere dimensions:", cohereEmbedding.length);
```
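
Because each model produces vectors of a fixed length, it can help to check an embedding's length before writing it to an index that was created for a specific dimension. The following is a minimal sketch of such a check. `EXPECTED_DIMENSIONS` and `assertDimensions` are hypothetical helpers, and the table only contains the Titan dimensions stated in the list above; the Cohere models are left out because their dimensions are not given here.

```typescript
// Dimensions copied from the model list above (Titan models only).
const EXPECTED_DIMENSIONS = new Map<string, number>([
  ["amazon.titan-embed-text-v1", 1536],
  ["amazon.titan-embed-text-v2:0", 1024],
]);

// Throws if the embedding length does not match the documented dimension.
// Models missing from the table are skipped rather than rejected.
function assertDimensions(model: string, embedding: number[]): void {
  const expected = EXPECTED_DIMENSIONS.get(model);
  if (expected === undefined) return;
  if (embedding.length !== expected) {
    throw new Error(
      `${model}: expected ${expected} dimensions, got ${embedding.length}`
    );
  }
}

// A zero vector stands in for a real embedding in this example.
const fakeTitanV2Embedding = Array.from({ length: 1024 }, () => 0);
assertDimensions("amazon.titan-embed-text-v2:0", fakeTitanV2Embedding);
console.log("dimension check passed");
```

Mixing vectors from different models in one index is a common source of silent retrieval errors, so a check like this is cheap insurance when switching models.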

### Advanced Configuration

#### Custom Client Configuration

```typescript
import { BedrockRuntimeClient } from "@aws-sdk/client-bedrock-runtime";
import { BedrockEmbeddings } from "@langchain/aws";

// Custom client with specific configuration
const customClient = new BedrockRuntimeClient({
  region: "us-west-2",
  credentials: {
    // Placeholders: load real credentials from a secure source
    accessKeyId: "your-access-key",
    secretAccessKey: "your-secret-key"
  },
  maxAttempts: 3,
  requestHandler: {
    connectionTimeout: 5000,
    socketTimeout: 60000
  }
});

const embeddings = new BedrockEmbeddings({
  client: customClient,
  model: "amazon.titan-embed-text-v1"
});
```

#### Client Options

```typescript
// Configure client options without providing custom client
const embeddings = new BedrockEmbeddings({
  region: "us-east-1",
  model: "amazon.titan-embed-text-v1",
  clientOptions: {
    maxAttempts: 5,
    requestHandler: {
      connectionTimeout: 10000,
      socketTimeout: 120000
    }
  }
});
```

### Batch Processing

Automatic batching capabilities for processing large sets of documents efficiently.

**Properties:**

```typescript { .api }
class BedrockEmbeddings {
  /** Maximum batch size for processing documents (default: 512) */
  batchSize: number;
}
```

**Usage Examples:**

```typescript
// Configure batch size
const embeddings = new BedrockEmbeddings({
  region: "us-east-1",
  model: "amazon.titan-embed-text-v1"
});

embeddings.batchSize = 100; // Process 100 documents at a time

// Process large document set
const largeDocumentSet = Array.from({ length: 1000 }, (_, i) =>
  `Document ${i + 1}: This is sample content for document number ${i + 1}.`
);

// Embeddings will be processed in batches automatically
const vectors = await embeddings.embedDocuments(largeDocumentSet);
console.log(`Processed ${vectors.length} documents`);
```
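
To make the batching idea concrete, here is a small sketch of how a document list can be split into batches of a given size. The `chunk` helper is hypothetical and is not part of `@langchain/aws`; whether the library splits input in exactly this way internally is an assumption.

```typescript
// Split an array into consecutive batches of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const sampleDocs = Array.from({ length: 1000 }, (_, i) => `Document ${i + 1}`);
const sampleBatches = chunk(sampleDocs, 100);
console.log(sampleBatches.length); // 10
```

The last batch is simply shorter when the document count is not a multiple of the batch size, so no padding is needed.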

### Text Preprocessing

Automatic text cleaning and preprocessing for optimal embedding performance.

**Features:**
- **Newline Replacement**: Converts newlines to spaces to improve performance
- **Whitespace Normalization**: Handles various whitespace characters
- **Error Handling**: Comprehensive error handling with descriptive messages

**Usage Examples:**

```typescript
// Text with various formatting will be automatically cleaned
const messyText = `
This is a document
with multiple lines
and    extra    spaces.

It has paragraph breaks too.
`;

const embedding = await embeddings.embedQuery(messyText);
// Text is automatically cleaned: "This is a document with multiple lines and extra spaces. It has paragraph breaks too."
```
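
If you want the cleaning step to be explicit and testable rather than relying on the library, you can normalize text yourself before embedding it. The sketch below reproduces the cleaned string shown in the example above. `cleanText` is a hypothetical helper, not the library's internal code, and the exact rules the library applies may differ.

```typescript
// Replace newlines with spaces, collapse whitespace runs, and trim the ends.
function cleanText(text: string): string {
  return text
    .replace(/\r?\n/g, " ") // newline replacement
    .replace(/\s+/g, " ") // whitespace normalization
    .trim();
}

// Same content as the messy template literal above, built line by line.
const messySample = [
  "",
  "This is a document",
  "with multiple lines",
  "and    extra    spaces.",
  "",
  "It has paragraph breaks too.",
  "",
].join("\n");

console.log(cleanText(messySample));
// prints: This is a document with multiple lines and extra spaces. It has paragraph breaks too.
```

Cleaning up front also makes token estimates more accurate, since the text you measure is the text that gets sent.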

### Error Handling

Comprehensive error handling with descriptive error messages and retry logic.

**Usage Examples:**

```typescript
try {
  const embedding = await embeddings.embedQuery("Sample text");
} catch (error) {
  // `error` is `unknown` under strict TypeScript, so narrow it first
  const message = error instanceof Error ? error.message : String(error);
  if (message.includes("credentials")) {
    console.error("Authentication failed - check AWS credentials");
  } else if (message.includes("region")) {
    console.error("Invalid region - check AWS region configuration");
  } else if (message.includes("model")) {
    console.error("Model not available - check model ID and region support");
  } else {
    console.error("Embedding failed:", message);
  }
}

// Handle document batch errors
try {
  const vectors = await embeddings.embedDocuments([
    "Valid document",
    "", // Empty document might cause issues
    "Another valid document"
  ]);
} catch (error) {
  const message = error instanceof Error ? error.message : String(error);
  console.error("Batch embedding failed:", message);
  // Retry with individual documents to identify problematic ones
}
```
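
The "retry with individual documents" step mentioned in the last comment can be sketched as follows. `embedIndividually` is a hypothetical helper that embeds each text separately and reports which inputs failed. It takes the embedding function as a parameter, so with a real instance you would pass something like `(t) => embeddings.embedQuery(t)`. The `fakeEmbed` stub below is only there to make the example self-contained; its behavior (rejecting empty input) is an assumption, not a statement about Bedrock.

```typescript
type EmbedOne = (text: string) => Promise<number[]>;

interface Success { index: number; embedding: number[] }
interface Failure { index: number; text: string; reason: string }

// Embed each text on its own so one bad input cannot fail the whole set.
async function embedIndividually(
  texts: string[],
  embedOne: EmbedOne
): Promise<{ succeeded: Success[]; failed: Failure[] }> {
  const succeeded: Success[] = [];
  const failed: Failure[] = [];
  await Promise.all(
    texts.map(async (text, index) => {
      try {
        succeeded.push({ index, embedding: await embedOne(text) });
      } catch (error) {
        const reason = error instanceof Error ? error.message : String(error);
        failed.push({ index, text, reason });
      }
    })
  );
  // Completion order is not guaranteed, so restore input order.
  succeeded.sort((a, b) => a.index - b.index);
  failed.sort((a, b) => a.index - b.index);
  return { succeeded, failed };
}

// Stub embedder for illustration only: rejects empty input.
const fakeEmbed: EmbedOne = async (text) => {
  if (text.trim() === "") throw new Error("empty input");
  return [text.length];
};

embedIndividually(["Valid document", "", "Another valid document"], fakeEmbed)
  .then(({ failed }) => console.log("Failed indexes:", failed.map((f) => f.index)));
```

Sending one request per document costs more round trips than a batch call, so this is best kept as a fallback path after a batch failure.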

### Vector Search Integration

Common patterns for using embeddings with vector databases and similarity search.

**Usage Examples:**

```typescript
// Create embeddings for vector search
const documents = [
  "Artificial intelligence is transforming healthcare",
  "Machine learning models require large datasets",
  "Deep learning uses neural networks with multiple layers",
  "Natural language processing enables computers to understand text"
];

const docEmbeddings = await embeddings.embedDocuments(documents);

// Minimal in-memory index standing in for a vector database;
// in production, store these records in your vector store of choice
const index = documents.map((text, id) => ({
  id,
  text,
  embedding: docEmbeddings[id]
}));

// Cosine similarity between two vectors of equal length
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Query for similar documents
const query = "How does AI help in medical applications?";
const queryEmbedding = await embeddings.embedQuery(query);

// Rank documents by similarity to the query and keep the top 3
const similarDocs = index
  .map((entry) => ({
    ...entry,
    score: cosineSimilarity(queryEmbedding, entry.embedding)
  }))
  .sort((a, b) => b.score - a.score)
  .slice(0, 3);
```

### Performance Optimization

Best practices for optimal performance with BedrockEmbeddings.

**Tips:**

1. **Batch Processing**: Use `embedDocuments()` for multiple texts instead of individual `embedQuery()` calls
2. **Text Length**: Keep texts under model limits (typically 8192 tokens for Titan models)
3. **Regional Optimization**: Use regions closest to your application for lower latency
4. **Connection Reuse**: Reuse BedrockEmbeddings instances to benefit from connection pooling

```typescript
// Efficient batch processing
const embeddings = new BedrockEmbeddings({
  region: "us-east-1", // Use closest region
  model: "amazon.titan-embed-text-v1"
});

// Process in optimal batch sizes
const batchSize = 50; // Adjust based on your needs
const allDocuments: string[] = [/* your documents */];

const allEmbeddings: number[][] = [];
for (let i = 0; i < allDocuments.length; i += batchSize) {
  const batch = allDocuments.slice(i, i + batchSize);
  const batchEmbeddings = await embeddings.embedDocuments(batch);
  allEmbeddings.push(...batchEmbeddings);
}
```

### Cost Optimization

Understanding and optimizing costs for Bedrock embedding usage.

**Cost Factors:**
- **Input Tokens**: Charged per 1,000 input tokens
- **Model Type**: Different models have different pricing
- **Region**: Pricing may vary by AWS region
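
Since billing is per 1,000 input tokens, a cost estimate is a single multiplication once you have a token count. The sketch below shows the arithmetic. `estimateCost` is a hypothetical helper, and the rate of `0.5` is an arbitrary placeholder rather than a real Bedrock price; look up the current rate for your model and region.

```typescript
// cost = (inputTokens / 1000) * pricePer1KTokens
function estimateCost(inputTokens: number, pricePer1KTokens: number): number {
  if (inputTokens < 0 || pricePer1KTokens < 0) {
    throw new RangeError("inputTokens and pricePer1KTokens must be non-negative");
  }
  return (inputTokens / 1000) * pricePer1KTokens;
}

// 1,500 tokens at a placeholder rate of 0.5 per 1,000 tokens
console.log(estimateCost(1500, 0.5)); // 0.75
```

The result is in whatever currency unit the rate is quoted in, so the same function works for comparing models or regions side by side.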

**Optimization Strategies:**

```typescript
// Monitor token usage
const embeddings = new BedrockEmbeddings({
  region: "us-east-1",
  model: "amazon.titan-embed-text-v1"
});

// Estimate tokens (rough approximation: 1 token ≈ 4 characters)
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

const documents = ["Document 1", "Document 2", "Document 3"];
const totalTokens = documents.reduce((sum, doc) => sum + estimateTokens(doc), 0);
console.log(`Estimated tokens: ${totalTokens}`);

// Consider text truncation for very long documents
const maxTokens = 8000; // Leave buffer for model limits
const truncatedDocs = documents.map(doc => {
  const estimatedTokens = estimateTokens(doc);
  if (estimatedTokens > maxTokens) {
    // Rough truncation - in production, use proper tokenization
    const maxChars = maxTokens * 4;
    return doc.substring(0, maxChars);
  }
  return doc;
});

const vectors = await embeddings.embedDocuments(truncatedDocs);
```
