
# Document Retrieval

Retrieve relevant documents from AWS services for RAG applications and knowledge search workflows, using Amazon Kendra for intelligent search and Amazon Bedrock Knowledge Bases for vector-based retrieval.

## Capabilities

### AmazonKendraRetriever Class

Intelligent document retrieval using the Amazon Kendra search service, with support for semantic search, attribute filtering, and automatic fallback from the Retrieve API to the Query API.

```typescript { .api }
/**
 * Retriever for Amazon Kendra intelligent search service
 */
class AmazonKendraRetriever extends BaseRetriever {
  constructor(args: AmazonKendraRetrieverArgs);

  /** Main method to retrieve relevant documents for a query */
  getRelevantDocuments(query: string): Promise<Document[]>;

  /** Query Kendra with Retrieve API and fallback to Query API */
  queryKendra(query: string, topK: number, attributeFilter?: AttributeFilter): Promise<Document[]>;

  /** Combine title and excerpt into single text */
  combineText(title?: string, excerpt?: string): string;

  /** Clean result text by removing extra whitespace */
  cleanResult(text: string): string;

  /** Extract document attributes from Kendra response */
  getDocAttributes(attributes?: DocumentAttribute[]): Record<string, any>;
}
```

**Usage Examples:**

```typescript
import { AmazonKendraRetriever } from "@langchain/aws";

// Basic initialization
const kendraRetriever = new AmazonKendraRetriever({
  indexId: "your-kendra-index-id",
  topK: 10,
  region: "us-east-1",
  clientOptions: {
    credentials: {
      // Non-null assertions assume both environment variables are set
      accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
      secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!
    }
  }
});

// Retrieve documents
const documents = await kendraRetriever.getRelevantDocuments(
  "How do I configure SSL certificates?"
);

documents.forEach((doc, index) => {
  console.log(`Document ${index + 1}:`);
  console.log(`Content: ${doc.pageContent}`);
  console.log(`Source: ${doc.metadata.source}`);
  console.log(`Title: ${doc.metadata.title}`);
  console.log(`---`);
});
```
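
The `queryKendra` method is described above as calling the Retrieve API and falling back to the Query API. The general try-primary-then-fallback shape can be sketched with plain functions. `Passage`, `Fetcher`, and `retrieveWithFallback` are hypothetical names used only for illustration, and the fallback condition is an assumption: this sketch falls back when the primary call throws or returns nothing, which may not match the library's exact behavior.

```typescript
// Hypothetical types for illustration; not part of @langchain/aws.
type Passage = { title?: string; excerpt: string };
type Fetcher = (query: string, topK: number) => Promise<Passage[]>;

// Try the primary fetcher first; use the fallback when it throws or returns nothing.
async function retrieveWithFallback(
  primary: Fetcher,
  fallback: Fetcher,
  query: string,
  topK: number
): Promise<Passage[]> {
  try {
    const results = await primary(query, topK);
    if (results.length > 0) {
      return results.slice(0, topK);
    }
  } catch {
    // Assumption: any primary failure is treated as a reason to fall back.
  }
  const results = await fallback(query, topK);
  return results.slice(0, topK);
}

// Stub fetchers standing in for the two Kendra APIs.
const emptyPrimary: Fetcher = async () => [];
const stubFallback: Fetcher = async (query) => [
  { title: "Guide", excerpt: `Result for ${query}` },
];
```

Passing the two fetchers as arguments keeps the control flow testable without AWS credentials; in a real application they would wrap the corresponding service calls.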

### AmazonKnowledgeBaseRetriever Class

Vector-based document retrieval using Amazon Bedrock Knowledge Bases for RAG workflows, with support for hybrid search and advanced filtering.

```typescript { .api }
/**
 * Retriever for Amazon Bedrock Knowledge Bases RAG workflow
 */
class AmazonKnowledgeBaseRetriever extends BaseRetriever {
  constructor(args: AmazonKnowledgeBaseRetrieverArgs);

  /** Main method to retrieve relevant documents for a query */
  getRelevantDocuments(query: string): Promise<Document[]>;

  /** Query knowledge base directly with advanced parameters */
  queryKnowledgeBase(query: string, topK: number, filter?: RetrievalFilter, overrideSearchType?: SearchType): Promise<Document[]>;

  /** Clean result text by normalizing whitespace and removing ellipses */
  cleanResult(text: string): string;
}
```

**Usage Examples:**

```typescript
import { AmazonKnowledgeBaseRetriever } from "@langchain/aws";

// Basic initialization
const kbRetriever = new AmazonKnowledgeBaseRetriever({
  knowledgeBaseId: "your-knowledge-base-id",
  topK: 5,
  region: "us-east-1",
  clientOptions: {
    credentials: {
      // Non-null assertions assume both environment variables are set
      accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
      secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!
    }
  }
});

// Retrieve documents with hybrid search
const hybridRetriever = new AmazonKnowledgeBaseRetriever({
  knowledgeBaseId: "your-knowledge-base-id",
  topK: 8,
  region: "us-east-1",
  overrideSearchType: "HYBRID" // or "SEMANTIC"
});

const documents = await hybridRetriever.getRelevantDocuments(
  "What are the security best practices for cloud deployment?"
);
```
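
Both retriever classes list a `cleanResult` helper (whitespace normalization, plus ellipsis removal for Knowledge Bases), and the Kendra retriever lists `combineText` for joining a title and an excerpt. As a rough sketch of what such helpers could look like, the standalone functions below mimic that behavior. The names `cleanResultSketch` and `combineTextSketch`, and the "Document Title:" and "Document Excerpt:" labels, are assumptions made for illustration; the library's own implementations may differ.

```typescript
// Illustrative stand-ins; the library's own implementations may differ.

// Drop "..." truncation markers, then collapse runs of whitespace to single spaces.
function cleanResultSketch(text: string): string {
  return text.replace(/\.\.\./g, "").replace(/\s+/g, " ").trim();
}

// Join an optional title and excerpt into one text block, skipping missing parts.
function combineTextSketch(title?: string, excerpt?: string): string {
  const parts: string[] = [];
  if (title) parts.push(`Document Title: ${title}`);
  if (excerpt) parts.push(`Document Excerpt: ${excerpt}`);
  return parts.join("\n");
}

const cleaned = cleanResultSketch("  Hello   world...\n next ");
const combined = combineTextSketch("Configuration Guide", "Brief excerpt");
```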

### Configuration Interfaces

#### AmazonKendraRetriever Configuration

```typescript { .api }
interface AmazonKendraRetrieverArgs {
  /** Amazon Kendra index identifier */
  indexId: string;

  /** Maximum number of documents to retrieve */
  topK: number;

  /** AWS region where the Kendra index is located */
  region: string;

  /** Optional attribute filter for refined search */
  attributeFilter?: AttributeFilter;

  /** Optional Kendra client configuration */
  clientOptions?: KendraClientConfig;
}
```

#### AmazonKnowledgeBaseRetriever Configuration

```typescript { .api }
interface AmazonKnowledgeBaseRetrieverArgs {
  /** Amazon Bedrock Knowledge Base identifier */
  knowledgeBaseId: string;

  /** Maximum number of documents to retrieve */
  topK: number;

  /** AWS region where the Knowledge Base is located */
  region: string;

  /** Optional Bedrock Agent Runtime client configuration */
  clientOptions?: BedrockAgentRuntimeClientConfig;

  /** Optional retrieval filter for refined search */
  filter?: RetrievalFilter;

  /** Override search type (HYBRID or SEMANTIC) */
  overrideSearchType?: SearchType;
}
```

### Advanced Filtering

#### Kendra Attribute Filtering

Amazon Kendra supports attribute-based filtering: comparison operators on document attributes can be nested under `AndAllFilters`, `OrAllFilters`, and `NotFilter`.

```typescript { .api }
type AttributeFilter = {
  AndAllFilters?: AttributeFilter[];
  OrAllFilters?: AttributeFilter[];
  NotFilter?: AttributeFilter;
  EqualsTo?: DocumentAttribute;
  ContainsAll?: DocumentAttribute;
  ContainsAny?: DocumentAttribute;
  GreaterThan?: DocumentAttribute;
  GreaterThanOrEquals?: DocumentAttribute;
  LessThan?: DocumentAttribute;
  LessThanOrEquals?: DocumentAttribute;
};
```

**Usage Examples:**

```typescript
import { AmazonKendraRetriever } from "@langchain/aws";

// Filter by document source and date
const kendraWithFilters = new AmazonKendraRetriever({
  indexId: "your-index-id",
  topK: 10,
  region: "us-east-1",
  attributeFilter: {
    AndAllFilters: [
      {
        EqualsTo: {
          Key: "_source_uri",
          Value: { StringValue: "https://docs.example.com" }
        }
      },
      {
        GreaterThanOrEquals: {
          Key: "_last_updated_at",
          Value: { DateValue: new Date("2024-01-01") }
        }
      }
    ]
  }
});

// Filter by category with OR logic
const categoryFilter = new AmazonKendraRetriever({
  indexId: "your-index-id",
  topK: 15,
  region: "us-east-1",
  attributeFilter: {
    OrAllFilters: [
      {
        EqualsTo: {
          Key: "category",
          Value: { StringValue: "documentation" }
        }
      },
      {
        EqualsTo: {
          Key: "category",
          Value: { StringValue: "tutorial" }
        }
      }
    ]
  }
});

const docs = await categoryFilter.getRelevantDocuments("API authentication");
```
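
Nested filter literals become verbose once AND and OR conditions are combined. One way to keep them readable is a handful of small builder functions, sketched below. The helpers `equalsString`, `updatedSince`, `allOf`, and `anyOf` are hypothetical and not part of any library, and the local types reproduce only the subset of the filter shape needed to keep the example self-contained.

```typescript
// Local subset of the filter shape shown above, so the sketch is self-contained.
type AttributeValue = { StringValue?: string; DateValue?: Date; LongValue?: number };
type DocumentAttribute = { Key: string; Value: AttributeValue };
type AttributeFilter = {
  AndAllFilters?: AttributeFilter[];
  OrAllFilters?: AttributeFilter[];
  EqualsTo?: DocumentAttribute;
  GreaterThanOrEquals?: DocumentAttribute;
};

// Hypothetical helpers (not part of any library).
const equalsString = (key: string, value: string): AttributeFilter => ({
  EqualsTo: { Key: key, Value: { StringValue: value } },
});

const updatedSince = (date: Date): AttributeFilter => ({
  GreaterThanOrEquals: { Key: "_last_updated_at", Value: { DateValue: date } },
});

const allOf = (...filters: AttributeFilter[]): AttributeFilter => ({ AndAllFilters: filters });
const anyOf = (...filters: AttributeFilter[]): AttributeFilter => ({ OrAllFilters: filters });

// Documents in either category, updated on or after 2024-01-01.
const recentDocsOrTutorials = allOf(
  anyOf(equalsString("category", "documentation"), equalsString("category", "tutorial")),
  updatedSince(new Date("2024-01-01"))
);
```

The resulting object has the same nested structure as the literals above and could be passed as `attributeFilter`.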

#### Knowledge Base Filtering

Amazon Bedrock Knowledge Bases support metadata-based filtering for targeted retrieval.

```typescript { .api }
type RetrievalFilter = {
  equals?: {
    key: string;
    value: string | number | boolean;
  };
  notEquals?: {
    key: string;
    value: string | number | boolean;
  };
  lessThan?: {
    key: string;
    value: number;
  };
  lessThanOrEquals?: {
    key: string;
    value: number;
  };
  greaterThan?: {
    key: string;
    value: number;
  };
  greaterThanOrEquals?: {
    key: string;
    value: number;
  };
  in?: {
    key: string;
    value: (string | number | boolean)[];
  };
  notIn?: {
    key: string;
    value: (string | number | boolean)[];
  };
  startsWith?: {
    key: string;
    value: string;
  };
  listContains?: {
    key: string;
    value: string | number | boolean;
  };
  stringContains?: {
    key: string;
    value: string;
  };
  andAll?: RetrievalFilter[];
  orAll?: RetrievalFilter[];
};
```

**Usage Examples:**

```typescript
import { AmazonKnowledgeBaseRetriever } from "@langchain/aws";

// Filter by document type and recency
const kbWithFilters = new AmazonKnowledgeBaseRetriever({
  knowledgeBaseId: "your-kb-id",
  topK: 10,
  region: "us-east-1",
  filter: {
    andAll: [
      {
        equals: {
          key: "document_type",
          value: "user_guide"
        }
      },
      {
        // publish_date is stored as a numeric YYYYMMDD value in this example
        greaterThan: {
          key: "publish_date",
          value: 20240101
        }
      }
    ]
  }
});

// Filter by multiple categories
const multiCategoryKB = new AmazonKnowledgeBaseRetriever({
  knowledgeBaseId: "your-kb-id",
  topK: 8,
  region: "us-east-1",
  filter: {
    in: {
      key: "category",
      value: ["api", "security", "deployment"]
    }
  }
});

const securityDocs = await multiCategoryKB.getRelevantDocuments(
  "How to implement OAuth2 authentication?"
);
```
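
To make the operator semantics more concrete, the sketch below evaluates a subset of the filter operators against a plain metadata record. This is a local model intended for illustration and unit tests only: the actual filtering is performed by the service, whose handling of edge cases (missing keys, type mismatches) may differ. `FilterSketch`, `Metadata`, and `matchesFilter` are hypothetical names.

```typescript
type Scalar = string | number | boolean;
type Metadata = Record<string, Scalar | Scalar[]>;

// Subset of the RetrievalFilter shape shown above.
type FilterSketch = {
  equals?: { key: string; value: Scalar };
  greaterThan?: { key: string; value: number };
  in?: { key: string; value: Scalar[] };
  startsWith?: { key: string; value: string };
  andAll?: FilterSketch[];
  orAll?: FilterSketch[];
};

// Returns true when the metadata satisfies every operator present in the filter.
function matchesFilter(meta: Metadata, filter: FilterSketch): boolean {
  const checks: boolean[] = [];
  if (filter.equals) {
    checks.push(meta[filter.equals.key] === filter.equals.value);
  }
  if (filter.greaterThan) {
    const actual = meta[filter.greaterThan.key];
    checks.push(typeof actual === "number" && actual > filter.greaterThan.value);
  }
  if (filter.in) {
    const actual = meta[filter.in.key];
    checks.push(!Array.isArray(actual) && filter.in.value.includes(actual));
  }
  if (filter.startsWith) {
    const actual = meta[filter.startsWith.key];
    checks.push(typeof actual === "string" && actual.startsWith(filter.startsWith.value));
  }
  if (filter.andAll) {
    checks.push(filter.andAll.every((f) => matchesFilter(meta, f)));
  }
  if (filter.orAll) {
    checks.push(filter.orAll.some((f) => matchesFilter(meta, f)));
  }
  return checks.every(Boolean);
}

const sampleMeta: Metadata = { document_type: "user_guide", publish_date: 20240315, category: "api" };
const recentGuides: FilterSketch = {
  andAll: [
    { equals: { key: "document_type", value: "user_guide" } },
    { greaterThan: { key: "publish_date", value: 20240101 } },
  ],
};
```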

### Search Types and Strategies

#### Kendra Search Capabilities

Amazon Kendra combines keyword matching with semantic understanding automatically; the retriever exposes no search-type setting for it.

**Features:**

- **Intelligent Ranking**: Combines relevance scoring with document authority
- **Natural Language Queries**: Understands questions in natural language
- **Answer Extraction**: Can provide direct answers when available
- **Faceted Search**: Supports filtering by document attributes

#### Knowledge Base Search Types

Amazon Bedrock Knowledge Bases support two search strategies, selected with `overrideSearchType`.

```typescript { .api }
type SearchType = "HYBRID" | "SEMANTIC";
```

**Usage Examples:**

```typescript
// Semantic search - pure vector similarity
const semanticRetriever = new AmazonKnowledgeBaseRetriever({
  knowledgeBaseId: "your-kb-id",
  topK: 10,
  region: "us-east-1",
  overrideSearchType: "SEMANTIC"
});

// Hybrid search - combines vector similarity with keyword matching
const hybridRetriever = new AmazonKnowledgeBaseRetriever({
  knowledgeBaseId: "your-kb-id",
  topK: 10,
  region: "us-east-1",
  overrideSearchType: "HYBRID"
});

// Compare search strategies
const query = "microservices architecture patterns";
const semanticResults = await semanticRetriever.getRelevantDocuments(query);
const hybridResults = await hybridRetriever.getRelevantDocuments(query);

console.log("Semantic results:", semanticResults.length);
console.log("Hybrid results:", hybridResults.length);
```
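
Result counts alone say little when both retrievers are capped by the same `topK`. A more informative comparison looks at which sources each strategy returned. The sketch below assumes each document carries a `metadata.source` value, as in the document structures described in this page; `DocLike` and `compareBySource` are hypothetical names.

```typescript
// Minimal document shape for this sketch; real results are LangChain Document objects.
type DocLike = { pageContent: string; metadata: Record<string, unknown> };

// Group sources into those returned by both result sets and those unique to each.
function compareBySource(first: DocLike[], second: DocLike[]) {
  const sourcesOf = (docs: DocLike[]): Set<string> =>
    new Set(docs.map((d) => String(d.metadata["source"] ?? "unknown")));
  const a = sourcesOf(first);
  const b = sourcesOf(second);
  return {
    shared: Array.from(a).filter((s) => b.has(s)),
    onlyFirst: Array.from(a).filter((s) => !b.has(s)),
    onlySecond: Array.from(b).filter((s) => !a.has(s)),
  };
}

// Example with hand-written results standing in for the two retrievers' output.
const semanticSample: DocLike[] = [
  { pageContent: "a", metadata: { source: "s3://bucket/a.pdf" } },
  { pageContent: "b", metadata: { source: "s3://bucket/b.pdf" } },
];
const hybridSample: DocLike[] = [
  { pageContent: "b", metadata: { source: "s3://bucket/b.pdf" } },
  { pageContent: "c", metadata: { source: "s3://bucket/c.pdf" } },
];
const comparison = compareBySource(semanticSample, hybridSample);
```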

### Document Metadata and Content

Both retrievers return rich metadata alongside document content for enhanced RAG applications.

#### Kendra Document Structure

```typescript
// Example Kendra document metadata
{
  pageContent: "Document content text...",
  metadata: {
    source: "https://docs.example.com/guide.html",
    title: "Configuration Guide",
    excerpt: "Brief excerpt from the document...",
    document_attributes: {
      category: "documentation",
      last_updated: "2024-03-15",
      author: "Tech Writing Team"
    }
  }
}
```

#### Knowledge Base Document Structure

```typescript
// Example Knowledge Base document metadata
{
  pageContent: "Document content text...",
  metadata: {
    source: "s3://my-bucket/docs/deployment-guide.pdf",
    location: {
      s3Location: {
        uri: "s3://my-bucket/docs/deployment-guide.pdf"
      },
      type: "S3"
    },
    score: 0.85,
    metadata: {
      document_type: "user_guide",
      version: "2.1",
      publish_date: 20240315
    }
  }
}
```
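
Because the two shapes share `metadata.source` but differ elsewhere (Kendra adds `title`, Knowledge Bases add `score`), code that displays citations needs to tolerate missing fields. A minimal sketch, assuming only the fields shown in the examples above; `RetrievedDoc` and `formatCitation` are hypothetical names.

```typescript
// Only the metadata fields used for citations; every field is optional on purpose.
type RetrievedDoc = {
  pageContent: string;
  metadata: { source?: string; title?: string; score?: number };
};

// Prefer the title, fall back to the source, and append the score when present.
function formatCitation(doc: RetrievedDoc, index: number): string {
  const label = doc.metadata.title ?? doc.metadata.source ?? "unknown source";
  const score =
    typeof doc.metadata.score === "number" ? ` (score ${doc.metadata.score.toFixed(2)})` : "";
  return `[${index + 1}] ${label}${score}`;
}

const kendraCitation = formatCitation(
  {
    pageContent: "Document content text...",
    metadata: { source: "https://docs.example.com/guide.html", title: "Configuration Guide" },
  },
  0
);
const kbCitation = formatCitation(
  {
    pageContent: "Document content text...",
    metadata: { source: "s3://my-bucket/docs/deployment-guide.pdf", score: 0.85 },
  },
  1
);
```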

### RAG Integration Patterns

Common patterns for integrating retrievers with RAG (Retrieval-Augmented Generation) workflows.

**Usage Examples:**

```typescript
import { AmazonKnowledgeBaseRetriever, ChatBedrockConverse } from "@langchain/aws";
import { HumanMessage } from "@langchain/core/messages";

// Create RAG chain with retriever
const retriever = new AmazonKnowledgeBaseRetriever({
  knowledgeBaseId: "your-kb-id",
  topK: 5,
  region: "us-east-1"
});

const chatModel = new ChatBedrockConverse({
  region: "us-east-1",
  model: "anthropic.claude-3-5-sonnet-20240620-v1:0"
});

async function answerWithRAG(question: string): Promise<string> {
  // Retrieve relevant documents
  const relevantDocs = await retriever.getRelevantDocuments(question);

  // Format context from retrieved documents
  const context = relevantDocs
    .map((doc, index) => `Document ${index + 1}: ${doc.pageContent}`)
    .join("\n\n");

  // Generate answer using retrieved context
  const prompt = `Based on the following context, answer the question.

Context:
${context}

Question: ${question}

Answer:`;

  const response = await chatModel.invoke([
    new HumanMessage(prompt)
  ]);

  return response.content as string;
}

// Use the RAG function
const answer = await answerWithRAG(
  "What are the recommended security practices for API deployment?"
);
console.log(answer);
```
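
The example above concatenates every retrieved document into the prompt. With larger `topK` values or long passages, it can help to bound the context size. The sketch below keeps documents in ranked order and stops before a character budget would be exceeded. The character budget is a simplifying assumption (a token-based budget would track model limits more closely), and `ContextDoc` and `buildContext` are hypothetical names.

```typescript
type ContextDoc = { pageContent: string };

// Append numbered documents in ranked order until the next one would exceed maxChars.
function buildContext(docs: ContextDoc[], maxChars: number): string {
  const sections: string[] = [];
  let used = 0;
  for (let i = 0; i < docs.length; i++) {
    const section = `Document ${i + 1}: ${docs[i].pageContent}`;
    // Every section after the first also costs 2 characters for the "\n\n" separator.
    const cost = section.length + (sections.length > 0 ? 2 : 0);
    if (used + cost > maxChars) break;
    sections.push(section);
    used += cost;
  }
  return sections.join("\n\n");
}

const sampleDocs: ContextDoc[] = [
  { pageContent: "aaaa" },
  { pageContent: "bbbb" },
  { pageContent: "cccc" },
];
const boundedContext = buildContext(sampleDocs, 34);
```

Stopping at the first document that does not fit, rather than skipping it and trying later ones, preserves the retriever's ranking in the prompt.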

### Error Handling and Reliability

Error handling patterns for production deployments: report configuration and permission errors clearly, and retry only transient failures.

**Usage Examples:**

```typescript
import { AmazonKendraRetriever } from "@langchain/aws";

async function robustRetrieval(query: string) {
  const retriever = new AmazonKendraRetriever({
    indexId: "your-index-id",
    topK: 10,
    region: "us-east-1"
  });

  try {
    const documents = await retriever.getRelevantDocuments(query);

    if (documents.length === 0) {
      console.log("No relevant documents found, try rephrasing your query");
      return [];
    }

    return documents;
  } catch (error) {
    // Catch variables are `unknown` under strict TypeScript, so narrow before reading fields
    const name = error instanceof Error ? error.name : "";
    if (name === "ResourceNotFoundException") {
      console.error("Kendra index not found - check index ID and region");
    } else if (name === "AccessDeniedException") {
      console.error("Access denied - check IAM permissions for Kendra");
    } else if (name === "ThrottlingException") {
      console.error("Rate limited - implement exponential backoff");
      // Implement retry logic here
    } else {
      console.error("Retrieval failed:", error instanceof Error ? error.message : error);
    }
    return [];
  }
}

// Implement retry logic for transient failures
async function retrieverWithRetry(retriever: AmazonKendraRetriever, query: string, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await retriever.getRelevantDocuments(query);
    } catch (error) {
      if (attempt === maxRetries) throw error;

      const name = error instanceof Error ? error.name : "";
      if (name === "ThrottlingException" || name === "ServiceException") {
        const delay = Math.pow(2, attempt) * 1000; // Exponential backoff: 2s, 4s, ...
        console.log(`Retry attempt ${attempt} after ${delay}ms`);
        await new Promise(resolve => setTimeout(resolve, delay));
      } else {
        throw error; // Don't retry for non-transient errors
      }
    }
  }
  // Only reached when maxRetries is less than 1
  throw new Error("maxRetries must be at least 1");
}
```
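
The retry loop above waits a fixed `2^attempt` seconds. When many clients are throttled at once, a common refinement is to cap the delay and add random jitter so that retries do not arrive in lockstep. The helper below is a sketch of that idea, not something the library provides: `backoffDelayMs` and its parameters are hypothetical, and the random source is injectable so the function can be tested deterministically.

```typescript
// Delay before retry number `attempt` (1-based): exponential growth, capped,
// then scaled by a random factor in [0, 1) ("full jitter").
function backoffDelayMs(
  attempt: number,
  baseMs: number = 1000,
  capMs: number = 30000,
  random: () => number = Math.random
): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}

// With a fixed random factor of 0.5 the schedule is half of each ceiling.
const half = () => 0.5;
const schedule = [1, 2, 3, 10].map((attempt) => backoffDelayMs(attempt, 1000, 30000, half));
```

In the retry function above, the line computing `delay` could call such a helper instead of using `Math.pow(2, attempt) * 1000` directly.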

### Performance Optimization

Best practices for optimizing retriever performance in production applications.

**Strategies:**

1. **Appropriate topK Values**: Balance between relevance and performance
2. **Effective Filtering**: Use attribute filters to reduce search space
3. **Query Optimization**: Structure queries for better semantic understanding
4. **Caching**: Implement application-level caching for frequent queries

```typescript
import { AmazonKnowledgeBaseRetriever } from "@langchain/aws";
import type { Document } from "@langchain/core/documents";

// Optimized retriever configuration
const optimizedRetriever = new AmazonKnowledgeBaseRetriever({
  knowledgeBaseId: "your-kb-id",
  topK: 5, // Smaller topK for faster responses
  region: "us-east-1", // Use closest region
  overrideSearchType: "HYBRID", // Often better than pure semantic
  filter: {
    // Pre-filter to relevant document types
    in: {
      key: "document_type",
      value: ["manual", "guide", "faq"]
    }
  }
});

// Simple in-memory cache for frequent queries
const CACHE_TTL = 5 * 60 * 1000; // 5 minutes
const queryCache = new Map<string, { documents: Document[]; expiresAt: number }>();

async function cachedRetrieval(query: string): Promise<Document[]> {
  const cacheKey = query.toLowerCase().trim();

  // Serve from cache only while the entry is still within its TTL
  const cached = queryCache.get(cacheKey);
  if (cached && cached.expiresAt > Date.now()) {
    return cached.documents;
  }

  const documents = await optimizedRetriever.getRelevantDocuments(query);

  // Expired entries are overwritten here; add periodic eviction if the key space is large
  queryCache.set(cacheKey, { documents, expiresAt: Date.now() + CACHE_TTL });

  return documents;
}
```