
# Vector Store

Managed semantic search and document retrieval using Google's vector store infrastructure. Provides corpus and document management, similarity search, and integration with Google's AQA (Attributed Question Answering) service.

## Capabilities

### GoogleVectorStore

Primary vector store interface that extends LangChain's `VectorStore` to provide managed storage and retrieval using Google's semantic retriever service.

```python { .api }
class GoogleVectorStore:
    def __init__(
        self,
        *,
        corpus_id: str,
        document_id: Optional[str] = None
    )
```

**Parameters:**

- `corpus_id` (str): Google corpus identifier for the vector store
- `document_id` (Optional[str]): Specific document within the corpus (optional)

### Core Methods

#### Adding Documents

```python { .api }
def add_texts(
    self,
    texts: Iterable[str],
    metadatas: Optional[List[Dict[str, Any]]] = None,
    *,
    document_id: Optional[str] = None,
    **kwargs: Any
) -> List[str]
```

Add texts to the vector store as searchable chunks.

**Parameters:**

- `texts` (Iterable[str]): Texts to add to the store
- `metadatas` (Optional[List[Dict]]): Metadata for each text chunk
- `document_id` (Optional[str]): Target document ID (required if store not initialized with document_id)
- `**kwargs`: Additional parameters

**Returns:** List of chunk IDs for the added texts
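Since `metadatas` supplies metadata for each text chunk, it can help to check that the two inputs line up before calling `add_texts`. The sketch below is a hypothetical pre-flight helper (`pair_texts_with_metadata` is not part of the library). It uses only the standard library and assumes metadata entries are matched to texts by position, one dict per text.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple


def pair_texts_with_metadata(
    texts: Iterable[str],
    metadatas: Optional[List[Dict[str, Any]]] = None,
) -> List[Tuple[str, Dict[str, Any]]]:
    """Hypothetical helper: pair each text with its metadata by position."""
    text_list = list(texts)  # materialize so generators can be measured
    if metadatas is None:
        metadatas = [{} for _ in text_list]
    if len(metadatas) != len(text_list):
        raise ValueError(
            f"Got {len(text_list)} texts but {len(metadatas)} metadata entries"
        )
    return list(zip(text_list, metadatas))


pairs = pair_texts_with_metadata(
    ["Supervised learning uses labeled data."],
    [{"topic": "machine-learning"}],
)
print(pairs)
```

Failing early on a length mismatch avoids a partially populated document if the service rejects the request midway.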

#### Similarity Search

```python { .api }
def similarity_search(
    self,
    query: str,
    k: int = 4,
    filter: Optional[Dict[str, Any]] = None,
    **kwargs: Any
) -> List[Document]
```

Perform semantic search to find similar documents.

**Parameters:**

- `query` (str): Search query text
- `k` (int): Number of results to return (default: 4)
- `filter` (Optional[Dict]): Metadata filters for search
- `**kwargs`: Additional search parameters

**Returns:** List of Document objects with relevant content

```python { .api }
def similarity_search_with_score(
    self,
    query: str,
    k: int = 4,
    filter: Optional[Dict[str, Any]] = None,
    **kwargs: Any
) -> List[Tuple[Document, float]]
```

Perform similarity search with relevance scores.

**Parameters:**

- `query` (str): Search query text
- `k` (int): Number of results to return (default: 4)
- `filter` (Optional[Dict]): Metadata filters for search
- `**kwargs`: Additional search parameters

**Returns:** List of tuples containing (Document, relevance_score)
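The `(Document, relevance_score)` tuples can be post-processed with ordinary Python. The following is a minimal sketch that assumes higher scores mean closer matches (worth verifying against real results). It uses a stand-in `FakeDocument` dataclass in place of LangChain's `Document` so that it runs without credentials, and `keep_above` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class FakeDocument:
    """Stand-in for langchain_core.documents.Document (illustration only)."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def keep_above(
    results: List[Tuple[FakeDocument, float]], min_score: float
) -> List[Tuple[FakeDocument, float]]:
    """Keep results scoring at least min_score, best first.

    Assumes a higher score means a more relevant result.
    """
    kept = [(doc, score) for doc, score in results if score >= min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Shaped like the return value of similarity_search_with_score
sample = [
    (FakeDocument("Deep learning uses neural networks."), 0.62),
    (FakeDocument("SQL is used for database operations."), 0.18),
    (FakeDocument("Neural networks have multiple layers."), 0.81),
]

for doc, score in keep_above(sample, min_score=0.5):
    print(f"{score:.2f} {doc.page_content}")
```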

### Properties

```python { .api }
@property
def name(self) -> str
```

Returns the full name/path of the Google entity.

```python { .api }
@property
def corpus_id(self) -> str
```

Returns the corpus ID managed by this vector store.

```python { .api }
@property
def document_id(self) -> Optional[str]
```

Returns the document ID managed by this vector store (if any).
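To illustrate what a "full name/path" might look like, the sketch below builds names in a `corpora/{corpus_id}/documents/{document_id}` layout. That layout is an assumption based on the resource naming used by Google's semantic retriever API and is not confirmed by this page. `entity_name` is a hypothetical helper, not the library's implementation of `name`.

```python
from typing import Optional


def entity_name(corpus_id: str, document_id: Optional[str] = None) -> str:
    """Hypothetical: compose a resource path from the two IDs.

    The 'corpora/.../documents/...' layout is an assumption.
    """
    name = f"corpora/{corpus_id}"
    if document_id is not None:
        name += f"/documents/{document_id}"
    return name


print(entity_name("my-ai-knowledge-base"))
print(entity_name("my-ai-knowledge-base", "ml-basics"))
```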

#### Document Management

```python { .api }
def delete(
    self,
    ids: Optional[List[str]] = None,
    **kwargs: Any
) -> Optional[bool]
```

Delete documents or chunks from the vector store.

**Parameters:**

- `ids` (Optional[List[str]]): Specific chunk IDs to delete
- `**kwargs`: Additional parameters

**Returns:** Success status

```python { .api }
async def adelete(
    self,
    ids: Optional[List[str]] = None,
    **kwargs: Any
) -> Optional[bool]
```

Async version of delete().

### Class Methods

#### Corpus Creation

```python { .api }
@classmethod
def create_corpus(
    cls,
    corpus_id: Optional[str] = None,
    display_name: Optional[str] = None
) -> "GoogleVectorStore"
```

Create a new corpus on Google's servers.

**Parameters:**

- `corpus_id` (Optional[str]): Desired corpus ID (auto-generated if None)
- `display_name` (Optional[str]): Human-readable name for the corpus

**Returns:** GoogleVectorStore instance for the new corpus

#### Document Creation

```python { .api }
@classmethod
def create_document(
    cls,
    corpus_id: str,
    document_id: Optional[str] = None,
    display_name: Optional[str] = None,
    metadata: Optional[Dict[str, Any]] = None
) -> "GoogleVectorStore"
```

Create a new document within an existing corpus.

**Parameters:**

- `corpus_id` (str): Target corpus ID
- `document_id` (Optional[str]): Desired document ID (auto-generated if None)
- `display_name` (Optional[str]): Human-readable name for the document
- `metadata` (Optional[Dict]): Custom metadata for the document

**Returns:** GoogleVectorStore instance for the new document

#### From Texts

```python { .api }
@classmethod
def from_texts(
    cls,
    texts: List[str],
    embedding: Optional[Embeddings] = None,
    metadatas: Optional[List[Dict]] = None,
    *,
    corpus_id: Optional[str] = None,
    document_id: Optional[str] = None,
    **kwargs: Any
) -> "GoogleVectorStore"
```

Create a vector store and populate it with texts in one operation.

**Parameters:**

- `texts` (List[str]): Initial texts to add
- `embedding` (Optional[Embeddings]): Embedding model (server-side embedding is used if None)
- `metadatas` (Optional[List[Dict]]): Metadata for each text
- `corpus_id` (Optional[str]): Target corpus (created if it doesn't exist)
- `document_id` (Optional[str]): Target document (created if it doesn't exist)
- `**kwargs`: Additional parameters

**Returns:** GoogleVectorStore instance with populated content

### Integration Methods

#### AQA Integration

```python { .api }
def as_aqa(self, **kwargs: Any) -> Runnable[str, AqaOutput]
```

Create a runnable that performs attributed question answering using the vector store content.

**Parameters:**

- `**kwargs`: Additional AQA configuration parameters

**Returns:** Runnable that takes a query string and returns AqaOutput with attributed answers

#### Retriever Integration

```python { .api }
def as_retriever(self, **kwargs: Any) -> VectorStoreRetriever
```

Convert to a LangChain retriever for use in chains.

**Parameters:**

- `**kwargs`: Retriever configuration parameters

**Returns:** VectorStoreRetriever instance

## Usage Examples

### Creating and Populating a Corpus

```python
from langchain_google_genai import GoogleVectorStore

# Create a new corpus
vector_store = GoogleVectorStore.create_corpus(
    corpus_id="my-ai-knowledge-base",
    display_name="AI Knowledge Base"
)

print(f"Created corpus: {vector_store.corpus_id}")

# Texts to add to the corpus
texts = [
    "Machine learning is a subset of artificial intelligence.",
    "Deep learning uses neural networks with multiple layers.",
    "Natural language processing focuses on understanding text.",
    "Computer vision enables machines to interpret images."
]

# add_texts needs a target document when the store is corpus-level
# (see the `document_id` parameter of add_texts), so create one first
doc_store = GoogleVectorStore.create_document(
    corpus_id=vector_store.corpus_id,
    document_id="ai-overview",
    display_name="AI Overview"
)
chunk_ids = doc_store.add_texts(texts)
print(f"Added {len(chunk_ids)} chunks")
```

### Document-Level Organization

```python
# Create a document within a corpus
doc_store = GoogleVectorStore.create_document(
    corpus_id="my-ai-knowledge-base",
    document_id="ml-basics",
    display_name="Machine Learning Basics",
    metadata={"topic": "machine-learning", "level": "beginner"}
)

# Add content to the specific document
ml_texts = [
    "Supervised learning uses labeled data for training.",
    "Unsupervised learning finds patterns in unlabeled data.",
    "Reinforcement learning learns through trial and error."
]

doc_store.add_texts(ml_texts)
```

### Similarity Search

```python
# Connect to existing corpus
vector_store = GoogleVectorStore(corpus_id="my-ai-knowledge-base")

# Perform similarity search
query = "What is deep learning?"
results = vector_store.similarity_search(query, k=3)

for i, doc in enumerate(results, 1):
    print(f"Result {i}: {doc.page_content}")
    print(f"Metadata: {doc.metadata}")
    print()
```

### Search with Scores

```python
# Get similarity scores with results
results_with_scores = vector_store.similarity_search_with_score(
    "Explain neural networks",
    k=5
)

for doc, score in results_with_scores:
    print(f"Score: {score:.3f} - {doc.page_content}")
```

### From Texts Helper

```python
from langchain_google_genai import GoogleVectorStore

# Create vector store from texts in one step
documents = [
    "Python is a versatile programming language.",
    "JavaScript is essential for web development.",
    "SQL is used for database operations.",
    "Docker helps with application containerization."
]

# One metadata dict per text, in the same order
metadata = [
    {"category": "programming", "language": "python"},
    {"category": "programming", "language": "javascript"},
    {"category": "database", "language": "sql"},
    {"category": "devops", "tool": "docker"}
]

# Create and populate vector store
vector_store = GoogleVectorStore.from_texts(
    texts=documents,
    metadatas=metadata,
    corpus_id="programming-knowledge",
    document_id="languages-and-tools"
)

# Search with metadata filtering
results = vector_store.similarity_search(
    "What programming languages are available?",
    filter={"category": "programming"}
)
```

### AQA Integration

```python
# Create AQA runnable from vector store
aqa = vector_store.as_aqa()

# Perform attributed question answering
query = "What are the main types of machine learning?"
aqa_result = aqa.invoke(query)

print(f"Answer: {aqa_result.answer}")
print(f"Answerable probability: {aqa_result.answerable_probability:.2f}")
print("Sources used:")
for passage in aqa_result.attributed_passages:
    print(f"- {passage}")
```

### Retriever Integration

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_google_genai import ChatGoogleGenerativeAI

# Convert to retriever
retriever = vector_store.as_retriever(search_kwargs={"k": 3})

# Use in a RAG chain
llm = ChatGoogleGenerativeAI(model="gemini-2.5-pro")

template = """Based on the following context, answer the question:

Context:
{context}

Question: {question}

Answer:"""

prompt = PromptTemplate.from_template(template)

# Create RAG chain
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Ask questions with retrieval
answer = rag_chain.invoke("What is the difference between supervised and unsupervised learning?")
print(answer)
```

### Document Management

```python
from langchain_google_genai import GoogleVectorStore

# Connect to an existing document so add_texts has a target
vector_store = GoogleVectorStore(
    corpus_id="my-corpus",
    document_id="my-document"
)

# Add texts first
chunk_ids = vector_store.add_texts([
    "Text to be deleted later",
    "Important text to keep"
])

# Delete specific chunk
success = vector_store.delete(ids=[chunk_ids[0]])
print(f"Deletion successful: {success}")
```

### Async Operations

```python
import asyncio

from langchain_google_genai import GoogleVectorStore

async def manage_vector_store():
    vector_store = GoogleVectorStore(corpus_id="async-corpus")

    # Async deletion
    success = await vector_store.adelete(ids=["chunk-id-1", "chunk-id-2"])
    print(f"Async deletion: {success}")

asyncio.run(manage_vector_store())
```
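Where several independent deletions are needed, the awaitable calls can be started together with `asyncio.gather`. The sketch below shows only the concurrency pattern. `FakeStore` is a stand-in exposing an `adelete` coroutine so the example runs without credentials, and `delete_in_groups` is a hypothetical helper.

```python
import asyncio
from typing import List, Optional


class FakeStore:
    """Stand-in with an adelete coroutine (illustration only)."""

    async def adelete(self, ids: Optional[List[str]] = None) -> Optional[bool]:
        await asyncio.sleep(0)  # yield control, as a network call would
        return bool(ids)


async def delete_in_groups(
    store: FakeStore, groups: List[List[str]]
) -> List[Optional[bool]]:
    """Start one adelete call per group and wait for all of them."""
    return await asyncio.gather(*(store.adelete(ids=g) for g in groups))


results = asyncio.run(
    delete_in_groups(FakeStore(), [["chunk-id-1"], ["chunk-id-2"]])
)
print(results)
```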

### Error Handling

```python
from langchain_google_genai import DoesNotExistsException, GoogleVectorStore

try:
    # Try to connect to non-existent corpus
    vector_store = GoogleVectorStore(corpus_id="non-existent-corpus")

except DoesNotExistsException as e:
    print(f"Vector store error: {e}")

    # Create the corpus instead
    vector_store = GoogleVectorStore.create_corpus(
        corpus_id="new-corpus",
        display_name="Newly Created Corpus"
    )
```
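The connect-or-create fallback can also be factored into a small reusable function. This is a minimal sketch with the store-specific calls injected as callables so that it runs without credentials. `get_or_create` and the stand-in `MissingError` are hypothetical. In real use one would presumably pass `DoesNotExistsException` together with lambdas wrapping `GoogleVectorStore(...)` and `GoogleVectorStore.create_corpus(...)`.

```python
from typing import Callable, Type, TypeVar

T = TypeVar("T")


def get_or_create(
    connect: Callable[[], T],
    create: Callable[[], T],
    missing: Type[Exception],
) -> T:
    """Return connect(); if it raises `missing`, fall back to create()."""
    try:
        return connect()
    except missing:
        return create()


class MissingError(Exception):
    """Stand-in for DoesNotExistsException (illustration only)."""


def failing_connect() -> str:
    raise MissingError("corpus not found")


store = get_or_create(failing_connect, lambda: "new-corpus", MissingError)
print(store)
```

Catching only the injected exception type lets unrelated failures, such as authentication errors, propagate instead of silently triggering corpus creation.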

## Utility Classes

### ServerSideEmbedding

```python { .api }
class ServerSideEmbedding:
    def embed_documents(self, texts: List[str]) -> List[List[float]]
    def embed_query(self, text: str) -> List[float]
```

Placeholder embedding class for server-side embeddings (returns empty vectors as Google handles embedding internally).

### DoesNotExistsException

```python { .api }
class DoesNotExistsException(Exception):
    def __init__(self, *, corpus_id: str, document_id: Optional[str] = None)
```

Exception raised when trying to access a corpus or document that doesn't exist on Google's servers.

**Parameters:**

- `corpus_id` (str): The corpus ID that doesn't exist
- `document_id` (Optional[str]): The document ID that doesn't exist (if applicable)

## Best Practices

1. **Organize content logically** using corpus and document structure
2. **Use meaningful IDs** for corpora and documents for easier management
3. **Include relevant metadata** to enable filtering and organization
4. **Handle exceptions** when accessing potentially non-existent resources
5. **Use AQA integration** for applications requiring source attribution
6. **Leverage async methods** for better performance in concurrent scenarios
7. **Monitor quota and limits** when working with large document collections
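
For the quota point, one common approach is to send texts in fixed-size batches rather than in a single call. The following is a minimal sketch. `batched` is a hypothetical helper, and the batch size of 2 is arbitrary because this page does not state the service's actual limits.

```python
from typing import Iterable, Iterator, List


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


texts = ["chunk one", "chunk two", "chunk three"]
for batch in batched(texts, size=2):
    # In real use, each batch would be passed to add_texts on a document store
    print(len(batch), batch)
```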