# Vector Store

Managed semantic search and document retrieval using Google's vector store infrastructure. Provides corpus and document management, similarity search, and integration with Google's AQA (Attributed Question Answering) service.

## Capabilities

### GoogleVectorStore

Primary vector store interface that extends LangChain's `VectorStore` to provide managed storage and retrieval using Google's semantic retriever service.

```python { .api }
class GoogleVectorStore:
    def __init__(
        self,
        *,
        corpus_id: str,
        document_id: Optional[str] = None
    )
```

**Parameters:**

- `corpus_id` (str): Google corpus identifier for the vector store
- `document_id` (Optional[str]): Document within the corpus to scope the store to; if omitted, the store covers the whole corpus

**Raises:** `DoesNotExistsException` if the corpus or document does not exist
### Core Methods

#### Adding Documents

```python { .api }
def add_texts(
    self,
    texts: Iterable[str],
    metadatas: Optional[List[Dict[str, Any]]] = None,
    *,
    document_id: Optional[str] = None,
    **kwargs: Any
) -> List[str]
```

Add texts to the vector store as searchable chunks.

**Parameters:**

- `texts` (Iterable[str]): Texts to add to the store
- `metadatas` (Optional[List[Dict]]): Metadata for each text chunk, in the same order as `texts`
- `document_id` (Optional[str]): Target document ID (required if the store was not initialized with a `document_id`)
- `**kwargs`: Additional parameters

**Returns:** List of chunk IDs for the added texts
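`metadatas` follows the usual LangChain convention of one dict per text, matched by position. The hypothetical helper below (not part of `langchain_google_genai`) sketches a client-side check you might run before calling `add_texts`, so that a length mismatch fails fast with a clear message. It also materializes `texts` once, since an `Iterable` may be a one-shot generator.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple


def pair_texts_with_metadata(
    texts: Iterable[str],
    metadatas: Optional[List[Dict[str, Any]]] = None,
) -> List[Tuple[str, Dict[str, Any]]]:
    """Pair each text with its metadata dict by position (hypothetical helper)."""
    # Materialize the iterable once so a generator is not consumed twice.
    text_list = list(texts)
    if metadatas is None:
        # No metadata supplied: give every chunk its own empty dict.
        metadatas = [{} for _ in text_list]
    if len(metadatas) != len(text_list):
        raise ValueError(
            f"got {len(text_list)} texts but {len(metadatas)} metadata dicts"
        )
    return list(zip(text_list, metadatas))


pairs = pair_texts_with_metadata(
    (t for t in ["Supervised learning uses labels.", "Clustering groups data."]),
    [{"topic": "supervised"}, {"topic": "unsupervised"}],
)
for text, meta in pairs:
    print(meta["topic"], "->", text)
```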
#### Similarity Search

```python { .api }
def similarity_search(
    self,
    query: str,
    k: int = 4,
    filter: Optional[Dict[str, Any]] = None,
    **kwargs: Any
) -> List[Document]
```

Perform a semantic search and return the documents most similar to the query.

**Parameters:**

- `query` (str): Search query text
- `k` (int): Number of results to return (default: 4)
- `filter` (Optional[Dict]): Metadata filters to apply to the search
- `**kwargs`: Additional search parameters

**Returns:** List of `Document` objects with the relevant content

```python { .api }
def similarity_search_with_score(
    self,
    query: str,
    k: int = 4,
    filter: Optional[Dict[str, Any]] = None,
    **kwargs: Any
) -> List[Tuple[Document, float]]
```

Perform a similarity search and also return each result's relevance score.

**Parameters:**

- `query` (str): Search query text
- `k` (int): Number of results to return (default: 4)
- `filter` (Optional[Dict]): Metadata filters to apply to the search
- `**kwargs`: Additional search parameters

**Returns:** List of `(Document, relevance_score)` tuples
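A common follow-up to `similarity_search_with_score` is to discard weak matches. The hypothetical helper below keeps only pairs at or above a cut-off and sorts them best-first. It assumes a higher score means a more relevant chunk; check that against the scores you actually observe, and treat the `0.5` threshold as an arbitrary example.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def keep_above_threshold(
    results: Sequence[Tuple[T, float]], min_score: float
) -> List[Tuple[T, float]]:
    """Drop (item, score) pairs below min_score (hypothetical helper).

    Assumes a higher score means a more relevant match.
    """
    kept = [(item, score) for item, score in results if score >= min_score]
    # Sort best-first so callers can take the top entries directly.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept


# Stand-in results; real code would pass the output of
# vector_store.similarity_search_with_score(...).
fake_results = [("chunk-a", 0.82), ("chunk-b", 0.41), ("chunk-c", 0.67)]
print(keep_above_threshold(fake_results, min_score=0.5))
```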
90
91
### Properties
92
93
```python { .api }
94
@property
95
def name(self) -> str
96
```
97
98
Returns the full name/path of the Google entity.
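Google's semantic retriever API names resources hierarchically, with corpora at `corpora/{corpus_id}` and documents nested beneath them. Assuming `name` follows that scheme, the hypothetical helper below reproduces the strings you would expect to see. It is an illustration only, not part of the library.

```python
from typing import Optional


def entity_name(corpus_id: str, document_id: Optional[str] = None) -> str:
    """Build a semantic retriever resource name (hypothetical helper)."""
    name = f"corpora/{corpus_id}"
    if document_id is not None:
        # Documents are nested under their corpus in the resource path.
        name += f"/documents/{document_id}"
    return name


print(entity_name("my-ai-knowledge-base"))
print(entity_name("my-ai-knowledge-base", "ml-basics"))
```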
```python { .api }
@property
def corpus_id(self) -> str
```

Returns the corpus ID managed by this vector store.

```python { .api }
@property
def document_id(self) -> Optional[str]
```

Returns the document ID managed by this vector store, or `None` if the store covers the whole corpus.
113
114
#### Document Management
115
116
```python { .api }
117
def delete(
118
self,
119
ids: Optional[List[str]] = None,
120
**kwargs: Any
121
) -> Optional[bool]
122
```
123
124
Delete documents or chunks from the vector store.
125
126
**Parameters:**
127
- `ids` (Optional[List[str]]): Specific chunk IDs to delete
128
- `**kwargs`: Additional parameters
129
130
**Returns:** Success status
131
132
```python { .api }
133
async def adelete(
134
self,
135
ids: Optional[List[str]] = None,
136
**kwargs: Any
137
) -> Optional[bool]
138
```
139
140
Async version of delete().
### Class Methods

#### Corpus Creation

```python { .api }
@classmethod
def create_corpus(
    cls,
    corpus_id: Optional[str] = None,
    display_name: Optional[str] = None
) -> "GoogleVectorStore"
```

Create a new corpus on Google's servers.

**Parameters:**

- `corpus_id` (Optional[str]): Desired corpus ID (auto-generated if None)
- `display_name` (Optional[str]): Human-readable name for the corpus

**Returns:** GoogleVectorStore instance for the new corpus
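Meaningful IDs make corpora and documents easier to manage. The Generative Language API reference describes corpus and document IDs as up to 40 characters of lowercase letters, digits, and dashes, not starting or ending with a dash; treat those rules as an assumption to re-check against the current docs. Under that assumption, the hypothetical helper below derives a candidate ID from a display name.

```python
import re


def to_resource_id(display_name: str, max_len: int = 40) -> str:
    """Turn a display name into a candidate corpus/document ID (hypothetical).

    Assumed rules: lowercase letters, digits and dashes only, at most
    max_len characters, no leading or trailing dash.
    """
    slug = display_name.lower()
    # Collapse every run of disallowed characters into a single dash.
    slug = re.sub(r"[^a-z0-9]+", "-", slug)
    # Truncate first, then trim dashes that truncation may expose.
    slug = slug[:max_len].strip("-")
    if not slug:
        raise ValueError(f"cannot derive an ID from {display_name!r}")
    return slug


print(to_resource_id("AI Knowledge Base"))
print(to_resource_id("  Machine Learning: Basics (v2) "))
```

Two different display names can map to the same ID, so collisions still need handling, for example by appending a short suffix.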
#### Document Creation

```python { .api }
@classmethod
def create_document(
    cls,
    corpus_id: str,
    document_id: Optional[str] = None,
    display_name: Optional[str] = None,
    metadata: Optional[Dict[str, Any]] = None
) -> "GoogleVectorStore"
```

Create a new document within an existing corpus.

**Parameters:**

- `corpus_id` (str): Target corpus ID
- `document_id` (Optional[str]): Desired document ID (auto-generated if None)
- `display_name` (Optional[str]): Human-readable name for the document
- `metadata` (Optional[Dict]): Custom metadata for the document

**Returns:** GoogleVectorStore instance for the new document
#### From Texts

```python { .api }
@classmethod
def from_texts(
    cls,
    texts: List[str],
    embedding: Optional[Embeddings] = None,
    metadatas: Optional[List[Dict]] = None,
    *,
    corpus_id: Optional[str] = None,
    document_id: Optional[str] = None,
    **kwargs: Any
) -> "GoogleVectorStore"
```

Create a vector store and populate it with texts in one operation.

**Parameters:**

- `texts` (List[str]): Initial texts to add
- `embedding` (Optional[Embeddings]): Embedding model (server-side embeddings are used if None)
- `metadatas` (Optional[List[Dict]]): Metadata for each text
- `corpus_id` (Optional[str]): Target corpus (created if it doesn't exist)
- `document_id` (Optional[str]): Target document (created if it doesn't exist)
- `**kwargs`: Additional parameters

**Returns:** GoogleVectorStore instance with populated content
### Integration Methods

#### AQA Integration

```python { .api }
def as_aqa(self, **kwargs: Any) -> Runnable[str, AqaOutput]
```

Create a runnable that performs attributed question answering using the vector store content.

**Parameters:**

- `**kwargs`: Additional AQA configuration parameters

**Returns:** Runnable that takes a query string and returns AqaOutput with attributed answers
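`AqaOutput` carries an `answerable_probability` alongside the answer (both fields are read in the AQA usage example on this page). One way to use it is to suppress low-confidence answers. The sketch below uses a stand-in dataclass, `FakeAqaOutput`, with just the fields that example reads, so it runs without credentials; the function name and the 0.5 cut-off are hypothetical choices.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeAqaOutput:
    """Stand-in exposing the AqaOutput fields used in this page's example."""

    answer: str
    answerable_probability: float
    attributed_passages: List[str] = field(default_factory=list)


def answer_if_confident(
    result: FakeAqaOutput, min_probability: float = 0.5
) -> Optional[str]:
    """Return the answer only when the model judged the question answerable.

    min_probability is a hypothetical cut-off; tune it for your application.
    """
    if result.answerable_probability < min_probability:
        return None
    return result.answer


confident = FakeAqaOutput("Supervised, unsupervised and reinforcement.", 0.91, ["p1"])
unsure = FakeAqaOutput("Possibly transfer learning.", 0.12)
print(answer_if_confident(confident))
print(answer_if_confident(unsure))
```

Because the function only reads attributes, it should also accept a real `AqaOutput` returned by `aqa.invoke(...)`.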
#### Retriever Integration

```python { .api }
def as_retriever(self, **kwargs: Any) -> VectorStoreRetriever
```

Convert to a LangChain retriever for use in chains.

**Parameters:**

- `**kwargs`: Retriever configuration parameters

**Returns:** VectorStoreRetriever instance
## Usage Examples

### Creating and Populating a Corpus

```python
from langchain_google_genai import GoogleVectorStore

# Create a new corpus
corpus_store = GoogleVectorStore.create_corpus(
    corpus_id="my-ai-knowledge-base",
    display_name="AI Knowledge Base"
)

print(f"Created corpus: {corpus_store.corpus_id}")

# Chunks live inside documents, so create a document in the corpus first
vector_store = GoogleVectorStore.create_document(
    corpus_id=corpus_store.corpus_id,
    document_id="ai-overview",
    display_name="AI Overview"
)

texts = [
    "Machine learning is a subset of artificial intelligence.",
    "Deep learning uses neural networks with multiple layers.",
    "Natural language processing focuses on understanding text.",
    "Computer vision enables machines to interpret images."
]

# Add texts as chunks of the "ai-overview" document
chunk_ids = vector_store.add_texts(texts)
print(f"Added {len(chunk_ids)} chunks")
```
### Document-Level Organization

```python
# Create a document within a corpus
doc_store = GoogleVectorStore.create_document(
    corpus_id="my-ai-knowledge-base",
    document_id="ml-basics",
    display_name="Machine Learning Basics",
    metadata={"topic": "machine-learning", "level": "beginner"}
)

# Add content to the specific document
ml_texts = [
    "Supervised learning uses labeled data for training.",
    "Unsupervised learning finds patterns in unlabeled data.",
    "Reinforcement learning learns through trial and error."
]

doc_store.add_texts(ml_texts)
```
### Similarity Search

```python
# Connect to existing corpus
vector_store = GoogleVectorStore(corpus_id="my-ai-knowledge-base")

# Perform similarity search
query = "What is deep learning?"
results = vector_store.similarity_search(query, k=3)

for i, doc in enumerate(results, 1):
    print(f"Result {i}: {doc.page_content}")
    print(f"Metadata: {doc.metadata}")
    print()
```
### Search with Scores

```python
# Get similarity scores with results
results_with_scores = vector_store.similarity_search_with_score(
    "Explain neural networks",
    k=5
)

for doc, score in results_with_scores:
    print(f"Score: {score:.3f} - {doc.page_content}")
```
### From Texts Helper

```python
from langchain_google_genai import GoogleVectorStore

# Create vector store from texts in one step
documents = [
    "Python is a versatile programming language.",
    "JavaScript is essential for web development.",
    "SQL is used for database operations.",
    "Docker helps with application containerization."
]

metadata = [
    {"category": "programming", "language": "python"},
    {"category": "programming", "language": "javascript"},
    {"category": "database", "language": "sql"},
    {"category": "devops", "tool": "docker"}
]

# Create and populate vector store
vector_store = GoogleVectorStore.from_texts(
    texts=documents,
    metadatas=metadata,
    corpus_id="programming-knowledge",
    document_id="languages-and-tools"
)

# Search with metadata filtering
results = vector_store.similarity_search(
    "What programming languages are available?",
    filter={"category": "programming"}
)

for doc in results:
    print(doc.page_content)
```
### AQA Integration

```python
from langchain_google_genai import GoogleVectorStore

# Connect to the "ml-basics" document created earlier
ml_store = GoogleVectorStore(corpus_id="my-ai-knowledge-base", document_id="ml-basics")

# Create AQA runnable from vector store
aqa = ml_store.as_aqa()

# Perform attributed question answering
query = "What are the main types of machine learning?"
aqa_result = aqa.invoke(query)

print(f"Answer: {aqa_result.answer}")
print(f"Answerable probability: {aqa_result.answerable_probability:.2f}")
print("Sources used:")
for passage in aqa_result.attributed_passages:
    print(f"- {passage}")
```
### Retriever Integration

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleVectorStore

# Connect to the corpus populated in the earlier examples
vector_store = GoogleVectorStore(corpus_id="my-ai-knowledge-base")

# Convert to a retriever that returns the top 3 chunks
retriever = vector_store.as_retriever(search_kwargs={"k": 3})

# Use in a RAG chain
llm = ChatGoogleGenerativeAI(model="gemini-2.5-pro")

template = """Based on the following context, answer the question:

Context:
{context}

Question: {question}

Answer:"""

prompt = PromptTemplate.from_template(template)


def format_docs(docs):
    # Join the retrieved chunks into a single context string
    return "\n\n".join(doc.page_content for doc in docs)


# Create RAG chain
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Ask questions with retrieval
answer = rag_chain.invoke("What is the difference between supervised and unsupervised learning?")
print(answer)
```
### Document Management

```python
from langchain_google_genai import GoogleVectorStore

# Scope the store to a document so add_texts knows where to put the chunks
vector_store = GoogleVectorStore(corpus_id="my-corpus", document_id="my-document")

# Add texts first
chunk_ids = vector_store.add_texts([
    "Text to be deleted later",
    "Important text to keep"
])

# Delete only the first chunk, using the ID returned by add_texts
success = vector_store.delete(ids=[chunk_ids[0]])
print(f"Deletion successful: {success}")
```
### Async Operations

```python
import asyncio


async def manage_vector_store():
    vector_store = GoogleVectorStore(corpus_id="async-corpus")

    # Async deletion
    success = await vector_store.adelete(ids=["chunk-id-1", "chunk-id-2"])
    print(f"Async deletion: {success}")


asyncio.run(manage_vector_store())
```
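When many chunks must be removed, one option is to split the IDs into batches and await several `adelete` calls together with `asyncio.gather`. The sketch below assumes the service tolerates concurrent delete requests within your quota; the `delete_in_batches` helper, the batch size, and the `FakeStore` stand-in (which mimics the documented `adelete` signature so the code runs offline) are all hypothetical.

```python
import asyncio
from typing import Any, List, Optional


class FakeStore:
    """Stand-in with the same adelete signature as GoogleVectorStore."""

    def __init__(self) -> None:
        self.deleted: List[str] = []

    async def adelete(
        self, ids: Optional[List[str]] = None, **kwargs: Any
    ) -> Optional[bool]:
        await asyncio.sleep(0)  # yield to the event loop, standing in for a network call
        self.deleted.extend(ids or [])
        return True


async def delete_in_batches(store: Any, ids: List[str], batch_size: int = 2) -> bool:
    """Issue one adelete call per batch and await them concurrently (hypothetical)."""
    batches = [ids[i : i + batch_size] for i in range(0, len(ids), batch_size)]
    results = await asyncio.gather(*(store.adelete(ids=batch) for batch in batches))
    # Report success only if every batch returned a truthy status.
    return all(results)


store = FakeStore()
ok = asyncio.run(delete_in_batches(store, ["c1", "c2", "c3", "c4", "c5"]))
print(ok, sorted(store.deleted))
```

To use it for real, pass a `GoogleVectorStore` instance instead of `FakeStore` and pick a batch size that fits your quota.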
### Error Handling

```python
from langchain_google_genai import DoesNotExistsException, GoogleVectorStore

try:
    # Try to connect to a non-existent corpus
    vector_store = GoogleVectorStore(corpus_id="non-existent-corpus")

except DoesNotExistsException as e:
    print(f"Vector store error: {e}")

    # Create a corpus instead
    vector_store = GoogleVectorStore.create_corpus(
        corpus_id="new-corpus",
        display_name="Newly Created Corpus"
    )
```
## Utility Classes

### ServerSideEmbedding

```python { .api }
class ServerSideEmbedding:
    def embed_documents(self, texts: List[str]) -> List[List[float]]
    def embed_query(self, text: str) -> List[float]
```

Placeholder embedding class for server-side embeddings. It returns empty vectors because Google computes the embeddings internally.
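To make the documented contract concrete, here is a minimal sketch of a class that behaves the same way: one empty vector per input text and an empty query vector. It is a hypothetical re-creation for illustration, not the library's source, and it does not subclass LangChain's `Embeddings` so that it runs without dependencies.

```python
from typing import List


class PlaceholderEmbedding:
    """Hypothetical stand-in mirroring the documented ServerSideEmbedding contract."""

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # One empty vector per input text; the real vectors live on Google's side.
        return [[] for _ in texts]

    def embed_query(self, text: str) -> List[float]:
        # Queries are embedded by the service too, so nothing is computed locally.
        return []


emb = PlaceholderEmbedding()
print(emb.embed_documents(["a", "b"]), emb.embed_query("q"))
```

Building a fresh list per text keeps the output length aligned with the input and avoids the shared-list aliasing that `[[]] * len(texts)` would introduce.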
### DoesNotExistsException

```python { .api }
class DoesNotExistsException(Exception):
    def __init__(self, *, corpus_id: str, document_id: Optional[str] = None)
```

Exception raised when trying to access a corpus or document that doesn't exist on Google's servers.

**Parameters:**

- `corpus_id` (str): The corpus ID that doesn't exist
- `document_id` (Optional[str]): The document ID that doesn't exist (if applicable)
## Best Practices

1. **Organize content logically** using corpus and document structure
2. **Use meaningful IDs** for corpora and documents for easier management
3. **Include relevant metadata** to enable filtering and organization
4. **Handle exceptions** when accessing potentially non-existent resources
5. **Use AQA integration** for applications requiring source attribution
6. **Leverage async methods** for better performance in concurrent scenarios
7. **Monitor quota and limits** when working with large document collections