Tessl Tile for pypi/langchain-google-genai@2.1.0

# Vector Store

Managed semantic search and document retrieval using Google's vector store infrastructure. Provides corpus and document management, similarity search, and integration with Google's AQA (Attributed Question Answering) service.

## Capabilities

### GoogleVectorStore

Primary vector store interface that extends LangChain's `VectorStore` to provide managed storage and retrieval using Google's semantic retriever service.

```python { .api }
class GoogleVectorStore:
    def __init__(
        self,
        *,
        corpus_id: str,
        document_id: Optional[str] = None
    )
```

**Parameters:**
- `corpus_id` (str): Google corpus identifier for the vector store
- `document_id` (Optional[str]): Specific document within the corpus (optional)

### Core Methods

#### Adding Documents

```python { .api }
def add_texts(
    self,
    texts: Iterable[str],
    metadatas: Optional[List[Dict[str, Any]]] = None,
    *,
    document_id: Optional[str] = None,
    **kwargs: Any
) -> List[str]
```

Add texts to the vector store as searchable chunks.

**Parameters:**
- `texts` (Iterable[str]): Texts to add to the store
- `metadatas` (Optional[List[Dict]]): Metadata for each text chunk
- `document_id` (Optional[str]): Target document ID (required if store not initialized with document_id)
- `**kwargs`: Additional parameters

**Returns:** List of chunk IDs for the added texts
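
Since `metadatas` supplies one entry per text, it can help to confirm the two line up before calling `add_texts`. The following is a minimal, library-independent sketch: `align_metadatas` is a hypothetical helper name (not part of `langchain-google-genai`), and raising `ValueError` on a length mismatch is an assumption made for illustration.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple


def align_metadatas(
    texts: Iterable[str],
    metadatas: Optional[List[Dict[str, Any]]] = None,
) -> Tuple[List[str], List[Dict[str, Any]]]:
    """Return texts and metadatas as equal-length lists (hypothetical helper)."""
    text_list = list(texts)  # materialize so generators can be measured
    if metadatas is None:
        # No metadata supplied: pair every text with an empty dict.
        return text_list, [{} for _ in text_list]
    if len(metadatas) != len(text_list):
        raise ValueError(
            f"Got {len(text_list)} texts but {len(metadatas)} metadata entries"
        )
    return text_list, list(metadatas)


texts, metadatas = align_metadatas(
    ["Supervised learning uses labeled data."],
    [{"topic": "machine-learning"}],
)
```

The returned pair could then be passed on as `add_texts(texts, metadatas)`.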

#### Similarity Search

```python { .api }
def similarity_search(
    self,
    query: str,
    k: int = 4,
    filter: Optional[Dict[str, Any]] = None,
    **kwargs: Any
) -> List[Document]
```

Perform semantic search to find similar documents.

**Parameters:**
- `query` (str): Search query text
- `k` (int): Number of results to return (default: 4)
- `filter` (Optional[Dict]): Metadata filters for search
- `**kwargs`: Additional search parameters

**Returns:** List of Document objects with relevant content

```python { .api }
def similarity_search_with_score(
    self,
    query: str,
    k: int = 4,
    filter: Optional[Dict[str, Any]] = None,
    **kwargs: Any
) -> List[Tuple[Document, float]]
```

Perform similarity search with relevance scores.

**Parameters:**
- `query` (str): Search query text
- `k` (int): Number of results to return (default: 4)
- `filter` (Optional[Dict]): Metadata filters for search
- `**kwargs`: Additional search parameters

**Returns:** List of tuples containing (Document, relevance_score)
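
The `(Document, relevance_score)` tuples can be post-processed with ordinary Python. As a rough sketch, the hypothetical helper `keep_above` below drops weak matches; it assumes that a higher score means a closer match, which should be verified for your setup, and it uses plain strings in place of `Document` objects so that it runs without any service access.

```python
from typing import Any, List, Tuple


def keep_above(
    results: List[Tuple[Any, float]], min_score: float
) -> List[Tuple[Any, float]]:
    """Keep (document, score) pairs whose score is at least min_score.

    Assumes a higher score means a closer match.
    """
    return [(doc, score) for doc, score in results if score >= min_score]


# Plain strings stand in for Document objects in this illustration.
sample = [("neural networks", 0.82), ("sql basics", 0.31)]
strong = keep_above(sample, min_score=0.5)
```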

### Properties

```python { .api }
@property
def name(self) -> str
```

Returns the full name/path of the Google entity.

```python { .api }
@property
def corpus_id(self) -> str
```

Returns the corpus ID managed by this vector store.

```python { .api }
@property
def document_id(self) -> Optional[str]
```

Returns the document ID managed by this vector store (if any).

#### Document Management

```python { .api }
def delete(
    self,
    ids: Optional[List[str]] = None,
    **kwargs: Any
) -> Optional[bool]
```

Delete documents or chunks from the vector store.

**Parameters:**
- `ids` (Optional[List[str]]): Specific chunk IDs to delete
- `**kwargs`: Additional parameters

**Returns:** Success status

```python { .api }
async def adelete(
    self,
    ids: Optional[List[str]] = None,
    **kwargs: Any
) -> Optional[bool]
```

Async version of delete().
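
One way an async delete might be used is to issue several calls concurrently with `asyncio.gather`. The sketch below is an illustration under stated assumptions: `FakeStore` is a stand-in that only mimics the `adelete` signature above so the code runs offline, and `delete_in_batches` and its `batch_size` parameter are hypothetical names.

```python
import asyncio
from typing import Any, List, Optional


class FakeStore:
    """Stand-in with the same adelete shape as the signature above."""

    def __init__(self) -> None:
        self.deleted: List[str] = []

    async def adelete(
        self, ids: Optional[List[str]] = None, **kwargs: Any
    ) -> Optional[bool]:
        await asyncio.sleep(0)  # yield control, as a real network call would
        self.deleted.extend(ids or [])
        return True


async def delete_in_batches(store: Any, ids: List[str], batch_size: int = 2) -> bool:
    """Issue one adelete call per batch and wait for all of them."""
    batches = [ids[i : i + batch_size] for i in range(0, len(ids), batch_size)]
    outcomes = await asyncio.gather(*(store.adelete(ids=b) for b in batches))
    # A None outcome counts as "not confirmed" here, so it makes the result False.
    return all(outcomes)


store = FakeStore()
ok = asyncio.run(delete_in_batches(store, ["c1", "c2", "c3"]))
```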

### Class Methods

#### Corpus Creation

```python { .api }
@classmethod
def create_corpus(
    cls,
    corpus_id: Optional[str] = None,
    display_name: Optional[str] = None
) -> "GoogleVectorStore"
```

Create a new corpus on Google's servers.

**Parameters:**
- `corpus_id` (Optional[str]): Desired corpus ID (auto-generated if None)
- `display_name` (Optional[str]): Human-readable name for the corpus

**Returns:** GoogleVectorStore instance for the new corpus

#### Document Creation

```python { .api }
@classmethod
def create_document(
    cls,
    corpus_id: str,
    document_id: Optional[str] = None,
    display_name: Optional[str] = None,
    metadata: Optional[Dict[str, Any]] = None
) -> "GoogleVectorStore"
```

Create a new document within an existing corpus.

**Parameters:**
- `corpus_id` (str): Target corpus ID
- `document_id` (Optional[str]): Desired document ID (auto-generated if None)
- `display_name` (Optional[str]): Human-readable name for the document
- `metadata` (Optional[Dict]): Custom metadata for the document

**Returns:** GoogleVectorStore instance for the new document

#### From Texts

```python { .api }
@classmethod
def from_texts(
    cls,
    texts: List[str],
    embedding: Optional[Embeddings] = None,
    metadatas: Optional[List[Dict]] = None,
    *,
    corpus_id: Optional[str] = None,
    document_id: Optional[str] = None,
    **kwargs: Any
) -> "GoogleVectorStore"
```

Create a vector store and populate it with texts in one operation.

**Parameters:**
- `texts` (List[str]): Initial texts to add
- `embedding` (Optional[Embeddings]): Embedding model (server-side embedding is used if None)
- `metadatas` (Optional[List[Dict]]): Metadata for each text
- `corpus_id` (Optional[str]): Target corpus (created if it doesn't exist)
- `document_id` (Optional[str]): Target document (created if it doesn't exist)
- `**kwargs`: Additional parameters

**Returns:** GoogleVectorStore instance with populated content

### Integration Methods

#### AQA Integration

```python { .api }
def as_aqa(self, **kwargs: Any) -> Runnable[str, AqaOutput]
```

Create a runnable that performs attributed question answering using the vector store content.

**Parameters:**
- `**kwargs`: Additional AQA configuration parameters

**Returns:** Runnable that takes a query string and returns AqaOutput with attributed answers
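
An application may want to show an attributed answer only when the service considers the question answerable. The sketch below illustrates that idea offline: `AnswerLike` is a stand-in dataclass carrying the `answer`, `answerable_probability`, and `attributed_passages` fields, and `render_answer` with its `threshold` default of 0.5 is a hypothetical helper and an arbitrary cutoff, not a library recommendation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AnswerLike:
    """Stand-in carrying three fields read from an AQA result."""

    answer: str
    answerable_probability: float
    attributed_passages: List[str] = field(default_factory=list)


def render_answer(result: AnswerLike, threshold: float = 0.5) -> str:
    """Return the answer with its sources, or a fallback below the threshold."""
    if result.answerable_probability < threshold:
        return "No well-supported answer found."
    sources = "\n".join(f"- {p}" for p in result.attributed_passages)
    return f"{result.answer}\nSources:\n{sources}"


demo = AnswerLike("Three main types.", 0.9, ["Supervised learning uses labeled data."])
text = render_answer(demo)
```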

#### Retriever Integration

```python { .api }
def as_retriever(self, **kwargs: Any) -> VectorStoreRetriever
```

Convert to a LangChain retriever for use in chains.

**Parameters:**
- `**kwargs`: Retriever configuration parameters

**Returns:** VectorStoreRetriever instance

## Usage Examples

### Creating and Populating a Corpus

```python
from langchain_google_genai import GoogleVectorStore

# Create a new corpus
vector_store = GoogleVectorStore.create_corpus(
    corpus_id="my-ai-knowledge-base",
    display_name="AI Knowledge Base"
)

print(f"Created corpus: {vector_store.corpus_id}")

# Texts to add to the corpus
texts = [
    "Machine learning is a subset of artificial intelligence.",
    "Deep learning uses neural networks with multiple layers.",
    "Natural language processing focuses on understanding text.",
    "Computer vision enables machines to interpret images."
]

# add_texts needs a target document, so create one in the new corpus first
# ("ai-overview" is an example ID)
doc_store = GoogleVectorStore.create_document(
    corpus_id=vector_store.corpus_id,
    document_id="ai-overview",
    display_name="AI Overview"
)

chunk_ids = doc_store.add_texts(texts)
print(f"Added {len(chunk_ids)} chunks")
```

### Document-Level Organization

```python
from langchain_google_genai import GoogleVectorStore

# Create a document within a corpus
doc_store = GoogleVectorStore.create_document(
    corpus_id="my-ai-knowledge-base",
    document_id="ml-basics",
    display_name="Machine Learning Basics",
    metadata={"topic": "machine-learning", "level": "beginner"}
)

# Add content to the specific document
ml_texts = [
    "Supervised learning uses labeled data for training.",
    "Unsupervised learning finds patterns in unlabeled data.",
    "Reinforcement learning learns through trial and error."
]

doc_store.add_texts(ml_texts)
```

### Similarity Search

```python
from langchain_google_genai import GoogleVectorStore

# Connect to existing corpus
vector_store = GoogleVectorStore(corpus_id="my-ai-knowledge-base")

# Perform similarity search
query = "What is deep learning?"
results = vector_store.similarity_search(query, k=3)

for i, doc in enumerate(results, 1):
    print(f"Result {i}: {doc.page_content}")
    print(f"Metadata: {doc.metadata}")
    print()
```

### Search with Scores

```python
# Get similarity scores with results
results_with_scores = vector_store.similarity_search_with_score(
    "Explain neural networks",
    k=5
)

for doc, score in results_with_scores:
    print(f"Score: {score:.3f} - {doc.page_content}")
```

### From Texts Helper

```python
from langchain_google_genai import GoogleVectorStore

# Create vector store from texts in one step
documents = [
    "Python is a versatile programming language.",
    "JavaScript is essential for web development.",
    "SQL is used for database operations.",
    "Docker helps with application containerization."
]

metadata = [
    {"category": "programming", "language": "python"},
    {"category": "programming", "language": "javascript"},
    {"category": "database", "language": "sql"},
    {"category": "devops", "tool": "docker"}
]

# Create and populate vector store
vector_store = GoogleVectorStore.from_texts(
    texts=documents,
    metadatas=metadata,
    corpus_id="programming-knowledge",
    document_id="languages-and-tools"
)

# Search with metadata filtering
results = vector_store.similarity_search(
    "What programming languages are available?",
    filter={"category": "programming"}
)
```

### AQA Integration

```python
# Create AQA runnable from vector store
aqa = vector_store.as_aqa()

# Perform attributed question answering
query = "What are the main types of machine learning?"
aqa_result = aqa.invoke(query)

print(f"Answer: {aqa_result.answer}")
print(f"Answerable probability: {aqa_result.answerable_probability:.2f}")
print("Sources used:")
for passage in aqa_result.attributed_passages:
    print(f"- {passage}")
```

### Retriever Integration

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_google_genai import ChatGoogleGenerativeAI

# Convert to retriever
retriever = vector_store.as_retriever(search_kwargs={"k": 3})

# Use in a RAG chain
llm = ChatGoogleGenerativeAI(model="gemini-2.5-pro")

template = """Based on the following context, answer the question:

Context:
{context}

Question: {question}

Answer:"""

prompt = PromptTemplate.from_template(template)

def format_docs(docs):
    # Join the retrieved documents into a single context string
    return "\n\n".join(doc.page_content for doc in docs)

# Create RAG chain
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Ask questions with retrieval
answer = rag_chain.invoke("What is the difference between supervised and unsupervised learning?")
print(answer)
```

### Document Management

```python
from langchain_google_genai import GoogleVectorStore

# Connect to an existing document; add_texts needs a target document
# ("my-document" is an example ID)
vector_store = GoogleVectorStore(corpus_id="my-corpus", document_id="my-document")

# Add texts first
chunk_ids = vector_store.add_texts([
    "Text to be deleted later",
    "Important text to keep"
])

# Delete specific chunk
success = vector_store.delete(ids=[chunk_ids[0]])
print(f"Deletion successful: {success}")
```

### Async Operations

```python
import asyncio

from langchain_google_genai import GoogleVectorStore

async def manage_vector_store():
    vector_store = GoogleVectorStore(corpus_id="async-corpus")

    # Async deletion
    success = await vector_store.adelete(ids=["chunk-id-1", "chunk-id-2"])
    print(f"Async deletion: {success}")

asyncio.run(manage_vector_store())
```

### Error Handling

```python
from langchain_google_genai import DoesNotExistsException, GoogleVectorStore

try:
    # Try to connect to non-existent corpus
    vector_store = GoogleVectorStore(corpus_id="non-existent-corpus")

except DoesNotExistsException as e:
    print(f"Vector store error: {e}")

    # Create the corpus instead
    vector_store = GoogleVectorStore.create_corpus(
        corpus_id="new-corpus",
        display_name="Newly Created Corpus"
    )
```

## Utility Classes

### ServerSideEmbedding

```python { .api }
class ServerSideEmbedding:
    def embed_documents(self, texts: List[str]) -> List[List[float]]
    def embed_query(self, text: str) -> List[float]
```

Placeholder embedding class for server-side embeddings (returns empty vectors as Google handles embedding internally).

### DoesNotExistsException

```python { .api }
class DoesNotExistsException(Exception):
    def __init__(self, *, corpus_id: str, document_id: Optional[str] = None)
```

Exception raised when trying to access a corpus or document that doesn't exist on Google's servers.

**Parameters:**
- `corpus_id` (str): The corpus ID that doesn't exist
- `document_id` (Optional[str]): The document ID that doesn't exist (if applicable)
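
A common way to use such an exception is a "connect, else create" fallback. The sketch below shows the pattern in isolation so that it runs offline: `MissingResource` is a stand-in for `DoesNotExistsException`, and `get_or_create` with its `connect`, `create`, and `missing` parameters is a hypothetical helper, not part of the library.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class MissingResource(Exception):
    """Stand-in for DoesNotExistsException so the sketch runs offline."""


def get_or_create(
    connect: Callable[[], T],
    create: Callable[[], T],
    missing: type = MissingResource,
) -> T:
    """Try connect(); if it raises `missing`, fall back to create()."""
    try:
        return connect()
    except missing:
        return create()


def failing_connect() -> str:
    raise MissingResource("corpus not found")


handle = get_or_create(failing_connect, lambda: "created")
```

With the real library, `connect` could wrap `GoogleVectorStore(corpus_id=...)`, `create` could wrap `GoogleVectorStore.create_corpus(...)`, and `missing` could be `DoesNotExistsException`.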

## Best Practices

1. **Organize content logically** using corpus and document structure
2. **Use meaningful IDs** for corpora and documents for easier management
3. **Include relevant metadata** to enable filtering and organization
4. **Handle exceptions** when accessing potentially non-existent resources
5. **Use AQA integration** for applications requiring source attribution
6. **Leverage async methods** for better performance in concurrent scenarios
7. **Monitor quota and limits** when working with large document collections
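
For large collections, one simple way to keep individual requests small is to add texts in fixed-size batches. The sketch below is a plain-Python illustration: `batched` is a hypothetical helper, and a suitable batch size depends on service limits that are not stated here.

```python
from typing import Iterable, Iterator, List


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


groups = list(batched(["a", "b", "c", "d", "e"], size=2))
```

Each yielded list could then be passed to a separate `add_texts` call.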
