# Data Indexing

Data structures for organizing and retrieving information efficiently, supporting various indexing strategies from vector-based semantic search to hierarchical trees and knowledge graphs.

## Capabilities

### Vector Store Index

Vector-based index for semantic search using embeddings, supporting various vector databases and similarity metrics.

```python { .api }
class VectorStoreIndex:
    """
    Vector store index for semantic search.

    Args:
        nodes: List of Node objects to index
        storage_context: Storage configuration
        service_context: Service configuration (deprecated, use Settings)
        **kwargs: Additional arguments
    """
    def __init__(self, nodes=None, storage_context=None, service_context=None, **kwargs): ...

    @classmethod
    def from_documents(cls, documents, storage_context=None, service_context=None, **kwargs):
        """
        Create index from documents.

        Args:
            documents: List of Document objects
            storage_context: Storage configuration
            service_context: Service configuration (deprecated)
            **kwargs: Additional arguments

        Returns:
            VectorStoreIndex: Constructed index
        """

    def as_query_engine(self, **kwargs):
        """
        Create query engine from index.

        Returns:
            BaseQueryEngine: Query engine for the index
        """

    def as_retriever(self, similarity_top_k=None, **kwargs):
        """
        Create retriever from index.

        Args:
            similarity_top_k: Number of similar documents to retrieve

        Returns:
            BaseRetriever: Retriever for the index
        """

    def insert(self, document, **kwargs):
        """Insert a document into the index."""

    def delete_ref_doc(self, ref_doc_id, **kwargs):
        """Delete document by reference ID."""

    def update_ref_doc(self, document, **kwargs):
        """Update document in the index."""
```
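
To make the retrieval idea concrete, here is a minimal, library-free sketch of what a vector index does conceptually: it stores one embedding per node and returns the `similarity_top_k` nearest nodes by cosine similarity. The `ToyVectorIndex` class, its `retrieve` method, and the hand-written 2-D embeddings are illustrative assumptions, not part of the API above; a real index would obtain embeddings from a model and delegate storage to a vector database.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorIndex:
    """Hypothetical stand-in for a vector index (not the library class)."""

    def __init__(self):
        self._entries = []  # list of (node_id, embedding) pairs

    def insert(self, node_id, embedding):
        self._entries.append((node_id, embedding))

    def retrieve(self, query_embedding, similarity_top_k=2):
        # Score every stored node, then keep the k most similar ones.
        scored = [
            (cosine_similarity(query_embedding, emb), node_id)
            for node_id, emb in self._entries
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [node_id for _, node_id in scored[:similarity_top_k]]


index = ToyVectorIndex()
index.insert("cats", [1.0, 0.0])
index.insert("dogs", [0.8, 0.6])
index.insert("stocks", [0.0, 1.0])
top = index.retrieve([1.0, 0.1], similarity_top_k=2)
```

The brute-force scan is only for illustration; production vector stores typically use approximate nearest-neighbour structures instead.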

### Summary Index (List Index)

Simple list-based index for summarization tasks, storing all nodes in a flat list structure.

```python { .api }
class SummaryIndex:
    """
    Summary index for basic retrieval and summarization.

    Args:
        nodes: List of Node objects to index
        storage_context: Storage configuration
        **kwargs: Additional arguments
    """
    def __init__(self, nodes=None, storage_context=None, **kwargs): ...

    @classmethod
    def from_documents(cls, documents, storage_context=None, **kwargs):
        """Create summary index from documents."""

    def as_query_engine(self, retriever_mode="default", **kwargs):
        """
        Create query engine from index.

        Args:
            retriever_mode: Retrieval mode ("default", "embedding")
        """

    def as_retriever(self, retriever_mode="default", **kwargs):
        """Create retriever from index."""
```
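
The flat-list structure can be pictured with a small sketch. It assumes, for illustration only, that the `"default"` mode hands every stored node to the caller in insertion order; `ToySummaryIndex` and its methods are hypothetical names, not the library class.

```python
class ToySummaryIndex:
    """Hypothetical flat-list index: nodes are kept in insertion order."""

    def __init__(self, nodes=None):
        self._nodes = list(nodes or [])

    def insert(self, node):
        self._nodes.append(node)

    def retrieve(self, retriever_mode="default"):
        # Assumption: "default" returns all nodes, which suits summarization
        # because the whole corpus is passed on to the synthesizer.
        if retriever_mode != "default":
            raise NotImplementedError("only the default mode is sketched here")
        return list(self._nodes)


summary_index = ToySummaryIndex(["n1", "n2"])
summary_index.insert("n3")
all_nodes = summary_index.retrieve()
```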

### Property Graph Index

Property graph index for complex entity relationships with support for knowledge graph reasoning and graph-based retrieval.

```python { .api }
class PropertyGraphIndex:
    """
    Property graph index for knowledge graph storage and retrieval.

    Args:
        nodes: List of Node objects to index
        property_graph_store: Graph storage backend
        vector_store: Vector storage for embeddings
        embed_model: Embedding model for nodes
        **kwargs: Additional arguments
    """
    def __init__(
        self,
        nodes=None,
        property_graph_store=None,
        vector_store=None,
        embed_model=None,
        **kwargs
    ): ...

    @classmethod
    def from_documents(cls, documents, **kwargs):
        """Create property graph index from documents."""

    def as_query_engine(self, **kwargs):
        """Create query engine for graph traversal and reasoning."""

    def as_retriever(self, **kwargs):
        """Create retriever for graph-based document retrieval."""

    def upsert_triplet(self, subj, pred, obj):
        """Insert or update a knowledge triplet."""

    def delete_triplet(self, subj, pred, obj):
        """Delete a knowledge triplet."""
```
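
The triplet operations above can be sketched with a plain in-memory set. This is a conceptual illustration under the assumption that a triplet is an exact `(subj, pred, obj)` tuple; `ToyTripletStore` and its `neighbors` helper are hypothetical and do not reflect how a real graph backend stores data.

```python
class ToyTripletStore:
    """Hypothetical in-memory store of (subject, predicate, object) triplets."""

    def __init__(self):
        self._triplets = set()

    def upsert_triplet(self, subj, pred, obj):
        # Adding to a set is idempotent, so repeated upserts do not duplicate.
        self._triplets.add((subj, pred, obj))

    def delete_triplet(self, subj, pred, obj):
        # discard() is a no-op when the triplet is absent.
        self._triplets.discard((subj, pred, obj))

    def neighbors(self, subj):
        """Return sorted (predicate, object) pairs leaving `subj`."""
        return sorted((p, o) for s, p, o in self._triplets if s == subj)


store = ToyTripletStore()
store.upsert_triplet("Ada", "wrote", "Notes")
store.upsert_triplet("Ada", "wrote", "Notes")  # repeated upsert, no duplicate
store.upsert_triplet("Ada", "knew", "Babbage")
store.delete_triplet("Ada", "wrote", "Notes")
remaining = store.neighbors("Ada")
```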

### Keyword Table Index

Keyword-based index for exact keyword matching and retrieval, supporting various keyword extraction algorithms.

```python { .api }
class KeywordTableIndex:
    """
    Base keyword table index for keyword-based retrieval.

    Args:
        nodes: List of Node objects to index
        table: Index table storage
        **kwargs: Additional arguments
    """
    def __init__(self, nodes=None, table=None, **kwargs): ...

    @classmethod
    def from_documents(cls, documents, **kwargs):
        """Create keyword index from documents."""

class SimpleKeywordTableIndex(KeywordTableIndex):
    """
    Simple keyword extraction using basic text processing.

    Args:
        max_keywords_per_chunk: Maximum keywords per document chunk
        keyword_extract_template: Template for keyword extraction
        **kwargs: KeywordTableIndex arguments
    """
    def __init__(
        self,
        max_keywords_per_chunk=10,
        keyword_extract_template=None,
        **kwargs
    ): ...

class RAKEKeywordTableIndex(KeywordTableIndex):
    """
    RAKE (Rapid Automatic Keyword Extraction) algorithm for keyword extraction.

    Args:
        max_keywords_per_chunk: Maximum keywords per document chunk
        **kwargs: KeywordTableIndex arguments
    """
    def __init__(self, max_keywords_per_chunk=10, **kwargs): ...
```
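
A keyword table is essentially an inverted index from keyword to chunk IDs. The sketch below shows that idea with a deliberately naive extractor (lowercase words minus a tiny stopword list, capped at `max_keywords_per_chunk`). The `ToyKeywordTable` class, the `extract_keywords` helper, and the stopword list are assumptions for illustration; they are not the extraction logic used by the classes above.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "for"}


def extract_keywords(text, max_keywords=10):
    """Naive extractor: most frequent non-stopword tokens."""
    words = [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]
    return [word for word, _ in Counter(words).most_common(max_keywords)]


class ToyKeywordTable:
    """Hypothetical inverted index: keyword -> set of chunk IDs."""

    def __init__(self, max_keywords_per_chunk=10):
        self.max_keywords_per_chunk = max_keywords_per_chunk
        self.table = defaultdict(set)

    def insert(self, chunk_id, text):
        for keyword in extract_keywords(text, self.max_keywords_per_chunk):
            self.table[keyword].add(chunk_id)

    def retrieve(self, query):
        # Rank chunks by how many query keywords they share.
        hits = Counter()
        for keyword in extract_keywords(query):
            for chunk_id in self.table.get(keyword, ()):
                hits[chunk_id] += 1
        return [chunk_id for chunk_id, _ in hits.most_common()]


keyword_table = ToyKeywordTable()
keyword_table.insert("c1", "Vector search uses embeddings")
keyword_table.insert("c2", "Keyword search uses an inverted table")
ranked = keyword_table.retrieve("keyword table search")
```

Unlike the vector sketch, matching here is exact: a query term that never appeared in a chunk contributes nothing, which is the trade-off keyword indices make for not needing embeddings.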

### Multimodal Vector Store Index

Vector index supporting multimodal data including text, images, and other media types with cross-modal similarity search.

```python { .api }
class MultiModalVectorStoreIndex:
    """
    Multimodal vector store index for text and image data.

    Args:
        nodes: List of Node objects (text and image)
        image_vector_store: Vector store for image embeddings
        storage_context: Storage configuration
        **kwargs: Additional arguments
    """
    def __init__(
        self,
        nodes=None,
        image_vector_store=None,
        storage_context=None,
        **kwargs
    ): ...

    @classmethod
    def from_documents(cls, documents, **kwargs):
        """Create multimodal index from text and image documents."""

    def as_query_engine(self, **kwargs):
        """Create multimodal query engine."""

    def as_retriever(self, **kwargs):
        """Create multimodal retriever."""
```

### Composable Graph

Container for combining multiple indices with routing and composition strategies for complex retrieval scenarios.

```python { .api }
class ComposableGraph:
    """
    Composable graph for combining multiple indices.

    Args:
        all_indices: Dictionary mapping index IDs to index objects
        root_id: Root index identifier
        **kwargs: Additional arguments
    """
    def __init__(self, all_indices, root_id=None, **kwargs): ...

    @classmethod
    def from_indices(
        cls,
        root_index,
        children_indices,
        index_summaries=None,
        **kwargs
    ):
        """
        Create composable graph from indices.

        Args:
            root_index: Main index for routing
            children_indices: List of child indices
            index_summaries: Summaries for each child index
        """

    def as_query_engine(self, **kwargs):
        """Create query engine with index routing."""

    def as_retriever(self, **kwargs):
        """Create retriever with index selection."""
```
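
One way to picture routing over `index_summaries` is a function that picks the child whose summary best matches the query. The sketch below uses plain word overlap as the matching rule, which is an assumption made for brevity; the `route_query` function and the sample summaries are hypothetical, and a real graph would route through the root index rather than a word count.

```python
def route_query(query, index_summaries):
    """Return the ID of the child index whose summary shares the most words with the query."""
    query_terms = set(query.lower().split())

    def overlap(index_id):
        summary_terms = set(index_summaries[index_id].lower().split())
        return len(query_terms & summary_terms)

    return max(index_summaries, key=overlap)


# Hypothetical child indices, described only by their summaries.
index_summaries = {
    "finance": "quarterly revenue and earnings reports",
    "hr": "employee onboarding and leave policy",
}
chosen = route_query("what was the quarterly revenue", index_summaries)
```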

### Structured Data Indices

Specialized indices for structured data sources like SQL databases and pandas DataFrames.

```python { .api }
class SQLStructStoreIndex:
    """
    SQL-based structured store index.

    Args:
        nodes: List of structured nodes
        sql_database: SQL database connection
        table_name: Target table name
        **kwargs: Additional arguments
    """
    def __init__(
        self,
        nodes=None,
        sql_database=None,
        table_name=None,
        **kwargs
    ): ...

    @classmethod
    def from_documents(cls, documents, sql_database, **kwargs):
        """Create SQL index from documents."""

class PandasIndex:
    """
    Pandas DataFrame index for structured data analysis.

    Args:
        df: Pandas DataFrame
        **kwargs: Additional arguments
    """
    def __init__(self, df=None, **kwargs): ...

    def as_query_engine(self, **kwargs):
        """Create query engine for DataFrame operations."""
```

### Tree Index

Hierarchical tree-based index for structured retrieval, organizing nodes in a tree structure for efficient traversal.

```python { .api }
class TreeIndex:
    """
    Tree index for hierarchical retrieval.

    Args:
        nodes: List of Node objects to index
        storage_context: Storage configuration
        **kwargs: Additional arguments
    """
    def __init__(self, nodes=None, storage_context=None, **kwargs): ...

    @classmethod
    def from_documents(cls, documents, storage_context=None, **kwargs):
        """Create tree index from documents."""

    def as_query_engine(self, retriever_mode="select_leaf_embedding", **kwargs):
        """
        Create query engine from index.

        Args:
            retriever_mode: Tree traversal mode
        """

    def as_retriever(self, retriever_mode="select_leaf_embedding", **kwargs):
        """Create retriever from index."""
```
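
Tree traversal can be sketched as a greedy descent: at each level, pick the child that best matches the query until a leaf is reached. The example below is a conceptual illustration only. `TreeNode`, `score`, and `select_leaf` are hypothetical names, and word overlap stands in for whatever scoring a real traversal mode uses.

```python
from dataclasses import dataclass, field


@dataclass
class TreeNode:
    text: str
    children: list = field(default_factory=list)


def score(query, text):
    """Number of words shared by the query and a node's text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def select_leaf(root, query):
    """Greedy descent: follow the best-scoring child until a leaf is reached."""
    node = root
    while node.children:
        node = max(node.children, key=lambda child: score(query, child.text))
    return node


root = TreeNode("all docs", [
    TreeNode("animals cats dogs", [TreeNode("cats purr"), TreeNode("dogs bark")]),
    TreeNode("finance stocks bonds", [TreeNode("stocks rise"), TreeNode("bonds yield")]),
])
leaf = select_leaf(root, "why do dogs bark")
```

Only one branch per level is inspected, so the number of comparisons grows with tree depth rather than with the total number of leaves.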

### Keyword Table Index

Keyword-based index using various extraction strategies for term-based retrieval.

```python { .api }
class KeywordTableIndex:
    """Base class for keyword table indices."""
    def __init__(self, nodes=None, storage_context=None, **kwargs): ...

    @classmethod
    def from_documents(cls, documents, storage_context=None, **kwargs):
        """Create keyword index from documents."""

class SimpleKeywordTableIndex(KeywordTableIndex):
    """Simple keyword extraction and matching."""

class RAKEKeywordTableIndex(KeywordTableIndex):
    """RAKE algorithm-based keyword extraction."""
```

### Knowledge Graph Index

Graph-based index for representing entities and relationships, enabling complex relational queries.

```python { .api }
class KnowledgeGraphIndex:
    """
    Knowledge graph index for entity-relationship modeling.

    Args:
        nodes: List of Node objects to index
        storage_context: Storage configuration
        kg_triple_extract_template: Template for triple extraction
        **kwargs: Additional arguments
    """
    def __init__(self, nodes=None, storage_context=None, kg_triple_extract_template=None, **kwargs): ...

    @classmethod
    def from_documents(cls, documents, storage_context=None, **kwargs):
        """Create knowledge graph index from documents."""

    def as_query_engine(self, **kwargs):
        """Create query engine for graph queries."""

    def as_retriever(self, **kwargs):
        """Create retriever for graph-based retrieval."""
```

### Property Graph Index

Advanced graph index with entity and relationship properties for complex graph operations.

```python { .api }
class PropertyGraphIndex:
    """
    Property graph index with rich entity/relationship properties.

    Args:
        nodes: List of Node objects to index
        storage_context: Storage configuration
        **kwargs: Additional arguments
    """
    def __init__(self, nodes=None, storage_context=None, **kwargs): ...

    @classmethod
    def from_documents(cls, documents, storage_context=None, **kwargs):
        """Create property graph index from documents."""

    def as_query_engine(self, **kwargs):
        """Create query engine for property graph queries."""

    def as_retriever(self, **kwargs):
        """Create retriever for property graph retrieval."""
```

### Document Summary Index

Index that maintains document-level summaries for efficient high-level retrieval.

```python { .api }
class DocumentSummaryIndex:
    """
    Document summary index for document-level retrieval.

    Args:
        nodes: List of Node objects to index
        storage_context: Storage configuration
        response_synthesizer: Synthesizer for generating summaries
        **kwargs: Additional arguments
    """
    def __init__(self, nodes=None, storage_context=None, response_synthesizer=None, **kwargs): ...

    @classmethod
    def from_documents(cls, documents, storage_context=None, **kwargs):
        """Create document summary index from documents."""

    def as_query_engine(self, **kwargs):
        """Create query engine for document-level queries."""

    def as_retriever(self, **kwargs):
        """Create retriever for document-level retrieval."""
```
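
Document-level retrieval can be thought of as a two-step lookup: match the query against per-document summaries, then return the chunks of the winning document. The sketch below assumes the summaries are already written by hand and uses word overlap for matching; `ToyDocumentSummaryIndex` and its methods are hypothetical, whereas the class above would generate summaries through its `response_synthesizer`.

```python
class ToyDocumentSummaryIndex:
    """Hypothetical two-level index: doc summary -> doc chunks."""

    def __init__(self):
        self.summaries = {}  # doc_id -> summary text
        self.chunks = {}     # doc_id -> list of chunk texts

    def add_document(self, doc_id, summary, chunks):
        self.summaries[doc_id] = summary
        self.chunks[doc_id] = list(chunks)

    def retrieve(self, query):
        # Step 1: choose the document whose summary best matches the query.
        terms = set(query.lower().split())
        best_doc = max(
            self.summaries,
            key=lambda doc_id: len(terms & set(self.summaries[doc_id].lower().split())),
        )
        # Step 2: return every chunk that belongs to that document.
        return self.chunks[best_doc]


doc_index = ToyDocumentSummaryIndex()
doc_index.add_document("guide", "installation and setup guide",
                       ["Install with pip.", "Run the setup wizard."])
doc_index.add_document("faq", "billing and refund questions",
                       ["Refunds take 5 days."])
result = doc_index.retrieve("how do I get a refund")
```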

### Composable Graph

Combine multiple indices into a unified query interface for complex multi-index operations.

```python { .api }
class ComposableGraph:
    """
    Composable graph for combining multiple indices.

    Args:
        all_indices: Dictionary mapping index IDs to indices
        index_summaries: Dictionary mapping index IDs to summaries
        **kwargs: Additional arguments
    """
    def __init__(self, all_indices, index_summaries=None, **kwargs): ...

    def as_query_engine(self, **kwargs):
        """Create query engine for multi-index queries."""

    def as_retriever(self, **kwargs):
        """Create retriever for multi-index retrieval."""
```

## Index Loading and Storage

```python { .api }
def load_index_from_storage(storage_context, index_id=None, **kwargs):
    """
    Load index from storage.

    Args:
        storage_context: Storage context containing persisted index
        index_id: Optional index ID to load specific index

    Returns:
        BaseIndex: Loaded index
    """

def load_indices_from_storage(storage_context, index_ids=None, **kwargs):
    """
    Load multiple indices from storage.

    Args:
        storage_context: Storage context
        index_ids: List of index IDs to load

    Returns:
        dict: Dictionary mapping index IDs to loaded indices
    """

def load_graph_from_storage(storage_context, root_id=None, **kwargs):
    """
    Load composable graph from storage.

    Args:
        storage_context: Storage context
        root_id: Root node ID for the graph

    Returns:
        ComposableGraph: Loaded composable graph
    """
```
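
The role of `index_id` is easiest to see in a persist-and-reload round trip. The sketch below mimics that flow with a JSON file and hypothetical `persist` and `load_index` helpers. It assumes, for illustration, that omitting `index_id` is only unambiguous when storage holds a single index; it does not use the real storage context.

```python
import json
import os
import tempfile


def persist(indices, path):
    """Write a mapping of index_id -> index data to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(indices, fh)


def load_index(path, index_id=None):
    """Load one index; index_id may be omitted only if exactly one is stored."""
    with open(path, encoding="utf-8") as fh:
        indices = json.load(fh)
    if index_id is None:
        if len(indices) != 1:
            raise ValueError("index_id is required when storage holds several indices")
        return next(iter(indices.values()))
    return indices[index_id]


with tempfile.TemporaryDirectory() as tmp_dir:
    store_path = os.path.join(tmp_dir, "index_store.json")
    persist({"vec": {"type": "vector_store", "nodes": ["n1", "n2"]}}, store_path)
    loaded = load_index(store_path)
```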

## Types

```python { .api }
from enum import Enum

class IndexStructType(Enum):
    """Enumeration of available index types."""
    TREE = "tree"
    LIST = "list"
    KEYWORD_TABLE = "keyword_table"
    VECTOR_STORE = "vector_store"
    DOCUMENT_SUMMARY = "document_summary"
    KNOWLEDGE_GRAPH = "kg"
    PROPERTY_GRAPH = "property_graph"
```
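
Because the members carry string values, a persisted type tag can be mapped back to a member using standard `Enum` lookup by value. The block below restates the enum from above so that it runs on its own; the `"kg"` tag being read from stored metadata is an assumed scenario.

```python
from enum import Enum


class IndexStructType(Enum):
    """Copy of the enum documented above, repeated to keep the example self-contained."""
    TREE = "tree"
    LIST = "list"
    KEYWORD_TABLE = "keyword_table"
    VECTOR_STORE = "vector_store"
    DOCUMENT_SUMMARY = "document_summary"
    KNOWLEDGE_GRAPH = "kg"
    PROPERTY_GRAPH = "property_graph"


# Look up a member by its stored string value.
kind = IndexStructType("kg")

# Unknown values raise ValueError, which is useful for validating input.
try:
    IndexStructType("unknown")
    is_valid = True
except ValueError:
    is_valid = False
```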