High-performance library for approximate nearest neighbor search in high-dimensional vector spaces
npx @tessl/cli install tessl/pypi-ngt@2.3.00
# NGT

NGT (Neighborhood Graph and Tree) provides Python bindings for high-performance approximate nearest neighbor search in high-dimensional vector spaces. Built on top of a C++ library, it offers both legacy and modern interfaces for indexing and searching large-scale vector datasets with multiple distance functions and data types.

## Package Information

- **Package Name**: ngt
- **Language**: Python
- **Installation**: `pip install ngt`

## Core Imports

Modern interface (recommended):

```python
import ngtpy
```

Legacy interface:

```python
from ngt import base as ngt
```

Both interfaces:

```python
import ngt
import ngtpy
```

## Basic Usage

```python
import ngtpy
import random

# Create sample high-dimensional vectors
dim = 128
nb = 1000
vectors = [[random.random() for _ in range(dim)] for _ in range(nb)]
query = vectors[0]

# Create and populate index
ngtpy.create("my_index", dim, distance_type="L2", object_type="Float")
index = ngtpy.Index("my_index")
index.batch_insert(vectors)
index.save()

# Search for nearest neighbors
results = index.search(query, size=5, epsilon=0.1)
for i, (object_id, distance) in enumerate(results):
    print(f"{i}: ID={object_id}, Distance={distance}")
    original_vector = index.get_object(object_id)
    print(f"Original vector: {original_vector}")

index.close()
```

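Because the search is approximate, it can be useful to compare its results against an exact scan on a small sample. The following is a minimal sketch using only the standard library. `exact_neighbors` and `recall_at_k` are hypothetical helper names, not part of ngtpy, and the approximate IDs are assumed to be zero-based positions in the inserted list (matching the `zero_based_numbering=True` default listed in this document). The approximate result here is simulated so the snippet runs without an index.

```python
import math
import random


def exact_neighbors(vectors, query, k):
    """Return the positions of the k vectors closest to query under L2 distance."""
    order = sorted(range(len(vectors)), key=lambda i: math.dist(vectors[i], query))
    return order[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact neighbors that the approximate result also contains."""
    if not exact_ids:
        return 0.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


random.seed(0)
vectors = [[random.random() for _ in range(8)] for _ in range(200)]
query = vectors[0]

exact_ids = exact_neighbors(vectors, query, k=5)

# Stand-in for the IDs an approximate search would return:
# pretend it found four of the five true neighbors and one wrong object.
wrong_id = next(i for i in range(len(vectors)) if i not in exact_ids)
approx_ids = exact_ids[:4] + [wrong_id]

print(recall_at_k(approx_ids, exact_ids))  # 4 of 5 true neighbors found -> 0.8
```

In practice, `approx_ids` would be taken from the first element of each `(object_id, distance)` pair returned by `index.search`.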
## Architecture

NGT provides a layered architecture supporting multiple indexing approaches:

- **Graph-based Index (NGT)**: Standard approximate nearest neighbor search using neighborhood graphs
- **Quantized Graph Index (QG)**: Memory-efficient quantized vectors with maintained accuracy
- **Quantized Blob Index (QBG)**: Advanced quantization with blob storage for maximum compression
- **Dual Interface Design**: Modern pybind11 bindings (ngtpy) and legacy ctypes interface (ngt.base)

The library supports various distance functions (L1, L2, Cosine, Angular, Hamming, Jaccard, Inner Product) and data types (Float32, Float16, uint8) for different use cases in machine learning, computer vision, and recommendation systems.

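For intuition about some of the distance functions listed above, the sketch below gives textbook definitions in plain Python. These are illustrative only: the function names are hypothetical, and NGT's native implementations may differ in details such as normalization or how Hamming and Jaccard distances are computed on packed data.

```python
import math


def l1_distance(a, b):
    """Sum of absolute coordinate differences (Manhattan distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """One minus the cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def hamming_distance(a, b):
    """Number of differing bits between two equal-length byte sequences."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def jaccard_distance(a, b):
    """One minus |A intersect B| / |A union B| for two non-empty sets."""
    a, b = set(a), set(b)
    return 1.0 - len(a & b) / len(a | b)


print(l1_distance([0, 0], [3, 4]))
print(l2_distance([0, 0], [3, 4]))
print(cosine_distance([1, 0], [0, 1]))
print(hamming_distance(bytes([0b1010]), bytes([0b0110])))
print(jaccard_distance({1, 2, 3}, {2, 3, 4}))
```

Which function fits best depends on the data: for example, Hamming and Jaccard apply to binary or set-like features, while L2 and Cosine are typical for dense embeddings.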
## Capabilities

### Modern Index Interface

Primary high-performance interface using pybind11 bindings for standard vector indexing and search operations with full feature access.

```python { .api }
# Index creation and management
def create(path, dimension, edge_size_for_creation=10, edge_size_for_search=40,
           distance_type="L2", object_type="Float"): ...

class Index:
    def __init__(path, read_only=False, zero_based_numbering=True, log_disabled=False): ...
    def search(query, size=0, epsilon=-1.0, edge_size=-1, with_distance=True): ...
    def batch_insert(objects, num_threads=8, target_size_of_graph=0, debug=False): ...
    def insert(object, debug=False): ...
    def build_index(num_threads=8, target_size_of_graph=0): ...
```

[Modern Index Interface](./modern-index.md)

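As a complement to `batch_insert`, the signatures above also allow objects to be added one at a time and the graph to be built afterwards. The following is a minimal sketch under that reading; `index_incrementally` is a hypothetical helper name, the import is done lazily so the function can be defined without NGT installed, and the exact behavior of `insert` before `build_index` is an assumption rather than something this page specifies.

```python
def index_incrementally(path, vectors, query, k=3):
    """Insert vectors one by one, build the graph, and return the k nearest IDs."""
    import ngtpy  # lazy import: only needed when the function is actually called

    dimension = len(vectors[0])
    ngtpy.create(path, dimension, distance_type="L2", object_type="Float")
    index = ngtpy.Index(path)
    try:
        for vector in vectors:
            index.insert(vector)  # assumed to register the object without building
        index.build_index()       # build the graph over everything inserted so far
        index.save()
        # with_distance=False is assumed to return IDs only, not (id, distance) pairs
        return index.search(query, size=k, with_distance=False)
    finally:
        index.close()
```

A call such as `index_incrementally("my_index", vectors, vectors[0])` would then be expected to return a list of object IDs.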
### Quantized Indexes

Memory-efficient indexing using vector quantization for reduced storage while maintaining search accuracy.

```python { .api }
class QuantizedIndex:
    def __init__(path, max_no_of_edges=128, zero_based_numbering=True,
                 read_only=False, log_disabled=False): ...
    def search(query, size=0, epsilon=-1.0, result_expansion=-1.0, edge_size=-1): ...

class QuantizedBlobIndex:
    def __init__(path, max_no_of_edges=128, zero_based_numbering=True,
                 read_only=False, log_disabled=False, refinement=False): ...
    def search(query, size=0, epsilon=float('-inf')): ...
```

[Quantized Indexes](./quantized-indexes.md)

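The sketch below shows how the `QuantizedIndex` signature above might be used to query an index in read-only mode. It assumes a quantized index has already been built at `path` by separate tooling that this page does not cover. `search_quantized` is a hypothetical helper name, the default values for `epsilon` and `result_expansion` are arbitrary placeholders rather than recommendations, and the shape of the returned results is assumed to match the standard index.

```python
def search_quantized(path, query, k=10, epsilon=0.02, result_expansion=3.0):
    """Query an existing quantized graph index opened read-only."""
    import ngtpy  # lazy import: only needed when the function is actually called

    # Assumes the quantized index at `path` was created beforehand.
    index = ngtpy.QuantizedIndex(path, read_only=True)
    return index.search(
        query,
        size=k,
        epsilon=epsilon,
        result_expansion=result_expansion,  # placeholder value; tune per dataset
    )
```

Opening the index read-only is a conservative choice for query-only workloads, since no insertions are made here.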
### Index Optimization

Tools for improving index performance through graph structure optimization and parameter tuning.

```python { .api }
class Optimizer:
    def __init__(num_of_outgoings=-1, num_of_incomings=-1, num_of_queries=-1,
                 num_of_results=-1, log_disabled=False): ...
    def execute(in_path, out_path): ...
    def adjust_search_coefficients(path): ...
```

[Index Optimization](./optimization.md)

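Based on the `Optimizer` signature above, an optimization pass could be wrapped as in the following sketch. `optimize_index` is a hypothetical helper name, the edge counts are placeholder values chosen for illustration, and it is assumed that `execute` reads the index at `in_path` and writes an optimized copy to `out_path` without modifying the original.

```python
def optimize_index(in_path, out_path, outgoing=10, incoming=120):
    """Write a graph-optimized copy of the index at in_path to out_path."""
    import ngtpy  # lazy import: only needed when the function is actually called

    optimizer = ngtpy.Optimizer(
        num_of_outgoings=outgoing,  # placeholder; assumed outgoing edges per node
        num_of_incomings=incoming,  # placeholder; assumed incoming edges per node
        log_disabled=True,
    )
    optimizer.execute(in_path, out_path)
    return out_path
```

The optimized index at `out_path` would then be opened with `ngtpy.Index(out_path)` like any other index.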
### Legacy Interface

Ctypes-based interface providing backward compatibility with manual memory management and simplified API.

```python { .api }
class Index:
    def __init__(path): ...
    @staticmethod
    def create(path, dimension, edge_size_for_creation=10, edge_size_for_search=40,
               object_type="Float", distance_type="L2"): ...
    def search(query, k=20, epsilon=0.1): ...
    def insert(objects, num_threads=8): ...
```

[Legacy Interface](./legacy-interface.md)

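For comparison with the modern interface, the following sketch uses the legacy signatures listed above. Several points are assumptions rather than facts stated on this page: that `Index.create` returns an open `Index`, that the index exposes a `save()` method, and that `search` returns `ObjectDistance`-like objects with `id` and `distance` attributes. Depending on the installed version, the path may also need to be a `bytes` object. `legacy_search` is a hypothetical helper name.

```python
def legacy_search(path, vectors, query, k=5):
    """Create a legacy index, insert all vectors, and return (id, distance) pairs."""
    from ngt import base as ngt  # lazy import: only needed when called

    # Assumption: create() returns an open Index for the new path.
    index = ngt.Index.create(path, len(vectors[0]))
    index.insert(vectors)
    index.save()  # assumption: save() exists on the legacy Index
    results = index.search(query, k)
    # Convert result objects into plain tuples for easier downstream use.
    return [(hit.id, hit.distance) for hit in results]
```

Returning plain tuples keeps the calling code independent of whichever interface produced the results.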
## Types

```python { .api }
class BatchResults:
    """Container for batch search results"""
    def __init__(): ...
    def get(position): ...
    def get_ids(): ...
    def get_indexed_ids(): ...
    def get_indexed_distances(): ...
    def get_index(): ...
    def get_size(): ...

class ObjectDistance:
    """Search result structure (legacy interface)"""
    id: int
    distance: float

class NativeError(Exception):
    """Exception for native library errors"""

class APIError(Exception):
    """Exception for API usage errors"""
```