# PyMilvus Integration

The recommended way to use milvus-lite is through the pymilvus client, which activates milvus-lite automatically when it is given a local file URI. This approach exposes the Milvus API surface that milvus-lite supports, including collections, vector operations, indexing, and querying.

## Capabilities

### Client Initialization

Create a `MilvusClient` instance that uses milvus-lite automatically for local database files.

```python { .api }
from pymilvus import MilvusClient

# Local file URI activates milvus-lite automatically
client = MilvusClient(uri="./database.db")

# Alternative: specify full path
client = MilvusClient(uri="/path/to/database.db")
```

**Usage Example:**

```python
from pymilvus import MilvusClient

# Initialize client - this starts milvus-lite internally
client = MilvusClient("./my_vector_db.db")

# Client is ready for all standard Milvus operations
collection_exists = client.has_collection("test_collection")
```
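
It can help to check up front whether a given URI will select milvus-lite or a remote server. The sketch below is a hypothetical helper, not part of pymilvus. It assumes that a URI without a network scheme and ending in `.db` is treated as a local milvus-lite file; the exact rule may differ between pymilvus versions.

```python
from urllib.parse import urlparse

# Schemes assumed to point at a remote (or socket-based) Milvus server.
NETWORK_SCHEMES = {"http", "https", "tcp", "grpc", "unix"}


def looks_like_local_db(uri: str) -> bool:
    """Hypothetical helper: guess whether `uri` names a local milvus-lite file.

    Assumption: local files have no network scheme and end in ".db".
    """
    scheme = urlparse(uri).scheme.lower()
    return scheme not in NETWORK_SCHEMES and uri.endswith(".db")
```

For example, `looks_like_local_db("./database.db")` is true under this assumption, while `looks_like_local_db("http://localhost:19530")` is false.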

### Collection Management

Manage the full collection lifecycle: creation, existence checks, listing, description, statistics, and deletion.

```python { .api }
# Collection creation (quick setup from a vector dimension)
client.create_collection(
    collection_name: str,
    dimension: int,
    primary_field_name: str = "id",
    id_type: str = "int",
    vector_field_name: str = "vector",
    metric_type: str = "COSINE",
    auto_id: bool = False,
    timeout: Optional[float] = None,
    **kwargs
) -> None

# Collection existence check
client.has_collection(collection_name: str, timeout: Optional[float] = None) -> bool

# Collection deletion
client.drop_collection(collection_name: str, timeout: Optional[float] = None) -> None

# List all collections
client.list_collections(timeout: Optional[float] = None) -> List[str]

# Describe collection (schema and properties)
client.describe_collection(collection_name: str, timeout: Optional[float] = None) -> Dict[str, Any]

# Get collection statistics (row count)
client.get_collection_stats(collection_name: str, timeout: Optional[float] = None) -> Dict[str, Any]
```

**Usage Example:**

```python
# Create collection with 384-dimensional vectors
client.create_collection(
    collection_name="embeddings",
    dimension=384,
    metric_type="COSINE",
    auto_id=True
)

# Check if collection exists
if client.has_collection("embeddings"):
    # describe_collection returns the schema; entity counts come from get_collection_stats
    stats = client.get_collection_stats("embeddings")
    print(f"Collection has {stats['row_count']} entities")
```
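
Setup scripts are often run more than once, so it is common to create a collection only when it is missing. The following is a minimal sketch of that pattern using the `has_collection` and `create_collection` calls shown above. The helper name `ensure_collection` is hypothetical, and `client` is assumed to be a `MilvusClient` (or any object exposing the same two methods).

```python
from typing import Any


def ensure_collection(
    client: Any, name: str, dimension: int, metric_type: str = "COSINE"
) -> bool:
    """Create `name` only if it does not exist; return True when it was created."""
    if client.has_collection(name):
        return False
    client.create_collection(
        collection_name=name,
        dimension=dimension,
        metric_type=metric_type,
    )
    return True
```

Calling `ensure_collection(client, "embeddings", 384)` twice in a row should create the collection the first time and do nothing the second time.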

### Data Operations

Insert, upsert, delete, and query vector data, with support for batch operations and metadata filtering.

```python { .api }
# Insert data
client.insert(
    collection_name: str,
    data: List[Dict[str, Any]],
    partition_name: Optional[str] = None,
    timeout: Optional[float] = None
) -> Dict[str, Any]

# Upsert data (insert or update if exists)
client.upsert(
    collection_name: str,
    data: List[Dict[str, Any]],
    partition_name: Optional[str] = None,
    timeout: Optional[float] = None
) -> Dict[str, Any]

# Delete data by primary key or by filter expression
client.delete(
    collection_name: str,
    ids: Optional[Union[list, str, int]] = None,
    timeout: Optional[float] = None,
    filter: Optional[str] = None,
    partition_name: Optional[str] = None
) -> Dict[str, Any]

# Query data by filter
client.query(
    collection_name: str,
    filter: str,
    output_fields: Optional[List[str]] = None,
    partition_names: Optional[List[str]] = None,
    timeout: Optional[float] = None
) -> List[Dict[str, Any]]
```

**Usage Example:**

```python
# Insert vector data with metadata.
# Vectors must match the collection dimension (384 above); the values here are
# placeholders. "id" is omitted because the collection uses auto_id=True.
data = [
    {"vector": [0.1] * 384, "category": "document", "title": "Sample Doc"},
    {"vector": [0.4] * 384, "category": "image", "title": "Sample Image"}
]

result = client.insert(collection_name="embeddings", data=data)
print(f"Inserted {result['insert_count']} entities")

# Query with filter
results = client.query(
    collection_name="embeddings",
    filter='category == "document"',
    output_fields=["id", "title", "category"]
)
```
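
For larger datasets, inserting everything in one call can be heavy on memory, so a common approach is to split the rows into batches. The sketch below shows one way to do that on top of `client.insert`. The helper names `chunked` and `insert_in_batches` are hypothetical, and the default batch size of 1000 is an arbitrary illustrative value, not a documented recommendation.

```python
from typing import Any, Dict, Iterator, List


def chunked(rows: List[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of `rows` with at most `size` items each."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def insert_in_batches(
    client: Any,
    collection_name: str,
    rows: List[Dict[str, Any]],
    batch_size: int = 1000,  # illustrative default, tune for your data
) -> int:
    """Insert `rows` batch by batch and return the total reported insert count."""
    total = 0
    for batch in chunked(rows, batch_size):
        result = client.insert(collection_name=collection_name, data=batch)
        total += result["insert_count"]
    return total
```

The return value sums the `insert_count` reported for each batch, which makes it easy to compare against `len(rows)` after a load.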

### Vector Search

Vector similarity search with support for several distance metrics, metadata filtering, and result limits.

```python { .api }
# Vector similarity search
client.search(
    collection_name: str,
    data: List[List[float]],
    filter: Optional[str] = None,
    limit: int = 10,
    output_fields: Optional[List[str]] = None,
    search_params: Optional[Dict[str, Any]] = None,
    partition_names: Optional[List[str]] = None,
    timeout: Optional[float] = None
) -> List[List[Dict[str, Any]]]

# Hybrid search (multiple vector fields)
# reqs holds one AnnSearchRequest per vector field; ranker merges their results
client.hybrid_search(
    collection_name: str,
    reqs: List[AnnSearchRequest],
    ranker: BaseRanker,
    limit: int = 10,
    partition_names: Optional[List[str]] = None,
    output_fields: Optional[List[str]] = None,
    timeout: Optional[float] = None
) -> List[List[Dict[str, Any]]]
```

**Usage Example:**

```python
# Single vector search
query_vector = [0.15] * 384  # Query embedding; must match the collection dimension
results = client.search(
    collection_name="embeddings",
    data=[query_vector],
    filter='category == "document"',
    limit=5,
    output_fields=["id", "title", "category"]
)

# Process results: one list of hits per query vector
for hits in results:
    for hit in hits:
        print(f"ID: {hit['id']}, Score: {hit['distance']}, Title: {hit['entity']['title']}")
```
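
Search returns one list of hits per query vector, which is convenient for iteration but awkward for tabular processing. The sketch below flattens that nested shape into rows. It assumes each hit exposes the `id`, `distance`, and `entity` keys used in the example above; `flatten_hits` is a hypothetical helper name.

```python
from typing import Any, Dict, List, Tuple

Row = Tuple[int, Any, float, Dict[str, Any]]


def flatten_hits(results: List[List[Dict[str, Any]]]) -> List[Row]:
    """Turn nested search results into (query_index, id, distance, entity) rows."""
    rows: List[Row] = []
    for query_index, hits in enumerate(results):
        for hit in hits:
            rows.append((query_index, hit["id"], hit["distance"], hit["entity"]))
    return rows
```

The `query_index` column records which input vector produced each hit, so results from a multi-vector search stay distinguishable after flattening.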

### Index Management

Create and manage vector indexes to improve search performance, with support for different index types and parameters.

```python { .api }
# Build an IndexParams object; add entries with index_params.add_index(...)
client.prepare_index_params() -> IndexParams

# Create the indexes described by index_params
client.create_index(
    collection_name: str,
    index_params: IndexParams,
    timeout: Optional[float] = None
) -> None

# Drop index
client.drop_index(
    collection_name: str,
    index_name: str,
    timeout: Optional[float] = None
) -> None

# List indexes
client.list_indexes(
    collection_name: str,
    timeout: Optional[float] = None
) -> List[str]

# Describe index
client.describe_index(
    collection_name: str,
    index_name: str,
    timeout: Optional[float] = None
) -> Dict[str, Any]
```

**Usage Example:**

```python
# Create IVF_FLAT index for better performance on larger datasets.
# Note: a collection created with `dimension=...` already has an automatic
# index on its vector field; release the collection and drop that index
# before creating a different one.
index_params = client.prepare_index_params()
index_params.add_index(
    field_name="vector",
    index_name="vector_index",
    index_type="IVF_FLAT",
    metric_type="COSINE",
    params={"nlist": 128}
)

client.create_index(
    collection_name="embeddings",
    index_params=index_params
)

# Check index information
index_info = client.describe_index(
    collection_name="embeddings",
    index_name="vector_index"
)
print(f"Index type: {index_info['index_type']}")
```
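
Changing the index on a collection that already has one usually takes several steps in a fixed order. The sketch below shows one plausible sequence: release the collection, drop the existing indexes, create the new one, and load the collection again. This ordering is an assumption about how Milvus expects index changes to happen, and milvus-lite may behave differently depending on its version. `rebuild_index` is a hypothetical helper, and `index_params` stands for whatever value `create_index` accepts in your pymilvus version.

```python
from typing import Any


def rebuild_index(client: Any, collection_name: str, index_params: Any) -> None:
    """Replace every existing index on a collection with `index_params`.

    Assumed order: release -> drop old indexes -> create -> load.
    """
    # A loaded collection generally cannot have its index dropped.
    client.release_collection(collection_name=collection_name)
    for index_name in client.list_indexes(collection_name=collection_name):
        client.drop_index(collection_name=collection_name, index_name=index_name)
    client.create_index(collection_name=collection_name, index_params=index_params)
    # Load again so the collection is searchable with the new index.
    client.load_collection(collection_name=collection_name)
```

Searches against the collection will be unavailable between the release and the final load, so this is best run during setup or maintenance rather than while serving queries.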

## Connection Management

```python { .api }
# The client manages the connection lifecycle for milvus-lite:
# no explicit connect call is needed.

# For local URIs the client uses a file-based connection,
# which is set up when MilvusClient is constructed.

# client.close() releases the connection explicitly once the client is no longer needed.
```
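
Explicit cleanup is optional, but scripts that open several database files may prefer a deterministic release. The sketch below wraps client creation in a context manager. It assumes the client object exposes a `close()` method, as `MilvusClient` does in current pymilvus releases; `open_client` and its `factory` parameter are hypothetical names introduced for illustration.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def open_client(factory: Callable[[str], Any], uri: str) -> Iterator[Any]:
    """Create a client with `factory(uri)` and always close it on exit."""
    client = factory(uri)
    try:
        yield client
    finally:
        # Runs even if the body of the `with` block raises.
        client.close()
```

With pymilvus installed, this would be used as `with open_client(MilvusClient, "./database.db") as client: ...`, passing the class itself as the factory.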

## Supported Features

- **Vector Types**: Dense vectors (float32), sparse vectors, binary vectors, bfloat16 vectors
- **Metadata**: JSON, integers, floats, strings, arrays
- **Filtering**: Rich expression language for metadata filtering
- **Indexing**: FLAT and IVF_FLAT index types (version dependent)
- **Batch Operations**: Efficient bulk insert, upsert, and delete operations
- **Multi-vector**: Multiple vector fields per collection
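
Filter expressions are plain strings, so building them by hand invites quoting mistakes. The sketch below shows a few small builders for the equality, membership, and conjunction forms of the expression language. The helper names `eq`, `one_of`, and `all_of` are hypothetical, and the sketch assumes JSON-style quoting of string values is acceptable to the filter parser, which matches the `category == "document"` form used earlier.

```python
import json
from typing import Any, Iterable


def eq(field: str, value: Any) -> str:
    """Build `field == value`, quoting strings with double quotes."""
    return f"{field} == {json.dumps(value, ensure_ascii=False)}"


def one_of(field: str, values: Iterable[Any]) -> str:
    """Build `field in [v1, v2, ...]`."""
    return f"{field} in {json.dumps(list(values), ensure_ascii=False)}"


def all_of(*clauses: str) -> str:
    """Join clauses with `and`, parenthesizing each one."""
    return " and ".join(f"({clause})" for clause in clauses)
```

The resulting strings can be passed as the `filter` argument of `query`, `search`, or `delete`.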
263
264
## Limitations
265
266
- No partition support (milvus-lite limitation)
267
- No user authentication/RBAC
268
- No collection aliases
269
- Limited to ~1 million vectors for optimal performance
270
- Fewer index types compared to full Milvus deployment