# Command Line Tools

Data export and migration utilities for moving data from milvus-lite to other Milvus deployments. The command line interface provides tools for collection dumping, data format conversion, and bulk data operations.

## Capabilities

### Installation

The CLI tools are included with milvus-lite, but the data export functionality requires additional dependencies.

```bash { .api }
# Install pymilvus with the bulk writer dependencies needed for export
pip install -U "pymilvus[bulk_writer]"

# Verify CLI is available
milvus-lite --help
```

### Dump Command

Export collection data from a milvus-lite database to JSON files for migration to other Milvus deployments.

```bash { .api }
milvus-lite dump -d DB_FILE -c COLLECTION -p PATH

# Required arguments:
#   -d, --db-file DB_FILE          milvus lite database file path
#   -c, --collection COLLECTION    collection name to dump
#   -p, --path PATH                output directory for dump files

# Optional arguments:
#   -h, --help                     show help message and exit
```

**Usage Examples:**

```bash
# Basic collection dump
milvus-lite dump -d ./my_vectors.db -c embeddings -p ./export_data

# Dump with full paths
milvus-lite dump --db-file /home/user/data/vectors.db \
                 --collection user_profiles \
                 --path /tmp/migration_data

# Export multiple collections (run command for each)
# The parent directory of each output path must already exist
mkdir -p ./exports
milvus-lite dump -d ./app.db -c collection1 -p ./exports/collection1
milvus-lite dump -d ./app.db -c collection2 -p ./exports/collection2
```
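
When many collections need exporting, the per-collection command can be generated instead of typed by hand. The following is a minimal sketch, assuming the `milvus-lite` executable is on `PATH`; the helper names `build_dump_commands` and `dump_all` are hypothetical and not part of milvus-lite.

```python
import subprocess
from pathlib import Path


def build_dump_commands(db_file, collections, export_root):
    """Return one `milvus-lite dump` argument list per collection (hypothetical helper)."""
    root = Path(export_root)
    return [
        ["milvus-lite", "dump", "-d", str(db_file), "-c", name, "-p", str(root / name)]
        for name in collections
    ]


def dump_all(db_file, collections, export_root):
    """Run the dump command once per collection, stopping at the first failure."""
    # The dump command expects the parent of each output path to exist.
    Path(export_root).mkdir(parents=True, exist_ok=True)
    for cmd in build_dump_commands(db_file, collections, export_root):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands rather than running them, so this is safe to try anywhere.
    for cmd in build_dump_commands("./app.db", ["collection1", "collection2"], "./exports"):
        print(" ".join(cmd))
```

Separating command construction from execution keeps the argument list easy to inspect before anything touches the database file.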

### Data Export Process

The dump command exports every entity in a collection, covering all supported vector types and scalar metadata.

**Export Features:**
- **Complete data export**: All entities including vectors and metadata
- **Vector type support**: Dense, sparse, binary, and bfloat16 vectors
- **Metadata preservation**: All scalar fields and JSON data
- **Progress tracking**: Real-time progress bars for large collections
- **Format conversion**: Automatic conversion of specialized vector types

**Export Process:**
1. Validates database file and collection existence
2. Analyzes collection schema and data types
3. Creates output directory structure
4. Exports data in batches with progress tracking
5. Converts vector formats for compatibility
6. Generates JSON files suitable for bulk import

**Usage Example:**

```python
# Programmatic access to dump functionality
from milvus_lite.cmdline import dump_collection

try:
    dump_collection(
        db_file="./my_database.db",
        collection_name="embeddings",
        path="./export_directory"
    )
    print("Export completed successfully")
except RuntimeError as e:
    print(f"Export failed: {e}")
```

### Vector Type Conversion

Specialized vector types are converted automatically during export so that the output is compatible with import tools.

```python { .api }
import numpy as np

def bfloat16_to_float32(byte_data: bytes) -> np.ndarray:
    """
    Convert bfloat16 byte data to float32 numpy array.

    Parameters:
    - byte_data (bytes): Raw bfloat16 vector data

    Returns:
    - np.ndarray: Converted float32 array
    """

def binary_to_int_list(packed_bytes: bytes) -> np.ndarray:
    """
    Convert packed binary vectors to integer list representation.

    Parameters:
    - packed_bytes (bytes): Packed binary vector data

    Returns:
    - np.ndarray: Unpacked binary vector as integer array
    """
```

The dump process applies these conversion functions automatically.
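
To illustrate what such conversions involve, here is a minimal sketch using NumPy. It is not the milvus-lite implementation: the function names are hypothetical, and it assumes little-endian bfloat16 bytes and most-significant-bit-first packing for binary vectors.

```python
import numpy as np


def bfloat16_bytes_to_float32(byte_data: bytes) -> np.ndarray:
    """Widen raw bfloat16 values (assumed little-endian) to float32."""
    # bfloat16 is the upper 16 bits of an IEEE 754 float32, so shifting each
    # 16-bit pattern into the high half of a 32-bit word recovers the value.
    halves = np.frombuffer(byte_data, dtype="<u2").astype(np.uint32)
    return (halves << 16).view(np.float32)


def packed_bits_to_ints(packed_bytes: bytes) -> np.ndarray:
    """Unpack a bit-packed binary vector into an array of 0/1 integers."""
    # np.unpackbits emits the most significant bit of each byte first.
    return np.unpackbits(np.frombuffer(packed_bytes, dtype=np.uint8))


if __name__ == "__main__":
    # 0x3F80 is bfloat16 for 1.0 and 0xC000 is bfloat16 for -2.0.
    print(bfloat16_bytes_to_float32(b"\x80\x3f\x00\xc0"))
    print(packed_bits_to_ints(b"\xa0"))
```

Widening bfloat16 to float32 is lossless, which is why a plain float list in JSON can carry the values without rounding.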

### JSON Encoding

A custom JSON encoder handles Milvus-specific data types during export.

```python { .api }
import json

class MilvusEncoder(json.JSONEncoder):
    """
    JSON encoder for Milvus data types.

    Handles numpy arrays, float types, and other Milvus-specific
    data structures for proper JSON serialization.
    """

    def default(self, obj):
        """
        Convert Milvus objects to JSON-serializable format.

        Supports:
        - numpy.ndarray -> list
        - numpy.float32/float16 -> float
        - Other standard JSON types
        """
```
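
The general pattern behind such an encoder is to override `json.JSONEncoder.default`, which the `json` module calls for any object it cannot serialize on its own. The sketch below shows that pattern with a hypothetical `NumpyAwareEncoder`; it is an illustration under that assumption, not the `MilvusEncoder` source.

```python
import json

import numpy as np


class NumpyAwareEncoder(json.JSONEncoder):
    """Hypothetical encoder that converts numpy values json cannot serialize."""

    def default(self, obj):
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        # np.floating covers float16 and float32 (and other numpy float widths).
        if isinstance(obj, np.floating):
            return float(obj)
        # Defer to the base class, which raises TypeError for unknown types.
        return super().default(obj)


record = {
    "id": 1,
    "vector": np.array([0.5, 0.25], dtype=np.float32),
    "score": np.float16(1.5),
}
encoded = json.dumps(record, cls=NumpyAwareEncoder)
print(encoded)
```

Passing the class through `cls=` keeps the conversion logic in one place instead of converting each field before serialization.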

### Data Migration Workflow

A complete workflow for migrating data from milvus-lite to other Milvus deployments.

**Step 1: Export from Milvus Lite**

```bash
# Export collection data
milvus-lite dump -d ./source.db -c my_collection -p ./migration_data

# This creates JSON files in ./migration_data/ directory
```

**Step 2: Import to Target Milvus**

For **Zilliz Cloud** (managed Milvus):
- Use the [Data Import](https://docs.zilliz.com/docs/data-import) feature
- Upload the exported JSON files through the web interface
- Configure collection schema to match exported data

For **Self-hosted Milvus**:
- Use the [Bulk Insert](https://milvus.io/docs/import-data.md) API
- Configure bulk insert job with exported JSON files
- Monitor import progress through Milvus client

**Step 3: Verify Migration**

```python
# Verify data after migration
from pymilvus import MilvusClient

# Connect to target Milvus instance
target_client = MilvusClient(uri="http://target-milvus:19530")

# Check collection exists and has expected data
if target_client.has_collection("my_collection"):
    # get_collection_stats reports the row count;
    # describe_collection returns schema details, not entity counts
    stats = target_client.get_collection_stats("my_collection")
    print(f"Migrated collection has {stats['row_count']} entities")

    # Sample some data to verify
    results = target_client.query(
        collection_name="my_collection",
        filter="",  # No filter, get any records
        limit=5,
        output_fields=["*"]
    )
    print(f"Sample migrated data: {results}")
```

### Error Handling

The dump command validates its inputs before exporting and reports failures as Python exceptions.

```python { .api }
# Common errors and exceptions:
# - RuntimeError: Database file not found, collection doesn't exist,
#   or the dump path's parent directory doesn't exist
# - FileNotFoundError: Invalid export path
# - PermissionError: Insufficient file system permissions
# - ValueError: Invalid arguments or collection schema issues
```

**Error Examples:**

```bash
# Database file doesn't exist
$ milvus-lite dump -d ./missing.db -c test -p ./out
# RuntimeError: db_file: ./missing.db not exists

# Collection doesn't exist
$ milvus-lite dump -d ./valid.db -c missing_collection -p ./out
# RuntimeError: Collection: missing_collection not exists

# Invalid export path
$ milvus-lite dump -d ./valid.db -c test -p /invalid/path
# RuntimeError: dump path(/invalid/path)'s parent dir not exists
```
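
The two file-system failures above can be caught before invoking the dump at all. The following is a minimal sketch of such a pre-flight check; `preflight_errors` is a hypothetical helper, and it does not check whether the collection exists, since that requires opening the database.

```python
from pathlib import Path


def preflight_errors(db_file, path):
    """Return a list of file-system problems that would stop a dump early."""
    problems = []
    if not Path(db_file).exists():
        problems.append(f"database file not found: {db_file}")
    # The dump path itself may be new, but its parent directory must exist.
    parent = Path(path).absolute().parent
    if not parent.exists():
        problems.append(f"parent directory of dump path not found: {parent}")
    return problems


if __name__ == "__main__":
    for problem in preflight_errors("./missing.db", "/invalid/path"):
        print(problem)
```

Collecting every problem in a list, rather than raising on the first, lets a migration script report all path issues in one pass.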

### Performance Considerations

The dump command handles large collections by exporting data in batches instead of loading the whole collection into memory.

**Performance Features:**
- **Streaming export**: Processes data in batches to manage memory usage
- **Progress tracking**: Real-time progress bars for long-running exports
- **Segment size**: Output is written in 512MB segments by default
- **Parallel processing**: Efficient data conversion and serialization

**Large Collection Handling:**

```python
# The dump process automatically handles large collections
# by using query iterators and batch processing

# Default configuration optimized for performance:
# - Segment size: 512MB
# - File type: JSON
# - Batch processing with progress tracking
# - Memory-efficient streaming
```
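
The idea of streaming rows into size-bounded segments can be sketched in a few lines. This is an illustration of the technique only, not the writer milvus-lite uses: `write_segments`, the file naming, and the JSON array layout are all assumptions, and the size check is approximate because it ignores separators and brackets.

```python
import json
from pathlib import Path


def write_segments(records, out_dir, max_bytes):
    """Write an iterable of dict records as numbered JSON files of bounded size."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written, buffer, size = [], [], 0

    def flush():
        nonlocal buffer, size
        if buffer:
            target = out / f"{len(written) + 1}.json"
            target.write_text("[" + ",".join(buffer) + "]", encoding="utf-8")
            written.append(target)
            buffer, size = [], 0

    for record in records:
        # json.dumps escapes non-ASCII by default, so len(row) equals its byte size.
        row = json.dumps(record)
        # Start a new segment when adding this row would exceed the limit.
        if buffer and size + len(row) > max_bytes:
            flush()
        buffer.append(row)
        size += len(row)
    flush()
    return written
```

Because `records` can be a generator, only one segment's worth of rows is held in memory at a time, which is the property that matters for large collections.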

### Integration with Migration Tools

The exported JSON files are designed for compatibility with various Milvus import tools.

**Supported Import Destinations:**
- **Zilliz Cloud**: Native data import interface
- **Milvus Standalone**: Bulk insert API
- **Milvus Distributed**: Bulk insert API
- **Custom applications**: Standard JSON format for processing

**Export Format:**
- JSON files with entity records
- Compatible with Milvus bulk insert specifications
- Preserves all vector types and metadata
- Includes collection schema information