tessl/pypi-milvus-lite

A lightweight version of Milvus wrapped with Python for vector similarity search in AI applications

Command Line Tools

Data export and migration utilities for moving data between milvus-lite and other Milvus deployments. The command line interface provides tools for collection dumping, data format conversion, and bulk data operations.

Capabilities

Installation

The CLI tools are included with milvus-lite but require additional dependencies for data export functionality.

# Install milvus-lite with bulk writer dependencies
pip install -U "pymilvus[bulk_writer]"

# Verify CLI is available
milvus-lite --help

Dump Command

Export collection data from a milvus-lite database to JSON files for migration to other Milvus deployments.

milvus-lite dump -d DB_FILE -c COLLECTION -p PATH

# Required arguments:
# -d, --db-file DB_FILE        milvus lite database file path
# -c, --collection COLLECTION  collection name to dump
# -p, --path PATH              output directory for dump files

# Optional arguments:
# -h, --help                   show help message and exit

Usage Examples:

# Basic collection dump
milvus-lite dump -d ./my_vectors.db -c embeddings -p ./export_data

# Dump with full paths
milvus-lite dump --db-file /home/user/data/vectors.db \
                 --collection user_profiles \
                 --path /tmp/migration_data

# Export multiple collections (run command for each)
milvus-lite dump -d ./app.db -c collection1 -p ./exports/collection1
milvus-lite dump -d ./app.db -c collection2 -p ./exports/collection2

Data Export Process

The dump command performs comprehensive data export with support for various vector types and metadata formats.

Export Features:

  • Complete data export: All entities including vectors and metadata
  • Vector type support: Dense, sparse, binary, and bfloat16 vectors
  • Metadata preservation: All scalar fields and JSON data
  • Progress tracking: Real-time progress bars for large collections
  • Format conversion: Automatic conversion of specialized vector types

Export Process:

  1. Validates database file and collection existence
  2. Analyzes collection schema and data types
  3. Creates output directory structure
  4. Exports data in batches with progress tracking
  5. Converts vector formats for compatibility
  6. Generates JSON files suitable for bulk import

Usage Example:

# Programmatic access to dump functionality
from milvus_lite.cmdline import dump_collection

try:
    dump_collection(
        db_file="./my_database.db",
        collection_name="embeddings", 
        path="./export_directory"
    )
    print("Export completed successfully")
except RuntimeError as e:
    print(f"Export failed: {e}")

Vector Type Conversion

Automatic conversion of specialized vector types during export for compatibility with import tools.

def bfloat16_to_float32(byte_data: bytes) -> np.ndarray:
    """
    Convert bfloat16 byte data to float32 numpy array.
    
    Parameters:
    - byte_data (bytes): Raw bfloat16 vector data
    
    Returns:
    - np.ndarray: Converted float32 array
    """

def binary_to_int_list(packed_bytes: bytes) -> np.ndarray:
    """
    Convert packed binary vectors to integer list representation.
    
    Parameters:  
    - packed_bytes (bytes): Packed binary vector data
    
    Returns:
    - np.ndarray: Unpacked binary vector as integer array
    """

These conversion functions are automatically applied during the dump process to ensure exported data is compatible with bulk import tools.
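The bfloat16 conversion can be verified with a small, self-contained round trip. This sketch assumes the standard bfloat16 layout (the upper 16 bits of an IEEE-754 float32); the function here is an illustrative stand-in, not milvus-lite's internal code:

```python
import numpy as np

def bfloat16_to_float32(byte_data: bytes) -> np.ndarray:
    # bfloat16 is the upper 16 bits of a float32: widen and shift to restore
    as_uint16 = np.frombuffer(byte_data, dtype=np.uint16)
    return (as_uint16.astype(np.uint32) << 16).view(np.float32)

# Round trip: truncate float32 values to bfloat16 bytes, then convert back.
# These values are exactly representable in bfloat16, so they survive unchanged.
original = np.array([1.0, -2.5, 0.15625], dtype=np.float32)
bf16_bytes = (original.view(np.uint32) >> 16).astype(np.uint16).tobytes()
restored = bfloat16_to_float32(bf16_bytes)
print(restored.tolist())  # [1.0, -2.5, 0.15625]
```

Values that need the low 16 mantissa bits of a float32 would be truncated by this round trip; the exported data simply carries whatever precision bfloat16 stored.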

JSON Encoding

Custom JSON encoder for handling Milvus-specific data types during export.

class MilvusEncoder(json.JSONEncoder):
    """
    JSON encoder for Milvus data types.
    
    Handles numpy arrays, float types, and other Milvus-specific
    data structures for proper JSON serialization.
    """
    
    def default(self, obj):
        """
        Convert Milvus objects to JSON-serializable format.
        
        Supports:
        - numpy.ndarray -> list
        - numpy.float32/float16 -> float
        - Other standard JSON types
        """

Data Migration Workflow

Complete workflow for migrating data from milvus-lite to other Milvus deployments.

Step 1: Export from Milvus Lite

# Export collection data
milvus-lite dump -d ./source.db -c my_collection -p ./migration_data

# This creates JSON files in ./migration_data/ directory

Step 2: Import to Target Milvus

For Zilliz Cloud (managed Milvus):

  • Use the Data Import feature
  • Upload the exported JSON files through the web interface
  • Configure collection schema to match exported data

For Self-hosted Milvus:

  • Use Bulk Insert API
  • Configure bulk insert job with exported JSON files
  • Monitor import progress through Milvus client

Step 3: Verify Migration

# Verify data after migration
from pymilvus import MilvusClient

# Connect to target Milvus instance
target_client = MilvusClient(uri="http://target-milvus:19530")

# Check collection exists and has expected data
if target_client.has_collection("my_collection"):
    stats = target_client.get_collection_stats("my_collection")
    print(f"Migrated collection has {stats['row_count']} entities")
    
    # Sample some data to verify
    results = target_client.query(
        collection_name="my_collection",
        filter="",  # No filter, get any records
        limit=5,
        output_fields=["*"]
    )
    print(f"Sample migrated data: {results}")

Error Handling

The CLI tools provide comprehensive error handling and validation.

# Common errors and exceptions:
# - RuntimeError: Database file not found, collection doesn't exist
# - FileNotFoundError: Invalid export path or permissions
# - PermissionError: Insufficient file system permissions
# - ValueError: Invalid arguments or collection schema issues

Error Examples:

# Database file doesn't exist
$ milvus-lite dump -d ./missing.db -c test -p ./out
# RuntimeError: db_file: ./missing.db not exists

# Collection doesn't exist  
$ milvus-lite dump -d ./valid.db -c missing_collection -p ./out
# RuntimeError: Collection: missing_collection not exists

# Invalid export path
$ milvus-lite dump -d ./valid.db -c test -p /invalid/path
# RuntimeError: dump path(/invalid/path)'s parent dir not exists
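The up-front checks behind these messages can be sketched in plain Python. This is an illustrative stand-in that mirrors the CLI's validation and error strings, not its actual source; the function name is hypothetical:

```python
import os

def validate_dump_args(db_file: str, path: str) -> None:
    """Fail fast, mirroring the CLI's error messages, before any data is read."""
    if not os.path.isfile(db_file):
        raise RuntimeError(f"db_file: {db_file} not exists")
    parent = os.path.dirname(os.path.abspath(path))
    if not os.path.isdir(parent):
        raise RuntimeError(f"dump path({path})'s parent dir not exists")

try:
    validate_dump_args("./missing.db", "./out")
except RuntimeError as e:
    print(e)  # db_file: ./missing.db not exists
```

Validating both arguments before opening the database means a bad invocation fails immediately rather than partway through a long export.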

Performance Considerations

The dump command is optimized for large collections with configurable batch sizes and memory management.

Performance Features:

  • Streaming export: Processes data in batches to manage memory usage
  • Progress tracking: Real-time progress bars for long-running exports
  • Configurable batch size: Default 512MB segments for optimal performance
  • Parallel processing: Efficient data conversion and serialization

Large Collection Handling:

# The dump process automatically handles large collections
# by using query iterators and batch processing

# Default configuration optimized for performance:
# - Segment size: 512MB  
# - File type: JSON
# - Batch processing with progress tracking
# - Memory-efficient streaming
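The streaming idea can be sketched generically (illustrative Python, not milvus-lite's internal code): a generator yields fixed-size batches so that only one batch is held in memory at a time, regardless of collection size.

```python
from typing import Iterable, Iterator, List

def iter_batches(rows: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield rows in fixed-size batches; only one batch is buffered at a time."""
    batch: List[dict] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch

# 5 streamed rows with batch_size=2 -> batches of at most 2 records each
rows = ({"id": i, "vector": [0.0, 1.0]} for i in range(5))
batch_sizes = [len(b) for b in iter_batches(rows, 2)]
print(batch_sizes)  # [2, 2, 1]
```

In the real dump process the cap is a byte budget (the 512MB segment size) rather than a row count, but the memory behaviour is the same: peak usage is bounded by one batch, not the whole collection.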

Integration with Migration Tools

The exported JSON files are designed for compatibility with various Milvus import tools.

Supported Import Destinations:

  • Zilliz Cloud: Native data import interface
  • Milvus Standalone: Bulk insert API
  • Milvus Distributed: Bulk insert API
  • Custom applications: Standard JSON format for processing

Export Format:

  • JSON files with entity records
  • Compatible with Milvus bulk insert specifications
  • Preserves all vector types and metadata
  • Includes collection schema information
