
tessl/maven-dev-langchain4j--langchain4j-milvus

Milvus embedding store integration for LangChain4j


Advanced Topics

Advanced configuration and optimization for MilvusEmbeddingStore.

Index Tuning

HNSW Index Parameters

Optimal for: Fast approximate search, 100K-10M vectors

import io.milvus.param.IndexType;
import io.milvus.param.MetricType;
import java.util.Map;

MilvusEmbeddingStore.builder()
    .indexType(IndexType.HNSW)
    .metricType(MetricType.COSINE)
    .extraParameters(Map.of(
        "efConstruction", 200,  // Build-time candidate list: 64-512; higher = better recall, slower build
        "M", 16                 // Max connections per node: 4-64; higher = better recall, more memory (Milvus expects an uppercase "M" for HNSW)
    ))
    .build();

Parameter Guidelines:

  • efConstruction: Start with 200, increase for better accuracy
  • M: Start with 16, increase for better recall
  • Memory usage: roughly (dimension × 4 + M × 8) bytes per vector (the float32 vector plus its graph links)
  • Build time increases with higher values
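
To make the memory guideline concrete, here is a minimal sketch of a back-of-the-envelope estimate. It uses a commonly cited approximation for HNSW over float32 vectors (the raw vector plus about 2 × M layer-0 links of 4 bytes each). The class and method names are hypothetical, and the result is an estimate only, not a figure reported by Milvus.

```java
/** Rough HNSW memory estimate; an approximation, not a Milvus-reported figure. */
final class HnswMemoryEstimate {

    /**
     * Approximate bytes per vector: the float32 vector (dimension * 4)
     * plus layer-0 graph links (about 2 * M neighbours, 4 bytes each).
     * Upper graph layers and allocator overhead are ignored.
     */
    static long bytesPerVector(int dimension, int m) {
        if (dimension <= 0 || m <= 0) {
            throw new IllegalArgumentException("dimension and M must be positive");
        }
        return (long) dimension * 4 + (long) m * 2 * 4;
    }

    /** Approximate bytes for the whole collection. */
    static long totalBytes(long numVectors, int dimension, int m) {
        return numVectors * bytesPerVector(dimension, m);
    }

    public static void main(String[] args) {
        long total = totalBytes(1_000_000L, 384, 16);
        System.out.printf("Estimated index size: %.2f GB%n", total / 1e9);
    }
}
```

Under this approximation the raw vectors dominate at typical embedding dimensions, so raising M costs far less memory than raising the dimension.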

Use Cases:

  • High QPS requirements
  • Low-latency search
  • Sufficient memory available

IVF_FLAT Index Parameters

Optimal for: Balanced performance, 100K-5M vectors

MilvusEmbeddingStore.builder()
    .indexType(IndexType.IVF_FLAT)
    .extraParameters(Map.of(
        "nlist", 1024  // Clusters: 1-65536, rule: sqrt(N) to 4*sqrt(N)
    ))
    .build();

Parameter Guidelines:

  • nlist: ~sqrt(num_vectors) to 4×sqrt(num_vectors)
  • 100K vectors → nlist: 300-1200
  • 1M vectors → nlist: 1000-4000
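
The sqrt(N) rule above can be sketched as a small helper that returns both ends of the suggested range, clamped to the 1-65536 limit. The class and method names are hypothetical; treat the output as a starting point for benchmarking rather than a tuned value.

```java
/** Suggested nlist range from the sqrt(N) to 4*sqrt(N) rule of thumb. */
final class NlistHeuristic {
    static final int MIN_NLIST = 1;
    static final int MAX_NLIST = 65_536;

    /** Keeps a candidate value inside the allowed nlist range. */
    static int clamp(long value) {
        return (int) Math.max(MIN_NLIST, Math.min(MAX_NLIST, value));
    }

    /** Lower end of the suggested range: sqrt(N), rounded. */
    static int lower(long numVectors) {
        return clamp(Math.round(Math.sqrt((double) numVectors)));
    }

    /** Upper end of the suggested range: 4 * sqrt(N), rounded. */
    static int upper(long numVectors) {
        return clamp(Math.round(4 * Math.sqrt((double) numVectors)));
    }

    public static void main(String[] args) {
        long n = 1_000_000L;
        System.out.println("nlist range for " + n + " vectors: "
            + lower(n) + " to " + upper(n));
    }
}
```

The exact values for 100K vectors come out slightly above the rounded figures in the list, which is expected since those are approximations.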

Use Cases:

  • Balanced accuracy/speed
  • Moderate dataset sizes
  • Lower memory footprint than HNSW

IVF_PQ Index Parameters

Optimal for: Large datasets, memory efficiency

MilvusEmbeddingStore.builder()
    .indexType(IndexType.IVF_PQ)
    .extraParameters(Map.of(
        "m", 8,         // Subquantizers: divides dimension, must divide evenly
        "nlist", 2048   // Clusters
    ))
    .build();

Parameter Guidelines:

  • m: dimension must be divisible by m (e.g., 384/8=48)
  • Lower m = more compression = less accuracy
  • Compressed size: ~m bytes per vector (assuming 8-bit codes), versus dimension × 4 bytes uncompressed
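
The divisibility rule and the size of a product-quantized code can be checked before building the index. The sketch below assumes 8 bits per sub-quantizer code (an assumption about the `nbits` setting; adjust if yours differs) and uses hypothetical class and method names.

```java
/** Sanity checks and size arithmetic for IVF_PQ parameters. */
final class PqSizing {

    /** Returns the sub-vector dimension, or throws if m does not divide the dimension. */
    static int subVectorDimension(int dimension, int m) {
        if (dimension <= 0 || m <= 0 || dimension % m != 0) {
            throw new IllegalArgumentException(
                "dimension " + dimension + " is not divisible by m=" + m);
        }
        return dimension / m;
    }

    /** PQ code size per vector: m codes of nbits bits each, rounded up to whole bytes. */
    static long codeBytes(int m, int nbits) {
        return ((long) m * nbits + 7) / 8;
    }

    /** Uncompressed float32 size per vector. */
    static long rawBytes(int dimension) {
        return (long) dimension * 4;
    }

    public static void main(String[] args) {
        int dimension = 384;
        int m = 8;
        System.out.println("sub-vector dimension: " + subVectorDimension(dimension, m));
        System.out.println("compression ratio: " + rawBytes(dimension) / codeBytes(m, 8) + "x");
    }
}
```

Centroid tables and IVF bookkeeping add some overhead on top of the per-vector codes, so real memory use will be somewhat higher than this arithmetic suggests.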

Use Cases:

  • More than 5M vectors
  • Limited memory
  • Acceptable accuracy trade-off

DISKANN Index

Optimal for: Very large datasets (> 10M vectors)

MilvusEmbeddingStore.builder()
    .indexType(IndexType.DISKANN)
    .build();

Use Cases:

  • Massive scale (billions of vectors)
  • Disk-based storage
  • Cost optimization for large collections
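
The size ranges in the four index sections above can be folded into one rough decision helper. This is only a sketch of those guidelines: the class name is hypothetical, the thresholds are the ones quoted above, and falling back to FLAT (brute-force search) below 100K vectors is an assumption that this page does not state.

```java
/** Maps collection size and memory pressure to an index type name, per the guidelines above. */
final class IndexSuggestion {

    static String suggest(long numVectors, boolean memoryConstrained) {
        if (numVectors > 10_000_000L) {
            return "DISKANN";           // very large collections: disk-based index
        }
        if (numVectors > 5_000_000L && memoryConstrained) {
            return "IVF_PQ";            // large and memory-limited: compressed vectors
        }
        if (numVectors < 100_000L) {
            return "FLAT";              // assumption: small collections can use exact search
        }
        return memoryConstrained ? "IVF_FLAT" : "HNSW";
    }

    public static void main(String[] args) {
        System.out.println(suggest(1_000_000L, false));
        System.out.println(suggest(8_000_000L, true));
    }
}
```

The returned names match constants of `io.milvus.param.IndexType`, so `IndexType.valueOf(name)` should convert them for `.indexType(...)`. Benchmark on your own data before committing to a choice.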

Consistency Level Trade-offs

EVENTUALLY (Default)

import io.milvus.common.clientenum.ConsistencyLevelEnum;

.consistencyLevel(ConsistencyLevelEnum.EVENTUALLY)

Characteristics:

  • Highest throughput
  • Lowest latency
  • New data may not be immediately visible
  • Suitable for: Analytics, batch processing, eventually consistent systems

Visibility Delay:

  • Typically < 1 second
  • Can be longer under high load

BOUNDED

.consistencyLevel(ConsistencyLevelEnum.BOUNDED)

Characteristics:

  • Bounded staleness (configurable lag)
  • Good throughput
  • Required for complex filter-based deletions
  • Suitable for: Most production use cases

Use When:

  • Need filter-based deletions
  • Can tolerate small lag
  • Balance between performance and consistency

SESSION

.consistencyLevel(ConsistencyLevelEnum.SESSION)

Characteristics:

  • Per-session consistency guarantees
  • Read your own writes within session
  • Moderate performance impact

Use When:

  • Single user/session workflows
  • Need read-after-write within session

STRONG

.consistencyLevel(ConsistencyLevelEnum.STRONG)

Characteristics:

  • Immediate consistency
  • Reads always reflect latest writes
  • Higher latency
  • Required for: Immediate visibility of additions/deletions

Use When:

  • Real-time search requirements
  • Immediate deletion visibility needed
  • Freshness of results matters more than performance

Performance Impact:

  • 2-5× higher search latency
  • Lower throughput
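
The "Use When" lists for the four levels can be summarized as a precedence rule in which the strictest requirement wins. The sketch below uses a local enum that mirrors the `ConsistencyLevelEnum` constant names so it runs without the Milvus SDK; the class and parameter names are hypothetical.

```java
/** Picks the weakest consistency level that still satisfies the stated requirements. */
final class ConsistencyChoice {

    /** Mirrors the constant names of io.milvus.common.clientenum.ConsistencyLevelEnum. */
    enum Level { EVENTUALLY, BOUNDED, SESSION, STRONG }

    static Level choose(boolean immediateVisibility,
                        boolean readYourWrites,
                        boolean filterBasedDeletes) {
        if (immediateVisibility) {
            return Level.STRONG;       // reads must reflect the latest writes
        }
        if (readYourWrites) {
            return Level.SESSION;      // read-after-write within one session
        }
        if (filterBasedDeletes) {
            return Level.BOUNDED;      // small, bounded lag is acceptable
        }
        return Level.EVENTUALLY;       // default: highest throughput
    }

    public static void main(String[] args) {
        System.out.println(choose(false, false, true));
    }
}
```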

Connection Pooling

Using Custom Milvus Client

import dev.langchain4j.store.embedding.milvus.MilvusEmbeddingStore;
import io.milvus.client.MilvusServiceClient;
import io.milvus.param.ConnectParam;

// Create shared client
ConnectParam connectParam = ConnectParam.newBuilder()
    .withHost("localhost")
    .withPort(19530)
    .withAuthorization("user", "password")
    .build();

MilvusServiceClient sharedClient = new MilvusServiceClient(connectParam);

// Reuse across multiple stores
MilvusEmbeddingStore store1 = MilvusEmbeddingStore.builder()
    .milvusClient(sharedClient)
    .collectionName("collection1")
    .dimension(384)
    .build();

MilvusEmbeddingStore store2 = MilvusEmbeddingStore.builder()
    .milvusClient(sharedClient)
    .collectionName("collection2")
    .dimension(768)
    .build();

Benefits:

  • Connection reuse
  • Reduced overhead
  • Better resource utilization

Custom Field Names

Customize schema for compatibility or preferences:

MilvusEmbeddingStore.builder()
    .host("localhost")
    .collectionName("custom_schema")
    .dimension(384)
    .idFieldName("document_id")
    .textFieldName("content")
    .metadataFieldName("properties")
    .vectorFieldName("embedding_vector")
    .build();

Use Cases:

  • Existing schema compatibility
  • Naming conventions
  • Multi-tenant deployments

Multi-Database Support

Milvus supports multiple databases for tenant isolation:

// Tenant 1
MilvusEmbeddingStore tenant1Store = MilvusEmbeddingStore.builder()
    .host("localhost")
    .databaseName("tenant_1")
    .collectionName("embeddings")
    .dimension(384)
    .build();

// Tenant 2
MilvusEmbeddingStore tenant2Store = MilvusEmbeddingStore.builder()
    .host("localhost")
    .databaseName("tenant_2")
    .collectionName("embeddings")
    .dimension(384)
    .build();

Benefits:

  • Logical isolation
  • Separate access control
  • Independent scaling

Performance Optimization

For Bulk Insertion

MilvusEmbeddingStore.builder()
    .host("localhost")
    .collectionName("bulk_insert")
    .dimension(384)
    .autoFlushOnInsert(false)              // Critical for bulk
    .consistencyLevel(ConsistencyLevelEnum.EVENTUALLY)
    .indexType(IndexType.IVF_FLAT)         // Build after bulk insert
    .build();

Pattern:

  1. Disable auto-flush
  2. Use eventual consistency
  3. Batch with addAll()
  4. Consider simple index initially

For High QPS Search

MilvusEmbeddingStore.builder()
    .host("localhost")
    .collectionName("high_qps")
    .dimension(384)
    .indexType(IndexType.HNSW)
    .metricType(MetricType.COSINE)
    .consistencyLevel(ConsistencyLevelEnum.EVENTUALLY)
    .retrieveEmbeddingsOnSearch(false)     // Critical for QPS
    .extraParameters(Map.of("efConstruction", 512, "M", 32))  // Milvus expects an uppercase "M" for HNSW
    .build();

Key Settings:

  • HNSW index with tuned parameters
  • Disable embedding retrieval
  • Eventual consistency
  • Pre-warm queries after restart

For Real-Time Updates

MilvusEmbeddingStore.builder()
    .host("localhost")
    .collectionName("realtime")
    .dimension(384)
    .indexType(IndexType.IVF_FLAT)
    .consistencyLevel(ConsistencyLevelEnum.STRONG)  // Critical
    .autoFlushOnInsert(true)                        // For durability
    .build();

Trade-offs:

  • Strong consistency = higher latency
  • Auto-flush = lower throughput
  • Simpler index = faster updates

For Memory-Constrained Environments

MilvusEmbeddingStore.builder()
    .host("localhost")
    .collectionName("memory_efficient")
    .dimension(384)
    .indexType(IndexType.IVF_PQ)
    .extraParameters(Map.of("m", 8, "nlist", 2048))
    .retrieveEmbeddingsOnSearch(false)
    .build();

Metric Type Selection

COSINE

.metricType(MetricType.COSINE)

Best For:

  • Normalized embeddings
  • Text similarity
  • When direction matters more than magnitude

Score Range: 0 to 1 (1 = most similar)

L2 (Euclidean)

.metricType(MetricType.L2)

Best For:

  • Non-normalized vectors
  • Absolute distance matters
  • Image embeddings

Score Range: 0 to ∞ (0 = most similar, lower = better)

IP (Inner Product)

.metricType(MetricType.IP)

Best For:

  • Normalized embeddings
  • Faster than COSINE
  • When magnitude is already normalized

Score Range: -1 to 1 for unit-length vectors (1 = most similar, higher = better); unbounded otherwise

Performance: IP is ~10-20% faster than COSINE for normalized vectors
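
The relationship between the three metrics is easier to see in plain arithmetic. The sketch below computes each one with the JDK only and shows that inner product equals cosine similarity once both vectors are normalized to unit length. The class name is hypothetical, and these are the textbook definitions rather than the values Milvus returns (an engine may, for example, skip the square root for L2, which does not change the ranking).

```java
/** Textbook definitions of the three similarity metrics for float vectors. */
final class VectorMetrics {

    static double dot(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    static double norm(float[] a) {
        return Math.sqrt(dot(a, a));
    }

    /** Cosine similarity: direction only, magnitude is divided out. */
    static double cosine(float[] a, float[] b) {
        return dot(a, b) / (norm(a) * norm(b));
    }

    /** Euclidean (L2) distance: 0 means identical, lower is closer. */
    static double l2(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = (double) a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Returns a unit-length copy of the vector. */
    static float[] normalize(float[] a) {
        double n = norm(a);
        float[] out = new float[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = (float) (a[i] / n);
        }
        return out;
    }

    public static void main(String[] args) {
        float[] a = {3f, 4f};
        float[] b = {4f, 3f};
        System.out.println("cosine        = " + cosine(a, b));
        System.out.println("IP (unit len) = " + dot(normalize(a), normalize(b)));
        System.out.println("L2            = " + l2(a, b));
    }
}
```

This is why IP is a reasonable substitute for COSINE only when the embedding model already produces normalized vectors, or when you normalize them before insertion and at query time.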

Search Optimization

Metadata Filter Efficiency

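import dev.langchain4j.store.embedding.filter.Filter;

import static dev.langchain4j.store.embedding.filter.Filter.and;
import static dev.langchain4j.store.embedding.filter.Filter.or;
import static dev.langchain4j.store.embedding.filter.MetadataFilterBuilder.metadataKey;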

// Efficient: Single equality
Filter efficientFilter = metadataKey("category").isEqualTo("tech");

// Less efficient: Complex nested conditions
Filter complexFilter = and(
    or(
        metadataKey("cat1").isEqualTo("a"),
        metadataKey("cat2").isEqualTo("b")
    ),
    metadataKey("year").isGreaterThan(2020)
);

Best Practices:

  • Use simple equality when possible
  • Index commonly filtered fields
  • Avoid deeply nested conditions
  • Pre-filter by indexed fields

Result Limit Tuning

// Request only what you need
EmbeddingSearchRequest.builder()
    .queryEmbedding(queryEmbedding)
    .maxResults(10)  // Not 100 if you only need 10
    .build();

Impact: Every returned result is deserialized and held in memory, so cost grows with maxResults

Score Threshold Optimization

// Use threshold to reduce result processing
EmbeddingSearchRequest.builder()
    .queryEmbedding(queryEmbedding)
    .maxResults(100)
    .minScore(0.75)  // Filter at query time
    .build();
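
Choosing a `minScore` is easier when you know what scale the score is on. LangChain4j's `RelevanceScore.fromCosineSimilarity` maps a cosine similarity in [-1, 1] to a relevance score in [0, 1] as (cosine + 1) / 2. Assuming the store reports COSINE scores on that scale (consistent with the 0 to 1 range listed under Metric Type Selection, but worth verifying against your version), the conversion can be sketched as follows. The class name is hypothetical.

```java
/** Converts between raw cosine similarity and a 0-to-1 relevance score. */
final class RelevanceScale {

    /** Maps cosine similarity in [-1, 1] to relevance in [0, 1]. */
    static double fromCosine(double cosine) {
        return (cosine + 1) / 2;
    }

    /** Inverse mapping: relevance in [0, 1] back to cosine similarity in [-1, 1]. */
    static double toCosine(double relevance) {
        return relevance * 2 - 1;
    }

    public static void main(String[] args) {
        System.out.println("minScore 0.75 corresponds to cosine " + toCosine(0.75));
    }
}
```

Under that assumption, a threshold of 0.75 only requires a raw cosine similarity of 0.5, which is looser than it may first appear.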

Monitoring and Diagnostics

Collection Stats

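// Use the Milvus SDK directly for operations the store does not expose
import io.milvus.client.MilvusServiceClient;
import io.milvus.grpc.GetCollectionStatisticsResponse;
import io.milvus.param.R;
import io.milvus.param.collection.GetCollectionStatisticsParam;
import io.milvus.response.GetCollStatResponseWrapper;

// Reuse the client that was passed to the store via .milvusClient(...)
MilvusServiceClient client = sharedClient;

GetCollectionStatisticsParam param = GetCollectionStatisticsParam.newBuilder()
    .withCollectionName("my_collection")
    .build();

// SDK calls return an R<T> wrapper; check the status before reading the payload
R<GetCollectionStatisticsResponse> response = client.getCollectionStatistics(param);
if (response.getStatus() != R.Status.Success.getCode()) {
    throw new IllegalStateException("getCollectionStatistics failed: " + response.getMessage());
}
long rowCount = new GetCollStatResponseWrapper(response.getData()).getRowCount();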

Production Checklist

Configuration:

  • ✓ Appropriate index type for scale
  • ✓ Tuned index parameters
  • ✓ Consistency level matches requirements
  • ✓ Auto-flush disabled for batch operations
  • ✓ Retrieve embeddings only if needed

Monitoring:

  • ✓ Query latency tracking
  • ✓ Insertion throughput metrics
  • ✓ Collection size monitoring
  • ✓ Memory usage tracking

Resilience:

  • ✓ Connection retry logic
  • ✓ Error handling on operations
  • ✓ Validation before bulk operations
  • ✓ Backup strategy for collections
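
For the retry item in the checklist, a minimal sketch of retry with exponential backoff is shown below. It is generic JDK code, not part of LangChain4j or the Milvus SDK; the class name, attempt count, and delays are hypothetical and should be tuned to your environment.

```java
import java.util.function.Supplier;

/** Retries an operation with exponentially growing delays between attempts. */
final class Retry {

    static <T> T withBackoff(Supplier<T> operation, int maxAttempts, long initialDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                last = e;                       // remember the failure and maybe retry
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to retry", ie);
                }
                delay *= 2;                     // double the wait before the next attempt
            }
        }
        throw last;                             // all attempts failed
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = withBackoff(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("transient failure");
            }
            return "connected";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In practice you would wrap a store call such as a search or an insert in the supplier, and retry only on exceptions you know to be transient rather than on every `RuntimeException`.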

Security:

  • ✓ Authentication configured
  • ✓ Network isolation
  • ✓ Access control per database/collection
  • ✓ Credentials from environment/secrets
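
For the last security item, one way to keep credentials out of source code is to read them from the environment and fail fast when they are missing. The sketch below takes the environment as a `Map` so it can be tested; the class name and the variable names `MILVUS_USERNAME` and `MILVUS_PASSWORD` are hypothetical.

```java
import java.util.Map;

/** Reads required settings from an environment map and fails fast if one is absent. */
final class Secrets {

    static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing required environment variable: " + name);
        }
        return value;
    }
}
```

At startup this might be called as `Secrets.require(System.getenv(), "MILVUS_USERNAME")`, with the results passed to `withAuthorization(...)` on the `ConnectParam` builder in place of literal strings.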

Advanced Use Cases

Hybrid Search (Vector + Scalar)

// Vector similarity with metadata filtering
Filter metadataFilter = metadataKey("department").isEqualTo("engineering");

EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
    .queryEmbedding(queryEmbedding)
    .maxResults(20)
    .filter(metadataFilter)  // Combines vector + scalar filtering
    .build();

Multi-Vector Search

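import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Search with several query embeddings and keep each ID's best score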
List<Embedding> queryEmbeddings = List.of(embedding1, embedding2, embedding3);
Map<String, Double> aggregatedScores = new HashMap<>();

for (Embedding queryEmb : queryEmbeddings) {
    EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
        .queryEmbedding(queryEmb)
        .maxResults(50)
        .build();
    
    for (EmbeddingMatch<TextSegment> match : store.search(request).matches()) {
        aggregatedScores.merge(
            match.embeddingId(),
            match.score(),
            (existing, newScore) -> Math.max(existing, newScore)
        );
    }
}

// Get top results from aggregated scores
List<String> topIds = aggregatedScores.entrySet().stream()
    .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
    .limit(10)
    .map(Map.Entry::getKey)
    .collect(Collectors.toList());

Collection Aliasing

// Use the Milvus SDK for advanced collection management
import io.milvus.param.alias.CreateAliasParam;

// Point alias to active collection version
CreateAliasParam param = CreateAliasParam.newBuilder()
    .withCollectionName("embeddings_v2")
    .withAlias("embeddings_active")
    .build();

client.createAlias(param);

// Use alias in store
MilvusEmbeddingStore store = MilvusEmbeddingStore.builder()
    .host("localhost")
    .collectionName("embeddings_active")  // Uses alias
    .dimension(384)
    .build();

Install with Tessl CLI

npx tessl i tessl/maven-dev-langchain4j--langchain4j-milvus@1.11.0
