tessl/maven-dev-langchain4j--langchain4j-milvus

Milvus embedding store integration for LangChain4j

Removing Embeddings

Methods for deleting embeddings from Milvus collections.

Requirements: Milvus 2.3.x or newer

Methods

Remove by IDs

void removeAll(Collection<String> ids);

Removes the embeddings with the specified IDs.

Throws: IllegalArgumentException if ids is null or empty

import java.util.Arrays;
import java.util.List;

List<String> idsToRemove = Arrays.asList("id1", "id2", "id3");
store.removeAll(idsToRemove);

Remove by Filter

void removeAll(Filter filter);

Removes all embeddings whose metadata matches the given filter.

Throws: IllegalArgumentException if filter is null

Requires: Consistency level BOUNDED when the filter is a complex boolean expression

import dev.langchain4j.store.embedding.filter.Filter;
import static dev.langchain4j.store.embedding.filter.MetadataFilterBuilder.metadataKey;

Filter filter = metadataKey("status").isEqualTo("archived");
store.removeAll(filter);

Remove All

void removeAll();

Removes all embeddings from the collection.

store.removeAll();

Filter-Based Removal

By Category

Filter filter = metadataKey("category").isEqualTo("temporary");
store.removeAll(filter);

By Year Cutoff

Filter oldFilter = metadataKey("year").isLessThan(2020);
store.removeAll(oldFilter);

By Multiple Criteria

import static dev.langchain4j.store.embedding.filter.Filter.and;

Filter old = metadataKey("year").isLessThan(2021);
Filter draft = metadataKey("status").isEqualTo("draft");

store.removeAll(and(old, draft));

By Value List

import static dev.langchain4j.store.embedding.filter.Filter.or;

Filter year2019 = metadataKey("year").isEqualTo(2019);
Filter year2020 = metadataKey("year").isEqualTo(2020);

store.removeAll(or(year2019, year2020));

ID-Based Removal

After Processing

List<String> processedIds = new ArrayList<>();

for (EmbeddingMatch<TextSegment> match : results.matches()) {
    processItem(match);
    processedIds.add(match.embeddingId());
}

store.removeAll(processedIds);

Specific Set

Set<String> idsToDelete = new HashSet<>();
idsToDelete.add("doc-123");
idsToDelete.add("doc-456");

store.removeAll(idsToDelete);

After Search

// Find and remove duplicates
EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
    .queryEmbedding(referenceEmbedding)
    .maxResults(100)
    .minScore(0.99)  // Very high similarity
    .build();

List<String> duplicateIds = store.search(request).matches().stream()
    .skip(1)  // Keep first
    .map(EmbeddingMatch::embeddingId)
    .collect(Collectors.toList());

if (!duplicateIds.isEmpty()) {
    store.removeAll(duplicateIds);
}

Use Case Patterns

Cleanup Old Data

import static dev.langchain4j.store.embedding.filter.MetadataFilterBuilder.metadataKey;
import java.time.LocalDate;

int cutoffYear = LocalDate.now().getYear() - 2;
Filter oldFilter = metadataKey("year").isLessThan(cutoffYear);

store.removeAll(oldFilter);
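
The cutoff arithmetic is easier to check when the current date is passed in rather than read from the system clock. The following is a minimal sketch in plain Java; `RetentionCutoff` and `cutoffYear` are hypothetical names introduced for illustration and are not part of LangChain4j.

```java
import java.time.LocalDate;

/** Hypothetical helper: derives the cutoff year from an explicit date. */
final class RetentionCutoff {

    private RetentionCutoff() {
    }

    /** Years strictly below the returned value fall outside the retention window. */
    static int cutoffYear(LocalDate today, int retentionYears) {
        if (retentionYears < 0) {
            throw new IllegalArgumentException("retentionYears must not be negative");
        }
        return today.getYear() - retentionYears;
    }
}
```

With a date of 2025-06-01 and two retention years the cutoff is 2023, so `isLessThan(cutoffYear)` would match entries whose `year` is 2022 or earlier.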

Remove by Source

Filter sourceFilter = metadataKey("source").isEqualTo("deprecated_api_v1");
store.removeAll(sourceFilter);

Selective Deletion

import static dev.langchain4j.store.embedding.filter.Filter.and;

Filter lowConfidence = metadataKey("confidence").isLessThan(0.5);
Filter temporary = metadataKey("type").isEqualTo("temporary");

store.removeAll(and(lowConfidence, temporary));

Batch ID Deletion

List<String> expiredIds = externalSystem.getExpiredIds();

if (!expiredIds.isEmpty()) {
    store.removeAll(expiredIds);
}
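
The check above handles an empty list, but a null list would fail at the check itself. Since the ID-based method rejects both null and empty input, a small guard can cover both cases in one place. The following is a minimal sketch; `SafeRemoval` and `removeIfAny` are hypothetical names, and the store call is passed in as a `Consumer` so the snippet compiles without LangChain4j on the classpath.

```java
import java.util.Collection;
import java.util.function.Consumer;

/** Hypothetical guard around an ID-based removal call. */
final class SafeRemoval {

    private SafeRemoval() {
    }

    /**
     * Invokes the remover only when there is at least one ID.
     * Returns true if the remover was called, false if the call was skipped.
     */
    static boolean removeIfAny(Collection<String> ids, Consumer<Collection<String>> remover) {
        if (ids == null || ids.isEmpty()) {
            return false; // Nothing to delete, so skip the call entirely
        }
        remover.accept(ids);
        return true;
    }
}
```

A call site might look like `SafeRemoval.removeIfAny(expiredIds, ids -> store.removeAll(ids))`.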

Clear and Rebuild

store.removeAll();  // Clear all

// Rebuild
List<Embedding> newEmbeddings = generateNewEmbeddings();
store.addAll(newEmbeddings);

Critical Warnings

1. Consistency Level Requirement

Problem: Deleted entities can still be retrieved immediately after deletion

Cause: The consistency level is set lower than STRONG

Solution:

import dev.langchain4j.store.embedding.milvus.MilvusEmbeddingStore;
import io.milvus.common.clientenum.ConsistencyLevelEnum;

MilvusEmbeddingStore store = MilvusEmbeddingStore.builder()
    .host("localhost")
    .collectionName("my_collection")
    .dimension(384)
    .consistencyLevel(ConsistencyLevelEnum.STRONG)  // Makes deletions visible immediately
    .build();

store.removeAll(idsToDelete);
// Deleted entities are no longer returned by subsequent reads

2. Complex Filter Requirements

Problem: Deletion by a complex filter expression fails

Cause: Deleting by complex boolean expressions is supported only at the BOUNDED consistency level

Solution:

MilvusEmbeddingStore store = MilvusEmbeddingStore.builder()
    .host("localhost")
    .collectionName("my_collection")
    .dimension(384)
    .consistencyLevel(ConsistencyLevelEnum.BOUNDED)  // Required for complex filters
    .build();

Filter filter = metadataKey("status").isEqualTo("archived");
store.removeAll(filter);

3. Non-Atomic Operations

Warning: Filter-based deletions are NOT atomic. If the operation fails partway through, some of the matching data may already have been deleted.

try {
    Filter complexFilter = buildComplexFilter();
    store.removeAll(complexFilter);
} catch (Exception e) {
    // Some entities may have been deleted
    logPartialDeletionError(e);
}
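
Because a failed filter deletion may leave only part of the matching data in place, one possible response is to re-issue the same deletion a bounded number of times; repeating the same filter targets only the entities that still match. The sketch below shows that idea in plain Java. `RetryingDelete` and `runWithRetry` are hypothetical names, and the deletion is passed in as a `Runnable` so the snippet does not depend on any store API.

```java
/** Hypothetical helper: re-runs a deletion a bounded number of times. */
final class RetryingDelete {

    private RetryingDelete() {
    }

    /**
     * Runs the deletion until it succeeds or maxAttempts is reached.
     * Returns the number of attempts used; rethrows the last failure otherwise.
     */
    static int runWithRetry(Runnable deletion, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                deletion.run();
                return attempt;
            } catch (RuntimeException e) {
                // Earlier attempts may already have removed part of the data
                lastFailure = e;
            }
        }
        throw lastFailure;
    }
}
```

A call site might be `RetryingDelete.runWithRetry(() -> store.removeAll(complexFilter), 3)`. The sketch retries immediately; a real deployment would likely add a delay between attempts and log each failure.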

4. Performance Impact

Warning: Frequent deletion operations degrade system performance.

Best Practices:

  • Batch deletions when possible
  • Schedule during low-traffic periods
  • Consider soft-delete (metadata flag) for high-frequency scenarios

5. Time Travel Limitations

Warning: Once the configured Time Travel retention period has passed, deleted entities cannot be retrieved again.

Best Practices

1. Prefer ID-Based Removal

// Preferred: Direct ID removal
List<String> ids = Arrays.asList("id1", "id2", "id3");
store.removeAll(ids);

// Slower: Filter-based
Filter filter = metadataKey("status").isEqualTo("delete");
store.removeAll(filter);

2. Validate Before Removal

// Preview what will be deleted
Filter filter = metadataKey("status").isEqualTo("archived");

EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
    .queryEmbedding(sampleEmbedding)
    .maxResults(1000)
    .filter(filter)
    .build();

int count = store.search(request).matches().size();
System.out.println("Will delete " + count + " embeddings");

// Confirm then delete
if (confirmed) {
    store.removeAll(filter);
}
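
The previewed count can never exceed `maxResults`, so a result of exactly 1000 here only means "1000 or more". A small helper along these lines could make that explicit in the message; `PreviewMessage` and `describe` are hypothetical names and not part of LangChain4j.

```java
/** Hypothetical helper: words the preview message honestly when the count is capped. */
final class PreviewMessage {

    private PreviewMessage() {
    }

    /** matched is the number of search hits; maxResults is the limit used in the request. */
    static String describe(int matched, int maxResults) {
        if (matched >= maxResults) {
            // The search stopped at the limit, so more matches may exist
            return "Will delete at least " + matched + " embeddings (preview capped at " + maxResults + ")";
        }
        return "Will delete " + matched + " embeddings";
    }
}
```

In the preview above, `System.out.println(PreviewMessage.describe(count, 1000))` would replace the plain message.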

3. Configure Appropriate Consistency

// For immediate visibility (slower)
.consistencyLevel(ConsistencyLevelEnum.STRONG)

// For eventual consistency (faster)
.consistencyLevel(ConsistencyLevelEnum.EVENTUALLY)

// For deletions by complex filter expressions (required)
.consistencyLevel(ConsistencyLevelEnum.BOUNDED)

4. Batch Deletions

List<String> idsToDelete = new ArrayList<>();

for (Document doc : documentsToDelete) {
    idsToDelete.add(doc.getId());
}

// Single batch deletion
if (!idsToDelete.isEmpty()) {
    store.removeAll(idsToDelete);
}
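
When IDs arrive one at a time from different parts of an application, the same batching idea can be packaged as a small buffer that collects them and hands them over in one call. This is a sketch under assumptions: `DeletionBuffer` is a hypothetical class, and the removal is supplied as a `Consumer` so the example stays independent of the store.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Consumer;

/** Hypothetical buffer that collects IDs and deletes them in one call. */
final class DeletionBuffer {

    // LinkedHashSet drops duplicate IDs while keeping insertion order
    private final Set<String> pending = new LinkedHashSet<>();

    void add(String id) {
        pending.add(id);
    }

    /**
     * Passes all pending IDs to the remover in a single call, then clears the buffer.
     * Returns the number of IDs handed over; does nothing when the buffer is empty.
     * If the remover throws, the buffer is left untouched so the flush can be repeated.
     */
    int flush(Consumer<Collection<String>> remover) {
        if (pending.isEmpty()) {
            return 0;
        }
        Collection<String> batch = new ArrayList<>(pending);
        remover.accept(batch);
        pending.clear();
        return batch.size();
    }
}
```

A call site might be `buffer.flush(ids -> store.removeAll(ids))`, run on a schedule or once the buffer reaches a chosen size.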

5. Handle Empty Cases

// Safe to call even if no matches
Filter filter = metadataKey("type").isEqualTo("nonexistent");
store.removeAll(filter);  // No error if no matches

Error Handling

try {
    List<String> ids = Arrays.asList("id1", "id2", "id3");
    store.removeAll(ids);
    System.out.println("Successfully removed");
} catch (IllegalArgumentException e) {
    System.err.println("Invalid input: " + e.getMessage());
} catch (Exception e) {
    System.err.println("Failed to remove: " + e.getMessage());
}

Performance Considerations

ID-based removal:

  • Fastest
  • Most predictable
  • Use when IDs are known

Filter-based removal:

  • Requires collection scan
  • Slower for large collections
  • Use when IDs are not available

removeAll():

  • Fastest way to clear collection
  • Use for complete reset

Consistency levels:

  • STRONG: Higher latency, immediate visibility
  • EVENTUALLY: Best performance, delayed visibility
  • BOUNDED: Required for deletions by complex filter expressions

Related

  • Adding Embeddings
  • Searching Embeddings
  • Troubleshooting

Install with Tessl CLI

npx tessl i tessl/maven-dev-langchain4j--langchain4j-milvus@1.11.0
