
tessl/maven-dev-langchain4j--langchain4j-milvus

Milvus embedding store integration for LangChain4j


Adding Embeddings

Methods for storing embeddings in Milvus.

Methods

Single Embedding - Auto ID

String add(Embedding embedding);

Adds a single embedding and returns its auto-generated UUID.

Embedding embedding = model.embed(text).content();
String id = store.add(embedding);

Single Embedding - Custom ID

void add(String id, Embedding embedding);

Adds a single embedding under the ID you supply.

store.add("doc-123", embedding);
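When custom IDs are derived from the content itself, re-ingesting the same document produces the same ID instead of a fresh random one. The following is a minimal sketch of one way to do that with the JDK's name-based UUIDs; `StableIds` and `forText` are hypothetical names for illustration, not part of LangChain4j.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical helper (not part of LangChain4j): derives a repeatable ID from text.
final class StableIds {

    private StableIds() {}

    // Returns a name-based UUID string; identical text always yields the identical ID.
    static String forText(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return UUID.nameUUIDFromBytes(bytes).toString();
    }

    public static void main(String[] args) {
        // The printed value could be passed as the first argument to store.add(id, embedding).
        System.out.println(forText("Document text"));
    }
}
```

The result has the same 36-character shape as a random UUID, so it can be used wherever an auto-generated ID would fit.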

With Text and Metadata

String add(Embedding embedding, TextSegment textSegment);

Adds an embedding together with its text segment and metadata, and returns the auto-generated ID.

import java.util.Map;

import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.data.document.Metadata;

Metadata metadata = Metadata.from(Map.of(
    "source", "docs.pdf",
    "page", 42,
    "category", "technical"
));

TextSegment segment = TextSegment.from("Document text", metadata);
String id = store.add(embedding, segment);

Batch - Auto IDs

List<String> addAll(List<Embedding> embeddings);

Adds multiple embeddings with auto-generated IDs.

List<Embedding> embeddings = /* list of embeddings */;
List<String> ids = store.addAll(embeddings);

Batch - Custom IDs and Segments

void addAll(List<String> ids, List<Embedding> embeddings, List<TextSegment> textSegments);

Adds multiple embeddings with custom IDs and segments.

Requirement: all three lists must have the same size, and the entries at a given index must describe the same item.

List<String> ids = Arrays.asList("id1", "id2", "id3");
List<Embedding> embeddings = /* list */;
List<TextSegment> segments = /* list */;

store.addAll(ids, embeddings, segments);

Without segments:

store.addAll(ids, embeddings, null);
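Because the three lists are matched by index, a size mismatch is easiest to catch before calling the store. The sketch below shows one possible pre-check; `AddAllPreconditions` is a hypothetical helper, and it treats a `null` segment list as "embeddings only", mirroring the call above.

```java
import java.util.List;

// Hypothetical pre-check (not part of LangChain4j) for the three-list addAll overload.
final class AddAllPreconditions {

    private AddAllPreconditions() {}

    // Throws if ids/embeddings differ in size, or if a non-null segments list differs from them.
    static void requireSameSize(List<?> ids, List<?> embeddings, List<?> segments) {
        if (ids == null || embeddings == null) {
            throw new IllegalArgumentException("ids and embeddings must not be null");
        }
        if (ids.size() != embeddings.size()) {
            throw new IllegalArgumentException(
                "ids (" + ids.size() + ") and embeddings (" + embeddings.size() + ") differ in size");
        }
        // A null segments list means no text is stored, so there is nothing to compare.
        if (segments != null && segments.size() != ids.size()) {
            throw new IllegalArgumentException(
                "segments (" + segments.size() + ") and ids (" + ids.size() + ") differ in size");
        }
    }
}
```

Calling `AddAllPreconditions.requireSameSize(ids, embeddings, segments)` immediately before `store.addAll(ids, embeddings, segments)` turns an index mix-up into an early, descriptive exception.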

Creating TextSegments

Simple Text

TextSegment segment = TextSegment.from("Plain text");

With Metadata

Metadata metadata = Metadata.from(Map.of(
    "title", "Document Title",
    "author", "John Doe",
    "year", 2024,
    "active", "true"  // stored as a String; Boolean values may be rejected by Metadata
));

TextSegment segment = TextSegment.from("Content", metadata);
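Metadata values often come from arbitrary sources, so it can help to normalize them before building a `Metadata` object. The sketch below assumes that `Metadata` accepts only `String`, `UUID`, `Integer`, `Long`, `Float`, and `Double` values; that list is an assumption about recent LangChain4j versions and should be checked against the version in use. `MetadataValues` is a hypothetical helper that converts everything else (for example `Boolean`) to its string form.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical helper (not part of LangChain4j): coerces values to an assumed set of supported types.
final class MetadataValues {

    private MetadataValues() {}

    // Keeps values of the assumed supported types, stringifies the rest, and drops nulls.
    static Map<String, Object> toSupportedTypes(Map<String, ?> raw) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, ?> entry : raw.entrySet()) {
            Object value = entry.getValue();
            if (value == null) {
                continue; // no meaningful value to store
            }
            boolean supported = value instanceof String || value instanceof UUID
                    || value instanceof Integer || value instanceof Long
                    || value instanceof Float || value instanceof Double;
            result.put(entry.getKey(), supported ? value : String.valueOf(value));
        }
        return result;
    }
}
```

The returned map can then be handed to `Metadata.from(...)` in place of the raw one.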

Patterns

Document Ingestion

import dev.langchain4j.model.embedding.EmbeddingModel;

EmbeddingModel model = /* your model */;
List<String> documents = /* your documents */;

List<Embedding> embeddings = new ArrayList<>();
List<TextSegment> segments = new ArrayList<>();

for (String doc : documents) {
    embeddings.add(model.embed(doc).content());
    segments.add(TextSegment.from(doc));
}

// Pass the segments as well; otherwise the text collected above is never stored
List<String> ids = store.addAll(embeddings, segments);

With Custom IDs

List<String> ids = new ArrayList<>();
List<Embedding> embeddings = new ArrayList<>();
List<TextSegment> segments = new ArrayList<>();

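// Document is assumed to be an application-defined class with getId(), getText(), and getMetadata()
// (LangChain4j's own Document exposes text() and metadata() instead)
for (Document doc : documents) {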
    ids.add(doc.getId());
    embeddings.add(model.embed(doc.getText()).content());
    segments.add(TextSegment.from(doc.getText(), doc.getMetadata()));
}

store.addAll(ids, embeddings, segments);

Batch Processing

int batchSize = 500;

for (int i = 0; i < allEmbeddings.size(); i += batchSize) {
    int end = Math.min(i + batchSize, allEmbeddings.size());
    List<Embedding> batch = allEmbeddings.subList(i, end);
    store.addAll(batch);
}
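The loop above can be factored into a reusable helper, which is handy when IDs, embeddings, and segments all need to be split at the same boundaries. This is a minimal sketch using only the JDK; `Batches.partition` is a hypothetical name, not a LangChain4j API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of LangChain4j): splits a list into consecutive batches.
final class Batches {

    private Batches() {}

    // Returns subList views of at most batchSize elements each, in original order.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        int start = 0;
        while (start < items.size()) {
            int end = start + Math.min(batchSize, items.size() - start);
            batches.add(items.subList(start, end));
            start = end;
        }
        return batches;
    }

    public static void main(String[] args) {
        // Each printed batch would be passed to store.addAll(batch) in real code.
        for (List<Integer> batch : partition(List.of(1, 2, 3, 4, 5), 2)) {
            System.out.println(batch);
        }
    }
}
```

Since the batches are views over the original list, no elements are copied. Partitioning the ID, embedding, and segment lists with the same `batchSize` keeps the three batch sequences aligned by index.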

Performance

For Bulk Inserts

MilvusEmbeddingStore store = MilvusEmbeddingStore.builder()
    .host("localhost")
    .collectionName("bulk")
    .dimension(384)
    .autoFlushOnInsert(false)  // Do not flush after every insert; critical for bulk-load performance
    .build();

store.addAll(largeList);

Batch Sizes

  • Recommended: 100-1000 embeddings per batch
  • Use addAll() instead of calling add() in a loop
  • Larger batches generally give better throughput, at the cost of more memory per request

Important Notes

  1. Dimension Matching: Every embedding must have the same dimension as the collection
  2. ID Uniqueness: Duplicate IDs may overwrite existing entries (behavior depends on Milvus config)
  3. Null Segments: addAll(ids, embeddings, null) stores only the embeddings, without text
  4. Empty Lists: Calling addAll() with an empty list is a no-op
  5. Consistency: When newly added data becomes visible to searches depends on the configured consistency level
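The first two notes can be checked on the client before anything is sent to Milvus. The sketch below works on raw `float[]` vectors (which `Embedding.vector()` returns) and plain ID strings; `InsertChecks` and its methods are hypothetical helpers, not part of LangChain4j.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical pre-insert checks (not part of LangChain4j).
final class InsertChecks {

    private InsertChecks() {}

    // Throws if any vector's length differs from the collection dimension.
    static void requireDimension(List<float[]> vectors, int expectedDimension) {
        for (int i = 0; i < vectors.size(); i++) {
            int actual = vectors.get(i).length;
            if (actual != expectedDimension) {
                throw new IllegalArgumentException(
                    "vector " + i + " has dimension " + actual + ", expected " + expectedDimension);
            }
        }
    }

    // Returns the IDs that occur more than once, in order of first repetition.
    static Set<String> duplicateIds(List<String> ids) {
        Set<String> seen = new LinkedHashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String id : ids) {
            if (!seen.add(id)) {
                duplicates.add(id);
            }
        }
        return duplicates;
    }
}
```

A non-empty result from `duplicateIds` only covers duplicates within the batch being added; IDs that already exist in the collection still fall under note 2.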

Error Handling

try {
    String id = store.add(embedding, segment);
    System.out.println("Added: " + id);
} catch (Exception e) {
    System.err.println("Failed: " + e.getMessage());
    // Handle error
}
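One possible way to fill in the "Handle error" step for transient failures (for example a brief network interruption) is to retry with an increasing delay. The following is a generic sketch, not something the library provides; `Retry.withRetry` and its parameters are hypothetical.

```java
import java.util.concurrent.Callable;

// Hypothetical retry helper (not part of LangChain4j).
final class Retry {

    private Retry() {}

    // Runs the action up to maxAttempts times, doubling the delay after each failure.
    static <T> T withRetry(Callable<T> action, int maxAttempts, long initialDelayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status and stop retrying
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);
                    delay *= 2;
                }
            }
        }
        throw last; // every attempt failed; surface the final error
    }
}
```

A call might look like `String id = Retry.withRetry(() -> store.add(embedding, segment), 3, 200);`. With auto-generated IDs, a retry after a request that actually succeeded on the server may insert the same data twice, so retries pair more safely with the custom-ID overloads.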

Related

  • Searching
  • Configuration
  • Patterns

