
tessl/maven-dev-langchain4j--langchain4j-easy-rag

Zero-configuration RAG package that bundles document parsing, embedding, and splitting for easy Retrieval-Augmented Generation in Java applications


Document Ingestion API

EmbeddingStoreIngestor

Core class for ingesting documents into an embedding store. It splits each document into segments, embeds the segments, and stores the results. Parsing happens earlier, when the document is loaded.

Static Methods (Zero Configuration)

Ingest Single Document

import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;

public static IngestionResult ingest(
    Document document,
    EmbeddingStore<TextSegment> embeddingStore
)

Parameters:

  • document - Document to ingest
  • embeddingStore - Where to store generated embeddings

Returns: IngestionResult with token usage information

Automatic Behavior:

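  • Accepts an already-parsed Document; file parsing (Apache Tika in this package) happens in the document loader, before ingestion
  • Splits into 300-token chunks with 30-token overlap
  • Generates embeddings using the quantized BGE-small-en-v1.5 model
  • Stores embeddings in the provided store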
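
To make the 300/30 defaults concrete, the sketch below computes chunk boundaries for a plain sliding window. It illustrates the arithmetic only: the real default splitter is recursive and prefers natural boundaries such as paragraphs and sentences, so actual chunk edges will differ. The names OverlapSketch and windows are hypothetical and not part of LangChain4j.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of chunk size vs. overlap; not LangChain4j code. */
class OverlapSketch {

    /** Returns [start, end) token index pairs for fixed-size windows with overlap. */
    static List<int[]> windows(int totalTokens, int maxTokens, int overlap) {
        if (maxTokens <= 0 || overlap < 0 || overlap >= maxTokens) {
            throw new IllegalArgumentException("need 0 <= overlap < maxTokens");
        }
        List<int[]> result = new ArrayList<>();
        int stride = maxTokens - overlap; // 300 - 30 = 270 new tokens per chunk
        for (int start = 0; start < totalTokens; start += stride) {
            int end = Math.min(start + maxTokens, totalTokens);
            result.add(new int[] {start, end});
            if (end == totalTokens) {
                break; // this window already reaches the end of the document
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed input: a 1000-token document with the default 300/30 settings
        for (int[] w : windows(1000, 300, 30)) {
            System.out.println(w[0] + ".." + w[1]);
        }
    }
}
```

For a 1000-token document this yields four windows, each starting 270 tokens after the previous one, so neighbouring chunks share 30 tokens of context.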

Example:

Document doc = FileSystemDocumentLoader.loadDocument(path);
EmbeddingStore<TextSegment> store = new InMemoryEmbeddingStore<>();
IngestionResult result = EmbeddingStoreIngestor.ingest(doc, store);

Ingest Multiple Documents

public static IngestionResult ingest(
    List<Document> documents,
    EmbeddingStore<TextSegment> embeddingStore
)

Parameters:

  • documents - List of documents to ingest
  • embeddingStore - Where to store generated embeddings

Returns: IngestionResult with aggregated token usage

Example:

List<Document> docs = FileSystemDocumentLoader.loadDocumentsRecursively(dir);
IngestionResult result = EmbeddingStoreIngestor.ingest(docs, store);

Builder Pattern (Custom Configuration)

Create Builder

public static Builder builder()

Returns: Builder for configuring EmbeddingStoreIngestor

Example:

EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
    .embeddingStore(store)
    .documentSplitter(customSplitter)  // Optional
    .embeddingModel(customModel)        // Optional
    .build();

Builder Methods

Document Transformer

public Builder documentTransformer(DocumentTransformer documentTransformer)

Transform documents before splitting. Use to preprocess, filter, or enrich documents.

Example:

DocumentTransformer addMetadata = doc -> {
    doc.metadata().put("source", "internal");
    return doc;
};

builder.documentTransformer(addMetadata);

Document Splitter

public Builder documentSplitter(DocumentSplitter documentSplitter)

Custom splitter for chunking documents. If none is provided, the ingestor uses the splitter created by the SPI-discovered RecursiveDocumentSplitterFactory (300-token segments, 30-token overlap).

Example:

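import dev.langchain4j.data.document.DocumentSplitter;
import dev.langchain4j.data.document.splitter.DocumentSplitters;

// Note: this two-argument overload measures sizes in characters, not tokens
DocumentSplitter customSplitter = DocumentSplitters.recursive(
    500,  // max segment size in characters
    50    // max overlap in characters
);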

builder.documentSplitter(customSplitter);

Text Segment Transformer

public Builder textSegmentTransformer(TextSegmentTransformer textSegmentTransformer)

Transform text segments after splitting. Use to enrich metadata, filter, or modify text.

Example:

TextSegmentTransformer enricher = segment -> {
    segment.metadata().put("length", segment.text().length());
    return segment;
};

builder.textSegmentTransformer(enricher);

Embedding Model

public Builder embeddingModel(EmbeddingModel embeddingModel)

Custom embedding model. If not provided, uses SPI-discovered BgeSmallEnV15QuantizedEmbeddingModel.

Example:

import dev.langchain4j.model.openai.OpenAiEmbeddingModel;

EmbeddingModel model = OpenAiEmbeddingModel.builder()
    .apiKey(apiKey)
    .modelName("text-embedding-3-small")
    .build();

builder.embeddingModel(model);

Embedding Store (Required)

public Builder embeddingStore(EmbeddingStore<TextSegment> embeddingStore)

Required. The store where embeddings will be saved.

Example:

EmbeddingStore<TextSegment> store = new InMemoryEmbeddingStore<>();
builder.embeddingStore(store);

Build

public EmbeddingStoreIngestor build()

Builds the configured EmbeddingStoreIngestor.

Throws: IllegalArgumentException if embeddingStore is not set
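
The required-field check can be pictured with a small stand-in builder. This is a hypothetical sketch, not the library's source: RequiredFieldBuilderSketch, its exception message, and the use of a List<String> in place of EmbeddingStore<TextSegment> are all assumptions made so the example runs without dependencies.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the real builder; only the required-field check is modeled. */
class RequiredFieldBuilderSketch {

    private List<String> embeddingStore; // stands in for EmbeddingStore<TextSegment>

    RequiredFieldBuilderSketch embeddingStore(List<String> embeddingStore) {
        this.embeddingStore = embeddingStore;
        return this;
    }

    List<String> build() {
        if (embeddingStore == null) {
            // Fail fast at build time rather than at the first ingest(...) call
            throw new IllegalArgumentException("embeddingStore must be set");
        }
        return embeddingStore;
    }

    /** Returns true when build() rejects a builder that has no store. */
    static boolean rejectsMissingStore() {
        try {
            new RequiredFieldBuilderSketch().build();
            return false;
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejectsMissingStore());
        List<String> store = new RequiredFieldBuilderSketch()
                .embeddingStore(new ArrayList<>())
                .build();
        System.out.println(store.isEmpty());
    }
}
```

Failing in build() means a misconfigured ingestor surfaces at startup, before any documents are processed.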

Instance Methods

Ingest Single Document

public IngestionResult ingest(Document document)

Ingests a single document using the configured pipeline.

Returns: IngestionResult with token usage

Ingest Document List

public IngestionResult ingest(List<Document> documents)

Ingests multiple documents using the configured pipeline.

Returns: IngestionResult with aggregated token usage

Ingest Varargs

public IngestionResult ingest(Document... documents)

Ingests multiple documents (varargs) using the configured pipeline.

Returns: IngestionResult with aggregated token usage
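
The order in which the configured stages run can be modeled in plain Java. The sketch below is a simplified model under stated assumptions, not the library's implementation: strings stand in for Document and TextSegment, embedding is reduced to a comment, unset stages are treated as pass-through, and the names PipelineSketch and ingest are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.UnaryOperator;

/** Hypothetical model of the stage order; strings stand in for documents and segments. */
class PipelineSketch {

    /** Runs the stages in order and returns what would end up in the store. */
    static List<String> ingest(
            List<String> documents,
            UnaryOperator<String> documentTransformer,       // optional, may be null
            Function<String, List<String>> documentSplitter, // optional, may be null
            UnaryOperator<String> segmentTransformer) {      // optional, may be null
        List<String> store = new ArrayList<>();
        for (String document : documents) {
            // 1. Document transformer runs before splitting
            String doc = documentTransformer == null ? document : documentTransformer.apply(document);
            // 2. Splitter turns one document into segments (whole document if unset)
            List<String> segments = documentSplitter == null ? List.of(doc) : documentSplitter.apply(doc);
            for (String segment : segments) {
                // 3. Segment transformer runs after splitting
                String s = segmentTransformer == null ? segment : segmentTransformer.apply(segment);
                // 4. A real pipeline would embed the segment here, then store embedding and segment
                store.add(s);
            }
        }
        return store;
    }

    public static void main(String[] args) {
        List<String> stored = ingest(
                List.of("  Alpha. Beta.  "),
                String::trim,
                doc -> Arrays.asList(doc.split("\\. ?")),
                seg -> "[internal] " + seg);
        System.out.println(stored);
    }
}
```

Because the document transformer runs first, anything it adds or removes is visible to the splitter, while the segment transformer sees only individual chunks.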

Complete Example

import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.splitter.DocumentSplitters;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import dev.langchain4j.store.embedding.IngestionResult;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;

import java.util.List;

// Custom configuration
EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
    .documentSplitter(DocumentSplitters.recursive(500, 50)) // max 500 characters per segment, 50 characters overlap
    .textSegmentTransformer(segment -> {
        // Add metadata to each segment
        segment.metadata().put("ingested_at", System.currentTimeMillis());
        return segment;
    })
    .embeddingStore(new InMemoryEmbeddingStore<>())
    .build();

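// Ingest documents (loadDocuments() is a placeholder for your own loading code,
// for example FileSystemDocumentLoader.loadDocumentsRecursively(dir))
List<Document> documents = loadDocuments();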
IngestionResult result = ingestor.ingest(documents);

System.out.println("Processed " + result.tokenUsage().totalTokenCount() + " tokens");

Related APIs

  • Document Loading API - Loading documents from filesystem
  • Core Types - Document, TextSegment types
  • Storage Types - EmbeddingStore implementations
  • Configuration - Default splitter and embedding model settings

