tessl/maven-io-quarkiverse-langchain4j--quarkus-langchain4j-easy-rag

Easy RAG extension for Quarkus LangChain4j that simplifies building Retrieval Augmented Generation pipelines through automatic document ingestion and embedding store management.

Quarkus LangChain4j Easy RAG

Name: tessl/maven-io-quarkiverse-langchain4j--quarkus-langchain4j-easy-rag
Author: tessl

The Quarkus LangChain4j Easy RAG extension adds Retrieval Augmented Generation (RAG) pipelines to Quarkus applications with minimal configuration. It automates document ingestion, parsing, splitting, embedding generation, and retrieval, so developers can add RAG capabilities without wiring these steps by hand.

Package Information

Package Name: quarkus-langchain4j-easy-rag
Maven Coordinates: io.quarkiverse.langchain4j:quarkus-langchain4j-easy-rag
Package Type: Maven/Quarkus Extension
Language: Java
Installation:
```
<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-easy-rag</artifactId>
    <version>1.7.4</version>
</dependency>
```

Important: Native compilation is not supported with this extension.

Core Imports

import io.quarkiverse.langchain4j.easyrag.EasyRagManualIngestion;
import io.quarkiverse.langchain4j.easyrag.runtime.EasyRagConfig;
import io.quarkiverse.langchain4j.easyrag.runtime.IngestionStrategy;
import jakarta.inject.Inject;

Basic Usage

The Easy RAG extension works with minimal configuration. You only need to:

Add an embedding model provider dependency (e.g., quarkus-langchain4j-openai or quarkus-langchain4j-ollama)
Configure the document path in application.properties:

# Required: path to documents directory
quarkus.langchain4j.easy-rag.path=/path/to/documents

# Optional: use classpath instead of filesystem
quarkus.langchain4j.easy-rag.path-type=CLASSPATH

The extension automatically:

Creates an InMemoryEmbeddingStore if no embedding store bean exists
Creates a RetrievalAugmentor bean if none exists
Ingests all documents from the configured path at startup
Parses documents using Apache Tika (formats include PDF, DOCX, HTML, plain text, and images via OCR)
Splits documents into segments with configurable token sizes
Generates and stores embeddings

Example: Using the auto-generated RetrievalAugmentor in an AI service:

import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;

@RegisterAiService
public interface ChatBot {

    @SystemMessage("You are a helpful assistant. Answer based on the provided context.")
    String chat(@UserMessage String userMessage);
}

import jakarta.inject.Inject;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;

@Path("/chat")
public class ChatResource {

    @Inject
    ChatBot chatBot;

    @POST
    public String chat(String message) {
        return chatBot.chat(message);
    }
}

The RetrievalAugmentor is automatically injected and used by the AI service to augment prompts with relevant document context.

Architecture

The Easy RAG extension architecture integrates Quarkus CDI and LangChain4j to provide seamless RAG capabilities. Key components include the EasyRagIngestor for document processing, EasyRetrievalAugmentor for similarity search, and automatic CDI bean synthesis for zero-configuration operation.

Capabilities

Manual Ingestion Control

Provides programmatic control over when document ingestion occurs, enabling scenarios where ingestion timing needs to be controlled by application logic rather than happening at startup.

package io.quarkiverse.langchain4j.easyrag;

@ApplicationScoped
public class EasyRagManualIngestion {
    public void ingest();
}

Configuration

Comprehensive configuration options for controlling document ingestion behavior, retrieval parameters, and embeddings management.

package io.quarkiverse.langchain4j.easyrag.runtime;

import static io.quarkus.runtime.annotations.ConfigPhase.RUN_TIME;

import io.quarkus.runtime.annotations.ConfigRoot;
import io.quarkus.runtime.annotations.ConfigGroup;
import io.smallrye.config.ConfigMapping;
import io.smallrye.config.WithDefault;
import java.util.OptionalDouble;

@ConfigRoot(phase = RUN_TIME)
@ConfigMapping(prefix = "quarkus.langchain4j.easy-rag")
public interface EasyRagConfig {
    /**
     * Path to the directory containing documents to ingest (required).
     */
    String path();

    /**
     * Type of path: FILESYSTEM or CLASSPATH (default: FILESYSTEM).
     */
    @WithDefault("FILESYSTEM")
    PathType pathType();

    /**
     * File filtering pattern (default: glob:**).
     */
    @WithDefault("glob:**")
    String pathMatcher();

    /**
     * Recursively scan subdirectories (default: true).
     */
    @WithDefault("true")
    Boolean recursive();

    /**
     * Maximum segment size in tokens (default: 300).
     */
    @WithDefault("300")
    Integer maxSegmentSize();

    /**
     * Maximum overlap between segments in tokens (default: 30).
     */
    @WithDefault("30")
    Integer maxOverlapSize();

    /**
     * Maximum number of retrieval results (default: 5).
     */
    @WithDefault("5")
    Integer maxResults();

    /**
     * Minimum similarity score threshold (optional, no filtering if not set).
     */
    OptionalDouble minScore();

    /**
     * Ingestion strategy (default: ON).
     */
    @WithDefault("ON")
    IngestionStrategy ingestionStrategy();

    /**
     * Configuration for embeddings reuse.
     */
    ReuseEmbeddingsConfig reuseEmbeddings();

    /**
     * Type of path reference for document location.
     */
    enum PathType {
        /** Path represents a filesystem directory or file */
        FILESYSTEM,
        /** Path represents a classpath resource location */
        CLASSPATH
    }

    /**
     * Configuration for reusing embeddings across application restarts.
     * Only supported with InMemoryEmbeddingStore.
     */
    @ConfigGroup
    interface ReuseEmbeddingsConfig {
        /**
         * Enable embeddings reuse (default: false).
         */
        @WithDefault("false")
        boolean enabled();

        /**
         * File path for loading/saving embeddings (default: easy-rag-embeddings.json).
         */
        @WithDefault("easy-rag-embeddings.json")
        String file();
    }
}

Retrieval Augmentor

The auto-generated RetrievalAugmentor implementation that integrates with LangChain4j AI services to provide RAG capabilities.

package io.quarkiverse.langchain4j.easyrag.runtime;

public class EasyRetrievalAugmentor implements RetrievalAugmentor {
    public EasyRetrievalAugmentor(
        EasyRagConfig config,
        EmbeddingModel embeddingModel,
        EmbeddingStore embeddingStore
    );

    public AugmentationResult augment(AugmentationRequest augmentationRequest);
}
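
To make the retrieval step concrete, the following is a minimal, dependency-free sketch of what a retrieval augmentor does conceptually: embed the question, rank stored segments by similarity, and append the best matches to the prompt. It is not the EasyRetrievalAugmentor implementation. The `AugmentSketch` class, the keyword-count `embed` function, and the prompt wording are all hypothetical stand-ins chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Conceptual model only: every name here is hypothetical, not the extension's API.
final class AugmentSketch {

    // Toy "embedding model": one dimension per keyword, value = occurrence count.
    static final String[] VOCAB = {"refund", "shipping", "password"};

    static double[] embed(String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        double[] vector = new double[VOCAB.length];
        for (int i = 0; i < VOCAB.length; i++) {
            int from = 0;
            while ((from = lower.indexOf(VOCAB[i], from)) >= 0) {
                vector[i]++;
                from += VOCAB[i].length();
            }
        }
        return vector;
    }

    // Cosine similarity; returns 0 when either vector is all zeros.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (normA == 0 || normB == 0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    // Ranks segments against the question and appends the top matches to it.
    static String augment(String question, List<String> segments, int maxResults) {
        double[] queryVector = embed(question);
        List<String> ranked = new ArrayList<>(segments);
        ranked.sort(Comparator.comparingDouble(
                (String s) -> cosine(queryVector, embed(s))).reversed());
        StringBuilder prompt = new StringBuilder(question)
                .append("\n\nAnswer using the following information:\n");
        ranked.stream().limit(maxResults)
                .forEach(s -> prompt.append("- ").append(s).append('\n'));
        return prompt.toString();
    }

    public static void main(String[] args) {
        List<String> segments = List.of(
                "Refund requests are processed within five days.",
                "Shipping takes two days.",
                "Reset your password from the login page.");
        System.out.println(augment("How do I get a refund?", segments, 1));
    }
}
```

In the real extension, the embedding comes from the configured EmbeddingModel and the ranking is delegated to the EmbeddingStore; the sketch only shows the shape of the data flow.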

Document Ingestion

Internal document ingestion functionality that handles loading, parsing, splitting, and embedding documents from filesystem or classpath.

package io.quarkiverse.langchain4j.easyrag.runtime;

public class EasyRagIngestor {
    public EasyRagIngestor(
        EmbeddingModel embeddingModel,
        EmbeddingStore<TextSegment> embeddingStore,
        EasyRagConfig config
    );

    public void ingest();
}

Types

IngestionStrategy Enum

Controls when document ingestion occurs.

package io.quarkiverse.langchain4j.easyrag.runtime;

/**
 * Strategy for controlling when document ingestion occurs.
 */
public enum IngestionStrategy {
    /**
     * Automatically ingest documents at application startup (default).
     * Documents are loaded, parsed, split, and embedded immediately when the application starts.
     */
    ON,

    /**
     * Do not perform ingestion.
     * Use when the embedding store is already populated or documents are managed externally.
     */
    OFF,

    /**
     * Wait for manual trigger via EasyRagManualIngestion.ingest().
     * Provides programmatic control over when ingestion occurs.
     */
    MANUAL
}
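
The difference between the three strategies can be modeled in a few lines of plain Java. This is a simplified sketch, not the extension's code: `IngestionStrategySketch`, its local `Strategy` enum, and both method names are hypothetical. In particular, rejecting a manual trigger when the strategy is not MANUAL is an assumption of this sketch; this page does not state how EasyRagManualIngestion.ingest() behaves under ON or OFF.

```java
// Simplified model of the ON / OFF / MANUAL semantics described above.
final class IngestionStrategySketch {

    // Local copy of the enum so the sketch compiles without the extension.
    enum Strategy { ON, OFF, MANUAL }

    private final Strategy strategy;
    private final Runnable ingest; // stands in for the real ingestion pipeline

    IngestionStrategySketch(Strategy strategy, Runnable ingest) {
        this.strategy = strategy;
        this.ingest = ingest;
    }

    // Called once at application startup: only ON ingests here.
    void onStartup() {
        if (strategy == Strategy.ON) {
            ingest.run();
        }
    }

    // Called by application code. ASSUMPTION: anything other than MANUAL is rejected.
    void triggerManually() {
        if (strategy != Strategy.MANUAL) {
            throw new IllegalStateException("ingestion-strategy is " + strategy + ", not MANUAL");
        }
        ingest.run();
    }

    public static void main(String[] args) {
        IngestionStrategySketch manual = new IngestionStrategySketch(
                Strategy.MANUAL, () -> System.out.println("ingesting documents"));
        manual.onStartup();       // does nothing under MANUAL
        manual.triggerManually(); // runs the ingestion
    }
}
```

In an application, the manual trigger would typically be an injected EasyRagManualIngestion bean whose ingest() method is called from a REST endpoint, a scheduler, or a startup task of your choosing.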

PathType Enum

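Specifies the type of path reference for document location. PathType is declared as a nested enum inside the EasyRagConfig interface (EasyRagConfig.PathType); it is shown standalone here for readability.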

package io.quarkiverse.langchain4j.easyrag.runtime;

/**
 * Type of path reference for document location.
 * Nested enum within EasyRagConfig interface.
 */
public enum PathType {
    /**
     * Path represents a filesystem directory or file reference.
     * Used for documents stored in the local filesystem.
     */
    FILESYSTEM,

    /**
     * Path represents a classpath resource location.
     * Used for documents packaged within the application JAR.
     */
    CLASSPATH
}

Key Features

Automatic Bean Creation

The extension automatically creates CDI beans when they are not already present:

InMemoryEmbeddingStore: Created if no EmbeddingStore bean exists
EasyRetrievalAugmentor: Created if no RetrievalAugmentor bean exists

This allows you to replace these with custom implementations by simply providing your own beans.

Document Format Support

Uses Apache Tika for universal document parsing, supporting:

Plain text files
PDF documents
Microsoft Word (DOCX)
HTML files
Images with text via OCR (requires Tesseract library)

Integration with Persistent Stores

While an in-memory store is provided by default for quick prototyping, you can use persistent embedding stores by adding extensions:

quarkus-langchain4j-redis
quarkus-langchain4j-chroma
quarkus-langchain4j-infinispan

When a persistent store extension is added, the Easy RAG extension uses it automatically instead of creating an in-memory store.

Dev UI Integration

In Quarkus dev mode, access the Dev UI at http://localhost:8080/q/dev-ui and click the 'Chat' button in the LangChain4j card to test your RAG pipeline without building a frontend.

Embeddings Reuse for Development

Speed up development cycles by caching embeddings to a file:

quarkus.langchain4j.easy-rag.reuse-embeddings.enabled=true
quarkus.langchain4j.easy-rag.reuse-embeddings.file=embeddings.json

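When enabled, embeddings are computed once, saved to the configured file, and reused across application restarts, which avoids recomputing them on every startup during development. Embeddings reuse is only supported with InMemoryEmbeddingStore.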
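
The caching idea behind this option is a load-or-compute pattern. The sketch below illustrates it with the standard library only; `ReuseEmbeddingsSketch`, `loadOrCompute`, and the placeholder JSON string are hypothetical, and the extension's actual file format and control flow may differ.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;

// Load-or-compute caching, assuming the cache is a single text file.
final class ReuseEmbeddingsSketch {

    // Returns the cached content if the file exists; otherwise computes it,
    // writes it to the file, and returns it.
    static String loadOrCompute(Path cacheFile, Supplier<String> computeEmbeddings)
            throws IOException {
        if (Files.exists(cacheFile)) {
            return Files.readString(cacheFile);
        }
        String serialized = computeEmbeddings.get();
        Files.writeString(cacheFile, serialized);
        return serialized;
    }

    public static void main(String[] args) throws IOException {
        Path cache = Files.createTempDirectory("easy-rag-sketch").resolve("embeddings.json");
        // First call: file is missing, so the supplier runs and the result is saved.
        System.out.println(loadOrCompute(cache, () -> "[[0.1, 0.2]]"));
        // Second call: file exists, so the supplier is never invoked.
        System.out.println(loadOrCompute(cache, () -> {
            throw new IllegalStateException("not reached");
        }));
    }
}
```

One consequence of this pattern is that a stale cache file is reused as-is, so deleting the file is the simplest way to force a fresh computation after the documents change.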

Required Dependencies

The Easy RAG extension requires an embedding model provider. Choose one:

OpenAI: io.quarkiverse.langchain4j:quarkus-langchain4j-openai
Ollama: io.quarkiverse.langchain4j:quarkus-langchain4j-ollama
In-process models: Various in-process embedding model extensions

If multiple embedding model providers are present, specify which to use:

quarkus.langchain4j.embedding-model.provider=openai

Common Configuration Examples

Basic filesystem ingestion:

quarkus.langchain4j.easy-rag.path=/data/documents

Classpath ingestion with file filtering:

quarkus.langchain4j.easy-rag.path=documents
quarkus.langchain4j.easy-rag.path-type=CLASSPATH
quarkus.langchain4j.easy-rag.path-matcher=glob:**.{txt,md,pdf}
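
The `glob:` prefix is the syntax accepted by the JDK's `FileSystem.getPathMatcher`, so a pattern can be tried out in isolation before putting it in application.properties. The sketch below assumes the extension evaluates the pattern with that JDK mechanism, which this page does not confirm, and it does not show whether the extension matches against absolute or relative paths. `PathMatcherSketch` is a hypothetical helper class.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;

// Tries a path-matcher pattern against sample paths using the JDK glob engine.
final class PathMatcherSketch {

    static boolean matches(String pattern, String path) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher(pattern);
        return matcher.matches(Path.of(path));
    }

    public static void main(String[] args) {
        String pattern = "glob:**.{txt,md,pdf}";
        for (String path : new String[] {"guides/intro.md", "notes.txt", "logo.png"}) {
            System.out.println(path + " -> " + matches(pattern, path));
        }
    }
}
```

In JDK glob syntax, `**` matches across directory boundaries while a single `*` does not, which is why the default `glob:**` accepts every file.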

Manual ingestion control:

quarkus.langchain4j.easy-rag.path=/data/documents
quarkus.langchain4j.easy-rag.ingestion-strategy=MANUAL

Custom segment sizing:

quarkus.langchain4j.easy-rag.max-segment-size=500
quarkus.langchain4j.easy-rag.max-overlap-size=50
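
To see how the two values interact, here is a deliberately simplified sliding-window splitter. It treats each list element as one token and ignores sentence and paragraph boundaries, so it should be read as an illustration of size and overlap only, not as the splitting algorithm the extension uses. `OverlapSketch` and `split` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Sliding-window split: each segment holds at most maxSegmentSize tokens and
// repeats the last maxOverlapSize tokens of the previous segment.
final class OverlapSketch {

    static List<List<String>> split(List<String> tokens, int maxSegmentSize, int maxOverlapSize) {
        if (maxSegmentSize <= 0 || maxOverlapSize < 0 || maxOverlapSize >= maxSegmentSize) {
            throw new IllegalArgumentException("require 0 <= overlap < segment size");
        }
        List<List<String>> segments = new ArrayList<>();
        int step = maxSegmentSize - maxOverlapSize; // how far the window advances
        for (int start = 0; start < tokens.size(); start += step) {
            int end = Math.min(start + maxSegmentSize, tokens.size());
            segments.add(new ArrayList<>(tokens.subList(start, end)));
            if (end == tokens.size()) {
                break; // the last window already reached the end of the input
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("w0", "w1", "w2", "w3", "w4", "w5", "w6", "w7", "w8", "w9");
        split(tokens, 4, 1).forEach(System.out::println);
    }
}
```

Larger segments give the model more context per retrieved result but fewer, coarser matches; a larger overlap reduces the chance that a relevant sentence is cut in half at a segment boundary, at the cost of storing more embeddings.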

Retrieval tuning:

quarkus.langchain4j.easy-rag.max-results=10
quarkus.langchain4j.easy-rag.min-score=0.7
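
The effect of these two settings on a list of scored candidates can be sketched as a filter followed by a top-N cut. The code below is an illustration under assumptions: `RetrievalFilterSketch`, the `Scored` record, and the sample scores are hypothetical, and treating the threshold as inclusive (`>=`) is an assumption rather than something this page states. The `OptionalDouble` parameter mirrors the optional `minScore()` in EasyRagConfig.

```java
import java.util.Comparator;
import java.util.List;
import java.util.OptionalDouble;
import java.util.stream.Collectors;

// Applies a minimum score and a result limit to already-scored candidates.
final class RetrievalFilterSketch {

    record Scored(String text, double score) {}

    static List<Scored> select(List<Scored> candidates, int maxResults, OptionalDouble minScore) {
        return candidates.stream()
                // No threshold configured means no filtering at all.
                .filter(c -> minScore.isEmpty() || c.score() >= minScore.getAsDouble())
                // Best match first.
                .sorted(Comparator.comparingDouble(Scored::score).reversed())
                .limit(maxResults)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Scored> candidates = List.of(
                new Scored("a", 0.90), new Scored("b", 0.65),
                new Scored("c", 0.80), new Scored("d", 0.71));
        System.out.println(select(candidates, 10, OptionalDouble.of(0.7)));
        System.out.println(select(candidates, 2, OptionalDouble.empty()));
    }
}
```

Raising min-score trades recall for precision: fewer, more relevant segments reach the prompt, and a query with no sufficiently similar segment may receive no added context at all.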

Install with Tessl CLI

npx tessl i tessl/maven-io-quarkiverse-langchain4j--quarkus-langchain4j-easy-rag@1.7.0

Describes: pkg:maven/io.quarkiverse.langchain4j/quarkus-langchain4j-easy-rag@1.7.x