Easy RAG extension for Quarkus LangChain4j that dramatically simplifies implementing Retrieval Augmented Generation pipelines with automatic document ingestion and embedding store management
The EasyRetrievalAugmentor is automatically created by the Easy RAG extension if no other RetrievalAugmentor bean exists in your application. It integrates with LangChain4j AI services to provide Retrieval Augmented Generation capabilities.
package io.quarkiverse.langchain4j.easyrag.runtime;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.rag.AugmentationRequest;
import dev.langchain4j.rag.AugmentationResult;
import dev.langchain4j.rag.RetrievalAugmentor;
import dev.langchain4j.store.embedding.EmbeddingStore;
/**
* Retrieval augmentor automatically generated by the Easy RAG extension
* if no other retrieval augmentor is found.
*/
public class EasyRetrievalAugmentor implements RetrievalAugmentor {
/**
* Creates an EasyRetrievalAugmentor with the specified configuration.
*
* @param config Configuration for retrieval behavior (max results, min score)
* @param embeddingModel Model for generating query embeddings
* @param embeddingStore Store containing document embeddings
*/
public EasyRetrievalAugmentor(
EasyRagConfig config,
EmbeddingModel embeddingModel,
EmbeddingStore embeddingStore
);
/**
* Augments the user message with relevant context from the embedding store.
*
* @param augmentationRequest Request containing user message and metadata
* @return AugmentationResult containing retrieved content and metadata
*/
public AugmentationResult augment(AugmentationRequest augmentationRequest);
}

The Easy RAG extension automatically creates an EasyRetrievalAugmentor bean when:

- No other RetrievalAugmentor bean exists in the application
- An EmbeddingModel bean is available
- An EmbeddingStore bean is available

This automatic creation happens at build time via Quarkus CDI bean synthesis.
The EasyRetrievalAugmentor implements the following retrieval pipeline:

1. Query Embedding: converts the user's query into an embedding using the configured EmbeddingModel
2. Similarity Search: searches the EmbeddingStore for the most similar document segments
3. Filtering: applies the max-results limit and min-score threshold from configuration
4. Context Assembly: packages the retrieved segments into an AugmentationResult
5. Augmentation: the LangChain4j framework injects the retrieved context into the prompt before it is sent to the LLM
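The retrieval steps can be sketched in plain Java. This is a toy illustration, not the extension's implementation: the bag-of-words `embed()` and the in-memory segment list are stand-ins for the real EmbeddingModel and EmbeddingStore, and all names in it are invented for the sketch.

```java
import java.util.*;
import java.util.stream.*;

public class PipelineSketch {

    // Toy "embedding": a term-frequency vector over a fixed vocabulary
    // (a stand-in for a real EmbeddingModel).
    static final List<String> VOCAB = List.of("quarkus", "rag", "embedding", "store", "redis");

    public static double[] embed(String text) {
        List<String> words = Arrays.asList(text.toLowerCase().split("\\W+"));
        return VOCAB.stream()
                .mapToDouble(term -> Collections.frequency(words, term))
                .toArray();
    }

    // Cosine similarity between two vectors, as used for similarity search.
    public static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Similarity search over an in-memory "store": rank segments by
    // cosine similarity to the query and keep the top maxResults.
    public static List<String> retrieve(String query, List<String> segments, int maxResults) {
        double[] q = embed(query);
        return segments.stream()
                .sorted(Comparator.comparingDouble((String s) -> cosine(q, embed(s))).reversed())
                .limit(maxResults)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> segments = List.of(
                "Quarkus RAG pipelines retrieve context from an embedding store",
                "Redis can serve as the embedding store",
                "Unrelated segment about cooking");
        System.out.println(retrieve("How does the RAG embedding store work?", segments, 2));
    }
}
```

Real embedding models produce dense semantic vectors rather than term counts, but the shape of the pipeline (embed, score, rank, truncate) is the same.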
The RetrievalAugmentor is automatically used by LangChain4j AI services:
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;
@RegisterAiService
public interface DocumentAssistant {
@SystemMessage("You are a helpful assistant. Answer based on the provided context.")
String chat(@UserMessage String userMessage);
}

When chat() is called, LangChain4j invokes EasyRetrievalAugmentor.augment() to retrieve context before the prompt is sent to the model.

The retrieval behavior is controlled by configuration properties:
# Maximum number of segments to retrieve
quarkus.langchain4j.easy-rag.max-results=5
# Minimum similarity score threshold
quarkus.langchain4j.easy-rag.min-score=0.7

See Configuration Reference for details.
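To see how the two properties interact, here is a minimal plain-Java sketch with hypothetical similarity scores (this is not code from the extension):

```java
import java.util.*;
import java.util.stream.*;

public class ScoreFilter {

    // Applies the min-score threshold first, then the max-results cap,
    // mirroring the effect of quarkus.langchain4j.easy-rag.min-score
    // and quarkus.langchain4j.easy-rag.max-results.
    public static List<Double> filter(List<Double> scores, double minScore, int maxResults) {
        return scores.stream()
                .filter(s -> s >= minScore)
                .sorted(Comparator.reverseOrder())
                .limit(maxResults)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Double> scores = List.of(0.92, 0.81, 0.65, 0.40);
        // min-score=0.7 drops 0.65 and 0.40; max-results=5 keeps the rest
        System.out.println(filter(scores, 0.7, 5)); // [0.92, 0.81]
    }
}
```

A higher min-score trades recall for precision; max-results bounds how much retrieved context is packed into the prompt.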
If you need custom retrieval logic, you can provide your own RetrievalAugmentor bean. When a custom bean exists, the Easy RAG extension will not create the automatic EasyRetrievalAugmentor:
import dev.langchain4j.rag.RetrievalAugmentor;
import dev.langchain4j.rag.AugmentationRequest;
import dev.langchain4j.rag.AugmentationResult;
import dev.langchain4j.rag.content.Content;
import dev.langchain4j.rag.content.retriever.ContentRetriever;
import dev.langchain4j.rag.query.Query;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import java.util.List;
@ApplicationScoped
public class CustomRetrievalAugmentor implements RetrievalAugmentor {
@Inject
ContentRetriever contentRetriever;
@Override
public AugmentationResult augment(AugmentationRequest request) {
// Custom retrieval logic
// Extract the query text from the user message
String queryText = request.userMessage().text();
// Apply custom query transformation
String transformedQuery = transformQuery(queryText);
// Retrieve with custom logic
List<Content> contents = contentRetriever.retrieve(Query.from(transformedQuery));
// Apply custom filtering or ranking
List<Content> filteredContents = customFilter(contents);
return AugmentationResult.builder()
.contents(filteredContents)
.build();
}
private String transformQuery(String query) {
// Custom query transformation
return query;
}
private List<Content> customFilter(List<Content> contents) {
// Custom filtering logic
return contents;
}
}

While uncommon, you can manually instantiate EasyRetrievalAugmentor if needed:
import io.quarkiverse.langchain4j.easyrag.runtime.EasyRetrievalAugmentor;
import io.quarkiverse.langchain4j.easyrag.runtime.EasyRagConfig;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStore;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

@ApplicationScoped
public class CustomSetup {
@Inject
EasyRagConfig config;
@Inject
EmbeddingModel embeddingModel;
@Inject
EmbeddingStore embeddingStore;
public EasyRetrievalAugmentor createAugmentor() {
return new EasyRetrievalAugmentor(config, embeddingModel, embeddingStore);
}
}

This is typically only needed for advanced scenarios.

AugmentationRequest contains the user's query and metadata:
AugmentationRequest {
UserMessage userMessage; // The user's query
Metadata metadata; // Additional context (chat memory, etc.)
}

AugmentationResult contains the retrieved context:
AugmentationResult {
List<Content> contents; // Retrieved document segments
UserMessage userMessage; // Optional modified user message
SystemMessage systemMessage; // Optional modified system message
}

The contents list contains the relevant document segments that will be injected into the prompt.
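A simplified model of this flow, using hypothetical records in place of the real LangChain4j types, shows how the retrieved contents end up in the final prompt:

```java
import java.util.List;
import java.util.stream.Collectors;

public class AugmentationSketch {

    // Simplified stand-ins for AugmentationRequest / AugmentationResult.
    public record Request(String userMessage) {}
    public record Result(List<String> contents, String userMessage) {}

    // Packages retrieved segments together with the original user message.
    public static Result augment(Request request, List<String> retrieved) {
        return new Result(retrieved, request.userMessage());
    }

    // Builds an augmented prompt by prepending the retrieved segments to
    // the user's question (a rough sketch of what the framework does with
    // the contents of an augmentation result).
    public static String buildPrompt(Result result) {
        String context = result.contents().stream()
                .map(c -> "- " + c)
                .collect(Collectors.joining("\n"));
        return "Context:\n" + context + "\n\nQuestion: " + result.userMessage();
    }

    public static void main(String[] args) {
        Request request = new Request("What does segment A say?");
        Result result = augment(request, List.of("Segment A", "Segment B"));
        System.out.println(buildPrompt(result));
    }
}
```

The actual prompt layout is decided by LangChain4j's content injector; this sketch only illustrates the direction of the data flow.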
Segment size affects retrieval quality:
# Smaller segments for precise retrieval
quarkus.langchain4j.easy-rag.max-segment-size=200
# Larger segments for more context
quarkus.langchain4j.easy-rag.max-segment-size=500

To understand what's being retrieved, you can create a custom augmentor that logs retrieval results:
import dev.langchain4j.rag.RetrievalAugmentor;
import dev.langchain4j.rag.AugmentationRequest;
import dev.langchain4j.rag.AugmentationResult;
import io.quarkiverse.langchain4j.easyrag.runtime.EasyRetrievalAugmentor;
import io.quarkiverse.langchain4j.easyrag.runtime.EasyRagConfig;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStore;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import org.jboss.logging.Logger;
@ApplicationScoped
public class LoggingRetrievalAugmentor implements RetrievalAugmentor {
private static final Logger LOG = Logger.getLogger(LoggingRetrievalAugmentor.class);
private final EasyRetrievalAugmentor delegate;
@Inject
public LoggingRetrievalAugmentor(
EasyRagConfig config,
EmbeddingModel embeddingModel,
EmbeddingStore embeddingStore) {
this.delegate = new EasyRetrievalAugmentor(config, embeddingModel, embeddingStore);
}
@Override
public AugmentationResult augment(AugmentationRequest request) {
LOG.infof("Query: %s", request.userMessage().text());
AugmentationResult result = delegate.augment(request);
LOG.infof("Retrieved %d segments", result.contents().size());
result.contents().forEach(content ->
LOG.infof("Segment: %s", content.textSegment().text())
);
return result;
}
}

The EasyRetrievalAugmentor works with any LangChain4j EmbeddingModel:
<dependency>
<groupId>io.quarkiverse.langchain4j</groupId>
<artifactId>quarkus-langchain4j-openai</artifactId>
</dependency>

quarkus.langchain4j.openai.api-key=${OPENAI_API_KEY}
quarkus.langchain4j.openai.embedding-model.model-name=text-embedding-3-small

<dependency>
<groupId>io.quarkiverse.langchain4j</groupId>
<artifactId>quarkus-langchain4j-ollama</artifactId>
</dependency>

quarkus.langchain4j.ollama.base-url=http://localhost:11434
quarkus.langchain4j.ollama.embedding-model.model-id=nomic-embed-text

For completely offline operation:
<dependency>
<groupId>io.quarkiverse.langchain4j</groupId>
<artifactId>quarkus-langchain4j-embedding-onnx</artifactId>
</dependency>

The EasyRetrievalAugmentor queries whatever EmbeddingStore bean is available:
An in-memory embedding store is created automatically by the Easy RAG extension if no other store is configured. To use a persistent store such as Redis instead:
<dependency>
<groupId>io.quarkiverse.langchain4j</groupId>
<artifactId>quarkus-langchain4j-redis</artifactId>
</dependency>

quarkus.langchain4j.redis.dimension=384
quarkus.redis.hosts=redis://localhost:6379

<dependency>
<groupId>io.quarkiverse.langchain4j</groupId>
<artifactId>quarkus-langchain4j-chroma</artifactId>
</dependency>

quarkus.langchain4j.chroma.base-url=http://localhost:8000

Each call to augment() performs an embedding-model invocation and a similarity search against the store, so retrieval adds latency to every request. For repeated queries, consider implementing query caching:
import dev.langchain4j.rag.RetrievalAugmentor;
import dev.langchain4j.rag.AugmentationRequest;
import dev.langchain4j.rag.AugmentationResult;
import io.quarkiverse.langchain4j.easyrag.runtime.EasyRetrievalAugmentor;
import io.quarkiverse.langchain4j.easyrag.runtime.EasyRagConfig;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStore;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

@ApplicationScoped
public class CachingRetrievalAugmentor implements RetrievalAugmentor {
private final Map<String, AugmentationResult> cache = new ConcurrentHashMap<>();
private final EasyRetrievalAugmentor delegate;
@Inject
public CachingRetrievalAugmentor(
EasyRagConfig config,
EmbeddingModel embeddingModel,
EmbeddingStore embeddingStore) {
this.delegate = new EasyRetrievalAugmentor(config, embeddingModel, embeddingStore);
}
@Override
public AugmentationResult augment(AugmentationRequest request) {
String query = request.userMessage().text();
return cache.computeIfAbsent(query, k -> delegate.augment(request));
}
}

Install with Tessl CLI
npx tessl i tessl/maven-io-quarkiverse-langchain4j--quarkus-langchain4j-easy-rag@1.7.0