Easy RAG extension for Quarkus LangChain4j that dramatically simplifies implementing Retrieval Augmented Generation pipelines with automatic document ingestion and embedding store management
The EasyRetrievalAugmentor is automatically created by the Easy RAG extension if no other RetrievalAugmentor bean exists in your application. It integrates with LangChain4j AI services to provide Retrieval Augmented Generation capabilities.
package io.quarkiverse.langchain4j.easyrag.runtime;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.rag.AugmentationRequest;
import dev.langchain4j.rag.AugmentationResult;
import dev.langchain4j.rag.RetrievalAugmentor;
import dev.langchain4j.store.embedding.EmbeddingStore;
/**
* Retrieval augmentor automatically generated by the Easy RAG extension
* if no other retrieval augmentor is found.
*/
public class EasyRetrievalAugmentor implements RetrievalAugmentor {
/**
* Creates an EasyRetrievalAugmentor with the specified configuration.
*
* @param config Configuration for retrieval behavior (max results, min score)
* @param embeddingModel Model for generating query embeddings
* @param embeddingStore Store containing document embeddings
*/
public EasyRetrievalAugmentor(
EasyRagConfig config,
EmbeddingModel embeddingModel,
EmbeddingStore embeddingStore
);
/**
* Augments the user message with relevant context from the embedding store.
*
* @param augmentationRequest Request containing user message and metadata
* @return AugmentationResult containing retrieved content and metadata
*/
public AugmentationResult augment(AugmentationRequest augmentationRequest);
}

The Easy RAG extension automatically creates an EasyRetrievalAugmentor bean when:

- No other RetrievalAugmentor bean exists in the application
- An EmbeddingModel bean is available
- An EmbeddingStore bean is available

This automatic creation happens at build time via Quarkus CDI bean synthesis.
The EasyRetrievalAugmentor implements the following retrieval pipeline:

1. Query Embedding: converts the user's query into an embedding using the configured EmbeddingModel
2. Similarity Search: searches the EmbeddingStore for the most similar document segments
3. Filtering: applies the max-results limit and min-score threshold from configuration
4. Context Assembly: packages the retrieved segments into an AugmentationResult
5. Augmentation: the LangChain4j framework injects the retrieved context into the prompt before it is sent to the LLM
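The retrieval steps can be sketched in plain Java. This is a toy illustration, not the extension's implementation: the bag-of-words `embed()` and the in-memory segment list are stand-ins for the real EmbeddingModel and EmbeddingStore, and all names in it are invented for the sketch.

```java
import java.util.*;
import java.util.stream.*;

public class PipelineSketch {

    // Toy "embedding": a term-frequency vector over a fixed vocabulary
    // (a stand-in for a real EmbeddingModel).
    static final List<String> VOCAB = List.of("quarkus", "rag", "embedding", "store", "redis");

    public static double[] embed(String text) {
        List<String> words = Arrays.asList(text.toLowerCase().split("\\W+"));
        return VOCAB.stream()
                .mapToDouble(term -> Collections.frequency(words, term))
                .toArray();
    }

    // Cosine similarity between two vectors, as used for similarity search.
    public static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Similarity search over an in-memory "store": rank segments by
    // cosine similarity to the query and keep the top maxResults.
    public static List<String> retrieve(String query, List<String> segments, int maxResults) {
        double[] q = embed(query);
        return segments.stream()
                .sorted(Comparator.comparingDouble((String s) -> cosine(q, embed(s))).reversed())
                .limit(maxResults)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> segments = List.of(
                "Quarkus RAG pipelines retrieve context from an embedding store",
                "Redis can serve as the embedding store",
                "Unrelated segment about cooking");
        System.out.println(retrieve("How does the RAG embedding store work?", segments, 2));
    }
}
```

Real embedding models produce dense semantic vectors rather than term counts, but the shape of the pipeline (embed, score, rank, truncate) is the same.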
The RetrievalAugmentor is automatically used by LangChain4j AI services:
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;
@RegisterAiService
public interface DocumentAssistant {
@SystemMessage("You are a helpful assistant. Answer based on the provided context.")
String chat(@UserMessage String userMessage);
}

When chat() is called, LangChain4j invokes EasyRetrievalAugmentor.augment() to retrieve context before the prompt is sent to the model.

The retrieval behavior is controlled by configuration properties:
# Maximum number of segments to retrieve
quarkus.langchain4j.easy-rag.max-results=5
# Minimum similarity score threshold
quarkus.langchain4j.easy-rag.min-score=0.7

See Configuration Reference for details.
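To see how the two properties interact, here is a minimal plain-Java sketch with hypothetical similarity scores (this is not code from the extension):

```java
import java.util.*;
import java.util.stream.*;

public class ScoreFilter {

    // Applies the min-score threshold first, then the max-results cap,
    // mirroring the effect of quarkus.langchain4j.easy-rag.min-score
    // and quarkus.langchain4j.easy-rag.max-results.
    public static List<Double> filter(List<Double> scores, double minScore, int maxResults) {
        return scores.stream()
                .filter(s -> s >= minScore)
                .sorted(Comparator.reverseOrder())
                .limit(maxResults)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Double> scores = List.of(0.92, 0.81, 0.65, 0.40);
        // min-score=0.7 drops 0.65 and 0.40; max-results=5 keeps the rest
        System.out.println(filter(scores, 0.7, 5)); // [0.92, 0.81]
    }
}
```

A higher min-score trades recall for precision; max-results bounds how much retrieved context is packed into the prompt.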
If you need custom retrieval logic, you can provide your own RetrievalAugmentor bean. When a custom bean exists, the Easy RAG extension will not create the automatic EasyRetrievalAugmentor:
import dev.langchain4j.rag.RetrievalAugmentor;
import dev.langchain4j.rag.AugmentationRequest;
import dev.langchain4j.rag.AugmentationResult;
import dev.langchain4j.rag.content.Content;
import dev.langchain4j.rag.content.retriever.ContentRetriever;
import dev.langchain4j.rag.query.Query;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import java.util.List;
@ApplicationScoped
public class CustomRetrievalAugmentor implements RetrievalAugmentor {
@Inject
ContentRetriever contentRetriever;
@Override
public AugmentationResult augment(AugmentationRequest request) {
// Custom retrieval logic
// Extract the query text from the user message
String queryText = request.userMessage().text();
// Apply custom query transformation
String transformedQuery = transformQuery(queryText);
// Retrieve with custom logic
List<Content> contents = contentRetriever.retrieve(Query.from(transformedQuery));
// Apply custom filtering or ranking
List<Content> filteredContents = customFilter(contents);
return AugmentationResult.builder()
.contents(filteredContents)
.build();
}
private String transformQuery(String query) {
// Custom query transformation
return query;
}
private List<Content> customFilter(List<Content> contents) {
// Custom filtering logic
return contents;
}
}

While uncommon, you can manually instantiate EasyRetrievalAugmentor if needed:
import io.quarkiverse.langchain4j.easyrag.runtime.EasyRetrievalAugmentor;
import io.quarkiverse.langchain4j.easyrag.runtime.EasyRagConfig;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStore;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

@ApplicationScoped
public class CustomSetup {
@Inject
EasyRagConfig config;
@Inject
EmbeddingModel embeddingModel;
@Inject
EmbeddingStore embeddingStore;
public EasyRetrievalAugmentor createAugmentor() {
return new EasyRetrievalAugmentor(config, embeddingModel, embeddingStore);
}
}

This is typically only needed for advanced scenarios.

AugmentationRequest contains the user's query and metadata:
AugmentationRequest {
UserMessage userMessage; // The user's query
Metadata metadata; // Additional context (chat memory, etc.)
}

AugmentationResult contains the retrieved context:
AugmentationResult {
List<Content> contents; // Retrieved document segments
UserMessage userMessage; // Optional modified user message
SystemMessage systemMessage; // Optional modified system message
}

The contents list contains the relevant document segments that will be injected into the prompt.
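A simplified model of this flow, using hypothetical records in place of the real LangChain4j types, shows how the retrieved contents end up in the final prompt:

```java
import java.util.List;
import java.util.stream.Collectors;

public class AugmentationSketch {

    // Simplified stand-ins for AugmentationRequest / AugmentationResult.
    public record Request(String userMessage) {}
    public record Result(List<String> contents, String userMessage) {}

    // Packages retrieved segments together with the original user message.
    public static Result augment(Request request, List<String> retrieved) {
        return new Result(retrieved, request.userMessage());
    }

    // Builds an augmented prompt by prepending the retrieved segments to
    // the user's question (a rough sketch of what the framework does with
    // the contents of an augmentation result).
    public static String buildPrompt(Result result) {
        String context = result.contents().stream()
                .map(c -> "- " + c)
                .collect(Collectors.joining("\n"));
        return "Context:\n" + context + "\n\nQuestion: " + result.userMessage();
    }

    public static void main(String[] args) {
        Request request = new Request("What does segment A say?");
        Result result = augment(request, List.of("Segment A", "Segment B"));
        System.out.println(buildPrompt(result));
    }
}
```

The actual prompt layout is decided by LangChain4j's content injector; this sketch only illustrates the direction of the data flow.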
Segment size affects retrieval quality:
# Smaller segments for precise retrieval
quarkus.langchain4j.easy-rag.max-segment-size=200
# Larger segments for more context
quarkus.langchain4j.easy-rag.max-segment-size=500

To understand what's being retrieved, you can create a custom augmentor that logs retrieval results:
import dev.langchain4j.rag.RetrievalAugmentor;
import dev.langchain4j.rag.AugmentationRequest;
import dev.langchain4j.rag.AugmentationResult;
import io.quarkiverse.langchain4j.easyrag.runtime.EasyRetrievalAugmentor;
import io.quarkiverse.langchain4j.easyrag.runtime.EasyRagConfig;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStore;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import org.jboss.logging.Logger;
@ApplicationScoped
public class LoggingRetrievalAugmentor implements RetrievalAugmentor {
private static final Logger LOG = Logger.getLogger(LoggingRetrievalAugmentor.class);
private final EasyRetrievalAugmentor delegate;
@Inject
public LoggingRetrievalAugmentor(
EasyRagConfig config,
EmbeddingModel embeddingModel,
EmbeddingStore embeddingStore) {
this.delegate = new EasyRetrievalAugmentor(config, embeddingModel, embeddingStore);
}
@Override
public AugmentationResult augment(AugmentationRequest request) {
LOG.infof("Query: %s", request.userMessage().text());
AugmentationResult result = delegate.augment(request);
LOG.infof("Retrieved %d segments", result.contents().size());
result.contents().forEach(content ->
LOG.infof("Segment: %s", content.textSegment().text())
);
return result;
}
}

The EasyRetrievalAugmentor works with any LangChain4j EmbeddingModel:
<dependency>
<groupId>io.quarkiverse.langchain4j</groupId>
<artifactId>quarkus-langchain4j-openai</artifactId>
</dependency>

quarkus.langchain4j.openai.api-key=${OPENAI_API_KEY}
quarkus.langchain4j.openai.embedding-model.model-name=text-embedding-3-small

<dependency>
<groupId>io.quarkiverse.langchain4j</groupId>
<artifactId>quarkus-langchain4j-ollama</artifactId>
</dependency>

quarkus.langchain4j.ollama.base-url=http://localhost:11434
quarkus.langchain4j.ollama.embedding-model.model-id=nomic-embed-text

For completely offline operation:
<dependency>
<groupId>io.quarkiverse.langchain4j</groupId>
<artifactId>quarkus-langchain4j-embedding-onnx</artifactId>
</dependency>

The EasyRetrievalAugmentor queries whatever EmbeddingStore bean is available:
An in-memory embedding store is created automatically by the Easy RAG extension if no other store is configured. To use a persistent store such as Redis instead:
<dependency>
<groupId>io.quarkiverse.langchain4j</groupId>
<artifactId>quarkus-langchain4j-redis</artifactId>
</dependency>

quarkus.langchain4j.redis.dimension=384
quarkus.redis.hosts=redis://localhost:6379

<dependency>
<groupId>io.quarkiverse.langchain4j</groupId>
<artifactId>quarkus-langchain4j-chroma</artifactId>
</dependency>

quarkus.langchain4j.chroma.base-url=http://localhost:8000

Each call to augment() performs an embedding-model invocation and a similarity search against the store, so retrieval adds latency to every request. For repeated queries, consider implementing query caching:
import dev.langchain4j.rag.RetrievalAugmentor;
import dev.langchain4j.rag.AugmentationRequest;
import dev.langchain4j.rag.AugmentationResult;
import io.quarkiverse.langchain4j.easyrag.runtime.EasyRetrievalAugmentor;
import io.quarkiverse.langchain4j.easyrag.runtime.EasyRagConfig;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStore;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

@ApplicationScoped
public class CachingRetrievalAugmentor implements RetrievalAugmentor {
private final Map<String, AugmentationResult> cache = new ConcurrentHashMap<>();
private final EasyRetrievalAugmentor delegate;
@Inject
public CachingRetrievalAugmentor(
EasyRagConfig config,
EmbeddingModel embeddingModel,
EmbeddingStore embeddingStore) {
this.delegate = new EasyRetrievalAugmentor(config, embeddingModel, embeddingStore);
}
@Override
public AugmentationResult augment(AugmentationRequest request) {
String query = request.userMessage().text();
return cache.computeIfAbsent(query, k -> delegate.augment(request));
}
}

Install with Tessl CLI
npx tessl i tessl/maven-io-quarkiverse-langchain4j--quarkus-langchain4j-easy-rag@1.7.0