Easy RAG extension for Quarkus LangChain4j that simplifies building Retrieval-Augmented Generation (RAG) pipelines through automatic document ingestion and embedding store management.
The Easy RAG extension provides comprehensive configuration options for controlling document ingestion, retrieval behavior, and embeddings management.

```java
package io.quarkiverse.langchain4j.easyrag.runtime;

import static io.quarkus.runtime.annotations.ConfigPhase.RUN_TIME;

import java.util.OptionalDouble;

import io.quarkus.runtime.annotations.ConfigGroup;
import io.quarkus.runtime.annotations.ConfigRoot;
import io.smallrye.config.ConfigMapping;
import io.smallrye.config.WithDefault;

@ConfigRoot(phase = RUN_TIME)
@ConfigMapping(prefix = "quarkus.langchain4j.easy-rag")
public interface EasyRagConfig {

    /**
     * Path to the directory containing documents to ingest.
     * Can be an absolute or relative filesystem path, or a classpath reference.
     */
    String path();

    /**
     * Whether path() represents a filesystem or classpath reference.
     * Default: FILESYSTEM
     */
    @WithDefault("FILESYSTEM")
    PathType pathType();

    /**
     * File filtering pattern using FileSystem path matcher syntax.
     * Example: glob:**.{txt,pdf} to match text and PDF files recursively.
     * Default: glob:** (all files)
     */
    @WithDefault("glob:**")
    String pathMatcher();

    /**
     * Whether to recursively scan subdirectories.
     * Default: true
     */
    @WithDefault("true")
    Boolean recursive();

    /**
     * Maximum segment size when splitting documents, in tokens.
     * Default: 300
     */
    @WithDefault("300")
    Integer maxSegmentSize();

    /**
     * Maximum overlap between segments, in tokens.
     * Default: 30
     */
    @WithDefault("30")
    Integer maxOverlapSize();

    /**
     * Maximum number of results returned by the retrieval augmentor.
     * Default: 5
     */
    @WithDefault("5")
    Integer maxResults();

    /**
     * Minimum score threshold for retrieval results.
     * Results below this score are filtered out.
     * Optional: no filtering if not specified.
     */
    OptionalDouble minScore();

    /**
     * When document ingestion should occur.
     * Default: ON
     */
    // IngestionStrategy (ON, OFF, MANUAL) is defined outside this interface and not shown here.
    @WithDefault("ON")
    IngestionStrategy ingestionStrategy();

    /**
     * Configuration for embeddings reuse functionality.
     */
    ReuseEmbeddingsConfig reuseEmbeddings();

    enum PathType {
        /** Path represents a filesystem reference */
        FILESYSTEM,
        /** Path represents a classpath reference */
        CLASSPATH
    }

    @ConfigGroup
    interface ReuseEmbeddingsConfig {

        /**
         * Whether to reuse embeddings from previous runs.
         * Only supported with InMemoryEmbeddingStore.
         * Default: false
         */
        @WithDefault("false")
        boolean enabled();

        /**
         * File path to load/save embeddings when reuse is enabled.
         * Default: easy-rag-embeddings.json
         */
        @WithDefault("easy-rag-embeddings.json")
        String file();
    }
}
```

All properties use the prefix `quarkus.langchain4j.easy-rag`.
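
Each property name is derived from the interface above: SmallRye Config's `@ConfigMapping` converts camelCase method names to kebab-case and joins nested groups with a dot, so `maxSegmentSize()` becomes `max-segment-size` and `reuseEmbeddings().enabled()` becomes `reuse-embeddings.enabled`. The sketch below reproduces that naming rule for illustration only; the `PropertyNames` class and its methods are hypothetical, and it ignores edge cases (such as runs of capital letters) that SmallRye's real naming strategy handles.

```java
import java.util.List;

/** Hypothetical helper that mimics the kebab-case naming used for EasyRagConfig properties. */
public class PropertyNames {
    static final String PREFIX = "quarkus.langchain4j.easy-rag";

    /** Converts a camelCase method name such as "maxSegmentSize" to "max-segment-size". */
    static String toKebabCase(String methodName) {
        StringBuilder sb = new StringBuilder();
        for (char c : methodName.toCharArray()) {
            if (Character.isUpperCase(c)) {
                // Each uppercase letter starts a new word: emit a hyphen and the lowercase letter.
                sb.append('-').append(Character.toLowerCase(c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds the full property name from a chain of method names, e.g. ("reuseEmbeddings", "enabled"). */
    static String propertyName(String... methodChain) {
        StringBuilder sb = new StringBuilder(PREFIX);
        for (String method : methodChain) {
            sb.append('.').append(toKebabCase(method));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (String method : List.of("path", "pathType", "maxSegmentSize", "minScore")) {
            System.out.println(propertyName(method));
        }
        // prints quarkus.langchain4j.easy-rag.reuse-embeddings.file
        System.out.println(propertyName("reuseEmbeddings", "file"));
    }
}
```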

`path` (required): Path to the directory containing documents to ingest.

```properties
# Absolute filesystem path
quarkus.langchain4j.easy-rag.path=/opt/data/documents
# Relative filesystem path (resolved from the working directory)
quarkus.langchain4j.easy-rag.path=./documents
# Classpath resource (when path-type=CLASSPATH)
quarkus.langchain4j.easy-rag.path=documents
```

`path-type`: Specifies whether the path is a filesystem or classpath reference.
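
The difference between the two path types can be pictured with plain JDK calls: a filesystem path that is relative resolves against the working directory (the `user.dir` system property), while a classpath reference is looked up through a class loader. The `PathResolution` class below is a hypothetical sketch of that distinction, not the extension's actual loading code.

```java
import java.net.URL;
import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical sketch: how a FILESYSTEM path and a CLASSPATH reference are located. */
public class PathResolution {
    enum PathType { FILESYSTEM, CLASSPATH }

    /** FILESYSTEM: relative paths are interpreted against the working directory. */
    static Path resolveFilesystem(String path) {
        return Path.of(path).toAbsolutePath().normalize();
    }

    /** CLASSPATH: look the name up as a resource; empty if nothing with that name is on the classpath. */
    static Optional<URL> resolveClasspath(String path) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        return Optional.ofNullable(loader.getResource(path));
    }

    static String describe(String path, PathType type) {
        return switch (type) {
            case FILESYSTEM -> "filesystem: " + resolveFilesystem(path);
            case CLASSPATH -> "classpath: " + resolveClasspath(path).map(URL::toString).orElse("<not found>");
        };
    }

    public static void main(String[] args) {
        System.out.println(describe("./documents", PathType.FILESYSTEM));
        System.out.println(describe("documents", PathType.CLASSPATH));
    }
}
```

This is also why `src/main/resources/documents` is a filesystem path, while the same files are reached as `documents` once they are on the classpath.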

```properties
# Use a filesystem path (default)
quarkus.langchain4j.easy-rag.path-type=FILESYSTEM
# Use a classpath resource
quarkus.langchain4j.easy-rag.path-type=CLASSPATH
```

When to use CLASSPATH:
- The documents are packaged with the application, for example under src/main/resources or a similar resource directory.

`path-matcher`: File filtering pattern using Java FileSystem path matcher syntax.
Default: `glob:**`. Both `glob:` and `regex:` patterns are supported.

```properties
# Match all files recursively (default)
quarkus.langchain4j.easy-rag.path-matcher=glob:**
# Match only text files
quarkus.langchain4j.easy-rag.path-matcher=glob:**.txt
# Match multiple file types
quarkus.langchain4j.easy-rag.path-matcher=glob:**.{txt,md,pdf}
# Match Markdown files under the docs subdirectory
quarkus.langchain4j.easy-rag.path-matcher=glob:docs/**.md
# Use a regex pattern (the doubled backslash is the .properties escape for a single backslash)
quarkus.langchain4j.easy-rag.path-matcher=regex:.*\\.(txt|pdf)
```

Glob pattern syntax:

- `*` matches any string that does not cross directory boundaries
- `**` matches any string, crossing directory boundaries
- `?` matches exactly one character
- `{a,b}` matches either `a` or `b`
- `[abc]` matches one character from the set
- `[a-z]` matches one character from the range

`recursive`: Whether to recursively scan subdirectories.
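
Because `path-matcher` uses Java FileSystem path matcher syntax, a pattern can be tried out with `FileSystems.getDefault().getPathMatcher` before it goes into `application.properties`. The hypothetical `DocumentSelection` sketch below also shows how `recursive` changes which files are visited. Matching each file by its path relative to the document directory is an assumption about how the extension applies the pattern. Note that `"regex:.*\\.(txt|pdf)"` means the same in a Java string literal as in a `.properties` file: both reduce the doubled backslash to one.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical sketch: selecting documents with a path matcher, with or without recursion. */
public class DocumentSelection {

    /** Returns the files under dir whose path (relative to dir) matches the pattern, sorted. */
    static List<String> select(Path dir, String pattern, boolean recursive) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher(pattern);
        // Files.walk descends into subdirectories; Files.list only sees direct children.
        try (Stream<Path> paths = recursive ? Files.walk(dir) : Files.list(dir)) {
            return paths.filter(Files::isRegularFile)
                    .map(dir::relativize) // assumption: match on the path relative to the directory
                    .filter(matcher::matches)
                    .map(p -> p.toString().replace('\\', '/'))
                    .sorted()
                    .toList();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates a small temporary document tree for the demo. */
    static Path sampleTree() {
        try {
            Path dir = Files.createTempDirectory("easy-rag-docs");
            for (String name : List.of("intro.txt", "notes.md", "image.png", "docs/guide.md", "docs/api/ref.pdf")) {
                Path file = dir.resolve(name);
                Files.createDirectories(file.getParent());
                Files.writeString(file, "sample");
            }
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = sampleTree();
        // [docs/api/ref.pdf, docs/guide.md, intro.txt, notes.md]
        System.out.println(select(dir, "glob:**.{txt,md,pdf}", true));
        // [intro.txt, notes.md]  (recursive=false skips the docs subdirectory)
        System.out.println(select(dir, "glob:**.{txt,md,pdf}", false));
        // [docs/guide.md]
        System.out.println(select(dir, "glob:docs/**.md", true));
        // [docs/api/ref.pdf, intro.txt]
        System.out.println(select(dir, "regex:.*\\.(txt|pdf)", true));
    }
}
```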

```properties
# Scan subdirectories (default)
quarkus.langchain4j.easy-rag.recursive=true
# Only scan the specified directory, not its subdirectories
quarkus.langchain4j.easy-rag.recursive=false
```

`max-segment-size`: Maximum size of document segments, in tokens.

```properties
# Default size
quarkus.langchain4j.easy-rag.max-segment-size=300
# Smaller segments for more precise retrieval
quarkus.langchain4j.easy-rag.max-segment-size=200
# Larger segments for more context
quarkus.langchain4j.easy-rag.max-segment-size=500
```

`max-overlap-size`: Maximum overlap between adjacent segments, in tokens.

```properties
# Default overlap
quarkus.langchain4j.easy-rag.max-overlap-size=30
# More overlap for better continuity between segments
quarkus.langchain4j.easy-rag.max-overlap-size=50
# No overlap
quarkus.langchain4j.easy-rag.max-overlap-size=0
```
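
To build intuition for how `max-segment-size` and `max-overlap-size` interact, here is a deliberately simplified fixed-window splitter. It is not the extension's actual splitter: this hypothetical sketch treats whitespace-separated words as tokens, whereas real splitters (such as LangChain4j's `DocumentSplitters.recursive`) count model tokens and try to break on paragraph and sentence boundaries. Each window advances by `maxSegmentSize - maxOverlapSize`, so the last `maxOverlapSize` words of one segment reappear at the start of the next.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical fixed-window splitter: words stand in for tokens. */
public class OverlapSplitter {

    /** Splits text into segments of at most maxSegmentSize words, repeating maxOverlapSize words between neighbors. */
    static List<String> split(String text, int maxSegmentSize, int maxOverlapSize) {
        if (maxSegmentSize <= 0 || maxOverlapSize < 0 || maxOverlapSize >= maxSegmentSize) {
            throw new IllegalArgumentException("require 0 <= overlap < segment size");
        }
        String[] words = text.trim().split("\\s+");
        List<String> segments = new ArrayList<>();
        int step = maxSegmentSize - maxOverlapSize; // how far the window advances each time
        for (int start = 0; start < words.length; start += step) {
            int end = Math.min(start + maxSegmentSize, words.length);
            segments.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
            if (end == words.length) {
                break; // the last window reached the end of the text
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        String text = "one two three four five six seven eight nine ten";
        // Size 4, overlap 1: "one two three four", "four five six seven", "seven eight nine ten"
        split(text, 4, 1).forEach(System.out::println);
        // Size 4, overlap 0: "one two three four", "five six seven eight", "nine ten"
        split(text, 4, 0).forEach(System.out::println);
    }
}
```

More overlap preserves context across segment boundaries at the cost of storing and embedding more repeated text.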

`max-results`: Maximum number of relevant segments to retrieve.

```properties
# Default
quarkus.langchain4j.easy-rag.max-results=5
# More context for complex queries
quarkus.langchain4j.easy-rag.max-results=10
# Minimal context for simple queries
quarkus.langchain4j.easy-rag.max-results=3
```

`min-score`: Minimum similarity score threshold for retrieval results. Results scoring below it are filtered out; if it is not set, no filtering is applied.

```properties
# Only include highly relevant results
quarkus.langchain4j.easy-rag.min-score=0.7
# More permissive threshold
quarkus.langchain4j.easy-rag.min-score=0.5
```
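
Conceptually, `max-results` and `min-score` act together at query time: segments scoring below `min-score` are discarded, and at most `max-results` of the highest-scoring remaining segments are added to the prompt. The hypothetical sketch below applies both limits to a list of already-scored segments; the `ScoredSegment` record and the scores are made-up illustration data, whereas real scores come from the embedding store's similarity search. `OptionalDouble` mirrors the type of `minScore()` in `EasyRagConfig`.

```java
import java.util.Comparator;
import java.util.List;
import java.util.OptionalDouble;

/** Hypothetical sketch of how max-results and min-score limit retrieved segments. */
public class RetrievalFilter {
    record ScoredSegment(String text, double score) {}

    static List<ScoredSegment> select(List<ScoredSegment> candidates, int maxResults, OptionalDouble minScore) {
        return candidates.stream()
                // min-score is optional: when it is absent, no candidate is filtered out
                .filter(s -> minScore.isEmpty() || s.score() >= minScore.getAsDouble())
                // best matches first
                .sorted(Comparator.comparingDouble(ScoredSegment::score).reversed())
                .limit(maxResults)
                .toList();
    }

    public static void main(String[] args) {
        List<ScoredSegment> candidates = List.of(
                new ScoredSegment("refund policy", 0.82),
                new ScoredSegment("shipping times", 0.64),
                new ScoredSegment("company history", 0.41),
                new ScoredSegment("return labels", 0.77));
        // min-score=0.7 keeps only "refund policy" and "return labels", even though max-results=5
        System.out.println(select(candidates, 5, OptionalDouble.of(0.7)));
        // no min-score, max-results=3: the three best segments
        System.out.println(select(candidates, 3, OptionalDouble.empty()));
    }
}
```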
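
`ingestion-strategy`: Controls when document ingestion occurs. Default: `ON`.

- `ON`: automatically ingest at application startup
- `OFF`: do not ingest (useful for pre-populated persistent stores)
- `MANUAL`: wait for a manual trigger via `EasyRagManualIngestion.ingest()`

```properties
# Automatic ingestion at startup (default)
quarkus.langchain4j.easy-rag.ingestion-strategy=ON
# No ingestion (pre-populated store)
quarkus.langchain4j.easy-rag.ingestion-strategy=OFF
# Manual control
quarkus.langchain4j.easy-rag.ingestion-strategy=MANUAL
```

When to use each strategy:
- `ON`: the documents should be ingested every time the application starts, for example with an in-memory store.
- `OFF`: the embedding store is persistent and already contains the documents.
- `MANUAL`: you want to decide in application code when ingestion runs.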
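
The three strategies differ only in when ingestion runs. The hypothetical model below captures that: `ON` ingests during startup, `OFF` never ingests, and `MANUAL` defers ingestion until application code triggers it (in the extension, through `EasyRagManualIngestion.ingest()`). The `IngestionLifecycle` class, its methods, and the ingestion counter are illustrative names, not the extension's API.

```java
/** Hypothetical model of when ingestion runs for each ingestion-strategy value. */
public class IngestionLifecycle {
    enum IngestionStrategy { ON, OFF, MANUAL }

    private final IngestionStrategy strategy;
    private int ingestions; // number of completed ingestion runs

    IngestionLifecycle(IngestionStrategy strategy) {
        this.strategy = strategy;
    }

    /** Called once when the application starts. */
    void onStartup() {
        if (strategy == IngestionStrategy.ON) {
            ingest(); // ON: ingest automatically at startup
        }
        // OFF: nothing to do, the store is expected to be populated already.
        // MANUAL: nothing yet, ingestion waits for trigger().
    }

    /** Explicit trigger, standing in for application code calling EasyRagManualIngestion.ingest(). */
    void trigger() {
        // This sketch only honors the trigger for MANUAL; the extension's behavior
        // for the other strategies is not modeled here.
        if (strategy == IngestionStrategy.MANUAL) {
            ingest();
        }
    }

    private void ingest() {
        // A real ingestion would load, split, embed and store the documents here.
        ingestions++;
    }

    int ingestionCount() {
        return ingestions;
    }

    public static void main(String[] args) {
        for (IngestionStrategy s : IngestionStrategy.values()) {
            IngestionLifecycle app = new IngestionLifecycle(s);
            app.onStartup();
            int afterStartup = app.ingestionCount();
            app.trigger();
            // ON: 1 then 1, OFF: 0 then 0, MANUAL: 0 then 1
            System.out.println(s + ": after startup=" + afterStartup + ", after trigger=" + app.ingestionCount());
        }
    }
}
```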
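
`reuse-embeddings.enabled`: Enable embeddings reuse for faster development cycles. Only supported with `InMemoryEmbeddingStore`. Default: `false`.

```properties
# Enable embeddings reuse
quarkus.langchain4j.easy-rag.reuse-embeddings.enabled=true
```

How it works:
- At startup, if the embeddings file exists, the embeddings are loaded from it instead of being recomputed.
- Otherwise, the documents are ingested as usual and the resulting embeddings are written to the file for the next run.
- Delete the file to force re-ingestion after your documents change.

Important: intended for development only; it is not supported with persistent embedding stores.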
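
With reuse enabled, the idea is to load embeddings from a file when it exists and otherwise compute them once and save them for later runs. The sketch below shows that load-or-compute pattern with plain file I/O. It is a hypothetical illustration: the `EmbeddingCache` class, the fake embedding function, and the tab-separated file format are made up for the example, and the real file (`easy-rag-embeddings.json` by default) uses a JSON format instead.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of "load embeddings if the file exists, otherwise compute and save them". */
public class EmbeddingCache {

    /** Returns embeddings for the segments, reusing the cache file when it already exists. */
    static Map<String, float[]> loadOrCompute(Path file, List<String> segments, Function<String, float[]> embed) {
        try {
            if (Files.exists(file)) {
                return read(file); // reuse: no embedding calls at all
            }
            Map<String, float[]> embeddings = new LinkedHashMap<>();
            for (String segment : segments) {
                embeddings.put(segment, embed.apply(segment)); // the expensive step avoided on later runs
            }
            write(file, embeddings);
            return embeddings;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // One line per segment: text, a tab, then comma-separated vector components.
    // Assumes segment text contains no tabs or line breaks and vectors are non-empty.
    private static void write(Path file, Map<String, float[]> embeddings) throws IOException {
        List<String> lines = new ArrayList<>();
        embeddings.forEach((text, vector) -> {
            StringBuilder sb = new StringBuilder(text).append('\t');
            for (int i = 0; i < vector.length; i++) {
                sb.append(i == 0 ? "" : ",").append(vector[i]);
            }
            lines.add(sb.toString());
        });
        Files.write(file, lines);
    }

    private static Map<String, float[]> read(Path file) throws IOException {
        Map<String, float[]> embeddings = new LinkedHashMap<>();
        for (String line : Files.readAllLines(file)) {
            String[] parts = line.split("\t", 2);
            String[] values = parts[1].split(",");
            float[] vector = new float[values.length];
            for (int i = 0; i < values.length; i++) {
                vector[i] = Float.parseFloat(values[i]);
            }
            embeddings.put(parts[0], vector);
        }
        return embeddings;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempDirectory("easy-rag").resolve("embeddings.tsv");
        int[] calls = {0};
        Function<String, float[]> fakeEmbed = text -> { // stand-in for a real embedding model
            calls[0]++;
            return new float[] {text.length(), text.hashCode() % 100};
        };
        List<String> segments = List.of("first segment", "second segment");
        loadOrCompute(file, segments, fakeEmbed); // first run: computes and writes the file
        loadOrCompute(file, segments, fakeEmbed); // second run: reads the file
        System.out.println("embedding calls: " + calls[0]); // 2: only the first run computed embeddings
    }
}
```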

`reuse-embeddings.file`: File path used to store and load embeddings. Default: `easy-rag-embeddings.json`.

```properties
# Default location
quarkus.langchain4j.easy-rag.reuse-embeddings.file=easy-rag-embeddings.json
# Custom location
quarkus.langchain4j.easy-rag.reuse-embeddings.file=/tmp/embeddings.json
```

Configuration examples:

Minimal configuration:

```properties
# Only required property
quarkus.langchain4j.easy-rag.path=/data/documents
```

Development, with documents on the classpath and embeddings reused between restarts:

```properties
# Classpath reference, e.g. the contents of src/main/resources/documents
quarkus.langchain4j.easy-rag.path=documents
quarkus.langchain4j.easy-rag.path-type=CLASSPATH
quarkus.langchain4j.easy-rag.reuse-embeddings.enabled=true
```

Production, with manual ingestion and a Redis embedding store:

```properties
quarkus.langchain4j.easy-rag.path=/opt/data/documents
quarkus.langchain4j.easy-rag.ingestion-strategy=MANUAL
quarkus.langchain4j.easy-rag.max-segment-size=400
quarkus.langchain4j.easy-rag.max-overlap-size=40
quarkus.langchain4j.easy-rag.max-results=7
quarkus.langchain4j.easy-rag.min-score=0.6
# Redis store configuration (the dimension must match the embedding model's vector size)
quarkus.langchain4j.redis.dimension=384
```

Larger segments, more results, and a stricter score threshold:

```properties
quarkus.langchain4j.easy-rag.path=/data/docs
quarkus.langchain4j.easy-rag.path-matcher=glob:**.{md,txt,pdf}
quarkus.langchain4j.easy-rag.max-segment-size=500
quarkus.langchain4j.easy-rag.max-overlap-size=50
quarkus.langchain4j.easy-rag.max-results=10
quarkus.langchain4j.easy-rag.min-score=0.7
```

Classpath knowledge base with manual ingestion, Markdown files only:

```properties
quarkus.langchain4j.easy-rag.path=knowledge-base
quarkus.langchain4j.easy-rag.path-type=CLASSPATH
quarkus.langchain4j.easy-rag.ingestion-strategy=MANUAL
quarkus.langchain4j.easy-rag.recursive=true
quarkus.langchain4j.easy-rag.path-matcher=glob:**.md
```

You can inject the configuration interface in your beans:

```java
import io.quarkiverse.langchain4j.easyrag.runtime.EasyRagConfig;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

@ApplicationScoped
public class MyService {

    @Inject
    EasyRagConfig config;

    public void logConfiguration() {
        System.out.println("Document path: " + config.path());
        System.out.println("Max results: " + config.maxResults());
        System.out.println("Ingestion strategy: " + config.ingestionStrategy());
    }
}
```

All properties can be set via environment variables using Quarkus conventions:

```shell
# Replace dots and hyphens with underscores and use uppercase
export QUARKUS_LANGCHAIN4J_EASY_RAG_PATH=/data/documents
export QUARKUS_LANGCHAIN4J_EASY_RAG_MAX_RESULTS=10
export QUARKUS_LANGCHAIN4J_EASY_RAG_INGESTION_STRATEGY=MANUAL
```

Use Quarkus profiles for environment-specific configuration:

```properties
# Development profile
%dev.quarkus.langchain4j.easy-rag.path=src/main/resources/docs
%dev.quarkus.langchain4j.easy-rag.reuse-embeddings.enabled=true
# Production profile
%prod.quarkus.langchain4j.easy-rag.path=/opt/data/documents
%prod.quarkus.langchain4j.easy-rag.ingestion-strategy=MANUAL
```
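
Environment variable names follow the MicroProfile Config mapping rule that Quarkus applies: every character other than an ASCII letter or digit becomes an underscore and the result is uppercased, so a profile prefix such as `%prod.` turns into `_PROD_`. The small `EnvVarNames` helper below is a hypothetical sketch of that rule, handy for checking a name before exporting it.

```java
/** Hypothetical helper applying the MicroProfile Config environment variable naming rule. */
public class EnvVarNames {

    /** Replaces every character other than A-Z, a-z and 0-9 with '_' and uppercases the rest. */
    static String toEnvVar(String propertyName) {
        StringBuilder sb = new StringBuilder(propertyName.length());
        for (char c : propertyName.toCharArray()) {
            boolean alphanumeric = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
            sb.append(alphanumeric ? Character.toUpperCase(c) : '_');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints QUARKUS_LANGCHAIN4J_EASY_RAG_MAX_RESULTS
        System.out.println(toEnvVar("quarkus.langchain4j.easy-rag.max-results"));
        // prints _PROD_QUARKUS_LANGCHAIN4J_EASY_RAG_INGESTION_STRATEGY
        System.out.println(toEnvVar("%prod.quarkus.langchain4j.easy-rag.ingestion-strategy"));
    }
}
```

Exporting `_PROD_QUARKUS_LANGCHAIN4J_EASY_RAG_INGESTION_STRATEGY=MANUAL` would therefore set the property for the `prod` profile only.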