
tessl/maven-io-quarkiverse-langchain4j--quarkus-langchain4j-easy-rag

Easy RAG extension for Quarkus LangChain4j that dramatically simplifies implementing Retrieval Augmented Generation pipelines with automatic document ingestion and embedding store management


Configuration Reference

The Easy RAG extension exposes configuration properties for the document source, document splitting, retrieval behavior, ingestion timing, and embeddings reuse.

Configuration Interface

package io.quarkiverse.langchain4j.easyrag.runtime;

import static io.quarkus.runtime.annotations.ConfigPhase.RUN_TIME;

import io.quarkus.runtime.annotations.ConfigGroup;
import io.quarkus.runtime.annotations.ConfigRoot;
import io.smallrye.config.ConfigMapping;
import io.smallrye.config.WithDefault;
import java.util.OptionalDouble;

@ConfigRoot(phase = RUN_TIME)
@ConfigMapping(prefix = "quarkus.langchain4j.easy-rag")
public interface EasyRagConfig {

    /**
     * Path to the directory containing documents to ingest.
     * Can be an absolute or relative filesystem path, or a classpath reference.
     */
    String path();

    /**
     * Whether path() represents a filesystem or classpath reference.
     * Default: FILESYSTEM
     */
    @WithDefault("FILESYSTEM")
    PathType pathType();

    /**
     * File filtering pattern using FileSystem path matcher syntax.
     * Example: glob:**.{txt,pdf} to match text and PDF files recursively.
     * Default: glob:** (all files)
     */
    @WithDefault("glob:**")
    String pathMatcher();

    /**
     * Whether to recursively scan subdirectories.
     * Default: true
     */
    @WithDefault("true")
    Boolean recursive();

    /**
     * Maximum segment size when splitting documents, in tokens.
     * Default: 300
     */
    @WithDefault("300")
    Integer maxSegmentSize();

    /**
     * Maximum overlap between segments, in tokens.
     * Default: 30
     */
    @WithDefault("30")
    Integer maxOverlapSize();

    /**
     * Maximum number of results to return from retrieval augmentor.
     * Default: 5
     */
    @WithDefault("5")
    Integer maxResults();

    /**
     * Minimum score threshold for retrieval results.
     * Results below this score are filtered out.
     * Optional - no filtering if not specified.
     */
    OptionalDouble minScore();

    /**
     * When document ingestion should occur.
     * Default: ON
     */
    @WithDefault("ON")
    IngestionStrategy ingestionStrategy();

    /**
     * Configuration for embeddings reuse functionality.
     */
    ReuseEmbeddingsConfig reuseEmbeddings();

    enum PathType {
        /** Path represents a filesystem reference */
        FILESYSTEM,
        /** Path represents a classpath reference */
        CLASSPATH
    }

    @ConfigGroup
    interface ReuseEmbeddingsConfig {
        /**
         * Whether to reuse embeddings from previous runs.
         * Only supported with InMemoryEmbeddingStore.
         * Default: false
         */
        @WithDefault("false")
        boolean enabled();

        /**
         * File path to load/save embeddings when reuse is enabled.
         * Default: easy-rag-embeddings.json
         */
        @WithDefault("easy-rag-embeddings.json")
        String file();
    }
}

Configuration Properties

All properties use the prefix quarkus.langchain4j.easy-rag.

Document Source Configuration

path (required)

Path to the directory containing documents to ingest.

  • Type: String
  • Required: Yes
  • Examples:
    # Absolute filesystem path
    quarkus.langchain4j.easy-rag.path=/opt/data/documents
    
    # Relative filesystem path (resolved from working directory)
    quarkus.langchain4j.easy-rag.path=./documents
    
    # Classpath resource (when path-type=CLASSPATH)
    quarkus.langchain4j.easy-rag.path=documents

path-type

Specifies whether the path is a filesystem or classpath reference.

  • Type: Enum (FILESYSTEM, CLASSPATH)
  • Default: FILESYSTEM
  • Examples:
    # Use filesystem path
    quarkus.langchain4j.easy-rag.path-type=FILESYSTEM
    
    # Use classpath resource
    quarkus.langchain4j.easy-rag.path-type=CLASSPATH

When to use CLASSPATH:

  • Documents are packaged inside your application JAR
  • Documents are in src/main/resources or similar
  • You want documents to be part of the application artifact

path-matcher

File filtering pattern using Java FileSystem path matcher syntax.

  • Type: String
  • Default: glob:**
  • Syntax: Supports glob: and regex: patterns
  • Examples:
    # Match all files recursively (default)
    quarkus.langchain4j.easy-rag.path-matcher=glob:**
    
    # Match only text files
    quarkus.langchain4j.easy-rag.path-matcher=glob:**.txt
    
    # Match multiple file types
    quarkus.langchain4j.easy-rag.path-matcher=glob:**.{txt,md,pdf}
    
    # Match files in specific subdirectory
    quarkus.langchain4j.easy-rag.path-matcher=glob:docs/**.md
    
    # Use regex pattern
    quarkus.langchain4j.easy-rag.path-matcher=regex:.*\\.(txt|pdf)

Glob pattern syntax:

  • * - Matches any string not crossing directory boundaries
  • ** - Matches any string crossing directory boundaries
  • ? - Matches exactly one character
  • {a,b} - Matches either a or b
  • [abc] - Matches one character from the set
  • [a-z] - Matches one character from the range
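To see how these patterns behave, the following minimal sketch evaluates them with the JDK's own `PathMatcher`, which uses the same `glob:` and `regex:` syntax. The class name `PathMatcherDemo` and the sample paths are illustrative only, and the sketch assumes patterns are applied to paths relative to the configured directory.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;

// Illustrative helper, not part of the extension: checks a relative path
// against a pattern written in java.nio PathMatcher syntax.
final class PathMatcherDemo {

    static boolean matches(String syntaxAndPattern, String relativePath) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher(syntaxAndPattern);
        return matcher.matches(Path.of(relativePath));
    }

    public static void main(String[] args) {
        // "**" crosses directory boundaries, "*" does not.
        System.out.println(matches("glob:**.txt", "guides/intro/notes.txt"));
        System.out.println(matches("glob:*.txt", "guides/notes.txt"));
        // Brace groups select one of several extensions.
        System.out.println(matches("glob:**.{txt,md,pdf}", "guides/readme.md"));
        // In Java source the backslash is doubled, as it is in a .properties file.
        System.out.println(matches("regex:.*\\.(txt|pdf)", "guides/manual.pdf"));
    }
}
```

Trying a pattern this way before putting it in `application.properties` is a quick check that it selects the files you expect.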

recursive

Whether to recursively scan subdirectories.

  • Type: Boolean
  • Default: true
  • Examples:
    # Scan subdirectories (default)
    quarkus.langchain4j.easy-rag.recursive=true
    
    # Only scan the specified directory, not subdirectories
    quarkus.langchain4j.easy-rag.recursive=false
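The interaction between `recursive` and `path-matcher` can be sketched with plain JDK file APIs. This is a simplified model under stated assumptions, not the extension's loader: `DirectoryScanDemo` and `scan` are hypothetical names, and the sketch assumes the matcher is applied to each file's path relative to the root directory.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.stream.Stream;

// Simplified model of directory scanning; names are illustrative.
final class DirectoryScanDemo {

    // Returns regular files under root whose root-relative path matches the pattern.
    static List<Path> scan(Path root, String pathMatcher, boolean recursive) throws IOException {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher(pathMatcher);
        // Depth 1 visits only the root directory's direct children.
        int depth = recursive ? Integer.MAX_VALUE : 1;
        try (Stream<Path> paths = Files.walk(root, depth)) {
            return paths.filter(Files::isRegularFile)
                    .filter(p -> matcher.matches(root.relativize(p)))
                    .sorted()
                    .toList();
        }
    }
}
```

In this model, `recursive=false` excludes files in subdirectories even when the pattern itself uses `**`.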

Document Splitting Configuration

max-segment-size

Maximum size of document segments in tokens.

  • Type: Integer
  • Default: 300
  • Recommended range: 100-1000
  • Examples:
    # Default size
    quarkus.langchain4j.easy-rag.max-segment-size=300
    
    # Smaller segments for more precise retrieval
    quarkus.langchain4j.easy-rag.max-segment-size=200
    
    # Larger segments for more context
    quarkus.langchain4j.easy-rag.max-segment-size=500

Considerations:

  • Smaller segments: More precise retrieval, but may lack context
  • Larger segments: More context, but less precise retrieval
  • Must fit within the embedding model's input limit, and max-results × max-segment-size should leave room in your LLM's context window
  • Token estimation uses HuggingFace token count estimator

max-overlap-size

Maximum overlap between adjacent segments in tokens.

  • Type: Integer
  • Default: 30
  • Recommended: 10% of max-segment-size
  • Examples:
    # Default overlap
    quarkus.langchain4j.easy-rag.max-overlap-size=30
    
    # More overlap for better continuity
    quarkus.langchain4j.easy-rag.max-overlap-size=50
    
    # No overlap
    quarkus.langchain4j.easy-rag.max-overlap-size=0

Considerations:

  • Overlap helps preserve context across segment boundaries
  • Too much overlap increases storage requirements and processing time
  • Too little overlap may break concepts that span segment boundaries
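How segment size and overlap interact can be illustrated with a plain sliding window over a token list. This is a deliberately simplified sketch: the extension's actual splitter may respect paragraph and sentence boundaries and counts tokens with a real tokenizer, whereas `OverlapSplitDemo` (a hypothetical name) treats each list element as one token.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sliding-window splitter; not the extension's implementation.
final class OverlapSplitDemo {

    static List<List<String>> split(List<String> tokens, int maxSegmentSize, int maxOverlapSize) {
        if (maxSegmentSize <= 0 || maxOverlapSize < 0 || maxOverlapSize >= maxSegmentSize) {
            throw new IllegalArgumentException("require 0 <= overlap < segment size");
        }
        List<List<String>> segments = new ArrayList<>();
        // Each new segment starts (size - overlap) tokens after the previous one.
        int step = maxSegmentSize - maxOverlapSize;
        for (int start = 0; start < tokens.size(); start += step) {
            int end = Math.min(start + maxSegmentSize, tokens.size());
            segments.add(List.copyOf(tokens.subList(start, end)));
            if (end == tokens.size()) {
                break; // the last token has been covered
            }
        }
        return segments;
    }
}
```

In this model, the defaults of 300 and 30 mean each segment begins 270 tokens after the previous one, so the final 30 tokens of one segment reappear at the start of the next.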

Retrieval Configuration

max-results

Maximum number of relevant segments to retrieve.

  • Type: Integer
  • Default: 5
  • Examples:
    # Default
    quarkus.langchain4j.easy-rag.max-results=5
    
    # More context for complex queries
    quarkus.langchain4j.easy-rag.max-results=10
    
    # Minimal context for simple queries
    quarkus.langchain4j.easy-rag.max-results=3

Considerations:

  • More results provide more context but increase token usage
  • Consider your LLM's context window size
  • Balance between relevant context and noise
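A rough upper bound on the tokens that retrieval can add to a prompt is max-results × max-segment-size. The sketch below turns that arithmetic into a check against a context window. `RetrievalBudgetDemo`, `contextWindow`, and `reservedTokens` are hypothetical names for illustration; real prompts also include the system message, the user question, and the model's answer, which is what `reservedTokens` stands in for.

```java
// Back-of-the-envelope token budget; names and numbers are illustrative.
final class RetrievalBudgetDemo {

    // Worst case: every retrieved segment is as large as max-segment-size allows.
    static int maxRetrievedTokens(int maxResults, int maxSegmentSize) {
        return Math.multiplyExact(maxResults, maxSegmentSize);
    }

    // True if the worst-case retrieved context fits after reserving tokens
    // for the rest of the prompt and the response.
    static boolean fits(int maxResults, int maxSegmentSize, int contextWindow, int reservedTokens) {
        return maxRetrievedTokens(maxResults, maxSegmentSize) <= contextWindow - reservedTokens;
    }
}
```

With the defaults (5 results of up to 300 tokens), the retrieved context is at most 1500 tokens.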

min-score

Minimum similarity score threshold for retrieval results.

  • Type: Double (optional)
  • Default: Not set (no filtering)
  • Range: 0.0 to 1.0 (depending on embedding model)
  • Examples:
    # Only include highly relevant results
    quarkus.langchain4j.easy-rag.min-score=0.7
    
    # More permissive threshold
    quarkus.langchain4j.easy-rag.min-score=0.5

Considerations:

  • Score meaning depends on the embedding model and distance metric
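  • Raw cosine similarity ranges from -1 to 1 (higher is more similar); stores that report a normalized relevance score rescale it to 0 to 1, which is the scale min-score is compared against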
  • Experiment to find the right threshold for your use case
  • If no results meet the threshold, no context is provided to the LLM
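The combined effect of `max-results` and `min-score` can be sketched as a filter, sort, and limit over scored candidates. This is an illustrative model, not the extension's retriever: `RetrievalFilterDemo` and `ScoredSegment` are hypothetical names, and the `(cosine + 1) / 2` rescaling is an assumption about how a store might map cosine similarity onto a 0 to 1 relevance score.

```java
import java.util.Comparator;
import java.util.List;
import java.util.OptionalDouble;

// Illustrative model of result selection; not the extension's code.
final class RetrievalFilterDemo {

    record ScoredSegment(String text, double score) {}

    // Assumed normalization: maps cosine similarity [-1, 1] onto [0, 1].
    static double relevanceFromCosine(double cosine) {
        return (cosine + 1.0) / 2.0;
    }

    // Drops candidates below minScore (if set), then keeps the best maxResults.
    static List<ScoredSegment> select(List<ScoredSegment> candidates, int maxResults,
            OptionalDouble minScore) {
        return candidates.stream()
                .filter(s -> minScore.isEmpty() || s.score() >= minScore.getAsDouble())
                .sorted(Comparator.comparingDouble(ScoredSegment::score).reversed())
                .limit(maxResults)
                .toList();
    }
}
```

An empty `OptionalDouble` corresponds to leaving `min-score` unset, in which case only `max-results` limits the output.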

Ingestion Strategy Configuration

ingestion-strategy

Controls when document ingestion occurs.

  • Type: Enum (ON, OFF, MANUAL)
  • Default: ON
  • Values:
    • ON: Automatically ingest at application startup
    • OFF: Do not ingest (useful for pre-populated persistent stores)
    • MANUAL: Wait for manual trigger via EasyRagManualIngestion.ingest()
  • Examples:
    # Automatic ingestion at startup (default)
    quarkus.langchain4j.easy-rag.ingestion-strategy=ON
    
    # No ingestion (pre-populated store)
    quarkus.langchain4j.easy-rag.ingestion-strategy=OFF
    
    # Manual control
    quarkus.langchain4j.easy-rag.ingestion-strategy=MANUAL

When to use each strategy:

  • ON: Development, simple applications, documents change infrequently
  • OFF: Production with persistent store that's already populated
  • MANUAL: Dynamic document sources, scheduled updates, event-driven ingestion

Embeddings Reuse Configuration

reuse-embeddings.enabled

Enable embeddings reuse for faster development cycles.

  • Type: Boolean
  • Default: false
  • Limitation: Only works with InMemoryEmbeddingStore
  • Examples:
    # Enable embeddings reuse
    quarkus.langchain4j.easy-rag.reuse-embeddings.enabled=true

How it works:

  1. On first run: Compute embeddings and save to file
  2. On subsequent runs: Load embeddings from file instead of recomputing
  3. Significantly reduces startup time in development

Important: Intended for development only. Reuse works only with InMemoryEmbeddingStore and is not compatible with persistent stores.
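The load-or-compute behavior described above can be modeled in a few lines. This is a sketch under stated assumptions rather than the extension's code: `EmbeddingsReuseDemo` and `loadOrCompute` are hypothetical names, and the embeddings are represented as an opaque JSON string produced by a supplier.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;

// Simplified model of embeddings reuse; names are illustrative.
final class EmbeddingsReuseDemo {

    // Loads serialized embeddings if the file exists; otherwise computes them
    // once and writes them to the file for the next run.
    static String loadOrCompute(Path file, Supplier<String> computeEmbeddingsJson) {
        try {
            if (Files.exists(file)) {
                return Files.readString(file);
            }
            String json = computeEmbeddingsJson.get();
            Files.writeString(file, json);
            return json;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that in this model a stale file is reused as-is: if the source documents change, deleting the file is what forces the embeddings to be recomputed.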

reuse-embeddings.file

File path for storing/loading embeddings.

  • Type: String
  • Default: easy-rag-embeddings.json
  • Examples:
    # Default location
    quarkus.langchain4j.easy-rag.reuse-embeddings.file=easy-rag-embeddings.json
    
    # Custom location
    quarkus.langchain4j.easy-rag.reuse-embeddings.file=/tmp/embeddings.json

Complete Configuration Examples

Minimal Configuration

# Only required property
quarkus.langchain4j.easy-rag.path=/data/documents

Development Configuration

# Classpath-relative path (files live under src/main/resources/documents)
quarkus.langchain4j.easy-rag.path=documents
quarkus.langchain4j.easy-rag.path-type=CLASSPATH
quarkus.langchain4j.easy-rag.reuse-embeddings.enabled=true

Production Configuration with Persistent Store

quarkus.langchain4j.easy-rag.path=/opt/data/documents
quarkus.langchain4j.easy-rag.ingestion-strategy=MANUAL
quarkus.langchain4j.easy-rag.max-segment-size=400
quarkus.langchain4j.easy-rag.max-overlap-size=40
quarkus.langchain4j.easy-rag.max-results=7
quarkus.langchain4j.easy-rag.min-score=0.6

# Redis store configuration
quarkus.langchain4j.redis.dimension=384

Fine-tuned Retrieval

quarkus.langchain4j.easy-rag.path=/data/docs
quarkus.langchain4j.easy-rag.path-matcher=glob:**.{md,txt,pdf}
quarkus.langchain4j.easy-rag.max-segment-size=500
quarkus.langchain4j.easy-rag.max-overlap-size=50
quarkus.langchain4j.easy-rag.max-results=10
quarkus.langchain4j.easy-rag.min-score=0.7

Classpath Resources with Manual Ingestion

quarkus.langchain4j.easy-rag.path=knowledge-base
quarkus.langchain4j.easy-rag.path-type=CLASSPATH
quarkus.langchain4j.easy-rag.ingestion-strategy=MANUAL
quarkus.langchain4j.easy-rag.recursive=true
quarkus.langchain4j.easy-rag.path-matcher=glob:**.md

Injecting Configuration

You can inject the configuration interface in your beans:

import io.quarkiverse.langchain4j.easyrag.runtime.EasyRagConfig;
import jakarta.inject.Inject;
import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
public class MyService {

    @Inject
    EasyRagConfig config;

    public void logConfiguration() {
        System.out.println("Document path: " + config.path());
        System.out.println("Max results: " + config.maxResults());
        System.out.println("Ingestion strategy: " + config.ingestionStrategy());
    }
}

Environment Variables

All properties can be set via environment variables using Quarkus conventions:

# Replace dots and hyphens with underscores and use uppercase
export QUARKUS_LANGCHAIN4J_EASY_RAG_PATH=/data/documents
export QUARKUS_LANGCHAIN4J_EASY_RAG_MAX_RESULTS=10
export QUARKUS_LANGCHAIN4J_EASY_RAG_INGESTION_STRATEGY=MANUAL
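The naming rule can be expressed as a small helper, which is handy for generating deployment manifests. This is a minimal sketch of the common conversion (every character that is not a letter or digit becomes an underscore, then the result is uppercased); `EnvVarNameDemo` is a hypothetical name, not a Quarkus API.

```java
import java.util.Locale;

// Illustrative helper for deriving an environment variable name
// from a property name; not part of Quarkus.
final class EnvVarNameDemo {

    static String toEnvVarName(String propertyName) {
        return propertyName.replaceAll("[^A-Za-z0-9]", "_").toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(toEnvVarName("quarkus.langchain4j.easy-rag.reuse-embeddings.enabled"));
    }
}
```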

Configuration Profiles

Use Quarkus profiles for environment-specific configuration:

# Development profile
%dev.quarkus.langchain4j.easy-rag.path=src/main/resources/docs
%dev.quarkus.langchain4j.easy-rag.reuse-embeddings.enabled=true

# Production profile
%prod.quarkus.langchain4j.easy-rag.path=/opt/data/documents
%prod.quarkus.langchain4j.easy-rag.ingestion-strategy=MANUAL
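Profile-prefixed properties take priority over unprefixed ones when that profile is active. The lookup order can be modeled as follows; this is a simplified sketch with hypothetical names (`ProfileLookupDemo`, `resolve`) that ignores other configuration sources such as environment variables and system properties.

```java
import java.util.Map;
import java.util.Optional;

// Simplified model of profile-aware property lookup; not Quarkus code.
final class ProfileLookupDemo {

    // Prefers "%<profile>.<key>" and falls back to the plain key.
    static Optional<String> resolve(Map<String, String> properties, String activeProfile, String key) {
        String profiled = properties.get("%" + activeProfile + "." + key);
        return Optional.ofNullable(profiled != null ? profiled : properties.get(key));
    }
}
```

Under this model, a property defined only as `%dev.…` is simply absent in the prod profile, so a required property such as `path` needs either an unprefixed value or a value for every profile you run.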

Related Documentation

  • Manual Ingestion Control - For ingestion-strategy=MANUAL
  • Document Ingestion - How document processing works

Install with Tessl CLI

npx tessl i tessl/maven-io-quarkiverse-langchain4j--quarkus-langchain4j-easy-rag
