tessl/maven-io-quarkiverse-langchain4j--quarkus-langchain4j-easy-rag

Easy RAG extension for Quarkus LangChain4j that dramatically simplifies implementing Retrieval Augmented Generation pipelines with automatic document ingestion and embedding store management

Manual Ingestion Control

The EasyRagManualIngestion bean provides programmatic control over document ingestion timing, enabling scenarios where you need to trigger ingestion based on application logic rather than at startup.

API

package io.quarkiverse.langchain4j.easyrag;

import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
public class EasyRagManualIngestion {
    /**
     * Triggers manual document ingestion.
     *
     * This method loads documents from the configured path, splits them into segments,
     * generates embeddings, and stores them in the embedding store.
     *
     * @throws IllegalStateException if ingestion strategy is not set to MANUAL
     */
    public void ingest() { /* implementation provided by the extension */ }
}

Configuration

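To use manual ingestion, set the ingestion strategy to MANUAL. With this strategy, no documents are ingested automatically at startup; ingestion happens only when your code calls ingest():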

quarkus.langchain4j.easy-rag.path=/path/to/documents
quarkus.langchain4j.easy-rag.ingestion-strategy=MANUAL

Usage Examples

Trigger ingestion on application startup event

import io.quarkiverse.langchain4j.easyrag.EasyRagManualIngestion;
import io.quarkus.runtime.StartupEvent;
import jakarta.enterprise.event.Observes;
import jakarta.inject.Inject;
import jakarta.inject.Singleton;

@Singleton
public class IngestionInitializer {

    @Inject
    EasyRagManualIngestion manualIngestion;

    void onStart(@Observes StartupEvent event) {
        // Trigger ingestion after some custom initialization
        performCustomSetup();
        manualIngestion.ingest();
    }

    private void performCustomSetup() {
        // Custom initialization logic
    }
}
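Observers of StartupEvent are called synchronously during startup, so the example above typically does not let startup finish until ingest() returns. If you would rather let the application start and report ingestion progress separately, the plain-Java sketch below runs a Runnable on an Executor once and records its state. IngestionTracker and its State enum are hypothetical names, not part of the extension; in a Quarkus bean you might pass manualIngestion::ingest together with an injected executor (for example a ManagedExecutor) and expose state() through a readiness check.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper: runs ingestion once, off the calling thread, and tracks its outcome.
final class IngestionTracker {
    enum State { NOT_STARTED, RUNNING, DONE, FAILED }

    private final AtomicReference<State> state = new AtomicReference<>(State.NOT_STARTED);

    /** Starts ingestion on the given executor; the returned future completes when ingestion ends. */
    CompletableFuture<Void> start(Runnable ingest, Executor executor) {
        if (!state.compareAndSet(State.NOT_STARTED, State.RUNNING)) {
            throw new IllegalStateException("Ingestion already started, state: " + state.get());
        }
        try {
            return CompletableFuture.runAsync(ingest, executor)
                    // Record success or failure once the run finishes.
                    .whenComplete((ignored, error) -> state.set(error == null ? State.DONE : State.FAILED));
        } catch (RuntimeException rejected) {
            // The executor refused the task, so ingestion never ran.
            state.set(State.FAILED);
            throw rejected;
        }
    }

    State state() {
        return state.get();
    }
}
```

The compare-and-set makes a second start() fail fast instead of launching a parallel run; callers that want to wait can still join() the returned future.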

Trigger ingestion via REST endpoint

import io.quarkiverse.langchain4j.easyrag.EasyRagManualIngestion;
import jakarta.inject.Inject;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.core.Response;

@Path("/admin")
public class AdminResource {

    @Inject
    EasyRagManualIngestion manualIngestion;

    @POST
    @Path("/ingest")
    public Response triggerIngestion() {
        try {
            manualIngestion.ingest();
            return Response.ok("Ingestion completed successfully").build();
        } catch (Exception e) {
            return Response.serverError()
                .entity("Ingestion failed: " + e.getMessage())
                .build();
        }
    }
}

Scheduled ingestion

import io.quarkiverse.langchain4j.easyrag.EasyRagManualIngestion;
import io.quarkus.scheduler.Scheduled;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import org.jboss.logging.Logger;

@ApplicationScoped
public class ScheduledIngestion {

    private static final Logger LOG = Logger.getLogger(ScheduledIngestion.class);

    @Inject
    EasyRagManualIngestion manualIngestion;

    @Scheduled(cron = "0 0 2 * * ?") // Daily at 2 AM
    public void performScheduledIngestion() {
        LOG.info("Starting scheduled document ingestion");
        manualIngestion.ingest();
        LOG.info("Scheduled document ingestion completed");
    }
}
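Nothing in the API shown above prevents two ingestion runs from overlapping, for example when the nightly job fires while an administrator calls the REST endpoint. Quarkus can skip overlapping executions of a single scheduled method with @Scheduled(concurrentExecution = Scheduled.ConcurrentExecution.SKIP), but that does not cover other callers. The sketch below is a hypothetical single-flight guard in plain Java that all triggers could share, for example constructed once with manualIngestion::ingest; a call that arrives while a run is in progress returns false instead of starting a second run.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical guard: at most one ingestion run at a time across all callers.
final class SingleFlightIngestion {
    private final Runnable ingest;
    private final AtomicBoolean running = new AtomicBoolean(false);

    SingleFlightIngestion(Runnable ingest) {
        this.ingest = ingest;
    }

    /** Runs ingestion unless another run is in progress; returns true if this call ran it. */
    boolean tryIngest() {
        if (!running.compareAndSet(false, true)) {
            return false; // another trigger is already ingesting
        }
        try {
            ingest.run();
            return true;
        } finally {
            running.set(false); // release even if ingestion threw
        }
    }
}
```

A REST endpoint could map a false result to a "409 Conflict" style response, while a scheduled job could simply log that it skipped.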

Conditional ingestion based on external signal

import io.quarkiverse.langchain4j.easyrag.EasyRagManualIngestion;
import io.smallrye.common.annotation.Blocking;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import org.eclipse.microprofile.reactive.messaging.Incoming;
import org.jboss.logging.Logger;

@ApplicationScoped
public class DocumentUpdateListener {

    private static final Logger LOG = Logger.getLogger(DocumentUpdateListener.class);

    @Inject
    EasyRagManualIngestion manualIngestion;

    @Incoming("document-updates")
    @Blocking // ingest() is long-running and blocking, so run it on a worker thread
    public void onDocumentUpdate(String message) {
        LOG.infof("Received document update notification: %s", message);
        manualIngestion.ingest();
        LOG.info("Re-ingestion completed");
    }
}
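Each notification above starts a full re-ingestion, so a burst of updates produces a burst of runs. One way to bound that, sketched here under the assumption that a single run after the latest update is enough, is to coalesce requests: a request that arrives while a run is in progress only sets a pending flag, and the thread already ingesting performs one more run when the current one finishes. CoalescingTrigger is a hypothetical helper, not an extension API; you might construct it once with manualIngestion::ingest and call request() from the listener.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper: collapses requests that arrive during a run into one follow-up run.
final class CoalescingTrigger {
    private final Runnable ingest;
    private final AtomicBoolean pending = new AtomicBoolean(false);
    private final AtomicBoolean running = new AtomicBoolean(false);

    CoalescingTrigger(Runnable ingest) {
        this.ingest = ingest;
    }

    /** Records a request; runs ingestion on this thread unless another thread is already doing so. */
    void request() {
        pending.set(true);
        // Re-check after releasing "running": a request may have arrived just before the release.
        while (pending.get() && running.compareAndSet(false, true)) {
            try {
                while (pending.getAndSet(false)) {
                    ingest.run(); // requests arriving now set pending again and cause one more run
                }
            } finally {
                running.set(false);
            }
        }
    }
}
```

Unlike a guard that drops overlapping calls, this design never loses the last update: any change signaled mid-run is picked up by the follow-up run.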

Behavior

When ingest() is called:

  1. Validation: Checks that ingestion-strategy is set to MANUAL. Throws IllegalStateException if not.

  2. Document Loading: Loads documents from the path specified by quarkus.langchain4j.easy-rag.path using the configured path-type (FILESYSTEM or CLASSPATH).

  3. Filtering: Applies the path-matcher pattern to filter files.

  4. Parsing: Uses Apache Tika to parse documents into text content.

  5. Splitting: Splits documents into segments based on max-segment-size and max-overlap-size configuration.

  6. Embedding Generation: Generates embeddings for each segment using the configured EmbeddingModel.

  7. Storage: Stores the embeddings in the configured EmbeddingStore.

  8. Embeddings Reuse: If quarkus.langchain4j.easy-rag.reuse-embeddings.enabled=true and the embedding store is an InMemoryEmbeddingStore, writes the embeddings to the configured file.
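Step 5 is where max-segment-size and max-overlap-size come into play. The sketch below is a deliberately simplified, character-based splitter that ignores paragraph and sentence boundaries; it is not the splitter the extension uses, and it exists only to show how the overlap makes consecutive segments share text. OverlapSplitter is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified splitter: fixed-size character windows with overlap.
final class OverlapSplitter {
    static List<String> split(String text, int maxSegmentSize, int maxOverlapSize) {
        if (maxSegmentSize <= 0 || maxOverlapSize < 0 || maxOverlapSize >= maxSegmentSize) {
            throw new IllegalArgumentException("need 0 <= overlap < segment size");
        }
        List<String> segments = new ArrayList<>();
        int step = maxSegmentSize - maxOverlapSize; // each window starts this far after the previous one
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + maxSegmentSize, text.length());
            segments.add(text.substring(start, end));
            if (end == text.length()) {
                break; // last window reached the end of the text
            }
        }
        return segments;
    }
}
```

For example, OverlapSplitter.split("abcdefghij", 4, 1) yields abcd, defg and ghij: each segment repeats the last character of the previous one, so context at a boundary is not lost.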

Error Handling

The ingest() method throws an IllegalStateException if:

  • The ingestion strategy is not set to MANUAL

The method may also throw runtime exceptions if:

  • The configured path does not exist or is not accessible
  • Document parsing fails
  • Embedding generation fails
  • Embedding store operations fail

Catch these exceptions where you call ingest(), for example:

try {
    manualIngestion.ingest();
} catch (IllegalStateException e) {
    LOG.error("Ingestion strategy must be MANUAL", e);
} catch (Exception e) {
    LOG.error("Ingestion failed", e);
    // Implement retry logic or notify administrators
}
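The catch block above leaves retry logic to you. A minimal sketch, assuming that a transient failure (for example an embedding model timeout) is worth retrying while a wrong ingestion strategy is not: the helper below retries a Runnable with exponential backoff and rethrows IllegalStateException immediately. Treating every IllegalStateException as a configuration error is a simplification, and RetryingIngestion is a hypothetical name; you would pass manualIngestion::ingest.

```java
import java.time.Duration;

// Hypothetical helper: retries ingestion with exponential backoff.
final class RetryingIngestion {
    static void ingestWithRetry(Runnable ingest, int maxAttempts, Duration initialDelay) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delayMillis = initialDelay.toMillis();
        for (int attempt = 1; ; attempt++) {
            try {
                ingest.run();
                return; // success
            } catch (IllegalStateException e) {
                throw e; // assumed configuration error (strategy not MANUAL): retrying cannot help
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: surface the last failure
                }
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException interrupted) {
                    Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                    e.addSuppressed(interrupted);
                    throw e;
                }
                delayMillis *= 2; // exponential backoff
            }
        }
    }
}
```

Keep in mind that a failed run may already have stored some segments, so a retry against a persistent store can add them again.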

Use Cases

Manual ingestion is useful when:

  • Dynamic Document Sources: Documents are uploaded by users and need to be ingested on demand
  • Scheduled Updates: Documents are updated periodically (e.g., nightly) and need re-ingestion
  • Resource Management: Ingestion is resource-intensive and should happen during off-peak hours
  • Event-Driven: External systems notify your application when documents change
  • Testing: You need precise control over when ingestion happens during integration tests
  • Pre-populated Stores: You use a persistent store that may already contain embeddings and want to control when it is refreshed

Integration with Persistent Stores

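When using persistent embedding stores (Redis, Chroma, Infinispan), manual ingestion allows you to:

  • Avoid re-ingesting all documents every time the application restarts
  • Refresh embeddings only when documents actually change
  • Run ingestion from a single instance instead of from every instance at startup

Note that ingest() adds the segments it produces to the embedding store. The API above has no operation that removes segments from an earlier run, so calling ingest() repeatedly against a persistent store can leave duplicate segments unless you clear the store (or the affected entries) first.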
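To refresh embeddings only when documents actually change, one option is to fingerprint the document directory and call ingest() only when the fingerprint differs from the last successful run. The sketch assumes a FILESYSTEM path and hashes each file's relative path and contents with SHA-256; DocumentFingerprint and IngestIfChanged are hypothetical names. It keeps the last fingerprint in memory only, so the first check after a restart always ingests; persisting the fingerprint, and removing stale segments from the store, are not shown.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper: SHA-256 over (relative path, content hash) of every regular file.
final class DocumentFingerprint {
    static String of(Path root) {
        try (Stream<Path> walk = Files.walk(root)) {
            List<Path> files = walk.filter(Files::isRegularFile).sorted().toList(); // stable order
            MessageDigest outer = MessageDigest.getInstance("SHA-256");
            for (Path file : files) {
                byte[] contentHash = MessageDigest.getInstance("SHA-256").digest(Files.readAllBytes(file));
                // Path bytes, a NUL separator (never part of a path), then a fixed-length content hash.
                outer.update(root.relativize(file).toString().getBytes(StandardCharsets.UTF_8));
                outer.update((byte) 0);
                outer.update(contentHash);
            }
            return HexFormat.of().formatHex(outer.digest());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }
}

// Hypothetical helper: runs ingestion only when the fingerprint changed since the last success.
final class IngestIfChanged {
    private final Path root;
    private final Runnable ingest;
    private String lastFingerprint; // null until the first successful run

    IngestIfChanged(Path root, Runnable ingest) {
        this.root = root;
        this.ingest = ingest;
    }

    /** Returns true if ingestion ran. */
    synchronized boolean check() {
        String current = DocumentFingerprint.of(root);
        if (current.equals(lastFingerprint)) {
            return false; // documents unchanged
        }
        ingest.run();
        lastFingerprint = current; // recorded only after ingestion succeeded
        return true;
    }
}
```

Hashing file contents costs a full read of the corpus on every check; comparing sizes and modification times instead would be cheaper but can miss some edits.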

Example configuration with Redis:

quarkus.langchain4j.easy-rag.path=/data/documents
quarkus.langchain4j.easy-rag.ingestion-strategy=MANUAL

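# Redis as persistent store (requires the quarkus-langchain4j-redis extension)
# The dimension must match the output dimension of your embedding model
quarkus.langchain4j.redis.dimension=384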

Dependency Injection

The EasyRagManualIngestion bean is:

  • Scope: @ApplicationScoped
  • Automatic: Always registered by the Easy RAG extension
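  • Injectable: Can be injected into any CDI bean with @Inject

import io.quarkiverse.langchain4j.easyrag.EasyRagManualIngestion;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

@ApplicationScoped
public class MyService {
    @Inject
    EasyRagManualIngestion manualIngestion;
}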

Related Configuration

See Configuration Reference for:

  • quarkus.langchain4j.easy-rag.path - Document path
  • quarkus.langchain4j.easy-rag.ingestion-strategy - Must be MANUAL
  • quarkus.langchain4j.easy-rag.max-segment-size - Segment sizing
  • quarkus.langchain4j.easy-rag.max-overlap-size - Overlap sizing

Install with Tessl CLI

npx tessl i tessl/maven-io-quarkiverse-langchain4j--quarkus-langchain4j-easy-rag
