tessl/maven-dev-langchain4j--langchain4j-easy-rag

Zero-configuration RAG package that bundles document parsing, embedding, and splitting for easy Retrieval-Augmented Generation in Java applications

Document Loading API

Name: tessl/maven-dev-langchain4j--langchain4j-easy-rag
Author: tessl

FileSystemDocumentLoader

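Static utility class for loading documents from the filesystem. When no DocumentParser is passed, the loader uses the default parser discovered through SPI: with easy-rag on the classpath this is the bundled Apache Tika parser, and without any parser module it falls back to TextDocumentParser.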

Load Single Document

Load from Path

import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;

public static Document loadDocument(Path filePath)

Parameters:

filePath - Path to document file

Returns: Document with parsed content

Supported Formats: All formats supported by Apache Tika (200+ formats including PDF, DOCX, TXT, HTML, etc.)

Example:

import java.nio.file.Path;
import java.nio.file.Paths;

Path path = Paths.get("documents/report.pdf");
Document document = FileSystemDocumentLoader.loadDocument(path);

Load from String Path

public static Document loadDocument(String filePath)

Parameters:

filePath - String path to document file

Returns: Document with parsed content

Example:

Document document = FileSystemDocumentLoader.loadDocument("documents/report.pdf");

Load with Custom Parser

public static Document loadDocument(Path filePath, DocumentParser documentParser)

Parameters:

filePath - Path to document file
documentParser - Custom parser implementation

Returns: Document with parsed content

Example:

DocumentParser customParser = new MyCustomParser();
Document document = FileSystemDocumentLoader.loadDocument(path, customParser);

public static Document loadDocument(String filePath, DocumentParser documentParser)

String path variant with custom parser.

Load Multiple Documents (Non-Recursive)

Load from Directory

public static List<Document> loadDocuments(Path directoryPath)

Parameters:

directoryPath - Path to directory

Returns: List of documents from directory (non-recursive)

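Behavior: Attempts to load every regular file directly in the directory (does not traverse subdirectories). Files that cannot be loaded are skipped with a logged warning rather than failing the whole call.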

Example:

Path dir = Paths.get("documents");
List<Document> documents = FileSystemDocumentLoader.loadDocuments(dir);

Load from String Directory

public static List<Document> loadDocuments(String directoryPath)

String path variant.

Load with Pattern Matching

public static List<Document> loadDocuments(
    Path directoryPath,
    PathMatcher pathMatcher
)

Parameters:

directoryPath - Path to directory
pathMatcher - Filter for matching specific files

Returns: Filtered list of documents (non-recursive)

Example:

import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;

// Load only PDF files
PathMatcher pdfMatcher = FileSystems.getDefault()
    .getPathMatcher("glob:*.pdf");

List<Document> pdfs = FileSystemDocumentLoader.loadDocuments(
    Paths.get("documents"),
    pdfMatcher
);

public static List<Document> loadDocuments(
    String directoryPath,
    PathMatcher pathMatcher
)

String path variant with pattern matching.
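The filter does not have to be a single glob. Because java.nio.file.PathMatcher is a functional interface, two glob matchers can be combined in a lambda, for example to include PDFs while excluding drafts. The sketch below uses only the JDK; ComposedMatchers and includeExcept are hypothetical names for illustration, and the resulting matcher would be passed to loadDocuments like any other.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

final class ComposedMatchers {

    // Accepts paths matched by the include glob but not by the exclude glob.
    static PathMatcher includeExcept(String includeGlob, String excludeGlob) {
        PathMatcher include = FileSystems.getDefault().getPathMatcher("glob:" + includeGlob);
        PathMatcher exclude = FileSystems.getDefault().getPathMatcher("glob:" + excludeGlob);
        return path -> include.matches(path) && !exclude.matches(path);
    }

    public static void main(String[] args) {
        PathMatcher finalPdfs = includeExcept("*.pdf", "draft-*");

        // Each call tests a bare file name, as a non-recursive listing would see it
        System.out.println(finalPdfs.matches(Paths.get("report.pdf")));
        System.out.println(finalPdfs.matches(Paths.get("draft-report.pdf")));
        System.out.println(finalPdfs.matches(Paths.get("notes.txt")));
    }
}
```

Only report.pdf passes both conditions; the draft is rejected by the exclude glob and notes.txt by the include glob.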

Load with Custom Parser

public static List<Document> loadDocuments(
    Path directoryPath,
    DocumentParser documentParser
)

Load all documents with custom parser (non-recursive).

public static List<Document> loadDocuments(
    String directoryPath,
    DocumentParser documentParser
)

String path variant.

public static List<Document> loadDocuments(
    Path directoryPath,
    PathMatcher pathMatcher,
    DocumentParser documentParser
)

Pattern matching with custom parser (non-recursive).

public static List<Document> loadDocuments(
    String directoryPath,
    PathMatcher pathMatcher,
    DocumentParser documentParser
)

String path variant.

Load Multiple Documents (Recursive)

Load from Directory Tree

public static List<Document> loadDocumentsRecursively(Path directoryPath)

Parameters:

directoryPath - Root directory path

Returns: List of all documents in directory tree

Behavior: Recursively traverses all subdirectories

Example:

// Load all documents from directory and subdirectories
List<Document> allDocs = FileSystemDocumentLoader.loadDocumentsRecursively(
    Paths.get("documents")
);

Load from String Directory (Recursive)

public static List<Document> loadDocumentsRecursively(String directoryPath)

String path variant.

Load with Pattern Matching (Recursive)

public static List<Document> loadDocumentsRecursively(
    Path directoryPath,
    PathMatcher pathMatcher
)

Parameters:

directoryPath - Root directory path
pathMatcher - Filter for matching specific files

Returns: Filtered list from entire directory tree

Example:

import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;

// Load only markdown files from entire tree
PathMatcher mdMatcher = FileSystems.getDefault()
    .getPathMatcher("glob:**.md");

List<Document> markdownDocs = FileSystemDocumentLoader.loadDocumentsRecursively(
    Paths.get("docs"),
    mdMatcher
);

public static List<Document> loadDocumentsRecursively(
    String directoryPath,
    PathMatcher pathMatcher
)

String path variant.

Load with Custom Parser (Recursive)

public static List<Document> loadDocumentsRecursively(
    Path directoryPath,
    DocumentParser documentParser
)

Recursively load all documents with custom parser.

public static List<Document> loadDocumentsRecursively(
    String directoryPath,
    DocumentParser documentParser
)

String path variant.

public static List<Document> loadDocumentsRecursively(
    Path directoryPath,
    PathMatcher pathMatcher,
    DocumentParser documentParser
)

Recursive pattern matching with custom parser.

public static List<Document> loadDocumentsRecursively(
    String directoryPath,
    PathMatcher pathMatcher,
    DocumentParser documentParser
)

String path variant.

Usage Patterns

Load All Documents from Directory Tree

import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;

import java.nio.file.Paths;
import java.util.List;

// Load everything recursively
List<Document> allDocs = FileSystemDocumentLoader.loadDocumentsRecursively(
    Paths.get("knowledge-base")
);

System.out.println("Loaded " + allDocs.size() + " documents");

Load Specific File Types

import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;

// Create matcher for multiple extensions
PathMatcher docMatcher = FileSystems.getDefault()
    .getPathMatcher("glob:**.{pdf,docx,txt}");

List<Document> documents = FileSystemDocumentLoader.loadDocumentsRecursively(
    Paths.get("documents"),
    docMatcher
);
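When the set of extensions comes from configuration rather than a literal, the glob string can be assembled programmatically. This is a minimal JDK-only sketch; ExtensionGlob and its methods are hypothetical helpers, not part of LangChain4j, and the extensions are assumed to be bare names such as "pdf" without glob metacharacters.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

final class ExtensionGlob {

    // Builds "glob:**.{ext1,ext2,...}" from bare extensions such as "pdf".
    static String pattern(List<String> extensions) {
        if (extensions.isEmpty()) {
            throw new IllegalArgumentException("at least one extension is required");
        }
        return "glob:**.{" + String.join(",", extensions) + "}";
    }

    // Creates a matcher for the generated pattern on the default file system.
    static PathMatcher matcher(List<String> extensions) {
        return FileSystems.getDefault().getPathMatcher(pattern(extensions));
    }

    public static void main(String[] args) {
        List<String> extensions = Arrays.asList("pdf", "docx", "txt");
        PathMatcher docMatcher = matcher(extensions);

        System.out.println(pattern(extensions));
        System.out.println(docMatcher.matches(Paths.get("reports/2024/summary.docx")));
        System.out.println(docMatcher.matches(Paths.get("notes.md")));
    }
}
```

The returned matcher can be passed to loadDocumentsRecursively in place of the hand-written docMatcher.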

Load with Metadata

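import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Load and enrich with metadata
Path dir = Paths.get("documents");
List<Document> documents;

// Files.walk keeps a directory handle open, so close the stream when done
try (Stream<Path> paths = Files.walk(dir)) {
    documents = paths
        .filter(Files::isRegularFile)
        .map(path -> {
            Document doc = FileSystemDocumentLoader.loadDocument(path);

            // Add custom metadata
            doc.metadata().put("file_name", path.getFileName().toString());
            doc.metadata().put("file_path", path.toString());
            try {
                doc.metadata().put("file_size", Files.size(path));
            } catch (IOException e) {
                // Files.size throws a checked exception, which a lambda cannot propagate
                throw new UncheckedIOException(e);
            }

            return doc;
        })
        .collect(Collectors.toList());
} catch (IOException e) {
    throw new UncheckedIOException(e);
}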

Selective Loading by Directory

import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Load from multiple specific directories
List<Document> allDocuments = new ArrayList<>();

// Technical documentation
List<Document> techDocs = FileSystemDocumentLoader.loadDocumentsRecursively(
    Paths.get("docs/technical")
);
techDocs.forEach(doc -> doc.metadata().put("category", "technical"));
allDocuments.addAll(techDocs);

// User guides
List<Document> guides = FileSystemDocumentLoader.loadDocumentsRecursively(
    Paths.get("docs/guides")
);
guides.forEach(doc -> doc.metadata().put("category", "guides"));
allDocuments.addAll(guides);

// API reference
List<Document> apiDocs = FileSystemDocumentLoader.loadDocumentsRecursively(
    Paths.get("docs/api")
);
apiDocs.forEach(doc -> doc.metadata().put("category", "api"));
allDocuments.addAll(apiDocs);
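When the category is simply the first directory below a common root, it can also be derived from each file's path instead of repeating the load call per directory. The following is a sketch of that derivation only, using the JDK; CategoryFromPath and the "uncategorized" fallback are hypothetical choices, and attaching the result to doc.metadata() would work as in the example above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class CategoryFromPath {

    // Returns the first directory below root that contains the file,
    // or "uncategorized" when the file sits directly in root.
    static String categoryOf(Path root, Path file) {
        Path relative = root.relativize(file);
        return relative.getNameCount() > 1
            ? relative.getName(0).toString()
            : "uncategorized";
    }

    public static void main(String[] args) {
        Path root = Paths.get("docs");

        System.out.println(categoryOf(root, Paths.get("docs/technical/design.md")));
        System.out.println(categoryOf(root, Paths.get("docs/api/v1/index.md")));
        System.out.println(categoryOf(root, Paths.get("docs/README.md")));
    }
}
```

Both paths must be of the same kind (both relative or both absolute) for relativize to work.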

Error Handling

import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

List<Document> documents = new ArrayList<>();
List<Path> failedPaths = new ArrayList<>();

Path dir = Paths.get("documents");

// try-with-resources closes the directory stream opened by Files.walk
try (Stream<Path> paths = Files.walk(dir)) {
    paths.filter(Files::isRegularFile)
        .forEach(path -> {
            try {
                Document doc = FileSystemDocumentLoader.loadDocument(path);
                documents.add(doc);
            } catch (Exception e) {
                // Record the failure and continue with the remaining files
                System.err.println("Failed to load: " + path + " - " + e.getMessage());
                failedPaths.add(path);
            }
        });
} catch (IOException e) {
    System.err.println("Failed to walk directory: " + e.getMessage());
}

System.out.println("Loaded: " + documents.size());
System.out.println("Failed: " + failedPaths.size());
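The collect-successes-and-failures pattern above can be factored into a small reusable helper. The sketch below is a hypothetical LoadReport class, not part of LangChain4j. It takes the loader as a Function<Path, T>, so a method reference such as FileSystemDocumentLoader::loadDocument could be passed in; here it is exercised with a stand-in loader so that it runs without the library.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class LoadReport<T> {

    final List<T> loaded = new ArrayList<>();
    // Failed path -> error message, kept in encounter order
    final Map<Path, String> failed = new LinkedHashMap<>();

    // Applies the loader to every path; a failure is recorded and does not stop the batch.
    static <T> LoadReport<T> loadAll(List<Path> paths, Function<Path, T> loader) {
        LoadReport<T> report = new LoadReport<>();
        for (Path path : paths) {
            try {
                report.loaded.add(loader.apply(path));
            } catch (RuntimeException e) {
                report.failed.put(path, String.valueOf(e.getMessage()));
            }
        }
        return report;
    }

    public static void main(String[] args) {
        // Stand-in loader: rejects ".bin" files, returns the file name otherwise
        Function<Path, String> fakeLoader = path -> {
            if (path.toString().endsWith(".bin")) {
                throw new IllegalArgumentException("unsupported format");
            }
            return path.getFileName().toString();
        };

        LoadReport<String> report = loadAll(
            Arrays.asList(Paths.get("a.txt"), Paths.get("b.bin"), Paths.get("c.md")),
            fakeLoader);

        System.out.println("Loaded: " + report.loaded);
        System.out.println("Failed: " + report.failed);
    }
}
```

Keeping the error message alongside each failed path makes it easier to decide later whether a file should be retried with a different parser or dropped.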

PathMatcher Patterns

PathMatcher uses glob syntax:

*.pdf - PDF files in current directory only
**.pdf - PDF files in current directory and all subdirectories
*.{pdf,docx} - PDF or DOCX files in current directory
**.{pdf,docx,txt} - PDF, DOCX, or TXT files anywhere in tree
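**/api/** - Files below a nested directory named "api" (the leading **/ requires at least one parent segment, so an api directory at the top of the matched path is not covered)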
user-*.pdf - PDF files starting with "user-"

Example:

import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;

PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:**.{pdf,md}");
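The patterns listed above can be checked directly against sample paths before they are handed to a loader. This JDK-only sketch assumes the matcher is tested against paths relative to the loaded directory; GlobTable is a hypothetical helper name.

```java
import java.nio.file.FileSystems;
import java.nio.file.Paths;

final class GlobTable {

    // Tests one glob pattern against one relative path on the default file system.
    static boolean matches(String globPattern, String relativePath) {
        return FileSystems.getDefault()
            .getPathMatcher("glob:" + globPattern)
            .matches(Paths.get(relativePath));
    }

    public static void main(String[] args) {
        String[][] cases = {
            {"*.pdf", "a.pdf"},               // single star stays within one path segment
            {"*.pdf", "sub/a.pdf"},
            {"**.pdf", "sub/a.pdf"},          // double star crosses directory boundaries
            {"*.{pdf,docx}", "a.docx"},       // braces list alternatives
            {"**/api/**", "docs/api/index.md"},
            {"**/api/**", "api/index.md"},    // no parent segment before "api"
            {"user-*.pdf", "user-guide.pdf"}
        };
        for (String[] c : cases) {
            System.out.println(c[0] + " vs " + c[1] + " -> " + matches(c[0], c[1]));
        }
    }
}
```

Running it shows the two cases that most often surprise: *.pdf rejects sub/a.pdf, and **/api/** rejects api/index.md because the pattern needs a separator before "api".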

Supported Document Formats

When using easy-rag, Apache Tika is bundled and supports 200+ formats:

Documents: PDF, DOC, DOCX, ODT, RTF, Pages
Spreadsheets: XLS, XLSX, ODS, Numbers
Presentations: PPT, PPTX, ODP, Keynote
Text: TXT, MD, HTML, XML, CSV, JSON
Archives: ZIP, TAR, GZ, RAR
Email: MSG, EML, MBOX
Images: Extract text from images with OCR (when configured)
Code: Java, Python, JavaScript, etc. (as plain text)

Related APIs

Core Types - Document type definition
Document Ingestion API - Ingesting loaded documents
Quick Start - Quick start examples

Install with Tessl CLI

npx tessl i tessl/maven-dev-langchain4j--langchain4j-easy-rag
