tessl/maven-dev-langchain4j--langchain4j-easy-rag

Zero-configuration RAG package that bundles document parsing, embedding, and splitting for easy Retrieval-Augmented Generation in Java applications

Document Loading API

FileSystemDocumentLoader

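Static utility class for loading documents from the filesystem. Overloads that take no DocumentParser use a parser discovered on the classpath; with easy-rag this is the bundled Apache Tika parser. If no such parser is on the classpath, the core library is expected to fall back to a plain-text parser.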

Load Single Document

Load from Path

import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;

public static Document loadDocument(Path filePath)

Parameters:

  • filePath - Path to document file

Returns: Document with parsed content

Supported Formats: All formats supported by Apache Tika (200+ formats including PDF, DOCX, TXT, HTML, etc.)

Example:

import java.nio.file.Path;
import java.nio.file.Paths;

Path path = Paths.get("documents/report.pdf");
Document document = FileSystemDocumentLoader.loadDocument(path);

Load from String Path

public static Document loadDocument(String filePath)

Parameters:

  • filePath - String path to document file

Returns: Document with parsed content

Example:

Document document = FileSystemDocumentLoader.loadDocument("documents/report.pdf");

Load with Custom Parser

public static Document loadDocument(Path filePath, DocumentParser documentParser)

Parameters:

  • filePath - Path to document file
  • documentParser - Custom parser implementation

Returns: Document with parsed content

Example:

// MyCustomParser is a placeholder for your own DocumentParser implementation
DocumentParser customParser = new MyCustomParser();
Document document = FileSystemDocumentLoader.loadDocument(path, customParser);

public static Document loadDocument(String filePath, DocumentParser documentParser)

String path variant with custom parser.

Load Multiple Documents (Non-Recursive)

Load from Directory

public static List<Document> loadDocuments(Path directoryPath)

Parameters:

  • directoryPath - Path to directory

Returns: List of documents from directory (non-recursive)

Behavior: Loads all supported files directly in the directory (does not traverse subdirectories)

Example:

Path dir = Paths.get("documents");
List<Document> documents = FileSystemDocumentLoader.loadDocuments(dir);

Load from String Directory

public static List<Document> loadDocuments(String directoryPath)

String path variant.

Load with Pattern Matching

public static List<Document> loadDocuments(
    Path directoryPath,
    PathMatcher pathMatcher
)

Parameters:

  • directoryPath - Path to directory
  • pathMatcher - Filter for matching specific files

Returns: Filtered list of documents (non-recursive)

Example:

import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;

// Load only PDF files
PathMatcher pdfMatcher = FileSystems.getDefault()
    .getPathMatcher("glob:*.pdf");

List<Document> pdfs = FileSystemDocumentLoader.loadDocuments(
    Paths.get("documents"),
    pdfMatcher
);

public static List<Document> loadDocuments(
    String directoryPath,
    PathMatcher pathMatcher
)

String path variant with pattern matching.
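The library's Javadoc describes each file path as being made relative to directoryPath before the PathMatcher is applied, so patterns are best written relative to that directory. Treat this as an assumption to verify against the version you use. The JDK-only sketch below illustrates why the distinction matters; `RelativeMatchSketch` and `matchesRelative` are hypothetical names and not part of langchain4j.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Hypothetical helper, not part of langchain4j: shows the effect of matching
// a path relative to a root directory instead of the full path.
final class RelativeMatchSketch {

    // Returns true if `file`, expressed relative to `root`, matches the glob.
    static boolean matchesRelative(Path root, Path file, String glob) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(root.relativize(file));
    }

    public static void main(String[] args) {
        Path root = Paths.get("documents");
        Path file = root.resolve("report.pdf");
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:*.pdf");

        // The full path contains a separator, which `*` does not cross.
        System.out.println(matcher.matches(file));                // false
        // Relative to the root, the path is just "report.pdf".
        System.out.println(matchesRelative(root, file, "*.pdf")); // true
    }
}
```

If a pattern that looks correct loads nothing, checking it against a relative path in this way is a quick first diagnostic.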

Load with Custom Parser

public static List<Document> loadDocuments(
    Path directoryPath,
    DocumentParser documentParser
)

Load all documents with custom parser (non-recursive).

public static List<Document> loadDocuments(
    String directoryPath,
    DocumentParser documentParser
)

String path variant.

public static List<Document> loadDocuments(
    Path directoryPath,
    PathMatcher pathMatcher,
    DocumentParser documentParser
)

Pattern matching with custom parser (non-recursive).

public static List<Document> loadDocuments(
    String directoryPath,
    PathMatcher pathMatcher,
    DocumentParser documentParser
)

String path variant.

Load Multiple Documents (Recursive)

Load from Directory Tree

public static List<Document> loadDocumentsRecursively(Path directoryPath)

Parameters:

  • directoryPath - Root directory path

Returns: List of all documents in directory tree

Behavior: Recursively traverses all subdirectories

Example:

// Load all documents from directory and subdirectories
List<Document> allDocs = FileSystemDocumentLoader.loadDocumentsRecursively(
    Paths.get("documents")
);
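The difference between the non-recursive and recursive variants mirrors the difference between listing a directory and walking it. The sketch below uses only the JDK to show which files each traversal style sees. It is not the loader's source code, and `TraversalSketch`, `directFiles`, and `allFiles` are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// JDK-only illustration; class and method names are hypothetical.
final class TraversalSketch {

    // Regular files directly inside `dir` (subdirectories are not entered).
    static List<Path> directFiles(Path dir) throws IOException {
        try (Stream<Path> paths = Files.list(dir)) {
            return paths.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
    }

    // Regular files anywhere below `dir`.
    static List<Path> allFiles(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("docs");
        Files.writeString(root.resolve("a.txt"), "top level");
        Path sub = Files.createDirectories(root.resolve("sub"));
        Files.writeString(sub.resolve("b.txt"), "nested");

        System.out.println(directFiles(root).size()); // 1
        System.out.println(allFiles(root).size());    // 2
    }
}
```

Running a check like this on a real directory before loading gives a rough idea of how many files each loader variant will attempt to parse.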

Load from String Directory (Recursive)

public static List<Document> loadDocumentsRecursively(String directoryPath)

String path variant.

Load with Pattern Matching (Recursive)

public static List<Document> loadDocumentsRecursively(
    Path directoryPath,
    PathMatcher pathMatcher
)

Parameters:

  • directoryPath - Root directory path
  • pathMatcher - Filter for matching specific files

Returns: Filtered list from entire directory tree

Example:

import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;

// Load only markdown files from entire tree
PathMatcher mdMatcher = FileSystems.getDefault()
    .getPathMatcher("glob:**.md");

List<Document> markdownDocs = FileSystemDocumentLoader.loadDocumentsRecursively(
    Paths.get("docs"),
    mdMatcher
);

public static List<Document> loadDocumentsRecursively(
    String directoryPath,
    PathMatcher pathMatcher
)

String path variant.

Load with Custom Parser (Recursive)

public static List<Document> loadDocumentsRecursively(
    Path directoryPath,
    DocumentParser documentParser
)

Recursively load all documents with custom parser.

public static List<Document> loadDocumentsRecursively(
    String directoryPath,
    DocumentParser documentParser
)

String path variant.

public static List<Document> loadDocumentsRecursively(
    Path directoryPath,
    PathMatcher pathMatcher,
    DocumentParser documentParser
)

Recursive pattern matching with custom parser.

public static List<Document> loadDocumentsRecursively(
    String directoryPath,
    PathMatcher pathMatcher,
    DocumentParser documentParser
)

String path variant.

Usage Patterns

Load All Documents from Directory Tree

import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;

import java.nio.file.Paths;
import java.util.List;

// Load everything recursively
List<Document> allDocs = FileSystemDocumentLoader.loadDocumentsRecursively(
    Paths.get("knowledge-base")
);

System.out.println("Loaded " + allDocs.size() + " documents");

Load Specific File Types

import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;

// Create matcher for multiple extensions
PathMatcher docMatcher = FileSystems.getDefault()
    .getPathMatcher("glob:**.{pdf,docx,txt}");

List<Document> documents = FileSystemDocumentLoader.loadDocumentsRecursively(
    Paths.get("documents"),
    docMatcher
);
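When the list of extensions comes from configuration rather than a literal, the glob string can be assembled programmatically. The following is a minimal sketch using only the JDK; `ExtensionGlob` and `forExtensions` are hypothetical names, and the resulting string is meant to be passed to `FileSystems.getDefault().getPathMatcher(...)` as in the example above.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, not part of langchain4j.
final class ExtensionGlob {

    // Builds e.g. "glob:**.{pdf,docx,txt}" from ["pdf", "docx", "txt"].
    static String forExtensions(List<String> extensions) {
        if (extensions.isEmpty()) {
            throw new IllegalArgumentException("at least one extension is required");
        }
        return "glob:**.{" + String.join(",", extensions) + "}";
    }

    public static void main(String[] args) {
        String pattern = forExtensions(List.of("pdf", "docx", "txt"));
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher(pattern);

        System.out.println(pattern);                                        // glob:**.{pdf,docx,txt}
        System.out.println(matcher.matches(Paths.get("guides/intro.txt"))); // true
        System.out.println(matcher.matches(Paths.get("notes.md")));         // false
    }
}
```

Glob matching is case-sensitive on most Unix-like platforms, so upper-case extensions such as PDF may need their own entries in the list.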

Load with Metadata

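import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Load and enrich with metadata.
// The enclosing method must declare or handle IOException (thrown by Files.walk).
Path dir = Paths.get("documents");
List<Document> documents;

// try-with-resources closes the directory stream opened by Files.walk
try (Stream<Path> paths = Files.walk(dir)) {
    documents = paths
        .filter(Files::isRegularFile)
        .map(path -> {
            Document doc = FileSystemDocumentLoader.loadDocument(path);

            // Add custom metadata
            doc.metadata().put("file_name", path.getFileName().toString());
            doc.metadata().put("file_path", path.toString());
            try {
                doc.metadata().put("file_size", Files.size(path));
            } catch (IOException e) {
                // Files.size throws a checked exception, which the lambda cannot propagate
                throw new UncheckedIOException(e);
            }

            return doc;
        })
        .collect(Collectors.toList());
}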

Selective Loading by Directory

import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Load from multiple specific directories
List<Document> allDocuments = new ArrayList<>();

// Technical documentation
List<Document> techDocs = FileSystemDocumentLoader.loadDocumentsRecursively(
    Paths.get("docs/technical")
);
techDocs.forEach(doc -> doc.metadata().put("category", "technical"));
allDocuments.addAll(techDocs);

// User guides
List<Document> guides = FileSystemDocumentLoader.loadDocumentsRecursively(
    Paths.get("docs/guides")
);
guides.forEach(doc -> doc.metadata().put("category", "guides"));
allDocuments.addAll(guides);

// API reference
List<Document> apiDocs = FileSystemDocumentLoader.loadDocumentsRecursively(
    Paths.get("docs/api")
);
apiDocs.forEach(doc -> doc.metadata().put("category", "api"));
allDocuments.addAll(apiDocs);

Error Handling

import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

List<Document> documents = new ArrayList<>();
List<Path> failedPaths = new ArrayList<>();

Path dir = Paths.get("documents");

// try-with-resources closes the directory stream opened by Files.walk
try (Stream<Path> paths = Files.walk(dir)) {
    paths
        .filter(Files::isRegularFile)
        .forEach(path -> {
            try {
                Document doc = FileSystemDocumentLoader.loadDocument(path);
                documents.add(doc);
            } catch (Exception e) {
                // Record the failure and continue with the remaining files
                System.err.println("Failed to load: " + path + " - " + e.getMessage());
                failedPaths.add(path);
            }
        });
} catch (IOException e) {
    System.err.println("Failed to walk directory: " + e.getMessage());
}

System.out.println("Loaded: " + documents.size());
System.out.println("Failed: " + failedPaths.size());

PathMatcher Patterns

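PathMatcher uses glob syntax. The descriptions below assume patterns are matched against each file's path relative to the directory passed to the loader (the root directory):

  • *.pdf - PDF files directly in the root directory only
  • **.pdf - PDF files in the root directory and all subdirectories
  • *.{pdf,docx} - PDF or DOCX files directly in the root directory
  • **.{pdf,docx,txt} - PDF, DOCX, or TXT files anywhere in the tree
  • **/api/** - Files below a nested directory named "api" (a top-level "api" directory needs api/** instead)
  • user-*.pdf - PDF files in the root directory whose names start with "user-"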

Example:

import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;

PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:**.{pdf,md}");
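The patterns listed above can be checked without loading any documents. The sketch below runs several of them against sample relative paths using only the JDK; `GlobCheck` and its `matches` method are hypothetical names, and the sample paths are made up for illustration.

```java
import java.nio.file.FileSystems;
import java.nio.file.Paths;

// Hypothetical helper, not part of langchain4j.
final class GlobCheck {

    // True if the relative path matches the glob pattern (given without the "glob:" prefix).
    static boolean matches(String glob, String relativePath) {
        return FileSystems.getDefault()
                .getPathMatcher("glob:" + glob)
                .matches(Paths.get(relativePath));
    }

    public static void main(String[] args) {
        System.out.println(matches("*.pdf", "report.pdf"));            // true
        System.out.println(matches("*.pdf", "archive/report.pdf"));    // false: * stops at separators
        System.out.println(matches("**.pdf", "archive/report.pdf"));   // true: ** crosses separators
        System.out.println(matches("*.{pdf,docx}", "notes.docx"));     // true
        System.out.println(matches("**/api/**", "docs/api/index.md")); // true
        System.out.println(matches("**/api/**", "api/index.md"));      // false: needs a parent directory
        System.out.println(matches("user-*.pdf", "user-guide.pdf"));   // true
    }
}
```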

Supported Document Formats

When using easy-rag, Apache Tika is bundled and supports 200+ formats:

  • Documents: PDF, DOC, DOCX, ODT, RTF, Pages
  • Spreadsheets: XLS, XLSX, ODS, Numbers
  • Presentations: PPT, PPTX, ODP, Keynote
  • Text: TXT, MD, HTML, XML, CSV, JSON
  • Archives: ZIP, TAR, GZ, RAR
  • Email: MSG, EML, MBOX
  • Images: Extract text from images with OCR (when configured)
  • Code: Java, Python, JavaScript, etc. (as plain text)

Related APIs

  • Core Types - Document type definition
  • Document Ingestion API - Ingesting loaded documents
  • Quick Start - Quick start examples

Install with Tessl CLI

npx tessl i tessl/maven-dev-langchain4j--langchain4j-easy-rag@1.11.0
