Zero-configuration RAG package that bundles document parsing, embedding, and splitting for easy Retrieval-Augmented Generation in Java applications
Static utility class for loading documents from the filesystem. Automatically uses Apache Tika for parsing when available (included with easy-rag).
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;
public static Document loadDocument(Path filePath)
Parameters:
filePath - Path to document file
Returns: Document with parsed content
Supported Formats: All formats supported by Apache Tika (200+ formats, including PDF, DOCX, TXT, and HTML)
Example:
import java.nio.file.Path;
import java.nio.file.Paths;
Path path = Paths.get("documents/report.pdf");
Document document = FileSystemDocumentLoader.loadDocument(path);

public static Document loadDocument(String filePath)
Parameters:
filePath - String path to document file
Returns: Document with parsed content
Example:
Document document = FileSystemDocumentLoader.loadDocument("documents/report.pdf");

public static Document loadDocument(Path filePath, DocumentParser documentParser)
Parameters:
filePath - Path to document file
documentParser - Custom parser implementation
Returns: Document with parsed content
Example:
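import dev.langchain4j.data.document.DocumentParser;
import dev.langchain4j.data.document.parser.TextDocumentParser;
// TextDocumentParser ships with langchain4j; substitute your own DocumentParser implementation as needed
DocumentParser customParser = new TextDocumentParser();
Document document = FileSystemDocumentLoader.loadDocument(path, customParser);

public static Document loadDocument(String filePath, DocumentParser documentParser)
String path variant with custom parser.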
public static List<Document> loadDocuments(Path directoryPath)
Parameters:
directoryPath - Path to directory
Returns: List of documents from directory (non-recursive)
Behavior: Loads all supported files directly in the directory (does not traverse subdirectories)
Example:
Path dir = Paths.get("documents");
List<Document> documents = FileSystemDocumentLoader.loadDocuments(dir);

public static List<Document> loadDocuments(String directoryPath)
String path variant.
public static List<Document> loadDocuments(
Path directoryPath,
PathMatcher pathMatcher
)
Parameters:
directoryPath - Path to directory
pathMatcher - Filter for matching specific files
Returns: Filtered list of documents (non-recursive)
Example:
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
// Load only PDF files
PathMatcher pdfMatcher = FileSystems.getDefault()
.getPathMatcher("glob:*.pdf");
List<Document> pdfs = FileSystemDocumentLoader.loadDocuments(
Paths.get("documents"),
pdfMatcher
);

public static List<Document> loadDocuments(
String directoryPath,
PathMatcher pathMatcher
)
String path variant with pattern matching.
public static List<Document> loadDocuments(
Path directoryPath,
DocumentParser documentParser
)
Load all documents with custom parser (non-recursive).
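Any DocumentParser can be passed to these overloads; in langchain4j the interface declares Document parse(InputStream inputStream). The JDK-only sketch below uses a hypothetical StreamParser interface that returns a String, so it compiles without langchain4j, and shows one plausible reason to supply a custom parser: decoding legacy text files with a fixed charset rather than leaving charset handling to the default parser. All class names here are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CustomParserSketch {

    // Hypothetical stand-in for langchain4j's DocumentParser, whose real method is
    // Document parse(InputStream); returning String keeps this sketch JDK-only.
    public interface StreamParser {
        String parse(InputStream in) throws IOException;
    }

    // Example custom parser: decodes text with a fixed charset (e.g. legacy
    // ISO-8859-1 files) and trims surrounding whitespace.
    public static final class FixedCharsetTextParser implements StreamParser {
        private final Charset charset;

        public FixedCharsetTextParser(Charset charset) {
            this.charset = charset;
        }

        @Override
        public String parse(InputStream in) throws IOException {
            // readAllBytes (Java 9+) consumes the whole stream
            return new String(in.readAllBytes(), charset).trim();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] legacy = "  caf\u00e9 menu\n".getBytes(StandardCharsets.ISO_8859_1);
        StreamParser parser = new FixedCharsetTextParser(StandardCharsets.ISO_8859_1);
        // try-with-resources closes the input stream once parsing is done
        try (InputStream in = new ByteArrayInputStream(legacy)) {
            System.out.println(parser.parse(in).length()); // 9
        }
    }
}
```

With langchain4j on the classpath, the same idea would be a class implementing DocumentParser whose parse method wraps the decoded text in a Document.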
public static List<Document> loadDocuments(
String directoryPath,
DocumentParser documentParser
)
String path variant.
public static List<Document> loadDocuments(
Path directoryPath,
PathMatcher pathMatcher,
DocumentParser documentParser
)
Pattern matching with custom parser (non-recursive).
public static List<Document> loadDocuments(
String directoryPath,
PathMatcher pathMatcher,
DocumentParser documentParser
)
String path variant.
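The non-recursive overloads only look at the files directly inside the directory. A JDK-only sketch of that selection step follows; it assumes, as the *.pdf examples in this section imply, that the PathMatcher is applied to each file's path relative to the directory. It is an illustration of the behavior, not FileSystemDocumentLoader's source, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class NonRecursiveSelection {

    // Returns the regular files directly inside dir whose relative path matches the glob.
    // Subdirectories are skipped, never descended into.
    public static List<Path> select(Path dir, String glob) throws IOException {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        // Files.list yields only direct children; close it to release the directory handle
        try (Stream<Path> children = Files.list(dir)) {
            return children
                    .filter(Files::isRegularFile)
                    .map(dir::relativize)
                    .filter(matcher::matches)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("docs");
        Files.createDirectories(root.resolve("sub"));
        Files.writeString(root.resolve("a.pdf"), "a");
        Files.writeString(root.resolve("b.txt"), "b");
        Files.writeString(root.resolve("sub").resolve("c.pdf"), "c");

        System.out.println(select(root, "*.pdf"));  // [a.pdf]
        System.out.println(select(root, "**.pdf")); // [a.pdf]
    }
}
```

Because Files.list never descends, even a tree-spanning glob such as **.pdf cannot reach sub/c.pdf here; only the recursive overloads visit subdirectories.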
public static List<Document> loadDocumentsRecursively(Path directoryPath)
Parameters:
directoryPath - Root directory path
Returns: List of all documents in directory tree
Behavior: Recursively traverses all subdirectories
Example:
// Load all documents from directory and subdirectories
List<Document> allDocs = FileSystemDocumentLoader.loadDocumentsRecursively(
Paths.get("documents")
);

public static List<Document> loadDocumentsRecursively(String directoryPath)
String path variant.
public static List<Document> loadDocumentsRecursively(
Path directoryPath,
PathMatcher pathMatcher
)
Parameters:
directoryPath - Root directory path
pathMatcher - Filter for matching specific files
Returns: Filtered list from entire directory tree
Example:
import java.nio.file.FileSystems;
// Load only markdown files from entire tree
PathMatcher mdMatcher = FileSystems.getDefault()
.getPathMatcher("glob:**.md");
List<Document> markdownDocs = FileSystemDocumentLoader.loadDocumentsRecursively(
Paths.get("docs"),
mdMatcher
);

public static List<Document> loadDocumentsRecursively(
String directoryPath,
PathMatcher pathMatcher
)
String path variant.
public static List<Document> loadDocumentsRecursively(
Path directoryPath,
DocumentParser documentParser
)
Recursively load all documents with custom parser.
public static List<Document> loadDocumentsRecursively(
String directoryPath,
DocumentParser documentParser
)
String path variant.
public static List<Document> loadDocumentsRecursively(
Path directoryPath,
PathMatcher pathMatcher,
DocumentParser documentParser
)
Recursive pattern matching with custom parser.
public static List<Document> loadDocumentsRecursively(
String directoryPath,
PathMatcher pathMatcher,
DocumentParser documentParser
)
String path variant.
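For the recursive overloads, the equivalent JDK traversal is Files.walk, which visits the whole tree. The sketch below keeps regular files whose path relative to the root matches a glob, which makes it easy to see how patterns such as *.pdf, **.pdf and **/api/** behave on nested files. It assumes the same relative-path matching as the examples in this section; the class name is hypothetical and this is not the loader's actual code.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class RecursiveSelection {

    // Walks the whole tree under root and keeps regular files whose path,
    // relative to root, matches the glob. Results use '/' separators for readability.
    public static List<String> select(Path root, String glob) throws IOException {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        // Files.walk keeps directory handles open until the stream is closed
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .map(root::relativize)
                    .filter(matcher::matches)
                    .map(p -> p.toString().replace('\\', '/'))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("kb");
        for (String f : List.of("a.pdf", "notes.md", "guide/b.pdf", "api/ref.md", "v1/api/spec.md")) {
            Path file = root.resolve(f);
            Files.createDirectories(file.getParent());
            Files.writeString(file, f);
        }
        System.out.println(select(root, "*.pdf"));     // [a.pdf]
        System.out.println(select(root, "**.pdf"));    // [a.pdf, guide/b.pdf]
        System.out.println(select(root, "**/api/**")); // [v1/api/spec.md]
    }
}
```

The last line shows a subtlety of relative matching: **/api/** requires a "/" before api, so files in a top-level api/ directory are not selected.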
import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;
import java.nio.file.Paths;
import java.util.List;
// Load everything recursively
List<Document> allDocs = FileSystemDocumentLoader.loadDocumentsRecursively(
Paths.get("knowledge-base")
);
System.out.println("Loaded " + allDocs.size() + " documents");

import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
// Create matcher for multiple extensions
PathMatcher docMatcher = FileSystems.getDefault()
.getPathMatcher("glob:**.{pdf,docx,txt}");
List<Document> documents = FileSystemDocumentLoader.loadDocumentsRecursively(
Paths.get("documents"),
docMatcher
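);

import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
// Load and enrich with metadata
Path dir = Paths.get("documents");
List<Document> documents;
// Files.walk holds open directory handles, so close the stream with try-with-resources
try (Stream<Path> paths = Files.walk(dir)) {
    documents = paths
            .filter(Files::isRegularFile)
            .map(path -> {
                Document doc = FileSystemDocumentLoader.loadDocument(path);
                // Add custom metadata
                doc.metadata().put("file_name", path.getFileName().toString());
                doc.metadata().put("file_path", path.toString());
                try {
                    // Files.size throws a checked IOException, which a lambda cannot rethrow directly
                    doc.metadata().put("file_size", Files.size(path));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                return doc;
            })
            .collect(Collectors.toList());
} catch (IOException e) {
    throw new UncheckedIOException("Failed to walk " + dir, e);
}

import java.nio.file.Path;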
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
// Load from multiple specific directories
List<Document> allDocuments = new ArrayList<>();
// Technical documentation
List<Document> techDocs = FileSystemDocumentLoader.loadDocumentsRecursively(
Paths.get("docs/technical")
);
techDocs.forEach(doc -> doc.metadata().put("category", "technical"));
allDocuments.addAll(techDocs);
// User guides
List<Document> guides = FileSystemDocumentLoader.loadDocumentsRecursively(
Paths.get("docs/guides")
);
guides.forEach(doc -> doc.metadata().put("category", "guides"));
allDocuments.addAll(guides);
// API reference
List<Document> apiDocs = FileSystemDocumentLoader.loadDocumentsRecursively(
Paths.get("docs/api")
);
apiDocs.forEach(doc -> doc.metadata().put("category", "api"));
allDocuments.addAll(apiDocs);

import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;
List<Document> documents = new ArrayList<>();
List<Path> failedPaths = new ArrayList<>();
Path dir = Paths.get("documents");
// try-with-resources closes the directory stream opened by Files.walk
try (Stream<Path> paths = Files.walk(dir)) {
    paths.filter(Files::isRegularFile)
         .forEach(path -> {
             try {
                 documents.add(FileSystemDocumentLoader.loadDocument(path));
             } catch (Exception e) {
                 // Record the failure and continue with the remaining files
                 System.err.println("Failed to load: " + path + " - " + e.getMessage());
                 failedPaths.add(path);
             }
         });
} catch (IOException e) {
    System.err.println("Failed to walk directory: " + e.getMessage());
}
System.out.println("Loaded: " + documents.size());
System.out.println("Failed: " + failedPaths.size());

PathMatcher uses glob syntax:
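*.pdf - PDF files at the top level of the loaded directory only
**.pdf - PDF files at the top level and in all subdirectories
*.{pdf,docx} - PDF or DOCX files at the top level
**.{pdf,docx,txt} - PDF, DOCX, or TXT files anywhere in the tree
**/api/** - Files under any directory named "api" below the top level (a top-level api/ directory is not matched, because the relative path must contain "/api/")
user-*.pdf - Top-level PDF files whose names start with "user-"
Example: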
import java.nio.file.FileSystems;
PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:**.{pdf,md}");

When using easy-rag, Apache Tika is bundled and supports 200+ formats:
Documents: PDF, DOC, DOCX, ODT, RTF, Pages
Spreadsheets: XLS, XLSX, ODS, Numbers
Presentations: PPT, PPTX, ODP, Keynote
Text: TXT, MD, HTML, XML, CSV, JSON
Archives: ZIP, TAR, GZ, RAR
Email: MSG, EML, MBOX
Images: Extract text from images with OCR (when configured)
Code: Java, Python, JavaScript, etc. (as plain text)
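To load only a subset of these formats, a multi-extension glob like the glob:**.{pdf,docx,txt} pattern used in the examples can be generated from a list. A small sketch follows; forExtensions is a hypothetical helper, not part of langchain4j, and the resulting matcher is meant to be passed to the loadDocuments or loadDocumentsRecursively overloads that take a PathMatcher.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;

public class ExtensionMatchers {

    // Builds "glob:**.{ext1,ext2,...}" (any depth) or "glob:*.{ext1,ext2,...}"
    // (top level only) from a list of extensions given without the leading dot.
    public static PathMatcher forExtensions(List<String> extensions, boolean recursive) {
        if (extensions.isEmpty()) {
            throw new IllegalArgumentException("at least one extension is required");
        }
        String prefix = recursive ? "**." : "*.";
        String glob = "glob:" + prefix + "{" + String.join(",", extensions) + "}";
        return FileSystems.getDefault().getPathMatcher(glob);
    }

    public static void main(String[] args) {
        PathMatcher office = forExtensions(List.of("pdf", "docx", "pptx"), true);
        System.out.println(office.matches(Path.of("reports/q1.pptx"))); // true
        System.out.println(office.matches(Path.of("notes.md")));        // false
    }
}
```

On Linux the JDK's glob matcher is case-sensitive, so REPORT.PDF is not matched by "pdf"; list upper-case variants as well if your files use them.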
Install with Tessl CLI
npx tessl i tessl/maven-dev-langchain4j--langchain4j-easy-rag