Common classes used across Spring AI providing document processing, text transformation, embedding utilities, observability support, and tokenization capabilities for AI application development
Document readers and writers provide I/O capabilities for loading documents from various sources and writing them to destinations.
The readers and writers layer consists of JsonReader, TextReader, and FileDocumentWriter, supported by the ExtractedTextFormatter utility.

JsonReader reads JSON documents and converts them to Document objects.
package org.springframework.ai.reader;
import org.springframework.ai.document.Document;
import org.springframework.ai.document.DocumentReader;
import org.springframework.core.io.Resource;
import java.util.List;
class JsonReader implements DocumentReader {
/**
* Create reader for JSON resource.
* Converts entire JSON to documents.
* @param resource JSON resource to read
*/
JsonReader(Resource resource);
/**
* Create reader with specific JSON keys.
* Only specified keys are used for document content.
* @param resource JSON resource to read
* @param jsonKeysToUse keys to extract for content
*/
JsonReader(Resource resource, String... jsonKeysToUse);
/**
* Create reader with metadata generator.
* @param resource JSON resource to read
* @param jsonMetadataGenerator generator for metadata from JSON
* @param jsonKeysToUse keys to extract for content
*/
JsonReader(Resource resource, JsonMetadataGenerator jsonMetadataGenerator, String... jsonKeysToUse);
/**
* Read and parse JSON into documents.
* @return list of documents
*/
List<Document> get();
/**
* Read using JSON Pointer (RFC 6901).
* Allows navigation to specific parts of JSON structure.
* @param pointer JSON Pointer expression (e.g., "/data/items")
* @return list of documents from pointer location
*/
List<Document> get(String pointer);
}

package org.springframework.ai.reader;
import java.util.Map;
@FunctionalInterface
interface JsonMetadataGenerator {
/**
* Generate metadata from JSON document map.
* @param jsonMap JSON document as map
* @return metadata map
*/
Map<String, Object> generate(Map<String, Object> jsonMap);
}

package org.springframework.ai.reader;
import java.util.Map;
class EmptyJsonMetadataGenerator implements JsonMetadataGenerator {
/**
* Create empty metadata generator.
*/
EmptyJsonMetadataGenerator();
/**
* Generate empty metadata.
* @param jsonMap JSON document (ignored)
* @return empty map
*/
Map<String, Object> generate(Map<String, Object> jsonMap);
}

import org.springframework.ai.reader.JsonReader;
import org.springframework.ai.document.Document;
import org.springframework.core.io.ClassPathResource;
import org.springframework.core.io.FileSystemResource;
import java.util.List;
import java.util.Map;
// Example JSON file content:
// {
// "articles": [
// {
// "title": "Introduction to AI",
// "content": "AI is transforming...",
// "author": "Jane Doe",
// "category": "technology"
// },
// {
// "title": "Machine Learning Basics",
// "content": "ML involves...",
// "author": "John Smith",
// "category": "education"
// }
// ]
// }
// Basic usage - read entire JSON
JsonReader basicReader = new JsonReader(new ClassPathResource("articles.json"));
List<Document> allDocs = basicReader.get();
for (Document doc : allDocs) {
System.out.println("Content: " + doc.getText());
System.out.println("Metadata: " + doc.getMetadata());
}
// Read specific JSON keys for content
JsonReader specificReader = new JsonReader(
new ClassPathResource("articles.json"),
"title", "content" // Only use these keys
);
List<Document> specificDocs = specificReader.get();
// Use JSON Pointer to navigate structure
JsonReader pointerReader = new JsonReader(new ClassPathResource("articles.json"));
List<Document> articleDocs = pointerReader.get("/articles");
// Directly accesses the "articles" array
// Custom metadata generation
JsonReader customReader = new JsonReader(
new ClassPathResource("articles.json"),
jsonMap -> {
// Custom metadata from JSON
return Map.of(
"author", jsonMap.get("author"),
"category", jsonMap.get("category"),
"processed_at", System.currentTimeMillis()
);
},
"title", "content"
);
List<Document> customDocs = customReader.get();
for (Document doc : customDocs) {
System.out.println("Author: " + doc.getMetadata().get("author"));
System.out.println("Category: " + doc.getMetadata().get("category"));
}
// Read from file system
JsonReader fileReader = new JsonReader(
new FileSystemResource("/data/documents.json"),
"text", "summary"
);
List<Document> fileDocs = fileReader.get();

import org.springframework.ai.reader.JsonReader;
import org.springframework.ai.document.Document;
import org.springframework.core.io.ClassPathResource;
import java.util.List;
// Complex JSON structure:
// {
// "database": {
// "documents": [
// {"id": 1, "text": "First doc"},
// {"id": 2, "text": "Second doc"}
// ],
// "metadata": {
// "version": "1.0"
// }
// }
// }
JsonReader reader = new JsonReader(new ClassPathResource("complex.json"));
// Access nested array
List<Document> docs = reader.get("/database/documents");
// Access specific array element
List<Document> firstDoc = reader.get("/database/documents/0");
// Access nested object
List<Document> metadata = reader.get("/database/metadata");

TextReader reads plain text from Spring Resources.
package org.springframework.ai.reader;
import org.springframework.ai.document.Document;
import org.springframework.ai.document.DocumentReader;
import org.springframework.core.io.Resource;
import java.nio.charset.Charset;
import java.util.List;
import java.util.Map;
class TextReader implements DocumentReader {
// Metadata constants
static final String CHARSET_METADATA = "charset";
static final String SOURCE_METADATA = "source";
/**
* Create reader from resource URL.
* @param resourceUrl URL to text resource
*/
TextReader(String resourceUrl);
/**
* Create reader from Resource.
* @param resource text resource to read
*/
TextReader(Resource resource);
/**
* Get charset for reading.
* Default: UTF-8
* @return charset
*/
Charset getCharset();
/**
* Set charset for reading.
* @param charset charset to use
*/
void setCharset(Charset charset);
/**
* Get custom metadata to include in documents.
* @return metadata map
*/
Map<String, Object> getCustomMetadata();
/**
* Read text resource into document.
* @return list with single document
*/
List<Document> get();
}

import org.springframework.ai.reader.TextReader;
import org.springframework.ai.document.Document;
import org.springframework.core.io.ClassPathResource;
import org.springframework.core.io.FileSystemResource;
import org.springframework.core.io.UrlResource;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.List;
// Read from classpath
TextReader classpathReader = new TextReader(new ClassPathResource("knowledge-base.txt"));
List<Document> docs = classpathReader.get();
Document doc = docs.get(0);
System.out.println("Content: " + doc.getText());
System.out.println("Source: " + doc.getMetadata().get(TextReader.SOURCE_METADATA));
System.out.println("Charset: " + doc.getMetadata().get(TextReader.CHARSET_METADATA));
// Read from file system
TextReader fileReader = new TextReader(new FileSystemResource("/data/document.txt"));
List<Document> fileDocs = fileReader.get();
// Read from URL
TextReader urlReader = new TextReader("https://example.com/document.txt");
List<Document> urlDocs = urlReader.get();
// Custom charset
TextReader customCharsetReader = new TextReader(new ClassPathResource("data-latin1.txt"));
customCharsetReader.setCharset(StandardCharsets.ISO_8859_1);
List<Document> latinDocs = customCharsetReader.get();
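The charset passed to setCharset must match the file's actual encoding. As a plain-Java illustration (independent of Spring AI; CharsetMismatchDemo is a hypothetical helper), decoding Latin-1 bytes as UTF-8 garbles non-ASCII characters:

```java
import java.nio.charset.StandardCharsets;

// "café" stored as ISO-8859-1 keeps 'é' as the single byte 0xE9, which is
// not a valid UTF-8 sequence; decoding with the wrong charset garbles it.
class CharsetMismatchDemo {

    // Encode as Latin-1, decode as Latin-1: round-trips cleanly.
    static String roundTrip(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    // Encode as Latin-1, decode as UTF-8: 0xE9 becomes U+FFFD.
    static String wrongDecode(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.UTF_8);
    }
}
```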
// Add custom metadata
TextReader customMetadataReader = new TextReader(new ClassPathResource("manual.txt"));
customMetadataReader.getCustomMetadata().put("document_type", "user-manual");
customMetadataReader.getCustomMetadata().put("version", "2.1");
customMetadataReader.getCustomMetadata().put("author", "Documentation Team");
List<Document> customDocs = customMetadataReader.get();
Document customDoc = customDocs.get(0);
System.out.println("Type: " + customDoc.getMetadata().get("document_type"));
System.out.println("Version: " + customDoc.getMetadata().get("version"));
System.out.println("Author: " + customDoc.getMetadata().get("author"));

import org.springframework.ai.reader.TextReader;
import org.springframework.ai.document.Document;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.core.io.ClassPathResource;
import java.util.List;
// Read large text file
TextReader reader = new TextReader(new ClassPathResource("large-document.txt"));
reader.getCustomMetadata().put("source_file", "large-document.txt");
reader.getCustomMetadata().put("import_date", System.currentTimeMillis());
List<Document> documents = reader.get();
// Split into chunks
TokenTextSplitter splitter = TokenTextSplitter.builder()
.withChunkSize(500)
.build();
List<Document> chunks = splitter.apply(documents);
System.out.println("Read 1 document, created " + chunks.size() + " chunks");
// All chunks inherit the custom metadata
for (Document chunk : chunks) {
System.out.println("Chunk source: " + chunk.getMetadata().get("source_file"));
}

FileDocumentWriter writes documents to files.
package org.springframework.ai.writer;
import org.springframework.ai.document.Document;
import org.springframework.ai.document.DocumentWriter;
import org.springframework.ai.document.MetadataMode;
import java.util.List;
class FileDocumentWriter implements DocumentWriter {
// Metadata constants for page numbers
static final String METADATA_START_PAGE_NUMBER = "page_number";
static final String METADATA_END_PAGE_NUMBER = "end_page_number";
/**
* Create writer for file.
* @param fileName output file path
*/
FileDocumentWriter(String fileName);
/**
* Create writer with document markers.
* @param fileName output file path
* @param withDocumentMarkers true to add markers between documents
*/
FileDocumentWriter(String fileName, boolean withDocumentMarkers);
/**
* Create writer with full configuration.
* @param fileName output file path
* @param withDocumentMarkers true to add markers between documents
* @param metadataMode metadata inclusion mode
* @param append true to append to existing file
*/
FileDocumentWriter(String fileName, boolean withDocumentMarkers,
MetadataMode metadataMode, boolean append);
/**
* Write documents to file.
* @param docs documents to write
*/
void accept(List<Document> docs);
}

import org.springframework.ai.writer.FileDocumentWriter;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.ai.document.Document;
import org.springframework.ai.document.DocumentWriter;
import org.springframework.ai.document.MetadataMode;
import java.util.List;
import java.util.Map;
// Create documents to write
List<Document> docs = List.of(
Document.builder()
.text("First document content")
.metadata("author", "Alice")
.metadata("category", "intro")
.build(),
Document.builder()
.text("Second document content")
.metadata("author", "Bob")
.metadata("category", "advanced")
.build()
);
// Basic writing
DocumentWriter basicWriter = new FileDocumentWriter("output.txt");
basicWriter.write(docs);
// Write with document markers (separators between documents)
DocumentWriter markerWriter = new FileDocumentWriter(
"output-with-markers.txt",
true // with markers
);
markerWriter.write(docs);
// Write with metadata
DocumentWriter metadataWriter = new FileDocumentWriter(
"output-with-metadata.txt",
true, // with markers
MetadataMode.ALL, // include all metadata
false // overwrite file
);
metadataWriter.accept(docs);
// Append to existing file
DocumentWriter appendWriter = new FileDocumentWriter(
"output.txt",
true, // with markers
MetadataMode.INFERENCE, // inference metadata only
true // append mode
);
appendWriter.write(docs);
// Write chunks from splitting pipeline
TokenTextSplitter splitter = TokenTextSplitter.builder()
.withChunkSize(500)
.build();
List<Document> sourceDoc = List.of(new Document("Long content..."));
List<Document> chunks = splitter.apply(sourceDoc);
DocumentWriter chunkWriter = new FileDocumentWriter(
"chunks.txt",
true, // separate chunks with markers
MetadataMode.NONE, // content only
false
);
chunkWriter.write(chunks);

import org.springframework.ai.writer.FileDocumentWriter;
import org.springframework.ai.document.MetadataMode;
import org.springframework.ai.document.Document;
import java.util.List;
// Documents with page number metadata
List<Document> pdfPages = List.of(
Document.builder()
.text("Content from page 1")
.metadata(FileDocumentWriter.METADATA_START_PAGE_NUMBER, 1)
.metadata(FileDocumentWriter.METADATA_END_PAGE_NUMBER, 1)
.build(),
Document.builder()
.text("Content from pages 2-3")
.metadata(FileDocumentWriter.METADATA_START_PAGE_NUMBER, 2)
.metadata(FileDocumentWriter.METADATA_END_PAGE_NUMBER, 3)
.build()
);
FileDocumentWriter writer = new FileDocumentWriter(
"pdf-export.txt",
true, // with markers
MetadataMode.ALL,
false
);
writer.write(pdfPages);
// Output includes page number metadata for each document

ExtractedTextFormatter reformats extracted text by removing unwanted lines, aligning text, and consolidating blank lines.
package org.springframework.ai.reader;
class ExtractedTextFormatter {
/**
* Create builder for configuration.
* @return builder instance
*/
static Builder builder();
/**
* Get formatter with default settings.
* @return default formatter
*/
static ExtractedTextFormatter defaults();
/**
* Trim adjacent blank lines to single blank line.
* @param pageText text to process
* @return processed text
*/
static String trimAdjacentBlankLines(String pageText);
/**
* Align text to left margin (remove leading whitespace).
* @param pageText text to align
* @return aligned text
*/
static String alignToLeft(String pageText);
/**
* Delete bottom text lines.
* @param pageText text to process
* @param numberOfLines number of lines to delete
* @param lineSeparator line separator to use
* @return processed text
*/
static String deleteBottomTextLines(String pageText, int numberOfLines, String lineSeparator);
/**
* Delete top text lines.
* @param pageText text to process
* @param numberOfLines number of lines to delete
* @param lineSeparator line separator to use
* @return processed text
*/
static String deleteTopTextLines(String pageText, int numberOfLines, String lineSeparator);
/**
* Format page text.
* @param pageText text to format
* @return formatted text
*/
String format(String pageText);
/**
* Format page text with page number awareness.
* @param pageText text to format
* @param pageNumber page number (for conditional processing)
* @return formatted text
*/
String format(String pageText, int pageNumber);
}

class ExtractedTextFormatter.Builder {
/**
* Enable/disable left alignment.
* Default: true
* @param leftAlignment true to align left
* @return this builder
*/
Builder withLeftAlignment(boolean leftAlignment);
/**
* Set number of top pages to skip before deleting lines.
* Useful for preserving title pages.
* Default: 0
* @param numberOfPages number of pages to skip
* @return this builder
*/
Builder withNumberOfTopPagesToSkipBeforeDelete(int numberOfPages);
/**
* Set number of top text lines to delete.
* Removes headers from each page.
* Default: 0
* @param numberOfLines number of lines to delete
* @return this builder
*/
Builder withNumberOfTopTextLinesToDelete(int numberOfLines);
/**
* Set number of bottom text lines to delete.
* Removes footers from each page.
* Default: 0
* @param numberOfLines number of lines to delete
* @return this builder
*/
Builder withNumberOfBottomTextLinesToDelete(int numberOfLines);
/**
* Override line separator.
* @param lineSeparator separator to use
* @return this builder
*/
Builder overrideLineSeparator(String lineSeparator);
/**
* Build the formatter.
* @return configured formatter
*/
ExtractedTextFormatter build();
}

import org.springframework.ai.reader.ExtractedTextFormatter;
// Text extracted from PDF with issues:
String extractedText = """
Header: Company Name - Page 1
This is the actual content
that we want to keep.
It has extra whitespace issues.
Footer: Page 1 of 10
""";
// Use defaults (left align + trim blank lines)
ExtractedTextFormatter defaultFormatter = ExtractedTextFormatter.defaults();
String cleaned = defaultFormatter.format(extractedText);
// Custom formatting - remove headers and footers
ExtractedTextFormatter customFormatter = ExtractedTextFormatter.builder()
.withLeftAlignment(true) // Align to left
.withNumberOfTopTextLinesToDelete(1) // Remove 1 line from top (header)
.withNumberOfBottomTextLinesToDelete(1) // Remove 1 line from bottom (footer)
.build();
String formatted = customFormatter.format(extractedText);
// Process PDF with title page preservation
ExtractedTextFormatter pdfFormatter = ExtractedTextFormatter.builder()
.withNumberOfTopPagesToSkipBeforeDelete(1) // Don't process first page
.withNumberOfTopTextLinesToDelete(2) // Remove 2-line headers from other pages
.withNumberOfBottomTextLinesToDelete(1) // Remove 1-line footers
.build();
// Format with page number (page1Text and page2Text hold raw extracted page strings)
String page1 = pdfFormatter.format(page1Text, 1); // Title page - headers preserved
String page2 = pdfFormatter.format(page2Text, 2); // Regular page - headers removed
// Static utility methods
String textWithBlanks = "Line 1\n\n\n\nLine 2";
String trimmed = ExtractedTextFormatter.trimAdjacentBlankLines(textWithBlanks);
// Result: "Line 1\n\nLine 2"
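For intuition, the blank-line consolidation can be approximated in plain Java. BlankLineTrimmer below is an illustrative sketch assuming "\n" separators, not the Spring AI implementation:

```java
// A plain-Java approximation of trimAdjacentBlankLines' documented
// behavior: runs of adjacent blank lines collapse to one blank line.
class BlankLineTrimmer {

    static String trim(String text) {
        // Three or more consecutive newlines mean two or more blank
        // lines; collapse to exactly two newlines (one blank line).
        return text.replaceAll("\n{3,}", "\n\n");
    }
}
```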
String indentedText = " Indented line\n More indent";
String aligned = ExtractedTextFormatter.alignToLeft(indentedText);
// Result: "Indented line\nMore indent"
String withHeader = "Header Line\nContent Line\nMore Content";
String noHeader = ExtractedTextFormatter.deleteTopTextLines(withHeader, 1, "\n");
// Result: "Content Line\nMore Content"
String withFooter = "Content Line\nMore Content\nFooter Line";
String noFooter = ExtractedTextFormatter.deleteBottomTextLines(withFooter, 1, "\n");
// Result: "Content Line\nMore Content"

import org.springframework.ai.reader.ExtractedTextFormatter;
import org.springframework.ai.document.Document;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import java.util.ArrayList;
import java.util.List;
/**
* Complete pipeline for processing extracted PDF text.
*/
class PdfProcessor {
private final ExtractedTextFormatter formatter;
private final TokenTextSplitter splitter;
public PdfProcessor() {
// Configure formatter to clean up PDF artifacts
this.formatter = ExtractedTextFormatter.builder()
.withLeftAlignment(true)
.withNumberOfTopPagesToSkipBeforeDelete(1) // Preserve title page
.withNumberOfTopTextLinesToDelete(2) // Remove headers
.withNumberOfBottomTextLinesToDelete(1) // Remove footers
.build();
// Configure splitter for chunks
this.splitter = TokenTextSplitter.builder()
.withChunkSize(500)
.build();
}
public List<Document> processPdf(List<String> pdfPages) {
List<Document> cleanedPages = new ArrayList<>();
for (int i = 0; i < pdfPages.size(); i++) {
String rawText = pdfPages.get(i);
String cleanText = formatter.format(rawText, i + 1);
Document doc = Document.builder()
.text(cleanText)
.metadata("page_number", i + 1)
.metadata("total_pages", pdfPages.size())
.build();
cleanedPages.add(doc);
}
// Split into chunks
return splitter.apply(cleanedPages);
}
}
// Usage
List<String> pdfPages = List.of(
"Title Page Content",
"Header\nPage 2 content\nFooter",
"Header\nPage 3 content\nFooter"
);
PdfProcessor processor = new PdfProcessor();
List<Document> processedChunks = processor.processPdf(pdfPages);

import org.springframework.ai.reader.JsonReader;
import org.springframework.ai.reader.TextReader;
import org.springframework.ai.document.Document;
import org.springframework.core.io.Resource;
import org.springframework.core.io.ClassPathResource;
import java.util.ArrayList;
import java.util.List;
/**
* Load documents from multiple formats.
*/
class MultiFormatLoader {
public List<Document> loadAll() {
List<Document> allDocs = new ArrayList<>();
// Load JSON documents
JsonReader jsonReader = new JsonReader(
new ClassPathResource("data.json"),
"title", "content"
);
allDocs.addAll(jsonReader.get());
// Load text files
TextReader textReader = new TextReader(new ClassPathResource("readme.txt"));
allDocs.addAll(textReader.get());
// Load more text files
TextReader manualReader = new TextReader(new ClassPathResource("manual.txt"));
manualReader.getCustomMetadata().put("type", "manual");
allDocs.addAll(manualReader.get());
return allDocs;
}
}

import org.springframework.ai.reader.TextReader;
import org.springframework.ai.document.MetadataMode;
import org.springframework.ai.writer.FileDocumentWriter;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.ai.document.Document;
import org.springframework.core.io.ClassPathResource;
import java.util.List;
// Extract
TextReader reader = new TextReader(new ClassPathResource("input.txt"));
List<Document> documents = reader.get();
// Transform
TokenTextSplitter splitter = TokenTextSplitter.builder()
.withChunkSize(500)
.build();
List<Document> chunks = splitter.apply(documents);
// Load
FileDocumentWriter writer = new FileDocumentWriter(
"output.txt",
true, // with markers
MetadataMode.ALL,
false
);
writer.write(chunks);

Thread Safety:
JsonReader: Thread-safe for reading (stateless)
TextReader: Thread-safe for reading (stateless)
FileDocumentWriter: NOT thread-safe for concurrent writes to the same file
ExtractedTextFormatter: Thread-safe (stateless)

Performance:
JsonReader: Memory-based parsing, O(n) where n is JSON size
TextReader: Streaming read, efficient for large files
FileDocumentWriter: I/O bound, sequential writes
ExtractedTextFormatter: O(n) where n is text length

Memory Considerations:
Common Exceptions:
IOException: File not found, permission denied, network errors
JsonProcessingException: Invalid JSON format (JsonReader)
IllegalArgumentException: Invalid parameters (null resources, empty paths)
RuntimeException: Unexpected errors (encoding issues, resource access failures)

Edge Cases:
// Empty file
TextReader reader = new TextReader(new ClassPathResource("empty.txt"));
List<Document> docs = reader.get(); // Returns list with single document containing empty string
// Invalid JSON
JsonReader reader = new JsonReader(new ClassPathResource("invalid.json"));
try {
List<Document> docs = reader.get(); // Throws JsonProcessingException wrapped in RuntimeException
} catch (RuntimeException e) {
// Handle invalid JSON
}
// Missing JSON pointer path
JsonReader reader = new JsonReader(new ClassPathResource("data.json"));
List<Document> docs = reader.get("/nonexistent/path"); // Returns empty list
// Concurrent writes (unsafe)
FileDocumentWriter writer = new FileDocumentWriter("output.txt");
// DON'T: Multiple threads writing concurrently
// DO: Synchronize or use separate writers
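One way to follow the "synchronize" advice is to funnel every write through a single lock. The sketch below uses a plain java.io.Writer to show the pattern; SynchronizedSink is illustrative and not part of Spring AI:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.List;

// All threads share one sink; the synchronized method serializes writes
// so concurrent callers cannot interleave their output.
class SynchronizedSink {

    private final Writer out;

    SynchronizedSink(Writer out) {
        this.out = out;
    }

    // One lock per sink: only one thread writes at a time.
    synchronized void write(List<String> docs) {
        try {
            for (String doc : docs) {
                out.write(doc);
                out.write('\n');
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```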
// Charset issues
TextReader reader = new TextReader(new ClassPathResource("utf16.txt"));
reader.setCharset(StandardCharsets.UTF_8); // Wrong charset
List<Document> docs = reader.get(); // May produce garbled text

Install with Tessl CLI:
npx tessl i tessl/maven-org-springframework-ai--spring-ai-commons