Common classes used across Spring AI, providing document processing, text transformation, embedding utilities, observability support, and tokenization capabilities for AI application development.
Document processing provides core interfaces and implementations for ETL (Extract, Transform, Load) operations on documents in AI pipelines.
The document processing layer consists of three core interfaces: DocumentReader, DocumentWriter, and DocumentTransformer.
These interfaces follow functional programming patterns and extend the standard Java functional interfaces (Supplier, Consumer, Function).
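Because the three interfaces are just Supplier, Function, and Consumer specializations, the whole read/transform/write flow composes with plain JDK types. A stdlib-only sketch of the same pattern (plain String lists stand in for Spring AI's Document; all names here are illustrative, not part of the Spring AI API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

public class FunctionalPipelineSketch {

    // Run a read -> transform -> write pipeline built from plain JDK interfaces
    static List<String> runPipeline(Supplier<List<String>> reader,
                                    Function<List<String>, List<String>> transformer,
                                    Consumer<List<String>> writer) {
        List<String> transformed = transformer.apply(reader.get());
        writer.accept(transformed);
        return transformed;
    }

    public static void main(String[] args) {
        // "Reader": a Supplier producing documents (plain strings stand in for Document)
        Supplier<List<String>> reader = () -> List.of("first document", "second document");

        // "Transformer": a Function from one document list to another
        Function<List<String>, List<String>> upperCaser =
                docs -> docs.stream().map(String::toUpperCase).toList();

        // "Writer": a Consumer receiving the final list
        List<String> sink = new ArrayList<>();
        Consumer<List<String>> writer = sink::addAll;

        runPipeline(reader, upperCaser, writer);
        System.out.println(sink); // [FIRST DOCUMENT, SECOND DOCUMENT]
    }
}
```

The Spring AI interfaces below follow exactly this shape, adding read(), write(), and transform() aliases over get(), accept(), and apply().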
Reads documents from a source and returns a list of Document objects.
package org.springframework.ai.document;
import java.util.List;
import java.util.function.Supplier;
interface DocumentReader extends Supplier<List<Document>> {
/**
* Read documents from the source.
* Default implementation calls get().
* @return list of documents
*/
default List<Document> read() {
return get();
}
/**
* Get documents from the source (from Supplier interface).
* @return list of documents
*/
List<Document> get();
}

import org.springframework.ai.document.Document;
import org.springframework.ai.document.DocumentReader;
import org.springframework.ai.reader.TextReader;
import org.springframework.ai.reader.JsonReader;
import org.springframework.core.io.ClassPathResource;
import java.util.List;
// Read text documents
DocumentReader textReader = new TextReader(new ClassPathResource("data.txt"));
List<Document> textDocs = textReader.get();
// Read JSON documents
DocumentReader jsonReader = new JsonReader(new ClassPathResource("data.json"));
List<Document> jsonDocs = jsonReader.read();
// Process documents
for (Document doc : textDocs) {
System.out.println("ID: " + doc.getId());
System.out.println("Content: " + doc.getText());
System.out.println("Metadata: " + doc.getMetadata());
}

See the Readers and Writers documentation for specific reader implementations (JsonReader, TextReader).
Writes a list of Document instances to a destination.
package org.springframework.ai.document;
import java.util.List;
import java.util.function.Consumer;
interface DocumentWriter extends Consumer<List<Document>> {
/**
* Write documents to the destination.
* Default implementation calls accept().
* @param documents list of documents to write
*/
default void write(List<Document> documents) {
accept(documents);
}
/**
* Accept documents (from Consumer interface).
* @param documents list of documents to write
*/
void accept(List<Document> documents);
}

import org.springframework.ai.document.Document;
import org.springframework.ai.document.DocumentWriter;
import org.springframework.ai.document.MetadataMode;
import org.springframework.ai.writer.FileDocumentWriter;
import java.util.List;
// Create documents
List<Document> docs = List.of(
new Document("First document"),
new Document("Second document"),
new Document("Third document")
);
// Write to file
DocumentWriter writer = new FileDocumentWriter("output.txt");
writer.write(docs);
// Write with document markers and metadata
DocumentWriter writerWithMarkers = new FileDocumentWriter(
"output-with-metadata.txt",
true, // with document markers
MetadataMode.ALL,
false // don't append
);
writerWithMarkers.accept(docs);

See the Readers and Writers documentation for specific writer implementations (FileDocumentWriter).
Transforms a list of documents into another list of documents.
package org.springframework.ai.document;
import java.util.List;
import java.util.function.Function;
interface DocumentTransformer extends Function<List<Document>, List<Document>> {
/**
* Transform documents.
* Default implementation calls apply().
* @param documents list of documents to transform
* @return transformed list of documents
*/
default List<Document> transform(List<Document> documents) {
return apply(documents);
}
/**
* Apply transformation (from Function interface).
* @param input list of documents to transform
* @return transformed list of documents
*/
List<Document> apply(List<Document> input);
}

import org.springframework.ai.document.Document;
import org.springframework.ai.document.DocumentTransformer;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.ai.transformer.ContentFormatTransformer;
import org.springframework.ai.document.ContentFormatter;
import org.springframework.ai.document.DefaultContentFormatter;
import java.util.List;
// Split documents into chunks
DocumentTransformer splitter = TokenTextSplitter.builder()
.withChunkSize(500)
.build();
List<Document> originalDocs = List.of(
new Document("Long document content that needs to be split...")
);
List<Document> chunks = splitter.apply(originalDocs);
// Apply content formatting
ContentFormatter formatter = DefaultContentFormatter.defaultConfig();
DocumentTransformer formatTransformer = new ContentFormatTransformer(formatter);
List<Document> formattedDocs = formatTransformer.transform(chunks);
// Chain transformers
List<Document> result = formatTransformer.apply(splitter.apply(originalDocs));

Common DocumentTransformer implementations include TokenTextSplitter and ContentFormatTransformer, both shown above.
Generates unique document IDs from content.
package org.springframework.ai.document.id;
interface IdGenerator {
/**
* Generate a unique ID from content.
* @param contents variable content to generate ID from
* @return unique ID string
*/
String generateId(Object... contents);
}

Generates random UUID-based IDs.
package org.springframework.ai.document.id;
class RandomIdGenerator implements IdGenerator {
RandomIdGenerator();
/**
* Generate a random UUID.
* @param contents ignored
* @return random UUID as string
*/
String generateId(Object... contents);
}

Generates IDs based on a SHA-256 hash of the content.
package org.springframework.ai.document.id;
import java.nio.charset.Charset;
class JdkSha256HexIdGenerator implements IdGenerator {
/**
* Create generator with SHA-256 and UTF-8.
*/
JdkSha256HexIdGenerator();
/**
* Create generator with custom algorithm and charset.
* @param algorithm hash algorithm (e.g., "SHA-256", "MD5")
* @param charset character encoding
*/
JdkSha256HexIdGenerator(String algorithm, Charset charset);
/**
* Generate ID from content hash.
* @param contents content to hash
* @return hash-based UUID string
*/
String generateId(Object... contents);
}

import org.springframework.ai.document.Document;
import org.springframework.ai.document.id.IdGenerator;
import org.springframework.ai.document.id.RandomIdGenerator;
import org.springframework.ai.document.id.JdkSha256HexIdGenerator;
import java.nio.charset.StandardCharsets;
// Random ID generator
IdGenerator randomGen = new RandomIdGenerator();
String id1 = randomGen.generateId("content"); // e.g., "a1b2c3d4-..."
String id2 = randomGen.generateId("content"); // Different ID
// SHA-256 based generator (deterministic)
IdGenerator sha256Gen = new JdkSha256HexIdGenerator();
String id3 = sha256Gen.generateId("same content");
String id4 = sha256Gen.generateId("same content"); // Same ID as id3
String id5 = sha256Gen.generateId("different content"); // Different ID
// Custom algorithm generator
IdGenerator md5Gen = new JdkSha256HexIdGenerator("MD5", StandardCharsets.UTF_8);
String id6 = md5Gen.generateId("content");
// Use with Document builder
Document doc1 = Document.builder()
.idGenerator(sha256Gen)
.text("Document content")
.build();
Document doc2 = Document.builder()
.idGenerator(randomGen)
.text("Another document")
.build();
// Multiple content inputs for hash
String compositeId = sha256Gen.generateId("part1", "part2", "part3");

RandomIdGenerator: produces a different random UUID on every call; the content arguments are ignored.
JdkSha256HexIdGenerator: deterministic; identical content always produces the same hash-based ID.
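The deterministic hashing that JdkSha256HexIdGenerator relies on can be sketched directly with the JDK's MessageDigest. This illustrative version returns a plain hex string rather than the UUID-formatted string the real generator produces; the method name is hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

public class HashIdSketch {

    // Derive a hex ID from the SHA-256 hash of the concatenated contents
    static String hashId(Object... contents) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            for (Object content : contents) {
                digest.update(String.valueOf(content).getBytes(StandardCharsets.UTF_8));
            }
            return HexFormat.of().formatHex(digest.digest());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    public static void main(String[] args) {
        // Identical input -> identical ID; different input -> different ID
        System.out.println(hashId("same content").equals(hashId("same content")));   // true
        System.out.println(hashId("same content").equals(hashId("other content"))); // false
    }
}
```

Determinism is what makes hash-based IDs useful for deduplication: re-ingesting the same content yields the same document ID.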
Common patterns for chaining readers, transformers, and writers.
import org.springframework.ai.document.Document;
import org.springframework.ai.document.DocumentReader;
import org.springframework.ai.document.DocumentTransformer;
import org.springframework.ai.document.DocumentWriter;
import org.springframework.ai.reader.TextReader;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.ai.writer.FileDocumentWriter;
import org.springframework.core.io.ClassPathResource;
import java.util.List;
// 1. Read
DocumentReader reader = new TextReader(new ClassPathResource("input.txt"));
List<Document> documents = reader.get();
// 2. Transform
DocumentTransformer splitter = TokenTextSplitter.builder()
.withChunkSize(800)
.build();
List<Document> chunks = splitter.apply(documents);
// 3. Write
DocumentWriter writer = new FileDocumentWriter("output.txt");
writer.write(chunks);

import org.springframework.ai.document.Document;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.ai.transformer.ContentFormatTransformer;
import org.springframework.ai.document.DefaultContentFormatter;
import java.util.List;
import java.util.function.Function;
// Create transformers
TokenTextSplitter splitter = TokenTextSplitter.builder()
.withChunkSize(500)
.build();
ContentFormatTransformer formatter = new ContentFormatTransformer(
DefaultContentFormatter.defaultConfig()
);
// Compose transformers
Function<List<Document>, List<Document>> pipeline =
splitter.andThen(formatter);
// Apply pipeline
List<Document> input = // ... source documents
List<Document> output = pipeline.apply(input);

import org.springframework.ai.document.Document;
import org.springframework.ai.document.DocumentReader;
import org.springframework.ai.reader.TextReader;
import org.springframework.core.io.Resource;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Stream;
// Read from multiple sources in parallel
List<Resource> resources = List.of(/* multiple resources */);
List<CompletableFuture<List<Document>>> futures = resources.stream()
.map(resource -> CompletableFuture.supplyAsync(() -> {
DocumentReader reader = new TextReader(resource);
return reader.get();
}))
.toList();
// Wait for all and flatten results
List<Document> allDocuments = futures.stream()
.map(CompletableFuture::join)
.flatMap(List::stream)
.toList();

import org.springframework.ai.document.Document;
import org.springframework.ai.document.DocumentTransformer;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import java.util.List;
import java.util.stream.Stream;
class ConditionalTransformer implements DocumentTransformer {
private final DocumentTransformer transformer;
private final String metadataKey;
private final Object requiredValue;
public ConditionalTransformer(DocumentTransformer transformer,
String metadataKey,
Object requiredValue) {
this.transformer = transformer;
this.metadataKey = metadataKey;
this.requiredValue = requiredValue;
}
@Override
public List<Document> apply(List<Document> documents) {
// Filter documents matching condition
List<Document> matching = documents.stream()
.filter(doc -> requiredValue.equals(doc.getMetadata().get(metadataKey)))
.toList();
// Transform only matching documents
List<Document> transformed = transformer.apply(matching);
// Combine with non-matching documents
List<Document> nonMatching = documents.stream()
.filter(doc -> !requiredValue.equals(doc.getMetadata().get(metadataKey)))
.toList();
return Stream.concat(transformed.stream(), nonMatching.stream()).toList();
}
}
// Usage
DocumentTransformer conditionalSplitter = new ConditionalTransformer(
TokenTextSplitter.builder().build(),
"type",
"long-form"
);
List<Document> result = conditionalSplitter.apply(documents);

Thread Safety:
DocumentReader implementations: generally thread-safe (stateless).
DocumentWriter implementations: check the specific implementation (FileDocumentWriter is NOT thread-safe for concurrent writes to the same file).
DocumentTransformer implementations: stateless and thread-safe.
IdGenerator implementations: RandomIdGenerator and JdkSha256HexIdGenerator are thread-safe.

Performance:
Common Exceptions:
IOException: file not found, network errors, permission denied (readers/writers).
IllegalArgumentException: invalid parameters (null documents, negative sizes).
RuntimeException: unexpected processing errors (JSON parsing, encoding issues).

Edge Cases:
// Empty document list
List<Document> empty = List.of();
DocumentTransformer transformer = // ... any transformer
List<Document> result = transformer.apply(empty); // Returns empty list
// Null handling
try {
transformer.apply(null); // Throws NullPointerException
} catch (NullPointerException e) {
// Handle null input
}
// Reader with missing resource
try {
DocumentReader reader = new TextReader(new ClassPathResource("missing.txt"));
List<Document> docs = reader.get(); // Throws IOException wrapped in RuntimeException
} catch (RuntimeException e) {
// Handle missing resource
}

Install with Tessl CLI
npx tessl i tessl/maven-org-springframework-ai--spring-ai-commons