Common classes used across Spring AI, providing document processing, text transformation, embedding utilities, observability support, and tokenization for AI application development.
Tokenization provides token counting and estimation capabilities for text and media content using the JTokkit library.
The tokenization layer consists of the TokenCountEstimator interface and its JTokkit-based implementation, JTokkitTokenCountEstimator.
Token counting is essential for managing context windows, batching documents for embedding, and estimating API costs.
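Before reaching for a tokenizer library, a rough rule of thumb (about 4 characters per token for English text) gives a quick, library-free ballpark figure. This sketch is an approximation only, not part of the Spring AI API; use a real tokenizer such as JTokkit whenever accuracy matters.

```java
public class RoughTokenEstimate {

    // Heuristic: English text averages roughly 4 characters per token.
    // This is an approximation for quick budgeting, not an exact count.
    static int roughEstimate(String text) {
        return (int) Math.ceil(text.length() / 4.0);
    }

    public static void main(String[] args) {
        String text = "Artificial Intelligence is transforming how we build applications.";
        System.out.println(roughEstimate(text));
    }
}
```

The true count from a BPE tokenizer will differ, especially for code, non-English text, and emoji-heavy content, so treat this only as a pre-flight estimate.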
Interface for estimating token count in text or messages.
package org.springframework.ai.tokenizer;
import org.springframework.ai.content.MediaContent;
interface TokenCountEstimator {
/**
* Estimate token count in text.
* @param text text to count tokens for
* @return estimated token count
*/
int estimate(String text);
/**
* Estimate token count in media content.
* Includes text, MIME type, and base64 encoded media data.
* @param content media content to count tokens for
* @return estimated token count
*/
int estimate(MediaContent content);
/**
* Estimate token count in multiple contents.
* Sum of tokens across all contents.
* @param messages iterable of media contents
* @return total estimated token count
*/
int estimate(Iterable<MediaContent> messages);
}

import org.springframework.ai.tokenizer.TokenCountEstimator;
import org.springframework.ai.tokenizer.JTokkitTokenCountEstimator;
import org.springframework.ai.content.MediaContent;
import java.util.List;
// Create token estimator
TokenCountEstimator estimator = new JTokkitTokenCountEstimator();
// Estimate tokens in text
String text = "Artificial Intelligence is transforming how we build applications.";
int tokenCount = estimator.estimate(text);
System.out.println("Token count: " + tokenCount);
// Estimate for longer text
String longText = """
Spring AI provides a high-level abstraction for integrating
AI capabilities into Spring applications. It supports multiple
AI model providers and offers consistent APIs across different
backends.
""";
int longTokenCount = estimator.estimate(longText);
System.out.println("Long text tokens: " + longTokenCount);
// Estimate for media content (requires MediaContent implementation)
// MediaContent message = ...;
// int messageTokens = estimator.estimate(message);
// Estimate for multiple messages
// List<MediaContent> messages = List.of(...);
// int totalTokens = estimator.estimate(messages);

Token counter using the JTokkit library (OpenAI tokenizer implementation).
package org.springframework.ai.tokenizer;
import org.springframework.ai.content.MediaContent;
import com.knuddels.jtokkit.api.EncodingType;
class JTokkitTokenCountEstimator implements TokenCountEstimator {
/**
* Create with default encoding (CL100K_BASE).
* Used by GPT-3.5-turbo and GPT-4.
*/
JTokkitTokenCountEstimator();
/**
* Create with specific encoding type.
* @param tokenEncodingType JTokkit encoding type
*/
JTokkitTokenCountEstimator(EncodingType tokenEncodingType);
/**
* Estimate token count in text.
* @param text text to count tokens for
* @return token count
*/
int estimate(String text);
/**
* Estimate token count in media content.
* Includes text content, MIME type, and base64 encoded data.
* @param content media content to estimate
* @return token count
*/
int estimate(MediaContent content);
/**
* Estimate token count across multiple contents.
* @param contents iterable of media contents
* @return total token count
*/
int estimate(Iterable<MediaContent> contents);
}

JTokkit supports multiple encoding types corresponding to different OpenAI models:
import org.springframework.ai.tokenizer.JTokkitTokenCountEstimator;
import com.knuddels.jtokkit.api.EncodingType;
// Default encoding (CL100K_BASE for GPT-3.5/GPT-4)
JTokkitTokenCountEstimator defaultEstimator = new JTokkitTokenCountEstimator();
String text = "Hello, how are you doing today?";
int tokens = defaultEstimator.estimate(text);
System.out.println("Tokens (CL100K_BASE): " + tokens);
// Specific encoding for different models
JTokkitTokenCountEstimator gpt4Estimator = new JTokkitTokenCountEstimator(
EncodingType.CL100K_BASE
);
JTokkitTokenCountEstimator codeEstimator = new JTokkitTokenCountEstimator(
EncodingType.P50K_BASE
);
JTokkitTokenCountEstimator gpt3Estimator = new JTokkitTokenCountEstimator(
EncodingType.R50K_BASE
);
String code = "function hello() { return 'world'; }";
int codeTokens = codeEstimator.estimate(code);
System.out.println("Code tokens: " + codeTokens);
// Different encodings may produce different counts
String sample = "Spring AI simplifies AI integration";
System.out.println("CL100K_BASE: " + gpt4Estimator.estimate(sample));
System.out.println("P50K_BASE: " + codeEstimator.estimate(sample));
System.out.println("R50K_BASE: " + gpt3Estimator.estimate(sample));import org.springframework.ai.tokenizer.JTokkitTokenCountEstimator;
import org.springframework.ai.document.Document;
import com.knuddels.jtokkit.api.EncodingType;
import java.util.List;
// Create estimator
JTokkitTokenCountEstimator estimator = new JTokkitTokenCountEstimator(
EncodingType.CL100K_BASE
);
// Estimate single document
Document doc = new Document("This is a document with some content to analyze.");
int docTokens = estimator.estimate(doc.getText());
System.out.println("Document tokens: " + docTokens);
// Estimate multiple documents
List<Document> documents = List.of(
new Document("First document content"),
new Document("Second document with more content"),
new Document("Third document with even more content to process")
);
int totalTokens = 0;
for (Document document : documents) {
int tokens = estimator.estimate(document.getText());
totalTokens += tokens;
System.out.println("Doc " + document.getId() + ": " + tokens + " tokens");
}
System.out.println("Total tokens: " + totalTokens);
// Check if document fits in context window
int contextWindowSize = 4096; // Example: GPT-3.5-turbo
int promptTokens = 100;
int responseTokens = 500;
int availableForContext = contextWindowSize - promptTokens - responseTokens;
Document largeDoc = new Document("Very large document content...");
int largeDocTokens = estimator.estimate(largeDoc.getText());
if (largeDocTokens <= availableForContext) {
System.out.println("Document fits in context window");
} else {
System.out.println("Document needs to be split: " + largeDocTokens + " > " + availableForContext);
}

import org.springframework.ai.tokenizer.JTokkitTokenCountEstimator;
import org.springframework.ai.document.Document;
import com.knuddels.jtokkit.api.EncodingType;
import java.util.ArrayList;
import java.util.List;
/**
* Manage documents within model context window limits.
*/
class ContextWindowManager {
private final JTokkitTokenCountEstimator estimator;
private final int contextWindowSize;
private final int reserveTokens;
public ContextWindowManager(int contextWindowSize, int reserveTokens) {
this.estimator = new JTokkitTokenCountEstimator(EncodingType.CL100K_BASE);
this.contextWindowSize = contextWindowSize;
this.reserveTokens = reserveTokens;
}
/**
* Select documents that fit within context window.
*/
public List<Document> selectDocumentsForContext(List<Document> documents) {
List<Document> selected = new ArrayList<>();
int availableTokens = contextWindowSize - reserveTokens;
int usedTokens = 0;
for (Document doc : documents) {
int docTokens = estimator.estimate(doc.getText());
if (usedTokens + docTokens <= availableTokens) {
selected.add(doc);
usedTokens += docTokens;
} else {
break; // Stop when we can't fit more
}
}
return selected;
}
/**
* Calculate how many documents can fit.
*/
public int calculateCapacity(List<Document> documents) {
int availableTokens = contextWindowSize - reserveTokens;
int usedTokens = 0;
int count = 0;
for (Document doc : documents) {
int docTokens = estimator.estimate(doc.getText());
if (usedTokens + docTokens <= availableTokens) {
usedTokens += docTokens;
count++;
} else {
break;
}
}
return count;
}
}
// Usage
ContextWindowManager manager = new ContextWindowManager(
4096, // GPT-3.5-turbo context window
600 // Reserve for prompt + response
);
List<Document> documents = List.of(/* many documents */);
// Get documents that fit
List<Document> selectedDocs = manager.selectDocumentsForContext(documents);
System.out.println("Selected " + selectedDocs.size() + " out of " + documents.size());
// Check capacity
int capacity = manager.calculateCapacity(documents);
System.out.println("Can fit " + capacity + " documents");import org.springframework.ai.tokenizer.JTokkitTokenCountEstimator;
import org.springframework.ai.document.Document;
import com.knuddels.jtokkit.api.EncodingType;
import java.util.List;
/**
* Estimate API costs based on token usage.
*/
class CostEstimator {
private final JTokkitTokenCountEstimator estimator;
// Example pricing (USD per 1M tokens)
private static final double GPT4_INPUT_COST = 30.0;
private static final double GPT4_OUTPUT_COST = 60.0;
private static final double GPT35_INPUT_COST = 0.5;
private static final double GPT35_OUTPUT_COST = 1.5;
private static final double EMBEDDING_COST = 0.13;
public CostEstimator() {
this.estimator = new JTokkitTokenCountEstimator(EncodingType.CL100K_BASE);
}
/**
* Estimate cost for chat completion.
*/
public double estimateChatCost(String prompt, int estimatedResponseTokens,
String model) {
int promptTokens = estimator.estimate(prompt);
double inputCost = (model.contains("gpt-4")) ? GPT4_INPUT_COST : GPT35_INPUT_COST;
double outputCost = (model.contains("gpt-4")) ? GPT4_OUTPUT_COST : GPT35_OUTPUT_COST;
double cost = (promptTokens / 1_000_000.0) * inputCost +
(estimatedResponseTokens / 1_000_000.0) * outputCost;
return cost;
}
/**
* Estimate cost for embedding documents.
*/
public double estimateEmbeddingCost(List<Document> documents) {
int totalTokens = 0;
for (Document doc : documents) {
totalTokens += estimator.estimate(doc.getText());
}
return (totalTokens / 1_000_000.0) * EMBEDDING_COST;
}
/**
* Estimate batch processing cost.
*/
public BatchCostEstimate estimateBatchCost(List<String> prompts,
int avgResponseTokens,
String model) {
int totalInputTokens = 0;
for (String prompt : prompts) {
totalInputTokens += estimator.estimate(prompt);
}
int totalOutputTokens = prompts.size() * avgResponseTokens;
double inputCost = (model.contains("gpt-4")) ? GPT4_INPUT_COST : GPT35_INPUT_COST;
double outputCost = (model.contains("gpt-4")) ? GPT4_OUTPUT_COST : GPT35_OUTPUT_COST;
double totalCost = (totalInputTokens / 1_000_000.0) * inputCost +
(totalOutputTokens / 1_000_000.0) * outputCost;
return new BatchCostEstimate(
prompts.size(),
totalInputTokens,
totalOutputTokens,
totalCost
);
}
record BatchCostEstimate(
int requests,
int inputTokens,
int outputTokens,
double costUSD
) {
@Override
public String toString() {
return String.format(
"Requests: %d | Input: %,d tokens | Output: %,d tokens | Cost: $%.4f",
requests, inputTokens, outputTokens, costUSD
);
}
}
}
// Usage
CostEstimator costEstimator = new CostEstimator();
// Estimate single chat
String prompt = "Explain quantum computing in simple terms";
double chatCost = costEstimator.estimateChatCost(prompt, 500, "gpt-4");
System.out.printf("Chat cost: $%.6f%n", chatCost);
// Estimate embedding cost
List<Document> docs = List.of(/* documents to embed */);
double embeddingCost = costEstimator.estimateEmbeddingCost(docs);
System.out.printf("Embedding cost: $%.4f%n", embeddingCost);
// Estimate batch cost
List<String> prompts = List.of(/* multiple prompts */);
CostEstimator.BatchCostEstimate batchEstimate =
costEstimator.estimateBatchCost(prompts, 300, "gpt-3.5-turbo");
System.out.println(batchEstimate);

import org.springframework.ai.tokenizer.JTokkitTokenCountEstimator;
import org.springframework.ai.document.Document;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import com.knuddels.jtokkit.api.EncodingType;
import java.util.List;
/**
* Select optimal chunk size based on document characteristics.
*/
class ChunkSizeOptimizer {
private final JTokkitTokenCountEstimator estimator;
public ChunkSizeOptimizer() {
this.estimator = new JTokkitTokenCountEstimator(EncodingType.CL100K_BASE);
}
/**
* Recommend chunk size based on document and model constraints.
*/
public int recommendChunkSize(Document document,
int embeddingModelLimit,
int targetChunksPerDoc) {
int totalTokens = estimator.estimate(document.getText());
// Calculate size based on desired number of chunks
int sizeFromChunkCount = totalTokens / targetChunksPerDoc;
// Use smaller of model limit or calculated size
int recommendedSize = Math.min(embeddingModelLimit, sizeFromChunkCount);
// Ensure minimum viable chunk size
int minChunkSize = 100;
recommendedSize = Math.max(minChunkSize, recommendedSize);
return recommendedSize;
}
/**
* Analyze document and recommend splitting strategy.
*/
public SplittingStrategy analyzeSplittingStrategy(Document document,
int embeddingModelLimit) {
int totalTokens = estimator.estimate(document.getText());
if (totalTokens <= embeddingModelLimit) {
return new SplittingStrategy(false, embeddingModelLimit, 1, totalTokens);
}
// Calculate optimal chunk size and expected number of chunks
int optimalChunkSize = embeddingModelLimit - 50; // Small buffer
int expectedChunks = (int) Math.ceil((double) totalTokens / optimalChunkSize);
return new SplittingStrategy(true, optimalChunkSize, expectedChunks, totalTokens);
}
record SplittingStrategy(
boolean needsSplitting,
int recommendedChunkSize,
int expectedChunks,
int totalTokens
) {
@Override
public String toString() {
if (!needsSplitting) {
return String.format("No splitting needed (%,d tokens)", totalTokens);
}
return String.format(
"Split into ~%d chunks of %,d tokens each (total: %,d tokens)",
expectedChunks, recommendedChunkSize, totalTokens
);
}
}
}
// Usage
ChunkSizeOptimizer optimizer = new ChunkSizeOptimizer();
Document doc = new Document("Very long document content...");
// Get splitting recommendation
ChunkSizeOptimizer.SplittingStrategy strategy =
optimizer.analyzeSplittingStrategy(doc, 8191); // OpenAI embedding limit
System.out.println(strategy);
if (strategy.needsSplitting()) {
// Create splitter with recommended size
TokenTextSplitter splitter = TokenTextSplitter.builder()
.withChunkSize(strategy.recommendedChunkSize())
.build();
List<Document> chunks = splitter.split(doc);
System.out.println("Created " + chunks.size() + " chunks");
}

import org.springframework.ai.tokenizer.JTokkitTokenCountEstimator;
import com.knuddels.jtokkit.api.EncodingType;
/**
* Allocate token budget across prompt components.
*/
class TokenBudgetAllocator {
private final JTokkitTokenCountEstimator estimator;
public TokenBudgetAllocator() {
this.estimator = new JTokkitTokenCountEstimator(EncodingType.CL100K_BASE);
}
/**
* Calculate token allocation for RAG prompt.
*/
public TokenAllocation allocateForRAG(int contextWindow,
String systemPrompt,
String userQuery,
int expectedResponseTokens) {
int systemTokens = estimator.estimate(systemPrompt);
int queryTokens = estimator.estimate(userQuery);
int availableForContext = contextWindow - systemTokens - queryTokens
- expectedResponseTokens - 50; // Safety buffer
return new TokenAllocation(
contextWindow,
systemTokens,
queryTokens,
availableForContext,
expectedResponseTokens
);
}
record TokenAllocation(
int totalContext,
int systemPrompt,
int userQuery,
int availableForDocuments,
int reservedForResponse
) {
public int usedTokens() {
return systemPrompt + userQuery + reservedForResponse;
}
public double contextUtilization() {
return (double) usedTokens() / totalContext;
}
@Override
public String toString() {
return String.format("""
Token Budget Allocation:
- Total context window: %,d tokens
- System prompt: %,d tokens
- User query: %,d tokens
- Available for documents: %,d tokens
- Reserved for response: %,d tokens
- Utilization: %.1f%%
""",
totalContext, systemPrompt, userQuery, availableForDocuments,
reservedForResponse, contextUtilization() * 100
);
}
}
}
// Usage
TokenBudgetAllocator allocator = new TokenBudgetAllocator();
String systemPrompt = "You are a helpful AI assistant specializing in technical documentation.";
String userQuery = "Explain how to configure Spring AI with OpenAI";
TokenBudgetAllocator.TokenAllocation allocation = allocator.allocateForRAG(
4096, // GPT-3.5-turbo context
systemPrompt,
userQuery,
500 // Expected response size
);
System.out.println(allocation);
// Use available tokens for document selection
System.out.println("Can include " + allocation.availableForDocuments() + " tokens of context");import org.springframework.ai.tokenizer.JTokkitTokenCountEstimator;
import org.springframework.ai.document.Document;
import com.knuddels.jtokkit.api.EncodingType;
import java.util.List;
import java.util.DoubleSummaryStatistics;
/**
* Collect token statistics for document analysis.
*/
class TokenStatisticsCollector {
private final JTokkitTokenCountEstimator estimator;
public TokenStatisticsCollector() {
this.estimator = new JTokkitTokenCountEstimator(EncodingType.CL100K_BASE);
}
/**
* Collect comprehensive token statistics.
*/
public TokenStatistics analyze(List<Document> documents) {
DoubleSummaryStatistics stats = documents.stream()
.mapToDouble(doc -> estimator.estimate(doc.getText()))
.summaryStatistics();
return new TokenStatistics(
documents.size(),
(int) stats.getSum(),
(int) stats.getMin(),
(int) stats.getMax(),
stats.getAverage()
);
}
record TokenStatistics(
int documentCount,
int totalTokens,
int minTokens,
int maxTokens,
double avgTokens
) {
public String formatSummary() {
return String.format("""
Token Statistics:
- Documents: %,d
- Total tokens: %,d
- Average: %.1f tokens/doc
- Range: %,d - %,d tokens
""",
documentCount, totalTokens, avgTokens, minTokens, maxTokens
);
}
public boolean isUniform() {
// Check if document sizes are relatively uniform
return (maxTokens - minTokens) < (avgTokens * 0.5);
}
}
}
// Usage
TokenStatisticsCollector collector = new TokenStatisticsCollector();
List<Document> documents = List.of(
new Document("Short doc"),
new Document("Medium length document with more content"),
new Document("Long document with substantial content that spans multiple paragraphs")
);
TokenStatisticsCollector.TokenStatistics stats = collector.analyze(documents);
System.out.println(stats.formatSummary());
if (stats.isUniform()) {
System.out.println("Documents have uniform size - optimal for batching");
} else {
System.out.println("Documents have variable size - consider adaptive batching");
}

JTokkitTokenCountEstimator is optimized for performance:
import org.springframework.ai.tokenizer.JTokkitTokenCountEstimator;
import com.knuddels.jtokkit.api.EncodingType;
import java.util.List;
// Reuse estimator instance (encoding is cached)
JTokkitTokenCountEstimator estimator = new JTokkitTokenCountEstimator(
EncodingType.CL100K_BASE
);
// Efficient for repeated estimates
for (int i = 0; i < 10000; i++) {
String text = "Text to estimate " + i;
int tokens = estimator.estimate(text);
}
// Avoid creating new estimators repeatedly
// BAD: new JTokkitTokenCountEstimator() in loop
// GOOD: Create once, reuse

Thread Safety:
JTokkitTokenCountEstimator is thread-safe; the underlying encoding registry is cached, so a single instance can be shared across threads.

Common Exceptions:
NullPointerException: if text or content is null
IllegalArgumentException: if the encoding type is invalid
RuntimeException: encoding errors (very rare)

Edge Cases:
// Empty string
int tokens = estimator.estimate(""); // Returns 0
// Null text
try {
int tokens = estimator.estimate((String) null); // Throws NullPointerException
} catch (NullPointerException e) {
// Handle null input
}
// Very long text
String huge = "...10MB of text...";
int tokens = estimator.estimate(huge); // Works but may take time
// Special characters and emojis
int tokens = estimator.estimate("Hello 👋 World 🌍");
// Emojis may count as multiple tokens
// Different encodings produce different counts
JTokkitTokenCountEstimator cl100k = new JTokkitTokenCountEstimator(EncodingType.CL100K_BASE);
JTokkitTokenCountEstimator p50k = new JTokkitTokenCountEstimator(EncodingType.P50K_BASE);
String text = "Same text";
int count1 = cl100k.estimate(text); // May differ from count2
int count2 = p50k.estimate(text);

Choosing the Right Encoding:
Important: Always match the encoding to your target model for accurate token counts.
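One way to keep that rule enforceable in code is a small model-name-to-encoding lookup. The mapping below follows OpenAI's published encoding assignments (O200K_BASE for gpt-4o models, CL100K_BASE for GPT-3.5/GPT-4 and text-embedding models, P50K_BASE and R50K_BASE for older completion models); the EncodingSelector class and encodingFor method are illustrative helpers, not part of Spring AI or JTokkit. JTokkit itself also offers model-based lookup via its EncodingRegistry.

```java
import java.util.Map;

public class EncodingSelector {

    // Illustrative mapping from model-name prefix to JTokkit EncodingType name,
    // based on OpenAI's published encoding assignments. Extend as new models ship.
    private static final Map<String, String> ENCODING_BY_PREFIX = Map.of(
            "gpt-4o", "O200K_BASE",
            "gpt-4", "CL100K_BASE",
            "gpt-3.5-turbo", "CL100K_BASE",
            "text-embedding", "CL100K_BASE",
            "text-davinci-003", "P50K_BASE",
            "davinci", "R50K_BASE"
    );

    static String encodingFor(String model) {
        // Longest-prefix match so "gpt-4o" wins over the shorter "gpt-4"
        return ENCODING_BY_PREFIX.entrySet().stream()
                .filter(e -> model.startsWith(e.getKey()))
                .max((a, b) -> Integer.compare(a.getKey().length(), b.getKey().length()))
                .map(Map.Entry::getValue)
                .orElse("CL100K_BASE"); // reasonable default for recent models
    }

    public static void main(String[] args) {
        System.out.println(encodingFor("gpt-4o-mini"));
        System.out.println(encodingFor("gpt-3.5-turbo-16k"));
    }
}
```

The returned name can then be mapped to the corresponding EncodingType constant when constructing a JTokkitTokenCountEstimator.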
Install with Tessl CLI
npx tessl i tessl/maven-org-springframework-ai--spring-ai-commons