tessl/maven-org-springframework-ai--spring-ai-commons

Common classes used across Spring AI, providing document processing, text transformation, embedding utilities, observability support, and tokenization capabilities for AI application development

docs/reference/tokenization.md

Tokenization

Tokenization provides token counting and estimation capabilities for text and media content using the JTokkit library.

Overview

The tokenization layer consists of:

  • TokenCountEstimator - Interface for estimating token counts
  • JTokkitTokenCountEstimator - JTokkit-based implementation supporting multiple encoding types

Token counting is essential for managing context windows, batching documents for embedding, and estimating API costs.

TokenCountEstimator Interface

Interface for estimating token count in text or messages.

package org.springframework.ai.tokenizer;

import org.springframework.ai.content.MediaContent;

interface TokenCountEstimator {
    /**
     * Estimate token count in text.
     * @param text text to count tokens for
     * @return estimated token count
     */
    int estimate(String text);

    /**
     * Estimate token count in media content.
     * Includes text, MIME type, and base64 encoded media data.
     * @param content media content to count tokens for
     * @return estimated token count
     */
    int estimate(MediaContent content);

    /**
     * Estimate token count in multiple contents.
     * Sum of tokens across all contents.
     * @param messages iterable of media contents
     * @return total estimated token count
     */
    int estimate(Iterable<MediaContent> messages);
}

Usage

import org.springframework.ai.tokenizer.TokenCountEstimator;
import org.springframework.ai.tokenizer.JTokkitTokenCountEstimator;
import org.springframework.ai.content.MediaContent;
import java.util.List;

// Create token estimator
TokenCountEstimator estimator = new JTokkitTokenCountEstimator();

// Estimate tokens in text
String text = "Artificial Intelligence is transforming how we build applications.";
int tokenCount = estimator.estimate(text);
System.out.println("Token count: " + tokenCount);

// Estimate for longer text
String longText = """
    Spring AI provides a high-level abstraction for integrating
    AI capabilities into Spring applications. It supports multiple
    AI model providers and offers consistent APIs across different
    backends.
    """;
int longTokenCount = estimator.estimate(longText);
System.out.println("Long text tokens: " + longTokenCount);

// Estimate for media content (requires MediaContent implementation)
// MediaContent message = ...;
// int messageTokens = estimator.estimate(message);

// Estimate for multiple messages
// List<MediaContent> messages = List.of(...);
// int totalTokens = estimator.estimate(messages);

JTokkitTokenCountEstimator

Token counter backed by the JTokkit library, a Java implementation of OpenAI's tiktoken tokenizer.

package org.springframework.ai.tokenizer;

import org.springframework.ai.content.MediaContent;
import com.knuddels.jtokkit.api.EncodingType;

class JTokkitTokenCountEstimator implements TokenCountEstimator {
    /**
     * Create with default encoding (CL100K_BASE).
     * Used by GPT-3.5-turbo and GPT-4.
     */
    JTokkitTokenCountEstimator();

    /**
     * Create with specific encoding type.
     * @param tokenEncodingType JTokkit encoding type
     */
    JTokkitTokenCountEstimator(EncodingType tokenEncodingType);

    /**
     * Estimate token count in text.
     * @param text text to count tokens for
     * @return token count
     */
    int estimate(String text);

    /**
     * Estimate token count in media content.
     * Includes text content, MIME type, and base64 encoded data.
     * @param content media content to estimate
     * @return token count
     */
    int estimate(MediaContent content);

    /**
     * Estimate token count across multiple contents.
     * @param contents iterable of media contents
     * @return total token count
     */
    int estimate(Iterable<MediaContent> contents);
}

Encoding Types

JTokkit supports multiple encoding types corresponding to different OpenAI models:

  • CL100K_BASE (default): GPT-3.5-turbo, GPT-4, text-embedding-3-small/large
  • P50K_BASE: Code models, text-davinci-002/003
  • P50K_EDIT: Edit models
  • R50K_BASE: GPT-3 models (davinci, curie, babbage, ada)

Usage Examples

import org.springframework.ai.tokenizer.JTokkitTokenCountEstimator;
import com.knuddels.jtokkit.api.EncodingType;

// Default encoding (CL100K_BASE for GPT-3.5/GPT-4)
JTokkitTokenCountEstimator defaultEstimator = new JTokkitTokenCountEstimator();

String text = "Hello, how are you doing today?";
int tokens = defaultEstimator.estimate(text);
System.out.println("Tokens (CL100K_BASE): " + tokens);

// Specific encoding for different models
JTokkitTokenCountEstimator gpt4Estimator = new JTokkitTokenCountEstimator(
    EncodingType.CL100K_BASE
);

JTokkitTokenCountEstimator codeEstimator = new JTokkitTokenCountEstimator(
    EncodingType.P50K_BASE
);

JTokkitTokenCountEstimator gpt3Estimator = new JTokkitTokenCountEstimator(
    EncodingType.R50K_BASE
);

String code = "function hello() { return 'world'; }";
int codeTokens = codeEstimator.estimate(code);
System.out.println("Code tokens: " + codeTokens);

// Different encodings may produce different counts
String sample = "Spring AI simplifies AI integration";
System.out.println("CL100K_BASE: " + gpt4Estimator.estimate(sample));
System.out.println("P50K_BASE: " + codeEstimator.estimate(sample));
System.out.println("R50K_BASE: " + gpt3Estimator.estimate(sample));

Token Estimation for Documents

import org.springframework.ai.tokenizer.JTokkitTokenCountEstimator;
import org.springframework.ai.document.Document;
import com.knuddels.jtokkit.api.EncodingType;
import java.util.List;

// Create estimator
JTokkitTokenCountEstimator estimator = new JTokkitTokenCountEstimator(
    EncodingType.CL100K_BASE
);

// Estimate single document
Document doc = new Document("This is a document with some content to analyze.");
int docTokens = estimator.estimate(doc.getText());
System.out.println("Document tokens: " + docTokens);

// Estimate multiple documents
List<Document> documents = List.of(
    new Document("First document content"),
    new Document("Second document with more content"),
    new Document("Third document with even more content to process")
);

int totalTokens = 0;
for (Document document : documents) {
    int tokens = estimator.estimate(document.getText());
    totalTokens += tokens;
    System.out.println("Doc " + document.getId() + ": " + tokens + " tokens");
}

System.out.println("Total tokens: " + totalTokens);

// Check if document fits in context window
int contextWindowSize = 4096;  // Example: GPT-3.5-turbo
int promptTokens = 100;
int responseTokens = 500;
int availableForContext = contextWindowSize - promptTokens - responseTokens;

Document largeDoc = new Document("Very large document content...");
int largeDocTokens = estimator.estimate(largeDoc.getText());

if (largeDocTokens <= availableForContext) {
    System.out.println("Document fits in context window");
} else {
    System.out.println("Document needs to be split: " + largeDocTokens + " > " + availableForContext);
}

Practical Use Cases

Context Window Management

import org.springframework.ai.tokenizer.JTokkitTokenCountEstimator;
import org.springframework.ai.document.Document;
import com.knuddels.jtokkit.api.EncodingType;
import java.util.ArrayList;
import java.util.List;

/**
 * Manage documents within model context window limits.
 */
class ContextWindowManager {
    private final JTokkitTokenCountEstimator estimator;
    private final int contextWindowSize;
    private final int reserveTokens;

    public ContextWindowManager(int contextWindowSize, int reserveTokens) {
        this.estimator = new JTokkitTokenCountEstimator(EncodingType.CL100K_BASE);
        this.contextWindowSize = contextWindowSize;
        this.reserveTokens = reserveTokens;
    }

    /**
     * Select documents that fit within context window.
     */
    public List<Document> selectDocumentsForContext(List<Document> documents) {
        List<Document> selected = new ArrayList<>();
        int availableTokens = contextWindowSize - reserveTokens;
        int usedTokens = 0;

        for (Document doc : documents) {
            int docTokens = estimator.estimate(doc.getText());

            if (usedTokens + docTokens <= availableTokens) {
                selected.add(doc);
                usedTokens += docTokens;
            } else {
                break;  // Stop when we can't fit more
            }
        }

        return selected;
    }

    /**
     * Calculate how many documents can fit.
     */
    public int calculateCapacity(List<Document> documents) {
        int availableTokens = contextWindowSize - reserveTokens;
        int usedTokens = 0;
        int count = 0;

        for (Document doc : documents) {
            int docTokens = estimator.estimate(doc.getText());

            if (usedTokens + docTokens <= availableTokens) {
                usedTokens += docTokens;
                count++;
            } else {
                break;
            }
        }

        return count;
    }
}

// Usage
ContextWindowManager manager = new ContextWindowManager(
    4096,  // GPT-3.5-turbo context window
    600    // Reserve for prompt + response
);

List<Document> documents = List.of(/* many documents */);

// Get documents that fit
List<Document> selectedDocs = manager.selectDocumentsForContext(documents);
System.out.println("Selected " + selectedDocs.size() + " out of " + documents.size());

// Check capacity
int capacity = manager.calculateCapacity(documents);
System.out.println("Can fit " + capacity + " documents");

Cost Estimation

import org.springframework.ai.tokenizer.JTokkitTokenCountEstimator;
import org.springframework.ai.document.Document;
import com.knuddels.jtokkit.api.EncodingType;
import java.util.List;

/**
 * Estimate API costs based on token usage.
 */
class CostEstimator {
    private final JTokkitTokenCountEstimator estimator;

    // Example pricing (USD per 1M tokens)
    private static final double GPT4_INPUT_COST = 30.0;
    private static final double GPT4_OUTPUT_COST = 60.0;
    private static final double GPT35_INPUT_COST = 0.5;
    private static final double GPT35_OUTPUT_COST = 1.5;
    private static final double EMBEDDING_COST = 0.13;

    public CostEstimator() {
        this.estimator = new JTokkitTokenCountEstimator(EncodingType.CL100K_BASE);
    }

    /**
     * Estimate cost for chat completion.
     */
    public double estimateChatCost(String prompt, int estimatedResponseTokens,
                                    String model) {
        int promptTokens = estimator.estimate(prompt);

        double inputCost = (model.contains("gpt-4")) ? GPT4_INPUT_COST : GPT35_INPUT_COST;
        double outputCost = (model.contains("gpt-4")) ? GPT4_OUTPUT_COST : GPT35_OUTPUT_COST;

        double cost = (promptTokens / 1_000_000.0) * inputCost +
                      (estimatedResponseTokens / 1_000_000.0) * outputCost;

        return cost;
    }

    /**
     * Estimate cost for embedding documents.
     */
    public double estimateEmbeddingCost(List<Document> documents) {
        int totalTokens = 0;

        for (Document doc : documents) {
            totalTokens += estimator.estimate(doc.getText());
        }

        return (totalTokens / 1_000_000.0) * EMBEDDING_COST;
    }

    /**
     * Estimate batch processing cost.
     */
    public BatchCostEstimate estimateBatchCost(List<String> prompts,
                                                int avgResponseTokens,
                                                String model) {
        int totalInputTokens = 0;

        for (String prompt : prompts) {
            totalInputTokens += estimator.estimate(prompt);
        }

        int totalOutputTokens = prompts.size() * avgResponseTokens;

        double inputCost = (model.contains("gpt-4")) ? GPT4_INPUT_COST : GPT35_INPUT_COST;
        double outputCost = (model.contains("gpt-4")) ? GPT4_OUTPUT_COST : GPT35_OUTPUT_COST;

        double totalCost = (totalInputTokens / 1_000_000.0) * inputCost +
                           (totalOutputTokens / 1_000_000.0) * outputCost;

        return new BatchCostEstimate(
            prompts.size(),
            totalInputTokens,
            totalOutputTokens,
            totalCost
        );
    }

    record BatchCostEstimate(
        int requests,
        int inputTokens,
        int outputTokens,
        double costUSD
    ) {
        @Override
        public String toString() {
            return String.format(
                "Requests: %d | Input: %,d tokens | Output: %,d tokens | Cost: $%.4f",
                requests, inputTokens, outputTokens, costUSD
            );
        }
    }
}

// Usage
CostEstimator costEstimator = new CostEstimator();

// Estimate single chat
String prompt = "Explain quantum computing in simple terms";
double chatCost = costEstimator.estimateChatCost(prompt, 500, "gpt-4");
System.out.printf("Chat cost: $%.6f%n", chatCost);

// Estimate embedding cost
List<Document> docs = List.of(/* documents to embed */);
double embeddingCost = costEstimator.estimateEmbeddingCost(docs);
System.out.printf("Embedding cost: $%.4f%n", embeddingCost);

// Estimate batch cost
List<String> prompts = List.of(/* multiple prompts */);
CostEstimator.BatchCostEstimate batchEstimate =
    costEstimator.estimateBatchCost(prompts, 300, "gpt-3.5-turbo");
System.out.println(batchEstimate);

Optimal Chunk Size Selection

import org.springframework.ai.tokenizer.JTokkitTokenCountEstimator;
import org.springframework.ai.document.Document;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import com.knuddels.jtokkit.api.EncodingType;
import java.util.List;

/**
 * Select optimal chunk size based on document characteristics.
 */
class ChunkSizeOptimizer {
    private final JTokkitTokenCountEstimator estimator;

    public ChunkSizeOptimizer() {
        this.estimator = new JTokkitTokenCountEstimator(EncodingType.CL100K_BASE);
    }

    /**
     * Recommend chunk size based on document and model constraints.
     */
    public int recommendChunkSize(Document document,
                                   int embeddingModelLimit,
                                   int targetChunksPerDoc) {
        int totalTokens = estimator.estimate(document.getText());

        // Calculate size based on desired number of chunks
        int sizeFromChunkCount = totalTokens / targetChunksPerDoc;

        // Use smaller of model limit or calculated size
        int recommendedSize = Math.min(embeddingModelLimit, sizeFromChunkCount);

        // Ensure minimum viable chunk size
        int minChunkSize = 100;
        recommendedSize = Math.max(minChunkSize, recommendedSize);

        return recommendedSize;
    }

    /**
     * Analyze document and recommend splitting strategy.
     */
    public SplittingStrategy analyzeSplittingStrategy(Document document,
                                                       int embeddingModelLimit) {
        int totalTokens = estimator.estimate(document.getText());

        if (totalTokens <= embeddingModelLimit) {
            return new SplittingStrategy(false, embeddingModelLimit, 1, totalTokens);
        }

        // Calculate optimal chunk size and expected number of chunks
        int optimalChunkSize = embeddingModelLimit - 50;  // Small buffer
        int expectedChunks = (int) Math.ceil((double) totalTokens / optimalChunkSize);

        return new SplittingStrategy(true, optimalChunkSize, expectedChunks, totalTokens);
    }

    record SplittingStrategy(
        boolean needsSplitting,
        int recommendedChunkSize,
        int expectedChunks,
        int totalTokens
    ) {
        @Override
        public String toString() {
            if (!needsSplitting) {
                return String.format("No splitting needed (%,d tokens)", totalTokens);
            }
            return String.format(
                "Split into ~%d chunks of %,d tokens each (total: %,d tokens)",
                expectedChunks, recommendedChunkSize, totalTokens
            );
        }
    }
}

// Usage
ChunkSizeOptimizer optimizer = new ChunkSizeOptimizer();

Document doc = new Document("Very long document content...");

// Get splitting recommendation
ChunkSizeOptimizer.SplittingStrategy strategy =
    optimizer.analyzeSplittingStrategy(doc, 8191);  // OpenAI embedding limit

System.out.println(strategy);

if (strategy.needsSplitting()) {
    // Create splitter with recommended size
    TokenTextSplitter splitter = TokenTextSplitter.builder()
        .withChunkSize(strategy.recommendedChunkSize())
        .build();

    List<Document> chunks = splitter.split(doc);
    System.out.println("Created " + chunks.size() + " chunks");
}

Token Budget Allocation

import org.springframework.ai.tokenizer.JTokkitTokenCountEstimator;
import com.knuddels.jtokkit.api.EncodingType;

/**
 * Allocate token budget across prompt components.
 */
class TokenBudgetAllocator {
    private final JTokkitTokenCountEstimator estimator;

    public TokenBudgetAllocator() {
        this.estimator = new JTokkitTokenCountEstimator(EncodingType.CL100K_BASE);
    }

    /**
     * Calculate token allocation for RAG prompt.
     */
    public TokenAllocation allocateForRAG(int contextWindow,
                                          String systemPrompt,
                                          String userQuery,
                                          int expectedResponseTokens) {
        int systemTokens = estimator.estimate(systemPrompt);
        int queryTokens = estimator.estimate(userQuery);

        int availableForContext = contextWindow - systemTokens - queryTokens
                                  - expectedResponseTokens - 50;  // Safety buffer

        return new TokenAllocation(
            contextWindow,
            systemTokens,
            queryTokens,
            availableForContext,
            expectedResponseTokens
        );
    }

    record TokenAllocation(
        int totalContext,
        int systemPrompt,
        int userQuery,
        int availableForDocuments,
        int reservedForResponse
    ) {
        public int usedTokens() {
            return systemPrompt + userQuery + reservedForResponse;
        }

        public double contextUtilization() {
            return (double) usedTokens() / totalContext;
        }

        @Override
        public String toString() {
            return String.format("""
                Token Budget Allocation:
                - Total context window: %,d tokens
                - System prompt: %,d tokens
                - User query: %,d tokens
                - Available for documents: %,d tokens
                - Reserved for response: %,d tokens
                - Utilization: %.1f%%
                """,
                totalContext, systemPrompt, userQuery, availableForDocuments,
                reservedForResponse, contextUtilization() * 100
            );
        }
    }
}

// Usage
TokenBudgetAllocator allocator = new TokenBudgetAllocator();

String systemPrompt = "You are a helpful AI assistant specializing in technical documentation.";
String userQuery = "Explain how to configure Spring AI with OpenAI";

TokenBudgetAllocator.TokenAllocation allocation = allocator.allocateForRAG(
    4096,              // GPT-3.5-turbo context
    systemPrompt,
    userQuery,
    500                // Expected response size
);

System.out.println(allocation);

// Use available tokens for document selection
System.out.println("Can include " + allocation.availableForDocuments() + " tokens of context");

Token Statistics Collection

import org.springframework.ai.tokenizer.JTokkitTokenCountEstimator;
import org.springframework.ai.document.Document;
import com.knuddels.jtokkit.api.EncodingType;
import java.util.List;
import java.util.DoubleSummaryStatistics;

/**
 * Collect token statistics for document analysis.
 */
class TokenStatisticsCollector {
    private final JTokkitTokenCountEstimator estimator;

    public TokenStatisticsCollector() {
        this.estimator = new JTokkitTokenCountEstimator(EncodingType.CL100K_BASE);
    }

    /**
     * Collect comprehensive token statistics.
     */
    public TokenStatistics analyze(List<Document> documents) {
        DoubleSummaryStatistics stats = documents.stream()
            .mapToDouble(doc -> estimator.estimate(doc.getText()))
            .summaryStatistics();

        return new TokenStatistics(
            documents.size(),
            (int) stats.getSum(),
            (int) stats.getMin(),
            (int) stats.getMax(),
            stats.getAverage()
        );
    }

    record TokenStatistics(
        int documentCount,
        int totalTokens,
        int minTokens,
        int maxTokens,
        double avgTokens
    ) {
        public String formatSummary() {
            return String.format("""
                Token Statistics:
                - Documents: %,d
                - Total tokens: %,d
                - Average: %.1f tokens/doc
                - Range: %,d - %,d tokens
                """,
                documentCount, totalTokens, avgTokens, minTokens, maxTokens
            );
        }

        public boolean isUniform() {
            // Check if document sizes are relatively uniform
            return (maxTokens - minTokens) < (avgTokens * 0.5);
        }
    }
}

// Usage
TokenStatisticsCollector collector = new TokenStatisticsCollector();

List<Document> documents = List.of(
    new Document("Short doc"),
    new Document("Medium length document with more content"),
    new Document("Long document with substantial content that spans multiple paragraphs")
);

TokenStatisticsCollector.TokenStatistics stats = collector.analyze(documents);
System.out.println(stats.formatSummary());

if (stats.isUniform()) {
    System.out.println("Documents have uniform size - optimal for batching");
} else {
    System.out.println("Documents have variable size - consider adaptive batching");
}

Performance Considerations

JTokkitTokenCountEstimator is optimized for performance:

import org.springframework.ai.tokenizer.JTokkitTokenCountEstimator;
import com.knuddels.jtokkit.api.EncodingType;
import java.util.List;

// Reuse estimator instance (encoding is cached)
JTokkitTokenCountEstimator estimator = new JTokkitTokenCountEstimator(
    EncodingType.CL100K_BASE
);

// Efficient for repeated estimates
for (int i = 0; i < 10000; i++) {
    String text = "Text to estimate " + i;
    int tokens = estimator.estimate(text);
}

// Avoid creating new estimators repeatedly
// BAD: new JTokkitTokenCountEstimator() in loop
// GOOD: Create once, reuse

Thread Safety and Performance

Thread Safety:

  • JTokkitTokenCountEstimator: Thread-safe with cached encoding registry
  • Can be safely shared across threads and reused
  • Encoding registry is lazily initialized and cached

Performance:

  • Token counting: O(n) where n is text length
  • Encoding is cached after first use (fast subsequent calls)
  • Memory: Minimal overhead, encoding registry is shared
  • Optimization: Reuse estimator instances (avoid creating new instances repeatedly)

Performance Benchmarks (approximate):

  • ~1M tokens/second for typical text
  • First call may be slower (encoding initialization)
  • Subsequent calls are very fast (cached encoding)
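Because the estimator is thread-safe, one instance can serve parallel workloads. A minimal sketch of that pattern (the `ParallelTokenCounting` helper is illustrative, not part of Spring AI; a stand-in word counter keeps the example self-contained — in real use, pass `estimator::estimate` as the counter):

```java
import java.util.List;
import java.util.function.ToIntFunction;

class ParallelTokenCounting {

    /**
     * Sum token counts across texts in parallel using one shared counter.
     * Sharing is safe for JTokkitTokenCountEstimator because it is
     * thread-safe: pass estimator::estimate as the counter.
     */
    static int totalTokensParallel(List<String> texts, ToIntFunction<String> counter) {
        return texts.parallelStream()
                .mapToInt(counter)
                .sum();
    }

    public static void main(String[] args) {
        // Stand-in counter (whitespace word count) so the sketch runs without JTokkit;
        // in real use: ToIntFunction<String> counter = estimator::estimate;
        ToIntFunction<String> counter = s -> s.isBlank() ? 0 : s.trim().split("\\s+").length;

        int total = totalTokensParallel(
            List.of("Spring AI", "token counting", "in parallel"), counter);
        System.out.println("Total: " + total);  // 6
    }
}
```

The same shape works with an `ExecutorService`; the key point is constructing the estimator once and sharing it, not per-task.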

Error Handling

Common Exceptions:

  • NullPointerException: If text or content is null
  • IllegalArgumentException: If encoding type is invalid
  • RuntimeException: Encoding errors (very rare)

Edge Cases:

// Empty string
int tokens = estimator.estimate("");  // Returns 0

// Null text
try {
    int tokens = estimator.estimate((String) null);  // Throws NullPointerException
} catch (NullPointerException e) {
    // Handle null input
}

// Very long text
String huge = "...10MB of text...";
int tokens = estimator.estimate(huge);  // Works but may take time

// Special characters and emojis
int tokens = estimator.estimate("Hello 👋 World 🌍");
// Emojis may count as multiple tokens

// Different encodings produce different counts
JTokkitTokenCountEstimator cl100k = new JTokkitTokenCountEstimator(EncodingType.CL100K_BASE);
JTokkitTokenCountEstimator p50k = new JTokkitTokenCountEstimator(EncodingType.P50K_BASE);
String text = "Same text";
int count1 = cl100k.estimate(text);  // May differ from count2
int count2 = p50k.estimate(text);

Encoding Type Selection

Important: Always match the encoding to your target model for accurate token counts. The model-to-encoding pairs are listed under Encoding Types above.
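One way to enforce this rule is a small model-name lookup. A hedged sketch (the `EncodingSelector` helper is hypothetical, not part of Spring AI or JTokkit; the pairs mirror the encoding list in this doc, and the returned name can be passed to `EncodingType.valueOf` when constructing the estimator):

```java
class EncodingSelector {

    /**
     * Map an OpenAI model name to its JTokkit encoding name.
     * Pairs follow the Encoding Types list in this doc; use the result as
     * EncodingType.valueOf(encodingFor(model)) for the estimator constructor.
     */
    static String encodingFor(String model) {
        if (model.startsWith("gpt-4") || model.startsWith("gpt-3.5")
                || model.startsWith("text-embedding-3")) {
            return "CL100K_BASE";
        }
        if (model.startsWith("text-davinci-002") || model.startsWith("text-davinci-003")) {
            return "P50K_BASE";
        }
        if (model.contains("edit")) {
            return "P50K_EDIT";
        }
        // GPT-3 base models: davinci, curie, babbage, ada
        return "R50K_BASE";
    }

    public static void main(String[] args) {
        System.out.println(encodingFor("gpt-4"));            // CL100K_BASE
        System.out.println(encodingFor("text-davinci-003")); // P50K_BASE
        System.out.println(encodingFor("ada"));              // R50K_BASE
    }
}
```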

Best Practices

  1. Reuse Estimator Instances: Create once, reuse for efficiency
  2. Match Model Encoding: Use correct encoding type for your AI model
  3. Validate Before API Calls: Check token counts to avoid exceeding limits
  4. Budget Token Allocation: Reserve tokens for prompts and responses
  5. Monitor Token Usage: Track tokens for cost estimation
  6. Handle Nulls: Always check for null inputs
  7. Cache Results: For static text, cache token counts
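Practices 1 and 7 can be combined in a small memoizing wrapper (the `CachingTokenCounter` class is illustrative, not part of Spring AI; in real use, pass `estimator::estimate` as the delegate):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.ToIntFunction;

class CachingTokenCounter {

    private final ToIntFunction<String> delegate;
    private final Map<String, Integer> cache = new ConcurrentHashMap<>();

    CachingTokenCounter(ToIntFunction<String> delegate) {
        this.delegate = delegate;
    }

    /** Count tokens, computing at most once per distinct text. */
    int estimate(String text) {
        return cache.computeIfAbsent(text, delegate::applyAsInt);
    }

    public static void main(String[] args) {
        // Stand-in counter so the sketch runs without JTokkit;
        // wrap estimator::estimate as the delegate in real use.
        int[] calls = {0};
        CachingTokenCounter counter = new CachingTokenCounter(s -> {
            calls[0]++;
            return s.length();
        });

        counter.estimate("static system prompt");
        counter.estimate("static system prompt");  // served from cache
        System.out.println("Delegate calls: " + calls[0]);  // 1
    }
}
```

This pays off for static text such as system prompts that are counted on every request; for unbounded or one-off inputs, an eviction policy would be needed.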

Related Documentation

  • Document Model - Document structure
  • Text Splitting - Using token counts for splitting
  • Embedding - Token-based batching strategies
  • Content Formatting - Formatting affects token count

Install with Tessl CLI

npx tessl i tessl/maven-org-springframework-ai--spring-ai-commons@1.1.0
