Spring AI's Spring Boot auto-configuration modules provide automatic setup for AI models, vector stores, MCP, and retry capabilities. This section covers advanced scenarios, edge cases, and solutions for common issues.
When working with models that support large context windows (100K+ tokens), standard chunking strategies may not be optimal: the document can often be sent whole instead of being split.
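The chunking fallback itself is plain Java and independent of Spring AI. A minimal sketch of a character-window splitter with overlap (`DocumentChunker`, the window size, and the overlap are illustrative assumptions, not Spring AI APIs):

```java
import java.util.ArrayList;
import java.util.List;

public class DocumentChunker {

    /**
     * Splits text into windows of at most chunkSize characters, where each
     * window overlaps the previous one by overlap characters so sentences
     * cut at a boundary still appear whole somewhere.
     */
    public static List<String> chunk(String text, int chunkSize, int overlap) {
        if (chunkSize <= overlap) {
            throw new IllegalArgumentException("chunkSize must exceed overlap");
        }
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // last window reached the end of the text
            }
        }
        return chunks;
    }
}
```

In practice the overlap is sized to a sentence or two so that context straddling a boundary is not lost.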
@Service
public class LargeContextHandler {

    private final ChatModel chatModel;
    private final int maxContextTokens;

    public LargeContextHandler(ChatModel chatModel) {
        this.chatModel = chatModel;
        this.maxContextTokens = 100_000; // e.g. for Claude or GPT-4 Turbo
    }

    public String processLargeDocument(String document) {
        // For large-context models, send the entire document instead of chunking
        if (estimateTokens(document) <= maxContextTokens) {
            return chatModel.call("Analyze this entire document: " + document);
        }
        // Fall back to chunking if the document is still too large
        return processInChunks(document);
    }

    private String processInChunks(String document) {
        // Split the document, process each chunk, and merge the partial results
        throw new UnsupportedOperationException("chunking not shown");
    }

    private int estimateTokens(String text) {
        // Rough estimate for English text: 1 token ≈ 4 characters
        return text.length() / 4;
    }
}

When hitting rate limits across all retry attempts.
Rate limits exhausted, causing service disruption.
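Before resorting to queueing, a client-side retry with exponential backoff and jitter is a common first step (Spring AI's built-in retry support can also be tuned via `spring.ai.retry.*` properties). A minimal plain-Java sketch; `BackoffRetry` is a hypothetical helper, and the delays are illustrative:

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;

public class BackoffRetry {

    /**
     * Calls the supplier up to maxAttempts times, sleeping with exponential
     * backoff plus full jitter between failures. Rethrows the last failure
     * if every attempt fails.
     */
    public static <T> T withBackoff(Supplier<T> call, int maxAttempts,
                                    long baseDelayMs, long maxDelayMs)
            throws InterruptedException {
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e;
                // Exponential growth (2^attempt), capped at maxDelayMs
                long exp = Math.min(maxDelayMs, baseDelayMs << attempt);
                // Full jitter: sleep a random duration in [0, exp]
                Thread.sleep(ThreadLocalRandom.current().nextLong(exp + 1));
            }
        }
        throw last;
    }
}
```

Jitter matters under provider rate limits: without it, many clients retry in lockstep and hit the limit again simultaneously.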
@Service
public class RateLimitedChatService {

    private record PendingRequest(String message, CompletableFuture<String> future) {}

    private final ChatModel chatModel;
    private final RateLimiter rateLimiter; // Guava RateLimiter
    private final BlockingQueue<PendingRequest> requestQueue;

    public RateLimitedChatService(ChatModel chatModel) {
        this.chatModel = chatModel;
        this.rateLimiter = RateLimiter.create(10.0); // 10 requests/second
        this.requestQueue = new LinkedBlockingQueue<>();
        startQueueProcessor();
    }

    public CompletableFuture<String> chat(String message) {
        // Enqueue only; the background processor sends the request when the
        // rate limiter allows, so callers never block
        CompletableFuture<String> future = new CompletableFuture<>();
        requestQueue.offer(new PendingRequest(message, future));
        return future;
    }

    private void startQueueProcessor() {
        // Single background thread drains the queue at the permitted rate
        Executors.newSingleThreadExecutor().submit(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    PendingRequest request = requestQueue.take(); // blocks until a request arrives
                    rateLimiter.acquire(); // blocks until a permit is available
                    try {
                        request.future().complete(chatModel.call(request.message()));
                    } catch (Exception e) {
                        request.future().completeExceptionally(e);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
    }
}

When conversation history exceeds model token limits.
Conversation history grows too large, causing token limit errors.
@Service
public class TokenAwareChatService {

    private final ChatClient chatClient;
    private final ChatMemory chatMemory;
    private final int maxTokens = 8000;

    public TokenAwareChatService(ChatClient chatClient, ChatMemory chatMemory) {
        this.chatClient = chatClient;
        this.chatMemory = chatMemory;
    }

    public String chat(String sessionId, String message) {
        List<Message> history = chatMemory.get(sessionId);
        // Trim the history so the prompt plus the response fits the limit
        List<Message> trimmedHistory = trimToTokenLimit(
            history,
            maxTokens - estimateTokens(message) - 1000 // reserve ~1000 tokens for the response
        );
        // Build the prompt with the trimmed history
        String response = chatClient.prompt()
            .messages(trimmedHistory)
            .user(message)
            .call()
            .content();
        chatMemory.add(sessionId, new UserMessage(message));
        chatMemory.add(sessionId, new AssistantMessage(response));
        return response;
    }

    private List<Message> trimToTokenLimit(List<Message> messages, int limit) {
        List<Message> trimmed = new ArrayList<>();
        int tokens = 0;
        // Walk backwards, keeping the most recent messages that fit the limit
        for (int i = messages.size() - 1; i >= 0; i--) {
            Message msg = messages.get(i);
            int msgTokens = estimateTokens(msg.getContent());
            if (tokens + msgTokens > limit) {
                break;
            }
            trimmed.add(0, msg);
            tokens += msgTokens;
        }
        return trimmed;
    }

    private int estimateTokens(String text) {
        // Rough estimate: 1 token ≈ 4 characters
        return text.length() / 4;
    }
}

When AI provider changes API without notice.
Provider API returns unexpected format, causing parsing errors.
@Service
public class ResilientChatService {

    private static final Logger log = LoggerFactory.getLogger(ResilientChatService.class);

    private final ChatModel chatModel;
    private final AlertingService alerting; // your monitoring/alerting integration

    public ResilientChatService(ChatModel chatModel, AlertingService alerting) {
        this.chatModel = chatModel;
        this.alerting = alerting;
    }

    public String chat(String message) {
        try {
            return chatModel.call(message);
        } catch (Exception e) {
            // JsonProcessingException is checked and surfaces wrapped in a
            // runtime exception, so inspect the cause rather than catching it directly
            if (e.getCause() instanceof JsonProcessingException) {
                log.error("API response format changed: {}", e.getMessage());
                // Fallback: return a graceful error instead of propagating
                return handleApiFormatChange(message, e);
            }
            log.error("Unexpected error: {}", e.getMessage());
            throw e;
        }
    }

    private String handleApiFormatChange(String message, Exception e) {
        // Log for investigation
        log.error("API format issue", e);
        // Notify the monitoring system
        alerting.sendAlert("API format changed", e);
        // Return a graceful error to the caller
        return "I'm experiencing technical difficulties. Please try again later.";
    }
}

When switching embedding models with different dimensions.
Existing vectors have different dimensions than new embedding model.
@Service
public class EmbeddingMigrationService {

    private final EmbeddingModel newModel;
    private final VectorStore vectorStore;

    public EmbeddingMigrationService(EmbeddingModel newModel, VectorStore vectorStore) {
        this.newModel = newModel;
        this.vectorStore = vectorStore;
    }

    public void migrateEmbeddings() {
        // 1. Fetch all documents. An empty query with a huge topK is a
        //    workaround that not every store supports; prefer re-reading the
        //    source data when possible.
        List<Document> allDocs = vectorStore.similaritySearch(
            SearchRequest.query("").withTopK(Integer.MAX_VALUE)
        );
        // 2. Delete the old vectors (consider writing to a new collection
        //    first so you can roll back if re-embedding fails)
        vectorStore.delete(
            allDocs.stream()
                .map(Document::getId)
                .collect(Collectors.toList())
        );
        // 3. Re-embed with the new model
        List<String> texts = allDocs.stream()
            .map(Document::getContent)
            .collect(Collectors.toList());
        EmbeddingResponse newEmbeddings = newModel.embedForResponse(texts);
        // 4. Create new documents carrying over the original metadata
        List<Document> newDocs = new ArrayList<>();
        for (int i = 0; i < allDocs.size(); i++) {
            Document oldDoc = allDocs.get(i);
            Document newDoc = new Document(oldDoc.getContent());
            newDoc.getMetadata().putAll(oldDoc.getMetadata());
            newDoc.setEmbedding(newEmbeddings.getResults().get(i).getOutput());
            newDocs.add(newDoc);
        }
        // 5. Add the re-embedded documents back to the vector store
        vectorStore.add(newDocs);
    }
}

When multiple services update vector store simultaneously.
Race conditions in vector store operations.
@Service
public class ConcurrentSafeVectorStore {

    private final VectorStore vectorStore;
    // Note: a ReentrantLock only serializes access within a single JVM.
    // Across multiple service instances, use database-level locking instead.
    private final Lock lock = new ReentrantLock();

    public ConcurrentSafeVectorStore(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    public void safeAdd(List<Document> documents) {
        lock.lock();
        try {
            // Check for duplicates before adding
            List<Document> uniqueDocs = documents.stream()
                .filter(doc -> !exists(doc.getId()))
                .collect(Collectors.toList());
            if (!uniqueDocs.isEmpty()) {
                vectorStore.add(uniqueDocs);
            }
        } finally {
            lock.unlock();
        }
    }

    public void safeDelete(List<String> ids) {
        lock.lock();
        try {
            vectorStore.delete(ids);
        } finally {
            lock.unlock();
        }
    }

    private boolean exists(String id) {
        try {
            List<Document> results = vectorStore.similaritySearch(
                SearchRequest.query("")
                    .withFilterExpression("id == '" + id + "'")
                    .withTopK(1)
            );
            return !results.isEmpty();
        } catch (Exception e) {
            // Treat lookup failures as "not present" so adds are not blocked;
            // log the failure in production rather than swallowing it silently
            return false;
        }
    }
}

When MCP server process crashes during operation.
Stdio MCP server process terminates unexpectedly.
@Service
public class ResilientMcpClient {

    private static final Logger log = LoggerFactory.getLogger(ResilientMcpClient.class);

    private final List<McpSyncClient> clients;
    private final Map<String, Process> processes;

    public ResilientMcpClient(List<McpSyncClient> clients, Map<String, Process> processes) {
        this.clients = clients;
        this.processes = processes;
    }

    @Scheduled(fixedRate = 30000) // check every 30 seconds
    public void healthCheck() {
        for (McpSyncClient client : clients) {
            try {
                client.ping();
            } catch (Exception e) {
                log.error("MCP client {} unhealthy, attempting restart",
                    client.getName());
                restartClient(client);
            }
        }
    }

    private void restartClient(McpSyncClient client) {
        try {
            // Close the existing client
            client.close();
            // Kill the old server process (for stdio transport)
            Process oldProcess = processes.get(client.getName());
            if (oldProcess != null && oldProcess.isAlive()) {
                oldProcess.destroy();
            }
            // Recreate the client (implementation depends on transport type)
            // ...
        } catch (Exception e) {
            log.error("Failed to restart MCP client", e);
        }
    }
}

When custom classes aren't accessible in native image.
Reflection errors in GraalVM native image for custom tool types.
@Configuration
@ImportRuntimeHints(NativeImageConfiguration.CustomRuntimeHints.class)
public class NativeImageConfiguration {

    // RuntimeHintsRegistrar implementations are registered via
    // @ImportRuntimeHints (or aot.factories), not as @Bean methods
    static class CustomRuntimeHints implements RuntimeHintsRegistrar {

        @Override
        public void registerHints(RuntimeHints hints, ClassLoader classLoader) {
            // Register custom tool classes for reflection
            hints.reflection()
                .registerType(MyCustomTool.class,
                    MemberCategory.INVOKE_PUBLIC_METHODS,
                    MemberCategory.INVOKE_PUBLIC_CONSTRUCTORS)
                .registerType(MyCustomToolRequest.class,
                    MemberCategory.INVOKE_PUBLIC_METHODS)
                .registerType(MyCustomToolResponse.class,
                    MemberCategory.INVOKE_PUBLIC_METHODS);
            // Register custom resources
            hints.resources()
                .registerPattern("custom-prompts/*.txt")
                .registerPattern("mcp-configs/*.json");
            // Register for Java serialization
            hints.serialization()
                .registerType(MyCustomTool.class)
                .registerType(MyCustomToolRequest.class)
                .registerType(MyCustomToolResponse.class);
        }
    }
}

Alternatively, create META-INF/native-image/reflect-config.json:
[
  {
    "name": "com.example.MyCustomTool",
    "allDeclaredConstructors": true,
    "allPublicConstructors": true,
    "allDeclaredMethods": true,
    "allPublicMethods": true,
    "allDeclaredFields": true
  }
]

When conversation memory grows unbounded.
Memory usage increases indefinitely with long conversations.
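An alternative to bounding by message count and TTL is a size-bounded LRU store built on `LinkedHashMap`'s access order, which evicts the least recently used session automatically. A plain-Java sketch (the `LruSessionStore` name, session type, and capacity are illustrative):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Keeps at most maxSessions conversation histories, evicting the least recently used. */
public class LruSessionStore<M> {

    private final Map<String, List<M>> sessions;

    public LruSessionStore(int maxSessions) {
        // accessOrder = true orders entries "least recently accessed first",
        // and removeEldestEntry evicts once the capacity is exceeded
        this.sessions = new LinkedHashMap<String, List<M>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<M>> eldest) {
                return size() > maxSessions;
            }
        };
    }

    /** Returns the session's history, creating it (and refreshing its recency) on access. */
    public synchronized List<M> get(String sessionId) {
        return sessions.computeIfAbsent(sessionId, id -> new ArrayList<>());
    }
}
```

This caps the number of sessions rather than their age; combining it with a TTL sweep like the one below covers both dimensions.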
@Service
public class BoundedChatMemory {

    private final ChatMemory chatMemory;
    private final int maxMessagesPerSession = 100;
    private final Duration sessionTtl = Duration.ofHours(24);
    private final Map<String, Instant> sessionLastAccess;

    public BoundedChatMemory(ChatMemory chatMemory) {
        this.chatMemory = chatMemory;
        this.sessionLastAccess = new ConcurrentHashMap<>();
    }

    public void add(String sessionId, Message message) {
        List<Message> history = chatMemory.get(sessionId);
        // Cap the number of messages retained per session
        if (history.size() >= maxMessagesPerSession) {
            // Drop the oldest messages
            int toRemove = history.size() - maxMessagesPerSession + 1;
            // Copy before clearing: the list returned by get() may be a live view
            List<Message> retained =
                new ArrayList<>(history.subList(toRemove, history.size()));
            chatMemory.clear(sessionId);
            retained.forEach(msg -> chatMemory.add(sessionId, msg));
        }
        chatMemory.add(sessionId, message);
        sessionLastAccess.put(sessionId, Instant.now());
    }

    @Scheduled(fixedRate = 3600000) // every hour
    public void cleanupExpiredSessions() {
        Instant cutoff = Instant.now().minus(sessionTtl);
        sessionLastAccess.entrySet().removeIf(entry -> {
            if (entry.getValue().isBefore(cutoff)) {
                chatMemory.clear(entry.getKey());
                return true;
            }
            return false;
        });
    }
}

When vector store index becomes corrupted.
Vector store returns inconsistent or no results.
@Service
public class VectorStoreHealthCheck {

    private static final Logger log = LoggerFactory.getLogger(VectorStoreHealthCheck.class);

    private final VectorStore vectorStore;
    private final JdbcTemplate jdbcTemplate;
    private final AlertingService alerting; // your monitoring/alerting integration

    public VectorStoreHealthCheck(VectorStore vectorStore, JdbcTemplate jdbcTemplate,
            AlertingService alerting) {
        this.vectorStore = vectorStore;
        this.jdbcTemplate = jdbcTemplate;
        this.alerting = alerting;
    }

    @Scheduled(cron = "0 0 2 * * *") // daily at 2 AM
    public void healthCheck() {
        try {
            // Exercise the basic add / search / delete cycle
            String id = "health-check-" + UUID.randomUUID();
            Document testDoc = new Document(id, "test", Map.of());
            vectorStore.add(List.of(testDoc));
            List<Document> results = vectorStore.similaritySearch(
                SearchRequest.query("test").withTopK(1)
            );
            vectorStore.delete(List.of(id));
            if (results.isEmpty()) {
                log.error("Vector store health check failed: no results");
                alerting.sendAlert("Vector store health check failed");
            }
        } catch (Exception e) {
            log.error("Vector store health check error", e);
            alerting.sendAlert("Vector store error: " + e.getMessage());
        }
    }

    public void rebuildIndex() {
        // Implementation depends on the vector store type.
        // For PGVector, rebuild the index via SQL:
        jdbcTemplate.execute("REINDEX INDEX vector_index");
        // Other stores may require dropping and recreating the index,
        // or rebuilding it from the source data.
    }
}

Symptom: Expected beans not available in context.
Diagnosis:
# Enable autoconfiguration debug
logging.level.org.springframework.boot.autoconfigure=DEBUG

Solution: confirm the matching Spring AI starter dependency is on the classpath, then read the condition evaluation report printed at DEBUG level during startup to see which auto-configuration backed off and which condition failed.
Symptom: 401/403 errors despite correct API key.
Solutions:
# Verify API key format (no whitespace)
spring.ai.openai.api-key=${OPENAI_API_KEY:}
# Check for provider-specific requirements
spring.ai.azure.openai.endpoint=https://RESOURCE.openai.azure.com
spring.ai.azure.openai.api-key=KEY
# Enable request logging
logging.level.org.springframework.web.client.RestTemplate=DEBUG

Symptom: Schema creation errors on startup.
Solutions:
# Check database permissions
spring.ai.vectorstore.pgvector.initialize-schema=always
# Use platform-specific SQL
spring.ai.vectorstore.jdbc.platform=postgresql
spring.ai.vectorstore.jdbc.schema=classpath:custom-schema-postgresql.sql
# Disable if using migration tools
spring.ai.vectorstore.pgvector.initialize-schema=never

Problem: Embedding generation takes too long.
Solution:
@Service
public class BatchEmbeddingService {

    private final EmbeddingModel embeddingModel;

    public BatchEmbeddingService(EmbeddingModel embeddingModel) {
        this.embeddingModel = embeddingModel;
    }

    public List<List<Double>> embedBatch(List<String> texts) {
        // Embed the whole batch in one request instead of one text at a time
        EmbeddingResponse response = embeddingModel.embedForResponse(texts);
        return response.getResults().stream()
            .map(result -> result.getOutput())
            .collect(Collectors.toList());
    }
}

Problem: Vector search queries are slow.
Solutions:
# Use HNSW index for better performance
spring.ai.vectorstore.pgvector.index-type=HNSW
# Reduce dimensions if possible
spring.ai.vectorstore.milvus.embedding-dimension=768
# Also set a similarity threshold in code so only sufficiently close matches are returned:

SearchRequest.query(text)
    .withTopK(10)
    .withSimilarityThreshold(0.7)