tessl/maven-org-springframework-ai--spring-ai-autoconfigure-model-chat-memory

Spring Boot auto-configuration for chat memory functionality in Spring AI applications

docs/reference/performance.md
Performance Considerations

Performance optimization guidance for Spring AI Chat Memory.

Memory Usage

InMemoryChatMemoryRepository

  • Memory Growth: Linear with number of conversations × messages per conversation
  • Calculation: Approximate memory per message = 1-5 KB (depending on content size)
  • Example: 1000 conversations × 20 messages × 2 KB = ~40 MB

Best Practice: Use persistent repository for production to avoid memory constraints.
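The estimate above is plain arithmetic; the 2 KB figure is an assumed average and real message sizes depend on content:

```java
// Rough heap estimate for InMemoryChatMemoryRepository.
// bytesPerMessage is an assumed average; measure your real payloads.
long conversations = 1_000;
long messagesPerConversation = 20;
long bytesPerMessage = 2 * 1024; // ~2 KB

long totalBytes = conversations * messagesPerConversation * bytesPerMessage;
System.out.println(totalBytes / (1024 * 1024) + " MB"); // prints "39 MB" (~40 MB)
```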

MessageWindowChatMemory

  • Window Size Impact: Directly affects memory per conversation
  • Recommendation:
    • Small (10-20): Quick interactions, minimal context
    • Medium (50-100): Standard conversations
    • Large (200+): Extended sessions, requires more memory
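As a sketch, the window size is set when building the `MessageWindowChatMemory` bean (builder method names as in the Spring AI 1.x API; verify against your version):

```java
import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.ai.chat.memory.InMemoryChatMemoryRepository;
import org.springframework.ai.chat.memory.MessageWindowChatMemory;

// Medium window (~50 messages) for standard conversations; swap the
// in-memory repository for a persistent implementation in production.
ChatMemory chatMemory = MessageWindowChatMemory.builder()
    .chatMemoryRepository(new InMemoryChatMemoryRepository())
    .maxMessages(50)
    .build();
```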

Database Performance

JDBC Repository

Optimization Tips:

  1. Index conversation_id column:

    CREATE INDEX idx_conversation_id ON chat_memory(conversation_id);
  2. Connection Pooling:

    spring.datasource.hikari.maximum-pool-size=20
    spring.datasource.hikari.minimum-idle=5
  3. Batch Operations: Use add(conversationId, List<Message>) instead of multiple single adds

  4. Partitioning: For very large datasets, consider table partitioning by conversation_id

Query Performance:

  • Single conversation retrieval: O(n) where n = messages in conversation
  • All conversation IDs: O(c) where c = total conversations
  • Delete conversation: O(n) where n = messages in conversation

MongoDB Repository

Optimization Tips:

  1. Enable Indexes:

    spring.ai.chat.memory.repository.mongo.create-indices=true
  2. TTL for Automatic Cleanup:

    spring.ai.chat.memory.repository.mongo.ttl=PT24H
  3. Write Concern: Adjust based on durability vs. performance needs

    mongoTemplate.setWriteConcern(WriteConcern.ACKNOWLEDGED);
  4. Read Preference: Use secondary reads for non-critical queries
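For tip 4, one way to route reads to secondaries is via Spring Data's `MongoTemplate` (a sketch; confirm the setter against your Spring Data MongoDB version, and note that secondary reads may return slightly stale data):

```java
import com.mongodb.ReadPreference;

// Prefer secondary members for reads; falls back to the primary if none are available.
mongoTemplate.setReadPreference(ReadPreference.secondaryPreferred());
```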

Performance Characteristics:

  • Insert: O(1)
  • Find by conversation ID: O(1) with index
  • TTL cleanup: Automatic background process

Cassandra Repository

Optimization Tips:

  1. Partition Key Design: conversation_id as partition key provides optimal distribution

  2. Consistency Level: Adjust based on requirements

    .withConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM)
  3. TTL at Table Level:

    spring.ai.chat.memory.repository.cassandra.time-to-live=P30D

Performance Characteristics:

  • Write throughput: Very high (distributed writes)
  • Read latency: Low (partition-local reads)
  • Scalability: Horizontal (add more nodes)

Neo4j Repository

Optimization Tips:

  1. Indexes: Automatically created on session labels
  2. Cypher Query Optimization: Use EXPLAIN to analyze queries
  3. Connection Pooling: Configure driver connection pool
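For tip 3, the driver pool can be tuned through the standard Spring Boot Neo4j properties, for example:

```properties
spring.neo4j.pool.max-connection-pool-size=50
spring.neo4j.pool.connection-acquisition-timeout=30s
```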

Performance Characteristics:

  • Graph traversal: Excellent for relationship queries
  • Simple CRUD: Moderate overhead compared to document stores
  • Best for: Complex relationship queries

Cosmos DB Repository

Optimization Tips:

  1. Partition Key: conversation_id provides good distribution
  2. Request Units (RUs): Monitor and adjust provisioned throughput
  3. Consistency Level: Choose based on requirements

    .consistencyLevel(ConsistencyLevel.SESSION)

Performance Characteristics:

  • Global distribution: Low latency worldwide
  • Scalability: Automatic (with cost implications)
  • SLA guarantees: 99.999% availability

Batch Operations

Efficient Message Addition

Bad Practice (Multiple network calls):

for (Message message : messages) {
    chatMemory.add(conversationId, message);  // N network calls
}

Good Practice (Single network call):

chatMemory.add(conversationId, messages);  // 1 network call

Performance Impact: 10-100x faster for batch operations

Concurrent Access

Thread Safety

All repository implementations should be thread-safe:

// Safe for concurrent access
conversationIds.parallelStream()
    .forEach(id -> chatMemory.add(id, new UserMessage("test")));

Lock Contention

For high-concurrency scenarios, consider conversation-level locking:

private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

public void addWithLock(String conversationId, Message message) {
    // Note: the lock map grows with the number of conversations; evict entries
    // for finished conversations if conversation IDs are unbounded.
    ReentrantLock lock = locks.computeIfAbsent(conversationId, k -> new ReentrantLock());
    lock.lock();
    try {
        chatMemory.add(conversationId, message);
    } finally {
        lock.unlock();
    }
}

Cleanup Strategies

Periodic Cleanup

@Scheduled(cron = "0 0 2 * * *")  // Daily at 2 AM
public void cleanupOldConversations() {
    List<String> ids = repository.findConversationIds();
    Instant cutoff = Instant.now().minus(Duration.ofDays(30));

    // isOlderThan is application-specific, e.g. comparing a stored
    // last-updated timestamp for the conversation against the cutoff.
    ids.parallelStream()
        .filter(id -> isOlderThan(id, cutoff))
        .forEach(repository::deleteByConversationId);
}

TTL-Based Cleanup

MongoDB:

spring.ai.chat.memory.repository.mongo.ttl=P7D

Cassandra:

spring.ai.chat.memory.repository.cassandra.time-to-live=P30D

Caching Strategies

Application-Level Caching

@Service
public class CachedChatMemoryService {

    private final ChatMemory chatMemory;

    private final LoadingCache<String, List<Message>> cache;

    public CachedChatMemoryService(ChatMemory chatMemory) {
        this.chatMemory = chatMemory;
        this.cache = Caffeine.newBuilder()
            .maximumSize(1000)
            .expireAfterWrite(Duration.ofMinutes(10))
            .build(this.chatMemory::get);
    }

    public List<Message> getCached(String conversationId) {
        return cache.get(conversationId);
    }

    public void invalidate(String conversationId) {
        cache.invalidate(conversationId);
    }
}

When to Use:

  • Read-heavy workloads
  • Frequently accessed conversations
  • Acceptable staleness window

When NOT to Use:

  • Write-heavy workloads
  • Real-time requirements
  • Multiple application instances (cache coherence issues)

Monitoring

Key Metrics to Track

  1. Conversation Count: repository.findConversationIds().size()
  2. Average Messages per Conversation: Monitor memory usage
  3. Repository Response Time: Track query latency
  4. Error Rate: Failed operations
  5. Memory Usage: Heap usage for in-memory repository

Spring Boot Actuator

@Component
public class ChatMemoryMetrics {

    private final ChatMemoryRepository repository;
    private final AtomicInteger conversationCount = new AtomicInteger();

    public ChatMemoryMetrics(ChatMemoryRepository repository, MeterRegistry meterRegistry) {
        this.repository = repository;
        // Register the gauge once; Micrometer reads the AtomicInteger on each scrape.
        // (Re-calling meterRegistry.gauge(...) with a boxed int on every tick would
        // keep reporting the first value, since the meter is only registered once.)
        meterRegistry.gauge("chat.memory.conversations", conversationCount);
    }

    @Scheduled(fixedRate = 60000)
    public void recordMetrics() {
        conversationCount.set(repository.findConversationIds().size());
    }
}

Benchmarks

Typical Performance (approximate)

| Operation | In-Memory | JDBC | MongoDB | Cassandra | Neo4j | Cosmos DB |
|---|---|---|---|---|---|---|
| Add Message | < 1 ms | 5-10 ms | 5-10 ms | 2-5 ms | 10-20 ms | 10-30 ms |
| Get Messages | < 1 ms | 5-15 ms | 5-10 ms | 5-10 ms | 15-30 ms | 10-30 ms |
| Clear Conversation | < 1 ms | 10-20 ms | 5-10 ms | 5-10 ms | 20-40 ms | 10-30 ms |

Note: Actual performance varies based on:

  • Network latency
  • Database load
  • Message size
  • Number of messages
  • Hardware specifications

Optimization Checklist

  • Use persistent repository for production
  • Set appropriate window size for use case
  • Enable database indexes
  • Configure connection pooling
  • Use batch operations where possible
  • Implement TTL or periodic cleanup
  • Monitor key metrics
  • Test under expected load
  • Consider caching for read-heavy workloads
  • Profile and optimize hot paths

Next Steps

  • Troubleshooting - Common issues and solutions
  • Configuration Guide - Detailed configuration
  • API Reference - Complete API documentation