tessl/maven-dev-langchain4j--langchain4j-bedrock

AWS Bedrock integration for LangChain4j enabling Java applications to interact with various LLM providers through a unified interface

Prompt Caching

AWS Bedrock prompt caching reduces latency and cost by letting the model reuse a previously processed prompt prefix instead of reprocessing it on every request.

Benefits

  • Cost savings: Cache reads are billed at about 10% of the regular input-token price (a 90% discount)
  • Latency reduction: Cached content is not reprocessed, so responses start sooner
  • Easy integration: Two approaches, simple placement and granular control

AWS Requirements

  • Minimum tokens: about 1,024 tokens must precede a cache point; shorter prefixes are not cached (the exact minimum may vary by model)
  • Cache TTL: 5 minutes, reset on each cache hit
  • Maximum cache points: 4 per request
  • Supported models: Claude 3.x and Amazon Nova only
  • Cost model: Cache writes cost the normal input price plus a write surcharge; cache reads cost 10% of the normal input price
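
To make the cost model concrete, the sketch below compares the input cost of resending the same prefix with and without caching. The 10% read multiplier comes from the list above. The 25% write surcharge and the per-token price are illustrative assumptions, not AWS pricing, so substitute the figures for your model.

```java
/** Back-of-the-envelope estimate of input-token cost with and without prompt caching. */
final class CacheCostEstimate {

    // From the requirements above: cache reads cost 10% of the normal input price.
    static final double READ_MULTIPLIER = 0.10;

    // ASSUMPTION: cache writes carry a 25% surcharge. The real surcharge depends on the model.
    static final double WRITE_MULTIPLIER = 1.25;

    /** Cost of sending the same prefix on every request with no caching. */
    static double uncachedCost(int prefixTokens, int requests, double pricePerToken) {
        return (double) prefixTokens * requests * pricePerToken;
    }

    /** Cost when the first request writes the cache and every later request hits it. */
    static double cachedCost(int prefixTokens, int requests, double pricePerToken) {
        if (requests <= 0) {
            return 0.0;
        }
        double write = prefixTokens * pricePerToken * WRITE_MULTIPLIER;
        double reads = (double) prefixTokens * (requests - 1) * pricePerToken * READ_MULTIPLIER;
        return write + reads;
    }

    public static void main(String[] args) {
        int prefixTokens = 2_000;        // hypothetical system prompt size
        int requests = 10;               // assumed to all arrive within the TTL
        double pricePerToken = 0.000003; // hypothetical price, not an AWS quote

        System.out.printf("uncached: $%.6f%n", uncachedCost(prefixTokens, requests, pricePerToken));
        System.out.printf("cached:   $%.6f%n", cachedCost(prefixTokens, requests, pricePerToken));
    }
}
```

Under these assumptions a single request costs more with caching than without, so caching pays off only when the prefix is reused before the TTL expires.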

Two Approaches

1. Simple Placement (BedrockCachePointPlacement)

Automatic cache point insertion at predefined locations. Best for entirely static content.

public enum BedrockCachePointPlacement {
    AFTER_SYSTEM,        // After system messages
    AFTER_USER_MESSAGE,  // After first user message
    AFTER_TOOLS          // After tool definitions
}

Use when:

  • All cached content is completely static
  • You use standard SystemMessage instances
  • You want simple, automatic configuration

→ Simple Caching Guide

2. Granular Control (BedrockSystemMessage)

Fine-grained control over which content blocks are cached. Best for mixed static/dynamic content.

public class BedrockSystemMessage implements ChatMessage {
    public static final int MAX_CONTENT_BLOCKS = 10;
    public static final int MAX_CACHE_POINTS = 4;

    public List<BedrockSystemContent> contents();
    public boolean hasCachePoints();
    public int cachePointCount();

    public static Builder builder();
    public static BedrockSystemMessage from(String text);
}
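
The two constants bound what a single system message may contain. As a rough illustration of those limits, the sketch below checks a list of content blocks against them. `Block` and `validate` are hypothetical stand-ins written for this example and are not part of langchain4j; how the real builder reports violations is not covered in this overview.

```java
import java.util.List;

final class SystemMessageLimits {
    // Values copied from the BedrockSystemMessage constants above.
    static final int MAX_CONTENT_BLOCKS = 10;
    static final int MAX_CACHE_POINTS = 4;

    /** Hypothetical stand-in for one content block: text plus an optional trailing cache point. */
    record Block(String text, boolean cachePoint) {}

    /** Returns null when the blocks fit both limits, otherwise a description of the problem. */
    static String validate(List<Block> blocks) {
        if (blocks.size() > MAX_CONTENT_BLOCKS) {
            return "too many content blocks: " + blocks.size();
        }
        long cachePoints = blocks.stream().filter(Block::cachePoint).count();
        if (cachePoints > MAX_CACHE_POINTS) {
            return "too many cache points: " + cachePoints;
        }
        return null;
    }
}
```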

Use when:

  • You mix static (cacheable) and dynamic (non-cacheable) content
  • You need fine-grained control over cache boundaries
  • You want to minimize cache invalidation
  • You have multiple static sections separated by dynamic content

→ Granular Caching Guide

Quick Example: Simple Caching

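import dev.langchain4j.data.message.SystemMessage;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.model.bedrock.BedrockCachePointPlacement;
import dev.langchain4j.model.bedrock.BedrockChatModel;
import dev.langchain4j.model.bedrock.BedrockChatRequestParameters;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.response.ChatResponse;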

BedrockChatRequestParameters params = BedrockChatRequestParameters.builder()
    .promptCaching(BedrockCachePointPlacement.AFTER_SYSTEM)
    .build();

BedrockChatModel model = BedrockChatModel.builder()
    .modelId("anthropic.claude-3-5-sonnet-20241022-v2:0")
    .defaultRequestParameters(params)
    .build();

// Large system prompt (>1024 tokens); loadLargePrompt() is a placeholder for your own loader
String systemPrompt = loadLargePrompt();

// First request: cache write
ChatResponse response1 = model.chat(ChatRequest.builder()
    .messages(SystemMessage.from(systemPrompt), UserMessage.from("Question 1"))
    .build());

// Second request: cache hit (within 5 minutes)
ChatResponse response2 = model.chat(ChatRequest.builder()
    .messages(SystemMessage.from(systemPrompt), UserMessage.from("Question 2"))
    .build());
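
The second request only hits the cache if it arrives before the 5-minute TTL lapses, and each hit restarts that window. A client-side sketch of this sliding window follows. `CacheWindow` is a hypothetical helper, not part of langchain4j, and it only predicts hits; the token usage returned by Bedrock remains the actual signal.

```java
import java.time.Duration;
import java.time.Instant;

/** Tracks when the cached prefix was last used and predicts whether the next request will hit. */
final class CacheWindow {
    // TTL taken from the AWS requirements above.
    static final Duration TTL = Duration.ofMinutes(5);

    private Instant lastUse; // null until the first request is recorded

    /** Records a request at the given time and returns true if it fell inside the TTL window. */
    boolean recordRequest(Instant now) {
        boolean expectedHit = lastUse != null && !now.isAfter(lastUse.plus(TTL));
        lastUse = now; // a write or a hit both restart the window
        return expectedHit;
    }
}
```

Passing the timestamp in explicitly, rather than calling `Instant.now()` inside the method, keeps the helper easy to test.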

Quick Example: Granular Caching

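import java.time.LocalDate;

import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.model.bedrock.BedrockSystemMessage;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.response.ChatResponse;

// Mix static and dynamic content; the load*() helpers are placeholders for your own code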
BedrockSystemMessage message = BedrockSystemMessage.builder()
    .addTextWithCachePoint(loadStaticKnowledgeBase())  // Static: cache it
    .addText("Current date: " + LocalDate.now())       // Dynamic: don't cache
    .addTextWithCachePoint(loadStaticInstructions())   // Static: cache it
    .build();

ChatResponse response = model.chat(ChatRequest.builder()
    .messages(message, UserMessage.from("Question"))
    .build());
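
Bedrock matches cached content by prompt prefix: a cache point covers everything that precedes it, not only the block it is attached to. The sketch below models that with plain strings to show why block order matters. `checkpointPrefixes` and the `[CACHE]` marker are hypothetical names for this illustration and are not langchain4j API.

```java
import java.util.ArrayList;
import java.util.List;

final class PrefixCacheModel {
    /** Marker used in this sketch to flag a block that ends with a cache point. */
    static final String CACHE_MARK = "[CACHE]";

    /** Returns, for each cache point, the full text prefix that a cache entry would cover. */
    static List<String> checkpointPrefixes(List<String> blocks) {
        List<String> prefixes = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        for (String block : blocks) {
            boolean cached = block.endsWith(CACHE_MARK);
            String text = cached ? block.substring(0, block.length() - CACHE_MARK.length()) : block;
            prefix.append(text).append('\n');
            if (cached) {
                prefixes.add(prefix.toString());
            }
        }
        return prefixes;
    }

    public static void main(String[] args) {
        List<String> monday = checkpointPrefixes(
            List.of("KB" + CACHE_MARK, "date: Mon", "RULES" + CACHE_MARK));
        List<String> tuesday = checkpointPrefixes(
            List.of("KB" + CACHE_MARK, "date: Tue", "RULES" + CACHE_MARK));

        // The first cache point precedes the dynamic block, so its prefix is unchanged.
        System.out.println(monday.get(0).equals(tuesday.get(0))); // prints true
        // The second cache point follows the dynamic block, so its prefix differs.
        System.out.println(monday.get(1).equals(tuesday.get(1))); // prints false
    }
}
```

Under this model, the second cache point in the example above would be re-written whenever the date changes, while the first stays reusable. Placing rarely changing blocks first and dynamic blocks last keeps the most cache points valid.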

Monitoring Cache Usage

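import dev.langchain4j.model.bedrock.BedrockTokenUsage;
import dev.langchain4j.model.chat.response.ChatResponse;

// `model` and `request` are built as in the examples above
ChatResponse response = model.chat(request);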

if (response.tokenUsage() instanceof BedrockTokenUsage usage) {
    Integer cacheWrite = usage.cacheWriteInputTokens();
    Integer cacheRead = usage.cacheReadInputTokens();

    if (cacheWrite != null) {
        System.out.println("Cache write: " + cacheWrite + " tokens");
    }
    if (cacheRead != null) {
        System.out.println("Cache hit: " + cacheRead + " tokens");
    }
}
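
To judge whether caching is paying off across many requests, the per-response counts can be accumulated. The sketch below is one possible way to do that. `CacheUsageTotals` is a hypothetical helper, not part of langchain4j, and the "read share" it reports is an ad hoc metric chosen for this example.

```java
/** Accumulates cache write/read token counts across responses. */
final class CacheUsageTotals {
    private long writeTokens;
    private long readTokens;

    /** Adds one response's counts; null (no cache activity reported) counts as zero. */
    void record(Integer cacheWrite, Integer cacheRead) {
        writeTokens += cacheWrite == null ? 0 : cacheWrite;
        readTokens += cacheRead == null ? 0 : cacheRead;
    }

    /** Share of cache-related tokens that were reads, in [0, 1]; 0 when nothing was recorded. */
    double readShare() {
        long total = writeTokens + readTokens;
        return total == 0 ? 0.0 : (double) readTokens / total;
    }
}
```

Inside the `instanceof` branch above, this would be fed with `totals.record(usage.cacheWriteInputTokens(), usage.cacheReadInputTokens())`. A read share that stays near zero suggests the prefix is changing between requests or the TTL is expiring before reuse.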
