AWS Bedrock integration for LangChain4j, enabling Java applications to interact with multiple LLM providers through a unified interface.
Automatic cache point placement for entirely static content.
BedrockCachePointPlacement inserts a cache point at a predefined location in the request. Use it when everything before that point is completely static.
```java
public enum BedrockCachePointPlacement {
    AFTER_SYSTEM,       // Cache after system messages
    AFTER_USER_MESSAGE, // Cache after first user message
    AFTER_TOOLS         // Cache after tool definitions
}
```

Important: this only works with the standard SystemMessage. For BedrockSystemMessage, use granular caching instead.
Cache the system message content.
```java
BedrockChatRequestParameters params = BedrockChatRequestParameters.builder()
        .promptCaching(BedrockCachePointPlacement.AFTER_SYSTEM)
        .build();

BedrockChatModel model = BedrockChatModel.builder()
        .modelId("anthropic.claude-3-5-sonnet-20241022-v2:0")
        .defaultRequestParameters(params)
        .build();

String systemPrompt = loadLargeStaticPrompt(); // >1024 tokens

ChatResponse response = model.chat(ChatRequest.builder()
        .messages(
                SystemMessage.from(systemPrompt),
                UserMessage.from("Question")
        )
        .build());
```

Use when: you have a large, static system prompt that doesn't change between requests.
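`loadLargeStaticPrompt()` is referenced but not defined above. A minimal sketch of such a helper, assuming the prompt lives in a file and using a rough 4-characters-per-token estimate (the file-loading approach, the heuristic, and all names here are illustrative assumptions, not part of the LangChain4j API):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class PromptLoader {

    // Bedrock prompt caching requires a minimum prompt size (on the order of
    // 1024 tokens for Claude 3.5 Sonnet); smaller prompts are not cached.
    static final int MIN_CACHEABLE_TOKENS = 1024;

    // Hypothetical helper: load a static system prompt from disk and
    // fail fast if it is too small to benefit from caching.
    static String loadLargeStaticPrompt(Path promptFile) throws IOException {
        String prompt = Files.readString(promptFile);
        if (estimateTokens(prompt) < MIN_CACHEABLE_TOKENS) {
            throw new IllegalStateException("Prompt too small to benefit from caching");
        }
        return prompt;
    }

    // Rough heuristic: ~4 characters per token for English text.
    static int estimateTokens(String text) {
        return text.length() / 4;
    }
}
```

The size check matters because a cache point placed on a below-minimum prompt is silently ignored, and you pay normal input-token rates with no cache benefit.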
Cache system message plus the first user message.
```java
BedrockChatRequestParameters params = BedrockChatRequestParameters.builder()
        .promptCaching(BedrockCachePointPlacement.AFTER_USER_MESSAGE)
        .build();

ChatResponse response = model.chat(ChatRequest.builder()
        .messages(
                SystemMessage.from("Large system context..."),
                UserMessage.from("Initial user context...") // Also cached
        )
        .build());
```

Use when: you have static system instructions plus an initial context message that's consistent across requests.
Cache tool definitions.
```java
import dev.langchain4j.agent.tool.Tool;
import dev.langchain4j.agent.tool.ToolSpecifications;

class Calculator {

    @Tool("Add two numbers")
    int add(int a, int b) { return a + b; }

    @Tool("Multiply two numbers")
    int multiply(int a, int b) { return a * b; }
}

BedrockChatRequestParameters params = BedrockChatRequestParameters.builder()
        .promptCaching(BedrockCachePointPlacement.AFTER_TOOLS)
        .build();

ChatResponse response = model.chat(ChatRequest.builder()
        .messages(UserMessage.from("What is 5 + 3?"))
        .toolSpecifications(ToolSpecifications.toolSpecificationsFrom(new Calculator())) // Tool defs cached
        .build());
```

Use when: you have static tool definitions that don't change between requests.
Only one placement can be active at a time. To cache multiple sections, use granular caching.
To verify that caching is working, compare token usage across consecutive requests:

```java
import dev.langchain4j.model.bedrock.BedrockTokenUsage;

// First request writes the cache
ChatResponse response1 = model.chat(request);
BedrockTokenUsage usage1 = (BedrockTokenUsage) response1.tokenUsage();
System.out.println("Cache write: " + usage1.cacheWriteInputTokens());

// Second request (within the 5-minute cache lifetime) reads from it
ChatResponse response2 = model.chat(request);
BedrockTokenUsage usage2 = (BedrockTokenUsage) response2.tokenUsage();
System.out.println("Cache read: " + usage2.cacheReadInputTokens());
```

Next: Granular Caching for mixed static/dynamic content.
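The raw counts above can be turned into a simple effectiveness metric. A sketch of such a helper, operating on plain token counts rather than a BedrockTokenUsage (the class, its methods, and the pricing multipliers are illustrative assumptions — check the Bedrock pricing for your model):

```java
class CacheMetrics {

    // Illustrative multipliers (assumption): cache writes are typically billed
    // at a premium over normal input tokens, cache reads at a steep discount.
    static final double WRITE_MULTIPLIER = 1.25;
    static final double READ_MULTIPLIER = 0.10;

    // Fraction of input tokens served from cache on this request.
    static double hitRate(long cacheReadTokens, long totalInputTokens) {
        if (totalInputTokens == 0) return 0.0;
        return (double) cacheReadTokens / totalInputTokens;
    }

    // Billed-token equivalent: uncached input counts at 1x, cache writes
    // at the write premium, cache reads at the discounted rate.
    static double billedTokenEquivalent(long uncached, long cacheWrite, long cacheRead) {
        return uncached + cacheWrite * WRITE_MULTIPLIER + cacheRead * READ_MULTIPLIER;
    }
}
```

For example, a second request with 100 uncached input tokens and 2000 cache-read tokens has a billed equivalent of 100 + 2000 × 0.10 = 300 tokens under these multipliers, versus 2100 with no caching.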