tessl/maven-dev-langchain4j--langchain4j-anthropic

This package provides an integration layer between the LangChain4j framework and Anthropic's Claude language models, enabling Java developers to seamlessly incorporate Anthropic's AI capabilities into their applications.


docs/response-metadata.md

Response Metadata

The langchain4j-anthropic integration provides detailed response metadata including token usage with caching information and raw HTTP responses.

Capabilities

AnthropicChatResponseMetadata

Response metadata with Anthropic-specific information.

package dev.langchain4j.model.anthropic;

import dev.langchain4j.model.chat.response.ChatResponseMetadata;
import dev.langchain4j.http.client.SuccessfulHttpResponse;
import dev.langchain4j.http.client.sse.ServerSentEvent;
import dev.langchain4j.model.output.FinishReason;
import java.util.List;

/**
 * Response metadata with Anthropic-specific fields.
 * Thread-safe (immutable).
 *
 * @since 1.0.0
 */
public class AnthropicChatResponseMetadata extends ChatResponseMetadata {
    /**
     * Creates new builder for response metadata.
     *
     * @return new builder instance, never null
     */
    public static Builder builder();

    /**
     * Returns Anthropic-specific token usage with cache metrics.
     *
     * @return token usage with cache info, never null
     */
    public AnthropicTokenUsage tokenUsage();

    /**
     * Returns raw HTTP response for debugging.
     * Contains status code, headers, and body.
     *
     * @return HTTP response or null if not available
     */
    public SuccessfulHttpResponse rawHttpResponse();

    /**
     * Returns raw SSE events from streaming response.
     * Only available in streaming mode.
     *
     * @return list of SSE events or null if not streaming
     */
    public List<ServerSentEvent> rawServerSentEvents();

    /**
     * Creates builder initialized with this metadata.
     *
     * @return new builder with current values, never null
     */
    public Builder toBuilder();

    /**
     * Compares metadata for equality.
     *
     * @param o other object
     * @return true if equal
     */
    public boolean equals(Object o);

    /**
     * Returns hash code.
     *
     * @return hash code
     */
    public int hashCode();

    /**
     * Returns string representation.
     *
     * @return string form
     */
    public String toString();
}

Builder

Configure response metadata.

package dev.langchain4j.model.anthropic;

import dev.langchain4j.model.chat.response.ChatResponseMetadata;
import dev.langchain4j.model.TokenUsage;
import dev.langchain4j.model.output.FinishReason;
import dev.langchain4j.http.client.SuccessfulHttpResponse;
import dev.langchain4j.http.client.sse.ServerSentEvent;
import java.util.List;

/**
 * Builder for AnthropicChatResponseMetadata.
 */
public static class Builder extends ChatResponseMetadata.Builder<Builder> {
    /**
     * Sets response ID.
     *
     * @param id message ID from API, must not be null
     * @return this builder, never null
     * @default null
     */
    public Builder id(String id);

    /**
     * Sets model name used for generation.
     *
     * @param modelName model identifier, must not be null
     * @return this builder, never null
     * @default null
     */
    public Builder modelName(String modelName);

    /**
     * Sets token usage (use AnthropicTokenUsage for cache metrics).
     *
     * @param tokenUsage token usage stats, must not be null
     * @return this builder, never null
     * @default null
     */
    public Builder tokenUsage(TokenUsage tokenUsage);

    /**
     * Sets finish reason.
     *
     * @param finishReason why generation stopped, must not be null
     * @return this builder, never null
     * @default null
     */
    public Builder finishReason(FinishReason finishReason);

    /**
     * Sets raw HTTP response.
     *
     * @param rawHttpResponse HTTP response details, may be null
     * @return this builder, never null
     * @default null
     */
    public Builder rawHttpResponse(SuccessfulHttpResponse rawHttpResponse);

    /**
     * Sets raw SSE events (streaming only).
     *
     * @param rawServerSentEvents list of SSE events, may be null
     * @return this builder, never null
     * @default null
     */
    public Builder rawServerSentEvents(List<ServerSentEvent> rawServerSentEvents);

    /**
     * Builds the metadata instance.
     *
     * @return configured metadata, never null
     */
    public AnthropicChatResponseMetadata build();
}

Basic Usage

Access response metadata from chat responses.

import dev.langchain4j.model.anthropic.AnthropicChatResponseMetadata;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.model.output.FinishReason;

ChatResponse response = model.chat(request);

// Cast to Anthropic-specific metadata
// (guard with instanceof first if the response might come from another provider)
AnthropicChatResponseMetadata metadata =
    (AnthropicChatResponseMetadata) response.metadata();

// Access basic metadata
String messageId = metadata.id();
String modelName = metadata.modelName();
FinishReason finishReason = metadata.finishReason();

System.out.println("Message ID: " + messageId);
System.out.println("Model: " + modelName);
System.out.println("Finish reason: " + finishReason);

Error Handling:

  • ClassCastException if response not from Anthropic model
  • Fields may be null (check before using)

Null Safety:

  • metadata() never returns null
  • Individual fields (id, modelName, etc.) may be null
  • tokenUsage() never null (but fields within may be null)

Token Usage with Caching

Access detailed token usage including cache metrics.

import dev.langchain4j.model.anthropic.AnthropicTokenUsage;

AnthropicChatResponseMetadata metadata =
    (AnthropicChatResponseMetadata) response.metadata();

AnthropicTokenUsage usage = metadata.tokenUsage();

// Standard token counts
Integer inputTokens = usage.inputTokenCount();
Integer outputTokens = usage.outputTokenCount();
Integer totalTokens = usage.totalTokenCount();

// Cache-specific counts
Integer cacheCreationTokens = usage.cacheCreationInputTokens();
Integer cacheReadTokens = usage.cacheReadInputTokens();

System.out.println("Input tokens: " + inputTokens);
System.out.println("Output tokens: " + outputTokens);
System.out.println("Total tokens: " + totalTokens);

if (cacheCreationTokens != null) {
    System.out.println("Cache creation tokens: " + cacheCreationTokens);
}

if (cacheReadTokens != null) {
    System.out.println("Cache read tokens: " + cacheReadTokens);
}

Error Handling:

  • All token counts may be null (check before arithmetic)
  • totalTokenCount() calculated from input + output
  • Cache fields null if caching not enabled

Null Safety:

  • usage.inputTokenCount() may be null
  • usage.outputTokenCount() may be null
  • usage.cacheCreationInputTokens() may be null
  • usage.cacheReadInputTokens() may be null
  • Always check before using in calculations

Common Pitfalls:

❌ DON'T use null values directly

int total = usage.inputTokenCount() + usage.outputTokenCount();  // NullPointerException

✅ DO check for null

int input = usage.inputTokenCount() != null ? usage.inputTokenCount() : 0;
int output = usage.outputTokenCount() != null ? usage.outputTokenCount() : 0;
int total = input + output;
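The ternary pattern above can be wrapped in a tiny helper so every count is summed the same way. This is a hypothetical utility, not part of the library; the name `orZero` is an assumption:

```java
public class SafeCounts {

    /** Treats a null token count as 0, matching how add() aggregates. */
    public static int orZero(Integer count) {
        return count != null ? count : 0;
    }

    public static void main(String[] args) {
        Integer input = 120;
        Integer output = null; // e.g. count not reported

        int total = orZero(input) + orZero(output);
        System.out.println(total); // 120
    }
}
```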

Raw HTTP Response

Access the raw HTTP response for debugging or logging.

import dev.langchain4j.http.client.SuccessfulHttpResponse;

import java.util.List;
import java.util.Map;

AnthropicChatResponseMetadata metadata =
    (AnthropicChatResponseMetadata) response.metadata();

SuccessfulHttpResponse httpResponse = metadata.rawHttpResponse();

if (httpResponse != null) {
    int statusCode = httpResponse.statusCode();
    String body = httpResponse.body();
    Map<String, List<String>> headers = httpResponse.headers();

    System.out.println("Status: " + statusCode);
    System.out.println("Headers: " + headers);
    // Be careful logging body - may contain sensitive data
}

Error Handling:

  • rawHttpResponse() may return null
  • body() never null but may be empty
  • headers() never null but may be empty map

Null Safety:

  • rawHttpResponse() may be null (check first)
  • If not null, statusCode(), body(), headers() are never null

Common Pitfalls:

❌ DON'T log sensitive data

System.out.println("Body: " + httpResponse.body());  // May contain API keys, user data

✅ DO log selectively

System.out.println("Status: " + httpResponse.statusCode());
// Don't log body in production
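Since header names and casing can vary between HTTP clients, a small null-safe lookup helper can make selective logging easier. This is a hypothetical utility (not part of the library); the map shape matches `SuccessfulHttpResponse.headers()`, and the header names in `main` are illustrative:

```java
import java.util.List;
import java.util.Map;

public class HeaderLookup {

    /**
     * Returns the first value of the named header, ignoring case,
     * or null if the header is absent or has no values.
     */
    public static String firstHeader(Map<String, List<String>> headers, String name) {
        if (headers == null || name == null) {
            return null;
        }
        for (Map.Entry<String, List<String>> entry : headers.entrySet()) {
            if (name.equalsIgnoreCase(entry.getKey())) {
                List<String> values = entry.getValue();
                return (values == null || values.isEmpty()) ? null : values.get(0);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, List<String>> headers = Map.of(
                "Request-Id", List.of("req_123"),
                "Content-Type", List.of("application/json"));

        System.out.println(firstHeader(headers, "request-id")); // case-insensitive hit
        System.out.println(firstHeader(headers, "x-missing"));  // absent header -> null
    }
}
```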

Raw Server-Sent Events (Streaming)

Access raw SSE events from streaming responses.

import dev.langchain4j.http.client.sse.ServerSentEvent;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.model.chat.response.StreamingChatResponseHandler;
import java.util.List;

// In streaming mode (requires a streaming model, e.g. AnthropicStreamingChatModel)
model.chat(request, new StreamingChatResponseHandler() {
    @Override
    public void onPartialResponse(String token) {
        System.out.print(token);
    }

    @Override
    public void onCompleteResponse(ChatResponse completeResponse) {
        AnthropicChatResponseMetadata metadata =
            (AnthropicChatResponseMetadata) completeResponse.metadata();

        List<ServerSentEvent> events = metadata.rawServerSentEvents();

        if (events != null) {
            System.out.println("\nReceived " + events.size() + " SSE events");

            for (ServerSentEvent event : events) {
                System.out.println("Event: " + event.name());
                System.out.println("  ID: " + event.id());
                System.out.println("  Data: " + event.data());
            }
        }
    }

    @Override
    public void onError(Throwable error) {
        error.printStackTrace();
    }
});

Error Handling:

  • rawServerSentEvents() returns null in non-streaming mode
  • Event list never empty in streaming (at minimum has start/stop events)
  • Individual event fields may be null

Null Safety:

  • rawServerSentEvents() may be null (check first)
  • If not null, the list contains at least the stream's start and stop events
  • event.name() never null
  • event.id() may be null
  • event.data() never null

AnthropicTokenUsage

Token usage with cache-specific metrics.

Class Definition

package dev.langchain4j.model.anthropic;

import dev.langchain4j.model.TokenUsage;

/**
 * Token usage with Anthropic-specific cache metrics.
 * Thread-safe (immutable).
 *
 * @since 1.0.0
 */
public class AnthropicTokenUsage extends TokenUsage {
    /**
     * Creates new builder.
     *
     * @return new builder, never null
     */
    public static Builder builder();

    /**
     * Returns tokens used to create cache on first request.
     * Null if caching not enabled or cache already exists.
     *
     * @return cache creation tokens or null
     */
    public Integer cacheCreationInputTokens();

    /**
     * Returns tokens read from cache on subsequent requests.
     * Null if caching not enabled or cache not hit.
     *
     * @return cache read tokens or null
     */
    public Integer cacheReadInputTokens();

    /**
     * Adds token usage from another instance.
     * Handles null values gracefully (treats as 0).
     *
     * @param that other token usage to add, may be null
     * @return new token usage with combined counts, never null
     */
    public AnthropicTokenUsage add(TokenUsage that);

    /**
     * Returns string representation.
     *
     * @return string form
     */
    public String toString();
}

Builder

package dev.langchain4j.model.anthropic;

/**
 * Builder for AnthropicTokenUsage.
 */
public static class Builder {
    /**
     * Sets input token count (non-cached).
     *
     * @param inputTokenCount input tokens, may be null, must be >= 0 if set
     * @return this builder, never null
     * @throws IllegalArgumentException if negative
     * @default null
     */
    public Builder inputTokenCount(Integer inputTokenCount);

    /**
     * Sets output token count.
     *
     * @param outputTokenCount output tokens, may be null, must be >= 0 if set
     * @return this builder, never null
     * @throws IllegalArgumentException if negative
     * @default null
     */
    public Builder outputTokenCount(Integer outputTokenCount);

    /**
     * Sets cache creation input tokens.
     *
     * @param cacheCreationInputTokens cache write tokens, may be null, must be >= 0 if set
     * @return this builder, never null
     * @throws IllegalArgumentException if negative
     * @default null
     */
    public Builder cacheCreationInputTokens(Integer cacheCreationInputTokens);

    /**
     * Sets cache read input tokens.
     *
     * @param cacheReadInputTokens cache read tokens, may be null, must be >= 0 if set
     * @return this builder, never null
     * @throws IllegalArgumentException if negative
     * @default null
     */
    public Builder cacheReadInputTokens(Integer cacheReadInputTokens);

    /**
     * Builds token usage instance.
     *
     * @return configured token usage, never null
     */
    public AnthropicTokenUsage build();
}

Cache Metrics

Understanding cache token counts.

AnthropicTokenUsage usage = metadata.tokenUsage();

Integer cacheCreation = usage.cacheCreationInputTokens();
Integer cacheRead = usage.cacheReadInputTokens();
Integer regularInput = usage.inputTokenCount();

// First request with caching enabled
if (cacheCreation != null && cacheCreation > 0) {
    System.out.println("Created cache with " + cacheCreation + " tokens");
}

// Subsequent request using cache
if (cacheRead != null && cacheRead > 0) {
    System.out.println("Read " + cacheRead + " tokens from cache");
    System.out.println("Saved " + cacheRead + " input tokens!");
}

// Regular input tokens (not cached)
System.out.println("Regular input: " + (regularInput != null ? regularInput : 0) + " tokens");

Cache Token Semantics:

  • cacheCreationInputTokens: Tokens written to cache (first request only)

    • Costs ~25% more than regular input tokens
    • Only non-zero on first request with new cache key
    • Null if caching not enabled
  • cacheReadInputTokens: Tokens read from cache (subsequent requests)

    • Costs ~90% less than regular input tokens
    • Non-zero on requests hitting existing cache
    • Null if cache miss or caching not enabled
  • inputTokenCount: Regular input tokens (not cached)

    • Always present for normal inputs
    • Does not include cached portions

Cache Lifecycle:

  • Request 1: cacheCreation > 0, cacheRead = null (creating cache)
  • Request 2-N: cacheCreation = null, cacheRead > 0 (using cache, within TTL)
  • After TTL: cacheCreation > 0 again (cache expired, recreating)
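The lifecycle above can be summarized with a small classifier. This is a hypothetical helper (not part of the library); the two `Integer` arguments mirror `cacheCreationInputTokens()` and `cacheReadInputTokens()`:

```java
public class CacheState {

    /**
     * Classifies a response's cache activity from the two cache counters.
     * A null or zero counter means that kind of cache activity did not happen.
     */
    public static String classify(Integer cacheCreationTokens, Integer cacheReadTokens) {
        boolean wrote = cacheCreationTokens != null && cacheCreationTokens > 0;
        boolean read = cacheReadTokens != null && cacheReadTokens > 0;
        if (wrote && read) {
            return "PARTIAL_HIT";   // part of the prefix cached, part newly written
        }
        if (wrote) {
            return "CACHE_CREATED"; // first request, or cache expired and recreated
        }
        if (read) {
            return "CACHE_HIT";     // subsequent request within the cache TTL
        }
        return "NO_CACHE";          // caching disabled or nothing cacheable
    }

    public static void main(String[] args) {
        System.out.println(classify(1024, null)); // CACHE_CREATED
        System.out.println(classify(null, 1024)); // CACHE_HIT
        System.out.println(classify(null, null)); // NO_CACHE
    }
}
```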

Aggregating Token Usage

Add token usage across multiple requests.

import dev.langchain4j.model.anthropic.AnthropicTokenUsage;

AnthropicTokenUsage total = AnthropicTokenUsage.builder()
    .inputTokenCount(0)
    .outputTokenCount(0)
    .cacheCreationInputTokens(0)
    .cacheReadInputTokens(0)
    .build();

for (ChatResponse response : responses) {
    AnthropicChatResponseMetadata metadata =
        (AnthropicChatResponseMetadata) response.metadata();
    total = total.add(metadata.tokenUsage());
}

System.out.println("Total usage: " + total);
System.out.println("Total input: " + total.inputTokenCount());
System.out.println("Total output: " + total.outputTokenCount());
System.out.println("Total cache creation: " + total.cacheCreationInputTokens());
System.out.println("Total cache reads: " + total.cacheReadInputTokens());

Error Handling:

  • add() handles null values gracefully (treats null counts as 0)
  • Initialize all fields with 0 so every field of the aggregate stays non-null

Common Pitfalls:

❌ DON'T start with null values

AnthropicTokenUsage total = AnthropicTokenUsage.builder().build();
// Fields are null, arithmetic will fail

✅ DO initialize with zeros

AnthropicTokenUsage total = AnthropicTokenUsage.builder()
    .inputTokenCount(0)
    .outputTokenCount(0)
    .cacheCreationInputTokens(0)
    .cacheReadInputTokens(0)
    .build();

Cost Calculation with Caching

Calculate costs considering cache discounts.

// Example pricing (check current Anthropic pricing)
double inputPricePer1k = 0.003;        // $0.003 per 1K input tokens
double outputPricePer1k = 0.015;       // $0.015 per 1K output tokens
double cacheWritePricePer1k = 0.0037;  // $0.0037 per 1K cache write tokens (25% premium)
double cacheReadPricePer1k = 0.0003;   // $0.0003 per 1K cache read tokens (90% discount)

AnthropicTokenUsage usage = metadata.tokenUsage();

double inputCost = (usage.inputTokenCount() != null ? usage.inputTokenCount() : 0) / 1000.0 * inputPricePer1k;
double outputCost = (usage.outputTokenCount() != null ? usage.outputTokenCount() : 0) / 1000.0 * outputPricePer1k;

double cacheWriteCost = 0;
if (usage.cacheCreationInputTokens() != null) {
    cacheWriteCost = (usage.cacheCreationInputTokens() / 1000.0) * cacheWritePricePer1k;
}

double cacheReadCost = 0;
if (usage.cacheReadInputTokens() != null) {
    cacheReadCost = (usage.cacheReadInputTokens() / 1000.0) * cacheReadPricePer1k;
}

double totalCost = inputCost + outputCost + cacheWriteCost + cacheReadCost;

System.out.println("Cost breakdown:");
System.out.println("  Input: $" + String.format("%.4f", inputCost));
System.out.println("  Output: $" + String.format("%.4f", outputCost));
System.out.println("  Cache write: $" + String.format("%.4f", cacheWriteCost));
System.out.println("  Cache read: $" + String.format("%.4f", cacheReadCost));
System.out.println("  Total: $" + String.format("%.4f", totalCost));

// Calculate savings from cache
if (usage.cacheReadInputTokens() != null && usage.cacheReadInputTokens() > 0) {
    double costWithoutCache = (usage.cacheReadInputTokens() / 1000.0) * inputPricePer1k;
    double savings = costWithoutCache - cacheReadCost;
    System.out.println("  Saved: $" + String.format("%.4f", savings));
}

Pricing Notes (as of 2025-02-27):

  • Prices vary by model tier
  • Check Anthropic's pricing page for current rates
  • Cache costs calculated separately from regular tokens
  • Break-even typically 2-3 requests for same cached content

Common Pitfalls:

❌ DON'T use outdated pricing

double price = 0.001;  // May be stale

✅ DO document pricing date and source

// Pricing as of 2025-02-27 from https://anthropic.com/pricing
double inputPrice = getCurrentPricing();
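The break-even claim above can be checked with a little arithmetic. Assuming a 25% cache-write premium and a 90% read discount (illustrative multipliers, not current pricing), this sketch finds the smallest number of requests reusing the same cached prefix at which caching pays off:

```java
public class CacheBreakEven {

    /**
     * Smallest number of requests (all reusing the same cached prefix) at which
     * total cost with caching is no more than without it. Costs are expressed
     * as multiples of the plain input price: request 1 pays writeMultiplier,
     * requests 2..n pay readMultiplier each.
     */
    public static int breakEvenRequests(double writeMultiplier, double readMultiplier) {
        for (int n = 1; n <= 1000; n++) {
            double withCache = writeMultiplier + (n - 1) * readMultiplier;
            double withoutCache = n; // n requests at the plain input price
            if (withCache <= withoutCache) {
                return n;
            }
        }
        return -1; // never breaks even within 1000 requests
    }

    public static void main(String[] args) {
        // 1.25x write premium, 0.1x read cost -> pays off on the 2nd request
        System.out.println(breakEvenRequests(1.25, 0.1));
    }
}
```

With these multipliers the cache pays for itself on the second request, consistent with the 2-3 request estimate in the pricing notes.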

Extended Thinking in Responses

When extended thinking is enabled, Claude can include reasoning text in responses. Access thinking content through the AiMessage.

Accessing Thinking Content

import dev.langchain4j.data.message.AiMessage;

// Configure model with thinking enabled
AnthropicChatModel model = AnthropicChatModel.builder()
    .apiKey(apiKey)
    .modelName(AnthropicChatModelName.CLAUDE_OPUS_4_5_20251101)
    .thinkingType("enabled")
    .thinkingBudgetTokens(5000)
    .returnThinking(true)  // Must be true to receive thinking text
    .build();

ChatResponse response = model.chat(request);
AiMessage aiMessage = response.aiMessage();

// Get thinking content
String thinkingText = aiMessage.thinking();
if (thinkingText != null) {
    System.out.println("Claude's reasoning:");
    System.out.println(thinkingText);
}

// Get final answer
String answer = aiMessage.text();
System.out.println("\nFinal answer:");
System.out.println(answer);

Error Handling:

  • thinking() returns null if not enabled or not supported
  • No exception thrown; always check for null
  • text() never null (may be empty)

Null Safety:

  • thinking() may be null
  • text() never null
  • Check thinking() != null before using

Thinking in Streaming Mode

model.chat(request, new StreamingChatResponseHandler() {
    @Override
    public void onPartialThinking(PartialThinking thinking) {
        // Thinking text streamed as it's generated
        System.out.print(thinking.text());
    }

    @Override
    public void onPartialResponse(String token) {
        // Answer text streamed after thinking completes
        System.out.print(token);
    }

    @Override
    public void onCompleteResponse(ChatResponse completeResponse) {
        // Full thinking available in final response
        String fullThinking = completeResponse.aiMessage().thinking();
        System.out.println("\n\nComplete thinking: " + fullThinking);
    }

    @Override
    public void onError(Throwable error) {
        error.printStackTrace();
    }
});

Thinking Signature

Some thinking content includes a cryptographic signature for verification (advanced feature).

AiMessage aiMessage = response.aiMessage();

// Check if thinking has a signature
String thinking = aiMessage.thinking();
// Signature is embedded in the thinking content metadata
// Details about signature format are model-specific

Redacted Thinking

In some cases, thinking content may be redacted or unavailable.

AiMessage aiMessage = response.aiMessage();
String thinking = aiMessage.thinking();

if (thinking == null) {
    System.out.println("Thinking not available (may be redacted or disabled)");
} else if (thinking.contains("[REDACTED]")) {
    System.out.println("Thinking was partially redacted");
}

Finish Reasons

Understanding why the model stopped generating.

package dev.langchain4j.model.output;

/**
 * Reason why model stopped generating.
 */
public enum FinishReason {
    /** Model completed naturally (reached end of thought) */
    STOP,

    /** Response truncated due to max token limit */
    LENGTH,

    /** Model wants to execute a tool */
    TOOL_EXECUTION,

    /** Content filtered by safety systems */
    CONTENT_FILTER,

    /** Other/unknown reason */
    OTHER;
}

Usage Example:

import dev.langchain4j.model.output.FinishReason;

AnthropicChatResponseMetadata metadata =
    (AnthropicChatResponseMetadata) response.metadata();

FinishReason reason = metadata.finishReason();

if (reason == null) {
    System.out.println("No finish reason reported");
} else {
    switch (reason) {
        case STOP:
            System.out.println("Model completed naturally");
            break;
        case LENGTH:
            System.out.println("Response truncated (max tokens reached)");
            break;
        case TOOL_EXECUTION:
            System.out.println("Model wants to call a tool");
            break;
        case CONTENT_FILTER:
            System.out.println("Content filtered");
            break;
        default:
            System.out.println("Other reason: " + reason);
    }
}

Finish Reason Meanings:

  • STOP: Normal completion, model reached natural endpoint
  • LENGTH: Truncated due to maxTokens limit (increase limit if needed)
  • TOOL_EXECUTION: Model requesting tool execution (not an error)
  • CONTENT_FILTER: Safety filter triggered (content policy violation)
  • OTHER: Unexpected reason (may indicate API issue)

Error Handling:

  • finishReason() may be null (check before switch)
  • LENGTH indicates incomplete response (consider increasing maxTokens)
  • CONTENT_FILTER may require prompt adjustment

Types

ChatResponseMetadata (base class)

package dev.langchain4j.model.chat.response;

import dev.langchain4j.model.TokenUsage;
import dev.langchain4j.model.output.FinishReason;

/**
 * Base class for response metadata.
 */
public abstract class ChatResponseMetadata {
    /**
     * Returns message ID.
     *
     * @return ID or null
     */
    public String id();

    /**
     * Returns model name.
     *
     * @return model identifier or null
     */
    public String modelName();

    /**
     * Returns token usage.
     *
     * @return usage stats, never null
     */
    public TokenUsage tokenUsage();

    /**
     * Returns finish reason.
     *
     * @return reason or null
     */
    public FinishReason finishReason();
}

SuccessfulHttpResponse

package dev.langchain4j.http.client;

import java.util.List;
import java.util.Map;

/**
 * Successful HTTP response details.
 */
public class SuccessfulHttpResponse {
    /**
     * Returns HTTP status code.
     *
     * @return status code (e.g., 200), always >= 200 and < 300
     */
    public int statusCode();

    /**
     * Returns response body.
     *
     * @return body string, never null (may be empty)
     */
    public String body();

    /**
     * Returns response headers.
     *
     * @return headers map (header name -> list of values), never null
     */
    public Map<String, List<String>> headers();
}

ServerSentEvent

package dev.langchain4j.http.client.sse;

/**
 * Server-sent event from streaming response.
 */
public class ServerSentEvent {
    /**
     * Returns event type name.
     *
     * @return event name (e.g., "message_start", "content_block_delta"), never null
     */
    public String name();

    /**
     * Returns event data payload.
     *
     * @return data string (typically JSON), never null
     */
    public String data();

    /**
     * Returns event ID if present.
     *
     * @return event ID or null
     */
    public String id();
}

Usage Patterns

Monitoring Token Usage

Track token usage across conversations.

public class TokenMonitor {
    private int totalInputTokens = 0;
    private int totalOutputTokens = 0;
    private int totalCacheReads = 0;

    public void recordUsage(ChatResponse response) {
        AnthropicChatResponseMetadata metadata =
            (AnthropicChatResponseMetadata) response.metadata();
        AnthropicTokenUsage usage = metadata.tokenUsage();

        totalInputTokens += usage.inputTokenCount() != null ? usage.inputTokenCount() : 0;
        totalOutputTokens += usage.outputTokenCount() != null ? usage.outputTokenCount() : 0;

        if (usage.cacheReadInputTokens() != null) {
            totalCacheReads += usage.cacheReadInputTokens();
        }
    }

    public void printStats() {
        System.out.println("Total input tokens: " + totalInputTokens);
        System.out.println("Total output tokens: " + totalOutputTokens);
        System.out.println("Total cache reads: " + totalCacheReads);
        System.out.println("Total tokens: " + (totalInputTokens + totalOutputTokens));
    }
}

Debugging with Raw Responses

Log raw responses for debugging.

public void debugResponse(ChatResponse response) {
    AnthropicChatResponseMetadata metadata =
        (AnthropicChatResponseMetadata) response.metadata();

    System.out.println("=== Response Debug Info ===");
    System.out.println("Message ID: " + metadata.id());
    System.out.println("Model: " + metadata.modelName());
    System.out.println("Finish reason: " + metadata.finishReason());

    SuccessfulHttpResponse httpResponse = metadata.rawHttpResponse();
    if (httpResponse != null) {
        System.out.println("HTTP Status: " + httpResponse.statusCode());
        System.out.println("Response headers:");
        httpResponse.headers().forEach((key, values) ->
            System.out.println("  " + key + ": " + values)
        );
    }

    List<ServerSentEvent> events = metadata.rawServerSentEvents();
    if (events != null) {
        System.out.println("SSE Events: " + events.size());
    }
}

Alerting on High Token Usage

Set up alerts for high token usage.

public void checkTokenLimits(ChatResponse response) {
    AnthropicChatResponseMetadata metadata =
        (AnthropicChatResponseMetadata) response.metadata();
    AnthropicTokenUsage usage = metadata.tokenUsage();

    Integer totalTokens = usage.totalTokenCount();
    int maxTokens = 100000;  // Alert threshold

    if (totalTokens != null && totalTokens > maxTokens) {
        System.err.println("WARNING: High token usage detected!");
        System.err.println("Tokens used: " + totalTokens);
        System.err.println("Message ID: " + metadata.id());
    }

    if (metadata.finishReason() == FinishReason.LENGTH) {
        System.err.println("WARNING: Response truncated due to token limit!");
    }
}

Notes

  • AnthropicChatResponseMetadata extends ChatResponseMetadata from langchain4j-core
  • Cache metrics are only present when caching is enabled
  • Raw HTTP response is available for both streaming and non-streaming modes
  • Raw SSE events are only available in streaming mode
  • cacheCreationInputTokens represents tokens written to cache (first request)
  • cacheReadInputTokens represents tokens read from cache (subsequent requests)
  • Token usage can be aggregated across requests using the add() method
  • Finish reason helps determine if response was complete or truncated
  • Be careful logging raw responses as they may contain sensitive data
  • Extended thinking requires returnThinking(true) in model configuration
  • Thinking text is accessed via AiMessage.thinking(), not through metadata
  • Thinking content may include signatures for verification (model-specific)
  • Redacted thinking appears when content is filtered or unavailable
  • Thinking tokens count toward the thinking budget configured via thinkingBudgetTokens()

Thread Safety:

  • AnthropicChatResponseMetadata is immutable and thread-safe
  • AnthropicTokenUsage is immutable and thread-safe
  • Safe for concurrent access from multiple threads

Resource Lifecycle:

  • Metadata objects are lightweight (no resource management needed)
  • Raw HTTP response body held in memory (consider for very large responses)
  • SSE events list held in memory (typically small)
  • No explicit cleanup required

Install with Tessl CLI

npx tessl i tessl/maven-dev-langchain4j--langchain4j-anthropic@1.11.0
