tessl/maven-dev-langchain4j--langchain4j

Build LLM-powered applications in Java with support for chatbots, agents, RAG, tools, and much more

docs/request-response.md

Chat Requests and Responses

Request and response types for chat model interactions. These classes provide structured interfaces for configuring LLM requests and handling responses, including streaming, token usage tracking, and completion reasons.

Capabilities

ChatRequest

Request object containing all inputs for a chat model call, including messages, model parameters, and tool specifications.

package dev.langchain4j.model.chat.request;

import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.agent.tool.ToolSpecification;
import java.util.List;

/**
 * Request object containing all inputs for a chat model call
 */
public class ChatRequest {
    /**
     * Get messages in the conversation
     * @return List of chat messages
     */
    public List<ChatMessage> messages();

    /**
     * Get request parameters
     * @return Chat request parameters
     */
    public ChatRequestParameters parameters();

    /**
     * Get model name
     * @return Model name
     */
    public String modelName();

    /**
     * Get temperature setting
     * @return Temperature value
     */
    public Double temperature();

    /**
     * Get topP setting
     * @return TopP value
     */
    public Double topP();

    /**
     * Get topK setting
     * @return TopK value
     */
    public Integer topK();

    /**
     * Get frequency penalty
     * @return Frequency penalty value
     */
    public Double frequencyPenalty();

    /**
     * Get presence penalty
     * @return Presence penalty value
     */
    public Double presencePenalty();

    /**
     * Get max output tokens
     * @return Max output tokens
     */
    public Integer maxOutputTokens();

    /**
     * Get stop sequences
     * @return List of stop sequences
     */
    public List<String> stopSequences();

    /**
     * Get tool specifications
     * @return List of tool specifications
     */
    public List<ToolSpecification> toolSpecifications();

    /**
     * Get tool choice setting
     * @return Tool choice
     */
    public ToolChoice toolChoice();

    /**
     * Get response format
     * @return Response format
     */
    public ResponseFormat responseFormat();

    /**
     * Create builder for modification
     * @return Builder with current values
     */
    public Builder toBuilder();

    /**
     * Create new builder
     * @return Builder instance
     */
    public static Builder builder();
}

Thread Safety

  • ChatRequest instances are immutable and thread-safe. Once built, they can be safely shared across threads.
  • Multiple threads can read from the same ChatRequest instance concurrently without synchronization.
  • The builder pattern is not thread-safe. Each thread should create its own builder instance or synchronize externally.

Common Pitfalls

  • Empty messages list: Do NOT build a request with a null or empty messages list. At least one message is required for chat completion.
  • Temperature range: While the API accepts values > 1.0, most providers expect 0.0-2.0. Values outside this range may be clamped or cause errors depending on the provider.
  • TopP and TopK conflicts: Do NOT set both topP and topK simultaneously unless your provider explicitly supports it. Many providers ignore one when both are present.
  • MaxOutputTokens must be positive: Passing 0 or negative values will cause validation errors or be rejected by the provider.
  • Stop sequences limitations: Most providers limit the number of stop sequences (typically 4-6). Exceeding this limit will cause request failures.
  • Tool specifications without tool choice: If you provide toolSpecifications, ensure toolChoice is set appropriately (typically AUTO or REQUIRED), otherwise tools may be ignored.
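
Putting the pitfalls above together, a minimal sketch of building a valid request (the model name "gpt-4o-mini" and the message text are placeholders, not values prescribed by the library):

```java
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.model.chat.request.ChatRequest;

ChatRequest request = ChatRequest.builder()
        .messages(UserMessage.from("Summarize the latest build log"))  // at least one message is required
        .modelName("gpt-4o-mini")   // placeholder; overrides the model configured on the chat model instance
        .temperature(0.2)           // stays within the 0.0-2.0 range most providers accept
        .maxOutputTokens(256)       // must be positive; a conservative value also controls cost
        .build();
```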

Edge Cases

  • Null parameter handling: Unset optional parameters (null values) fall back to model-level defaults or provider defaults.
  • Model name override: Setting modelName in the request overrides the model name configured at the chat model instance level.
  • Parameters object precedence: If both the parameters() object and individual parameter methods are set, individual parameters take precedence.
  • Large message histories: Requests with extensive message histories may exceed provider context limits, resulting in errors or automatic truncation depending on the provider implementation.
  • Unicode in stop sequences: Stop sequences with special Unicode characters may behave unexpectedly across providers. Test thoroughly if using non-ASCII characters.

Performance Notes

  • Temperature impact: Higher temperatures (0.7-1.0) increase response variability and latency slightly due to sampling computation.
  • MaxOutputTokens: Lower values reduce latency proportionally. Each token adds ~1-5ms depending on model size and hardware.
  • Tool specifications overhead: Each tool specification adds to the prompt length and context processing time. Minimize tool definitions to essential ones.
  • TopP vs TopK: TopP (nucleus sampling) is generally faster than TopK for large vocabularies.
  • Request serialization: Complex requests with many messages or tools increase serialization time. Reuse builders when possible.

Cost Considerations

  • Input tokens: Every ChatMessage in the request consumes input tokens. Message history accumulates quickly in multi-turn conversations.
  • Tool specifications token cost: Each ToolSpecification consumes input tokens (typically 50-200 tokens per tool depending on description complexity).
  • System message overhead: System messages are included in input token count and persist across all turns.
  • MaxOutputTokens budget: Set this conservatively to control costs. Overly generous limits can result in expensive responses.
  • Stop sequences: Using stop sequences can reduce costs by terminating generation early, but may truncate useful content.

Exception Handling

  • IllegalArgumentException: Thrown when required fields are missing during build() (e.g., null or empty messages list).
  • IllegalStateException: Thrown when builder is in an invalid state (rarely, usually due to internal consistency checks).
  • Provider-specific exceptions: Downstream providers may throw their own validation exceptions for parameter constraints (e.g., temperature out of range, unsupported tool choice values).

Related APIs

  • ChatRequestParameters - Interface for configuring default parameters
  • ChatMessage - Message types for conversation history (see messages.md)
  • ToolSpecification - Tool definition for function calling (see tools.md)
  • ResponseFormat - Control response structure (TEXT/JSON)
  • ChatModel.chat() - Execute the request

ChatRequest Builder

/**
 * Builder for ChatRequest
 */
public static class Builder {
    /**
     * Set messages
     * @param messages List of chat messages
     * @return Builder instance
     */
    public Builder messages(List<ChatMessage> messages);

    /**
     * Set messages (varargs)
     * @param messages Chat messages
     * @return Builder instance
     */
    public Builder messages(ChatMessage... messages);

    /**
     * Set parameters
     * @param parameters Chat request parameters
     * @return Builder instance
     */
    public Builder parameters(ChatRequestParameters parameters);

    /**
     * Set model name
     * @param modelName Model name
     * @return Builder instance
     */
    public Builder modelName(String modelName);

    /**
     * Set temperature (typically 0.0 to 2.0)
     * @param temperature Temperature value
     * @return Builder instance
     */
    public Builder temperature(Double temperature);

    /**
     * Set topP (nucleus sampling)
     * @param topP TopP value
     * @return Builder instance
     */
    public Builder topP(Double topP);

    /**
     * Set topK
     * @param topK TopK value
     * @return Builder instance
     */
    public Builder topK(Integer topK);

    /**
     * Set frequency penalty
     * @param frequencyPenalty Frequency penalty value
     * @return Builder instance
     */
    public Builder frequencyPenalty(Double frequencyPenalty);

    /**
     * Set presence penalty
     * @param presencePenalty Presence penalty value
     * @return Builder instance
     */
    public Builder presencePenalty(Double presencePenalty);

    /**
     * Set max output tokens
     * @param maxOutputTokens Max output tokens
     * @return Builder instance
     */
    public Builder maxOutputTokens(Integer maxOutputTokens);

    /**
     * Set stop sequences
     * @param stopSequences List of stop sequences
     * @return Builder instance
     */
    public Builder stopSequences(List<String> stopSequences);

    /**
     * Set tool specifications
     * @param toolSpecifications List of tool specifications
     * @return Builder instance
     */
    public Builder toolSpecifications(List<ToolSpecification> toolSpecifications);

    /**
     * Set tool specifications (varargs)
     * @param toolSpecifications Tool specifications
     * @return Builder instance
     */
    public Builder toolSpecifications(ToolSpecification... toolSpecifications);

    /**
     * Set tool choice
     * @param toolChoice Tool choice
     * @return Builder instance
     */
    public Builder toolChoice(ToolChoice toolChoice);

    /**
     * Set response format
     * @param responseFormat Response format
     * @return Builder instance
     */
    public Builder responseFormat(ResponseFormat responseFormat);

    /**
     * Build the request
     * @return ChatRequest instance
     */
    public ChatRequest build();
}

Thread Safety

  • Builder is NOT thread-safe. Do not share builder instances across threads without external synchronization.
  • Create separate builder instances per thread or use synchronization wrappers.

Common Pitfalls

  • Reusing builders: After calling build(), the builder state is retained. Subsequent modifications affect the next built instance. Use toBuilder() for clean copies.
  • Null vs empty collections: Passing null to collection methods (messages, toolSpecifications, stopSequences) may result in NullPointerException. Use empty lists instead.
  • Method chaining confusion: All builder methods return the builder instance. Forgetting to call build() at the end is a common mistake.
  • Varargs overload: The varargs messages(ChatMessage...) and toolSpecifications(ToolSpecification...) methods replace the entire collection, not append to it.
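
The last-call-wins and builder-reuse behaviors above can be sketched as follows (values and message text are placeholders):

```java
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.model.chat.request.ChatRequest;

ChatRequest base = ChatRequest.builder()
        .messages(UserMessage.from("Hello"))
        .temperature(0.5)
        .temperature(0.9)   // last call wins: the built request has temperature 0.9
        .build();

// toBuilder() yields an independent, pre-populated builder; the original
// immutable request is unaffected by the derived variant.
ChatRequest variant = base.toBuilder()
        .maxOutputTokens(128)
        .build();
```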

Edge Cases

  • Multiple calls to same setter: Last call wins. For example, calling temperature(0.5).temperature(0.9) results in temperature = 0.9.
  • Null parameter clearing: Setting a parameter to null explicitly removes it, falling back to defaults.
  • Builder from existing request: Use chatRequest.toBuilder() to create a builder pre-populated with current values for modifications.

Performance Notes

  • Builder allocation: Each builder instance allocates internal state. For high-throughput scenarios, consider object pooling or reuse patterns.
  • Collection copying: The builder copies collections defensively on build(). For large message histories, this adds overhead.
  • Validation on build: All validation occurs during build(), not during setter calls. Invalid configurations fail late.

Exception Handling

  • IllegalArgumentException: Thrown by build() when required fields are missing or invalid.
  • NullPointerException: Thrown if null is passed to methods expecting non-null values (e.g., individual message objects in varargs).

Related APIs

  • ChatRequest.toBuilder() - Create builder from existing request
  • ChatRequest.builder() - Create new builder
  • ChatRequestParameters - Alternative parameter configuration approach

ChatResponse

Response object containing outputs from a chat model call, including the AI message, token usage, and metadata.

package dev.langchain4j.model.chat.response;

import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.model.output.TokenUsage;
import dev.langchain4j.model.output.FinishReason;

/**
 * Response object containing all outputs from a chat model call
 */
public class ChatResponse {
    /**
     * Get AI-generated message
     * @return AI message
     */
    public AiMessage aiMessage();

    /**
     * Get response metadata
     * @return Chat response metadata
     */
    public ChatResponseMetadata metadata();

    /**
     * Get response ID
     * @return Response ID
     */
    public String id();

    /**
     * Get model name used
     * @return Model name
     */
    public String modelName();

    /**
     * Get token usage
     * @return Token usage
     */
    public TokenUsage tokenUsage();

    /**
     * Get finish reason
     * @return Finish reason
     */
    public FinishReason finishReason();

    /**
     * Create builder for modification
     * @return Builder with current values
     */
    public Builder toBuilder();

    /**
     * Create new builder
     * @return Builder instance
     */
    public static Builder builder();
}

Thread Safety

  • ChatResponse instances are immutable and thread-safe. Can be safely shared across threads.
  • All getter methods are thread-safe and do not require synchronization.
  • The underlying AiMessage and TokenUsage objects are also immutable and thread-safe.

Common Pitfalls

  • Null tokenUsage: Not all providers return token usage information. Always check for null before accessing tokenUsage().
  • Null finishReason: Some providers may not populate finish reason. Handle null gracefully.
  • Response ID uniqueness: Response IDs are provider-generated and may not be unique across different model instances or time periods.
  • Model name mismatch: The returned modelName() may differ from requested model name if provider performs automatic model selection or fallback.
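
A defensive-access sketch for the optional fields above ('model' is an assumed ChatModel instance and 'log' an assumed logger; both are illustrative, not part of this API):

```java
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.model.output.FinishReason;
import dev.langchain4j.model.output.TokenUsage;

ChatResponse response = model.chat(request);

// Not all providers report usage: null-check before reading counts.
TokenUsage usage = response.tokenUsage();
if (usage != null) {
    log.info("input={}, output={}", usage.inputTokenCount(), usage.outputTokenCount());
}

// Finish reason may also be absent; branch only on known values.
FinishReason reason = response.finishReason();
if (reason != null && reason != FinishReason.STOP) {
    // e.g. LENGTH: the reply may be truncated; TOOL_EXECUTION: expect tool calls
}
```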

Edge Cases

  • Empty aiMessage text: The AiMessage may contain empty text if the response was purely tool calls or if generation was interrupted immediately.
  • Metadata availability: The metadata() object availability and contents vary by provider. Some providers return null or minimal metadata.
  • Tool calls without finish reason: When finish reason is TOOL_EXECUTION, the aiMessage may contain tool calls but little or no text content.
  • Streaming vs non-streaming responses: The final ChatResponse from streaming may have slightly different metadata than non-streaming calls to the same prompt.

Performance Notes

  • Response deserialization: ChatResponse creation involves JSON deserialization from provider API. Large responses (many tool calls, long text) increase latency.
  • TokenUsage calculation: Some providers compute token usage server-side (accurate but adds latency), others estimate client-side (fast but approximate).
  • Metadata parsing: Accessing metadata() may trigger lazy parsing of additional response fields. Cache the result if accessing multiple times.

Cost Considerations

  • TokenUsage tracking: Always capture and log tokenUsage() for cost monitoring and budget enforcement.
  • Output token multiplier: Output tokens typically cost 2-4x more than input tokens depending on the provider.
  • Tool execution responses: Responses triggering tool execution consume output tokens for the tool call serialization, which can be substantial for complex tool arguments.

Exception Handling

  • NullPointerException risk: Always null-check optional fields (tokenUsage(), finishReason(), metadata()) before accessing their properties.
  • Provider errors: Some providers embed error information in metadata rather than throwing exceptions. Check metadata for error indicators.
  • Malformed responses: In rare cases, provider API changes may result in deserialization issues. Handle potential parsing exceptions when accessing response fields.

Related APIs

  • AiMessage - The generated message content (see messages.md)
  • TokenUsage - Token consumption tracking
  • FinishReason - Completion status enum
  • ChatResponseMetadata - Additional provider-specific metadata
  • ChatModel.chat() - Returns ChatResponse
  • StreamingChatResponseHandler.onCompleteResponse() - Receives ChatResponse in streaming

ChatResponse Builder

/**
 * Builder for ChatResponse
 */
public static class Builder {
    /**
     * Set AI message
     * @param aiMessage AI message
     * @return Builder instance
     */
    public Builder aiMessage(AiMessage aiMessage);

    /**
     * Set metadata
     * @param metadata Chat response metadata
     * @return Builder instance
     */
    public Builder metadata(ChatResponseMetadata metadata);

    /**
     * Set response ID
     * @param id Response ID
     * @return Builder instance
     */
    public Builder id(String id);

    /**
     * Set model name
     * @param modelName Model name
     * @return Builder instance
     */
    public Builder modelName(String modelName);

    /**
     * Set token usage
     * @param tokenUsage Token usage
     * @return Builder instance
     */
    public Builder tokenUsage(TokenUsage tokenUsage);

    /**
     * Set finish reason
     * @param finishReason Finish reason
     * @return Builder instance
     */
    public Builder finishReason(FinishReason finishReason);

    /**
     * Build the response
     * @return ChatResponse instance
     */
    public ChatResponse build();
}

Thread Safety

  • Builder is NOT thread-safe. Do not share builder instances across threads.
  • Each thread should create its own builder instance or use external synchronization.

Common Pitfalls

  • Missing aiMessage: The aiMessage() field is typically required. Building without it may succeed but result in unusable response objects.
  • Inconsistent metadata: Manually constructing responses with inconsistent metadata (e.g., finish reason not matching actual message state) can cause downstream logic errors.
  • Builder reuse: Like ChatRequest builder, state persists after build(). Reusing the same builder for multiple responses requires caution.

Edge Cases

  • Partial response construction: When building partial responses (e.g., for testing), some fields may be intentionally null. Ensure downstream code handles this.
  • Custom response wrapping: The builder allows creation of custom ChatResponse objects, useful for mocking or adapting non-standard provider responses.

Performance Notes

  • Minimal overhead: ChatResponse builder operations are lightweight. The primary cost is in the underlying object creation.
  • Defensive copying: The builder may perform defensive copying of mutable fields. For high-throughput scenarios, minimize builder usage.

Exception Handling

  • IllegalStateException: Thrown if builder is in an invalid state during build() (rare, primarily for internal consistency checks).
  • NullPointerException: Thrown if required fields are accessed as null during build validation.

Related APIs

  • ChatResponse.toBuilder() - Create builder from existing response
  • ChatResponse.builder() - Create new builder

ChatRequestParameters

Interface for configuring chat model parameters. Can be set at model level as defaults or per-request.

package dev.langchain4j.model.chat.request;

import dev.langchain4j.agent.tool.ToolSpecification;
import java.util.List;

/**
 * Interface for chat request parameters
 * Configure temperature, max tokens, tools, response format, etc.
 */
public interface ChatRequestParameters {
    /**
     * Get model name
     * @return Model name
     */
    String modelName();

    /**
     * Get temperature
     * @return Temperature value
     */
    Double temperature();

    /**
     * Get topP
     * @return TopP value
     */
    Double topP();

    /**
     * Get topK
     * @return TopK value
     */
    Integer topK();

    /**
     * Get frequency penalty
     * @return Frequency penalty value
     */
    Double frequencyPenalty();

    /**
     * Get presence penalty
     * @return Presence penalty value
     */
    Double presencePenalty();

    /**
     * Get max output tokens
     * @return Max output tokens
     */
    Integer maxOutputTokens();

    /**
     * Get stop sequences
     * @return List of stop sequences
     */
    List<String> stopSequences();

    /**
     * Get tool specifications
     * @return List of tool specifications
     */
    List<ToolSpecification> toolSpecifications();

    /**
     * Get tool choice
     * @return Tool choice
     */
    ToolChoice toolChoice();

    /**
     * Get response format
     * @return Response format
     */
    ResponseFormat responseFormat();

    /**
     * Override these parameters with non-null values from other parameters
     * @param parameters Parameters to override with
     * @return New parameters with overrides applied
     */
    ChatRequestParameters overrideWith(ChatRequestParameters parameters);

    /**
     * Use these parameters as defaults, filling nulls with values from other parameters
     * @param parameters Default parameters
     * @return New parameters with defaults applied
     */
    ChatRequestParameters defaultedBy(ChatRequestParameters parameters);

    /**
     * Create builder
     * @return Builder instance
     */
    static DefaultChatRequestParameters.Builder<?> builder();
}

Thread Safety

  • ChatRequestParameters implementations are typically immutable and thread-safe.
  • The overrideWith() and defaultedBy() methods create new instances rather than modifying in place, ensuring thread safety.
  • Can be safely shared across threads as model-level default parameters.

Common Pitfalls

  • overrideWith vs defaultedBy confusion: overrideWith(other) means "other's non-null values replace mine". defaultedBy(other) means "other's values fill my nulls".
  • Null return values: All getter methods may return null for unset parameters. Always check for null before using values.
  • Parameter hierarchy: When using both model-level parameters and request-level parameters, the precedence is: request individual > request parameters > model parameters > provider defaults.
  • Mutability assumption: Do NOT assume you can modify returned collections (stopSequences, toolSpecifications). They are typically unmodifiable.

Edge Cases

  • Empty vs null collections: A null toolSpecifications means "not specified" (inherit from defaults), while an empty list means "explicitly no tools".
  • Parameter merging chains: Calling params1.overrideWith(params2).defaultedBy(params3) applies operations left-to-right. Order matters.
  • Provider compatibility: Not all providers support all parameters. Unsupported parameters are typically ignored silently.
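
The two merge directions above can be illustrated with plain Java (no langchain4j types; pick() is a stand-in for the per-field merge each method performs):

```java
public class ParameterMerge {
    // Returns preferred unless it is null, mirroring per-field merge behavior.
    static <T> T pick(T preferred, T fallback) {
        return preferred != null ? preferred : fallback;
    }

    public static void main(String[] args) {
        Double baseTemp = 0.7;        // set on base
        Double overrideTemp = null;   // not set on the overriding parameters
        // base.overrideWith(override): override's non-null values replace base's
        System.out.println(pick(overrideTemp, baseTemp)); // 0.7 (override was null)

        Double baseTopP = null;       // not set on base
        Double defaultTopP = 0.9;     // provided by the defaults
        // base.defaultedBy(defaults): defaults fill base's nulls
        System.out.println(pick(baseTopP, defaultTopP)); // 0.9
    }
}
```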

Performance Notes

  • Parameter merging: overrideWith() and defaultedBy() create new parameter objects. For hot paths, cache merged results rather than recomputing.
  • Collection copying: Implementations may defensively copy collections on access. Cache returned lists if iterating multiple times.

Cost Considerations

  • Parameter impact on cost: Temperature, topP, and maxOutputTokens directly affect generation cost. Conservative maxOutputTokens prevents budget overruns.

Exception Handling

  • IllegalArgumentException: Some implementations validate parameter ranges on construction. Check builder documentation for specific constraints.
  • UnsupportedOperationException: Attempting to modify returned collections (stopSequences, toolSpecifications) will throw this exception.

Related APIs

  • ChatRequest.parameters() - Set parameters on a request
  • DefaultChatRequestParameters - Default implementation
  • ChatModel - Accepts parameters as default configuration

StreamingChatResponseHandler

Handler interface for receiving streaming responses from chat models token-by-token.

package dev.langchain4j.model.chat.response;

/**
 * Handler for streaming chat responses
 * Receives tokens and tool calls as they arrive
 */
public interface StreamingChatResponseHandler {
    /**
     * Called for each partial text response token
     * @param partialResponse Partial response text
     */
    void onPartialResponse(String partialResponse);

    /**
     * Called for each partial response with context
     * @param partialResponse Partial response object
     * @param context Response context with streaming handle
     */
    void onPartialResponse(PartialResponse partialResponse, PartialResponseContext context);

    /**
     * Called for each partial thinking/reasoning token
     * @param partialThinking Partial thinking text
     */
    void onPartialThinking(PartialThinking partialThinking);

    /**
     * Called for each partial thinking with context
     * @param partialThinking Partial thinking object
     * @param context Thinking context with streaming handle
     */
    void onPartialThinking(PartialThinking partialThinking, PartialThinkingContext context);

    /**
     * Called for each partial tool call chunk
     * @param partialToolCall Partial tool call
     */
    void onPartialToolCall(PartialToolCall partialToolCall);

    /**
     * Called for each partial tool call with context
     * @param partialToolCall Partial tool call object
     * @param context Tool call context with streaming handle
     */
    void onPartialToolCall(PartialToolCall partialToolCall, PartialToolCallContext context);

    /**
     * Called when a tool call is complete
     * @param completeToolCall Complete tool call
     */
    void onCompleteToolCall(CompleteToolCall completeToolCall);

    /**
     * Called when streaming is complete
     * @param completeResponse Complete response with full message
     */
    void onCompleteResponse(ChatResponse completeResponse);

    /**
     * Called if an error occurs
     * @param error Error that occurred
     */
    void onError(Throwable error);
}

Thread Safety

  • Handler callbacks are invoked sequentially by the streaming thread. The framework does NOT invoke multiple callbacks concurrently on the same handler instance.
  • However, your handler implementation must be thread-safe if you share state with other threads (e.g., updating UI, writing to shared collections).
  • Use concurrent data structures (ConcurrentHashMap, CopyOnWriteArrayList) or synchronization if modifying shared state from handler methods.
  • The streaming thread is typically a background I/O thread. Do NOT perform long-running operations in callbacks as this blocks streaming.

Common Pitfalls

  • Blocking operations in callbacks: Do NOT perform blocking I/O, sleep, or long computations in handler methods. This stalls the streaming pipeline. Offload heavy work to separate threads.
  • Exception swallowing: Exceptions thrown from handler methods may be caught and passed to onError(), but this terminates the stream. Handle errors internally when possible.
  • Partial response accumulation: onPartialResponse() receives incremental chunks, not complete sentences. You must accumulate them yourself to build complete text.
  • onCompleteResponse not called: If an error occurs or the connection drops, onCompleteResponse() may never be called. Always handle this scenario gracefully.
  • Context method calls after streaming ends: Do NOT call methods on PartialResponseContext or similar context objects after the streaming completes. They become invalid.
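
The accumulation pitfall above can be handled with a StringBuilder; this plain-Java sketch mirrors what a handler's onPartialResponse(String) should do (the chunk values stand in for real streamed tokens):

```java
public class Accumulator {
    private final StringBuilder buffer = new StringBuilder();

    // Append cheaply, skip empty chunks, defer any heavy processing.
    void onPartialResponse(String partialResponse) {
        if (partialResponse != null && !partialResponse.isEmpty()) {
            buffer.append(partialResponse);
        }
    }

    String completeText() {
        return buffer.toString();
    }

    public static void main(String[] args) {
        Accumulator handler = new Accumulator();
        for (String chunk : new String[] {"", "Hel", "lo, ", "world", ""}) {
            handler.onPartialResponse(chunk);  // empty chunks are ignored
        }
        System.out.println(handler.completeText()); // Hello, world
    }
}
```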

Edge Cases

  • Empty partial responses: Some chunks may be empty strings, especially at the start/end of streaming. Handle empty strings gracefully.
  • onPartialThinking support: Not all providers support thinking/reasoning tokens. This callback may never be invoked for many models.
  • Tool call streaming: Tool calls may arrive in multiple chunks via onPartialToolCall(). A complete tool call is signaled by onCompleteToolCall().
  • Multiple tool calls: A single response may include multiple tool calls. onCompleteToolCall() is invoked once per tool.
  • Interleaved text and tool calls: Some responses may stream text, then tool calls, then more text. Handle this interleaving carefully.

Performance Notes

  • Minimize work per callback: Handler methods are called for every token (potentially hundreds of times per second). Keep processing minimal.
  • String concatenation: For onPartialResponse(), use StringBuilder to accumulate text rather than string concatenation to avoid O(n²) complexity.
  • Context access overhead: Accessing context objects (PartialResponseContext, etc.) may involve lookups. Cache values if accessing multiple times within a callback.
  • Backpressure handling: The framework may implement backpressure if handler methods are slow. Ensure handlers complete quickly to maintain streaming throughput.

Cost Considerations

  • Streaming has same token cost: Streaming and non-streaming requests consume identical tokens. Streaming only affects delivery latency, not cost.
  • Early termination: If you detect sufficient response quality early, you can use context methods to cancel the stream, potentially saving output tokens (provider-dependent).

Exception Handling

  • onError() is terminal: Once onError() is called, no further callbacks occur. The stream is closed.
  • Throwable parameter: onError() receives Throwable (not Exception), so it can handle both checked and unchecked exceptions, including errors like OutOfMemoryError.
  • Network failures: Connection drops, timeouts, and provider errors typically invoke onError() with IOException or provider-specific exceptions.
  • Partial state cleanup: When onError() is called, ensure you clean up any accumulated partial state (partial text, incomplete tool calls).

Related APIs

  • StreamingChatModel.chat() - Accepts handler for streaming requests
  • PartialResponse, PartialThinking, PartialToolCall - Partial data types
  • PartialResponseContext, PartialThinkingContext, PartialToolCallContext - Context objects for streaming control
  • ChatResponse - Final complete response passed to onCompleteResponse()

Response Generic Wrapper

Generic wrapper for model responses with token usage and finish reason.

package dev.langchain4j.model.output;

import java.util.Map;

/**
 * Generic response wrapper containing content, token usage, and finish reason
 */
public class Response<T> {
    /**
     * Create response with content only
     * @param content Response content
     */
    public Response(T content);

    /**
     * Create response with content and token usage
     * @param content Response content
     * @param tokenUsage Token usage
     * @param finishReason Finish reason
     */
    public Response(T content, TokenUsage tokenUsage, FinishReason finishReason);

    /**
     * Create response with all fields
     * @param content Response content
     * @param tokenUsage Token usage
     * @param finishReason Finish reason
     * @param metadata Additional metadata
     */
    public Response(
        T content,
        TokenUsage tokenUsage,
        FinishReason finishReason,
        Map<String, Object> metadata
    );

    /**
     * Get response content
     * @return Content
     */
    public T content();

    /**
     * Get token usage
     * @return Token usage or null
     */
    public TokenUsage tokenUsage();

    /**
     * Get finish reason
     * @return Finish reason or null
     */
    public FinishReason finishReason();

    /**
     * Get metadata
     * @return Metadata map
     */
    public Map<String, Object> metadata();

    /**
     * Create response from content (factory method)
     * @param content Response content
     * @return Response instance
     */
    public static <T> Response<T> from(T content);

    /**
     * Create response from content and token usage (factory method)
     * @param content Response content
     * @param tokenUsage Token usage
     * @return Response instance
     */
    public static <T> Response<T> from(T content, TokenUsage tokenUsage);

    /**
     * Create response from content, usage, and finish reason (factory method)
     * @param content Response content
     * @param tokenUsage Token usage
     * @param finishReason Finish reason
     * @return Response instance
     */
    public static <T> Response<T> from(T content, TokenUsage tokenUsage, FinishReason finishReason);

    /**
     * Create response with all fields (factory method)
     * @param content Response content
     * @param tokenUsage Token usage
     * @param finishReason Finish reason
     * @param metadata Additional metadata
     * @return Response instance
     */
    public static <T> Response<T> from(
        T content,
        TokenUsage tokenUsage,
        FinishReason finishReason,
        Map<String, Object> metadata
    );
}

Thread Safety

  • Response<T> instances are immutable and thread-safe if the content type T is also immutable.
  • If T is mutable, the Response wrapper itself doesn't provide thread safety guarantees for the content.
  • The metadata map, if provided, is stored as-is. Use immutable maps or concurrent maps if sharing across threads.

Common Pitfalls

  • Null content: The content field can be null. Always check before accessing or casting.
  • Mutable content risk: If T is a mutable type and you pass it to Response, external modifications will affect the Response content. Use defensive copying if needed.
  • Mutable metadata map: Passing a mutable map to metadata allows external modification. Use Map.copyOf() for immutability.
  • Factory method vs constructor: Prefer factory methods (from()) over constructors for better readability and future extensibility.
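
To illustrate the mutable-metadata pitfall above, here is a minimal sketch of defensive copying with `Map.copyOf()`. `DefensiveResponse` is a simplified stand-in written for this example, not the real langchain4j `Response<T>` class:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for Response<T> (not the real langchain4j class),
// showing how a defensive copy of the metadata map prevents external mutation.
public class DefensiveResponse<T> {
    private final T content;
    private final Map<String, Object> metadata;

    public DefensiveResponse(T content, Map<String, Object> metadata) {
        this.content = content;
        // Map.copyOf() takes an immutable snapshot, so later changes to the
        // caller's map cannot leak into this response.
        this.metadata = metadata == null ? Map.of() : Map.copyOf(metadata);
    }

    public T content() { return content; }
    public Map<String, Object> metadata() { return metadata; }

    public static void main(String[] args) {
        Map<String, Object> meta = new HashMap<>();
        meta.put("provider", "mock");
        DefensiveResponse<String> response = new DefensiveResponse<>("hello", meta);

        meta.put("provider", "tampered"); // external mutation after construction

        // The snapshot taken at construction time is unaffected.
        System.out.println(response.metadata().get("provider")); // prints "mock"
    }
}
```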

Edge Cases

  • Null tokenUsage and finishReason: These fields are optional and may be null, especially for non-LLM model types (embeddings, moderation).
  • Empty metadata: If no metadata is provided, the map may be null or empty depending on the constructor used. Always null-check before accessing.
  • Generic type erasure: At runtime, the generic type T is erased. You cannot perform instanceof checks on the generic type.

Performance Notes

  • Lightweight wrapper: Response<T> has minimal overhead beyond the contained objects.
  • Metadata map copying: If you pass a large metadata map, consider its memory footprint. Avoid excessive metadata for high-throughput scenarios.

Cost Considerations

  • TokenUsage tracking: Use tokenUsage() to track costs across different model types (chat, embeddings, etc.). Aggregate token usage for billing.

Exception Handling

  • NullPointerException risk: Accessing properties of null tokenUsage(), finishReason(), or metadata() will throw NPE. Always null-check optional fields.
  • ClassCastException: If you retrieve metadata values with incorrect types, casting may fail. Use safe type checking.

Related APIs

  • ChatResponse - Specialized response type for chat models
  • TokenUsage - Token consumption details
  • FinishReason - Completion status

TokenUsage

Tracks token consumption for model calls.

package dev.langchain4j.model.output;

/**
 * Tracks token usage for model calls
 */
public class TokenUsage {
    /**
     * Create empty token usage
     */
    public TokenUsage();

    /**
     * Create with input token count
     * @param inputTokenCount Input tokens used
     */
    public TokenUsage(Integer inputTokenCount);

    /**
     * Create with input and output counts
     * @param inputTokenCount Input tokens used
     * @param outputTokenCount Output tokens generated
     */
    public TokenUsage(Integer inputTokenCount, Integer outputTokenCount);

    /**
     * Create with all counts
     * @param inputTokenCount Input tokens used
     * @param outputTokenCount Output tokens generated
     * @param totalTokenCount Total tokens
     */
    public TokenUsage(Integer inputTokenCount, Integer outputTokenCount, Integer totalTokenCount);

    /**
     * Get input token count
     * @return Input tokens used
     */
    public Integer inputTokenCount();

    /**
     * Get output token count
     * @return Output tokens generated
     */
    public Integer outputTokenCount();

    /**
     * Get total token count
     * @return Total tokens
     */
    public Integer totalTokenCount();

    /**
     * Add two token usages together
     * @param first First token usage
     * @param second Second token usage
     * @return Combined token usage
     */
    public static TokenUsage sum(TokenUsage first, TokenUsage second);

    /**
     * Add another token usage to this one
     * @param that Token usage to add
     * @return Combined token usage
     */
    public TokenUsage add(TokenUsage that);
}

Thread Safety

  • TokenUsage instances are immutable and thread-safe.
  • The add() and sum() methods return new instances rather than modifying existing ones.
  • Safe to share across threads and accumulate from multiple concurrent requests.

Common Pitfalls

  • Null token counts: Individual count fields (input, output, total) may be null if not provided by the provider. Always null-check before arithmetic operations.
  • totalTokenCount inconsistency: Some providers set totalTokenCount independently rather than as inputTokenCount + outputTokenCount. Do not assume equality.
  • Adding null usages: Calling sum(null, usage) or usage.add(null) handles nulls gracefully (treats null as zero), but be aware of this behavior.
  • Integer overflow: For extremely long-running applications with millions of tokens, Integer overflow is theoretically possible. Consider using Long for aggregation if needed.
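
The null-safety and overflow pitfalls above can be sketched with a small aggregator that treats null counts as zero (mirroring the documented `add()`/`sum()` behavior) and accumulates into `long` fields. This is an illustrative stand-in, not the real langchain4j `TokenUsage`:

```java
// Null-safe token aggregation sketch (simplified stand-in, not the real
// langchain4j TokenUsage). Uses long accumulators to avoid Integer overflow
// when summing usage across many requests.
public class TokenTally {
    private long inputTokens;
    private long outputTokens;

    // Treat a null count the same way TokenUsage.add() does: as zero.
    private static long zeroIfNull(Integer count) {
        return count == null ? 0L : count.longValue();
    }

    public void add(Integer inputTokenCount, Integer outputTokenCount) {
        inputTokens += zeroIfNull(inputTokenCount);
        outputTokens += zeroIfNull(outputTokenCount);
    }

    public long inputTokens() { return inputTokens; }
    public long outputTokens() { return outputTokens; }
    public long totalTokens() { return inputTokens + outputTokens; }

    public static void main(String[] args) {
        TokenTally tally = new TokenTally();
        tally.add(100, 50);
        tally.add(null, 25);  // provider omitted the input count
        System.out.println(tally.totalTokens()); // prints 175
    }
}
```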

Edge Cases

  • Zero tokens: A response with zero output tokens (e.g., immediate stop, content filter) will have outputTokenCount = 0, not null.
  • Provider estimation: Some providers estimate token counts rather than computing exact values. Counts may vary slightly between calls with identical inputs.
  • Streaming vs non-streaming: Token counts should be identical for streaming and non-streaming calls, but some providers may compute them differently.

Performance Notes

  • Immutable arithmetic: Each add() or sum() call allocates a new TokenUsage instance. For high-frequency aggregation, this creates GC pressure. Consider batching additions.
  • Null checks: The add() and sum() methods perform null checks internally. Minimize calls in hot loops.

Cost Considerations

  • Input vs output token pricing: Output tokens typically cost 2-4x more than input tokens. Track separately for accurate cost estimation.
  • Total token budgets: Use totalTokenCount() for high-level budget enforcement. Track cumulative usage across conversation turns.
  • Provider-specific costs: Token counts are provider-agnostic, but costs vary widely. Always multiply by provider-specific pricing.
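
A minimal cost-estimation sketch following the guidance above. The per-token prices are hypothetical placeholders; substitute your provider's actual published pricing:

```java
// Cost-estimation sketch. The prices below are hypothetical placeholders,
// not real provider pricing.
public class CostEstimator {
    // Hypothetical prices in USD per 1,000 tokens.
    static final double INPUT_PRICE_PER_1K = 0.0005;
    static final double OUTPUT_PRICE_PER_1K = 0.0015; // output often costs 2-4x more

    static double estimateUsd(long inputTokens, long outputTokens) {
        return inputTokens / 1000.0 * INPUT_PRICE_PER_1K
             + outputTokens / 1000.0 * OUTPUT_PRICE_PER_1K;
    }

    public static void main(String[] args) {
        // e.g. 12,000 input tokens and 3,000 output tokens over a session
        System.out.printf("estimated cost: $%.4f%n", estimateUsd(12_000, 3_000));
    }
}
```

Track input and output counts separately (via `inputTokenCount()` and `outputTokenCount()`) rather than relying on the total, since the two sides are priced differently.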

Exception Handling

  • NullPointerException: Accessing fields on a null TokenUsage instance throws NPE. Always check for null TokenUsage before accessing counts.
  • Unboxing null counts: The add() and sum() methods treat null as zero, so they do not throw. However, doing arithmetic on the raw counts yourself (e.g., usage.inputTokenCount() + 1) throws NullPointerException when a count is null.

Related APIs

  • ChatResponse.tokenUsage() - Get token usage from chat response
  • Response.tokenUsage() - Get token usage from generic response
  • ChatResponseMetadata - May contain additional token breakdown details

FinishReason

Enum indicating why model generation stopped.

package dev.langchain4j.model.output;

/**
 * Indicates why model generation stopped
 */
public enum FinishReason {
    /**
     * Model decided the response was complete
     */
    STOP,

    /**
     * Maximum token length was reached
     */
    LENGTH,

    /**
     * Model is requesting tool execution
     */
    TOOL_EXECUTION,

    /**
     * Content was filtered
     */
    CONTENT_FILTER,

    /**
     * Other reason
     */
    OTHER
}

Thread Safety

  • Enum instances are inherently thread-safe. Can be safely shared and compared across threads.

Common Pitfalls

  • Null finish reason: ChatResponse.finishReason() may return null for some providers. Always null-check before switching on the enum.
  • Assuming STOP means success: STOP indicates natural completion, but the response quality may still be poor (e.g., refusal, hallucination).
  • LENGTH vs STOP confusion: LENGTH means the model had more to say but was cut off. Consider retrying with higher maxOutputTokens or prompting for continuation.
  • TOOL_EXECUTION handling: When finish reason is TOOL_EXECUTION, you MUST execute the requested tools and submit results back to the model. Ignoring this breaks the agent loop.
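
The pitfalls above suggest a dispatch pattern like the following. The enum here is a local stand-in mirroring the documented `FinishReason` values (not the real langchain4j type), with an explicit null branch and a `default` case:

```java
// Finish-reason dispatch sketch using a local enum that mirrors the
// documented FinishReason values (stand-in, not the real langchain4j enum).
public class FinishReasonHandler {
    enum Reason { STOP, LENGTH, TOOL_EXECUTION, CONTENT_FILTER, OTHER }

    static String handle(Reason reason) {
        if (reason == null) {
            // Some providers return no finish reason at all.
            return "no finish reason reported; treat as complete";
        }
        switch (reason) {
            case STOP:
                return "natural completion";
            case LENGTH:
                return "truncated; raise maxOutputTokens or prompt for continuation";
            case TOOL_EXECUTION:
                return "execute requested tools and send results back to the model";
            case CONTENT_FILTER:
                return "blocked by safety filter; log and surface a fallback message";
            default:
                // Covers OTHER and any values added in future versions.
                return "provider-specific reason; inspect response metadata";
        }
    }

    public static void main(String[] args) {
        System.out.println(handle(Reason.LENGTH));
    }
}
```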

Edge Cases

  • CONTENT_FILTER: Indicates provider safety filters blocked the response. The response content may be empty or a refusal message. Log these for monitoring.
  • OTHER: Catch-all for provider-specific reasons. Check ChatResponseMetadata for provider details.
  • Multiple tool calls with TOOL_EXECUTION: A single response may request multiple tool calls, all indicated by one TOOL_EXECUTION finish reason.

Performance Notes

  • Enum comparison: Enum equality checks are fast (reference comparison). Use == rather than .equals() for efficiency.

Cost Considerations

  • LENGTH finish reason cost implications: Responses terminated by LENGTH still consume full maxOutputTokens budget. Adjust limits to prevent wasteful token usage.

Exception Handling

  • Switch exhaustiveness: When switching on FinishReason, always include a default case to handle future enum additions or the OTHER case.

Related APIs

  • ChatResponse.finishReason() - Get finish reason from response
  • Response.finishReason() - Get finish reason from generic response

ToolChoice

Enum controlling whether and when the model should use tools.

package dev.langchain4j.model.chat.request;

/**
 * Controls tool usage behavior
 */
public enum ToolChoice {
    /**
     * Model can choose whether to use tools
     */
    AUTO,

    /**
     * Model must use one or more tools
     */
    REQUIRED,

    /**
     * Model cannot use tools
     */
    NONE
}

Thread Safety

  • Enum instances are thread-safe. Can be safely shared across threads.

Common Pitfalls

  • REQUIRED without tools: Setting ToolChoice.REQUIRED but providing empty or null toolSpecifications causes request failures. Always provide at least one tool.
  • NONE with tools provided: Setting ToolChoice.NONE while providing toolSpecifications wastes input tokens. The model ignores the tools but they still consume tokens.
  • AUTO does not guarantee tool use: AUTO allows but does not force tool usage. The model may choose to respond without tools even if they are available.
  • Provider support: Not all providers support all ToolChoice values. Some providers may ignore REQUIRED and treat it as AUTO.

Edge Cases

  • REQUIRED with single tool: When REQUIRED is set with exactly one tool, the model must call that specific tool. This effectively forces function calling.
  • REQUIRED with multiple tools: The model must call at least one tool, but can choose which one(s). It may call multiple tools in one response.
  • Switching mid-conversation: You can change ToolChoice between turns. For example, use REQUIRED for the first turn to force data gathering, then AUTO for follow-up.
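
The turn-switching pattern above can be sketched as a small policy function. The enum is a local stand-in mirroring the documented `ToolChoice` values, not the real langchain4j type:

```java
// Turn-based ToolChoice selection sketch (local enum stand-in for the
// documented ToolChoice values).
public class ToolChoicePolicy {
    enum ToolChoice { AUTO, REQUIRED, NONE }

    // Force tool use on the first turn to gather data, then let the model decide.
    static ToolChoice forTurn(int turnIndex, boolean toolsAvailable) {
        if (!toolsAvailable) {
            return ToolChoice.NONE; // REQUIRED with no tools would fail the request
        }
        return turnIndex == 0 ? ToolChoice.REQUIRED : ToolChoice.AUTO;
    }

    public static void main(String[] args) {
        System.out.println(forTurn(0, true));  // REQUIRED
        System.out.println(forTurn(1, true));  // AUTO
        System.out.println(forTurn(2, false)); // NONE
    }
}
```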

Performance Notes

  • NONE eliminates tool processing: Setting NONE skips tool evaluation logic in the model, slightly reducing latency.
  • REQUIRED may increase latency: Forcing tool use makes the model reason about tool selection and emit structured arguments, which can add latency compared to a plain text response.

Cost Considerations

  • Tool specifications token cost: Even with ToolChoice.NONE, if you include toolSpecifications in the request, they consume input tokens. Remove them entirely when not needed.
  • REQUIRED forces tool calls: Tool call outputs consume tokens (potentially many for complex function arguments). Use REQUIRED judiciously to avoid unnecessary token usage.

Exception Handling

  • IllegalArgumentException: Some providers throw exceptions if REQUIRED is used with unsupported model versions or empty tool lists.
  • Provider fallback: If a provider doesn't support REQUIRED, it may silently fall back to AUTO. Check provider documentation.

Related APIs

  • ChatRequest.toolChoice() - Set tool choice on request
  • ChatRequestParameters.toolChoice() - Set default tool choice
  • ToolSpecification - Tool definitions (see tools.md)

ResponseFormat

Specifies the format of model responses (text or JSON schema).

package dev.langchain4j.model.chat.request;

import dev.langchain4j.model.chat.request.json.JsonSchema;

/**
 * Specifies response format
 */
public class ResponseFormat {
    /**
     * Text response format (default)
     */
    public static final ResponseFormat TEXT;

    /**
     * JSON response format
     */
    public static final ResponseFormat JSON;

    /**
     * Get response format type
     * @return Response format type
     */
    public ResponseFormatType type();

    /**
     * Get JSON schema (if type is JSON)
     * @return JSON schema or null
     */
    public JsonSchema jsonSchema();

    /**
     * Create builder
     * @return Builder instance
     */
    public static Builder builder();
}

Thread Safety

  • ResponseFormat instances are immutable and thread-safe. Can be safely shared across threads.
  • The predefined constants TEXT and JSON are safe to use concurrently.

Common Pitfalls

  • JSON format without schema: Using ResponseFormat.JSON without a schema provides unstructured JSON. For structured output, provide a JsonSchema via the builder.
  • Schema complexity: Overly complex JSON schemas (deep nesting, many fields) may confuse the model or cause validation failures. Keep schemas simple.
  • Provider support: Not all providers support JSON response format. Check provider capabilities before using.
  • JSON parsing failures: Even with JSON format, the model may occasionally generate invalid JSON. Always validate and handle parse errors.

Edge Cases

  • JSON format with TEXT type: Do NOT manually set type = TEXT with a jsonSchema. This is inconsistent and may cause errors.
  • Schema-based validation: Some providers validate the response against the schema server-side; others don't. Client-side validation is recommended.
  • Empty JSON objects: The model may return {} if it cannot generate valid content matching the schema. Handle empty responses.

Performance Notes

  • JSON format latency: JSON responses may have slightly higher latency due to schema processing and validation.
  • Schema token cost: Complex JSON schemas consume input tokens. Minimize schema verbosity for cost efficiency.

Cost Considerations

  • Schema in every request: The JSON schema is sent with every request as part of the input tokens. For multi-turn conversations, this adds up quickly.
  • JSON responses are typically longer: JSON with field names and structure may consume more output tokens than plain text for equivalent information.

Exception Handling

  • IllegalArgumentException: Thrown by builder if both TEXT type and JSON schema are specified simultaneously.
  • JsonParseException: Downstream JSON parsing of the response content may fail if the model generates malformed JSON despite JSON format being specified.

Related APIs

  • ChatRequest.responseFormat() - Set response format on request
  • ChatRequestParameters.responseFormat() - Set default response format
  • JsonSchema - Define expected JSON structure
  • ResponseFormatType - Enum for format type

ResponseFormatType

package dev.langchain4j.model.chat.request;

/**
 * Type of response format
 */
public enum ResponseFormatType {
    TEXT,
    JSON
}

Thread Safety

  • Enum instances are thread-safe. Can be safely shared across threads.

Common Pitfalls

  • Direct usage: Typically, you should use the ResponseFormat.TEXT or ResponseFormat.JSON constants rather than constructing ResponseFormat with this enum directly.

Exception Handling

  • Switch exhaustiveness: Always include a default case when switching on ResponseFormatType to handle potential future additions.

Related APIs

  • ResponseFormat.type() - Get the format type
  • ResponseFormat - Response format wrapper

Parameter Tuning Guide

Understanding how to tune parameters for optimal results is crucial for production deployments.

Temperature

  • Range: 0.0 (deterministic) to 2.0 (highly random)
  • Recommended values:
    • 0.0-0.3: Factual Q&A, code generation, structured data extraction
    • 0.4-0.7: General conversation, balanced creativity
    • 0.8-1.2: Creative writing, brainstorming
    • 1.3-2.0: Experimental, highly varied outputs (rarely useful in production)
  • Tuning tip: Start at 0.7 and adjust based on output variability needs. Increment by 0.1.

TopP (Nucleus Sampling)

  • Range: 0.0 to 1.0
  • Recommended values:
    • 0.1-0.3: Very focused, deterministic-like outputs
    • 0.4-0.7: Balanced, most common for production
    • 0.8-0.95: More variety while maintaining quality
    • 0.95-1.0: Maximum diversity
  • Tuning tip: Use topP OR temperature, not both. TopP is generally more controllable. Start at 0.9.
  • Note: Do NOT set both topP and topK unless your provider explicitly supports combined sampling.

TopK

  • Range: 1 to vocabulary size (typically 10-100 used)
  • Recommended values:
    • 1-10: Highly constrained, deterministic
    • 20-40: Standard range for most tasks
    • 50-100: More creative outputs
  • Tuning tip: Less common than topP. Use for providers that don't support topP well. Start at 40.

Frequency Penalty

  • Range: -2.0 to 2.0 (provider-dependent, often 0.0 to 2.0)
  • Recommended values:
    • 0.0: No penalty (default)
    • 0.1-0.5: Slight reduction of repetition
    • 0.6-1.0: Moderate reduction, good for varied outputs
    • 1.0-2.0: Strong penalty, may produce unnatural text
  • Tuning tip: Use to combat repetitive responses. Start at 0.3 and increase if repetition persists.

Presence Penalty

  • Range: -2.0 to 2.0 (provider-dependent, often 0.0 to 2.0)
  • Recommended values:
    • 0.0: No penalty (default)
    • 0.1-0.5: Encourage new topics
    • 0.6-1.0: Strong encouragement for topic diversity
    • 1.0-2.0: May produce disjointed responses
  • Tuning tip: Use to encourage the model to explore new topics. Combine with frequency penalty for best results. Start at 0.3.

MaxOutputTokens

  • Range: 1 to model-specific maximum (typically 1024-32768)
  • Recommended values:
    • 100-200: Short answers, classifications
    • 300-500: Paragraph responses
    • 500-1000: Detailed explanations
    • 1000-4000: Long-form content, articles
    • 4000+: Very long documents (consider chunking)
  • Tuning tip: Set conservatively to control costs. The model stops at natural completion even if limit isn't reached. Monitor finishReason for LENGTH to detect truncation.

Stop Sequences

  • Use cases:
    • Structured output termination: Use custom markers like "###", "---" to signal sections
    • Dialog termination: Use "\nUser:", "\nAssistant:" to prevent run-on conversations
    • JSON completion: Use "}]" or "}" to stop at structure boundaries
  • Tuning tip: Limit to 4-6 sequences (provider limits). Test to ensure they don't trigger prematurely.

Tool Configuration

  • Tool specifications count: Minimize to essential tools. Each tool adds ~50-200 input tokens.
  • ToolChoice tuning:
    • Use AUTO for most scenarios
    • Use REQUIRED when you need guaranteed function calling (e.g., first turn of agent loop)
    • Use NONE when tools are not relevant for a specific turn (saves tokens)

Response Format

  • TEXT vs JSON: Use JSON only when structured output is critical. JSON adds token overhead and latency.
  • JSON schema simplicity: Keep schemas flat when possible. Deep nesting increases token costs and error rates.

Streaming Best Practices

Connection Management

  • Timeout configuration: Set appropriate read timeouts for streaming connections (recommended: 30-60 seconds for text, up to 5 minutes for long-form generation).
  • Retry logic: Implement exponential backoff for connection failures. Do NOT retry immediately on network errors.
  • Connection pooling: Reuse HTTP clients and connection pools across requests to minimize overhead.

Handler Implementation

  • Offload heavy processing: Use a separate thread or executor for expensive operations triggered by streaming callbacks.
  • Buffer management: Accumulate partial responses in a StringBuilder with reasonable initial capacity (recommended: 1024 characters).
  • Error recovery: Implement onError() to log failures, clean up state, and optionally retry.

User Experience

  • Immediate feedback: Display the first token within 500ms to provide responsiveness feedback to users.
  • Smooth rendering: Batch partial responses for UI updates (e.g., every 50ms or every 5 tokens) to avoid excessive reflows.
  • Progress indicators: Show token count or time elapsed during streaming for long responses.
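
The batching guidance above can be sketched as a small accumulator that flushes to the UI every N tokens (a time-based flush works the same way). This is an illustrative pattern, not a langchain4j API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// UI-update batching sketch: accumulate partial tokens and flush to the UI
// every N tokens, so the UI repaints a few times per second instead of once
// per token. Illustrative pattern, not a langchain4j API.
public class PartialResponseBatcher {
    private final StringBuilder buffer = new StringBuilder(1024);
    private final int flushEveryTokens;
    private final Consumer<String> uiUpdate;
    private int tokensSinceFlush;

    public PartialResponseBatcher(int flushEveryTokens, Consumer<String> uiUpdate) {
        this.flushEveryTokens = flushEveryTokens;
        this.uiUpdate = uiUpdate;
    }

    // Call from the streaming handler's onPartialResponse() callback.
    public void onPartialResponse(String partial) {
        buffer.append(partial);
        if (++tokensSinceFlush >= flushEveryTokens) {
            flush();
        }
    }

    // Call from onCompleteResponse() / onError() so no text is left behind.
    public void flush() {
        if (buffer.length() > 0) {
            uiUpdate.accept(buffer.toString());
            buffer.setLength(0);
        }
        tokensSinceFlush = 0;
    }

    public static void main(String[] args) {
        List<String> uiPaints = new ArrayList<>();
        PartialResponseBatcher batcher = new PartialResponseBatcher(3, uiPaints::add);
        for (String token : new String[] {"Hel", "lo ", "wor", "ld", "!"}) {
            batcher.onPartialResponse(token);
        }
        batcher.flush(); // stream finished
        System.out.println(uiPaints); // two UI updates instead of five
    }
}
```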

State Management

  • Thread-safe accumulation: Use ConcurrentHashMap or synchronized collections if updating shared state from handler callbacks.
  • Cleanup on completion: Clear accumulated state in both onCompleteResponse() and onError() to prevent memory leaks.

Monitoring and Debugging

  • Token latency tracking: Measure time-to-first-token (TTFT) and tokens-per-second (TPS) for performance monitoring.
  • Partial response logging: Log partial responses at debug level for troubleshooting streaming issues.
  • Error categorization: Distinguish between network errors, provider errors, and application errors in onError() handling.
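
A minimal TTFT/TPS measurement sketch to accompany the monitoring points above. Timestamps are passed in explicitly so the logic is testable; in production you would capture `System.nanoTime()` inside your streaming handler callbacks:

```java
// Time-to-first-token / tokens-per-second measurement sketch for streaming
// callbacks (illustrative; wire it into your own handler).
public class StreamingMetrics {
    private final long startNanos;
    private long firstTokenNanos = -1;
    private long tokenCount;

    public StreamingMetrics(long startNanos) { this.startNanos = startNanos; }

    // Call once per partial response, passing the current System.nanoTime().
    public void onToken(long nowNanos) {
        if (firstTokenNanos < 0) {
            firstTokenNanos = nowNanos;
        }
        tokenCount++;
    }

    public double ttftMillis() {
        return (firstTokenNanos - startNanos) / 1_000_000.0;
    }

    public double tokensPerSecond(long endNanos) {
        double seconds = (endNanos - startNanos) / 1_000_000_000.0;
        return seconds > 0 ? tokenCount / seconds : 0;
    }

    public static void main(String[] args) {
        // Simulated timeline: stream starts at t=0, first token at 200ms,
        // 50 tokens total, stream ends at 2s.
        StreamingMetrics metrics = new StreamingMetrics(0);
        for (int i = 0; i < 50; i++) {
            metrics.onToken(200_000_000L + i * 36_000_000L);
        }
        System.out.printf("TTFT: %.0f ms, TPS: %.1f%n",
            metrics.ttftMillis(), metrics.tokensPerSecond(2_000_000_000L));
    }
}
```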

Testing Patterns

Unit Testing Requests

// Test request builder validation
@Test
void testChatRequestRequiresMessages() {
    assertThrows(IllegalArgumentException.class, () -> {
        ChatRequest.builder().build();
    });
}

// Test parameter precedence
@Test
void testRequestParameterOverride() {
    ChatRequestParameters defaults = ChatRequestParameters.builder()
        .temperature(0.5)
        .build();

    ChatRequest request = ChatRequest.builder()
        .messages(userMessage("test"))
        .parameters(defaults)
        .temperature(0.9)  // Should override parameters
        .build();

    assertEquals(0.9, request.temperature());
}

Unit Testing Responses

// Test response parsing
@Test
void testChatResponseWithTokenUsage() {
    ChatResponse response = ChatResponse.builder()
        .aiMessage(new AiMessage("test response"))
        .tokenUsage(new TokenUsage(10, 20, 30))
        .finishReason(FinishReason.STOP)
        .build();

    assertEquals(30, response.tokenUsage().totalTokenCount());
    assertEquals(FinishReason.STOP, response.finishReason());
}

Mock Streaming Handlers

// Mock handler for testing streaming logic
public class TestStreamingHandler implements StreamingChatResponseHandler {
    private final StringBuilder accumulated = new StringBuilder();
    private ChatResponse completeResponse;
    private Throwable error;

    @Override
    public void onPartialResponse(String partialResponse) {
        accumulated.append(partialResponse);
    }

    @Override
    public void onCompleteResponse(ChatResponse response) {
        this.completeResponse = response;
    }

    @Override
    public void onError(Throwable error) {
        this.error = error;
    }

    public String getAccumulatedText() {
        return accumulated.toString();
    }

    public ChatResponse getCompleteResponse() {
        return completeResponse;
    }

    public Throwable getError() {
        return error;
    }
}

@Test
void testStreamingCompletion() {
    TestStreamingHandler handler = new TestStreamingHandler();
    // Simulate streaming
    handler.onPartialResponse("Hello ");
    handler.onPartialResponse("world");
    handler.onCompleteResponse(mockResponse());

    assertEquals("Hello world", handler.getAccumulatedText());
    assertNotNull(handler.getCompleteResponse());
}

Integration Testing with Mocks

// Mock chat model for integration tests
public class MockChatModel implements ChatModel {
    private final String mockedResponse;

    public MockChatModel(String mockedResponse) {
        this.mockedResponse = mockedResponse;
    }

    @Override
    public ChatResponse chat(ChatRequest request) {
        return ChatResponse.builder()
            .aiMessage(new AiMessage(mockedResponse))
            .tokenUsage(new TokenUsage(10, mockedResponse.length(), 10 + mockedResponse.length()))
            .finishReason(FinishReason.STOP)
            .build();
    }
}

@Test
void testChatWorkflow() {
    ChatModel model = new MockChatModel("Mocked AI response");
    ChatRequest request = ChatRequest.builder()
        .messages(userMessage("test"))
        .build();

    ChatResponse response = model.chat(request);
    assertEquals("Mocked AI response", response.aiMessage().text());
}

Testing Token Usage Aggregation

@Test
void testTokenUsageSum() {
    TokenUsage usage1 = new TokenUsage(100, 50);
    TokenUsage usage2 = new TokenUsage(200, 75);

    TokenUsage total = TokenUsage.sum(usage1, usage2);

    assertEquals(300, total.inputTokenCount());
    assertEquals(125, total.outputTokenCount());
}

@Test
void testTokenUsageSumWithNulls() {
    TokenUsage usage1 = new TokenUsage(100, 50);

    TokenUsage total = TokenUsage.sum(usage1, null);

    assertEquals(100, total.inputTokenCount());
    assertEquals(50, total.outputTokenCount());
}

Testing Finish Reasons

@Test
void testFinishReasonHandling() {
    ChatResponse response = mockResponseWithFinishReason(FinishReason.LENGTH);

    if (response.finishReason() == FinishReason.LENGTH) {
        // Handle truncation
        assertTrue(response.aiMessage().text().length() > 0);
    }
}

@Test
void testToolExecutionFinishReason() {
    ChatResponse response = mockResponseWithFinishReason(FinishReason.TOOL_EXECUTION);

    assertEquals(FinishReason.TOOL_EXECUTION, response.finishReason());
    assertFalse(response.aiMessage().toolExecutionRequests().isEmpty());
}

Related APIs Cross-Reference

From ChatRequest

  • Messages: See messages.md for ChatMessage, UserMessage, SystemMessage, AiMessage
  • Tools: See tools.md for ToolSpecification, ToolExecutionRequest
  • Models: See chat-models.md for ChatModel, StreamingChatModel

From ChatResponse

  • Messages: See messages.md for AiMessage, ToolExecutionResultMessage
  • Metadata: See chat-models.md for provider-specific metadata details

From StreamingChatResponseHandler

  • Partial types: See streaming documentation for PartialResponse, PartialThinking, PartialToolCall
  • Context types: See streaming documentation for PartialResponseContext, PartialThinkingContext, PartialToolCallContext

From TokenUsage

  • Cost calculation: See provider documentation for token pricing details
  • Embeddings: See embeddings.md for embedding model token usage

From ToolChoice

  • Tool specifications: See tools.md for ToolSpecification and tool definition best practices
  • Agent loops: See agents.md for agent execution patterns with tool calls

From ResponseFormat

  • JSON schemas: See structured-output.md for JsonSchema definition and usage
  • Structured output: See structured-output.md for advanced structured output patterns

Production Checklist

Before Deployment

  • Tune parameters (temperature, maxOutputTokens) for your use case
  • Set conservative maxOutputTokens to control costs
  • Implement token usage logging and monitoring
  • Configure timeouts for streaming connections
  • Add retry logic with exponential backoff
  • Test error handling paths (network failures, provider errors)
  • Validate JSON schema responses if using ResponseFormat.JSON
  • Implement rate limiting to respect provider quotas

Monitoring in Production

  • Track token usage per request and cumulative
  • Monitor finish reason distribution (watch for LENGTH, CONTENT_FILTER spikes)
  • Measure time-to-first-token (TTFT) for streaming
  • Alert on elevated error rates
  • Log full request/response pairs at debug level for investigation
  • Track cost per user/session for budget enforcement

Cost Optimization

  • Use stop sequences to terminate early when possible
  • Minimize tool specifications to essential ones
  • Remove unnecessary system messages
  • Use streaming to provide early feedback without increasing costs
  • Batch requests where applicable
  • Cache responses for repeated queries

Security and Reliability

  • Validate all user-provided content before including in messages
  • Sanitize tool call arguments before execution
  • Implement circuit breakers for provider outages
  • Use exponential backoff with jitter for retries
  • Handle CONTENT_FILTER finish reason gracefully
  • Audit and rotate API keys regularly
