tessl/maven-dev-langchain4j--langchain4j

Build LLM-powered applications in Java with support for chatbots, agents, RAG, tools, and much more

AI Services

High-level API for creating AI-powered services by defining Java interfaces. AiServices provides implementations that automatically handle chat models, streaming, memory management, RAG, tools, guardrails, and various output types.

Capabilities

AiServices Class

The primary entry point for building AI services from Java interfaces.

package dev.langchain4j.service;

/**
 * Abstract class for building AI services from Java interfaces.
 * Supports system/user message templates, chat memory, RAG, tools, streaming,
 * moderation, and various return types.
 */
public abstract class AiServices<T> {
    /**
     * Create a simple AI service with a chat model
     * @param aiService Interface defining the AI service API
     * @param chatModel Chat model to use
     * @return Implementation of the AI service interface
     */
    public static <T> T create(Class<T> aiService, ChatModel chatModel);

    /**
     * Create a simple AI service with a streaming chat model
     * @param aiService Interface defining the AI service API
     * @param streamingChatModel Streaming chat model to use
     * @return Implementation of the AI service interface
     */
    public static <T> T create(Class<T> aiService, StreamingChatModel streamingChatModel);

    /**
     * Begin building an AI service with full configuration options
     * @param aiService Interface defining the AI service API
     * @return Builder for configuring the AI service
     */
    public static <T> AiServices<T> builder(Class<T> aiService);
}
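A minimal usage sketch of the `create()` entry point (the interface name, model provider, and model settings below are illustrative assumptions, not prescribed by the API):

```java
import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.openai.OpenAiChatModel;
import dev.langchain4j.service.AiServices;

public class CreateExample {

    // The service contract is a plain Java interface;
    // AiServices generates the implementation as a dynamic proxy.
    interface Assistant {
        String chat(String userMessage);
    }

    public static void main(String[] args) {
        // Any ChatModel implementation works; OpenAiChatModel is just one example.
        ChatModel model = OpenAiChatModel.builder()
                .apiKey(System.getenv("OPENAI_API_KEY"))
                .modelName("gpt-4o-mini")
                .build();

        Assistant assistant = AiServices.create(Assistant.class, model);
        System.out.println(assistant.chat("Hello!")); // each call hits the underlying model
    }
}
```

The returned `assistant` proxy is thread-safe and, per the performance notes below, should be created once and reused.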

Thread Safety

  • The AiServices.create() and AiServices.builder().build() methods return thread-safe proxy instances
  • Multiple threads can safely invoke methods on the same AI service instance concurrently
  • ChatMemory: When using shared chatMemory(), ensure the ChatMemory implementation is thread-safe (MessageWindowChatMemory is thread-safe)
  • ChatMemoryProvider: When using chatMemoryProvider(), the provider itself must be thread-safe and return thread-safe memory instances per memoryId
  • Tools: Tool objects passed to tools() must be thread-safe if concurrent invocations are expected
  • Listeners: All registered listeners must be thread-safe as they may be invoked from multiple threads

Common Pitfalls

  • DO NOT pass null for the interface class - throws IllegalArgumentException
  • DO NOT use non-interface classes - only Java interfaces are supported
  • DO NOT pass a null model to create() - each overload requires a non-null ChatModel or StreamingChatModel
  • DO NOT forget to call build() on the builder - the builder itself is not an AI service instance
  • DO NOT reuse builder instances across multiple threads - builders are not thread-safe
  • DO NOT mutate tool objects after passing them to the builder - this can cause unpredictable behavior

Edge Cases

  • If the interface has no methods, the service builds successfully but has no functionality
  • If the interface extends other interfaces, all inherited methods are supported
  • Default methods in the interface are ignored - only abstract methods are implemented
  • Generic type parameters in the interface are supported but type erasure applies at runtime
  • If interface methods have conflicting annotations (@SystemMessage at both type and method level), method-level takes precedence

Performance Notes

  • Creating an AI service instance is expensive (reflection, proxy generation) - reuse instances when possible
  • Consider caching AI service instances in singleton or application scope
  • Builder configuration is validated at build() time, not during individual setter calls
  • The proxy implementation uses JDK dynamic proxies, which have minimal overhead per invocation

Cost Considerations

  • Each method invocation typically results in one or more API calls to the underlying chat model
  • Token usage depends on: system message size, user message size, chat history, RAG content, and tool definitions
  • Using chatMemory() or chatMemoryProvider() increases token usage with each additional message in history
  • RAG integration (contentRetriever()) adds retrieved content tokens to each request
  • Tools add function definitions to every request, increasing input token count
  • Streaming does not reduce token costs - it only changes the response delivery mechanism

Exception Handling

  • IllegalArgumentException: Thrown if aiService is null, not an interface, or if chatModel/streamingChatModel are both null
  • IllegalConfigurationException: Thrown at build() time if configuration is invalid
  • RuntimeException: Method invocations may throw if the underlying model throws
  • Tool execution errors are handled by toolExecutionErrorHandler() if configured
  • Moderation failures throw ModerationException if @Moderate annotation is used
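The exceptions above can be handled at the call site; a sketch, assuming `assistant` is an AI service whose method carries @Moderate:

```java
import dev.langchain4j.service.ModerationException;

// ...
try {
    String answer = assistant.chat("some user input");
    System.out.println(answer);
} catch (ModerationException e) {
    // the moderation model flagged the content
    System.err.println("Input rejected: " + e.getMessage());
} catch (RuntimeException e) {
    // the underlying chat model call failed (network error, provider error, ...)
    throw e;
}
```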

Related APIs

  • Chat Models - Required for non-streaming services
  • Streaming Chat Models - Required for streaming services
  • Chat Memory - For conversation history
  • Tools - For function calling
  • RAG - For retrieval-augmented generation
  • Guardrails - For input/output validation

AiServices Builder

Complete builder API for configuring AI services with all available options.

/**
 * Builder for configuring AI services
 */
public class Builder<T> {
    /**
     * Configure chat model
     * @param chatModel Chat model to use
     * @return Builder instance
     */
    public Builder<T> chatModel(ChatModel chatModel);

    /**
     * Configure streaming chat model
     * @param streamingChatModel Streaming chat model to use
     * @return Builder instance
     */
    public Builder<T> streamingChatModel(StreamingChatModel streamingChatModel);

    /**
     * Set system message for all invocations
     * @param systemMessage System message text
     * @return Builder instance
     */
    public Builder<T> systemMessage(String systemMessage);

    /**
     * Set system message provider function
     * @param systemMessageProvider Function to provide system message
     * @return Builder instance
     */
    public Builder<T> systemMessageProvider(Function<Object, String> systemMessageProvider);

    /**
     * Set user message for all invocations
     * @param userMessage User message text
     * @return Builder instance
     */
    public Builder<T> userMessage(String userMessage);

    /**
     * Set user message provider function
     * @param userMessageProvider Function to provide user message
     * @return Builder instance
     */
    public Builder<T> userMessageProvider(Function<Object, String> userMessageProvider);

    /**
     * Set shared chat memory
     * @param chatMemory Chat memory instance
     * @return Builder instance
     */
    public Builder<T> chatMemory(ChatMemory chatMemory);

    /**
     * Set chat memory provider for per-user/conversation memory
     * @param chatMemoryProvider Chat memory provider
     * @return Builder instance
     */
    public Builder<T> chatMemoryProvider(ChatMemoryProvider chatMemoryProvider);

    /**
     * Set chat request transformer
     * @param chatRequestTransformer Transformer to modify requests
     * @return Builder instance
     */
    public Builder<T> chatRequestTransformer(UnaryOperator<ChatRequest> chatRequestTransformer);

    /**
     * Set chat request transformer with memory ID
     * @param chatRequestTransformer Transformer with memory ID parameter
     * @return Builder instance
     */
    public Builder<T> chatRequestTransformer(
        BiFunction<ChatRequest, Object, ChatRequest> chatRequestTransformer
    );

    /**
     * Set moderation model for content moderation
     * @param moderationModel Moderation model to use
     * @return Builder instance
     */
    public Builder<T> moderationModel(ModerationModel moderationModel);

    /**
     * Configure tools (objects with @Tool annotated methods)
     * @param objectsWithTools Objects containing tool methods
     * @return Builder instance
     */
    public Builder<T> tools(Object... objectsWithTools);

    /**
     * Configure tools from collection
     * @param objectsWithTools Collection of objects containing tool methods
     * @return Builder instance
     */
    public Builder<T> tools(Collection<Object> objectsWithTools);

    /**
     * Configure tools programmatically
     * @param tools Map of tool specifications to executors
     * @return Builder instance
     */
    public Builder<T> tools(Map<ToolSpecification, ToolExecutor> tools);

    /**
     * Configure tools with immediate return names
     * @param tools Map of tool specifications to executors
     * @param immediateReturnToolNames Set of tool names that return immediately
     * @return Builder instance
     */
    public Builder<T> tools(
        Map<ToolSpecification, ToolExecutor> tools,
        Set<String> immediateReturnToolNames
    );

    /**
     * Configure tool provider for dynamic tool selection
     * @param toolProvider Tool provider instance
     * @return Builder instance
     */
    public Builder<T> toolProvider(ToolProvider toolProvider);

    /**
     * Enable concurrent tool execution with default executor
     * @return Builder instance
     */
    public Builder<T> executeToolsConcurrently();

    /**
     * Enable concurrent tool execution with custom executor
     * @param executor Executor for concurrent tool execution
     * @return Builder instance
     */
    public Builder<T> executeToolsConcurrently(Executor executor);

    /**
     * Set max sequential tool invocations (default: 100)
     * @param maxSequentialToolsInvocations Maximum number of sequential tool invocations
     * @return Builder instance
     */
    public Builder<T> maxSequentialToolsInvocations(int maxSequentialToolsInvocations);

    /**
     * Set before tool execution callback
     * @param beforeToolExecution Callback to invoke before tool execution
     * @return Builder instance
     */
    public Builder<T> beforeToolExecution(Consumer<BeforeToolExecution> beforeToolExecution);

    /**
     * Set after tool execution callback
     * @param afterToolExecution Callback to invoke after tool execution
     * @return Builder instance
     */
    public Builder<T> afterToolExecution(Consumer<ToolExecution> afterToolExecution);

    /**
     * Set strategy for handling hallucinated tool names
     * @param hallucinatedToolNameStrategy Strategy function
     * @return Builder instance
     */
    public Builder<T> hallucinatedToolNameStrategy(
        Function<ToolExecutionRequest, ToolExecutionResultMessage> hallucinatedToolNameStrategy
    );

    /**
     * Set handler for tool argument errors (JSON parsing, type mismatches)
     * @param handler Tool arguments error handler
     * @return Builder instance
     */
    public Builder<T> toolArgumentsErrorHandler(ToolArgumentsErrorHandler handler);

    /**
     * Set handler for tool execution errors
     * @param handler Tool execution error handler
     * @return Builder instance
     */
    public Builder<T> toolExecutionErrorHandler(ToolExecutionErrorHandler handler);

    /**
     * Configure content retriever for RAG
     * @param contentRetriever Content retriever instance
     * @return Builder instance
     */
    public Builder<T> contentRetriever(ContentRetriever contentRetriever);

    /**
     * Configure retrieval augmentor for RAG
     * @param retrievalAugmentor Retrieval augmentor instance
     * @return Builder instance
     */
    public Builder<T> retrievalAugmentor(RetrievalAugmentor retrievalAugmentor);

    /**
     * Register AI service listener
     * @param listener Listener to register
     * @return Builder instance
     */
    public <I> Builder<T> registerListener(AiServiceListener<I> listener);

    /**
     * Register multiple AI service listeners
     * @param listeners Listeners to register
     * @return Builder instance
     */
    public Builder<T> registerListeners(AiServiceListener<?>... listeners);

    /**
     * Register listener collection
     * @param listeners Collection of listeners to register
     * @return Builder instance
     */
    public Builder<T> registerListeners(Collection<? extends AiServiceListener<?>> listeners);

    /**
     * Unregister AI service listener
     * @param listener Listener to unregister
     * @return Builder instance
     */
    public <I> Builder<T> unregisterListener(AiServiceListener<I> listener);

    /**
     * Unregister multiple listeners
     * @param listeners Listeners to unregister
     * @return Builder instance
     */
    public Builder<T> unregisterListeners(AiServiceListener<?>... listeners);

    /**
     * Configure input guardrails
     * @param inputGuardrailsConfig Input guardrails configuration
     * @return Builder instance
     */
    public Builder<T> inputGuardrailsConfig(InputGuardrailsConfig inputGuardrailsConfig);

    /**
     * Configure output guardrails
     * @param outputGuardrailsConfig Output guardrails configuration
     * @return Builder instance
     */
    public Builder<T> outputGuardrailsConfig(OutputGuardrailsConfig outputGuardrailsConfig);

    /**
     * Set input guardrail classes
     * @param guardrailClasses List of guardrail classes
     * @return Builder instance
     */
    public <I> Builder<T> inputGuardrailClasses(List<Class<? extends I>> guardrailClasses);

    /**
     * Set input guardrail classes (varargs)
     * @param guardrailClasses Guardrail classes
     * @return Builder instance
     */
    public <I> Builder<T> inputGuardrailClasses(Class<? extends I>... guardrailClasses);

    /**
     * Set input guardrails
     * @param guardrails List of guardrails
     * @return Builder instance
     */
    public <I> Builder<T> inputGuardrails(List<I> guardrails);

    /**
     * Set input guardrails (varargs)
     * @param guardrails Guardrails
     * @return Builder instance
     */
    public <I> Builder<T> inputGuardrails(I... guardrails);

    /**
     * Set output guardrail classes
     * @param guardrailClasses List of guardrail classes
     * @return Builder instance
     */
    public <O> Builder<T> outputGuardrailClasses(List<Class<? extends O>> guardrailClasses);

    /**
     * Set output guardrail classes (varargs)
     * @param guardrailClasses Guardrail classes
     * @return Builder instance
     */
    public <O> Builder<T> outputGuardrailClasses(Class<? extends O>... guardrailClasses);

    /**
     * Set output guardrails
     * @param guardrails List of guardrails
     * @return Builder instance
     */
    public <O> Builder<T> outputGuardrails(List<O> guardrails);

    /**
     * Set output guardrails (varargs)
     * @param guardrails Guardrails
     * @return Builder instance
     */
    public <O> Builder<T> outputGuardrails(O... guardrails);

    /**
     * Configure whether to store RAG-augmented messages in chat memory
     * Default is true
     * @param storeRetrievedContentInChatMemory Whether to store retrieved content
     * @return Builder instance
     */
    public Builder<T> storeRetrievedContentInChatMemory(
        boolean storeRetrievedContentInChatMemory
    );

    /**
     * Build the AI service
     * @return Implementation of the AI service interface
     */
    public T build();
}
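Putting the builder together - memory, tools, and RAG are all optional, and the `Calculator` tool class and `retriever` variable below are illustrative assumptions:

```java
import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import dev.langchain4j.service.AiServices;

interface Assistant {
    String chat(String userMessage);
}

Assistant assistant = AiServices.builder(Assistant.class)
        .chatModel(model)                                        // exactly one of chatModel / streamingChatModel
        .systemMessage("You are a concise assistant.")
        .chatMemory(MessageWindowChatMemory.withMaxMessages(10)) // shared, thread-safe memory
        .tools(new Calculator())                                 // object with @Tool-annotated methods
        .contentRetriever(retriever)                             // or retrievalAugmentor(), not both
        .build();                                                // configuration is validated here
```

All setters return the same builder instance, so the whole configuration reads as one chain ending in `build()`.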

Thread Safety

  • Builder instances are NOT thread-safe - do not share builders across threads
  • Create one builder instance per thread or synchronize access externally
  • Once build() is called, the resulting AI service proxy is thread-safe
  • All configuration methods return the same builder instance for chaining

Common Pitfalls

  • DO NOT call build() multiple times on the same builder - behavior is undefined
  • DO NOT configure both chatModel() and streamingChatModel() - only one should be set
  • DO NOT pass null to configuration methods - most will throw NullPointerException
  • DO NOT configure both chatMemory() and chatMemoryProvider() - only one should be set
  • DO NOT configure both contentRetriever() and retrievalAugmentor() - only one should be set
  • DO NOT set systemMessage()/systemMessageProvider() and use @SystemMessage annotation - method annotation takes precedence
  • DO NOT forget to configure at least one chat model before calling build()

Edge Cases

  • Setting maxSequentialToolsInvocations(0) effectively disables tool execution
  • Negative values for maxSequentialToolsInvocations throw IllegalArgumentException at build time
  • Empty tool lists are allowed and result in no tools being available
  • Registering the same listener multiple times results in multiple invocations
  • Unregistering a listener that was never registered is a no-op
  • executeToolsConcurrently() with null executor uses ForkJoinPool.commonPool()
  • Setting both input and output guardrails creates a full validation pipeline

Performance Notes

  • Builder construction is lightweight - configuration objects are stored by reference
  • Most builder methods perform no validation - validation happens at build() time
  • Listeners are stored in order of registration and invoked sequentially
  • Tool specifications are analyzed at build time, not at invocation time
  • Chat request transformers add overhead to every invocation - keep them lightweight

Cost Considerations

  • System/user messages set via builder apply to all invocations - tokens add up quickly
  • Larger maxSequentialToolsInvocations allows more tool rounds but increases API calls
  • executeToolsConcurrently() does not reduce API calls - it only parallelizes tool execution
  • Moderation adds an extra API call to the moderation model for each invocation
  • Listeners do not directly incur costs but may slow down request processing

Exception Handling

  • IllegalArgumentException: Thrown for invalid configuration values (negative limits, null required parameters)
  • IllegalConfigurationException: Thrown at build() if configuration is inconsistent or incomplete
  • NullPointerException: Thrown by some methods if null is passed where non-null is required
  • Configuration errors are eagerly detected at build() time when possible

Related APIs

  • Chat Models - For chatModel() configuration
  • Streaming Chat Models - For streamingChatModel() configuration
  • Chat Memory - For chatMemory() and chatMemoryProvider() configuration
  • Tools - For tools() and related tool configuration
  • RAG - For contentRetriever() and retrievalAugmentor() configuration
  • Guardrails - For guardrails configuration
  • Models - For listener registration

Annotations

Annotations for configuring AI service methods and parameters.

package dev.langchain4j.service;

/**
 * Specifies complete system message or template to be used on each invocation
 * Can contain template variables resolved with values from @V annotated parameters
 * Takes precedence over systemMessageProvider
 */
@Target({TYPE, METHOD})
@Retention(RUNTIME)
public @interface SystemMessage {
    /**
     * Prompt template (single or multiple lines)
     * @return Template lines
     */
    String[] value();

    /**
     * Delimiter for joining multiple lines (default: "\n")
     * @return Delimiter string
     */
    String delimiter() default "\n";

    /**
     * Resource path to read prompt template
     * @return Resource path
     */
    String fromResource() default "";
}

/**
 * Specifies complete user message or template to be used on each invocation
 * Can contain template variables resolved with values from @V annotated parameters
 * Can be used on methods or parameters
 * Takes precedence over userMessageProvider
 */
@Target({METHOD, PARAMETER})
@Retention(RUNTIME)
public @interface UserMessage {
    /**
     * Prompt template (single or multiple lines)
     * @return Template lines
     */
    String[] value();

    /**
     * Delimiter for joining multiple lines (default: "\n")
     * @return Delimiter string
     */
    String delimiter() default "\n";

    /**
     * Resource path to read prompt template
     * @return Resource path
     */
    String fromResource() default "";
}

/**
 * Annotation for method parameters to mark them as prompt template variables
 * Value will be injected into templates defined in @UserMessage, @SystemMessage,
 * and systemMessageProvider
 * Not necessary when "-parameters" compilation option is enabled or when using
 * Quarkus/Spring Boot
 */
@Target(PARAMETER)
@Retention(RUNTIME)
public @interface V {
    /**
     * Name of variable/placeholder in prompt template
     * @return Variable name
     */
    String value();
}

/**
 * Annotation for method parameters to specify memory ID for finding memory
 * belonging to user/conversation
 * Parameter can be of any type with proper equals()/hashCode() implementation
 */
@Target(PARAMETER)
@Retention(RUNTIME)
public @interface MemoryId {
}

/**
 * Annotation for method parameters to inject value into 'name' field of UserMessage
 */
@Target(PARAMETER)
@Retention(RUNTIME)
public @interface UserName {
}

/**
 * Annotation for methods to enable automatic content moderation
 * When annotated, invoking the method calls both the LLM and the moderation model in parallel
 * If content is flagged, ModerationException is thrown
 */
@Target(METHOD)
@Retention(RUNTIME)
public @interface Moderate {
}
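A sketch combining the annotations above on one interface (the names and templates are illustrative; @Moderate additionally requires moderationModel() on the builder):

```java
interface Assistant {

    @SystemMessage("You are {{persona}}. Answer briefly.")
    @UserMessage("Summarize in {{n}} sentences: {{text}}")
    String summarize(@MemoryId String conversationId,
                     @V("persona") String persona,
                     @V("n") int n,
                     @V("text") String text);

    // The annotated parameter becomes the user message verbatim;
    // the moderation model runs in parallel with the chat model.
    @Moderate
    String chat(@MemoryId String conversationId, @UserMessage String message);
}
```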

Thread Safety

  • Annotations are read at build time during proxy generation, not at runtime
  • No thread safety concerns as annotations are immutable metadata
  • Template variable substitution is performed per-invocation and is thread-safe

Common Pitfalls

  • DO NOT use both value() and fromResource() in same annotation - fromResource takes precedence
  • DO NOT use undefined variables in templates - they will be left as-is (e.g., "{{undefined}}")
  • DO NOT forget @V annotation when not using -parameters compiler flag - parameter names won't be available
  • DO NOT use @MemoryId on multiple parameters in the same method - only first one is used
  • DO NOT use @UserName on multiple parameters - only first one is used
  • DO NOT apply @SystemMessage at both type and method level expecting concatenation - method-level overrides completely
  • DO NOT rely on unusual delimiter strings - any string is accepted, but exotic delimiters can break template rendering

Edge Cases

  • Empty value() array results in empty message
  • Empty string in value() array contributes empty line (or delimiter if multiple)
  • fromResource() with non-existent resource throws IllegalConfigurationException at build time
  • Template variables with special characters (e.g., "{{var.name}}") are supported
  • Nested braces in templates (e.g., "{{var{{nested}}}}") may cause parsing issues
  • @UserMessage on parameter overrides method-level @UserMessage for that invocation
  • @Moderate with no configured moderationModel() throws IllegalConfigurationException at build time

Performance Notes

  • Template parsing and variable resolution happens per-invocation - keep templates simple
  • Resource loading via fromResource() is cached after first access
  • Variable substitution uses string replacement - large templates with many variables have overhead
  • @Moderate adds a parallel API call to the moderation model - overall latency is bounded by the slower of the two calls

Cost Considerations

  • Larger @SystemMessage templates consume more input tokens on every invocation
  • Template variables that expand to large strings increase token costs
  • @Moderate doubles API costs (one chat model call + one moderation model call)
  • Multi-line messages are joined with the delimiter, which adds tokens (the default "\n" adds minimal overhead)

Exception Handling

  • IllegalConfigurationException: Thrown at build time if fromResource() path is invalid
  • ModerationException: Thrown at invocation time if @Moderate flags content
  • IllegalArgumentException: May be thrown at invocation time if template variables can't be resolved
  • Without @V, the parameter's variable name is unavailable unless the -parameters compiler flag is used


Result Type

The Result type provides access to additional information from AI service invocations.

package dev.langchain4j.service;

/**
 * Represents the result of AI Service invocation containing actual content
 * and additional information (token usage, finish reason, sources from RAG,
 * tool executions, intermediate/final responses)
 */
public class Result<T> {
    /**
     * Constructor
     * @param content The actual content/result
     * @param tokenUsage Aggregate token usage
     * @param sources Sources from RAG retrieval
     * @param finishReason Finish reason from model
     * @param toolExecutions All tool executions that occurred
     */
    public Result(
        T content,
        TokenUsage tokenUsage,
        List<Content> sources,
        FinishReason finishReason,
        List<ToolExecution> toolExecutions
    );

    /**
     * Create builder
     * @return Builder instance
     */
    public static <T> ResultBuilder<T> builder();

    /**
     * Get content
     * @return The actual content/result
     */
    public T content();

    /**
     * Get aggregate token usage
     * @return Token usage across all requests
     */
    public TokenUsage tokenUsage();

    /**
     * Get sources from RAG
     * @return List of retrieved content sources
     */
    public List<Content> sources();

    /**
     * Get finish reason
     * @return Finish reason from model
     */
    public FinishReason finishReason();

    /**
     * Get all tool executions
     * @return List of all tool executions
     */
    public List<ToolExecution> toolExecutions();

    /**
     * Get intermediate responses (with tool execution requests)
     * @return List of intermediate chat responses
     */
    public List<ChatResponse> intermediateResponses();

    /**
     * Get final response (without tool execution requests)
     * @return Final chat response
     */
    public ChatResponse finalResponse();
}
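To receive this metadata, declare Result<T> as the method's return type; a sketch (the interface and question are illustrative):

```java
import dev.langchain4j.model.output.TokenUsage;
import dev.langchain4j.rag.content.Content;
import dev.langchain4j.service.Result;
import dev.langchain4j.service.tool.ToolExecution;
import java.util.List;

interface Assistant {
    Result<String> chat(String userMessage);
}

Result<String> result = assistant.chat("What is LangChain4j?");
String answer = result.content();                    // may be null if the model returned no content
TokenUsage usage = result.tokenUsage();              // may be null for some models
List<Content> sources = result.sources();            // empty unless RAG is configured
List<ToolExecution> tools = result.toolExecutions(); // empty unless tools were invoked
```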

Thread Safety

  • Result instances are immutable and fully thread-safe
  • Can be safely shared across threads and cached
  • Collections returned by methods are unmodifiable views

Common Pitfalls

  • DO NOT assume tokenUsage() is non-null - some models don't report token usage
  • DO NOT assume sources() is non-empty - only populated when using RAG
  • DO NOT assume toolExecutions() is non-empty - only populated when tools are used
  • DO NOT modify returned lists - they are unmodifiable and will throw UnsupportedOperationException
  • DO NOT assume content() is non-null - it can be null if the model returned no content
  • DO NOT ignore finishReason() - STOP vs LENGTH vs CONTENT_FILTER indicate different outcomes

Edge Cases

  • If no tools were executed, toolExecutions() returns empty list (not null)
  • If RAG is not configured, sources() returns empty list (not null)
  • If model doesn't support token reporting, tokenUsage() may return null
  • intermediateResponses() includes all responses with tool execution requests
  • finalResponse() is the last response without tool execution requests
  • For simple invocations without tools, intermediateResponses() is empty and finalResponse() is the only response

Performance Notes

  • Creating Result instances is lightweight - all collections are stored by reference
  • Accessing fields has constant time complexity
  • Result objects are immutable so JVM can optimize access patterns
  • Consider extracting frequently accessed fields to local variables in hot paths

Cost Considerations

  • tokenUsage() shows aggregate cost across all requests in the invocation (including tool rounds)
  • Multiple tool execution rounds multiply token costs - monitor toolExecutions().size()
  • intermediateResponses() size indicates number of model calls beyond the final one
  • RAG sources don't directly incur model costs but increase input token count

Exception Handling

  • Result construction does not throw exceptions
  • Accessing null content requires null-checking by caller
  • Unmodifiable collections throw UnsupportedOperationException on modification attempts


TokenStream

Interface for streaming responses from AI services.

package dev.langchain4j.service;

/**
 * Represents token stream from model to subscribe and receive updates
 * when new partial response is available, when streaming finishes, or when error occurs
 * Intended as return type in AI Service
 */
public interface TokenStream {
    /**
     * Handle partial text responses
     * @param partialResponseHandler Consumer for partial response strings
     * @return TokenStream instance for chaining
     */
    TokenStream onPartialResponse(Consumer<String> partialResponseHandler);

    /**
     * Handle partial responses with context (experimental)
     * @param handler BiConsumer for partial response with context
     * @return TokenStream instance for chaining
     */
    TokenStream onPartialResponseWithContext(
        BiConsumer<PartialResponse, PartialResponseContext> handler
    );

    /**
     * Handle partial thinking/reasoning text (experimental)
     * @param partialThinkingHandler Consumer for partial thinking
     * @return TokenStream instance for chaining
     */
    TokenStream onPartialThinking(Consumer<PartialThinking> partialThinkingHandler);

    /**
     * Handle partial thinking with context (experimental)
     * @param handler BiConsumer for partial thinking with context
     * @return TokenStream instance for chaining
     */
    TokenStream onPartialThinkingWithContext(
        BiConsumer<PartialThinking, PartialThinkingContext> handler
    );

    /**
     * Handle partial tool calls (experimental)
     * @param partialToolCallHandler Consumer for partial tool calls
     * @return TokenStream instance for chaining
     */
    TokenStream onPartialToolCall(Consumer<PartialToolCall> partialToolCallHandler);

    /**
     * Handle partial tool calls with context (experimental)
     * @param handler BiConsumer for partial tool calls with context
     * @return TokenStream instance for chaining
     */
    TokenStream onPartialToolCallWithContext(
        BiConsumer<PartialToolCall, PartialToolCallContext> handler
    );

    /**
     * Handle retrieved contents from RAG
     * @param contentHandler Consumer for retrieved content list
     * @return TokenStream instance for chaining
     */
    TokenStream onRetrieved(Consumer<List<Content>> contentHandler);

    /**
     * Handle intermediate chat responses (with tool execution requests)
     * @param intermediateResponseHandler Consumer for intermediate responses
     * @return TokenStream instance for chaining
     */
    TokenStream onIntermediateResponse(Consumer<ChatResponse> intermediateResponseHandler);

    /**
     * Handle before tool execution
     * @param beforeToolExecutionHandler Consumer for before tool execution context
     * @return TokenStream instance for chaining
     */
    TokenStream beforeToolExecution(Consumer<BeforeToolExecution> beforeToolExecutionHandler);

    /**
     * Handle after tool execution
     * @param toolExecuteHandler Consumer for tool execution results
     * @return TokenStream instance for chaining
     */
    TokenStream onToolExecuted(Consumer<ToolExecution> toolExecuteHandler);

    /**
     * Handle final chat response
     * @param completeResponseHandler Consumer for complete response
     * @return TokenStream instance for chaining
     */
    TokenStream onCompleteResponse(Consumer<ChatResponse> completeResponseHandler);

    /**
     * Handle errors
     * @param errorHandler Consumer for throwable errors
     * @return TokenStream instance for chaining
     */
    TokenStream onError(Consumer<Throwable> errorHandler);

    /**
     * Ignore all errors (logged as WARN)
     * @return TokenStream instance for chaining
     */
    TokenStream ignoreErrors();

    /**
     * Start processing and send request to LLM
     * Must be called after registering handlers
     */
    void start();
}

Thread Safety

  • TokenStream instances are NOT thread-safe for configuration
  • Handler registration methods must be called from a single thread before start()
  • Once start() is called, handlers may be invoked from different threads
  • All registered handlers must be thread-safe
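The last point can be illustrated with plain JDK types (no langchain4j involved): a partial-response handler backed by a concurrent collection stays correct even when invoked from several threads, whereas an unsynchronized StringBuilder would not. The class and method names here are illustrative only.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Illustration: a thread-safe token accumulator suitable for use as a handler
// that may be invoked from different threads after start().
public class SafeAccumulator {

    static int accumulateConcurrently(int threads, int tokensPerThread) {
        ConcurrentLinkedQueue<String> tokens = new ConcurrentLinkedQueue<>();
        Consumer<String> handler = tokens::add; // safe to call from any thread

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < tokensPerThread; i++) {
                    handler.accept("tok");
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return tokens.size(); // no tokens lost despite concurrent writers
    }

    public static void main(String[] args) {
        System.out.println(accumulateConcurrently(4, 1000)); // prints 4000
    }
}
```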

Common Pitfalls

  • DO NOT forget to call start() - handlers won't execute without it
  • DO NOT register handlers after calling start() - behavior is undefined
  • DO NOT call start() multiple times - throws IllegalStateException
  • DO NOT block in handlers - this will slow down streaming for the user
  • DO NOT throw exceptions from handlers - wrap in try-catch and handle errors
  • DO NOT register onError() and ignoreErrors() together - ignoreErrors takes precedence
  • DO NOT perform long-running operations in handlers - offload to separate threads

Edge Cases

  • If no onPartialResponse() handler is registered, partial responses are discarded
  • If no onError() handler is registered and ignoreErrors() is not called, errors propagate to caller
  • Registering the same handler type more than once is rejected when start() is called (IllegalConfigurationException)
  • Empty partial responses may be delivered (zero-length strings)
  • onCompleteResponse() is always called after all partial responses (unless error occurs)
  • Tool execution callbacks are invoked even if tools fail
  • RAG onRetrieved() is called before first partial response

Performance Notes

  • Streaming reduces perceived latency - users see responses as they're generated
  • Handler execution is synchronous with response processing - keep handlers fast
  • Consider using reactive/async patterns in handlers for non-blocking processing
  • Multiple handlers add overhead - combine logic when possible
  • Partial responses may arrive at high frequency - avoid expensive operations per token
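One way to keep handlers fast is to decorate them so the expensive work is offloaded to an executor. The helper below is a sketch with JDK types only (`AsyncHandler` and `offload` are not langchain4j APIs); a single-thread executor preserves token order.

```java
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical helper: wraps a token handler so the expensive work runs on a
// separate executor instead of the streaming thread.
public class AsyncHandler {

    static <T> Consumer<T> offload(Consumer<T> handler, ExecutorService executor) {
        return item -> executor.submit(() -> handler.accept(item));
    }

    static String demo() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        ConcurrentLinkedQueue<String> sink = new ConcurrentLinkedQueue<>();

        // The streaming thread only hands tokens off; the (potentially slow)
        // work happens on the executor, in submission order.
        Consumer<String> handler = offload(sink::add, executor);
        for (String token : List.of("Hello", ", ", "world")) {
            handler.accept(token); // returns immediately
        }

        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return String.join("", sink);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "Hello, world"
    }
}
```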

Cost Considerations

  • Streaming does not reduce token costs - same as non-streaming
  • Streaming may enable early cancellation in interactive scenarios, saving costs
  • No additional API calls are made for streaming vs non-streaming
  • Tool execution in streaming follows same cost model as non-streaming

Exception Handling

  • IllegalStateException: Thrown if start() is called multiple times
  • RuntimeException: Errors from model or tools are delivered to onError() handler
  • If no onError() handler is registered, exceptions propagate to caller
  • ignoreErrors() suppresses all errors (logged at WARN level)
  • Handler exceptions are caught and logged but don't stop streaming


Exceptions

Exceptions thrown by AI services.

package dev.langchain4j.service;

/**
 * Exception thrown when AI service is misconfigured
 */
public class IllegalConfigurationException extends LangChain4jException {
    /**
     * Constructor
     * @param message Error message
     */
    public IllegalConfigurationException(String message);

    /**
     * Constructor with cause
     * @param message Error message
     * @param cause Underlying cause
     */
    public IllegalConfigurationException(String message, Throwable cause);
}

/**
 * Exception thrown when moderation model flags content
 */
public class ModerationException extends LangChain4jException {
    /**
     * Constructor
     * @param message Error message
     */
    public ModerationException(String message);

    /**
     * Constructor with cause
     * @param message Error message
     * @param cause Underlying cause
     */
    public ModerationException(String message, Throwable cause);
}

Thread Safety

  • Exception instances are immutable and thread-safe
  • Can be safely caught and rethrown across thread boundaries
  • Stack traces are captured at construction time
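A generic illustration of crossing thread boundaries, using only JDK types (no langchain4j classes): an exception thrown on a worker thread can be caught and inspected on the caller's thread, with its construction-time stack trace preserved as the cause.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

// An immutable exception thrown on a worker thread survives the hop back to
// the calling thread intact.
public class CrossThreadException {

    static String causeMessage() {
        CompletableFuture<String> future = CompletableFuture.supplyAsync(() -> {
            throw new IllegalStateException("misconfigured service");
        });
        try {
            return future.join();
        } catch (CompletionException e) {
            // The original exception is preserved as the cause
            return e.getCause().getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(causeMessage()); // prints "misconfigured service"
    }
}
```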

Common Pitfalls

  • DO NOT catch these as generic Exception - catch specific types for proper error handling
  • DO NOT ignore IllegalConfigurationException - it indicates a programming error that must be fixed
  • DO NOT suppress ModerationException without logging - it indicates policy violations
  • DO NOT retry on IllegalConfigurationException - fix the configuration instead

Edge Cases

  • IllegalConfigurationException is thrown at build time, not invocation time
  • ModerationException is thrown during invocation if content is flagged
  • Both exceptions extend LangChain4jException for generic catch blocks
  • Cause chain may include underlying model provider exceptions

Performance Notes

  • Exception construction includes stack trace generation - avoid creating exceptions in hot paths
  • Catching specific exception types is faster than catching generic Exception
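If an exception really must be created in a hot path, the JDK offers an escape hatch worth knowing about: `RuntimeException`'s protected four-argument constructor can disable stack-trace capture entirely. This is a general JDK technique, not something langchain4j's exception types do.

```java
// Sketch of the "avoid stack traces in hot paths" note: disabling
// writableStackTrace skips the expensive trace capture at construction time.
public class CheapException extends RuntimeException {

    public CheapException(String message) {
        // enableSuppression=false, writableStackTrace=false
        super(message, null, false, false);
    }

    public static void main(String[] args) {
        CheapException e = new CheapException("flagged");
        System.out.println(e.getStackTrace().length); // prints 0: no trace captured
    }
}
```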

Cost Considerations

  • IllegalConfigurationException incurs no API costs (thrown before API calls)
  • ModerationException occurs after moderation API call but before main model call

Exception Handling

  • Catch IllegalConfigurationException at application startup to fail fast
  • Catch ModerationException at invocation time to handle policy violations
  • Log both exception types with full stack traces for debugging
  • Consider wrapping in application-specific exceptions for cleaner API

Related APIs

  • Guardrails - For moderation and validation configuration

Usage Examples

Basic AI Service

import dev.langchain4j.service.AiServices;

interface Assistant {
    String chat(String message);
}

// Create simple AI service
Assistant assistant = AiServices.create(Assistant.class, chatModel);

try {
    String response = assistant.chat("What is the capital of France?");
    System.out.println(response);
} catch (RuntimeException e) {
    System.err.println("AI service invocation failed: " + e.getMessage());
    throw e;
}

AI Service with Templates

import dev.langchain4j.service.AiServices;
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import dev.langchain4j.service.V;

interface Chef {
    @SystemMessage("You are a professional chef with expertise in {{cuisine}} cuisine.")
    @UserMessage("Create a recipe for {{dish}} using {{ingredient}}.")
    String createRecipe(@V("cuisine") String cuisine,
                       @V("dish") String dish,
                       @V("ingredient") String ingredient);
}

Chef chef = AiServices.create(Chef.class, chatModel);

try {
    String recipe = chef.createRecipe("Italian", "pasta", "tomatoes");
    System.out.println(recipe);
} catch (IllegalArgumentException e) {
    System.err.println("Invalid template variables: " + e.getMessage());
}

AI Service with Memory

import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import dev.langchain4j.service.AiServices;
import dev.langchain4j.service.MemoryId;

interface Assistant {
    String chat(@MemoryId String userId, String message);
}

Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(chatModel)
    .chatMemoryProvider(memoryId -> MessageWindowChatMemory.withMaxMessages(10))
    .build();

try {
    // Different conversations for different users
    String response1 = assistant.chat("user1", "My name is Alice");
    String response2 = assistant.chat("user2", "My name is Bob");
    String response3 = assistant.chat("user1", "What is my name?"); // Should recall "Alice" from user1's memory

    System.out.println("User1 response: " + response3);
} catch (RuntimeException e) {
    System.err.println("Conversation failed: " + e.getMessage());
    // Handle error appropriately
}

AI Service with Tools

import dev.langchain4j.agent.tool.Tool;
import dev.langchain4j.service.AiServices;

class WeatherService {
    @Tool("Get current weather for a location")
    String getWeather(String location) {
        // Implementation with error handling
        try {
            // Call weather API
            return "Sunny, 72°F";
        } catch (Exception e) {
            return "Weather data unavailable for " + location;
        }
    }
}

interface Assistant {
    String chat(String message);
}

Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(chatModel)
    .tools(new WeatherService())
    .toolExecutionErrorHandler((toolExecutionRequest, throwable) -> {
        System.err.println("Tool execution failed: " + throwable.getMessage());
        return "Tool execution failed: " + throwable.getMessage();
    })
    .build();

try {
    String response = assistant.chat("What's the weather in New York?");
    System.out.println(response);
} catch (RuntimeException e) {
    System.err.println("AI service with tools failed: " + e.getMessage());
}

Streaming AI Service

import dev.langchain4j.service.AiServices;
import dev.langchain4j.service.TokenStream;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

interface Assistant {
    TokenStream chat(String message);
}

Assistant assistant = AiServices.create(Assistant.class, streamingChatModel);

CompletableFuture<String> future = new CompletableFuture<>();
AtomicReference<StringBuilder> responseBuilder = new AtomicReference<>(new StringBuilder());

assistant.chat("Tell me a story")
    .onPartialResponse(token -> {
        System.out.print(token);
        responseBuilder.get().append(token);
    })
    .onCompleteResponse(response -> {
        System.out.println("\nDone!");
        future.complete(responseBuilder.get().toString());
    })
    .onError(throwable -> {
        System.err.println("\nError: " + throwable.getMessage());
        throwable.printStackTrace();
        future.completeExceptionally(throwable);
    })
    .start();

try {
    String fullResponse = future.get(); // Wait for completion
} catch (Exception e) {
    System.err.println("Streaming failed: " + e.getMessage());
}

AI Service with Result Type

import dev.langchain4j.service.AiServices;
import dev.langchain4j.service.Result;

interface Assistant {
    Result<String> chat(String message);
}

Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(chatModel)
    .tools(new WeatherService())
    .build();

try {
    Result<String> result = assistant.chat("What's the weather?");

    System.out.println("Content: " + result.content());

    if (result.tokenUsage() != null) {
        System.out.println("Token usage: " + result.tokenUsage());
        System.out.println("Input tokens: " + result.tokenUsage().inputTokenCount());
        System.out.println("Output tokens: " + result.tokenUsage().outputTokenCount());
    }

    if (!result.toolExecutions().isEmpty()) {
        System.out.println("Tool executions: " + result.toolExecutions().size());
        result.toolExecutions().forEach(te ->
            System.out.println("  - " + te.toolName() + ": " + te.result())
        );
    }

    System.out.println("Finish reason: " + result.finishReason());
} catch (RuntimeException e) {
    System.err.println("AI service invocation failed: " + e.getMessage());
}

AI Service with RAG

import dev.langchain4j.service.AiServices;

interface Assistant {
    String chat(String message);
}

Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(chatModel)
    .contentRetriever(contentRetriever)
    .build();

try {
    String response = assistant.chat("What does the documentation say about X?");
    System.out.println(response);
} catch (RuntimeException e) {
    System.err.println("RAG query failed: " + e.getMessage());
}

Testing Patterns

Mocking AI Services with Mockito

import org.junit.jupiter.api.Test;
import org.mockito.Mockito;
import static org.mockito.ArgumentMatchers.*;
import static org.junit.jupiter.api.Assertions.*;

interface Assistant {
    String chat(String message);
}

@Test
void testWithMockedAssistant() {
    // Create mock
    Assistant assistant = Mockito.mock(Assistant.class);

    // Define behavior
    Mockito.when(assistant.chat(anyString()))
        .thenReturn("Mocked response");

    // Test
    String response = assistant.chat("Hello");
    assertEquals("Mocked response", response);

    // Verify
    Mockito.verify(assistant, Mockito.times(1)).chat("Hello");
}

Testing with Fake Chat Model

import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.response.ChatResponse;
import org.junit.jupiter.api.Test;

class FakeChatModel implements ChatModel {
    private final String fixedResponse;

    public FakeChatModel(String fixedResponse) {
        this.fixedResponse = fixedResponse;
    }

    // langchain4j 1.x API: implementations override doChat(ChatRequest)
    @Override
    public ChatResponse doChat(ChatRequest request) {
        return ChatResponse.builder()
            .aiMessage(AiMessage.from(fixedResponse))
            .build();
    }
}

@Test
void testWithFakeChatModel() {
    ChatModel model = new FakeChatModel("Test response");

    Assistant assistant = AiServices.create(Assistant.class, model);
    String response = assistant.chat("Any message");

    assertEquals("Test response", response);
}

Testing Memory Isolation

import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.*;

interface Assistant {
    String chat(@MemoryId String userId, String message);
}

@Test
void testMemoryIsolation() {
    FakeChatModel model = new FakeChatModel("Echo");

    Assistant assistant = AiServices.builder(Assistant.class)
        .chatModel(model)
        .chatMemoryProvider(id -> MessageWindowChatMemory.withMaxMessages(10))
        .build();

    // Verify different users have isolated memory
    assistant.chat("user1", "Message from user1");
    assistant.chat("user2", "Message from user2");

    // Memory should be separate per user
    // Add assertions based on your implementation
}

Testing Tool Execution

import dev.langchain4j.agent.tool.Tool;
import org.junit.jupiter.api.Test;
import org.mockito.Mockito;
import static org.mockito.Mockito.*;

class WeatherService {
    @Tool("Get weather")
    String getWeather(String location) {
        return "Sunny";
    }
}

@Test
void testToolExecution() {
    WeatherService weatherService = Mockito.spy(new WeatherService());

    Assistant assistant = AiServices.builder(Assistant.class)
        .chatModel(chatModel)
        .tools(weatherService)
        .build();

    assistant.chat("What's the weather in Paris?");

    // Verify tool was called
    verify(weatherService, atLeastOnce()).getWeather(anyString());
}

Testing Error Handling

import dev.langchain4j.agent.tool.Tool;
import org.junit.jupiter.api.Test;
import java.util.concurrent.atomic.AtomicBoolean;
import static org.junit.jupiter.api.Assertions.*;

@Test
void testToolExecutionError() {
    class FailingTool {
        @Tool("Failing tool")
        String fail(String input) {
            throw new RuntimeException("Tool failure");
        }
    }

    AtomicBoolean errorHandlerCalled = new AtomicBoolean(false);

    Assistant assistant = AiServices.builder(Assistant.class)
        .chatModel(chatModel)
        .tools(new FailingTool())
        .toolExecutionErrorHandler((request, throwable) -> {
            errorHandlerCalled.set(true);
            return "Error handled: " + throwable.getMessage();
        })
        .build();

    String response = assistant.chat("Trigger tool");

    assertTrue(errorHandlerCalled.get(), "Error handler should be called");
}

Integration Testing with TestContainers

import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.ollama.OllamaChatModel;
import org.junit.jupiter.api.Test;
import org.testcontainers.containers.GenericContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;

@Testcontainers
class AiServiceIntegrationTest {
    @Container
    private static GenericContainer<?> ollama = new GenericContainer<>("ollama/ollama:latest")
        .withExposedPorts(11434);

    @Test
    void testWithRealModel() {
        String baseUrl = "http://" + ollama.getHost() + ":" + ollama.getFirstMappedPort();

        // Assumes the llama2 model has already been pulled inside the container
        ChatModel model = OllamaChatModel.builder()
            .baseUrl(baseUrl)
            .modelName("llama2")
            .build();

        Assistant assistant = AiServices.create(Assistant.class, model);
        String response = assistant.chat("Hello");

        assertNotNull(response);
        assertFalse(response.isEmpty());
    }
}

Error Recovery Patterns

Retry with Exponential Backoff

import dev.langchain4j.service.AiServices;
import java.time.Duration;

interface Assistant {
    String chat(String message);
}

class RetryableAiService {
    private final Assistant assistant;
    private final int maxRetries;
    private final Duration initialDelay;

    public RetryableAiService(Assistant assistant, int maxRetries, Duration initialDelay) {
        this.assistant = assistant;
        this.maxRetries = maxRetries;
        this.initialDelay = initialDelay;
    }

    public String chatWithRetry(String message) {
        int attempt = 0;
        Duration delay = initialDelay;

        while (attempt < maxRetries) {
            try {
                return assistant.chat(message);
            } catch (RuntimeException e) {
                attempt++;
                if (attempt >= maxRetries) {
                    throw new RuntimeException("Failed after " + maxRetries + " attempts", e);
                }

                System.err.println("Attempt " + attempt + " failed, retrying in " + delay.toMillis() + "ms");

                try {
                    Thread.sleep(delay.toMillis());
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException("Interrupted during retry", ie);
                }

                delay = delay.multipliedBy(2); // Exponential backoff
            }
        }

        throw new RuntimeException("Should not reach here");
    }
}

// Usage
Assistant assistant = AiServices.create(Assistant.class, chatModel);
RetryableAiService retryable = new RetryableAiService(assistant, 3, Duration.ofSeconds(1));
String response = retryable.chatWithRetry("Hello");

Circuit Breaker Pattern

import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

class CircuitBreaker {
    private enum State { CLOSED, OPEN, HALF_OPEN }

    private final Assistant assistant;
    private final int failureThreshold;
    private final Duration timeout;
    private final AtomicInteger failureCount = new AtomicInteger(0);
    private final AtomicReference<State> state = new AtomicReference<>(State.CLOSED);
    private final AtomicReference<Instant> lastFailureTime = new AtomicReference<>();

    public CircuitBreaker(Assistant assistant, int failureThreshold, Duration timeout) {
        this.assistant = assistant;
        this.failureThreshold = failureThreshold;
        this.timeout = timeout;
    }

    public String chat(String message) {
        if (state.get() == State.OPEN) {
            if (Instant.now().isAfter(lastFailureTime.get().plus(timeout))) {
                state.set(State.HALF_OPEN);
                System.out.println("Circuit breaker: transitioning to HALF_OPEN");
            } else {
                throw new RuntimeException("Circuit breaker is OPEN");
            }
        }

        try {
            String response = assistant.chat(message);
            onSuccess();
            return response;
        } catch (RuntimeException e) {
            onFailure();
            throw e;
        }
    }

    private void onSuccess() {
        failureCount.set(0);
        state.set(State.CLOSED);
    }

    private void onFailure() {
        int failures = failureCount.incrementAndGet();
        lastFailureTime.set(Instant.now());

        // A failure while HALF_OPEN immediately reopens the circuit
        if (state.get() == State.HALF_OPEN || failures >= failureThreshold) {
            state.set(State.OPEN);
            System.err.println("Circuit breaker: transitioning to OPEN after " + failures + " failures");
        }
    }
}

// Usage
Assistant assistant = AiServices.create(Assistant.class, chatModel);
CircuitBreaker circuitBreaker = new CircuitBreaker(assistant, 3, Duration.ofMinutes(1));

try {
    String response = circuitBreaker.chat("Hello");
} catch (RuntimeException e) {
    System.err.println("Circuit breaker prevented call or call failed: " + e.getMessage());
}

Fallback Response Pattern

interface Assistant {
    String chat(String message);
}

class FallbackAiService {
    private final Assistant primary;
    private final Assistant fallback;

    public FallbackAiService(Assistant primary, Assistant fallback) {
        this.primary = primary;
        this.fallback = fallback;
    }

    public String chat(String message) {
        try {
            return primary.chat(message);
        } catch (RuntimeException e) {
            System.err.println("Primary service failed: " + e.getMessage());
            System.err.println("Falling back to secondary service");

            try {
                return fallback.chat(message);
            } catch (RuntimeException fallbackException) {
                System.err.println("Fallback service also failed: " + fallbackException.getMessage());
                return "I apologize, but I'm currently unable to process your request. Please try again later.";
            }
        }
    }
}

// Usage
Assistant primary = AiServices.create(Assistant.class, primaryChatModel);
Assistant fallback = AiServices.create(Assistant.class, fallbackChatModel);
FallbackAiService service = new FallbackAiService(primary, fallback);

String response = service.chat("Hello");

Timeout Pattern

import java.time.Duration;
import java.util.concurrent.*;

class TimeoutAiService {
    private final Assistant assistant;
    private final Duration timeout;
    private final ExecutorService executor;

    public TimeoutAiService(Assistant assistant, Duration timeout) {
        this.assistant = assistant;
        this.timeout = timeout;
        this.executor = Executors.newCachedThreadPool();
    }

    public String chatWithTimeout(String message) {
        Future<String> future = executor.submit(() -> assistant.chat(message));

        try {
            return future.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);
            throw new RuntimeException("Request timed out after " + timeout.toMillis() + "ms", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("Request was interrupted", e);
        } catch (ExecutionException e) {
            throw new RuntimeException("Request failed", e.getCause());
        }
    }

    public void shutdown() {
        executor.shutdown();
    }
}

// Usage
Assistant assistant = AiServices.create(Assistant.class, chatModel);
TimeoutAiService timeoutService = new TimeoutAiService(assistant, Duration.ofSeconds(30));

try {
    String response = timeoutService.chatWithTimeout("Hello");
} catch (RuntimeException e) {
    System.err.println("Request failed or timed out: " + e.getMessage());
} finally {
    timeoutService.shutdown();
}

Graceful Degradation with Caching

import java.util.concurrent.ConcurrentHashMap;
import java.util.Map;

class CachedAiService {
    private final Assistant assistant;
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final boolean useCacheOnError;

    public CachedAiService(Assistant assistant, boolean useCacheOnError) {
        this.assistant = assistant;
        this.useCacheOnError = useCacheOnError;
    }

    public String chat(String message) {
        // Check cache first
        String cached = cache.get(message);

        try {
            String response = assistant.chat(message);
            cache.put(message, response);
            return response;
        } catch (RuntimeException e) {
            if (useCacheOnError && cached != null) {
                System.err.println("Service failed, returning cached response: " + e.getMessage());
                return cached + " [cached]";
            }
            throw e;
        }
    }

    public void clearCache() {
        cache.clear();
    }
}

// Usage
Assistant assistant = AiServices.create(Assistant.class, chatModel);
CachedAiService cachedService = new CachedAiService(assistant, true);

String response = cachedService.chat("What is 2+2?");
// On subsequent failures, cached response is returned

Integration Patterns

Spring Boot Integration

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.beans.factory.annotation.Value;

@Configuration
public class AiServiceConfig {

    @Bean
    public ChatModel chatModel(
            @Value("${openai.api.key}") String apiKey,
            @Value("${openai.model.name:gpt-4}") String modelName) {
        return OpenAiChatModel.builder()
            .apiKey(apiKey)
            .modelName(modelName)
            .temperature(0.7)
            .timeout(Duration.ofSeconds(60))
            .build();
    }

    @Bean
    public ChatMemoryProvider chatMemoryProvider() {
        return memoryId -> MessageWindowChatMemory.withMaxMessages(10);
    }

    @Bean
    public Assistant assistant(
            ChatModel chatModel,
            ChatMemoryProvider chatMemoryProvider,
            WeatherService weatherTool) {
        return AiServices.builder(Assistant.class)
            .chatModel(chatModel)
            .chatMemoryProvider(chatMemoryProvider)
            .tools(weatherTool)
            .build();
    }

    // Register the tool with its concrete type; injecting List<Object> would
    // pull every bean in the context into the tool list
    @Bean
    public WeatherService weatherTool() {
        return new WeatherService();
    }
}

// Controller
@RestController
@RequestMapping("/api/chat")
public class ChatController {

    private final Assistant assistant;

    public ChatController(Assistant assistant) {
        this.assistant = assistant;
    }

    @PostMapping
    public ResponseEntity<ChatResponse> chat(
            @RequestParam String userId,
            @RequestBody ChatRequest request) {
        try {
            String response = assistant.chat(userId, request.getMessage());
            return ResponseEntity.ok(new ChatResponse(response));
        } catch (Exception e) {
            return ResponseEntity
                .status(HttpStatus.INTERNAL_SERVER_ERROR)
                .body(new ChatResponse("Error: " + e.getMessage()));
        }
    }
}

Spring Boot with Result Type

import dev.langchain4j.service.Result;
import org.springframework.web.bind.annotation.*;

interface AssistantWithMetadata {
    Result<String> chat(@MemoryId String userId, String message);
}

@Configuration
public class AiServiceWithResultConfig {

    @Bean
    public AssistantWithMetadata assistantWithMetadata(
            ChatLanguageModel chatModel,
            ChatMemoryProvider chatMemoryProvider) {
        return AiServices.builder(AssistantWithMetadata.class)
            .chatModel(chatModel)
            .chatMemoryProvider(chatMemoryProvider)
            .build();
    }
}

@RestController
@RequestMapping("/api/chat")
public class ChatControllerWithMetadata {

    private final AssistantWithMetadata assistant;

    public ChatControllerWithMetadata(AssistantWithMetadata assistant) {
        this.assistant = assistant;
    }

    @PostMapping("/detailed")
    public ResponseEntity<DetailedChatResponse> chatWithMetadata(
            @RequestParam String userId,
            @RequestBody ChatRequest request) {
        try {
            Result<String> result = assistant.chat(userId, request.getMessage());

            return ResponseEntity.ok(DetailedChatResponse.builder()
                .content(result.content())
                .tokenUsage(result.tokenUsage())
                .finishReason(result.finishReason())
                .toolExecutions(result.toolExecutions())
                .build());
        } catch (Exception e) {
            return ResponseEntity
                .status(HttpStatus.INTERNAL_SERVER_ERROR)
                .body(DetailedChatResponse.error(e.getMessage()));
        }
    }
}

Quarkus Integration

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.enterprise.inject.Produces;
import org.eclipse.microprofile.config.inject.ConfigProperty;

@ApplicationScoped
public class AiServiceProducer {

    @ConfigProperty(name = "openai.api.key")
    String apiKey;

    @ConfigProperty(name = "openai.model.name", defaultValue = "gpt-4")
    String modelName;

    @Produces
    @ApplicationScoped
    public ChatModel chatModel() {
        return OpenAiChatModel.builder()
            .apiKey(apiKey)
            .modelName(modelName)
            .build();
    }

    @Produces
    @ApplicationScoped
    public Assistant assistant(ChatModel chatModel) {
        return AiServices.builder(Assistant.class)
            .chatModel(chatModel)
            .chatMemoryProvider(id -> MessageWindowChatMemory.withMaxMessages(10))
            .build();
    }
}

// Resource
@Path("/chat")
@ApplicationScoped
public class ChatResource {

    @Inject
    Assistant assistant;

    @POST
    @Produces(MediaType.APPLICATION_JSON)
    @Consumes(MediaType.APPLICATION_JSON)
    public Response chat(@QueryParam("userId") String userId, ChatRequest request) {
        try {
            String response = assistant.chat(userId, request.getMessage());
            return Response.ok(new ChatResponse(response)).build();
        } catch (Exception e) {
            return Response
                .serverError()
                .entity(Map.of("error", e.getMessage()))
                .build();
        }
    }
}

Reactive Streaming with Project Reactor

import reactor.core.publisher.Flux;
import reactor.core.publisher.Sinks;

interface StreamingAssistant {
    TokenStream chat(String message);
}

@Service
public class ReactiveAiService {

    private final StreamingAssistant assistant;

    public ReactiveAiService(StreamingChatModel streamingChatModel) {
        this.assistant = AiServices.create(StreamingAssistant.class, streamingChatModel);
    }

    public Flux<String> chatReactive(String message) {
        // unicast() buffers tokens emitted before the subscriber attaches;
        // multicast() would drop them because start() runs before subscription
        Sinks.Many<String> sink = Sinks.many().unicast().onBackpressureBuffer();

        assistant.chat(message)
            .onPartialResponse(sink::tryEmitNext)
            .onCompleteResponse(response -> sink.tryEmitComplete())
            .onError(throwable -> sink.tryEmitError(throwable))
            .start();

        return sink.asFlux();
    }
}

// Controller
@RestController
@RequestMapping("/api/stream")
public class StreamingChatController {

    private final ReactiveAiService aiService;

    public StreamingChatController(ReactiveAiService aiService) {
        this.aiService = aiService;
    }

    @GetMapping(value = "/chat", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<String> streamChat(@RequestParam String message) {
        return aiService.chatReactive(message)
            .onErrorResume(throwable -> {
                return Flux.just("Error: " + throwable.getMessage());
            });
    }
}

Kubernetes Deployment Configuration

apiVersion: v1
kind: ConfigMap
metadata:
  name: ai-service-config
data:
  application.properties: |
    openai.api.key=${OPENAI_API_KEY}
    openai.model.name=gpt-4
    openai.timeout=60s
    chat.memory.max.messages=10

---
apiVersion: v1
kind: Secret
metadata:
  name: ai-service-secrets
type: Opaque
stringData:
  openai-api-key: "your-api-key-here"

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ai-service
  template:
    metadata:
      labels:
        app: ai-service
    spec:
      containers:
      - name: ai-service
        image: your-registry/ai-service:latest
        ports:
        - containerPort: 8080
        env:
        - name: OPENAI_API_KEY
          valueFrom:
            secretKeyRef:
              name: ai-service-secrets
              key: openai-api-key
        volumeMounts:
        - name: config
          mountPath: /config
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1Gi"
            cpu: "1000m"
        livenessProbe:
          httpGet:
            path: /actuator/health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /actuator/health/readiness
            port: 8080
          initialDelaySeconds: 20
          periodSeconds: 5
      volumes:
      - name: config
        configMap:
          name: ai-service-config

---
apiVersion: v1
kind: Service
metadata:
  name: ai-service
spec:
  selector:
    app: ai-service
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
  type: LoadBalancer

Observability with Micrometer

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Service;

@Service
public class ObservableAiService {

    private final Assistant assistant;
    private final MeterRegistry meterRegistry;

    public ObservableAiService(Assistant assistant, MeterRegistry meterRegistry) {
        this.assistant = assistant;
        this.meterRegistry = meterRegistry;
    }

    public String chat(String userId, String message) {
        Timer.Sample sample = Timer.start(meterRegistry);

        try {
            String response = assistant.chat(userId, message);

            // Caution: tagging by user ID creates one time series per user.
            // Keep this tag only if user cardinality is small and bounded.
            sample.stop(Timer.builder("ai.service.request")
                .tag("status", "success")
                .tag("user", userId)
                .register(meterRegistry));

            // The timer already tracks a count; the separate counter is kept
            // for dashboards that query requests.total directly.
            meterRegistry.counter("ai.service.requests.total",
                "status", "success").increment();

            return response;
        } catch (Exception e) {
            sample.stop(Timer.builder("ai.service.request")
                .tag("status", "error")
                .tag("user", userId)
                .tag("error", e.getClass().getSimpleName())
                .register(meterRegistry));

            meterRegistry.counter("ai.service.requests.total",
                "status", "error",
                "error", e.getClass().getSimpleName()).increment();

            throw e;
        }
    }
}
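The success/error timing pattern above can be approximated with plain JDK types to show exactly what the wrapper records per call. This sketch uses illustrative names, not a Micrometer API; a real `Timer` would additionally feed `elapsedNanos` into a latency histogram:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

public class SimpleMetrics {

    // status -> request count, standing in for ai.service.requests.total
    final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    // Accumulated wall time, standing in for the ai.service.request timer
    final LongAdder totalNanos = new LongAdder();

    // Time the call and record success or error, mirroring the
    // Timer.Sample try/catch pattern in the Micrometer example.
    <T> T record(Supplier<T> call) {
        long start = System.nanoTime();
        try {
            T result = call.get();
            counters.computeIfAbsent("success", k -> new LongAdder()).increment();
            return result;
        } catch (RuntimeException e) {
            counters.computeIfAbsent("error", k -> new LongAdder()).increment();
            throw e;
        } finally {
            totalNanos.add(System.nanoTime() - start);
        }
    }

    long count(String status) {
        LongAdder a = counters.get(status);
        return a == null ? 0 : a.sum();
    }

    public static void main(String[] args) {
        SimpleMetrics metrics = new SimpleMetrics();
        metrics.record(() -> "ok");
        try {
            metrics.record(() -> { throw new IllegalStateException("boom"); });
        } catch (IllegalStateException expected) {
            // the wrapper re-throws after recording, as in the example above
        }
        System.out.println(metrics.count("success") + " " + metrics.count("error"));
        // prints "1 1"
    }
}
```

Note that the exception is recorded and then re-thrown, so callers still observe failures; metrics collection never swallows errors.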
