tessl/maven-dev-langchain4j--langchain4j

Build LLM-powered applications in Java with support for chatbots, agents, RAG, tools, and much more

AI Services

High-level API for creating AI-powered services by defining Java interfaces. AiServices provides implementations that automatically handle chat models, streaming, memory management, RAG, tools, guardrails, and various output types.

Capabilities

AiServices Class

The primary entry point for building AI services from Java interfaces.

package dev.langchain4j.service;

/**
 * Abstract class for building AI services from Java interfaces.
 * Supports system/user message templates, chat memory, RAG, tools, streaming,
 * moderation, and various return types.
 */
public abstract class AiServices<T> {
    /**
     * Create a simple AI service with a chat model
     * @param aiService Interface defining the AI service API
     * @param chatModel Chat model to use
     * @return Implementation of the AI service interface
     */
    public static <T> T create(Class<T> aiService, ChatModel chatModel);

    /**
     * Create a simple AI service with a streaming chat model
     * @param aiService Interface defining the AI service API
     * @param streamingChatModel Streaming chat model to use
     * @return Implementation of the AI service interface
     */
    public static <T> T create(Class<T> aiService, StreamingChatModel streamingChatModel);

    /**
     * Begin building an AI service with full configuration options
     * @param aiService Interface defining the AI service API
     * @return Builder for configuring the AI service
     */
    public static <T> AiServices<T> builder(Class<T> aiService);
}
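A minimal usage sketch of the `create()` entry point (the interface name, model provider, and model settings below are illustrative assumptions, not prescribed by the API):

```java
import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.openai.OpenAiChatModel;
import dev.langchain4j.service.AiServices;

public class CreateExample {

    // The service contract is a plain Java interface;
    // AiServices generates the implementation as a dynamic proxy.
    interface Assistant {
        String chat(String userMessage);
    }

    public static void main(String[] args) {
        // Any ChatModel implementation works; OpenAiChatModel is just one example.
        ChatModel model = OpenAiChatModel.builder()
                .apiKey(System.getenv("OPENAI_API_KEY"))
                .modelName("gpt-4o-mini")
                .build();

        Assistant assistant = AiServices.create(Assistant.class, model);
        System.out.println(assistant.chat("Hello!")); // each call hits the underlying model
    }
}
```

The returned `assistant` proxy is thread-safe and, per the performance notes below, should be created once and reused.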

Thread Safety

  • The AiServices.create() and AiServices.builder().build() methods return thread-safe proxy instances
  • Multiple threads can safely invoke methods on the same AI service instance concurrently
  • ChatMemory: When using shared chatMemory(), ensure the ChatMemory implementation is thread-safe (MessageWindowChatMemory is thread-safe)
  • ChatMemoryProvider: When using chatMemoryProvider(), the provider itself must be thread-safe and return thread-safe memory instances per memoryId
  • Tools: Tool objects passed to tools() must be thread-safe if concurrent invocations are expected
  • Listeners: All registered listeners must be thread-safe as they may be invoked from multiple threads

Common Pitfalls

  • DO NOT pass null for the interface class - throws IllegalArgumentException
  • DO NOT use non-interface classes - only Java interfaces are supported
  • DO NOT pass a null model to create() - each overload requires a non-null ChatModel or StreamingChatModel
  • DO NOT forget to call build() on the builder - the builder itself is not an AI service instance
  • DO NOT reuse builder instances across multiple threads - builders are not thread-safe
  • DO NOT mutate tool objects after passing them to the builder - this can cause unpredictable behavior

Edge Cases

  • If the interface has no methods, the service builds successfully but has no functionality
  • If the interface extends other interfaces, all inherited methods are supported
  • Default methods in the interface are ignored - only abstract methods are implemented
  • Generic type parameters in the interface are supported but type erasure applies at runtime
  • If interface methods have conflicting annotations (@SystemMessage at both type and method level), method-level takes precedence

Performance Notes

  • Creating an AI service instance is expensive (reflection, proxy generation) - reuse instances when possible
  • Consider caching AI service instances in singleton or application scope
  • Builder configuration is validated at build() time, not during individual setter calls
  • The proxy implementation uses JDK dynamic proxies, which have minimal overhead per invocation

Cost Considerations

  • Each method invocation typically results in one or more API calls to the underlying chat model
  • Token usage depends on: system message size, user message size, chat history, RAG content, and tool definitions
  • Using chatMemory() or chatMemoryProvider() increases token usage with each additional message in history
  • RAG integration (contentRetriever()) adds retrieved content tokens to each request
  • Tools add function definitions to every request, increasing input token count
  • Streaming does not reduce token costs - it only changes the response delivery mechanism

Exception Handling

  • IllegalArgumentException: Thrown if aiService is null, not an interface, or if chatModel/streamingChatModel are both null
  • IllegalConfigurationException: Thrown at build() time if configuration is invalid
  • RuntimeException: Method invocations may throw if the underlying model throws
  • Tool execution errors are handled by toolExecutionErrorHandler() if configured
  • Moderation failures throw ModerationException if @Moderate annotation is used
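The exceptions above can be handled at the call site; a sketch, assuming `assistant` is an AI service whose method carries @Moderate:

```java
import dev.langchain4j.service.ModerationException;

// ...
try {
    String answer = assistant.chat("some user input");
    System.out.println(answer);
} catch (ModerationException e) {
    // the moderation model flagged the content
    System.err.println("Input rejected: " + e.getMessage());
} catch (RuntimeException e) {
    // the underlying chat model call failed (network error, provider error, ...)
    throw e;
}
```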

Related APIs

  • Chat Models - Required for non-streaming services
  • Streaming Chat Models - Required for streaming services
  • Chat Memory - For conversation history
  • Tools - For function calling
  • RAG - For retrieval-augmented generation
  • Guardrails - For input/output validation

AiServices Builder

Complete builder API for configuring AI services with all available options.

/**
 * Builder for configuring AI services
 */
public class Builder<T> {
    /**
     * Configure chat model
     * @param chatModel Chat model to use
     * @return Builder instance
     */
    public Builder<T> chatModel(ChatModel chatModel);

    /**
     * Configure streaming chat model
     * @param streamingChatModel Streaming chat model to use
     * @return Builder instance
     */
    public Builder<T> streamingChatModel(StreamingChatModel streamingChatModel);

    /**
     * Set system message for all invocations
     * @param systemMessage System message text
     * @return Builder instance
     */
    public Builder<T> systemMessage(String systemMessage);

    /**
     * Set system message provider function
     * @param systemMessageProvider Function to provide system message
     * @return Builder instance
     */
    public Builder<T> systemMessageProvider(Function<Object, String> systemMessageProvider);

    /**
     * Set user message for all invocations
     * @param userMessage User message text
     * @return Builder instance
     */
    public Builder<T> userMessage(String userMessage);

    /**
     * Set user message provider function
     * @param userMessageProvider Function to provide user message
     * @return Builder instance
     */
    public Builder<T> userMessageProvider(Function<Object, String> userMessageProvider);

    /**
     * Set shared chat memory
     * @param chatMemory Chat memory instance
     * @return Builder instance
     */
    public Builder<T> chatMemory(ChatMemory chatMemory);

    /**
     * Set chat memory provider for per-user/conversation memory
     * @param chatMemoryProvider Chat memory provider
     * @return Builder instance
     */
    public Builder<T> chatMemoryProvider(ChatMemoryProvider chatMemoryProvider);

    /**
     * Set chat request transformer
     * @param chatRequestTransformer Transformer to modify requests
     * @return Builder instance
     */
    public Builder<T> chatRequestTransformer(UnaryOperator<ChatRequest> chatRequestTransformer);

    /**
     * Set chat request transformer with memory ID
     * @param chatRequestTransformer Transformer with memory ID parameter
     * @return Builder instance
     */
    public Builder<T> chatRequestTransformer(
        BiFunction<ChatRequest, Object, ChatRequest> chatRequestTransformer
    );

    /**
     * Set moderation model for content moderation
     * @param moderationModel Moderation model to use
     * @return Builder instance
     */
    public Builder<T> moderationModel(ModerationModel moderationModel);

    /**
     * Configure tools (objects with @Tool annotated methods)
     * @param objectsWithTools Objects containing tool methods
     * @return Builder instance
     */
    public Builder<T> tools(Object... objectsWithTools);

    /**
     * Configure tools from collection
     * @param objectsWithTools Collection of objects containing tool methods
     * @return Builder instance
     */
    public Builder<T> tools(Collection<Object> objectsWithTools);

    /**
     * Configure tools programmatically
     * @param tools Map of tool specifications to executors
     * @return Builder instance
     */
    public Builder<T> tools(Map<ToolSpecification, ToolExecutor> tools);

    /**
     * Configure tools with immediate return names
     * @param tools Map of tool specifications to executors
     * @param immediateReturnToolNames Set of tool names that return immediately
     * @return Builder instance
     */
    public Builder<T> tools(
        Map<ToolSpecification, ToolExecutor> tools,
        Set<String> immediateReturnToolNames
    );

    /**
     * Configure tool provider for dynamic tool selection
     * @param toolProvider Tool provider instance
     * @return Builder instance
     */
    public Builder<T> toolProvider(ToolProvider toolProvider);

    /**
     * Enable concurrent tool execution with default executor
     * @return Builder instance
     */
    public Builder<T> executeToolsConcurrently();

    /**
     * Enable concurrent tool execution with custom executor
     * @param executor Executor for concurrent tool execution
     * @return Builder instance
     */
    public Builder<T> executeToolsConcurrently(Executor executor);

    /**
     * Set max sequential tool invocations (default: 100)
     * @param maxSequentialToolsInvocations Maximum number of sequential tool invocations
     * @return Builder instance
     */
    public Builder<T> maxSequentialToolsInvocations(int maxSequentialToolsInvocations);

    /**
     * Set before tool execution callback
     * @param beforeToolExecution Callback to invoke before tool execution
     * @return Builder instance
     */
    public Builder<T> beforeToolExecution(Consumer<BeforeToolExecution> beforeToolExecution);

    /**
     * Set after tool execution callback
     * @param afterToolExecution Callback to invoke after tool execution
     * @return Builder instance
     */
    public Builder<T> afterToolExecution(Consumer<ToolExecution> afterToolExecution);

    /**
     * Set strategy for handling hallucinated tool names
     * @param hallucinatedToolNameStrategy Strategy function
     * @return Builder instance
     */
    public Builder<T> hallucinatedToolNameStrategy(
        Function<ToolExecutionRequest, ToolExecutionResultMessage> hallucinatedToolNameStrategy
    );

    /**
     * Set handler for tool argument errors (JSON parsing, type mismatches)
     * @param handler Tool arguments error handler
     * @return Builder instance
     */
    public Builder<T> toolArgumentsErrorHandler(ToolArgumentsErrorHandler handler);

    /**
     * Set handler for tool execution errors
     * @param handler Tool execution error handler
     * @return Builder instance
     */
    public Builder<T> toolExecutionErrorHandler(ToolExecutionErrorHandler handler);

    /**
     * Configure content retriever for RAG
     * @param contentRetriever Content retriever instance
     * @return Builder instance
     */
    public Builder<T> contentRetriever(ContentRetriever contentRetriever);

    /**
     * Configure retrieval augmentor for RAG
     * @param retrievalAugmentor Retrieval augmentor instance
     * @return Builder instance
     */
    public Builder<T> retrievalAugmentor(RetrievalAugmentor retrievalAugmentor);

    /**
     * Register AI service listener
     * @param listener Listener to register
     * @return Builder instance
     */
    public <I> Builder<T> registerListener(AiServiceListener<I> listener);

    /**
     * Register multiple AI service listeners
     * @param listeners Listeners to register
     * @return Builder instance
     */
    public Builder<T> registerListeners(AiServiceListener<?>... listeners);

    /**
     * Register listener collection
     * @param listeners Collection of listeners to register
     * @return Builder instance
     */
    public Builder<T> registerListeners(Collection<? extends AiServiceListener<?>> listeners);

    /**
     * Unregister AI service listener
     * @param listener Listener to unregister
     * @return Builder instance
     */
    public <I> Builder<T> unregisterListener(AiServiceListener<I> listener);

    /**
     * Unregister multiple listeners
     * @param listeners Listeners to unregister
     * @return Builder instance
     */
    public Builder<T> unregisterListeners(AiServiceListener<?>... listeners);

    /**
     * Configure input guardrails
     * @param inputGuardrailsConfig Input guardrails configuration
     * @return Builder instance
     */
    public Builder<T> inputGuardrailsConfig(InputGuardrailsConfig inputGuardrailsConfig);

    /**
     * Configure output guardrails
     * @param outputGuardrailsConfig Output guardrails configuration
     * @return Builder instance
     */
    public Builder<T> outputGuardrailsConfig(OutputGuardrailsConfig outputGuardrailsConfig);

    /**
     * Set input guardrail classes
     * @param guardrailClasses List of guardrail classes
     * @return Builder instance
     */
    public <I> Builder<T> inputGuardrailClasses(List<Class<? extends I>> guardrailClasses);

    /**
     * Set input guardrail classes (varargs)
     * @param guardrailClasses Guardrail classes
     * @return Builder instance
     */
    public <I> Builder<T> inputGuardrailClasses(Class<? extends I>... guardrailClasses);

    /**
     * Set input guardrails
     * @param guardrails List of guardrails
     * @return Builder instance
     */
    public <I> Builder<T> inputGuardrails(List<I> guardrails);

    /**
     * Set input guardrails (varargs)
     * @param guardrails Guardrails
     * @return Builder instance
     */
    public <I> Builder<T> inputGuardrails(I... guardrails);

    /**
     * Set output guardrail classes
     * @param guardrailClasses List of guardrail classes
     * @return Builder instance
     */
    public <O> Builder<T> outputGuardrailClasses(List<Class<? extends O>> guardrailClasses);

    /**
     * Set output guardrail classes (varargs)
     * @param guardrailClasses Guardrail classes
     * @return Builder instance
     */
    public <O> Builder<T> outputGuardrailClasses(Class<? extends O>... guardrailClasses);

    /**
     * Set output guardrails
     * @param guardrails List of guardrails
     * @return Builder instance
     */
    public <O> Builder<T> outputGuardrails(List<O> guardrails);

    /**
     * Set output guardrails (varargs)
     * @param guardrails Guardrails
     * @return Builder instance
     */
    public <O> Builder<T> outputGuardrails(O... guardrails);

    /**
     * Configure whether to store RAG-augmented messages in chat memory
     * Default is true
     * @param storeRetrievedContentInChatMemory Whether to store retrieved content
     * @return Builder instance
     */
    public Builder<T> storeRetrievedContentInChatMemory(
        boolean storeRetrievedContentInChatMemory
    );

    /**
     * Build the AI service
     * @return Implementation of the AI service interface
     */
    public T build();
}
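Putting the builder together - memory, tools, and RAG are all optional, and the `Calculator` tool class and `retriever` variable below are illustrative assumptions:

```java
import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import dev.langchain4j.service.AiServices;

interface Assistant {
    String chat(String userMessage);
}

Assistant assistant = AiServices.builder(Assistant.class)
        .chatModel(model)                                        // exactly one of chatModel / streamingChatModel
        .systemMessage("You are a concise assistant.")
        .chatMemory(MessageWindowChatMemory.withMaxMessages(10)) // shared, thread-safe memory
        .tools(new Calculator())                                 // object with @Tool-annotated methods
        .contentRetriever(retriever)                             // or retrievalAugmentor(), not both
        .build();                                                // configuration is validated here
```

All setters return the same builder instance, so the whole configuration reads as one chain ending in `build()`.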

Thread Safety

  • Builder instances are NOT thread-safe - do not share builders across threads
  • Create one builder instance per thread or synchronize access externally
  • Once build() is called, the resulting AI service proxy is thread-safe
  • All configuration methods return the same builder instance for chaining

Common Pitfalls

  • DO NOT call build() multiple times on the same builder - behavior is undefined
  • DO NOT configure both chatModel() and streamingChatModel() - only one should be set
  • DO NOT pass null to configuration methods - most will throw NullPointerException
  • DO NOT configure both chatMemory() and chatMemoryProvider() - only one should be set
  • DO NOT configure both contentRetriever() and retrievalAugmentor() - only one should be set
  • DO NOT set systemMessage()/systemMessageProvider() and use @SystemMessage annotation - method annotation takes precedence
  • DO NOT forget to configure at least one chat model before calling build()

Edge Cases

  • Setting maxSequentialToolsInvocations(0) effectively disables tool execution
  • Negative values for maxSequentialToolsInvocations throw IllegalArgumentException at build time
  • Empty tool lists are allowed and result in no tools being available
  • Registering the same listener multiple times results in multiple invocations
  • Unregistering a listener that was never registered is a no-op
  • executeToolsConcurrently() with null executor uses ForkJoinPool.commonPool()
  • Setting both input and output guardrails creates a full validation pipeline

Performance Notes

  • Builder construction is lightweight - configuration objects are stored by reference
  • Most builder methods perform no validation - validation happens at build() time
  • Listeners are stored in order of registration and invoked sequentially
  • Tool specifications are analyzed at build time, not at invocation time
  • Chat request transformers add overhead to every invocation - keep them lightweight

Cost Considerations

  • System/user messages set via builder apply to all invocations - tokens add up quickly
  • Larger maxSequentialToolsInvocations allows more tool rounds but increases API calls
  • executeToolsConcurrently() does not reduce API calls - it only parallelizes tool execution
  • Moderation adds an extra API call to the moderation model for each invocation
  • Listeners do not directly incur costs but may slow down request processing

Exception Handling

  • IllegalArgumentException: Thrown for invalid configuration values (negative limits, null required parameters)
  • IllegalConfigurationException: Thrown at build() if configuration is inconsistent or incomplete
  • NullPointerException: Thrown by some methods if null is passed where non-null is required
  • Configuration errors are eagerly detected at build() time when possible

Related APIs

  • Chat Models - For chatModel() configuration
  • Streaming Chat Models - For streamingChatModel() configuration
  • Chat Memory - For chatMemory() and chatMemoryProvider() configuration
  • Tools - For tools() and related tool configuration
  • RAG - For contentRetriever() and retrievalAugmentor() configuration
  • Guardrails - For guardrails configuration
  • Models - For listener registration

Annotations

Annotations for configuring AI service methods and parameters.

package dev.langchain4j.service;

/**
 * Specifies complete system message or template to be used on each invocation
 * Can contain template variables resolved with values from @V annotated parameters
 * Takes precedence over systemMessageProvider
 */
@Target({TYPE, METHOD})
@Retention(RUNTIME)
public @interface SystemMessage {
    /**
     * Prompt template (single or multiple lines)
     * @return Template lines
     */
    String[] value();

    /**
     * Delimiter for joining multiple lines (default: "\n")
     * @return Delimiter string
     */
    String delimiter() default "\n";

    /**
     * Resource path to read prompt template
     * @return Resource path
     */
    String fromResource() default "";
}

/**
 * Specifies complete user message or template to be used on each invocation
 * Can contain template variables resolved with values from @V annotated parameters
 * Can be used on methods or parameters
 * Takes precedence over userMessageProvider
 */
@Target({METHOD, PARAMETER})
@Retention(RUNTIME)
public @interface UserMessage {
    /**
     * Prompt template (single or multiple lines)
     * @return Template lines
     */
    String[] value();

    /**
     * Delimiter for joining multiple lines (default: "\n")
     * @return Delimiter string
     */
    String delimiter() default "\n";

    /**
     * Resource path to read prompt template
     * @return Resource path
     */
    String fromResource() default "";
}

/**
 * Annotation for method parameters to mark them as prompt template variables
 * Value will be injected into templates defined in @UserMessage, @SystemMessage,
 * and systemMessageProvider
 * Not necessary when "-parameters" compilation option is enabled or when using
 * Quarkus/Spring Boot
 */
@Target(PARAMETER)
@Retention(RUNTIME)
public @interface V {
    /**
     * Name of variable/placeholder in prompt template
     * @return Variable name
     */
    String value();
}

/**
 * Annotation for method parameters to specify memory ID for finding memory
 * belonging to user/conversation
 * Parameter can be of any type with proper equals()/hashCode() implementation
 */
@Target(PARAMETER)
@Retention(RUNTIME)
public @interface MemoryId {
}

/**
 * Annotation for method parameters to inject value into 'name' field of UserMessage
 */
@Target(PARAMETER)
@Retention(RUNTIME)
public @interface UserName {
}

/**
 * Annotation for methods to enable automatic content moderation
 * When annotated, invoking the method calls both the LLM and the moderation model in parallel
 * If content is flagged, ModerationException is thrown
 */
@Target(METHOD)
@Retention(RUNTIME)
public @interface Moderate {
}
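A sketch combining the annotations above on one interface (the names and templates are illustrative; @Moderate additionally requires moderationModel() on the builder):

```java
interface Assistant {

    @SystemMessage("You are {{persona}}. Answer briefly.")
    @UserMessage("Summarize in {{n}} sentences: {{text}}")
    String summarize(@MemoryId String conversationId,
                     @V("persona") String persona,
                     @V("n") int n,
                     @V("text") String text);

    // The annotated parameter becomes the user message verbatim;
    // the moderation model runs in parallel with the chat model.
    @Moderate
    String chat(@MemoryId String conversationId, @UserMessage String message);
}
```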

Thread Safety

  • Annotations are read at build time during proxy generation, not at runtime
  • No thread safety concerns as annotations are immutable metadata
  • Template variable substitution is performed per-invocation and is thread-safe

Common Pitfalls

  • DO NOT use both value() and fromResource() in same annotation - fromResource takes precedence
  • DO NOT use undefined variables in templates - they will be left as-is (e.g., "{{undefined}}")
  • DO NOT forget @V annotation when not using -parameters compiler flag - parameter names won't be available
  • DO NOT use @MemoryId on multiple parameters in the same method - only first one is used
  • DO NOT use @UserName on multiple parameters - only first one is used
  • DO NOT apply @SystemMessage at both type and method level expecting concatenation - method-level overrides completely
  • DO NOT rely on unusual delimiter strings - any string is accepted, but exotic delimiters can break template rendering

Edge Cases

  • Empty value() array results in empty message
  • Empty string in value() array contributes empty line (or delimiter if multiple)
  • fromResource() with non-existent resource throws IllegalConfigurationException at build time
  • Template variables with special characters (e.g., "{{var.name}}") are supported
  • Nested braces in templates (e.g., "{{var{{nested}}}}") may cause parsing issues
  • @UserMessage on parameter overrides method-level @UserMessage for that invocation
  • @Moderate with no configured moderationModel() throws IllegalConfigurationException at build time

Performance Notes

  • Template parsing and variable resolution happens per-invocation - keep templates simple
  • Resource loading via fromResource() is cached after first access
  • Variable substitution uses string replacement - large templates with many variables have overhead
  • @Moderate adds a parallel API call to the moderation model - overall latency is bounded by the slower of the two calls

Cost Considerations

  • Larger @SystemMessage templates consume more input tokens on every invocation
  • Template variables that expand to large strings increase token costs
  • @Moderate doubles API costs (one chat model call + one moderation model call)
  • Multi-line messages are joined with the delimiter, which adds tokens (the default "\n" adds minimal overhead)

Exception Handling

  • IllegalConfigurationException: Thrown at build time if fromResource() path is invalid
  • ModerationException: Thrown at invocation time if @Moderate flags content
  • IllegalArgumentException: May be thrown at invocation time if template variables can't be resolved
  • Without @V, the parameter's variable name is unavailable unless the -parameters compiler flag is used


Result Type

The Result type provides access to additional information from AI service invocations.

package dev.langchain4j.service;

/**
 * Represents the result of AI Service invocation containing actual content
 * and additional information (token usage, finish reason, sources from RAG,
 * tool executions, intermediate/final responses)
 */
public class Result<T> {
    /**
     * Constructor
     * @param content The actual content/result
     * @param tokenUsage Aggregate token usage
     * @param sources Sources from RAG retrieval
     * @param finishReason Finish reason from model
     * @param toolExecutions All tool executions that occurred
     */
    public Result(
        T content,
        TokenUsage tokenUsage,
        List<Content> sources,
        FinishReason finishReason,
        List<ToolExecution> toolExecutions
    );

    /**
     * Create builder
     * @return Builder instance
     */
    public static <T> ResultBuilder<T> builder();

    /**
     * Get content
     * @return The actual content/result
     */
    public T content();

    /**
     * Get aggregate token usage
     * @return Token usage across all requests
     */
    public TokenUsage tokenUsage();

    /**
     * Get sources from RAG
     * @return List of retrieved content sources
     */
    public List<Content> sources();

    /**
     * Get finish reason
     * @return Finish reason from model
     */
    public FinishReason finishReason();

    /**
     * Get all tool executions
     * @return List of all tool executions
     */
    public List<ToolExecution> toolExecutions();

    /**
     * Get intermediate responses (with tool execution requests)
     * @return List of intermediate chat responses
     */
    public List<ChatResponse> intermediateResponses();

    /**
     * Get final response (without tool execution requests)
     * @return Final chat response
     */
    public ChatResponse finalResponse();
}
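To receive this metadata, declare Result<T> as the method's return type; a sketch (the interface and question are illustrative):

```java
import dev.langchain4j.model.output.TokenUsage;
import dev.langchain4j.rag.content.Content;
import dev.langchain4j.service.Result;
import dev.langchain4j.service.tool.ToolExecution;
import java.util.List;

interface Assistant {
    Result<String> chat(String userMessage);
}

Result<String> result = assistant.chat("What is LangChain4j?");
String answer = result.content();                    // may be null if the model returned no content
TokenUsage usage = result.tokenUsage();              // may be null for some models
List<Content> sources = result.sources();            // empty unless RAG is configured
List<ToolExecution> tools = result.toolExecutions(); // empty unless tools were invoked
```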

Thread Safety

  • Result instances are immutable and fully thread-safe
  • Can be safely shared across threads and cached
  • Collections returned by methods are unmodifiable views

Common Pitfalls

  • DO NOT assume tokenUsage() is non-null - some models don't report token usage
  • DO NOT assume sources() is non-empty - only populated when using RAG
  • DO NOT assume toolExecutions() is non-empty - only populated when tools are used
  • DO NOT modify returned lists - they are unmodifiable and will throw UnsupportedOperationException
  • DO NOT assume content() is non-null - it can be null if the model returned no content
  • DO NOT ignore finishReason() - STOP vs LENGTH vs CONTENT_FILTER indicate different outcomes

Edge Cases

  • If no tools were executed, toolExecutions() returns empty list (not null)
  • If RAG is not configured, sources() returns empty list (not null)
  • If model doesn't support token reporting, tokenUsage() may return null
  • intermediateResponses() includes all responses with tool execution requests
  • finalResponse() is the last response without tool execution requests
  • For simple invocations without tools, intermediateResponses() is empty and finalResponse() is the only response

Performance Notes

  • Creating Result instances is lightweight - all collections are stored by reference
  • Accessing fields has constant time complexity
  • Result objects are immutable so JVM can optimize access patterns
  • Consider extracting frequently accessed fields to local variables in hot paths

Cost Considerations

  • tokenUsage() shows aggregate cost across all requests in the invocation (including tool rounds)
  • Multiple tool execution rounds multiply token costs - monitor toolExecutions().size()
  • intermediateResponses() size indicates number of model calls beyond the final one
  • RAG sources don't directly incur model costs but increase input token count

Exception Handling

  • Result construction does not throw exceptions
  • Accessing null content requires null-checking by caller
  • Unmodifiable collections throw UnsupportedOperationException on modification attempts


TokenStream

Interface for streaming responses from AI services.

package dev.langchain4j.service;

/**
 * Represents token stream from model to subscribe and receive updates
 * when new partial response is available, when streaming finishes, or when error occurs
 * Intended as return type in AI Service
 */
public interface TokenStream {
    /**
     * Handle partial text responses
     * @param partialResponseHandler Consumer for partial response strings
     * @return TokenStream instance for chaining
     */
    TokenStream onPartialResponse(Consumer<String> partialResponseHandler);

    /**
     * Handle partial responses with context (experimental)
     * @param handler BiConsumer for partial response with context
     * @return TokenStream instance for chaining
     */
    TokenStream onPartialResponseWithContext(
        BiConsumer<PartialResponse, PartialResponseContext> handler
    );

    /**
     * Handle partial thinking/reasoning text (experimental)
     * @param partialThinkingHandler Consumer for partial thinking
     * @return TokenStream instance for chaining
     */
    TokenStream onPartialThinking(Consumer<PartialThinking> partialThinkingHandler);

    /**
     * Handle partial thinking with context (experimental)
     * @param handler BiConsumer for partial thinking with context
     * @return TokenStream instance for chaining
     */
    TokenStream onPartialThinkingWithContext(
        BiConsumer<PartialThinking, PartialThinkingContext> handler
    );

    /**
     * Handle partial tool calls (experimental)
     * @param partialToolCallHandler Consumer for partial tool calls
     * @return TokenStream instance for chaining
     */
    TokenStream onPartialToolCall(Consumer<PartialToolCall> partialToolCallHandler);

    /**
     * Handle partial tool calls with context (experimental)
     * @param handler BiConsumer for partial tool calls with context
     * @return TokenStream instance for chaining
     */
    TokenStream onPartialToolCallWithContext(
        BiConsumer<PartialToolCall, PartialToolCallContext> handler
    );

    /**
     * Handle retrieved contents from RAG
     * @param contentHandler Consumer for retrieved content list
     * @return TokenStream instance for chaining
     */
    TokenStream onRetrieved(Consumer<List<Content>> contentHandler);

    /**
     * Handle intermediate chat responses (with tool execution requests)
     * @param intermediateResponseHandler Consumer for intermediate responses
     * @return TokenStream instance for chaining
     */
    TokenStream onIntermediateResponse(Consumer<ChatResponse> intermediateResponseHandler);

    /**
     * Handle before tool execution
     * @param beforeToolExecutionHandler Consumer for before tool execution context
     * @return TokenStream instance for chaining
     */
    TokenStream beforeToolExecution(Consumer<BeforeToolExecution> beforeToolExecutionHandler);

    /**
     * Handle after tool execution
     * @param toolExecuteHandler Consumer for tool execution results
     * @return TokenStream instance for chaining
     */
    TokenStream onToolExecuted(Consumer<ToolExecution> toolExecuteHandler);

    /**
     * Handle final chat response
     * @param completeResponseHandler Consumer for complete response
     * @return TokenStream instance for chaining
     */
    TokenStream onCompleteResponse(Consumer<ChatResponse> completeResponseHandler);

    /**
     * Handle errors
     * @param errorHandler Consumer for throwable errors
     * @return TokenStream instance for chaining
     */
    TokenStream onError(Consumer<Throwable> errorHandler);

    /**
     * Ignore all errors (logged as WARN)
     * @return TokenStream instance for chaining
     */
    TokenStream ignoreErrors();

    /**
     * Start processing and send request to LLM
     * Must be called after registering handlers
     */
    void start();
}

Thread Safety

  • TokenStream instances are NOT thread-safe for configuration
  • Handler registration methods must be called from a single thread before start()
  • Once start() is called, handlers may be invoked from different threads
  • All registered handlers must be thread-safe
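The last point can be illustrated with plain JDK types (no langchain4j involved): a partial-response handler backed by a concurrent collection stays correct even when invoked from several threads, whereas an unsynchronized StringBuilder would not. The class and method names here are illustrative only.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Illustration: a thread-safe token accumulator suitable for use as a handler
// that may be invoked from different threads after start().
public class SafeAccumulator {

    static int accumulateConcurrently(int threads, int tokensPerThread) {
        ConcurrentLinkedQueue<String> tokens = new ConcurrentLinkedQueue<>();
        Consumer<String> handler = tokens::add; // safe to call from any thread

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < tokensPerThread; i++) {
                    handler.accept("tok");
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return tokens.size(); // no tokens lost despite concurrent writers
    }

    public static void main(String[] args) {
        System.out.println(accumulateConcurrently(4, 1000)); // prints 4000
    }
}
```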

Common Pitfalls

  • DO NOT forget to call start() - handlers won't execute without it
  • DO NOT register handlers after calling start() - behavior is undefined
  • DO NOT call start() multiple times - throws IllegalStateException
  • DO NOT block in handlers - this will slow down streaming for the user
  • DO NOT throw exceptions from handlers - wrap in try-catch and handle errors
  • DO NOT register onError() and ignoreErrors() together - ignoreErrors takes precedence
  • DO NOT perform long-running operations in handlers - offload to separate threads

Edge Cases

  • If no onPartialResponse() handler is registered, partial responses are discarded
  • If no onError() handler is registered and ignoreErrors() is not called, errors propagate to caller
  • Registering the same handler type more than once is rejected when start() is called (IllegalConfigurationException)
  • Empty partial responses may be delivered (zero-length strings)
  • onCompleteResponse() is always called after all partial responses (unless error occurs)
  • Tool execution callbacks are invoked even if tools fail
  • RAG onRetrieved() is called before first partial response

Performance Notes

  • Streaming reduces perceived latency - users see responses as they're generated
  • Handler execution is synchronous with response processing - keep handlers fast
  • Consider using reactive/async patterns in handlers for non-blocking processing
  • Multiple handlers add overhead - combine logic when possible
  • Partial responses may arrive at high frequency - avoid expensive operations per token
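One way to keep handlers fast is to decorate them so the expensive work is offloaded to an executor. The helper below is a sketch with JDK types only (`AsyncHandler` and `offload` are not langchain4j APIs); a single-thread executor preserves token order.

```java
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical helper: wraps a token handler so the expensive work runs on a
// separate executor instead of the streaming thread.
public class AsyncHandler {

    static <T> Consumer<T> offload(Consumer<T> handler, ExecutorService executor) {
        return item -> executor.submit(() -> handler.accept(item));
    }

    static String demo() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        ConcurrentLinkedQueue<String> sink = new ConcurrentLinkedQueue<>();

        // The streaming thread only hands tokens off; the (potentially slow)
        // work happens on the executor, in submission order.
        Consumer<String> handler = offload(sink::add, executor);
        for (String token : List.of("Hello", ", ", "world")) {
            handler.accept(token); // returns immediately
        }

        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return String.join("", sink);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "Hello, world"
    }
}
```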

Cost Considerations

  • Streaming does not reduce token costs - same as non-streaming
  • Streaming may enable early cancellation in interactive scenarios, saving costs
  • No additional API calls are made for streaming vs non-streaming
  • Tool execution in streaming follows same cost model as non-streaming

Exception Handling

  • IllegalStateException: Thrown if start() is called multiple times
  • RuntimeException: Errors from model or tools are delivered to onError() handler
  • If no onError() handler is registered, exceptions propagate to caller
  • ignoreErrors() suppresses all errors (logged at WARN level)
  • Handler exceptions are caught and logged but don't stop streaming


Exceptions

Exceptions thrown by AI services.

package dev.langchain4j.service;

/**
 * Exception thrown when AI service is misconfigured
 */
public class IllegalConfigurationException extends LangChain4jException {
    /**
     * Constructor
     * @param message Error message
     */
    public IllegalConfigurationException(String message);

    /**
     * Constructor with cause
     * @param message Error message
     * @param cause Underlying cause
     */
    public IllegalConfigurationException(String message, Throwable cause);
}

/**
 * Exception thrown when moderation model flags content
 */
public class ModerationException extends LangChain4jException {
    /**
     * Constructor
     * @param message Error message
     */
    public ModerationException(String message);

    /**
     * Constructor with cause
     * @param message Error message
     * @param cause Underlying cause
     */
    public ModerationException(String message, Throwable cause);
}

Thread Safety

  • Exception instances are immutable and thread-safe
  • Can be safely caught and rethrown across thread boundaries
  • Stack traces are captured at construction time
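A generic illustration of crossing thread boundaries, using only JDK types (no langchain4j classes): an exception thrown on a worker thread can be caught and inspected on the caller's thread, with its construction-time stack trace preserved as the cause.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

// An immutable exception thrown on a worker thread survives the hop back to
// the calling thread intact.
public class CrossThreadException {

    static String causeMessage() {
        CompletableFuture<String> future = CompletableFuture.supplyAsync(() -> {
            throw new IllegalStateException("misconfigured service");
        });
        try {
            return future.join();
        } catch (CompletionException e) {
            // The original exception is preserved as the cause
            return e.getCause().getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(causeMessage()); // prints "misconfigured service"
    }
}
```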

Common Pitfalls

  • DO NOT catch these as generic Exception - catch specific types for proper error handling
  • DO NOT ignore IllegalConfigurationException - it indicates a programming error that must be fixed
  • DO NOT suppress ModerationException without logging - it indicates policy violations
  • DO NOT retry on IllegalConfigurationException - fix the configuration instead

Edge Cases

  • IllegalConfigurationException is thrown at build time, not invocation time
  • ModerationException is thrown during invocation if content is flagged
  • Both exceptions extend LangChain4jException for generic catch blocks
  • Cause chain may include underlying model provider exceptions

Performance Notes

  • Exception construction includes stack trace generation - avoid creating exceptions in hot paths
  • Catching specific exception types is faster than catching generic Exception
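If an exception really must be created in a hot path, the JDK offers an escape hatch worth knowing about: `RuntimeException`'s protected four-argument constructor can disable stack-trace capture entirely. This is a general JDK technique, not something langchain4j's exception types do.

```java
// Sketch of the "avoid stack traces in hot paths" note: disabling
// writableStackTrace skips the expensive trace capture at construction time.
public class CheapException extends RuntimeException {

    public CheapException(String message) {
        // enableSuppression=false, writableStackTrace=false
        super(message, null, false, false);
    }

    public static void main(String[] args) {
        CheapException e = new CheapException("flagged");
        System.out.println(e.getStackTrace().length); // prints 0: no trace captured
    }
}
```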

Cost Considerations

  • IllegalConfigurationException incurs no API costs (thrown before API calls)
  • ModerationException occurs after moderation API call but before main model call

Exception Handling

  • Catch IllegalConfigurationException at application startup to fail fast
  • Catch ModerationException at invocation time to handle policy violations
  • Log both exception types with full stack traces for debugging
  • Consider wrapping in application-specific exceptions for cleaner API

Related APIs

  • Guardrails - For moderation and validation configuration

Usage Examples

Basic AI Service

import dev.langchain4j.service.AiServices;

interface Assistant {
    String chat(String message);
}

// Create simple AI service
Assistant assistant = AiServices.create(Assistant.class, chatModel);

try {
    String response = assistant.chat("What is the capital of France?");
    System.out.println(response);
} catch (RuntimeException e) {
    System.err.println("AI service invocation failed: " + e.getMessage());
    throw e;
}

AI Service with Templates

import dev.langchain4j.service.AiServices;
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import dev.langchain4j.service.V;

interface Chef {
    @SystemMessage("You are a professional chef with expertise in {{cuisine}} cuisine.")
    @UserMessage("Create a recipe for {{dish}} using {{ingredient}}.")
    String createRecipe(@V("cuisine") String cuisine,
                       @V("dish") String dish,
                       @V("ingredient") String ingredient);
}

Chef chef = AiServices.create(Chef.class, chatModel);

try {
    String recipe = chef.createRecipe("Italian", "pasta", "tomatoes");
    System.out.println(recipe);
} catch (IllegalArgumentException e) {
    System.err.println("Invalid template variables: " + e.getMessage());
}

AI Service with Memory

import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import dev.langchain4j.service.AiServices;
import dev.langchain4j.service.MemoryId;

interface Assistant {
    String chat(@MemoryId String userId, String message);
}

Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(chatModel)
    .chatMemoryProvider(memoryId -> MessageWindowChatMemory.withMaxMessages(10))
    .build();

try {
    // Different conversations for different users
    String response1 = assistant.chat("user1", "My name is Alice");
    String response2 = assistant.chat("user2", "My name is Bob");
    String response3 = assistant.chat("user1", "What is my name?"); // Should recall "Alice" from user1's memory

    System.out.println("User1 response: " + response3);
} catch (RuntimeException e) {
    System.err.println("Conversation failed: " + e.getMessage());
    // Handle error appropriately
}

AI Service with Tools

import dev.langchain4j.agent.tool.Tool;
import dev.langchain4j.service.AiServices;

class WeatherService {
    @Tool("Get current weather for a location")
    String getWeather(String location) {
        // Implementation with error handling
        try {
            // Call weather API
            return "Sunny, 72°F";
        } catch (Exception e) {
            return "Weather data unavailable for " + location;
        }
    }
}

interface Assistant {
    String chat(String message);
}

Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(chatModel)
    .tools(new WeatherService())
    .toolExecutionErrorHandler((toolExecutionRequest, throwable) -> {
        System.err.println("Tool execution failed: " + throwable.getMessage());
        return "Tool execution failed: " + throwable.getMessage();
    })
    .build();

try {
    String response = assistant.chat("What's the weather in New York?");
    System.out.println(response);
} catch (RuntimeException e) {
    System.err.println("AI service with tools failed: " + e.getMessage());
}

Streaming AI Service

import dev.langchain4j.service.AiServices;
import dev.langchain4j.service.TokenStream;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

interface Assistant {
    TokenStream chat(String message);
}

Assistant assistant = AiServices.create(Assistant.class, streamingChatModel);

CompletableFuture<String> future = new CompletableFuture<>();
AtomicReference<StringBuilder> responseBuilder = new AtomicReference<>(new StringBuilder());

assistant.chat("Tell me a story")
    .onPartialResponse(token -> {
        System.out.print(token);
        responseBuilder.get().append(token);
    })
    .onCompleteResponse(response -> {
        System.out.println("\nDone!");
        future.complete(responseBuilder.get().toString());
    })
    .onError(throwable -> {
        System.err.println("\nError: " + throwable.getMessage());
        throwable.printStackTrace();
        future.completeExceptionally(throwable);
    })
    .start();

try {
    String fullResponse = future.get(); // Wait for completion
} catch (Exception e) {
    System.err.println("Streaming failed: " + e.getMessage());
}

AI Service with Result Type

import dev.langchain4j.service.AiServices;
import dev.langchain4j.service.Result;

interface Assistant {
    Result<String> chat(String message);
}

Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(chatModel)
    .tools(new WeatherService())
    .build();

try {
    Result<String> result = assistant.chat("What's the weather?");

    System.out.println("Content: " + result.content());

    if (result.tokenUsage() != null) {
        System.out.println("Token usage: " + result.tokenUsage());
        System.out.println("Input tokens: " + result.tokenUsage().inputTokenCount());
        System.out.println("Output tokens: " + result.tokenUsage().outputTokenCount());
    }

    if (!result.toolExecutions().isEmpty()) {
        System.out.println("Tool executions: " + result.toolExecutions().size());
        result.toolExecutions().forEach(te ->
            System.out.println("  - " + te.toolName() + ": " + te.result())
        );
    }

    System.out.println("Finish reason: " + result.finishReason());
} catch (RuntimeException e) {
    System.err.println("AI service invocation failed: " + e.getMessage());
}

AI Service with RAG

import dev.langchain4j.service.AiServices;

interface Assistant {
    String chat(String message);
}

Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(chatModel)
    .contentRetriever(contentRetriever)
    .build();

try {
    String response = assistant.chat("What does the documentation say about X?");
    System.out.println(response);
} catch (RuntimeException e) {
    System.err.println("RAG query failed: " + e.getMessage());
}

Testing Patterns

Mocking AI Services with Mockito

import org.junit.jupiter.api.Test;
import org.mockito.Mockito;
import static org.mockito.ArgumentMatchers.*;
import static org.junit.jupiter.api.Assertions.*;

interface Assistant {
    String chat(String message);
}

@Test
void testWithMockedAssistant() {
    // Create mock
    Assistant assistant = Mockito.mock(Assistant.class);

    // Define behavior
    Mockito.when(assistant.chat(anyString()))
        .thenReturn("Mocked response");

    // Test
    String response = assistant.chat("Hello");
    assertEquals("Mocked response", response);

    // Verify
    Mockito.verify(assistant, Mockito.times(1)).chat("Hello");
}

Testing with Fake Chat Model

import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.response.ChatResponse;
import org.junit.jupiter.api.Test;

class FakeChatModel implements ChatModel {
    private final String fixedResponse;

    public FakeChatModel(String fixedResponse) {
        this.fixedResponse = fixedResponse;
    }

    // langchain4j 1.x API: implementations override doChat(ChatRequest)
    @Override
    public ChatResponse doChat(ChatRequest request) {
        return ChatResponse.builder()
            .aiMessage(AiMessage.from(fixedResponse))
            .build();
    }
}

@Test
void testWithFakeChatModel() {
    ChatModel model = new FakeChatModel("Test response");

    Assistant assistant = AiServices.create(Assistant.class, model);
    String response = assistant.chat("Any message");

    assertEquals("Test response", response);
}

Testing Memory Isolation

import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.*;

interface Assistant {
    String chat(@MemoryId String userId, String message);
}

@Test
void testMemoryIsolation() {
    FakeChatModel model = new FakeChatModel("Echo");

    Assistant assistant = AiServices.builder(Assistant.class)
        .chatModel(model)
        .chatMemoryProvider(id -> MessageWindowChatMemory.withMaxMessages(10))
        .build();

    // Verify different users have isolated memory
    assistant.chat("user1", "Message from user1");
    assistant.chat("user2", "Message from user2");

    // Memory should be separate per user
    // Add assertions based on your implementation
}

Testing Tool Execution

import dev.langchain4j.agent.tool.Tool;
import org.junit.jupiter.api.Test;
import org.mockito.Mockito;
import static org.mockito.Mockito.*;

class WeatherService {
    @Tool("Get weather")
    String getWeather(String location) {
        return "Sunny";
    }
}

@Test
void testToolExecution() {
    WeatherService weatherService = Mockito.spy(new WeatherService());

    Assistant assistant = AiServices.builder(Assistant.class)
        .chatModel(chatModel)
        .tools(weatherService)
        .build();

    assistant.chat("What's the weather in Paris?");

    // Verify tool was called
    verify(weatherService, atLeastOnce()).getWeather(anyString());
}

Testing Error Handling

import dev.langchain4j.agent.tool.Tool;
import org.junit.jupiter.api.Test;
import java.util.concurrent.atomic.AtomicBoolean;
import static org.junit.jupiter.api.Assertions.*;

@Test
void testToolExecutionError() {
    class FailingTool {
        @Tool("Failing tool")
        String fail(String input) {
            throw new RuntimeException("Tool failure");
        }
    }

    AtomicBoolean errorHandlerCalled = new AtomicBoolean(false);

    Assistant assistant = AiServices.builder(Assistant.class)
        .chatModel(chatModel)
        .tools(new FailingTool())
        .toolExecutionErrorHandler((request, throwable) -> {
            errorHandlerCalled.set(true);
            return "Error handled: " + throwable.getMessage();
        })
        .build();

    String response = assistant.chat("Trigger tool");

    assertTrue(errorHandlerCalled.get(), "Error handler should be called");
}

Integration Testing with TestContainers

import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.ollama.OllamaChatModel;
import org.junit.jupiter.api.Test;
import org.testcontainers.containers.GenericContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;

@Testcontainers
class AiServiceIntegrationTest {
    @Container
    private static GenericContainer<?> ollama = new GenericContainer<>("ollama/ollama:latest")
        .withExposedPorts(11434);

    @Test
    void testWithRealModel() {
        String baseUrl = "http://" + ollama.getHost() + ":" + ollama.getFirstMappedPort();

        // Assumes the llama2 model has already been pulled inside the container
        ChatModel model = OllamaChatModel.builder()
            .baseUrl(baseUrl)
            .modelName("llama2")
            .build();

        Assistant assistant = AiServices.create(Assistant.class, model);
        String response = assistant.chat("Hello");

        assertNotNull(response);
        assertFalse(response.isEmpty());
    }
}

Error Recovery Patterns

Retry with Exponential Backoff

import dev.langchain4j.service.AiServices;
import java.time.Duration;

interface Assistant {
    String chat(String message);
}

class RetryableAiService {
    private final Assistant assistant;
    private final int maxRetries;
    private final Duration initialDelay;

    public RetryableAiService(Assistant assistant, int maxRetries, Duration initialDelay) {
        this.assistant = assistant;
        this.maxRetries = maxRetries;
        this.initialDelay = initialDelay;
    }

    public String chatWithRetry(String message) {
        int attempt = 0;
        Duration delay = initialDelay;

        while (attempt < maxRetries) {
            try {
                return assistant.chat(message);
            } catch (RuntimeException e) {
                attempt++;
                if (attempt >= maxRetries) {
                    throw new RuntimeException("Failed after " + maxRetries + " attempts", e);
                }

                System.err.println("Attempt " + attempt + " failed, retrying in " + delay.toMillis() + "ms");

                try {
                    Thread.sleep(delay.toMillis());
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException("Interrupted during retry", ie);
                }

                delay = delay.multipliedBy(2); // Exponential backoff
            }
        }

        throw new RuntimeException("Should not reach here");
    }
}

// Usage
Assistant assistant = AiServices.create(Assistant.class, chatModel);
RetryableAiService retryable = new RetryableAiService(assistant, 3, Duration.ofSeconds(1));
String response = retryable.chatWithRetry("Hello");

Circuit Breaker Pattern

import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

class CircuitBreaker {
    private enum State { CLOSED, OPEN, HALF_OPEN }

    private final Assistant assistant;
    private final int failureThreshold;
    private final Duration timeout;
    private final AtomicInteger failureCount = new AtomicInteger(0);
    private final AtomicReference<State> state = new AtomicReference<>(State.CLOSED);
    private final AtomicReference<Instant> lastFailureTime = new AtomicReference<>();

    public CircuitBreaker(Assistant assistant, int failureThreshold, Duration timeout) {
        this.assistant = assistant;
        this.failureThreshold = failureThreshold;
        this.timeout = timeout;
    }

    public String chat(String message) {
        if (state.get() == State.OPEN) {
            if (Instant.now().isAfter(lastFailureTime.get().plus(timeout))) {
                state.set(State.HALF_OPEN);
                System.out.println("Circuit breaker: transitioning to HALF_OPEN");
            } else {
                throw new RuntimeException("Circuit breaker is OPEN");
            }
        }

        try {
            String response = assistant.chat(message);
            onSuccess();
            return response;
        } catch (RuntimeException e) {
            onFailure();
            throw e;
        }
    }

    private void onSuccess() {
        failureCount.set(0);
        state.set(State.CLOSED);
    }

    private void onFailure() {
        int failures = failureCount.incrementAndGet();
        lastFailureTime.set(Instant.now());

        // A failure while HALF_OPEN immediately reopens the circuit
        if (state.get() == State.HALF_OPEN || failures >= failureThreshold) {
            state.set(State.OPEN);
            System.err.println("Circuit breaker: transitioning to OPEN after " + failures + " failures");
        }
    }
}

// Usage
Assistant assistant = AiServices.create(Assistant.class, chatModel);
CircuitBreaker circuitBreaker = new CircuitBreaker(assistant, 3, Duration.ofMinutes(1));

try {
    String response = circuitBreaker.chat("Hello");
} catch (RuntimeException e) {
    System.err.println("Circuit breaker prevented call or call failed: " + e.getMessage());
}

Fallback Response Pattern

interface Assistant {
    String chat(String message);
}

class FallbackAiService {
    private final Assistant primary;
    private final Assistant fallback;

    public FallbackAiService(Assistant primary, Assistant fallback) {
        this.primary = primary;
        this.fallback = fallback;
    }

    public String chat(String message) {
        try {
            return primary.chat(message);
        } catch (RuntimeException e) {
            System.err.println("Primary service failed: " + e.getMessage());
            System.err.println("Falling back to secondary service");

            try {
                return fallback.chat(message);
            } catch (RuntimeException fallbackException) {
                System.err.println("Fallback service also failed: " + fallbackException.getMessage());
                return "I apologize, but I'm currently unable to process your request. Please try again later.";
            }
        }
    }
}

// Usage
Assistant primary = AiServices.create(Assistant.class, primaryChatModel);
Assistant fallback = AiServices.create(Assistant.class, fallbackChatModel);
FallbackAiService service = new FallbackAiService(primary, fallback);

String response = service.chat("Hello");

Timeout Pattern

import java.time.Duration;
import java.util.concurrent.*;

class TimeoutAiService {
    private final Assistant assistant;
    private final Duration timeout;
    private final ExecutorService executor;

    public TimeoutAiService(Assistant assistant, Duration timeout) {
        this.assistant = assistant;
        this.timeout = timeout;
        this.executor = Executors.newCachedThreadPool();
    }

    public String chatWithTimeout(String message) {
        Future<String> future = executor.submit(() -> assistant.chat(message));

        try {
            return future.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);
            throw new RuntimeException("Request timed out after " + timeout.toMillis() + "ms", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("Request was interrupted", e);
        } catch (ExecutionException e) {
            throw new RuntimeException("Request failed", e.getCause());
        }
    }

    public void shutdown() {
        executor.shutdown();
    }
}

// Usage
Assistant assistant = AiServices.create(Assistant.class, chatModel);
TimeoutAiService timeoutService = new TimeoutAiService(assistant, Duration.ofSeconds(30));

try {
    String response = timeoutService.chatWithTimeout("Hello");
} catch (RuntimeException e) {
    System.err.println("Request failed or timed out: " + e.getMessage());
} finally {
    timeoutService.shutdown();
}

Graceful Degradation with Caching

import java.util.concurrent.ConcurrentHashMap;
import java.util.Map;

class CachedAiService {
    private final Assistant assistant;
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final boolean useCacheOnError;

    public CachedAiService(Assistant assistant, boolean useCacheOnError) {
        this.assistant = assistant;
        this.useCacheOnError = useCacheOnError;
    }

    public String chat(String message) {
        // Check cache first
        String cached = cache.get(message);

        try {
            String response = assistant.chat(message);
            cache.put(message, response);
            return response;
        } catch (RuntimeException e) {
            if (useCacheOnError && cached != null) {
                System.err.println("Service failed, returning cached response: " + e.getMessage());
                return cached + " [cached]";
            }
            throw e;
        }
    }

    public void clearCache() {
        cache.clear();
    }
}

// Usage
Assistant assistant = AiServices.create(Assistant.class, chatModel);
CachedAiService cachedService = new CachedAiService(assistant, true);

String response = cachedService.chat("What is 2+2?");
// On subsequent failures, cached response is returned

Integration Patterns

Spring Boot Integration

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.beans.factory.annotation.Value;

@Configuration
public class AiServiceConfig {

    @Bean
    public ChatModel chatModel(
            @Value("${openai.api.key}") String apiKey,
            @Value("${openai.model.name:gpt-4}") String modelName) {
        return OpenAiChatModel.builder()
            .apiKey(apiKey)
            .modelName(modelName)
            .temperature(0.7)
            .timeout(Duration.ofSeconds(60))
            .build();
    }

    @Bean
    public ChatMemoryProvider chatMemoryProvider() {
        return memoryId -> MessageWindowChatMemory.withMaxMessages(10);
    }

    @Bean
    public Assistant assistant(
            ChatModel chatModel,
            ChatMemoryProvider chatMemoryProvider,
            WeatherService weatherTool) {
        return AiServices.builder(Assistant.class)
            .chatModel(chatModel)
            .chatMemoryProvider(chatMemoryProvider)
            .tools(weatherTool)
            .build();
    }

    // Register the tool with its concrete type; injecting List<Object> would
    // pull every bean in the context into the tool list
    @Bean
    public WeatherService weatherTool() {
        return new WeatherService();
    }
}

// Controller
@RestController
@RequestMapping("/api/chat")
public class ChatController {

    private final Assistant assistant;

    public ChatController(Assistant assistant) {
        this.assistant = assistant;
    }

    @PostMapping
    public ResponseEntity<ChatResponse> chat(
            @RequestParam String userId,
            @RequestBody ChatRequest request) {
        try {
            String response = assistant.chat(userId, request.getMessage());
            return ResponseEntity.ok(new ChatResponse(response));
        } catch (Exception e) {
            return ResponseEntity
                .status(HttpStatus.INTERNAL_SERVER_ERROR)
                .body(new ChatResponse("Error: " + e.getMessage()));
        }
    }
}

Spring Boot with Result Type

import dev.langchain4j.service.Result;
import org.springframework.web.bind.annotation.*;

interface AssistantWithMetadata {
    Result<String> chat(@MemoryId String userId, String message);
}

@Configuration
public class AiServiceWithResultConfig {

    @Bean
    public AssistantWithMetadata assistantWithMetadata(
            ChatLanguageModel chatModel,
            ChatMemoryProvider chatMemoryProvider) {
        return AiServices.builder(AssistantWithMetadata.class)
            .chatModel(chatModel)
            .chatMemoryProvider(chatMemoryProvider)
            .build();
    }
}

@RestController
@RequestMapping("/api/chat")
public class ChatControllerWithMetadata {

    private final AssistantWithMetadata assistant;

    public ChatControllerWithMetadata(AssistantWithMetadata assistant) {
        this.assistant = assistant;
    }

    @PostMapping("/detailed")
    public ResponseEntity<DetailedChatResponse> chatWithMetadata(
            @RequestParam String userId,
            @RequestBody ChatRequest request) {
        try {
            Result<String> result = assistant.chat(userId, request.getMessage());

            return ResponseEntity.ok(DetailedChatResponse.builder()
                .content(result.content())
                .tokenUsage(result.tokenUsage())
                .finishReason(result.finishReason())
                .toolExecutions(result.toolExecutions())
                .build());
        } catch (Exception e) {
            return ResponseEntity
                .status(HttpStatus.INTERNAL_SERVER_ERROR)
                .body(DetailedChatResponse.error(e.getMessage()));
        }
    }
}

Quarkus Integration

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.enterprise.inject.Produces;
import org.eclipse.microprofile.config.inject.ConfigProperty;

@ApplicationScoped
public class AiServiceProducer {

    @ConfigProperty(name = "openai.api.key")
    String apiKey;

    @ConfigProperty(name = "openai.model.name", defaultValue = "gpt-4")
    String modelName;

    @Produces
    @ApplicationScoped
    public ChatModel chatModel() {
        return OpenAiChatModel.builder()
            .apiKey(apiKey)
            .modelName(modelName)
            .build();
    }

    @Produces
    @ApplicationScoped
    public Assistant assistant(ChatModel chatModel) {
        return AiServices.builder(Assistant.class)
            .chatModel(chatModel)
            .chatMemoryProvider(id -> MessageWindowChatMemory.withMaxMessages(10))
            .build();
    }
}

// Resource
@Path("/chat")
@ApplicationScoped
public class ChatResource {

    @Inject
    Assistant assistant;

    @POST
    @Produces(MediaType.APPLICATION_JSON)
    @Consumes(MediaType.APPLICATION_JSON)
    public Response chat(@QueryParam("userId") String userId, ChatRequest request) {
        try {
            String response = assistant.chat(userId, request.getMessage());
            return Response.ok(new ChatResponse(response)).build();
        } catch (Exception e) {
            return Response
                .serverError()
                .entity(Map.of("error", e.getMessage()))
                .build();
        }
    }
}

Reactive Streaming with Project Reactor

import reactor.core.publisher.Flux;
import reactor.core.publisher.Sinks;

interface StreamingAssistant {
    TokenStream chat(String message);
}

@Service
public class ReactiveAiService {

    private final StreamingAssistant assistant;

    public ReactiveAiService(StreamingChatModel streamingChatModel) {
        this.assistant = AiServices.create(StreamingAssistant.class, streamingChatModel);
    }

    public Flux<String> chatReactive(String message) {
        // unicast() buffers tokens emitted before the subscriber attaches;
        // multicast() would drop them because start() runs before subscription
        Sinks.Many<String> sink = Sinks.many().unicast().onBackpressureBuffer();

        assistant.chat(message)
            .onPartialResponse(sink::tryEmitNext)
            .onCompleteResponse(response -> sink.tryEmitComplete())
            .onError(throwable -> sink.tryEmitError(throwable))
            .start();

        return sink.asFlux();
    }
}

// Controller
@RestController
@RequestMapping("/api/stream")
public class StreamingChatController {

    private final ReactiveAiService aiService;

    public StreamingChatController(ReactiveAiService aiService) {
        this.aiService = aiService;
    }

    @GetMapping(value = "/chat", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<String> streamChat(@RequestParam String message) {
        return aiService.chatReactive(message)
            .onErrorResume(throwable -> {
                return Flux.just("Error: " + throwable.getMessage());
            });
    }
}

Kubernetes Deployment Configuration

apiVersion: v1
kind: ConfigMap
metadata:
  name: ai-service-config
data:
  application.properties: |
    openai.api.key=${OPENAI_API_KEY}
    openai.model.name=gpt-4
    openai.timeout=60s
    chat.memory.max.messages=10

---
apiVersion: v1
kind: Secret
metadata:
  name: ai-service-secrets
type: Opaque
stringData:
  openai-api-key: "your-api-key-here"

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ai-service
  template:
    metadata:
      labels:
        app: ai-service
    spec:
      containers:
      - name: ai-service
        image: your-registry/ai-service:latest
        ports:
        - containerPort: 8080
        env:
        - name: OPENAI_API_KEY
          valueFrom:
            secretKeyRef:
              name: ai-service-secrets
              key: openai-api-key
        volumeMounts:
        - name: config
          mountPath: /config
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1Gi"
            cpu: "1000m"
        livenessProbe:
          httpGet:
            path: /actuator/health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /actuator/health/readiness
            port: 8080
          initialDelaySeconds: 20
          periodSeconds: 5
      volumes:
      - name: config
        configMap:
          name: ai-service-config

---
apiVersion: v1
kind: Service
metadata:
  name: ai-service
spec:
  selector:
    app: ai-service
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
  type: LoadBalancer

Observability with Micrometer

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Service;

@Service
public class ObservableAiService {

    private final Assistant assistant;
    private final MeterRegistry meterRegistry;

    public ObservableAiService(Assistant assistant, MeterRegistry meterRegistry) {
        this.assistant = assistant;
        this.meterRegistry = meterRegistry;
    }

    public String chat(String userId, String message) {
        Timer.Sample sample = Timer.start(meterRegistry);

        try {
            String response = assistant.chat(userId, message);

            // Caution: tagging by user ID creates one time series per user.
            // Keep this tag only if user cardinality is small and bounded.
            sample.stop(Timer.builder("ai.service.request")
                .tag("status", "success")
                .tag("user", userId)
                .register(meterRegistry));

            // The timer already tracks a count; the separate counter is kept
            // for dashboards that query requests.total directly.
            meterRegistry.counter("ai.service.requests.total",
                "status", "success").increment();

            return response;
        } catch (Exception e) {
            sample.stop(Timer.builder("ai.service.request")
                .tag("status", "error")
                .tag("user", userId)
                .tag("error", e.getClass().getSimpleName())
                .register(meterRegistry));

            meterRegistry.counter("ai.service.requests.total",
                "status", "error",
                "error", e.getClass().getSimpleName()).increment();

            throw e;
        }
    }
}
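The success/error timing pattern above can be approximated with plain JDK types to show exactly what the wrapper records per call. This sketch uses illustrative names, not a Micrometer API; a real `Timer` would additionally feed `elapsedNanos` into a latency histogram:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

public class SimpleMetrics {

    // status -> request count, standing in for ai.service.requests.total
    final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    // Accumulated wall time, standing in for the ai.service.request timer
    final LongAdder totalNanos = new LongAdder();

    // Time the call and record success or error, mirroring the
    // Timer.Sample try/catch pattern in the Micrometer example.
    <T> T record(Supplier<T> call) {
        long start = System.nanoTime();
        try {
            T result = call.get();
            counters.computeIfAbsent("success", k -> new LongAdder()).increment();
            return result;
        } catch (RuntimeException e) {
            counters.computeIfAbsent("error", k -> new LongAdder()).increment();
            throw e;
        } finally {
            totalNanos.add(System.nanoTime() - start);
        }
    }

    long count(String status) {
        LongAdder a = counters.get(status);
        return a == null ? 0 : a.sum();
    }

    public static void main(String[] args) {
        SimpleMetrics metrics = new SimpleMetrics();
        metrics.record(() -> "ok");
        try {
            metrics.record(() -> { throw new IllegalStateException("boom"); });
        } catch (IllegalStateException expected) {
            // the wrapper re-throws after recording, as in the example above
        }
        System.out.println(metrics.count("success") + " " + metrics.count("error"));
        // prints "1 1"
    }
}
```

Note that the exception is recorded and then re-thrown, so callers still observe failures; metrics collection never swallows errors.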
