LangChain4j

LangChain4j is a Java library for building LLM-powered applications with support for chatbots, agents, RAG (Retrieval Augmented Generation), tools, guardrails, and much more. It provides a high-level API for working with chat models, streaming, memory management, document processing, embeddings, and various integrations.

Package Information

  • Package Name: dev.langchain4j:langchain4j
  • Package Type: maven
  • Language: Java
  • Installation:
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j</artifactId>
    <version>1.11.0</version>
</dependency>

Core Imports

// Core AI Services API
import dev.langchain4j.service.AiServices;
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import dev.langchain4j.service.V;
import dev.langchain4j.service.MemoryId;

// Memory management
import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import dev.langchain4j.memory.chat.TokenWindowChatMemory;
import dev.langchain4j.memory.chat.ChatMemoryProvider;

// Prompts and templates
import dev.langchain4j.model.input.Prompt;
import dev.langchain4j.model.input.PromptTemplate;
import dev.langchain4j.model.input.structured.StructuredPrompt;

// RAG (Retrieval Augmented Generation)
import dev.langchain4j.rag.RetrievalAugmentor;
import dev.langchain4j.rag.DefaultRetrievalAugmentor;
import dev.langchain4j.rag.content.retriever.ContentRetriever;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;

// Document processing
import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;
import dev.langchain4j.data.document.parser.TextDocumentParser;
import dev.langchain4j.data.document.splitter.DocumentSplitters;

// Embedding store
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;

// Tools
import dev.langchain4j.agent.tool.Tool;

Basic Usage

import dev.langchain4j.service.AiServices;

// Define your AI service interface
interface Assistant {
    String chat(String message);
}

// Create AI service with a chat model
Assistant assistant = AiServices.create(Assistant.class, chatModel);

// Use the assistant
String response = assistant.chat("Hello, how are you?");
System.out.println(response);

Quick Start Guide for Common Tasks

Task 1: Simple Chatbot

When to use: Basic Q&A, single-user conversations

// Minimal setup - no memory, no tools
Assistant assistant = AiServices.create(Assistant.class, chatModel);
String answer = assistant.chat("What is Java?");

Task 2: Multi-User Chatbot with Memory

When to use: Multiple users, need conversation history

interface Assistant {
    String chat(@MemoryId String userId, String message);
}

Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(chatModel)
    .chatMemoryProvider(id -> MessageWindowChatMemory.withMaxMessages(20))
    .build();

// Each user gets separate conversation history
assistant.chat("user1", "My name is Alice");
assistant.chat("user1", "What's my name?"); // Remembers "Alice"

Task 3: RAG-Enabled Assistant

When to use: Need to answer questions from your documents

// 1. Load and embed documents
EmbeddingStore<TextSegment> store = new InMemoryEmbeddingStore<>();
EmbeddingModel embeddingModel = /* your embedding model */;
TokenCountEstimator tokenizer = /* your token count estimator */;

List<Document> documents = FileSystemDocumentLoader.loadDocuments("/path/to/docs");
for (Document doc : documents) {
    List<TextSegment> segments = DocumentSplitters.recursive(500, 50, tokenizer).split(doc);
    List<Embedding> embeddings = embeddingModel.embedAll(segments).content();
    store.addAll(embeddings, segments);
}

// 2. Create retriever
ContentRetriever retriever = EmbeddingStoreContentRetriever.builder()
    .embeddingStore(store)
    .embeddingModel(embeddingModel)
    .maxResults(5)
    .minScore(0.7)
    .build();

// 3. Build assistant with RAG
Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(chatModel)
    .contentRetriever(retriever)
    .build();

String answer = assistant.chat("What does the documentation say about X?");

Task 4: Tool-Using Agent

When to use: Need to call external functions/APIs

class Tools {
    @Tool("Get current weather")
    String getWeather(String city) {
        // Call weather API
        return "Sunny, 72°F";
    }

    @Tool("Search database")
    String searchDB(String query) {
        // Query database
        return "Found 10 results";
    }
}

Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(chatModel)
    .tools(new Tools())
    .build();

// LLM will automatically call tools when needed
String answer = assistant.chat("What's the weather in Paris?");

Task 5: Structured Output Extraction

When to use: Extract structured data from text

record Person(String name, int age, String city) {}

interface Extractor {
    Person extractPerson(String text);
}

Extractor extractor = AiServices.create(Extractor.class, chatModel);
Person person = extractor.extractPerson("John is 30 years old and lives in NYC");
// Returns: Person[name=John, age=30, city=NYC]

Decision Trees

When to Use What Memory Type?

Need conversation history?
├─ NO → Don't configure memory (stateless)
└─ YES → How many users?
    ├─ Single user → Use MessageWindowChatMemory.withMaxMessages(N)
    └─ Multiple users → Use ChatMemoryProvider with @MemoryId
        └─ Token limit concern?
            ├─ YES → TokenWindowChatMemory with TokenCountEstimator
            └─ NO → MessageWindowChatMemory with maxMessages
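
The two leaf choices above can be sketched as follows; the estimator is left as a placeholder because the concrete TokenCountEstimator comes from your model provider's module, not the core artifact:

// Token-based windowing: keeps as many recent messages as fit in 1000 tokens
TokenCountEstimator estimator = /* estimator matching your model's tokenization */;
ChatMemory tokenWindow = TokenWindowChatMemory.withMaxTokens(1000, estimator);

// Message-based windowing: keeps the 20 most recent messages regardless of size
ChatMemory messageWindow = MessageWindowChatMemory.withMaxMessages(20);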

When to Use Streaming vs. Non-Streaming?

User experience requirement?
├─ Show response token-by-token → StreamingChatModel + TokenStream return type
└─ Get complete response at once → ChatModel + String/POJO return type
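
The streaming branch looks like this in practice, using the TokenStream return type from the AI Services API (streamingChatModel is assumed configured, like chatModel in the other examples):

interface StreamingAssistant {
    TokenStream chat(String message);
}

StreamingAssistant assistant = AiServices.create(StreamingAssistant.class, streamingChatModel);

assistant.chat("Tell me a story")
    .onPartialResponse(System.out::print)                          // called per token/chunk
    .onCompleteResponse(response -> System.out.println("\nDone"))  // full ChatResponse at the end
    .onError(Throwable::printStackTrace)
    .start();                                                      // nothing happens until start()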

RAG Content Retriever Selection

Data source?
├─ Documents/knowledge base → EmbeddingStoreContentRetriever
├─ Web search → WebSearchContentRetriever
├─ Custom data source → Implement ContentRetriever interface
└─ Multiple sources → Use QueryRouter with multiple retrievers

Architecture

LangChain4j is built around several key components:

  • AI Services API: High-level interface for creating AI-powered services through Java interfaces
  • Memory Management: Chat memory implementations for maintaining conversation context
  • Document Processing: Loaders, parsers, and splitters for working with documents
  • RAG (Retrieval Augmented Generation): Content retrieval and augmentation for grounding LLM responses
  • Tools: Framework for function calling and tool execution
  • Guardrails: Input and output validation and filtering
  • Output Parsing: Automatic conversion of LLM outputs to Java types
  • Classification: Text classification using embedding-based similarity

Capabilities

Core Models

Core interfaces for interacting with language models, including chat, streaming, and embeddings.

public interface ChatModel {
    ChatResponse chat(ChatRequest chatRequest);
    String chat(String userMessage);
    ChatResponse chat(ChatMessage... messages);
    Set<Capability> supportedCapabilities();
}

public interface StreamingChatModel {
    void chat(ChatRequest chatRequest, StreamingChatResponseHandler handler);
    void chat(String userMessage, StreamingChatResponseHandler handler);
}

public interface EmbeddingModel {
    Response<Embedding> embed(String text);
    Response<List<Embedding>> embedAll(List<TextSegment> textSegments);
    int dimension();
}

Thread Safety: ChatModel and EmbeddingModel implementations are typically thread-safe. StreamingChatModel requires careful handling of concurrent requests to avoid handler confusion.

Common Pitfalls:

  • Don't share StreamingChatResponseHandler instances between multiple concurrent calls
  • Check supportedCapabilities() before using advanced features like tool calling or JSON response formats
  • Be aware of rate limits when making multiple concurrent requests
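
A capability check for the second pitfall might look like this; RESPONSE_FORMAT_JSON_SCHEMA is one value of the Capability enum, and support varies by provider:

if (chatModel.supportedCapabilities().contains(Capability.RESPONSE_FORMAT_JSON_SCHEMA)) {
    // safe to request structured JSON output from this model
} else {
    // fall back to prompt-based format instructions
}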

Related APIs: Chat and Language Models


Chat Messages

Message types for chat interactions including UserMessage, AiMessage, SystemMessage, and multimodal content support.

public class UserMessage implements ChatMessage {
    public UserMessage(String text);
    public UserMessage(List<Content> contents);
    public String singleText();
    public List<Content> contents();
}

public class AiMessage implements ChatMessage {
    public AiMessage(String text);
    public String text();
    public String thinking();
    public List<ToolExecutionRequest> toolExecutionRequests();
}

public class SystemMessage implements ChatMessage {
    public SystemMessage(String text);
    public String text();
}

Thread Safety: Message classes are immutable and thread-safe.

Common Pitfalls:

  • Calling singleText() on multimodal UserMessage throws RuntimeException - use hasSingleText() first
  • AiMessage may have null text when it contains only tool execution requests
  • Don't modify the lists returned by contents() or toolExecutionRequests() - they may be unmodifiable

Edge Cases:

  • Empty UserMessage text is allowed but may cause issues with some models
  • UserMessage with name but no content is valid
  • AiMessage with both text and tool requests is valid (model can provide explanation + request tools)
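
A minimal sketch of constructing each message type, including a multimodal UserMessage (the image URL is illustrative):

SystemMessage system = SystemMessage.from("You are a helpful assistant.");
UserMessage text = UserMessage.from("Summarize this document.");

// Multimodal: text + image in a single user message
UserMessage multimodal = UserMessage.from(
    TextContent.from("What is in this picture?"),
    ImageContent.from("https://example.com/cat.png")  // illustrative URL
);

// Guard against the singleText() pitfall above
if (multimodal.hasSingleText()) {
    System.out.println(multimodal.singleText());
}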

Related APIs: Chat Messages


Requests and Responses

Request and response types for chat model interactions with comprehensive parameter control.

public class ChatRequest {
    public List<ChatMessage> messages();
    public ChatRequestParameters parameters();
    public static Builder builder();
}

public class ChatResponse {
    public AiMessage aiMessage();
    public TokenUsage tokenUsage();
    public FinishReason finishReason();
}

public interface StreamingChatResponseHandler {
    void onPartialResponse(String partialResponse);
    void onCompleteResponse(ChatResponse completeResponse);
    void onError(Throwable error);
}

Thread Safety: ChatRequest and ChatResponse are immutable. StreamingChatResponseHandler callbacks are invoked sequentially on the same thread.

Common Pitfalls:

  • Temperature values outside model's supported range are silently clamped by some providers
  • stopSequences may not work with all models - check provider documentation
  • TokenUsage may be null for some model providers
  • FinishReason.LENGTH means output was truncated - increase maxOutputTokens

Performance Notes:

  • Lower temperature (0.0-0.3) is faster and more deterministic
  • Higher temperature (0.7-1.0) is slower and more creative
  • Setting maxOutputTokens prevents runaway generations and controls costs
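
Putting these parameters together, a request with conservative settings might be built as below (values are illustrative; this assumes the default ChatRequestParameters builder):

ChatRequest request = ChatRequest.builder()
    .messages(UserMessage.from("Summarize this article"))
    .parameters(ChatRequestParameters.builder()
        .temperature(0.2)       // low temperature: faster, more deterministic
        .maxOutputTokens(500)   // caps output to control cost and runaway generation
        .build())
    .build();

ChatResponse response = chatModel.chat(request);
if (response.finishReason() == FinishReason.LENGTH) {
    // output was truncated - consider raising maxOutputTokens
}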

Related APIs: Chat Requests and Responses


Prompts and Templates

Template system for creating reusable prompts with variable substitution. Supports structured prompts and automatic date/time injection.

/**
 * Represents a prompt (input text sent to LLM)
 */
public class Prompt {
    public Prompt(String text);
    public String text();
    public UserMessage toUserMessage();
    public SystemMessage toSystemMessage();
    public static Prompt from(String text);
}

/**
 * Template with {{variable}} placeholders
 * Special variables: {{current_date}}, {{current_time}}, {{current_date_time}}
 */
public class PromptTemplate {
    public PromptTemplate(String template);
    public Prompt apply(Object value);
    public Prompt apply(Map<String, Object> variables);
    public static PromptTemplate from(String template);
}

/**
 * Annotation for structured prompts on Java classes
 */
@Target(TYPE)
@Retention(RUNTIME)
public @interface StructuredPrompt {
    String[] value();
    String delimiter() default "\n";
}

Thread Safety: PromptTemplate is thread-safe and can be reused across threads. Prompt instances are immutable.

Common Pitfalls:

  • Missing variables in template throw IllegalArgumentException at apply() time
  • Variable names are case-sensitive: {{Name}} != {{name}}
  • Special date/time variables use system default timezone - set Clock for custom timezone
  • @StructuredPrompt requires all fields to be accessible (public or with getters)

Performance Notes:

  • Create PromptTemplate once and reuse - template parsing has overhead
  • StructuredPromptProcessor uses reflection - cache results for repeated use
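
Template reuse in practice, showing both the map-based and single-value forms (a lone value binds to the {{it}} variable):

PromptTemplate template = PromptTemplate.from(
    "Tell me about {{topic}} as of {{current_date}}.");

Prompt prompt = template.apply(Map.of("topic", "Java records"));
UserMessage message = prompt.toUserMessage();

// Single-variable shortcut: the value binds to {{it}}
Prompt single = PromptTemplate.from("Translate to French: {{it}}").apply("Good morning");

Create the PromptTemplate once and reuse it across calls, per the performance note above.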

Related APIs: Prompts and Templates


RAG (Retrieval Augmented Generation)

Comprehensive RAG framework for augmenting LLM responses with retrieved information. Supports query transformation, routing, content retrieval, aggregation, and injection.

/**
 * Entry point for RAG flow
 */
public interface RetrievalAugmentor {
    AugmentationResult augment(AugmentationRequest augmentationRequest);
}

/**
 * Default RAG implementation with full pipeline
 */
public class DefaultRetrievalAugmentor implements RetrievalAugmentor {
    public static Builder builder();
}

/**
 * Retrieves content from data sources using queries
 */
public interface ContentRetriever {
    List<Content> retrieve(Query query);
}

/**
 * Embedding-based content retrieval
 */
public class EmbeddingStoreContentRetriever implements ContentRetriever {
    public static Builder builder();
}

/**
 * Web search content retrieval
 */
public class WebSearchContentRetriever implements ContentRetriever {
    public WebSearchContentRetriever(WebSearchEngine webSearchEngine);
}

Thread Safety: DefaultRetrievalAugmentor is thread-safe if all configured components are thread-safe. EmbeddingStoreContentRetriever is thread-safe if the underlying EmbeddingStore is thread-safe.

Common Pitfalls:

  • Setting minScore too high (e.g., 0.9) may return no results - start with 0.6-0.7
  • maxResults too low may miss relevant content - balance between quality and cost
  • Not normalizing embeddings can cause similarity scores to be incorrect
  • Forgetting to configure Executor for parallel retrieval loses performance benefits

Edge Cases:

  • Empty query returns empty results (not error)
  • Query with no matches returns empty list
  • Multiple ContentRetrievers with QueryRouter may return duplicates - use ContentAggregator for deduplication

Performance Notes:

  • Embedding API calls are the bottleneck - batch embed documents during indexing
  • In-memory stores are fast but limited by RAM - use persistent stores for large datasets
  • Parallel retrieval with Executor can significantly speed up multi-source RAG
  • Cache frequently-accessed embeddings to reduce API costs

Cost Considerations:

  • Each query triggers embedding API call (cost per call)
  • Larger maxResults increases context size and LLM costs
  • Consider caching query embeddings for repeated queries
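
For the multi-source case, a DefaultRetrievalAugmentor can route queries across retrievers; DefaultQueryRouter is the simple route-to-all implementation, and docsRetriever/webRetriever stand in for ContentRetrievers you have configured:

RetrievalAugmentor augmentor = DefaultRetrievalAugmentor.builder()
    .queryRouter(new DefaultQueryRouter(docsRetriever, webRetriever))
    .build();

Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(chatModel)
    .retrievalAugmentor(augmentor)   // replaces .contentRetriever(...) for advanced RAG
    .build();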

Related APIs: RAG (Retrieval Augmented Generation), Embedding Store


Data Types

Core data types for documents, embeddings, tools, and structured data.

public interface Document {
    String text();
    Metadata metadata();
    TextSegment toTextSegment();
}

public class TextSegment {
    public String text();
    public Metadata metadata();
}

public class Embedding {
    public float[] vector();
    public int dimension();
}

public class ToolSpecification {
    public String name();
    public String description();
    public JsonObjectSchema parameters();
}

Thread Safety: Document, TextSegment, and Embedding are immutable and thread-safe. Metadata is mutable - avoid concurrent modifications.

Common Pitfalls:

  • Document.text() may be very large - check length before processing
  • Embedding.vector() returns direct array reference - don't modify it
  • ToolSpecification description is critical for LLM - make it clear and specific
  • Metadata keys are case-sensitive

Edge Cases:

  • Document with empty text is valid
  • Embedding with zero dimension is invalid (throws exception in most stores)
  • ToolSpecification with empty parameters means no arguments
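
A short sketch tying these types together (embeddingModel is assumed configured, as in earlier examples):

Metadata metadata = Metadata.from("source", "faq.md");   // mutable - don't share across threads
Document document = Document.from("LangChain4j supports RAG.", metadata);
TextSegment segment = document.toTextSegment();

Embedding embedding = embeddingModel.embed(segment.text()).content();
float[] vector = embedding.vector();   // direct array reference - treat as read-only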

Related APIs: Data Types


AI Services

High-level API for creating AI-powered services by defining Java interfaces. AiServices provides implementations that handle chat models, streaming, memory, RAG, tools, guardrails, and various output types.

/**
 * Abstract class for building AI services from Java interfaces.
 * Supports system/user message templates, chat memory, RAG, tools, streaming,
 * moderation, and various return types.
 */
public abstract class AiServices<T> {
    /**
     * Create a simple AI service with a chat model
     * @param aiService Interface defining the AI service API
     * @param chatModel Chat model to use
     * @return Implementation of the AI service interface
     * @throws IllegalConfigurationException if configuration is invalid
     */
    public static <T> T create(Class<T> aiService, ChatModel chatModel);

    /**
     * Create a simple AI service with a streaming chat model
     * @param aiService Interface defining the AI service API
     * @param streamingChatModel Streaming chat model to use
     * @return Implementation of the AI service interface
     * @throws IllegalConfigurationException if configuration is invalid
     */
    public static <T> T create(Class<T> aiService, StreamingChatModel streamingChatModel);

    /**
     * Begin building an AI service with full configuration options
     * @param aiService Interface defining the AI service API
     * @return Builder for configuring the AI service
     */
    public static <T> AiServices<T> builder(Class<T> aiService);
}

Thread Safety: Generated AI service implementations are thread-safe. Multiple threads can call methods concurrently. However, if using ChatMemory without ChatMemoryProvider, memory is shared across threads.

Common Pitfalls:

  • Forgetting to call .build() on builder - returns builder, not service
  • Using ChatModel and StreamingChatModel together - choose one
  • Not configuring ChatMemoryProvider for multi-user scenarios - all users share memory
  • Setting both @SystemMessage annotation and systemMessageProvider() - annotation takes precedence
  • Returning TokenStream from non-streaming model throws runtime exception
  • Tool methods that block indefinitely can cause hangs - set timeouts

Edge Cases:

  • Interface with no methods is valid but useless
  • Method with no parameters and no @UserMessage annotation uses empty message
  • @V annotation on parameter without matching template variable is ignored
  • Multiple @MemoryId parameters throws IllegalConfigurationException
  • Return type not supported by OutputParser throws runtime exception

Performance Notes:

  • AI service method calls are blocking - don't call from UI thread
  • Tool execution happens synchronously unless executeToolsConcurrently() is enabled
  • maxSequentialToolsInvocations prevents infinite loops but adds latency
  • ChatMemoryProvider lookup happens on each call - use efficient Map implementations

Cost Considerations:

  • Each method call typically makes 1+ LLM API calls (cost per call)
  • Tool executions may trigger multiple API call rounds
  • Conversation memory increases context size and costs
  • RAG increases prompt size and costs

Exception Handling:

  • IllegalConfigurationException: Invalid builder configuration (e.g., missing chatModel)
  • ModerationException: Content flagged by moderation model (when @Moderate used)
  • RuntimeException: Various runtime errors (tool execution failures, parsing errors, etc.)

Testing Patterns:

// Use mock ChatModel for testing
ChatModel mockModel = (request) -> ChatResponse.builder()
    .aiMessage(AiMessage.from("Mocked response"))
    .build();

Assistant assistant = AiServices.create(Assistant.class, mockModel);
String response = assistant.chat("test");
assertEquals("Mocked response", response);

Related APIs: AI Services, Tools, Memory


Chat Memory

Chat memory implementations for maintaining conversation context. Supports message window and token window strategies with optional persistence.

/**
 * Provider interface for obtaining ChatMemory instances
 */
public interface ChatMemoryProvider {
    /**
     * Get ChatMemory for given memory ID (user/conversation)
     * @param memoryId Identifier for the memory (can be any type with proper equals/hashCode)
     * @return ChatMemory instance for the given ID
     */
    ChatMemory get(Object memoryId);
}

/**
 * ChatMemory implementation that retains a fixed number of most recent messages
 */
public class MessageWindowChatMemory implements ChatMemory {
    /**
     * Create with max message limit
     * @param maxMessages Maximum number of messages to retain
     * @return MessageWindowChatMemory instance
     */
    public static MessageWindowChatMemory withMaxMessages(int maxMessages);

    /**
     * Create builder for full configuration
     * @return Builder instance
     */
    public static Builder builder();
}

/**
 * ChatMemory implementation that retains messages within a token limit
 */
public class TokenWindowChatMemory implements ChatMemory {
    /**
     * Create with max token limit
     * @param maxTokens Maximum number of tokens to retain
     * @param tokenizer Token count estimator
     * @return TokenWindowChatMemory instance
     * @throws IllegalArgumentException if maxTokens <= 0 or tokenizer is null
     */
    public static TokenWindowChatMemory withMaxTokens(int maxTokens, TokenCountEstimator tokenizer);

    /**
     * Create builder for full configuration
     * @return Builder instance
     */
    public static Builder builder();
}

Thread Safety: MessageWindowChatMemory and TokenWindowChatMemory are NOT thread-safe. Use ChatMemoryProvider with concurrent map for thread-safe multi-user scenarios.

Common Pitfalls:

  • Sharing single ChatMemory instance across users - all see same history
  • Not using ChatMemoryProvider for multi-user apps
  • Setting maxMessages too low (e.g., 2) loses important context
  • Setting maxMessages too high increases token costs
  • TokenWindowChatMemory requires accurate tokenizer - using wrong tokenizer causes issues
  • Forgetting to configure ChatMemoryStore for persistence - memory lost on restart

Edge Cases:

  • maxMessages=1 keeps only most recent message (loses all context)
  • Empty memory returns empty list (not null)
  • Adding null message is ignored
  • ChatMemory.clear() removes all messages
  • TokenWindowChatMemory may remove partial messages to stay under limit

Performance Notes:

  • Message-based windowing is faster than token-based (no tokenization overhead)
  • Token-based windowing is more accurate for model context limits
  • In-memory storage is fast but not persistent
  • Persistent ChatMemoryStore adds I/O latency

Cost Considerations:

  • Larger memory windows increase prompt size and API costs
  • Consider clearing memory periodically for long conversations
  • Token-based windowing optimizes for model context limits
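
Persistence from the last pitfall can be sketched with a ChatMemoryStore; InMemoryChatMemoryStore is shown as a stand-in for a database-backed implementation:

ChatMemoryStore store = new InMemoryChatMemoryStore();   // swap for a persistent implementation

ChatMemoryProvider provider = memoryId -> MessageWindowChatMemory.builder()
    .id(memoryId)
    .maxMessages(20)
    .chatMemoryStore(store)   // with a persistent store, history survives restarts
    .build();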

Testing Patterns:

// Test memory isolation
ChatMemoryProvider provider = memoryId -> MessageWindowChatMemory.withMaxMessages(10);
Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(mockModel)
    .chatMemoryProvider(provider)
    .build();

assistant.chat("user1", "My name is Alice");
assistant.chat("user2", "My name is Bob");
// Verify each user has separate memory

Related APIs: Chat Memory, AI Services


Document Processing

Loaders, parsers, splitters, and sources for working with documents. Supports file system, classpath, and URL sources.

/**
 * Document loader for loading documents from the file system
 */
public class FileSystemDocumentLoader {
    /**
     * Load a single document from path
     * @param filePath Path to the document
     * @return Loaded document
     * @throws RuntimeException if file cannot be read
     */
    public static Document loadDocument(Path filePath);

    /**
     * Load a single document with custom parser
     * @param filePath Path to the document
     * @param documentParser Parser to use
     * @return Loaded document
     * @throws RuntimeException if file cannot be read or parsing fails
     */
    public static Document loadDocument(Path filePath, DocumentParser documentParser);

    /**
     * Load all documents from directory (non-recursive)
     * @param directoryPath Path to directory
     * @return List of loaded documents
     * @throws RuntimeException if directory cannot be read
     */
    public static List<Document> loadDocuments(Path directoryPath);

    /**
     * Load documents recursively from directory
     * @param directoryPath Path to directory
     * @return List of loaded documents
     * @throws RuntimeException if directory cannot be read
     */
    public static List<Document> loadDocumentsRecursively(Path directoryPath);
}

/**
 * Utility class providing factory methods for recommended document splitters
 */
public class DocumentSplitters {
    /**
     * Create recursive splitter with token limits (recommended for generic text)
     * @param maxSegmentSizeInTokens Maximum segment size in tokens
     * @param maxOverlapSizeInTokens Maximum overlap size in tokens
     * @param tokenCountEstimator Token count estimator
     * @return Configured document splitter
     * @throws IllegalArgumentException if maxSegmentSize <= 0 or overlap >= maxSegmentSize
     */
    public static DocumentSplitter recursive(
        int maxSegmentSizeInTokens,
        int maxOverlapSizeInTokens,
        TokenCountEstimator tokenCountEstimator
    );

    /**
     * Create recursive splitter with character limits
     * @param maxSegmentSizeInChars Maximum segment size in characters
     * @param maxOverlapSizeInChars Maximum overlap size in characters
     * @return Configured document splitter
     * @throws IllegalArgumentException if maxSegmentSize <= 0 or overlap >= maxSegmentSize
     */
    public static DocumentSplitter recursive(
        int maxSegmentSizeInChars,
        int maxOverlapSizeInChars
    );
}

Thread Safety: Document loaders are stateless and thread-safe. Document splitters are stateless and can be reused across threads.

Common Pitfalls:

  • loadDocumentsRecursively() can take very long on large directory trees - consider timeout
  • No file extension filtering - loads all files (including .DS_Store, thumbs.db, etc.)
  • Binary files cause encoding issues - use appropriate parsers or filter by extension
  • Large files load entirely into memory - can cause OutOfMemoryError
  • maxSegmentSize smaller than largest sentence causes issues
  • overlap >= maxSegmentSize throws exception
  • Splitting without overlap loses context at boundaries

Edge Cases:

  • Empty directory returns empty list (not null)
  • File with no read permission throws RuntimeException
  • Empty file returns Document with empty text
  • Non-existent path throws RuntimeException
  • Symlinks are followed (can cause infinite loops if circular)

Performance Notes:

  • Recursive loading of many files is I/O bound - consider parallel processing
  • Parsing is CPU-intensive for large files
  • Splitting is CPU-intensive - cache results if possible
  • Token-based splitting requires tokenizer calls - slower than character-based

Cost Considerations:

  • More segments = more embeddings = higher embedding API costs
  • Smaller segments with overlap increase segment count
  • Balance segment size between retrieval precision and cost
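
To address the no-filtering pitfall above, the loader overloads that accept a PathMatcher can restrict which files are read (the glob pattern is illustrative):

PathMatcher onlyText = FileSystems.getDefault().getPathMatcher("glob:*.txt");

// Loads only .txt files, skipping binaries and OS metadata files like .DS_Store
List<Document> documents = FileSystemDocumentLoader.loadDocuments(
    Paths.get("/path/to/docs"), onlyText);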

Testing Patterns:

// Test with in-memory documents
Document doc = Document.from("Test content", Metadata.from("source", "test"));
DocumentSplitter splitter = DocumentSplitters.recursive(100, 10);
List<TextSegment> segments = splitter.split(doc);
assertTrue(segments.size() > 0);

Related APIs: Document Processing, RAG


Output Parsing

Automatic conversion of LLM outputs to Java types including primitives, dates, enums, POJOs, and collections.

/**
 * Interface for parsing LLM output to desired types
 */
public interface OutputParser<T> {
    /**
     * Parse LLM output text to target type
     * @param text Output text from LLM
     * @return Parsed object of type T
     * @throws OutputParsingException if parsing fails
     */
    T parse(String text);

    /**
     * Get format instructions to include in prompt
     * @return Format instructions string
     */
    String formatInstructions();
}

Available output parsers:

  • Primitives: BooleanOutputParser, ByteOutputParser, ShortOutputParser, IntegerOutputParser, LongOutputParser, FloatOutputParser, DoubleOutputParser
  • Numbers: BigIntegerOutputParser, BigDecimalOutputParser
  • Dates/Times: DateOutputParser, LocalDateOutputParser, LocalTimeOutputParser, LocalDateTimeOutputParser
  • Enums: EnumOutputParser, EnumListOutputParser, EnumSetOutputParser, EnumCollectionOutputParser
  • Strings: StringListOutputParser, StringSetOutputParser, StringCollectionOutputParser
  • POJOs: PojoOutputParser, PojoListOutputParser, PojoSetOutputParser, PojoCollectionOutputParser

Thread Safety: OutputParser implementations are stateless and thread-safe.

Common Pitfalls:

  • LLM may not follow JSON schema exactly - validation needed
  • Complex nested POJOs may fail to parse - simplify structure
  • Enum parsing is case-sensitive by default
  • Date parsing uses system default locale - may not match LLM output format
  • POJO fields must be public or have setters
  • Collections may contain nulls if LLM output is malformed

Edge Cases:

  • Empty string input may throw OutputParsingException
  • LLM returning "null" as string vs actual null handling
  • POJO with nested collections of POJOs
  • Enum with spaces or special characters in name

Performance Notes:

  • JSON parsing is CPU-intensive for large outputs
  • Validating complex schemas adds overhead
  • Consider caching parsers if reusing same type

Exception Handling:

  • OutputParsingException: Thrown when LLM output doesn't match expected format
  • JsonProcessingException (wrapped): JSON syntax errors
  • IllegalArgumentException: Invalid configuration (e.g., invalid enum value)

Testing Patterns:

// Test parser with known output
record Person(String name, int age) {}
OutputParser<Person> parser = new PojoOutputParser<>(Person.class);

String llmOutput = "{\"name\": \"Alice\", \"age\": 30}";
Person person = parser.parse(llmOutput);
assertEquals("Alice", person.name());
assertEquals(30, person.age());

Related APIs: Output Parsing, AI Services


Tools

Framework for function calling and tool execution. Allows LLMs to call Java methods as tools with automatic JSON argument parsing.

/**
 * Interface for executing tools
 */
public interface ToolExecutor {
    /**
     * Execute tool with given request
     * @param toolExecutionRequest Request containing tool name and arguments
     * @param memoryId Memory ID for context
     * @return Result string from tool execution
     * @throws RuntimeException if tool execution fails and propagation is enabled
     */
    String execute(ToolExecutionRequest toolExecutionRequest, Object memoryId);
}

/**
 * Interface for providing tools dynamically
 */
public interface ToolProvider {
    /**
     * Provide tools for the given request
     * @param request Request containing context for tool selection
     * @return Result with tools to make available
     */
    ToolProviderResult provideTools(ToolProviderRequest request);
}

/**
 * Context object passed before tool execution
 */
public class BeforeToolExecution {
    // Contains tool execution request, memory ID, and context
}

/**
 * Represents a tool execution with request and result
 */
public class ToolExecution {
    // Contains tool execution request and result
}
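
A minimal ToolProvider sketch, supplying one tool dynamically per request (the tool name, description, and canned result are illustrative):

ToolProvider toolProvider = request -> {
    ToolSpecification spec = ToolSpecification.builder()
        .name("getWeather")
        .description("Get current weather for a city")
        .build();
    // ToolExecutor receives the parsed request and the memory ID
    ToolExecutor executor = (toolRequest, memoryId) -> "Sunny, 72°F";
    return ToolProviderResult.builder()
        .add(spec, executor)
        .build();
};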

Thread Safety: ToolExecutor implementations must be thread-safe if used in concurrent scenarios. Tool objects should be stateless or use proper synchronization.

Common Pitfalls:

  • Exceptions thrown by tool methods propagate to the LLM as error messages unless caught
  • Long-running tools block the entire interaction
  • Vague tool descriptions cause the LLM to hallucinate or misuse tools
  • Missing tool parameter descriptions lead to incorrect arguments
  • A method without the @Tool annotation is not exposed to the model
  • Tool name collisions (the same name on methods from different classes) cause unpredictable behavior
  • Tools with side effects (e.g., delete operations) need careful prompting to avoid unintended calls

Edge Cases:

  • Tool with no parameters still needs empty JSON object: {}
  • Tool returning null is converted to empty string
  • Tool throwing exception can be caught with ToolExecutionErrorHandler
  • LLM may hallucinate tool names not in specification
  • LLM may call multiple tools in single response
  • Tool arguments may be malformed JSON

Performance Notes:

  • Concurrent tool execution (executeToolsConcurrently()) speeds up parallel tool calls
  • Long-running tools should be async with immediate return pattern
  • Consider tool execution timeouts to prevent hangs
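The timeout recommendation above can be sketched with plain java.util.concurrent, independent of any LangChain4j API (`runWithTimeout` and its fallback messages are illustrative helpers, not library methods):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public class ToolTimeoutSketch {

    // Runs a (potentially slow) tool call with a hard deadline and
    // returns a fallback string instead of hanging the interaction.
    static String runWithTimeout(Supplier<String> toolCall, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<String> future = executor.submit(toolCall::get);
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return "Tool timed out after " + timeoutMillis + " ms";
        } catch (Exception e) {
            return "Tool failed: " + e.getMessage();
        } finally {
            executor.shutdownNow(); // interrupt the still-running task
        }
    }

    public static void main(String[] args) {
        // Fast tool completes normally
        System.out.println(runWithTimeout(() -> "Sunny in Paris", 500));
        // Slow tool is cut off at the deadline
        System.out.println(runWithTimeout(() -> {
            try { Thread.sleep(5_000); } catch (InterruptedException ie) { /* cancelled */ }
            return "too late";
        }, 100));
    }
}
```

In a real service, the fallback string becomes the tool result sent back to the LLM, which can then explain the failure to the user instead of stalling.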

Cost Considerations:

  • Each tool call round-trip adds latency and API costs
  • maxSequentialToolsInvocations limits cost but may prevent task completion
  • Tools that return large results increase context size and costs

Exception Handling:

  • ToolArgumentsErrorHandler: Handle JSON parsing or type mismatch errors
  • ToolExecutionErrorHandler: Handle runtime errors during tool execution
  • HallucinatedToolNameStrategy: Handle when LLM invents non-existent tools

Testing Patterns:

// Mock tools for testing
class MockTools {
    @Tool("Get weather")
    String getWeather(String city) {
        return "Mocked: Sunny in " + city;
    }
}

Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(mockChatModel)
    .tools(new MockTools())
    .build();

// Test tool execution tracking
List<ToolExecution> executions = new ArrayList<>();
assistant = AiServices.builder(Assistant.class)
    .chatModel(chatModel)
    .tools(new MockTools())
    .afterToolExecution(executions::add)
    .build();

assistant.chat("What's the weather?");
assertFalse(executions.isEmpty());

Related APIs: Tools, AI Services


Guardrails

Input and output validation and filtering for AI services.

/**
 * Annotation for configuring input guardrails at class level
 */
@Target(TYPE)
@Retention(RUNTIME)
public @interface InputGuardrails {
    // Configuration for input guardrails
}

/**
 * Annotation for configuring output guardrails at class level
 */
@Target(TYPE)
@Retention(RUNTIME)
public @interface OutputGuardrails {
    // Configuration for output guardrails
}

Thread Safety: Guardrail implementations must be thread-safe as they're shared across all invocations.

Common Pitfalls:

  • Guardrails throwing exceptions break the interaction - return error messages instead
  • Too strict guardrails prevent legitimate requests
  • Expensive guardrails (e.g., calling external APIs) add latency
  • Stateful guardrails without proper synchronization cause race conditions
  • Order matters - first guardrail sees original input, subsequent ones see transformed input
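The ordering pitfall is easy to demonstrate with plain function composition (`applyAll` is a stand-in for the framework's guardrail chain, not a LangChain4j API):

```java
import java.util.List;
import java.util.function.UnaryOperator;

public class GuardrailOrderSketch {

    // Applies guardrails in declaration order: each one sees the
    // output of the previous one, not the original input.
    static String applyAll(String input, List<UnaryOperator<String>> guardrails) {
        String current = input;
        for (UnaryOperator<String> guardrail : guardrails) {
            current = guardrail.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        UnaryOperator<String> redact = s -> s.replace("secret", "[redacted]");
        UnaryOperator<String> truncate = s -> s.length() > 20 ? s.substring(0, 20) : s;
        // truncate operates on the already-redacted text, not the original
        System.out.println(applyAll("my secret plan is very long indeed",
                List.of(redact, truncate)));
    }
}
```

Swapping `redact` and `truncate` here would change the result, which is exactly why guardrail registration order matters.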

Edge Cases:

  • Guardrail returning null vs empty string vs throwing exception
  • Multiple guardrails all rejecting input
  • Output guardrail modifying structure that parser expects
  • Guardrail for streaming responses (applied per token vs complete)

Performance Notes:

  • Guardrails add latency - keep them fast
  • Consider caching guardrail results for repeated inputs
  • Async guardrails can improve throughput
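The caching suggestion can be sketched with a ConcurrentHashMap wrapper (`cached` is a hypothetical helper; this is only safe for deterministic, side-effect-free guardrails):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

public class CachedGuardrailSketch {

    // Wraps an expensive validation so repeated identical inputs are
    // served from a cache instead of re-running the check.
    static UnaryOperator<String> cached(UnaryOperator<String> expensiveCheck) {
        Map<String, String> cache = new ConcurrentHashMap<>();
        return input -> cache.computeIfAbsent(input, expensiveCheck);
    }
}
```

A wrapped check runs once per distinct input; repeated inputs hit the cache. For guardrails that call external moderation APIs, this can remove the dominant latency cost, but it must not be used when the verdict depends on per-request context.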

Testing Patterns:

// Test guardrail in isolation
class TestGuardrail implements InputGuardrail {
    @Override
    public InputGuardrailResult validate(UserMessage userMessage) {
        if (userMessage.singleText().contains("bad")) {
            return failure("Input rejected");
        }
        return success();
    }
}
}

Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(mockModel)
    .inputGuardrails(new TestGuardrail())
    .build();

Related APIs: Guardrails, AI Services


Embedding Store

In-memory implementation of embedding store for vector similarity search.

/**
 * In-memory implementation of EmbeddingStore
 * Stores embeddings in memory without persistence
 */
public class InMemoryEmbeddingStore<Embedded> implements EmbeddingStore<Embedded> {
    /**
     * Default constructor
     */
    public InMemoryEmbeddingStore();

    /**
     * Load from file
     * @param file Path to file
     * @return InMemoryEmbeddingStore instance
     * @throws RuntimeException if file cannot be read or parsed
     */
    public static <Embedded> InMemoryEmbeddingStore<Embedded> fromFile(Path file);

    /**
     * Add embedding
     * @param embedding Embedding to add
     * @return Generated ID
     */
    public String add(Embedding embedding);

    /**
     * Add embedding with embedded object
     * @param embedding Embedding to add
     * @param embedded Embedded object to associate
     * @return Generated ID
     */
    public String add(Embedding embedding, Embedded embedded);

    /**
     * Find relevant embeddings
     * @param referenceEmbedding Reference embedding for similarity search
     * @param maxResults Maximum number of results
     * @param minScore Minimum similarity score (0.0 to 1.0)
     * @return List of embedding matches sorted by score descending
     */
    public List<EmbeddingMatch<Embedded>> findRelevant(
        Embedding referenceEmbedding,
        int maxResults,
        double minScore
    );

    /**
     * Serialize to file
     * @param file Path to file
     * @throws RuntimeException if file cannot be written
     */
    public void serializeToFile(Path file);
}

Thread Safety: InMemoryEmbeddingStore is thread-safe. All methods are synchronized for concurrent access.

Common Pitfalls:

  • All data is in memory - large stores can cause OutOfMemoryError
  • File serialization is synchronous and blocks - can be slow for large stores
  • minScore too high returns no results - start with 0.6-0.7
  • Not normalizing embeddings leads to incorrect cosine similarity scores
  • Adding duplicate embeddings creates multiple entries (no automatic deduplication)
  • Deserialization requires same Embedded type - class changes break loading

Edge Cases:

  • Empty store returns empty results (not null)
  • maxResults=0 returns empty list
  • minScore=0.0 returns all embeddings up to maxResults
  • referenceEmbedding with different dimension causes runtime error
  • Serializing empty store creates valid but empty file

Performance Notes:

  • Linear scan for similarity search - O(n) time complexity
  • Large stores (>100k embeddings) should use specialized vector databases
  • Batch additions are more efficient than individual adds
  • File I/O is the bottleneck for serialization
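The linear-scan behavior can be illustrated with a toy cosine-similarity search in plain Java (`Match` and `findRelevant` here are simplified stand-ins for the store's real types, and the real store's score normalization may differ):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LinearScanSketch {

    record Match(int index, double score) {}

    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // O(n) scan over every stored embedding, filtered by minScore
    // and sorted by score descending, capped at maxResults.
    static List<Match> findRelevant(List<float[]> store, float[] query,
                                    int maxResults, double minScore) {
        List<Match> matches = new ArrayList<>();
        for (int i = 0; i < store.size(); i++) {
            double score = cosine(store.get(i), query);
            if (score >= minScore) {
                matches.add(new Match(i, score));
            }
        }
        matches.sort(Comparator.comparingDouble(Match::score).reversed());
        return matches.subList(0, Math.min(maxResults, matches.size()));
    }

    public static void main(String[] args) {
        List<float[]> store = List.of(
                new float[]{1, 0}, new float[]{0, 1}, new float[]{1, 1});
        // best match (identical direction) comes first
        System.out.println(findRelevant(store, new float[]{1, 0}, 2, 0.5));
    }
}
```

Every query touches every stored vector, which is why stores beyond ~100k embeddings are better served by a vector database with an approximate-nearest-neighbor index.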

Memory Considerations:

  • Each embedding uses: dimension * 4 bytes (float) + embedded object size
  • 1 million 768-dimensional embeddings ≈ 3GB RAM (embeddings only)
  • Consider persistent stores for production use
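The arithmetic behind these estimates, as a quick sanity check (`embeddingBytes` is an illustrative helper that counts only the raw float storage, not object overhead or the embedded payload):

```java
public class MemoryEstimateSketch {

    // dimension * 4 bytes per embedding (float storage only)
    static long embeddingBytes(long count, int dimension) {
        return count * dimension * 4L;
    }

    public static void main(String[] args) {
        long bytes = embeddingBytes(1_000_000, 768);
        System.out.printf("1M x 768-dim embeddings ~ %.2f GB%n", bytes / 1e9);
    }
}
```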

Testing Patterns:

// Test with small in-memory store
InMemoryEmbeddingStore<TextSegment> store = new InMemoryEmbeddingStore<>();
Embedding emb1 = embeddingModel.embed("test").content();
TextSegment seg1 = TextSegment.from("test");
String id = store.add(emb1, seg1);

// Test similarity search
List<EmbeddingMatch<TextSegment>> results = store.findRelevant(emb1, 5, 0.7);
assertEquals(1, results.size());
assertEquals(1.0, results.get(0).score(), 0.01); // Perfect match

Related APIs: Embedding Store, RAG


Text Classification

Text classification using embedding-based similarity with labeled examples.

/**
 * Interface for classifying text based on a set of labels
 * Can return zero, one, or multiple labels for each classification
 */
public interface TextClassifier<L> {
    /**
     * Classify text
     * @param text Text to classify
     * @return List of labels (may be empty)
     */
    List<L> classify(String text);

    /**
     * Classify text with scores
     * @param text Text to classify
     * @return Classification result with scored labels
     */
    ClassificationResult<L> classifyWithScores(String text);
}

/**
 * TextClassifier implementation using EmbeddingModel and predefined examples
 * Classification performed by computing similarity between input embedding
 * and embeddings of labeled example texts
 */
public class EmbeddingModelTextClassifier<L> implements TextClassifier<L> {
    /**
     * Constructor with default values
     * @param embeddingModel Embedding model to use
     * @param examplesByLabel Map of labels to example texts
     * @throws IllegalArgumentException if embeddingModel is null or examplesByLabel is empty
     */
    public EmbeddingModelTextClassifier(
        EmbeddingModel embeddingModel,
        Map<L, ? extends Collection<String>> examplesByLabel
    );

    /**
     * Full constructor with configuration
     * @param embeddingModel Embedding model to use
     * @param examplesByLabel Map of labels to example texts
     * @param maxResults Maximum number of labels to return (default: 1)
     * @param minScore Minimum score threshold (default: 0.0)
     * @param meanToMaxScoreRatio Ratio for filtering results (default: 0.0)
     * @throws IllegalArgumentException if embeddingModel is null, examplesByLabel is empty,
     *         maxResults < 1, minScore < 0, or meanToMaxScoreRatio < 0
     */
    public EmbeddingModelTextClassifier(
        EmbeddingModel embeddingModel,
        Map<L, ? extends Collection<String>> examplesByLabel,
        int maxResults,
        double minScore,
        double meanToMaxScoreRatio
    );
}

/**
 * Represents the result of classification with scored labels
 */
public class ClassificationResult<L> {
    /**
     * Constructor
     * @param scoredLabels List of scored labels
     */
    public ClassificationResult(List<ScoredLabel<L>> scoredLabels);

    /**
     * Get scored labels
     * @return List of scored labels sorted by score descending
     */
    public List<ScoredLabel<L>> scoredLabels();
}

/**
 * Represents a classification label with associated score
 */
public class ScoredLabel<L> {
    /**
     * Constructor
     * @param label The label
     * @param score The score (0.0 to 1.0)
     */
    public ScoredLabel(L label, double score);

    /**
     * Get the label
     * @return The label
     */
    public L label();

    /**
     * Get the score
     * @return The score (0.0 to 1.0)
     */
    public double score();
}

Thread Safety: EmbeddingModelTextClassifier is thread-safe if the EmbeddingModel is thread-safe. Classification operations can be performed concurrently.

Common Pitfalls:

  • Too few examples per label (< 3) leads to poor accuracy
  • Examples not representative of actual data
  • Imbalanced examples (1 example for labelA, 100 for labelB) biases results
  • Labels with very similar examples cause confusion
  • maxResults=1 forces single-label even when multi-label is appropriate
  • minScore too high may return empty results
  • meanToMaxScoreRatio too high filters out valid secondary labels

Edge Cases:

  • Empty input text returns empty results or low-confidence labels
  • No labels meet minScore threshold - returns empty list
  • All labels have same score - arbitrary ordering
  • Label with single example vs multiple examples
  • Very short text (1-2 words) may not classify well

Performance Notes:

  • Example embeddings are computed once at construction - O(n) embedding calls up front
  • Each classification then embeds only the input text - a single embedding call
  • Batch classification of multiple texts is more efficient
  • Number of examples affects classification time
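The similarity-based classification described above can be sketched with hand-made 2-D vectors standing in for a real EmbeddingModel (`classify` aggregates by mean similarity, a simplified variant of the library's scoring):

```java
import java.util.List;
import java.util.Map;

public class ToyClassifierSketch {

    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Scores each label by the mean similarity between the input vector
    // and that label's example vectors, then returns the best label.
    static String classify(double[] input, Map<String, List<double[]>> examplesByLabel) {
        String best = null;
        double bestScore = -1;
        for (var entry : examplesByLabel.entrySet()) {
            double sum = 0;
            for (double[] example : entry.getValue()) {
                sum += cosine(input, example);
            }
            double mean = sum / entry.getValue().size();
            if (mean > bestScore) {
                bestScore = mean;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, List<double[]>> examples = Map.of(
                "positive", List.of(new double[]{1, 0.1}, new double[]{0.9, 0.2}),
                "negative", List.of(new double[]{0.1, 1}, new double[]{0.2, 0.9}));
        System.out.println(classify(new double[]{1, 0}, examples));
    }
}
```

This also makes the pitfalls concrete: too few or unrepresentative example vectors per label directly skew the mean each label is scored by.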

Cost Considerations:

  • Each classification requires 1 embedding API call
  • More examples per label increases initial embedding cost
  • Consider caching classifier instances to avoid re-embedding examples

Testing Patterns:

// Test with known examples
Map<String, List<String>> examples = Map.of(
    "positive", List.of("great", "excellent", "wonderful"),
    "negative", List.of("bad", "terrible", "awful")
);

TextClassifier<String> classifier = new EmbeddingModelTextClassifier<>(
    embeddingModel,
    examples
);

ClassificationResult<String> result = classifier.classifyWithScores("amazing product");
assertEquals("positive", result.scoredLabels().get(0).label());
assertTrue(result.scoredLabels().get(0).score() > 0.7);

Related APIs: Text Classification, Embedding Models


Chains (Deprecated)

Legacy chain API for sequential processing. Deprecated in favor of AiServices.

/**
 * Functional interface representing a chain step
 * Deprecated in favor of AiServices
 */
@Deprecated
public interface Chain<Input, Output> {
    /**
     * Execute the chain step
     * @param input Input to process
     * @return Output from processing
     */
    Output execute(Input input);
}

/**
 * A chain for conversing with a ChatModel while maintaining memory
 * Deprecated in favor of AiServices
 */
@Deprecated
public class ConversationalChain implements Chain<String, String> {
    /**
     * Create builder
     * @return Builder instance
     */
    public static ConversationalChainBuilder builder();

    /**
     * Execute chain with user message
     * @param userMessage User message
     * @return Response from chat model
     */
    public String execute(String userMessage);
}

/**
 * A chain for conversing with a ChatModel based on retrieved information
 * Supports RAG with RetrievalAugmentor
 * Deprecated in favor of AiServices
 */
@Deprecated
public class ConversationalRetrievalChain implements Chain<String, String> {
    /**
     * Create builder
     * @return Builder instance
     */
    public static Builder builder();

    /**
     * Execute chain with query
     * @param query Query to process
     * @return Response from chat model
     */
    public String execute(String query);
}

Migration Guide:

  • Use AiServices instead of ConversationalChain
  • Use AiServices with contentRetriever instead of ConversationalRetrievalChain
  • AiServices provides better type safety, streaming support, and tool integration

Related APIs: Chains, AI Services


Service Provider Interfaces

SPI interfaces for customization and framework integration.

/**
 * SPI factory interface for creating AI service contexts
 */
public interface AiServiceContextFactory {
    // Factory methods for creating AI service contexts
}

/**
 * SPI factory interface for creating AI services
 */
public interface AiServicesFactory {
    // Factory methods for creating AI services
}

/**
 * SPI adapter interface for token streams
 */
public interface TokenStreamAdapter {
    // Adapter methods for token streams
}

/**
 * SPI factory interface for creating guardrail service builders
 */
public interface GuardrailServiceBuilderFactory {
    // Factory methods for guardrail service builders
}

/**
 * SPI factory interface for creating JSON codecs for in-memory embedding store
 */
public interface InMemoryEmbeddingStoreJsonCodecFactory {
    // Factory methods for JSON codecs
}

Thread Safety: SPI implementations must be thread-safe as they're typically singletons.

Common Pitfalls:

  • Missing META-INF/services configuration file
  • Wrong service interface name in configuration
  • SPI implementation not on classpath
  • Multiple SPI implementations causing conflicts

Related APIs: Service Provider Interfaces


Comparison Tables

Memory Strategy Selection

| Scenario | Recommended Approach | Reason |
| --- | --- | --- |
| Single user, short conversations | MessageWindowChatMemory(10-20) | Simple and fast |
| Multiple users, short conversations | ChatMemoryProvider + MessageWindowChatMemory | User isolation |
| Long conversations near token limit | TokenWindowChatMemory | Precise token control |
| Need persistence across restarts | ChatMemoryStore integration | Data durability |
| Extremely high concurrency | Distributed cache + ChatMemoryStore | Scalability |
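The sliding-window idea behind MessageWindowChatMemory can be sketched in plain Java (`MessageWindowSketch` is a toy holding strings; the real implementation stores ChatMessage objects and also preserves the SystemMessage across evictions):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class MessageWindowSketch {

    // Keeps only the most recent maxMessages entries, evicting the oldest.
    private final Deque<String> messages = new ArrayDeque<>();
    private final int maxMessages;

    MessageWindowSketch(int maxMessages) {
        this.maxMessages = maxMessages;
    }

    void add(String message) {
        messages.addLast(message);
        while (messages.size() > maxMessages) {
            messages.removeFirst(); // evict oldest first
        }
    }

    List<String> messages() {
        return List.copyOf(messages);
    }

    public static void main(String[] args) {
        MessageWindowSketch memory = new MessageWindowSketch(3);
        for (String m : List.of("m1", "m2", "m3", "m4", "m5")) {
            memory.add(m);
        }
        System.out.println(memory.messages()); // oldest entries evicted
    }
}
```

TokenWindowChatMemory follows the same eviction shape but measures the window in tokens rather than message count, which is why it gives more precise control near the context limit.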

RAG Content Retriever Selection

| Data Source | Retriever Type | When to Use |
| --- | --- | --- |
| Document database | EmbeddingStoreContentRetriever | Semantic search on internal documents |
| Web content | WebSearchContentRetriever | Real-time information from web |
| SQL database | Custom ContentRetriever | Structured data queries |
| Multiple sources | QueryRouter + multiple retrievers | Hybrid search across systems |
| Graph database | Custom ContentRetriever | Relationship-based retrieval |

Output Parser Selection

| Return Type | Parser | Notes |
| --- | --- | --- |
| Simple types (String, int, boolean) | Automatic primitive parsers | No configuration needed |
| Enum | EnumOutputParser | Case-sensitive by default |
| Single POJO | PojoOutputParser | Requires public fields or setters |
| List of POJOs | PojoListOutputParser | May fail on malformed JSON |
| Date/Time | Date/LocalDate/LocalTime parsers | Format awareness needed |
| Complex nested objects | PojoOutputParser with nested classes | Validate schema carefully |

Common Workflows

Workflow 1: Building a Knowledge Base Chatbot

// Step 1: Load documents
List<Document> documents = FileSystemDocumentLoader
    .loadDocumentsRecursively(Paths.get("/path/to/docs"));

// Step 2: Split into chunks
DocumentSplitter splitter = DocumentSplitters.recursive(500, 50, tokenizer);
List<TextSegment> segments = new ArrayList<>();
for (Document doc : documents) {
    segments.addAll(splitter.split(doc));
}

// Step 3: Embed and store
EmbeddingStore<TextSegment> store = new InMemoryEmbeddingStore<>();
for (int i = 0; i < segments.size(); i++) {
    Embedding embedding = embeddingModel.embed(segments.get(i).text()).content();
    store.add(embedding, segments.get(i));
}

// Step 4: Create RAG retriever
ContentRetriever retriever = EmbeddingStoreContentRetriever.builder()
    .embeddingStore(store)
    .embeddingModel(embeddingModel)
    .maxResults(5)
    .minScore(0.7)
    .build();

// Step 5: Build AI service
interface KnowledgeBot {
    String chat(@MemoryId String userId, @UserMessage String question);
}

KnowledgeBot bot = AiServices.builder(KnowledgeBot.class)
    .chatModel(chatModel)
    .chatMemoryProvider(id -> MessageWindowChatMemory.withMaxMessages(10))
    .contentRetriever(retriever)
    .build();

// Step 6: Use the bot
String answer = bot.chat("user123", "How do I configure X?");

Workflow 2: Building an Agent with Tools

// Step 1: Define tools
class BusinessTools {
    @Tool("Get customer information by ID")
    String getCustomer(String customerId) {
        // Call customer service
        return "Customer: " + customerId;
    }

    @Tool("Get order status by order ID")
    String getOrderStatus(String orderId) {
        // Call order service
        return "Order status: Shipped";
    }

    @Tool("Create support ticket")
    String createTicket(String customerId, String issue) {
        // Create ticket in system
        return "Ticket created: #12345";
    }
}

// Step 2: Configure AI service with tools
interface SupportAgent {
    String chat(@MemoryId String sessionId, @UserMessage String message);
}

SupportAgent agent = AiServices.builder(SupportAgent.class)
    .chatModel(chatModel)
    .chatMemoryProvider(id -> MessageWindowChatMemory.withMaxMessages(20))
    .tools(new BusinessTools())
    .systemMessage("You are a customer support agent. Use the available tools to help customers.")
    .beforeToolExecution(ctx ->
        log.info("Executing tool: " + ctx.toolExecutionRequest().name()))
    .afterToolExecution(exec ->
        log.info("Tool result: " + exec.result()))
    .build();

// Step 3: Handle customer requests
String response = agent.chat("session456",
    "What's the status of my order #67890?");
// Agent will automatically call getOrderStatus tool

Workflow 3: Extracting Structured Data at Scale

// Step 1: Define data structure
record Product(
    String name,
    String category,
    double price,
    List<String> features
) {}

// Step 2: Create extractor
interface ProductExtractor {
    List<Product> extractProducts(String text);
}

ProductExtractor extractor = AiServices.create(ProductExtractor.class, chatModel);

// Step 3: Process documents in parallel
List<Document> documents = FileSystemDocumentLoader.loadDocuments(inputPath);

ExecutorService executor = Executors.newFixedThreadPool(4);
List<Future<List<Product>>> futures = documents.stream()
    .map(doc -> executor.submit(() -> extractor.extractProducts(doc.text())))
    .toList();

// Step 4: Collect results
List<Product> allProducts = new ArrayList<>();
for (Future<List<Product>> future : futures) {
    try {
        allProducts.addAll(future.get());
    } catch (Exception e) {
        log.error("Extraction failed", e);
    }
}

executor.shutdown();

// Step 5: Save results
saveToDatabase(allProducts);

Resource Management and Best Practices

Connection Pooling

// Reuse chat model instances - they typically maintain connection pools
ChatModel chatModel = OpenAiChatModel.builder()
    .apiKey(System.getenv("OPENAI_API_KEY"))
    .modelName("gpt-4")
    .build();

// Don't create new instance per request
// BAD: new OpenAiChatModel(...) in hot path
// GOOD: Singleton or application-scoped instance

Memory Management

// For long-running applications, periodically clean up memory
ChatMemoryProvider provider = new ChatMemoryProvider() {
    private final Map<Object, ChatMemory> memories = new ConcurrentHashMap<>();
    private final ScheduledExecutorService cleaner =
        Executors.newSingleThreadScheduledExecutor();

    {
        // Clean up stale memories every hour
        cleaner.scheduleAtFixedRate(() -> {
            long cutoff = System.currentTimeMillis() - TimeUnit.HOURS.toMillis(1);
            memories.entrySet().removeIf(entry ->
                ((TimestampedMemory) entry.getValue()).lastAccessTime() < cutoff
            );
        }, 1, 1, TimeUnit.HOURS);
    }

    @Override
    public ChatMemory get(Object memoryId) {
        return memories.computeIfAbsent(memoryId,
            id -> new TimestampedMemory(MessageWindowChatMemory.withMaxMessages(20))
        );
    }
};

Error Recovery Patterns

// Retry with exponential backoff for transient failures
interface ResilientAssistant {
    @SystemMessage("You are a helpful assistant")
    String chat(String message);
}

ChatModel resilientModel = new ChatModel() {
    private final ChatModel delegate = actualChatModel;
    private final int maxRetries = 3;

    @Override
    public ChatResponse chat(ChatRequest request) {
        int attempt = 0;
        while (attempt < maxRetries) {
            try {
                return delegate.chat(request);
            } catch (Exception e) {
                attempt++;
                if (attempt >= maxRetries) throw e;

                long delay = (long) Math.pow(2, attempt) * 1000;
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException(ie);
                }
            }
        }
        throw new IllegalStateException("Should not reach here");
    }

    // Implement other methods...
};

Performance Optimization Checklist

  • Reuse ChatModel instances (don't create per request)
  • Use streaming for long responses to improve perceived performance
  • Configure appropriate memory window sizes (balance context vs cost)
  • Enable concurrent tool execution for parallel tool calls
  • Cache embeddings for frequently accessed documents
  • Use batch embedding for multiple documents
  • Set reasonable maxOutputTokens to prevent runaway generation
  • Implement request timeouts to prevent hanging
  • Use CDN or local storage for large documents
  • Profile and optimize tool execution time
  • Consider async/non-blocking patterns for high throughput
  • Monitor token usage to optimize costs
  • Use appropriate chunk sizes for RAG (balance precision vs performance)

Troubleshooting Guide

Issue: AI Service returns null

Possible causes:

  • No ChatModel or StreamingChatModel configured
  • Method return type not supported
  • LLM returned empty response

Solutions:

// Ensure model is configured
Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(chatModel) // Must be set
    .build();

// Check return type is supported (String, POJOs, enums, primitives, etc.)

Issue: Tools not being called

Possible causes:

  • Tool description too vague
  • Tool parameters not documented
  • Model doesn't support tool calling
  • Tool name conflicts

Solutions:

// Make descriptions very specific
@Tool("Get the current weather forecast for a specific city. " +
      "Returns temperature, conditions, and humidity.")
String getWeather(
    @P("The city name, e.g., 'San Francisco' or 'New York'") String city
) {
    // ...
}

// Check model capabilities
if (chatModel.supportedCapabilities().contains(Capability.TOOLS)) {
    // Model supports tools
}

Issue: OutOfMemoryError with documents

Possible causes:

  • Loading too many documents at once
  • Document chunks too large
  • Embedding store too large for heap

Solutions:

// Process documents in batches
int batchSize = 100;
for (int i = 0; i < documents.size(); i += batchSize) {
    List<Document> batch = documents.subList(i,
        Math.min(i + batchSize, documents.size()));
    processDocuments(batch);
}

// Use persistent embedding store instead of in-memory
// Consider vector databases like Pinecone, Weaviate, etc.

Issue: RAG returns irrelevant results

Possible causes:

  • minScore too low
  • maxResults too high
  • Poor document chunking strategy
  • Embeddings not normalized

Solutions:

// Tune retrieval parameters
ContentRetriever retriever = EmbeddingStoreContentRetriever.builder()
    .embeddingStore(store)
    .embeddingModel(embeddingModel)
    .maxResults(3) // Start small
    .minScore(0.75) // Higher threshold
    .build();

// Use better chunking strategy
DocumentSplitter splitter = DocumentSplitters.recursive(
    300, // Smaller chunks
    50,  // More overlap
    tokenizer
);

See Also

  • Official LangChain4j Documentation
  • LangChain4j GitHub Repository
  • LangChain4j Examples
  • Community Discord

Install with Tessl CLI

npx tessl i tessl/maven-dev-langchain4j--langchain4j@1.11.0
