LangChain4j

LangChain4j is a Java library for building LLM-powered applications with support for chatbots, agents, RAG (Retrieval Augmented Generation), tools, guardrails, and much more. It provides a high-level API for working with chat models, streaming, memory management, document processing, embeddings, and various integrations.

Package Information

  • Package Name: dev.langchain4j:langchain4j
  • Package Type: maven
  • Language: Java
  • Installation:
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j</artifactId>
    <version>1.11.0</version>
</dependency>

Core Imports

// Core AI Services API
import dev.langchain4j.service.AiServices;
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import dev.langchain4j.service.V;
import dev.langchain4j.service.MemoryId;

// Memory management
import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import dev.langchain4j.memory.chat.TokenWindowChatMemory;
import dev.langchain4j.memory.chat.ChatMemoryProvider;

// Prompts and templates
import dev.langchain4j.model.input.Prompt;
import dev.langchain4j.model.input.PromptTemplate;
import dev.langchain4j.model.input.structured.StructuredPrompt;

// RAG (Retrieval Augmented Generation)
import dev.langchain4j.rag.RetrievalAugmentor;
import dev.langchain4j.rag.DefaultRetrievalAugmentor;
import dev.langchain4j.rag.content.retriever.ContentRetriever;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;

// Document processing
import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;
import dev.langchain4j.data.document.parser.TextDocumentParser;
import dev.langchain4j.data.document.splitter.DocumentSplitters;

// Embedding store
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;

// Tools
import dev.langchain4j.agent.tool.Tool;

Basic Usage

import dev.langchain4j.service.AiServices;

// Define your AI service interface
interface Assistant {
    String chat(String message);
}

// Create AI service with a chat model
Assistant assistant = AiServices.create(Assistant.class, chatModel);

// Use the assistant
String response = assistant.chat("Hello, how are you?");
System.out.println(response);

Quick Start Guide for Common Tasks

Task 1: Simple Chatbot

When to use: Basic Q&A, single-user conversations

// Minimal setup - no memory, no tools
Assistant assistant = AiServices.create(Assistant.class, chatModel);
String answer = assistant.chat("What is Java?");

Task 2: Multi-User Chatbot with Memory

When to use: Multiple users, need conversation history

interface Assistant {
    String chat(@MemoryId String userId, String message);
}

Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(chatModel)
    .chatMemoryProvider(id -> MessageWindowChatMemory.withMaxMessages(20))
    .build();

// Each user gets separate conversation history
assistant.chat("user1", "My name is Alice");
assistant.chat("user1", "What's my name?"); // Remembers "Alice"

Task 3: RAG-Enabled Assistant

When to use: Need to answer questions from your documents

// 1. Load and embed documents
EmbeddingStore<TextSegment> store = new InMemoryEmbeddingStore<>();
EmbeddingModel embeddingModel = /* your embedding model */;
TokenCountEstimator tokenizer = /* your token count estimator */;

List<Document> documents = FileSystemDocumentLoader.loadDocuments("/path/to/docs");
for (Document doc : documents) {
    List<TextSegment> segments = DocumentSplitters.recursive(500, 50, tokenizer).split(doc);
    List<Embedding> embeddings = embeddingModel.embedAll(segments).content();
    store.addAll(embeddings, segments);
}

// 2. Create retriever
ContentRetriever retriever = EmbeddingStoreContentRetriever.builder()
    .embeddingStore(store)
    .embeddingModel(embeddingModel)
    .maxResults(5)
    .minScore(0.7)
    .build();

// 3. Build assistant with RAG
Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(chatModel)
    .contentRetriever(retriever)
    .build();

String answer = assistant.chat("What does the documentation say about X?");

Task 4: Tool-Using Agent

When to use: Need to call external functions/APIs

class Tools {
    @Tool("Get current weather")
    String getWeather(String city) {
        // Call weather API
        return "Sunny, 72°F";
    }

    @Tool("Search database")
    String searchDB(String query) {
        // Query database
        return "Found 10 results";
    }
}

Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(chatModel)
    .tools(new Tools())
    .build();

// LLM will automatically call tools when needed
String answer = assistant.chat("What's the weather in Paris?");

Task 5: Structured Output Extraction

When to use: Extract structured data from text

record Person(String name, int age, String city) {}

interface Extractor {
    Person extractPerson(String text);
}

Extractor extractor = AiServices.create(Extractor.class, chatModel);
Person person = extractor.extractPerson("John is 30 years old and lives in NYC");
// Returns: Person[name=John, age=30, city=NYC]

Decision Trees

When to Use What Memory Type?

Need conversation history?
├─ NO → Don't configure memory (stateless)
└─ YES → How many users?
    ├─ Single user → Use MessageWindowChatMemory.withMaxMessages(N)
    └─ Multiple users → Use ChatMemoryProvider with @MemoryId
        └─ Token limit concern?
            ├─ YES → TokenWindowChatMemory with TokenCountEstimator
            └─ NO → MessageWindowChatMemory with maxMessages
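
The two leaf choices above can be sketched as follows; the estimator is left as a placeholder because the concrete TokenCountEstimator comes from your model provider's module, not the core artifact:

// Token-based windowing: keeps as many recent messages as fit in 1000 tokens
TokenCountEstimator estimator = /* estimator matching your model's tokenization */;
ChatMemory tokenWindow = TokenWindowChatMemory.withMaxTokens(1000, estimator);

// Message-based windowing: keeps the 20 most recent messages regardless of size
ChatMemory messageWindow = MessageWindowChatMemory.withMaxMessages(20);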

When to Use Streaming vs. Non-Streaming?

User experience requirement?
├─ Show response token-by-token → StreamingChatModel + TokenStream return type
└─ Get complete response at once → ChatModel + String/POJO return type
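
The streaming branch looks like this in practice, using the TokenStream return type from the AI Services API (streamingChatModel is assumed configured, like chatModel in the other examples):

interface StreamingAssistant {
    TokenStream chat(String message);
}

StreamingAssistant assistant = AiServices.create(StreamingAssistant.class, streamingChatModel);

assistant.chat("Tell me a story")
    .onPartialResponse(System.out::print)                          // called per token/chunk
    .onCompleteResponse(response -> System.out.println("\nDone"))  // full ChatResponse at the end
    .onError(Throwable::printStackTrace)
    .start();                                                      // nothing happens until start()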

RAG Content Retriever Selection

Data source?
├─ Documents/knowledge base → EmbeddingStoreContentRetriever
├─ Web search → WebSearchContentRetriever
├─ Custom data source → Implement ContentRetriever interface
└─ Multiple sources → Use QueryRouter with multiple retrievers

Architecture

LangChain4j is built around several key components:

  • AI Services API: High-level interface for creating AI-powered services through Java interfaces
  • Memory Management: Chat memory implementations for maintaining conversation context
  • Document Processing: Loaders, parsers, and splitters for working with documents
  • RAG (Retrieval Augmented Generation): Content retrieval and augmentation for grounding LLM responses
  • Tools: Framework for function calling and tool execution
  • Guardrails: Input and output validation and filtering
  • Output Parsing: Automatic conversion of LLM outputs to Java types
  • Classification: Text classification using embedding-based similarity

Capabilities

Core Models

Core interfaces for interacting with language models, including chat, streaming, and embeddings.

public interface ChatModel {
    ChatResponse chat(ChatRequest chatRequest);
    String chat(String userMessage);
    ChatResponse chat(ChatMessage... messages);
    Set<Capability> supportedCapabilities();
}

public interface StreamingChatModel {
    void chat(ChatRequest chatRequest, StreamingChatResponseHandler handler);
    void chat(String userMessage, StreamingChatResponseHandler handler);
}

public interface EmbeddingModel {
    Response<Embedding> embed(String text);
    Response<List<Embedding>> embedAll(List<TextSegment> textSegments);
    int dimension();
}

Thread Safety: ChatModel and EmbeddingModel implementations are typically thread-safe. StreamingChatModel requires careful handling of concurrent requests to avoid handler confusion.

Common Pitfalls:

  • Don't share StreamingChatResponseHandler instances between multiple concurrent calls
  • Check supportedCapabilities() before using advanced features like tool calling or JSON response formats
  • Be aware of rate limits when making multiple concurrent requests
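
A capability check for the second pitfall might look like this; RESPONSE_FORMAT_JSON_SCHEMA is one value of the Capability enum, and support varies by provider:

if (chatModel.supportedCapabilities().contains(Capability.RESPONSE_FORMAT_JSON_SCHEMA)) {
    // safe to request structured JSON output from this model
} else {
    // fall back to prompt-based format instructions
}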

Related APIs: Chat and Language Models


Chat Messages

Message types for chat interactions including UserMessage, AiMessage, SystemMessage, and multimodal content support.

public class UserMessage implements ChatMessage {
    public UserMessage(String text);
    public UserMessage(List<Content> contents);
    public String singleText();
    public List<Content> contents();
}

public class AiMessage implements ChatMessage {
    public AiMessage(String text);
    public String text();
    public String thinking();
    public List<ToolExecutionRequest> toolExecutionRequests();
}

public class SystemMessage implements ChatMessage {
    public SystemMessage(String text);
    public String text();
}

Thread Safety: Message classes are immutable and thread-safe.

Common Pitfalls:

  • Calling singleText() on multimodal UserMessage throws RuntimeException - use hasSingleText() first
  • AiMessage may have null text when it contains only tool execution requests
  • Don't modify the lists returned by contents() or toolExecutionRequests() - they may be unmodifiable

Edge Cases:

  • Empty UserMessage text is allowed but may cause issues with some models
  • UserMessage with name but no content is valid
  • AiMessage with both text and tool requests is valid (model can provide explanation + request tools)
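
A minimal sketch of constructing each message type, including a multimodal UserMessage (the image URL is illustrative):

SystemMessage system = SystemMessage.from("You are a helpful assistant.");
UserMessage text = UserMessage.from("Summarize this document.");

// Multimodal: text + image in a single user message
UserMessage multimodal = UserMessage.from(
    TextContent.from("What is in this picture?"),
    ImageContent.from("https://example.com/cat.png")  // illustrative URL
);

// Guard against the singleText() pitfall above
if (multimodal.hasSingleText()) {
    System.out.println(multimodal.singleText());
}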

Related APIs: Chat Messages


Requests and Responses

Request and response types for chat model interactions with comprehensive parameter control.

public class ChatRequest {
    public List<ChatMessage> messages();
    public ChatRequestParameters parameters();
    public static Builder builder();
}

public class ChatResponse {
    public AiMessage aiMessage();
    public TokenUsage tokenUsage();
    public FinishReason finishReason();
}

public interface StreamingChatResponseHandler {
    void onPartialResponse(String partialResponse);
    void onCompleteResponse(ChatResponse completeResponse);
    void onError(Throwable error);
}

Thread Safety: ChatRequest and ChatResponse are immutable. StreamingChatResponseHandler callbacks are invoked sequentially on the same thread.

Common Pitfalls:

  • Temperature values outside model's supported range are silently clamped by some providers
  • stopSequences may not work with all models - check provider documentation
  • TokenUsage may be null for some model providers
  • FinishReason.LENGTH means output was truncated - increase maxOutputTokens

Performance Notes:

  • Lower temperature (0.0-0.3) is faster and more deterministic
  • Higher temperature (0.7-1.0) is slower and more creative
  • Setting maxOutputTokens prevents runaway generations and controls costs
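
Putting these parameters together, a request with conservative settings might be built as below (values are illustrative; this assumes the default ChatRequestParameters builder):

ChatRequest request = ChatRequest.builder()
    .messages(UserMessage.from("Summarize this article"))
    .parameters(ChatRequestParameters.builder()
        .temperature(0.2)       // low temperature: faster, more deterministic
        .maxOutputTokens(500)   // caps output to control cost and runaway generation
        .build())
    .build();

ChatResponse response = chatModel.chat(request);
if (response.finishReason() == FinishReason.LENGTH) {
    // output was truncated - consider raising maxOutputTokens
}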

Related APIs: Chat Requests and Responses


Prompts and Templates

Template system for creating reusable prompts with variable substitution. Supports structured prompts and automatic date/time injection.

/**
 * Represents a prompt (input text sent to LLM)
 */
public class Prompt {
    public Prompt(String text);
    public String text();
    public UserMessage toUserMessage();
    public SystemMessage toSystemMessage();
    public static Prompt from(String text);
}

/**
 * Template with {{variable}} placeholders
 * Special variables: {{current_date}}, {{current_time}}, {{current_date_time}}
 */
public class PromptTemplate {
    public PromptTemplate(String template);
    public Prompt apply(Object value);
    public Prompt apply(Map<String, Object> variables);
    public static PromptTemplate from(String template);
}

/**
 * Annotation for structured prompts on Java classes
 */
@Target(TYPE)
@Retention(RUNTIME)
public @interface StructuredPrompt {
    String[] value();
    String delimiter() default "\n";
}

Thread Safety: PromptTemplate is thread-safe and can be reused across threads. Prompt instances are immutable.

Common Pitfalls:

  • Missing variables in template throw IllegalArgumentException at apply() time
  • Variable names are case-sensitive: {{Name}} != {{name}}
  • Special date/time variables use system default timezone - set Clock for custom timezone
  • @StructuredPrompt requires all fields to be accessible (public or with getters)

Performance Notes:

  • Create PromptTemplate once and reuse - template parsing has overhead
  • StructuredPromptProcessor uses reflection - cache results for repeated use
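
Template reuse in practice, showing both the map-based and single-value forms (a lone value binds to the {{it}} variable):

PromptTemplate template = PromptTemplate.from(
    "Tell me about {{topic}} as of {{current_date}}.");

Prompt prompt = template.apply(Map.of("topic", "Java records"));
UserMessage message = prompt.toUserMessage();

// Single-variable shortcut: the value binds to {{it}}
Prompt single = PromptTemplate.from("Translate to French: {{it}}").apply("Good morning");

Create the PromptTemplate once and reuse it across calls, per the performance note above.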

Related APIs: Prompts and Templates


RAG (Retrieval Augmented Generation)

Comprehensive RAG framework for augmenting LLM responses with retrieved information. Supports query transformation, routing, content retrieval, aggregation, and injection.

/**
 * Entry point for RAG flow
 */
public interface RetrievalAugmentor {
    AugmentationResult augment(AugmentationRequest augmentationRequest);
}

/**
 * Default RAG implementation with full pipeline
 */
public class DefaultRetrievalAugmentor implements RetrievalAugmentor {
    public static Builder builder();
}

/**
 * Retrieves content from data sources using queries
 */
public interface ContentRetriever {
    List<Content> retrieve(Query query);
}

/**
 * Embedding-based content retrieval
 */
public class EmbeddingStoreContentRetriever implements ContentRetriever {
    public static Builder builder();
}

/**
 * Web search content retrieval
 */
public class WebSearchContentRetriever implements ContentRetriever {
    public WebSearchContentRetriever(WebSearchEngine webSearchEngine);
}

Thread Safety: DefaultRetrievalAugmentor is thread-safe if all configured components are thread-safe. EmbeddingStoreContentRetriever is thread-safe if the underlying EmbeddingStore is thread-safe.

Common Pitfalls:

  • Setting minScore too high (e.g., 0.9) may return no results - start with 0.6-0.7
  • maxResults too low may miss relevant content - balance between quality and cost
  • Not normalizing embeddings can cause similarity scores to be incorrect
  • Forgetting to configure Executor for parallel retrieval loses performance benefits

Edge Cases:

  • Empty query returns empty results (not error)
  • Query with no matches returns empty list
  • Multiple ContentRetrievers with QueryRouter may return duplicates - use ContentAggregator for deduplication

Performance Notes:

  • Embedding API calls are the bottleneck - batch embed documents during indexing
  • In-memory stores are fast but limited by RAM - use persistent stores for large datasets
  • Parallel retrieval with Executor can significantly speed up multi-source RAG
  • Cache frequently-accessed embeddings to reduce API costs

Cost Considerations:

  • Each query triggers embedding API call (cost per call)
  • Larger maxResults increases context size and LLM costs
  • Consider caching query embeddings for repeated queries
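
For the multi-source case, a DefaultRetrievalAugmentor can route queries across retrievers; DefaultQueryRouter is the simple route-to-all implementation, and docsRetriever/webRetriever stand in for ContentRetrievers you have configured:

RetrievalAugmentor augmentor = DefaultRetrievalAugmentor.builder()
    .queryRouter(new DefaultQueryRouter(docsRetriever, webRetriever))
    .build();

Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(chatModel)
    .retrievalAugmentor(augmentor)   // replaces .contentRetriever(...) for advanced RAG
    .build();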

Related APIs: RAG (Retrieval Augmented Generation), Embedding Store


Data Types

Core data types for documents, embeddings, tools, and structured data.

public interface Document {
    String text();
    Metadata metadata();
    TextSegment toTextSegment();
}

public class TextSegment {
    public String text();
    public Metadata metadata();
}

public class Embedding {
    public float[] vector();
    public int dimension();
}

public class ToolSpecification {
    public String name();
    public String description();
    public JsonObjectSchema parameters();
}

Thread Safety: Document, TextSegment, and Embedding are immutable and thread-safe. Metadata is mutable - avoid concurrent modifications.

Common Pitfalls:

  • Document.text() may be very large - check length before processing
  • Embedding.vector() returns direct array reference - don't modify it
  • ToolSpecification description is critical for LLM - make it clear and specific
  • Metadata keys are case-sensitive

Edge Cases:

  • Document with empty text is valid
  • Embedding with zero dimension is invalid (throws exception in most stores)
  • ToolSpecification with empty parameters means no arguments
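
A short sketch tying these types together (embeddingModel is assumed configured, as in earlier examples):

Metadata metadata = Metadata.from("source", "faq.md");   // mutable - don't share across threads
Document document = Document.from("LangChain4j supports RAG.", metadata);
TextSegment segment = document.toTextSegment();

Embedding embedding = embeddingModel.embed(segment.text()).content();
float[] vector = embedding.vector();   // direct array reference - treat as read-only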

Related APIs: Data Types


AI Services

High-level API for creating AI-powered services by defining Java interfaces. AiServices provides implementations that handle chat models, streaming, memory, RAG, tools, guardrails, and various output types.

/**
 * Abstract class for building AI services from Java interfaces.
 * Supports system/user message templates, chat memory, RAG, tools, streaming,
 * moderation, and various return types.
 */
public abstract class AiServices<T> {
    /**
     * Create a simple AI service with a chat model
     * @param aiService Interface defining the AI service API
     * @param chatModel Chat model to use
     * @return Implementation of the AI service interface
     * @throws IllegalConfigurationException if configuration is invalid
     */
    public static <T> T create(Class<T> aiService, ChatModel chatModel);

    /**
     * Create a simple AI service with a streaming chat model
     * @param aiService Interface defining the AI service API
     * @param streamingChatModel Streaming chat model to use
     * @return Implementation of the AI service interface
     * @throws IllegalConfigurationException if configuration is invalid
     */
    public static <T> T create(Class<T> aiService, StreamingChatModel streamingChatModel);

    /**
     * Begin building an AI service with full configuration options
     * @param aiService Interface defining the AI service API
     * @return Builder for configuring the AI service
     */
    public static <T> AiServices<T> builder(Class<T> aiService);
}

Thread Safety: Generated AI service implementations are thread-safe. Multiple threads can call methods concurrently. However, if using ChatMemory without ChatMemoryProvider, memory is shared across threads.

Common Pitfalls:

  • Forgetting to call .build() on builder - returns builder, not service
  • Using ChatModel and StreamingChatModel together - choose one
  • Not configuring ChatMemoryProvider for multi-user scenarios - all users share memory
  • Setting both @SystemMessage annotation and systemMessageProvider() - annotation takes precedence
  • Returning TokenStream from non-streaming model throws runtime exception
  • Tool methods that block indefinitely can cause hangs - set timeouts

Edge Cases:

  • Interface with no methods is valid but useless
  • Method with no parameters and no @UserMessage annotation uses empty message
  • @V annotation on parameter without matching template variable is ignored
  • Multiple @MemoryId parameters throws IllegalConfigurationException
  • Return type not supported by OutputParser throws runtime exception

Performance Notes:

  • AI service method calls are blocking - don't call from UI thread
  • Tool execution happens synchronously unless executeToolsConcurrently() is enabled
  • maxSequentialToolsInvocations prevents infinite loops but adds latency
  • ChatMemoryProvider lookup happens on each call - use efficient Map implementations

Cost Considerations:

  • Each method call typically makes 1+ LLM API calls (cost per call)
  • Tool executions may trigger multiple API call rounds
  • Conversation memory increases context size and costs
  • RAG increases prompt size and costs

Exception Handling:

  • IllegalConfigurationException: Invalid builder configuration (e.g., missing chatModel)
  • ModerationException: Content flagged by moderation model (when @Moderate used)
  • RuntimeException: Various runtime errors (tool execution failures, parsing errors, etc.)

Testing Patterns:

// Use mock ChatModel for testing
ChatModel mockModel = (request) -> ChatResponse.builder()
    .aiMessage(AiMessage.from("Mocked response"))
    .build();

Assistant assistant = AiServices.create(Assistant.class, mockModel);
String response = assistant.chat("test");
assertEquals("Mocked response", response);

Related APIs: AI Services, Tools, Memory


Chat Memory

Chat memory implementations for maintaining conversation context. Supports message window and token window strategies with optional persistence.

/**
 * Provider interface for obtaining ChatMemory instances
 */
public interface ChatMemoryProvider {
    /**
     * Get ChatMemory for given memory ID (user/conversation)
     * @param memoryId Identifier for the memory (can be any type with proper equals/hashCode)
     * @return ChatMemory instance for the given ID
     */
    ChatMemory get(Object memoryId);
}

/**
 * ChatMemory implementation that retains a fixed number of most recent messages
 */
public class MessageWindowChatMemory implements ChatMemory {
    /**
     * Create with max message limit
     * @param maxMessages Maximum number of messages to retain
     * @return MessageWindowChatMemory instance
     */
    public static MessageWindowChatMemory withMaxMessages(int maxMessages);

    /**
     * Create builder for full configuration
     * @return Builder instance
     */
    public static Builder builder();
}

/**
 * ChatMemory implementation that retains messages within a token limit
 */
public class TokenWindowChatMemory implements ChatMemory {
    /**
     * Create with max token limit
     * @param maxTokens Maximum number of tokens to retain
     * @param tokenizer Token count estimator
     * @return TokenWindowChatMemory instance
     * @throws IllegalArgumentException if maxTokens <= 0 or tokenizer is null
     */
    public static TokenWindowChatMemory withMaxTokens(int maxTokens, TokenCountEstimator tokenizer);

    /**
     * Create builder for full configuration
     * @return Builder instance
     */
    public static Builder builder();
}

Thread Safety: MessageWindowChatMemory and TokenWindowChatMemory are NOT thread-safe. Use ChatMemoryProvider with concurrent map for thread-safe multi-user scenarios.

Common Pitfalls:

  • Sharing single ChatMemory instance across users - all see same history
  • Not using ChatMemoryProvider for multi-user apps
  • Setting maxMessages too low (e.g., 2) loses important context
  • Setting maxMessages too high increases token costs
  • TokenWindowChatMemory requires accurate tokenizer - using wrong tokenizer causes issues
  • Forgetting to configure ChatMemoryStore for persistence - memory lost on restart

Edge Cases:

  • maxMessages=1 keeps only most recent message (loses all context)
  • Empty memory returns empty list (not null)
  • Adding null message is ignored
  • ChatMemory.clear() removes all messages
  • TokenWindowChatMemory may remove partial messages to stay under limit

Performance Notes:

  • Message-based windowing is faster than token-based (no tokenization overhead)
  • Token-based windowing is more accurate for model context limits
  • In-memory storage is fast but not persistent
  • Persistent ChatMemoryStore adds I/O latency

Cost Considerations:

  • Larger memory windows increase prompt size and API costs
  • Consider clearing memory periodically for long conversations
  • Token-based windowing optimizes for model context limits
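
Persistence from the last pitfall can be sketched with a ChatMemoryStore; InMemoryChatMemoryStore is shown as a stand-in for a database-backed implementation:

ChatMemoryStore store = new InMemoryChatMemoryStore();   // swap for a persistent implementation

ChatMemoryProvider provider = memoryId -> MessageWindowChatMemory.builder()
    .id(memoryId)
    .maxMessages(20)
    .chatMemoryStore(store)   // with a persistent store, history survives restarts
    .build();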

Testing Patterns:

// Test memory isolation
ChatMemoryProvider provider = memoryId -> MessageWindowChatMemory.withMaxMessages(10);
Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(mockModel)
    .chatMemoryProvider(provider)
    .build();

assistant.chat("user1", "My name is Alice");
assistant.chat("user2", "My name is Bob");
// Verify each user has separate memory

Related APIs: Chat Memory, AI Services


Document Processing

Loaders, parsers, splitters, and sources for working with documents. Supports file system, classpath, and URL sources.

/**
 * Document loader for loading documents from the file system
 */
public class FileSystemDocumentLoader {
    /**
     * Load a single document from path
     * @param filePath Path to the document
     * @return Loaded document
     * @throws RuntimeException if file cannot be read
     */
    public static Document loadDocument(Path filePath);

    /**
     * Load a single document with custom parser
     * @param filePath Path to the document
     * @param documentParser Parser to use
     * @return Loaded document
     * @throws RuntimeException if file cannot be read or parsing fails
     */
    public static Document loadDocument(Path filePath, DocumentParser documentParser);

    /**
     * Load all documents from directory (non-recursive)
     * @param directoryPath Path to directory
     * @return List of loaded documents
     * @throws RuntimeException if directory cannot be read
     */
    public static List<Document> loadDocuments(Path directoryPath);

    /**
     * Load documents recursively from directory
     * @param directoryPath Path to directory
     * @return List of loaded documents
     * @throws RuntimeException if directory cannot be read
     */
    public static List<Document> loadDocumentsRecursively(Path directoryPath);
}

/**
 * Utility class providing factory methods for recommended document splitters
 */
public class DocumentSplitters {
    /**
     * Create recursive splitter with token limits (recommended for generic text)
     * @param maxSegmentSizeInTokens Maximum segment size in tokens
     * @param maxOverlapSizeInTokens Maximum overlap size in tokens
     * @param tokenCountEstimator Token count estimator
     * @return Configured document splitter
     * @throws IllegalArgumentException if maxSegmentSize <= 0 or overlap >= maxSegmentSize
     */
    public static DocumentSplitter recursive(
        int maxSegmentSizeInTokens,
        int maxOverlapSizeInTokens,
        TokenCountEstimator tokenCountEstimator
    );

    /**
     * Create recursive splitter with character limits
     * @param maxSegmentSizeInChars Maximum segment size in characters
     * @param maxOverlapSizeInChars Maximum overlap size in characters
     * @return Configured document splitter
     * @throws IllegalArgumentException if maxSegmentSize <= 0 or overlap >= maxSegmentSize
     */
    public static DocumentSplitter recursive(
        int maxSegmentSizeInChars,
        int maxOverlapSizeInChars
    );
}

Thread Safety: Document loaders are stateless and thread-safe. Document splitters are stateless and can be reused across threads.

Common Pitfalls:

  • loadDocumentsRecursively() can take very long on large directory trees - consider timeout
  • No file extension filtering - loads all files (including .DS_Store, thumbs.db, etc.)
  • Binary files cause encoding issues - use appropriate parsers or filter by extension
  • Large files load entirely into memory - can cause OutOfMemoryError
  • maxSegmentSize smaller than largest sentence causes issues
  • overlap >= maxSegmentSize throws exception
  • Splitting without overlap loses context at boundaries

Edge Cases:

  • Empty directory returns empty list (not null)
  • File with no read permission throws RuntimeException
  • Empty file returns Document with empty text
  • Non-existent path throws RuntimeException
  • Symlinks are followed (can cause infinite loops if circular)

Performance Notes:

  • Recursive loading of many files is I/O bound - consider parallel processing
  • Parsing is CPU-intensive for large files
  • Splitting is CPU-intensive - cache results if possible
  • Token-based splitting requires tokenizer calls - slower than character-based

Cost Considerations:

  • More segments = more embeddings = higher embedding API costs
  • Smaller segments with overlap increase segment count
  • Balance segment size between retrieval precision and cost
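
To address the no-filtering pitfall above, the loader overloads that accept a PathMatcher can restrict which files are read (the glob pattern is illustrative):

PathMatcher onlyText = FileSystems.getDefault().getPathMatcher("glob:*.txt");

// Loads only .txt files, skipping binaries and OS metadata files like .DS_Store
List<Document> documents = FileSystemDocumentLoader.loadDocuments(
    Paths.get("/path/to/docs"), onlyText);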

Testing Patterns:

// Test with in-memory documents
Document doc = Document.from("Test content", Metadata.from("source", "test"));
DocumentSplitter splitter = DocumentSplitters.recursive(100, 10);
List<TextSegment> segments = splitter.split(doc);
assertTrue(segments.size() > 0);

Related APIs: Document Processing, RAG


Output Parsing

Automatic conversion of LLM outputs to Java types including primitives, dates, enums, POJOs, and collections.

/**
 * Interface for parsing LLM output to desired types
 */
public interface OutputParser<T> {
    /**
     * Parse LLM output text to target type
     * @param text Output text from LLM
     * @return Parsed object of type T
     * @throws OutputParsingException if parsing fails
     */
    T parse(String text);

    /**
     * Get format instructions to include in prompt
     * @return Format instructions string
     */
    String formatInstructions();
}

Available output parsers:

  • Primitives: BooleanOutputParser, ByteOutputParser, ShortOutputParser, IntegerOutputParser, LongOutputParser, FloatOutputParser, DoubleOutputParser
  • Numbers: BigIntegerOutputParser, BigDecimalOutputParser
  • Dates/Times: DateOutputParser, LocalDateOutputParser, LocalTimeOutputParser, LocalDateTimeOutputParser
  • Enums: EnumOutputParser, EnumListOutputParser, EnumSetOutputParser, EnumCollectionOutputParser
  • Strings: StringListOutputParser, StringSetOutputParser, StringCollectionOutputParser
  • POJOs: PojoOutputParser, PojoListOutputParser, PojoSetOutputParser, PojoCollectionOutputParser

Thread Safety: OutputParser implementations are stateless and thread-safe.

Common Pitfalls:

  • LLM may not follow JSON schema exactly - validation needed
  • Complex nested POJOs may fail to parse - simplify structure
  • Enum parsing is case-sensitive by default
  • Date parsing uses system default locale - may not match LLM output format
  • POJO fields must be public or have setters
  • Collections may contain nulls if LLM output is malformed

Edge Cases:

  • Empty string input may throw OutputParsingException
  • LLM returning "null" as string vs actual null handling
  • POJO with nested collections of POJOs
  • Enum with spaces or special characters in name

Performance Notes:

  • JSON parsing is CPU-intensive for large outputs
  • Validating complex schemas adds overhead
  • Consider caching parsers if reusing same type

Exception Handling:

  • OutputParsingException: Thrown when LLM output doesn't match expected format
  • JsonProcessingException (wrapped): JSON syntax errors
  • IllegalArgumentException: Invalid configuration (e.g., invalid enum value)

Testing Patterns:

// Test parser with known output
record Person(String name, int age) {}
OutputParser<Person> parser = new PojoOutputParser<>(Person.class);

String llmOutput = "{\"name\": \"Alice\", \"age\": 30}";
Person person = parser.parse(llmOutput);
assertEquals("Alice", person.name());
assertEquals(30, person.age());

Related APIs: Output Parsing, AI Services


Tools

Framework for function calling and tool execution. Allows LLMs to call Java methods as tools with automatic JSON argument parsing.

/**
 * Interface for executing tools
 */
public interface ToolExecutor {
    /**
     * Execute tool with given request
     * @param toolExecutionRequest Request containing tool name and arguments
     * @param memoryId Memory ID for context
     * @return Result string from tool execution
     * @throws RuntimeException if tool execution fails and propagation is enabled
     */
    String execute(ToolExecutionRequest toolExecutionRequest, Object memoryId);
}

/**
 * Interface for providing tools dynamically
 */
public interface ToolProvider {
    /**
     * Provide tools for the given request
     * @param request Request containing context for tool selection
     * @return Result with tools to make available
     */
    ToolProviderResult provideTools(ToolProviderRequest request);
}

/**
 * Context object passed before tool execution
 */
public class BeforeToolExecution {
    // Contains tool execution request, memory ID, and context
}

/**
 * Represents a tool execution with request and result
 */
public class ToolExecution {
    // Contains tool execution request and result
}
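
A minimal ToolProvider sketch, supplying one tool dynamically per request (the tool name, description, and canned result are illustrative):

ToolProvider toolProvider = request -> {
    ToolSpecification spec = ToolSpecification.builder()
        .name("getWeather")
        .description("Get current weather for a city")
        .build();
    // ToolExecutor receives the parsed request and the memory ID
    ToolExecutor executor = (toolRequest, memoryId) -> "Sunny, 72°F";
    return ToolProviderResult.builder()
        .add(spec, executor)
        .build();
};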

Thread Safety: ToolExecutor implementations must be thread-safe if used in concurrent scenarios. Tool objects should be stateless or use proper synchronization.

Common Pitfalls:

  • Exceptions thrown by tool methods propagate to the LLM as error messages unless caught
  • Long-running tools block the entire interaction
  • Vague tool descriptions cause the LLM to hallucinate or misuse tools
  • Missing tool parameter descriptions lead to incorrect arguments
  • A method without the @Tool annotation is not exposed to the model
  • Tool name collisions (the same name on methods from different classes) cause unpredictable behavior
  • Tools with side effects (e.g., delete operations) need careful prompting to avoid unintended calls

Edge Cases:

  • Tool with no parameters still needs empty JSON object: {}
  • Tool returning null is converted to empty string
  • Tool throwing exception can be caught with ToolExecutionErrorHandler
  • LLM may hallucinate tool names not in specification
  • LLM may call multiple tools in single response
  • Tool arguments may be malformed JSON

Performance Notes:

  • Concurrent tool execution (executeToolsConcurrently()) speeds up parallel tool calls
  • Long-running tools should be async with immediate return pattern
  • Consider tool execution timeouts to prevent hangs
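The timeout recommendation above can be sketched with plain java.util.concurrent, independent of any LangChain4j API (`runWithTimeout` and its fallback messages are illustrative helpers, not library methods):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public class ToolTimeoutSketch {

    // Runs a (potentially slow) tool call with a hard deadline and
    // returns a fallback string instead of hanging the interaction.
    static String runWithTimeout(Supplier<String> toolCall, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<String> future = executor.submit(toolCall::get);
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return "Tool timed out after " + timeoutMillis + " ms";
        } catch (Exception e) {
            return "Tool failed: " + e.getMessage();
        } finally {
            executor.shutdownNow(); // interrupt the still-running task
        }
    }

    public static void main(String[] args) {
        // Fast tool completes normally
        System.out.println(runWithTimeout(() -> "Sunny in Paris", 500));
        // Slow tool is cut off at the deadline
        System.out.println(runWithTimeout(() -> {
            try { Thread.sleep(5_000); } catch (InterruptedException ie) { /* cancelled */ }
            return "too late";
        }, 100));
    }
}
```

In a real service, the fallback string becomes the tool result sent back to the LLM, which can then explain the failure to the user instead of stalling.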

Cost Considerations:

  • Each tool call round-trip adds latency and API costs
  • maxSequentialToolsInvocations limits cost but may prevent task completion
  • Tools that return large results increase context size and costs

Exception Handling:

  • ToolArgumentsErrorHandler: Handle JSON parsing or type mismatch errors
  • ToolExecutionErrorHandler: Handle runtime errors during tool execution
  • HallucinatedToolNameStrategy: Handle when LLM invents non-existent tools

Testing Patterns:

// Mock tools for testing
class MockTools {
    @Tool("Get weather")
    String getWeather(String city) {
        return "Mocked: Sunny in " + city;
    }
}

Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(mockChatModel)
    .tools(new MockTools())
    .build();

// Test tool execution tracking
List<ToolExecution> executions = new ArrayList<>();
assistant = AiServices.builder(Assistant.class)
    .chatModel(chatModel)
    .tools(new MockTools())
    .afterToolExecution(executions::add)
    .build();

assistant.chat("What's the weather?");
assertFalse(executions.isEmpty());

Related APIs: Tools, AI Services


Guardrails

Input and output validation and filtering for AI services.

/**
 * Annotation for configuring input guardrails at class level
 */
@Target(TYPE)
@Retention(RUNTIME)
public @interface InputGuardrails {
    // Configuration for input guardrails
}

/**
 * Annotation for configuring output guardrails at class level
 */
@Target(TYPE)
@Retention(RUNTIME)
public @interface OutputGuardrails {
    // Configuration for output guardrails
}

Thread Safety: Guardrail implementations must be thread-safe as they're shared across all invocations.

Common Pitfalls:

  • Guardrails throwing exceptions break the interaction - return error messages instead
  • Too strict guardrails prevent legitimate requests
  • Expensive guardrails (e.g., calling external APIs) add latency
  • Stateful guardrails without proper synchronization cause race conditions
  • Order matters - first guardrail sees original input, subsequent ones see transformed input
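The ordering pitfall is easy to demonstrate with plain function composition (`applyAll` is a stand-in for the framework's guardrail chain, not a LangChain4j API):

```java
import java.util.List;
import java.util.function.UnaryOperator;

public class GuardrailOrderSketch {

    // Applies guardrails in declaration order: each one sees the
    // output of the previous one, not the original input.
    static String applyAll(String input, List<UnaryOperator<String>> guardrails) {
        String current = input;
        for (UnaryOperator<String> guardrail : guardrails) {
            current = guardrail.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        UnaryOperator<String> redact = s -> s.replace("secret", "[redacted]");
        UnaryOperator<String> truncate = s -> s.length() > 20 ? s.substring(0, 20) : s;
        // truncate operates on the already-redacted text, not the original
        System.out.println(applyAll("my secret plan is very long indeed",
                List.of(redact, truncate)));
    }
}
```

Swapping `redact` and `truncate` here would change the result, which is exactly why guardrail registration order matters.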

Edge Cases:

  • Guardrail returning null vs empty string vs throwing exception
  • Multiple guardrails all rejecting input
  • Output guardrail modifying structure that parser expects
  • Guardrail for streaming responses (applied per token vs complete)

Performance Notes:

  • Guardrails add latency - keep them fast
  • Consider caching guardrail results for repeated inputs
  • Async guardrails can improve throughput
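The caching suggestion can be sketched with a ConcurrentHashMap wrapper (`cached` is a hypothetical helper; this is only safe for deterministic, side-effect-free guardrails):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

public class CachedGuardrailSketch {

    // Wraps an expensive validation so repeated identical inputs are
    // served from a cache instead of re-running the check.
    static UnaryOperator<String> cached(UnaryOperator<String> expensiveCheck) {
        Map<String, String> cache = new ConcurrentHashMap<>();
        return input -> cache.computeIfAbsent(input, expensiveCheck);
    }
}
```

A wrapped check runs once per distinct input; repeated inputs hit the cache. For guardrails that call external moderation APIs, this can remove the dominant latency cost, but it must not be used when the verdict depends on per-request context.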

Testing Patterns:

// Test guardrail in isolation
class TestGuardrail implements InputGuardrail {
    @Override
    public InputGuardrailResult validate(UserMessage userMessage) {
        if (userMessage.singleText().contains("bad")) {
            return failure("Input rejected");
        }
        return success();
    }
}
}

Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(mockModel)
    .inputGuardrails(new TestGuardrail())
    .build();

Related APIs: Guardrails, AI Services


Embedding Store

In-memory implementation of embedding store for vector similarity search.

/**
 * In-memory implementation of EmbeddingStore
 * Stores embeddings in memory without persistence
 */
public class InMemoryEmbeddingStore<Embedded> implements EmbeddingStore<Embedded> {
    /**
     * Default constructor
     */
    public InMemoryEmbeddingStore();

    /**
     * Load from file
     * @param file Path to file
     * @return InMemoryEmbeddingStore instance
     * @throws RuntimeException if file cannot be read or parsed
     */
    public static <Embedded> InMemoryEmbeddingStore<Embedded> fromFile(Path file);

    /**
     * Add embedding
     * @param embedding Embedding to add
     * @return Generated ID
     */
    public String add(Embedding embedding);

    /**
     * Add embedding with embedded object
     * @param embedding Embedding to add
     * @param embedded Embedded object to associate
     * @return Generated ID
     */
    public String add(Embedding embedding, Embedded embedded);

    /**
     * Find relevant embeddings
     * @param referenceEmbedding Reference embedding for similarity search
     * @param maxResults Maximum number of results
     * @param minScore Minimum similarity score (0.0 to 1.0)
     * @return List of embedding matches sorted by score descending
     */
    public List<EmbeddingMatch<Embedded>> findRelevant(
        Embedding referenceEmbedding,
        int maxResults,
        double minScore
    );

    /**
     * Serialize to file
     * @param file Path to file
     * @throws RuntimeException if file cannot be written
     */
    public void serializeToFile(Path file);
}

Thread Safety: InMemoryEmbeddingStore is thread-safe. All methods are synchronized for concurrent access.

Common Pitfalls:

  • All data is in memory - large stores can cause OutOfMemoryError
  • File serialization is synchronous and blocks - can be slow for large stores
  • minScore too high returns no results - start with 0.6-0.7
  • Not normalizing embeddings leads to incorrect cosine similarity scores
  • Adding duplicate embeddings creates multiple entries (no automatic deduplication)
  • Deserialization requires same Embedded type - class changes break loading

Edge Cases:

  • Empty store returns empty results (not null)
  • maxResults=0 returns empty list
  • minScore=0.0 returns all embeddings up to maxResults
  • referenceEmbedding with different dimension causes runtime error
  • Serializing empty store creates valid but empty file

Performance Notes:

  • Linear scan for similarity search - O(n) time complexity
  • Large stores (>100k embeddings) should use specialized vector databases
  • Batch additions are more efficient than individual adds
  • File I/O is the bottleneck for serialization
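The linear-scan behavior can be illustrated with a toy cosine-similarity search in plain Java (`Match` and `findRelevant` here are simplified stand-ins for the store's real types, and the real store's score normalization may differ):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LinearScanSketch {

    record Match(int index, double score) {}

    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // O(n) scan over every stored embedding, filtered by minScore
    // and sorted by score descending, capped at maxResults.
    static List<Match> findRelevant(List<float[]> store, float[] query,
                                    int maxResults, double minScore) {
        List<Match> matches = new ArrayList<>();
        for (int i = 0; i < store.size(); i++) {
            double score = cosine(store.get(i), query);
            if (score >= minScore) {
                matches.add(new Match(i, score));
            }
        }
        matches.sort(Comparator.comparingDouble(Match::score).reversed());
        return matches.subList(0, Math.min(maxResults, matches.size()));
    }

    public static void main(String[] args) {
        List<float[]> store = List.of(
                new float[]{1, 0}, new float[]{0, 1}, new float[]{1, 1});
        // best match (identical direction) comes first
        System.out.println(findRelevant(store, new float[]{1, 0}, 2, 0.5));
    }
}
```

Every query touches every stored vector, which is why stores beyond ~100k embeddings are better served by a vector database with an approximate-nearest-neighbor index.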

Memory Considerations:

  • Each embedding uses: dimension * 4 bytes (float) + embedded object size
  • 1 million 768-dimensional embeddings ≈ 3GB RAM (embeddings only)
  • Consider persistent stores for production use
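The arithmetic behind these estimates, as a quick sanity check (`embeddingBytes` is an illustrative helper that counts only the raw float storage, not object overhead or the embedded payload):

```java
public class MemoryEstimateSketch {

    // dimension * 4 bytes per embedding (float storage only)
    static long embeddingBytes(long count, int dimension) {
        return count * dimension * 4L;
    }

    public static void main(String[] args) {
        long bytes = embeddingBytes(1_000_000, 768);
        System.out.printf("1M x 768-dim embeddings ~ %.2f GB%n", bytes / 1e9);
    }
}
```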

Testing Patterns:

// Test with small in-memory store
InMemoryEmbeddingStore<TextSegment> store = new InMemoryEmbeddingStore<>();
Embedding emb1 = embeddingModel.embed("test").content();
TextSegment seg1 = TextSegment.from("test");
String id = store.add(emb1, seg1);

// Test similarity search
List<EmbeddingMatch<TextSegment>> results = store.findRelevant(emb1, 5, 0.7);
assertEquals(1, results.size());
assertEquals(1.0, results.get(0).score(), 0.01); // Perfect match

Related APIs: Embedding Store, RAG


Text Classification

Text classification using embedding-based similarity with labeled examples.

/**
 * Interface for classifying text based on a set of labels
 * Can return zero, one, or multiple labels for each classification
 */
public interface TextClassifier<L> {
    /**
     * Classify text
     * @param text Text to classify
     * @return List of labels (may be empty)
     */
    List<L> classify(String text);

    /**
     * Classify text with scores
     * @param text Text to classify
     * @return Classification result with scored labels
     */
    ClassificationResult<L> classifyWithScores(String text);
}

/**
 * TextClassifier implementation using EmbeddingModel and predefined examples
 * Classification performed by computing similarity between input embedding
 * and embeddings of labeled example texts
 */
public class EmbeddingModelTextClassifier<L> implements TextClassifier<L> {
    /**
     * Constructor with default values
     * @param embeddingModel Embedding model to use
     * @param examplesByLabel Map of labels to example texts
     * @throws IllegalArgumentException if embeddingModel is null or examplesByLabel is empty
     */
    public EmbeddingModelTextClassifier(
        EmbeddingModel embeddingModel,
        Map<L, ? extends Collection<String>> examplesByLabel
    );

    /**
     * Full constructor with configuration
     * @param embeddingModel Embedding model to use
     * @param examplesByLabel Map of labels to example texts
     * @param maxResults Maximum number of labels to return (default: 1)
     * @param minScore Minimum score threshold (default: 0.0)
     * @param meanToMaxScoreRatio Ratio for filtering results (default: 0.0)
     * @throws IllegalArgumentException if embeddingModel is null, examplesByLabel is empty,
     *         maxResults < 1, minScore < 0, or meanToMaxScoreRatio < 0
     */
    public EmbeddingModelTextClassifier(
        EmbeddingModel embeddingModel,
        Map<L, ? extends Collection<String>> examplesByLabel,
        int maxResults,
        double minScore,
        double meanToMaxScoreRatio
    );
}

/**
 * Represents the result of classification with scored labels
 */
public class ClassificationResult<L> {
    /**
     * Constructor
     * @param scoredLabels List of scored labels
     */
    public ClassificationResult(List<ScoredLabel<L>> scoredLabels);

    /**
     * Get scored labels
     * @return List of scored labels sorted by score descending
     */
    public List<ScoredLabel<L>> scoredLabels();
}

/**
 * Represents a classification label with associated score
 */
public class ScoredLabel<L> {
    /**
     * Constructor
     * @param label The label
     * @param score The score (0.0 to 1.0)
     */
    public ScoredLabel(L label, double score);

    /**
     * Get the label
     * @return The label
     */
    public L label();

    /**
     * Get the score
     * @return The score (0.0 to 1.0)
     */
    public double score();
}

Thread Safety: EmbeddingModelTextClassifier is thread-safe if the EmbeddingModel is thread-safe. Classification operations can be performed concurrently.

Common Pitfalls:

  • Too few examples per label (< 3) leads to poor accuracy
  • Examples not representative of actual data
  • Imbalanced examples (1 example for labelA, 100 for labelB) biases results
  • Labels with very similar examples cause confusion
  • maxResults=1 forces single-label even when multi-label is appropriate
  • minScore too high may return empty results
  • meanToMaxScoreRatio too high filters out valid secondary labels

Edge Cases:

  • Empty input text returns empty results or low-confidence labels
  • No labels meet minScore threshold - returns empty list
  • All labels have same score - arbitrary ordering
  • Label with single example vs multiple examples
  • Very short text (1-2 words) may not classify well

Performance Notes:

  • Example embeddings are computed once at construction - O(n) embedding calls up front
  • Each classification then embeds only the input text - a single embedding call
  • Batch classification of multiple texts is more efficient
  • Number of examples affects classification time
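The similarity-based classification described above can be sketched with hand-made 2-D vectors standing in for a real EmbeddingModel (`classify` aggregates by mean similarity, a simplified variant of the library's scoring):

```java
import java.util.List;
import java.util.Map;

public class ToyClassifierSketch {

    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Scores each label by the mean similarity between the input vector
    // and that label's example vectors, then returns the best label.
    static String classify(double[] input, Map<String, List<double[]>> examplesByLabel) {
        String best = null;
        double bestScore = -1;
        for (var entry : examplesByLabel.entrySet()) {
            double sum = 0;
            for (double[] example : entry.getValue()) {
                sum += cosine(input, example);
            }
            double mean = sum / entry.getValue().size();
            if (mean > bestScore) {
                bestScore = mean;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, List<double[]>> examples = Map.of(
                "positive", List.of(new double[]{1, 0.1}, new double[]{0.9, 0.2}),
                "negative", List.of(new double[]{0.1, 1}, new double[]{0.2, 0.9}));
        System.out.println(classify(new double[]{1, 0}, examples));
    }
}
```

This also makes the pitfalls concrete: too few or unrepresentative example vectors per label directly skew the mean each label is scored by.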

Cost Considerations:

  • Each classification requires 1 embedding API call
  • More examples per label increases initial embedding cost
  • Consider caching classifier instances to avoid re-embedding examples

Testing Patterns:

// Test with known examples
Map<String, List<String>> examples = Map.of(
    "positive", List.of("great", "excellent", "wonderful"),
    "negative", List.of("bad", "terrible", "awful")
);

TextClassifier<String> classifier = new EmbeddingModelTextClassifier<>(
    embeddingModel,
    examples
);

ClassificationResult<String> result = classifier.classifyWithScores("amazing product");
assertEquals("positive", result.scoredLabels().get(0).label());
assertTrue(result.scoredLabels().get(0).score() > 0.7);

Related APIs: Text Classification, Embedding Models


Chains (Deprecated)

Legacy chain API for sequential processing. Deprecated in favor of AiServices.

/**
 * Functional interface representing a chain step
 * Deprecated in favor of AiServices
 */
@Deprecated
public interface Chain<Input, Output> {
    /**
     * Execute the chain step
     * @param input Input to process
     * @return Output from processing
     */
    Output execute(Input input);
}

/**
 * A chain for conversing with a ChatModel while maintaining memory
 * Deprecated in favor of AiServices
 */
@Deprecated
public class ConversationalChain implements Chain<String, String> {
    /**
     * Create builder
     * @return Builder instance
     */
    public static ConversationalChainBuilder builder();

    /**
     * Execute chain with user message
     * @param userMessage User message
     * @return Response from chat model
     */
    public String execute(String userMessage);
}

/**
 * A chain for conversing with a ChatModel based on retrieved information
 * Supports RAG with RetrievalAugmentor
 * Deprecated in favor of AiServices
 */
@Deprecated
public class ConversationalRetrievalChain implements Chain<String, String> {
    /**
     * Create builder
     * @return Builder instance
     */
    public static Builder builder();

    /**
     * Execute chain with query
     * @param query Query to process
     * @return Response from chat model
     */
    public String execute(String query);
}

Migration Guide:

  • Use AiServices instead of ConversationalChain
  • Use AiServices with contentRetriever instead of ConversationalRetrievalChain
  • AiServices provides better type safety, streaming support, and tool integration

Related APIs: Chains, AI Services


Service Provider Interfaces

SPI interfaces for customization and framework integration.

/**
 * SPI factory interface for creating AI service contexts
 */
public interface AiServiceContextFactory {
    // Factory methods for creating AI service contexts
}

/**
 * SPI factory interface for creating AI services
 */
public interface AiServicesFactory {
    // Factory methods for creating AI services
}

/**
 * SPI adapter interface for token streams
 */
public interface TokenStreamAdapter {
    // Adapter methods for token streams
}

/**
 * SPI factory interface for creating guardrail service builders
 */
public interface GuardrailServiceBuilderFactory {
    // Factory methods for guardrail service builders
}

/**
 * SPI factory interface for creating JSON codecs for in-memory embedding store
 */
public interface InMemoryEmbeddingStoreJsonCodecFactory {
    // Factory methods for JSON codecs
}

Thread Safety: SPI implementations must be thread-safe as they're typically singletons.

Common Pitfalls:

  • Missing META-INF/services configuration file
  • Wrong service interface name in configuration
  • SPI implementation not on classpath
  • Multiple SPI implementations causing conflicts

Related APIs: Service Provider Interfaces


Comparison Tables

Memory Strategy Selection

| Scenario | Recommended Approach | Reason |
| --- | --- | --- |
| Single user, short conversations | MessageWindowChatMemory(10-20) | Simple and fast |
| Multiple users, short conversations | ChatMemoryProvider + MessageWindowChatMemory | User isolation |
| Long conversations near token limit | TokenWindowChatMemory | Precise token control |
| Need persistence across restarts | ChatMemoryStore integration | Data durability |
| Extremely high concurrency | Distributed cache + ChatMemoryStore | Scalability |
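The sliding-window idea behind MessageWindowChatMemory can be sketched in plain Java (`MessageWindowSketch` is a toy holding strings; the real implementation stores ChatMessage objects and also preserves the SystemMessage across evictions):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class MessageWindowSketch {

    // Keeps only the most recent maxMessages entries, evicting the oldest.
    private final Deque<String> messages = new ArrayDeque<>();
    private final int maxMessages;

    MessageWindowSketch(int maxMessages) {
        this.maxMessages = maxMessages;
    }

    void add(String message) {
        messages.addLast(message);
        while (messages.size() > maxMessages) {
            messages.removeFirst(); // evict oldest first
        }
    }

    List<String> messages() {
        return List.copyOf(messages);
    }

    public static void main(String[] args) {
        MessageWindowSketch memory = new MessageWindowSketch(3);
        for (String m : List.of("m1", "m2", "m3", "m4", "m5")) {
            memory.add(m);
        }
        System.out.println(memory.messages()); // oldest entries evicted
    }
}
```

TokenWindowChatMemory follows the same eviction shape but measures the window in tokens rather than message count, which is why it gives more precise control near the context limit.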

RAG Content Retriever Selection

| Data Source | Retriever Type | When to Use |
| --- | --- | --- |
| Document database | EmbeddingStoreContentRetriever | Semantic search on internal documents |
| Web content | WebSearchContentRetriever | Real-time information from web |
| SQL database | Custom ContentRetriever | Structured data queries |
| Multiple sources | QueryRouter + multiple retrievers | Hybrid search across systems |
| Graph database | Custom ContentRetriever | Relationship-based retrieval |

Output Parser Selection

| Return Type | Parser | Notes |
| --- | --- | --- |
| Simple types (String, int, boolean) | Automatic primitive parsers | No configuration needed |
| Enum | EnumOutputParser | Case-sensitive by default |
| Single POJO | PojoOutputParser | Requires public fields or setters |
| List of POJOs | PojoListOutputParser | May fail on malformed JSON |
| Date/Time | Date/LocalDate/LocalTime parsers | Format awareness needed |
| Complex nested objects | PojoOutputParser with nested classes | Validate schema carefully |

Common Workflows

Workflow 1: Building a Knowledge Base Chatbot

// Step 1: Load documents
List<Document> documents = FileSystemDocumentLoader
    .loadDocumentsRecursively(Paths.get("/path/to/docs"));

// Step 2: Split into chunks
DocumentSplitter splitter = DocumentSplitters.recursive(500, 50, tokenizer);
List<TextSegment> segments = new ArrayList<>();
for (Document doc : documents) {
    segments.addAll(splitter.split(doc));
}

// Step 3: Embed and store
EmbeddingStore<TextSegment> store = new InMemoryEmbeddingStore<>();
for (int i = 0; i < segments.size(); i++) {
    Embedding embedding = embeddingModel.embed(segments.get(i).text()).content();
    store.add(embedding, segments.get(i));
}

// Step 4: Create RAG retriever
ContentRetriever retriever = EmbeddingStoreContentRetriever.builder()
    .embeddingStore(store)
    .embeddingModel(embeddingModel)
    .maxResults(5)
    .minScore(0.7)
    .build();

// Step 5: Build AI service
interface KnowledgeBot {
    String chat(@MemoryId String userId, @UserMessage String question);
}

KnowledgeBot bot = AiServices.builder(KnowledgeBot.class)
    .chatModel(chatModel)
    .chatMemoryProvider(id -> MessageWindowChatMemory.withMaxMessages(10))
    .contentRetriever(retriever)
    .build();

// Step 6: Use the bot
String answer = bot.chat("user123", "How do I configure X?");

Workflow 2: Building an Agent with Tools

// Step 1: Define tools
class BusinessTools {
    @Tool("Get customer information by ID")
    String getCustomer(String customerId) {
        // Call customer service
        return "Customer: " + customerId;
    }

    @Tool("Get order status by order ID")
    String getOrderStatus(String orderId) {
        // Call order service
        return "Order status: Shipped";
    }

    @Tool("Create support ticket")
    String createTicket(String customerId, String issue) {
        // Create ticket in system
        return "Ticket created: #12345";
    }
}

// Step 2: Configure AI service with tools
interface SupportAgent {
    String chat(@MemoryId String sessionId, @UserMessage String message);
}

SupportAgent agent = AiServices.builder(SupportAgent.class)
    .chatModel(chatModel)
    .chatMemoryProvider(id -> MessageWindowChatMemory.withMaxMessages(20))
    .tools(new BusinessTools())
    .systemMessage("You are a customer support agent. Use the available tools to help customers.")
    .beforeToolExecution(ctx ->
        log.info("Executing tool: " + ctx.toolExecutionRequest().name()))
    .afterToolExecution(exec ->
        log.info("Tool result: " + exec.result()))
    .build();

// Step 3: Handle customer requests
String response = agent.chat("session456",
    "What's the status of my order #67890?");
// Agent will automatically call getOrderStatus tool

Workflow 3: Extracting Structured Data at Scale

// Step 1: Define data structure
record Product(
    String name,
    String category,
    double price,
    List<String> features
) {}

// Step 2: Create extractor
interface ProductExtractor {
    List<Product> extractProducts(String text);
}

ProductExtractor extractor = AiServices.create(ProductExtractor.class, chatModel);

// Step 3: Process documents in parallel
List<Document> documents = FileSystemDocumentLoader.loadDocuments(inputPath);

ExecutorService executor = Executors.newFixedThreadPool(4);
List<Future<List<Product>>> futures = documents.stream()
    .map(doc -> executor.submit(() -> extractor.extractProducts(doc.text())))
    .toList();

// Step 4: Collect results
List<Product> allProducts = new ArrayList<>();
for (Future<List<Product>> future : futures) {
    try {
        allProducts.addAll(future.get());
    } catch (Exception e) {
        log.error("Extraction failed", e);
    }
}

executor.shutdown();

// Step 5: Save results
saveToDatabase(allProducts);

Resource Management and Best Practices

Connection Pooling

// Reuse chat model instances - they typically maintain connection pools
ChatModel chatModel = OpenAiChatModel.builder()
    .apiKey(System.getenv("OPENAI_API_KEY"))
    .modelName("gpt-4")
    .build();

// Don't create new instance per request
// BAD: new OpenAiChatModel(...) in hot path
// GOOD: Singleton or application-scoped instance

Memory Management

// For long-running applications, periodically clean up memory
ChatMemoryProvider provider = new ChatMemoryProvider() {
    private final Map<Object, ChatMemory> memories = new ConcurrentHashMap<>();
    private final ScheduledExecutorService cleaner =
        Executors.newSingleThreadScheduledExecutor();

    {
        // Clean up stale memories every hour
        cleaner.scheduleAtFixedRate(() -> {
            long cutoff = System.currentTimeMillis() - TimeUnit.HOURS.toMillis(1);
            memories.entrySet().removeIf(entry ->
                ((TimestampedMemory) entry.getValue()).lastAccessTime() < cutoff
            );
        }, 1, 1, TimeUnit.HOURS);
    }

    @Override
    public ChatMemory get(Object memoryId) {
        return memories.computeIfAbsent(memoryId,
            id -> new TimestampedMemory(MessageWindowChatMemory.withMaxMessages(20))
        );
    }
};

Error Recovery Patterns

// Retry with exponential backoff for transient failures
interface ResilientAssistant {
    @SystemMessage("You are a helpful assistant")
    String chat(String message);
}

ChatModel resilientModel = new ChatModel() {
    private final ChatModel delegate = actualChatModel;
    private final int maxRetries = 3;

    @Override
    public ChatResponse chat(ChatRequest request) {
        int attempt = 0;
        while (attempt < maxRetries) {
            try {
                return delegate.chat(request);
            } catch (Exception e) {
                attempt++;
                if (attempt >= maxRetries) throw e;

                long delay = (long) Math.pow(2, attempt) * 1000;
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException(ie);
                }
            }
        }
        throw new IllegalStateException("Should not reach here");
    }

    // Implement other methods...
};

Performance Optimization Checklist

  • Reuse ChatModel instances (don't create per request)
  • Use streaming for long responses to improve perceived performance
  • Configure appropriate memory window sizes (balance context vs cost)
  • Enable concurrent tool execution for parallel tool calls
  • Cache embeddings for frequently accessed documents
  • Use batch embedding for multiple documents
  • Set reasonable maxOutputTokens to prevent runaway generation
  • Implement request timeouts to prevent hanging
  • Use CDN or local storage for large documents
  • Profile and optimize tool execution time
  • Consider async/non-blocking patterns for high throughput
  • Monitor token usage to optimize costs
  • Use appropriate chunk sizes for RAG (balance precision vs performance)

Troubleshooting Guide

Issue: AI Service returns null

Possible causes:

  • No ChatModel or StreamingChatModel configured
  • Method return type not supported
  • LLM returned empty response

Solutions:

// Ensure model is configured
Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(chatModel) // Must be set
    .build();

// Check return type is supported (String, POJOs, enums, primitives, etc.)

Issue: Tools not being called

Possible causes:

  • Tool description too vague
  • Tool parameters not documented
  • Model doesn't support tool calling
  • Tool name conflicts

Solutions:

// Make descriptions very specific
@Tool("Get the current weather forecast for a specific city. " +
      "Returns temperature, conditions, and humidity.")
String getWeather(
    @P("The city name, e.g., 'San Francisco' or 'New York'") String city
) {
    // ...
}

// Check model capabilities
if (chatModel.supportedCapabilities().contains(Capability.TOOLS)) {
    // Model supports tools
}

Issue: OutOfMemoryError with documents

Possible causes:

  • Loading too many documents at once
  • Document chunks too large
  • Embedding store too large for heap

Solutions:

// Process documents in batches
int batchSize = 100;
for (int i = 0; i < documents.size(); i += batchSize) {
    List<Document> batch = documents.subList(i,
        Math.min(i + batchSize, documents.size()));
    processDocuments(batch);
}

// Use persistent embedding store instead of in-memory
// Consider vector databases like Pinecone, Weaviate, etc.

Issue: RAG returns irrelevant results

Possible causes:

  • minScore too low
  • maxResults too high
  • Poor document chunking strategy
  • Embeddings not normalized

Solutions:

// Tune retrieval parameters
ContentRetriever retriever = EmbeddingStoreContentRetriever.builder()
    .embeddingStore(store)
    .embeddingModel(embeddingModel)
    .maxResults(3) // Start small
    .minScore(0.75) // Higher threshold
    .build();

// Use better chunking strategy
DocumentSplitter splitter = DocumentSplitters.recursive(
    300, // Smaller chunks
    50,  // More overlap
    tokenizer
);

See Also

  • Official LangChain4j Documentation
  • LangChain4j GitHub Repository
  • LangChain4j Examples
  • Community Discord

Install with Tessl CLI

npx tessl i tessl/maven-dev-langchain4j--langchain4j@1.11.0
