LangChain4j Core

Package: dev.langchain4j:langchain4j-core | Version: 1.11.0 | Language: Java 8+ | Thread-Safety: Most types are immutable and thread-safe unless documented otherwise

LangChain4j Core provides the foundational abstractions and interfaces for building LLM-powered applications in Java. It contains essential components for chat models, embeddings, RAG (Retrieval Augmented Generation), tools and agents, memory management, guardrails, and observability. This library serves as the foundation for the broader LangChain4j ecosystem, enabling developers to build sophisticated AI applications with a unified API across different LLM providers and vector stores.

Package Information

  • Package Name: langchain4j-core
  • Package Type: Maven
  • Group ID: dev.langchain4j
  • Artifact ID: langchain4j-core
  • Language: Java
  • Minimum Java Version: 8
  • Installation:

Maven:

<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-core</artifactId>
    <version>1.11.0</version>
</dependency>

Gradle:

implementation 'dev.langchain4j:langchain4j-core:1.11.0'

Gradle (Kotlin DSL):

implementation("dev.langchain4j:langchain4j-core:1.11.0")

Core Imports

Essential imports for common use cases:

// ============================================================================
// CHAT MODELS - Conversational AI
// ============================================================================
import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.chat.StreamingChatModel;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.request.ChatRequestParameters;
import dev.langchain4j.model.chat.request.DefaultChatRequestParameters;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.model.chat.response.StreamingChatResponseHandler;
import dev.langchain4j.model.chat.response.StreamingHandle;

// ============================================================================
// MESSAGES - Conversation structure
// ============================================================================
import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.data.message.ChatMessageType;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.data.message.SystemMessage;
import dev.langchain4j.data.message.ToolExecutionResultMessage;

// ============================================================================
// CONTENT TYPES - Multimodal support
// ============================================================================
import dev.langchain4j.data.message.Content;
import dev.langchain4j.data.message.TextContent;
import dev.langchain4j.data.message.ImageContent;
import dev.langchain4j.data.message.AudioContent;
import dev.langchain4j.data.message.VideoContent;
import dev.langchain4j.data.message.PdfFileContent;

// ============================================================================
// EMBEDDINGS - Vector representations
// ============================================================================
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.embedding.DimensionAwareEmbeddingModel;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.EmbeddingSearchResult;
import dev.langchain4j.store.embedding.EmbeddingMatch;
import dev.langchain4j.store.embedding.filter.Filter;

// ============================================================================
// DOCUMENTS & SEGMENTS - Text processing for RAG
// ============================================================================
import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.data.document.Metadata;
import dev.langchain4j.data.document.DocumentSplitter;

// ============================================================================
// RAG - Retrieval Augmented Generation
// ============================================================================
import dev.langchain4j.rag.RetrievalAugmentor;
import dev.langchain4j.rag.DefaultRetrievalAugmentor;
import dev.langchain4j.rag.content.retriever.ContentRetriever;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
import dev.langchain4j.rag.content.Content;
import dev.langchain4j.rag.query.Query;

// ============================================================================
// TOOLS - Function calling
// ============================================================================
import dev.langchain4j.agent.tool.Tool;
import dev.langchain4j.agent.tool.P;
import dev.langchain4j.agent.tool.ToolMemoryId;
import dev.langchain4j.agent.tool.ToolSpecification;
import dev.langchain4j.agent.tool.ToolExecutionRequest;
import dev.langchain4j.agent.tool.ReturnBehavior;

// ============================================================================
// RESPONSES - Output handling
// ============================================================================
import dev.langchain4j.model.output.Response;
import dev.langchain4j.model.output.TokenUsage;
import dev.langchain4j.model.output.FinishReason;

// ============================================================================
// EXCEPTIONS - Error handling
// ============================================================================
import dev.langchain4j.exception.LangChain4jException;
import dev.langchain4j.exception.RetriableException;
import dev.langchain4j.exception.NonRetriableException;
import dev.langchain4j.exception.TimeoutException;
import dev.langchain4j.exception.RateLimitException;
import dev.langchain4j.exception.AuthenticationException;
import dev.langchain4j.exception.InvalidRequestException;
import dev.langchain4j.exception.ContentFilteredException;

Quick Start Examples

1. Simple Chat Interaction

  • Thread-Safety: ChatModel implementations are typically thread-safe
  • Error Handling: Catches common exceptions
  • Performance: Single request, synchronous

import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.model.output.TokenUsage;
import dev.langchain4j.exception.LangChain4jException;
import dev.langchain4j.exception.AuthenticationException;
import dev.langchain4j.exception.RateLimitException;

// Initialize chat model (from provider-specific module)
ChatModel chatModel = /* OpenAiChatModel, AnthropicChatModel, etc. */;

try {
    // Simple string-based chat (most convenient)
    String response = chatModel.chat("What is the capital of France?");
    System.out.println("Answer: " + response);

    // Message-based chat (more control)
    ChatRequest request = ChatRequest.builder()
        .messages(UserMessage.from("Explain quantum computing"))
        .build();

    ChatResponse chatResponse = chatModel.chat(request);
    String aiResponse = chatResponse.aiMessage().text();

    // Access metadata
    TokenUsage tokenUsage = chatResponse.tokenUsage();
    if (tokenUsage != null) {
        System.out.println("Input tokens: " + tokenUsage.inputTokenCount());
        System.out.println("Output tokens: " + tokenUsage.outputTokenCount());
    }

} catch (AuthenticationException e) {
    // Invalid API key - do not retry
    System.err.println("Authentication failed: " + e.getMessage());
} catch (RateLimitException e) {
    // Rate limit exceeded - retry with backoff
    System.err.println("Rate limit exceeded, retry after delay");
} catch (LangChain4jException e) {
    // Other errors
    System.err.println("Error: " + e.getMessage());
}

Common Pitfalls:

  • ❌ Not handling exceptions - Always catch at least LangChain4jException
  • ❌ Ignoring null TokenUsage - Some models don't provide token counts
  • ❌ Reusing same instance across threads without checking thread-safety

2. Working with Embeddings

  • Thread-Safety: EmbeddingModel implementations are typically thread-safe
  • Performance: Batch operations are significantly more efficient
  • Resource Management: No explicit cleanup needed

import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.embedding.DimensionAwareEmbeddingModel;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.model.output.Response;
import java.util.List;
import java.util.ArrayList;

// Initialize embedding model (from provider-specific module)
EmbeddingModel embeddingModel = /* OpenAiEmbeddingModel, etc. */;

// Check dimensions if needed
int dimensions = 0;
if (embeddingModel instanceof DimensionAwareEmbeddingModel) {
    dimensions = ((DimensionAwareEmbeddingModel) embeddingModel).dimension();
    System.out.println("Embedding dimensions: " + dimensions);
}

// Create embeddings (PREFER BATCH for multiple items - much more efficient)
List<TextSegment> segments = new ArrayList<>();
segments.add(TextSegment.from("First document"));
segments.add(TextSegment.from("Second document"));
segments.add(TextSegment.from("Third document"));

// Batch embedding (RECOMMENDED for multiple items)
Response<List<Embedding>> response = embeddingModel.embedAll(segments);
List<Embedding> embeddings = response.content();

System.out.println("Generated " + embeddings.size() + " embeddings");
for (int i = 0; i < embeddings.size(); i++) {
    Embedding emb = embeddings.get(i);
    float[] vector = emb.vector();
    System.out.println("Document " + i + ": " + vector.length + " dimensions");
}

// Single embedding (use only for one-off operations)
Response<Embedding> singleResponse = embeddingModel.embed("Single text");
Embedding singleEmbedding = singleResponse.content();

Performance Notes:

  • ✅ Always use embedAll() for multiple texts (10-100x faster than individual calls)
  • ✅ Batch sizes: Most models handle 100-1000 items efficiently
  • ⚠️ Large batches may hit rate limits - implement retry with exponential backoff

Common Pitfalls:

  • ❌ Using individual embed() calls in loop instead of embedAll()
  • ❌ Not normalizing vectors when computing cosine similarity
  • ❌ Mixing embeddings from different models (dimensions/semantic spaces differ)
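The normalization pitfall is easiest to see in code. Below is a pure-Java sketch (our own helper, not a langchain4j API; LangChain4j's Embedding exposes its values as a float[] via vector()):

```java
// Illustrative helper: cosine similarity over raw float vectors.
// Cosine similarity is magnitude-invariant because both vectors are
// divided by their norms - this is what "normalization" buys you.
public class CosineDemo {

    // Dot product of two equal-length vectors.
    static double dot(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Euclidean norm (magnitude) of a vector.
    static double norm(float[] v) {
        return Math.sqrt(dot(v, v));
    }

    // Cosine similarity: dot product divided by both magnitudes.
    static double cosine(float[] a, float[] b) {
        return dot(a, b) / (norm(a) * norm(b));
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f};
        float[] b = {2f, 4f, 6f};  // same direction, twice the magnitude
        System.out.println(cosine(a, b));  // ~1.0 despite different magnitudes
    }
}
```

A raw dot product without the division by norms would score the longer vector higher, which is exactly the bug behind unnormalized similarity comparisons.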

3. Embedding Store and Semantic Search

  • Thread-Safety: Depends on implementation - check provider documentation
  • Performance: Use filters to reduce search space
  • Persistence: In-memory stores lose data on restart

import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.EmbeddingSearchResult;
import dev.langchain4j.store.embedding.EmbeddingMatch;
import dev.langchain4j.store.embedding.filter.Filter;
import dev.langchain4j.store.embedding.filter.MetadataFilterBuilder;
import dev.langchain4j.data.document.Metadata;
import java.util.Map;

// Initialize embedding store (from provider-specific module)
// Examples: InMemoryEmbeddingStore, PineconeEmbeddingStore, ChromaEmbeddingStore
EmbeddingStore<TextSegment> embeddingStore = /* ... */;

// Add embeddings with metadata (for filtering)
List<String> ids = new ArrayList<>();
for (int i = 0; i < segments.size(); i++) {
    TextSegment segment = segments.get(i);
    Embedding embedding = embeddings.get(i);

    // Option 1: Store returns generated ID
    String id = embeddingStore.add(embedding, segment);
    ids.add(id);

    // Option 2: Provide your own ID
    // embeddingStore.add("custom-id-" + i, embedding);
}

// Perform semantic search
String query = "machine learning concepts";
Embedding queryEmbedding = embeddingModel.embed(query).content();

// Basic search
EmbeddingSearchRequest searchRequest = EmbeddingSearchRequest.builder()
    .queryEmbedding(queryEmbedding)
    .maxResults(5)                      // Top-k results
    .minScore(0.7)                      // Similarity threshold (0.0-1.0)
    .build();

EmbeddingSearchResult<TextSegment> searchResult = embeddingStore.search(searchRequest);

System.out.println("Found " + searchResult.matches().size() + " matches:");
for (EmbeddingMatch<TextSegment> match : searchResult.matches()) {
    System.out.println("Score: " + match.score());                    // Similarity score
    System.out.println("Text: " + match.embedded().text());           // Original text
    System.out.println("ID: " + match.embeddingId());                 // Document ID
    System.out.println("Metadata: " + match.embedded().metadata());   // Document metadata
    System.out.println("---");
}

// Search with metadata filtering (faster, more relevant)
// Metadata is attached when segments are created, e.g.:
// TextSegment.from("text", Metadata.from(Map.of("category", "technical", "language", "en")))

Filter filter = MetadataFilterBuilder.metadataKey("category").isEqualTo("technical");

EmbeddingSearchRequest filteredRequest = EmbeddingSearchRequest.builder()
    .queryEmbedding(queryEmbedding)
    .maxResults(5)
    .minScore(0.7)
    .filter(filter)                     // Apply metadata filter
    .build();

searchResult = embeddingStore.search(filteredRequest);

Performance Notes:

  • ✅ Use metadata filters to reduce search space
  • ✅ Adjust maxResults and minScore to balance precision/recall
  • ⚠️ Very low minScore (<0.5) may return irrelevant results
  • ⚠️ Large maxResults (>100) may impact performance

Common Pitfalls:

  • ❌ Not setting minScore - may return very dissimilar results
  • ❌ Storing embeddings without metadata - limits filtering capabilities
  • ❌ Using in-memory store for production (data loss on restart)
  • ❌ Not handling empty search results

4. Tool Definition for Function Calling

  • Thread-Safety: Tool instances should be stateless or thread-safe
  • Performance: Tool execution is synchronous by default
  • Error Handling: Throw ToolExecutionException for execution errors

import dev.langchain4j.agent.tool.Tool;
import dev.langchain4j.agent.tool.P;
import dev.langchain4j.agent.tool.ToolMemoryId;
import dev.langchain4j.agent.tool.ReturnBehavior;
import dev.langchain4j.exception.ToolExecutionException;

/**
 * Tool class must be instantiable and contain @Tool methods.
 * IMPORTANT: Keep tools stateless or ensure thread-safety.
 */
public class WeatherTools {

    // NOTE: weatherService, WeatherData, ServiceException, logAction(), and
    // performShutdown() used below are placeholders for your own code.

    /**
     * Get current weather for a location.
     *
     * @param city City name (required, non-empty)
     * @param country ISO 3166-1 alpha-2 country code (required, e.g., "US", "FR")
     * @return Weather description string
     * @throws ToolExecutionException if weather service fails
     */
    @Tool("Get current weather for a specific location")
    public String getCurrentWeather(
        @P("The city name, e.g., 'Paris', 'New York'") String city,
        @P("The ISO country code, e.g., 'FR', 'US'") String country
    ) {
        // Input validation (ALWAYS validate tool inputs)
        if (city == null || city.trim().isEmpty()) {
            throw new ToolExecutionException("City name cannot be empty");
        }
        if (country == null || !country.matches("[A-Z]{2}")) {
            throw new ToolExecutionException("Country must be 2-letter ISO code");
        }

        try {
            // Call external service
            WeatherData data = weatherService.getWeather(city, country);
            return String.format("Temperature: %d°C, Condition: %s",
                data.temperature(), data.condition());

        } catch (ServiceException e) {
            // Throw ToolExecutionException for LLM to handle
            throw new ToolExecutionException("Weather service unavailable: " + e.getMessage(), e);
        }
    }

    /**
     * Tool with custom name and immediate return behavior.
     * Result goes directly to user, not back to LLM.
     */
    @Tool(
        name = "emergency_shutdown",
        value = {"Immediately shut down the system"},
        returnBehavior = ReturnBehavior.IMMEDIATE
    )
    public String shutdownSystem(@ToolMemoryId String userId) {
        // IMMEDIATE behavior: result goes to user, not LLM
        logAction(userId, "emergency_shutdown");
        performShutdown();
        return "System is shutting down...";
    }

    /**
     * Tool with TO_LLM behavior (default).
     * Result goes back to LLM for processing.
     */
    @Tool("Calculate compound interest for an investment")
    public double calculateCompoundInterest(
        @P("Principal amount in dollars") double principal,
        @P("Annual interest rate as decimal (e.g., 0.05 for 5%)") double rate,
        @P("Number of years") int years
    ) {
        if (principal <= 0 || rate < 0 || years <= 0) {
            throw new ToolExecutionException("Invalid input parameters");
        }

        // TO_LLM behavior: LLM formats result for user
        return principal * Math.pow(1 + rate, years);
    }
}

Best Practices:

  • ✅ Always validate tool inputs (LLMs can provide invalid data)
  • ✅ Use descriptive @P annotations (helps LLM choose correct parameters)
  • ✅ Throw ToolExecutionException with clear messages (LLM can communicate to user)
  • ✅ Keep tool methods stateless (enables thread-safety)
  • ✅ Use @ToolMemoryId for user context in multi-user scenarios
  • ✅ Document parameter formats in @P descriptions

Common Pitfalls:

  • ❌ Not validating inputs - LLMs can hallucinate invalid values
  • ❌ Swallowing exceptions - always propagate as ToolExecutionException
  • ❌ Storing state in tool instances without synchronization
  • ❌ Using vague descriptions - be specific about parameter formats
  • ❌ Not handling null/empty strings

Architecture Overview

LangChain4j Core is built around several key architectural components:

Model Abstractions

  • Unified Interfaces: Common interfaces across providers (ChatModel, EmbeddingModel, etc.)
  • Synchronous & Streaming: Both blocking and non-blocking APIs
  • Provider Agnostic: Switch providers without code changes
  • Observable: Built-in listener support for monitoring

See: Chat Models | Embedding Models | Language Models

Data Structures

  • Message Types: System, User, AI, Tool results
  • Multimodal Content: Text, images, audio, video, PDFs
  • Documents: Rich metadata, segmentation
  • Embeddings: Vector representations with normalization
  • Immutability: Most data structures are immutable and thread-safe

See: Messages | Documents | Embeddings

RAG Framework

  • Modular Pipeline: Query routing, transformation, retrieval, aggregation, injection
  • Multiple Retrievers: Vector stores, web search, hybrid retrieval
  • Content Processing: Filtering, re-ranking, deduplication
  • Extensible: Custom components via interfaces

See: RAG System

Vector Store Interface

  • Generic Abstraction: Works with any vector database
  • Metadata Filtering: Reduce search space with filters
  • Similarity Search: Cosine similarity with configurable thresholds
  • Batch Operations: Efficient bulk operations

See: Embeddings and Vector Search

Tool System

  • Annotation-Based: Declarative tool definitions with @Tool
  • Automatic Schemas: JSON schemas generated from annotations
  • Parameter Descriptions: Help LLM understand tool usage
  • Flexible Returns: Results to LLM or directly to user
  • Error Handling: Structured exception propagation

See: Tools and Agents

Guardrails

  • Input Validation: Pre-process and validate user inputs
  • Output Filtering: Post-process LLM responses
  • Content Transformation: Modify messages in pipeline
  • Safety Checks: Content moderation, PII redaction
  • Composable: Chain multiple guardrails

See: Guardrails

Observability

  • Event-Based: Listener pattern for monitoring
  • Lifecycle Events: Request, response, error, completion
  • Tool Execution: Track tool calls and results
  • Guardrail Events: Monitor validation steps
  • Metrics & Logging: Integrate with monitoring systems

See: Observability

Memory Management

  • Conversation History: Persistent chat memory
  • Windowing Strategies: Message limits, token limits
  • Storage Backends: In-memory, Redis, PostgreSQL, etc.
  • Multi-User Support: Isolated memories per user/session

See: Chat Memory
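The windowing idea can be sketched in plain Java. This is a simplified stand-in, not the actual langchain4j ChatMemory implementation (real implementations store ChatMessage objects, not strings):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch of a message-count windowing strategy: keep at most N messages,
// evicting the oldest first. Token-limit windows work the same way but
// evict until the total token count fits the budget.
public class WindowedMemory {

    private final int maxMessages;
    private final Deque<String> messages = new ArrayDeque<>();

    public WindowedMemory(int maxMessages) {
        this.maxMessages = maxMessages;
    }

    // Add a message, dropping the oldest once the window is full.
    public synchronized void add(String message) {
        messages.addLast(message);
        while (messages.size() > maxMessages) {
            messages.removeFirst();
        }
    }

    // Snapshot of the current window, oldest first.
    public synchronized List<String> messages() {
        return new ArrayList<>(messages);
    }
}
```

The methods are synchronized because, as noted in the thread-safety guidance below, chat memory generally needs external synchronization.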

Capabilities Matrix

Chat Models

Primary Interface: ChatModel (synchronous), StreamingChatModel (streaming)

interface ChatModel {
    ChatResponse chat(ChatRequest request);           // Full control
    ChatResponse chat(List<ChatMessage> messages);    // Message history
    String chat(String userMessage);                  // Simple text
}

interface StreamingChatModel {
    StreamingHandle chat(ChatRequest request, StreamingChatResponseHandler handler);
    StreamingHandle chat(List<ChatMessage> messages, StreamingChatResponseHandler handler);
}

Capabilities:

  • ✅ Multimodal inputs (text, images, audio, video, PDFs)
  • ✅ Tool calling / function calling
  • ✅ Structured outputs with JSON schemas
  • ✅ Streaming responses
  • ✅ Conversation history
  • ✅ Token usage tracking
  • ✅ Configurable parameters (temperature, top-p, etc.)
  • ✅ Observable with listeners

Thread-Safety: Implementation-dependent, usually thread-safe

See: Chat Models

Embedding Models

Primary Interface: EmbeddingModel, DimensionAwareEmbeddingModel

interface EmbeddingModel {
    Response<Embedding> embed(String text);                         // Single text
    Response<Embedding> embed(TextSegment textSegment);            // With metadata
    Response<List<Embedding>> embedAll(List<TextSegment> textSegments);  // Batch
}

Capabilities:

  • ✅ Single and batch embedding
  • ✅ Dimension awareness
  • ✅ Token usage tracking
  • ✅ Observable with listeners

Performance: Always prefer embedAll() for multiple items
Thread-Safety: Implementation-dependent, usually thread-safe

See: Embedding Models

Language Models

Primary Interface: LanguageModel (text completion without chat structure)

interface LanguageModel {
    Response<String> generate(String prompt);
}

interface StreamingLanguageModel {
    StreamingHandle generate(String prompt, StreamingResponseHandler<String> handler);
}

Use Cases: Simple text completion, no conversation context needed

See: Language Models

Other Model Types

interface ImageModel {
    Response<Image> generate(String prompt);
    Response<Image> edit(Image image, String prompt);
}

interface AudioTranscriptionModel {
    Response<String> transcribe(Audio audio);
}

interface ModerationModel {
    Response<Moderation> moderate(String text);
    Response<Moderation> moderate(List<ChatMessage> messages);
}

interface ScoringModel {
    Response<Double> score(String text, String query);
    Response<List<Double>> scoreAll(List<String> texts, String query);
}

See: Other Model Types

Exception Handling Guide

LangChain4j provides a comprehensive exception hierarchy for proper error handling:

Exception Classification

class LangChain4jException extends RuntimeException { }           // Base exception

// Retriable errors (transient failures - can retry)
class RetriableException extends LangChain4jException { }
    class TimeoutException extends RetriableException { }          // Retry with backoff
    class RateLimitException extends RetriableException { }        // Retry after delay
    class InternalServerException extends RetriableException { }   // Retry with backoff

// Non-retriable errors (permanent failures - do not retry)
class NonRetriableException extends LangChain4jException { }
    class AuthenticationException extends NonRetriableException { }  // Fix credentials
    class InvalidRequestException extends NonRetriableException { }  // Fix request
    class ContentFilteredException extends NonRetriableException { } // Change content
    class ModelNotFoundException extends NonRetriableException { }   // Use valid model
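Because retriability is encoded in the type hierarchy, retry logic only needs a single instanceof check to cover every transient failure. A self-contained sketch using simplified stand-ins for the real classes in dev.langchain4j.exception:

```java
// Stand-in classes mirroring the hierarchy above (simplified: the real
// exceptions also carry messages and causes).
class LangChain4jException extends RuntimeException {}
class RetriableException extends LangChain4jException {}
class RateLimitException extends RetriableException {}
class NonRetriableException extends LangChain4jException {}
class AuthenticationException extends NonRetriableException {}

public class RetryClassifier {

    // One check covers TimeoutException, RateLimitException,
    // InternalServerException, and any future retriable subtype.
    static boolean isRetriable(LangChain4jException e) {
        return e instanceof RetriableException;
    }

    public static void main(String[] args) {
        System.out.println(isRetriable(new RateLimitException()));      // true
        System.out.println(isRetriable(new AuthenticationException())); // false
    }
}
```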

Recommended Error Handling Pattern

import dev.langchain4j.exception.*;

public class RobustChatClient {
    private final ChatModel chatModel;
    private final int maxRetries = 3;

    public String chatWithRetry(String message) {
        int attempt = 0;
        long backoff = 1000; // Start with 1 second

        while (attempt < maxRetries) {
            try {
                return chatModel.chat(message);

            } catch (TimeoutException | InternalServerException e) {
                // Retriable - retry with exponential backoff
                attempt++;
                if (attempt < maxRetries) {
                    sleep(backoff);
                    backoff *= 2;
                }

            } catch (RateLimitException e) {
                // Rate limit - retry after longer delay
                attempt++;
                if (attempt < maxRetries) {
                    sleep(60000); // Wait 1 minute
                }

            } catch (AuthenticationException e) {
                // Non-retriable - fail fast
                throw new RuntimeException("Invalid API credentials", e);

            } catch (InvalidRequestException e) {
                // Non-retriable - fix request
                throw new RuntimeException("Malformed request", e);

            } catch (ContentFilteredException e) {
                // Non-retriable - content policy violation
                return "I cannot generate that content due to safety policies.";

            } catch (ModelNotFoundException e) {
                // Non-retriable - configuration error
                throw new RuntimeException("Model not available", e);
            }
        }

        throw new RuntimeException("Max retries exceeded");
    }

    private void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }
}
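One refinement to the fixed doubling above: when many clients hit a rate limit at the same moment, they all retry in lockstep. Adding jitter spreads the retries out. A small helper sketch (the names are ours, not a library API):

```java
import java.util.concurrent.ThreadLocalRandom;

// Exponential backoff with a cap and "full jitter": the actual delay is
// drawn uniformly from [0, min(cap, base * 2^attempt)].
public class Backoff {

    // Deterministic ceiling for a given attempt (attempt starts at 0).
    static long cappedDelay(long baseMs, int attempt, long capMs) {
        long delay = baseMs << Math.min(attempt, 30);  // clamp shift to avoid overflow
        return Math.min(delay, capMs);
    }

    // Full jitter: pick uniformly between 0 and the capped delay.
    static long jitteredDelay(long baseMs, int attempt, long capMs) {
        long max = cappedDelay(baseMs, attempt, capMs);
        return ThreadLocalRandom.current().nextLong(max + 1);
    }
}
```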

See: Exception Hierarchy

Thread-Safety & Concurrency

General Guidelines

| Component | Thread-Safety | Notes |
| --- | --- | --- |
| ChatModel | Usually safe | Check provider docs |
| EmbeddingModel | Usually safe | Check provider docs |
| EmbeddingStore | Implementation-specific | InMemory uses ConcurrentHashMap |
| ChatMemory | Not safe | Synchronize externally |
| Message Types | Immutable | Always thread-safe |
| Tool Instances | Make stateless | Or synchronize access |

Safe Concurrent Usage

// Models are typically thread-safe
ExecutorService executor = Executors.newFixedThreadPool(10);

for (String query : queries) {
    executor.submit(() -> {
        try {
            String response = chatModel.chat(query);  // Safe if model is thread-safe
            processResponse(response);
        } catch (Exception e) {
            handleError(e);
        }
    });
}

executor.shutdown();
executor.awaitTermination(1, TimeUnit.HOURS);  // throws InterruptedException - handle or declare it

Unsafe Patterns to Avoid

// ❌ BAD: Sharing mutable state in tools without synchronization
public class StatefulTool {
    private int callCount = 0;  // NOT thread-safe

    @Tool("Count calls")
    public int countCalls() {
        return callCount++;  // Race condition!
    }
}

// ✅ GOOD: Use AtomicInteger or synchronization
public class ThreadSafeTool {
    private final AtomicInteger callCount = new AtomicInteger(0);

    @Tool("Count calls")
    public int countCalls() {
        return callCount.incrementAndGet();  // Thread-safe
    }
}

Performance Best Practices

1. Batch Operations

// ❌ BAD: Individual calls in loop (very slow)
for (String text : texts) {
    Embedding emb = embeddingModel.embed(text).content();
    // Process embedding
}

// ✅ GOOD: Batch operation (10-100x faster)
List<TextSegment> segments = texts.stream()
    .map(TextSegment::from)
    .collect(Collectors.toList());
Response<List<Embedding>> response = embeddingModel.embedAll(segments);
List<Embedding> embeddings = response.content();

2. Connection Pooling

Most provider implementations use connection pooling internally. For custom implementations:

// Configure HTTP client with connection pooling
OkHttpClient httpClient = new OkHttpClient.Builder()
    .connectionPool(new ConnectionPool(20, 5, TimeUnit.MINUTES))
    .build();

3. Async/Streaming for Large Responses

// For long responses, use streaming to start processing sooner
StreamingChatModel streamingModel = /* ... */;

streamingModel.chat(request, new StreamingChatResponseHandler() {
    @Override
    public void onPartialResponse(PartialResponse response) {
        // Process tokens as they arrive (lower latency)
        processToken(response.partialText());
    }

    @Override
    public void onCompleteResponse(ChatResponse response) {
        // Finalize processing
    }

    @Override
    public void onError(Throwable error) {
        // Streaming can fail mid-response - always handle errors
        handleError(error);
    }
});

4. Metadata Filtering

// ✅ GOOD: Use filters to reduce search space
Filter filter = MetadataFilterBuilder.metadataKey("category").isEqualTo("technical");
EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
    .queryEmbedding(queryEmbedding)
    .maxResults(10)
    .filter(filter)  // Much faster than scanning all vectors
    .build();

Common Pitfalls & Solutions

| Pitfall | Solution |
| --- | --- |
| Not handling null TokenUsage | Always check if (tokenUsage != null) |
| Using individual embed() in loop | Use embedAll() for batch operations |
| Not validating tool inputs | LLMs can hallucinate - always validate |
| Ignoring exception hierarchy | Use retriable vs non-retriable classification |
| Mixing embeddings from different models | Embeddings are model-specific |
| Not normalizing embeddings | Normalize for cosine similarity |
| Using in-memory stores in production | Use persistent stores (Redis, Postgres, etc.) |
| Not setting minScore in search | May return irrelevant results |
| Storing state in tool instances | Keep tools stateless |
| Not handling empty search results | Always check matches().isEmpty() |

Next Steps

  • Core Concepts
  • Data Structures
  • Advanced Features
  • Type Reference

Additional Resources

  • GitHub: langchain4j/langchain4j
  • Documentation: docs.langchain4j.dev
  • Examples: langchain4j-examples
  • Community: Discord

Install with Tessl CLI

npx tessl i tessl/maven-dev-langchain4j--langchain4j-core