Core classes and interfaces of LangChain4j providing foundational abstractions for LLM interaction, RAG, embeddings, agents, and observability
Package: dev.langchain4j:langchain4j-core
Version: 1.11.0
Language: Java 8+
Thread-Safety: Most types are immutable and thread-safe unless documented otherwise
LangChain4j Core provides the foundational abstractions and interfaces for building LLM-powered applications in Java. It contains essential components for chat models, embeddings, RAG (Retrieval Augmented Generation), tools and agents, memory management, guardrails, and observability. This library serves as the foundation for the broader LangChain4j ecosystem, enabling developers to build sophisticated AI applications with a unified API across different LLM providers and vector stores.
Maven:
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-core</artifactId>
<version>1.11.0</version>
</dependency>

Gradle:

implementation 'dev.langchain4j:langchain4j-core:1.11.0'

Gradle (Kotlin DSL):

implementation("dev.langchain4j:langchain4j-core:1.11.0")

Essential imports for common use cases:
// ============================================================================
// CHAT MODELS - Conversational AI
// ============================================================================
import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.chat.StreamingChatModel;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.request.ChatRequestParameters;
import dev.langchain4j.model.chat.request.DefaultChatRequestParameters;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.model.chat.response.StreamingChatResponseHandler;
import dev.langchain4j.model.chat.response.StreamingHandle;
// ============================================================================
// MESSAGES - Conversation structure
// ============================================================================
import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.data.message.ChatMessageType;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.data.message.SystemMessage;
import dev.langchain4j.data.message.ToolExecutionResultMessage;
// ============================================================================
// CONTENT TYPES - Multimodal support
// ============================================================================
import dev.langchain4j.data.message.Content;
import dev.langchain4j.data.message.TextContent;
import dev.langchain4j.data.message.ImageContent;
import dev.langchain4j.data.message.AudioContent;
import dev.langchain4j.data.message.VideoContent;
import dev.langchain4j.data.message.PdfFileContent;
// ============================================================================
// EMBEDDINGS - Vector representations
// ============================================================================
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.embedding.DimensionAwareEmbeddingModel;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.EmbeddingSearchResult;
import dev.langchain4j.store.embedding.EmbeddingMatch;
import dev.langchain4j.store.embedding.filter.Filter;
// ============================================================================
// DOCUMENTS & SEGMENTS - Text processing for RAG
// ============================================================================
import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.data.document.Metadata;
import dev.langchain4j.data.document.DocumentSplitter;
// ============================================================================
// RAG - Retrieval Augmented Generation
// ============================================================================
import dev.langchain4j.rag.RetrievalAugmentor;
import dev.langchain4j.rag.DefaultRetrievalAugmentor;
import dev.langchain4j.rag.content.retriever.ContentRetriever;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
import dev.langchain4j.rag.content.Content;
import dev.langchain4j.rag.query.Query;
// ============================================================================
// TOOLS - Function calling
// ============================================================================
import dev.langchain4j.agent.tool.Tool;
import dev.langchain4j.agent.tool.P;
import dev.langchain4j.agent.tool.ToolMemoryId;
import dev.langchain4j.agent.tool.ToolSpecification;
import dev.langchain4j.agent.tool.ToolExecutionRequest;
import dev.langchain4j.agent.tool.ReturnBehavior;
// ============================================================================
// RESPONSES - Output handling
// ============================================================================
import dev.langchain4j.model.output.Response;
import dev.langchain4j.model.output.TokenUsage;
import dev.langchain4j.model.output.FinishReason;
// ============================================================================
// EXCEPTIONS - Error handling
// ============================================================================
import dev.langchain4j.exception.LangChain4jException;
import dev.langchain4j.exception.RetriableException;
import dev.langchain4j.exception.NonRetriableException;
import dev.langchain4j.exception.TimeoutException;
import dev.langchain4j.exception.RateLimitException;
import dev.langchain4j.exception.AuthenticationException;
import dev.langchain4j.exception.InvalidRequestException;
import dev.langchain4j.exception.ContentFilteredException;

Thread-Safety: ChatModel implementations are typically thread-safe
Error Handling: Catch common exceptions and distinguish retriable from non-retriable failures
Performance: Single request, synchronous
import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.model.output.TokenUsage;
import dev.langchain4j.exception.AuthenticationException;
import dev.langchain4j.exception.RateLimitException;
import dev.langchain4j.exception.LangChain4jException;
// Initialize chat model (from provider-specific module)
ChatModel chatModel = /* OpenAiChatModel, AnthropicChatModel, etc. */;
try {
// Simple string-based chat (most convenient)
String response = chatModel.chat("What is the capital of France?");
System.out.println("Answer: " + response);
// Message-based chat (more control)
ChatRequest request = ChatRequest.builder()
.messages(UserMessage.from("Explain quantum computing"))
.build();
ChatResponse chatResponse = chatModel.chat(request);
String aiResponse = chatResponse.aiMessage().text();
// Access metadata
TokenUsage tokenUsage = chatResponse.tokenUsage();
if (tokenUsage != null) {
System.out.println("Input tokens: " + tokenUsage.inputTokenCount());
System.out.println("Output tokens: " + tokenUsage.outputTokenCount());
}
} catch (AuthenticationException e) {
// Invalid API key - do not retry
System.err.println("Authentication failed: " + e.getMessage());
} catch (RateLimitException e) {
// Rate limit exceeded - retry with backoff
System.err.println("Rate limit exceeded, retry after delay");
} catch (LangChain4jException e) {
// Other errors
System.err.println("Error: " + e.getMessage());
}

Common Pitfalls:
- Not catching LangChain4jException and its subclasses
- TokenUsage may be null - some models don't provide token counts

Thread-Safety: EmbeddingModel implementations are typically thread-safe
Performance: Batch operations are significantly more efficient
Resource Management: No explicit cleanup needed
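The TokenUsage pitfall above is easy to hit when totaling costs across many requests. A minimal pure-Java sketch of null-safe accumulation - the Integer pairs here stand in for the library's TokenUsage.inputTokenCount()/outputTokenCount() accessors, which may return null:

```java
public class TokenAccounting {
    // Null-safe addition: treat a missing count as zero.
    static int safeAdd(int total, Integer count) {
        return total + (count == null ? 0 : count);
    }

    public static void main(String[] args) {
        // Simulated per-request counts; null means the provider reported no usage.
        Integer[][] usages = { {12, 30}, {null, null}, {5, 7} };
        int input = 0, output = 0;
        for (Integer[] u : usages) {
            input = safeAdd(input, u[0]);
            output = safeAdd(output, u[1]);
        }
        System.out.println(input + " input, " + output + " output"); // 17 input, 37 output
    }
}
```

The same guard applies to totalTokenCount and any per-response metadata.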
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.embedding.DimensionAwareEmbeddingModel;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.model.output.Response;
import java.util.List;
import java.util.ArrayList;
// Initialize embedding model (from provider-specific module)
EmbeddingModel embeddingModel = /* OpenAiEmbeddingModel, etc. */;
// Check dimensions if needed
int dimensions = 0;
if (embeddingModel instanceof DimensionAwareEmbeddingModel) {
dimensions = ((DimensionAwareEmbeddingModel) embeddingModel).dimension();
System.out.println("Embedding dimensions: " + dimensions);
}
// Create embeddings (PREFER BATCH for multiple items - much more efficient)
List<TextSegment> segments = new ArrayList<>();
segments.add(TextSegment.from("First document"));
segments.add(TextSegment.from("Second document"));
segments.add(TextSegment.from("Third document"));
// Batch embedding (RECOMMENDED for multiple items)
Response<List<Embedding>> response = embeddingModel.embedAll(segments);
List<Embedding> embeddings = response.content();
System.out.println("Generated " + embeddings.size() + " embeddings");
for (int i = 0; i < embeddings.size(); i++) {
Embedding emb = embeddings.get(i);
float[] vector = emb.vector();
System.out.println("Document " + i + ": " + vector.length + " dimensions");
}
// Single embedding (use only for one-off operations)
Response<Embedding> singleResponse = embeddingModel.embed("Single text");
Embedding singleEmbedding = singleResponse.content();

Performance Notes:
- Use embedAll() for multiple texts (10-100x faster than individual calls)

Common Pitfalls:
- Calling embed() in a loop instead of embedAll()

Thread-Safety: Depends on implementation - check provider documentation
Performance: Use filters to reduce search space
Persistence: In-memory stores lose data on restart
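Providers commonly cap how many segments a single embedAll() call may contain, so very large inputs need to be split first. A self-contained sketch of fixed-size partitioning - the limit of 96 is purely illustrative; check your provider's documented maximum:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchingExample {
    // Split a list into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < 250; i++) items.add(i);
        // 250 items with a hypothetical per-call limit of 96 -> 3 batches (96, 96, 58)
        List<List<Integer>> batches = partition(items, 96);
        System.out.println(batches.size() + " batches, last has " + batches.get(2).size());
    }
}
```

Each batch can then be passed to embedAll() and the resulting embeddings concatenated in order.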
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.EmbeddingSearchResult;
import dev.langchain4j.store.embedding.EmbeddingMatch;
import dev.langchain4j.store.embedding.filter.Filter;
import dev.langchain4j.store.embedding.filter.comparison.IsEqualTo;
import dev.langchain4j.data.document.Metadata;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
// Initialize embedding store (from provider-specific module)
// Examples: InMemoryEmbeddingStore, PineconeEmbeddingStore, ChromaEmbeddingStore
EmbeddingStore<TextSegment> embeddingStore = /* ... */;
// Add embeddings with metadata (for filtering)
List<String> ids = new ArrayList<>();
for (int i = 0; i < segments.size(); i++) {
TextSegment segment = segments.get(i);
Embedding embedding = embeddings.get(i);
// Option 1: Store returns generated ID
String id = embeddingStore.add(embedding, segment);
ids.add(id);
// Option 2: Provide your own ID
// embeddingStore.add("custom-id-" + i, embedding);
}
// Perform semantic search
String query = "machine learning concepts";
Embedding queryEmbedding = embeddingModel.embed(query).content();
// Basic search
EmbeddingSearchRequest searchRequest = EmbeddingSearchRequest.builder()
.queryEmbedding(queryEmbedding)
.maxResults(5) // Top-k results
.minScore(0.7) // Similarity threshold (0.0-1.0)
.build();
EmbeddingSearchResult<TextSegment> searchResult = embeddingStore.search(searchRequest);
System.out.println("Found " + searchResult.matches().size() + " matches:");
for (EmbeddingMatch<TextSegment> match : searchResult.matches()) {
System.out.println("Score: " + match.score()); // Similarity score
System.out.println("Text: " + match.embedded().text()); // Original text
System.out.println("ID: " + match.embeddingId()); // Document ID
System.out.println("Metadata: " + match.embedded().metadata()); // Document metadata
System.out.println("---");
}
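For intuition about what the search() call above does: conceptually, a naive store scores every stored vector against the query with cosine similarity and keeps the top-k results above minScore. A simplified pure-Java sketch - not the actual store implementation, which may use approximate indexes:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BruteForceSearch {
    // Cosine similarity between two equal-length vectors.
    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Indices of the top-maxResults stored vectors with similarity >= minScore,
    // most similar first (mirrors maxResults/minScore in EmbeddingSearchRequest).
    static List<Integer> search(List<float[]> stored, float[] query, int maxResults, double minScore) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < stored.size(); i++) ids.add(i);
        ids.sort((x, y) -> Double.compare(cosine(stored.get(y), query), cosine(stored.get(x), query)));
        List<Integer> result = new ArrayList<>();
        for (int id : ids) {
            if (result.size() == maxResults) break;
            if (cosine(stored.get(id), query) >= minScore) result.add(id);
        }
        return result;
    }

    public static void main(String[] args) {
        List<float[]> stored = Arrays.asList(
                new float[]{1, 0}, new float[]{0, 1}, new float[]{1, 1});
        // Query close to the first vector; the second falls below minScore.
        System.out.println(search(stored, new float[]{1, 0.1f}, 2, 0.5)); // prints [0, 2]
    }
}
```

This also makes the cost model clear: brute-force search is linear in the number of stored vectors, which is why metadata filters and approximate indexes matter at scale.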
// Search with metadata filtering (faster, more relevant)
// Metadata is attached to segments when they are stored, for example:
Metadata docMetadata = Metadata.from(Map.of(
"category", "technical",
"language", "en"
));
TextSegment taggedSegment = TextSegment.from("Some technical text", docMetadata);
// Filter on that metadata at search time:
Filter filter = new IsEqualTo("category", "technical");
EmbeddingSearchRequest filteredRequest = EmbeddingSearchRequest.builder()
.queryEmbedding(queryEmbedding)
.maxResults(5)
.minScore(0.7)
.filter(filter) // Apply metadata filter
.build();
searchResult = embeddingStore.search(filteredRequest);

Performance Notes:
- Tune maxResults and minScore to balance precision/recall
- A low minScore (<0.5) may return irrelevant results
- A large maxResults (>100) may impact performance

Common Pitfalls:
- Omitting minScore - may return very dissimilar results

Thread-Safety: Tool instances should be stateless or thread-safe
Performance: Tool execution is synchronous by default
Error Handling: Throw ToolExecutionException for execution errors
import dev.langchain4j.agent.tool.Tool;
import dev.langchain4j.agent.tool.P;
import dev.langchain4j.agent.tool.ToolMemoryId;
import dev.langchain4j.agent.tool.ReturnBehavior;
import dev.langchain4j.exception.ToolExecutionException;
/**
* Tool class must be instantiable and contain @Tool methods.
* IMPORTANT: Keep tools stateless or ensure thread-safety.
*/
public class WeatherTools {
private final WeatherService weatherService = new WeatherService(); // hypothetical external weather client
/**
* Get current weather for a location.
*
* @param city City name (required, non-empty)
* @param country ISO 3166-1 alpha-2 country code (required, e.g., "US", "FR")
* @return Weather description string
* @throws ToolExecutionException if weather service fails
*/
@Tool("Get current weather for a specific location")
public String getCurrentWeather(
@P("The city name, e.g., 'Paris', 'New York'") String city,
@P("The ISO country code, e.g., 'FR', 'US'") String country
) {
// Input validation (ALWAYS validate tool inputs)
if (city == null || city.trim().isEmpty()) {
throw new ToolExecutionException("City name cannot be empty");
}
if (country == null || !country.matches("[A-Z]{2}")) {
throw new ToolExecutionException("Country must be 2-letter ISO code");
}
try {
// Call external service
WeatherData data = weatherService.getWeather(city, country);
return String.format("Temperature: %d°C, Condition: %s",
data.temperature(), data.condition());
} catch (ServiceException e) {
// Throw ToolExecutionException for LLM to handle
throw new ToolExecutionException("Weather service unavailable: " + e.getMessage(), e);
}
}
/**
* Tool with custom name and immediate return behavior.
* Result goes directly to user, not back to LLM.
*/
@Tool(
name = "emergency_shutdown",
value = {"Immediately shut down the system"},
returnBehavior = ReturnBehavior.IMMEDIATE
)
public String shutdownSystem(@ToolMemoryId String userId) {
// IMMEDIATE behavior: result goes to user, not LLM
logAction(userId, "emergency_shutdown");
performShutdown();
return "System is shutting down...";
}
/**
* Tool with TO_LLM behavior (default).
* Result goes back to LLM for processing.
*/
@Tool("Calculate compound interest for an investment")
public double calculateCompoundInterest(
@P("Principal amount in dollars") double principal,
@P("Annual interest rate as decimal (e.g., 0.05 for 5%)") double rate,
@P("Number of years") int years
) {
if (principal <= 0 || rate < 0 || years <= 0) {
throw new ToolExecutionException("Invalid input parameters");
}
// TO_LLM behavior: LLM formats result for user
return principal * Math.pow(1 + rate, years);
}
}

Best Practices:
- Describe parameters with @P annotations (helps LLM choose correct parameters)
- Throw ToolExecutionException with clear messages (LLM can communicate to user)
- Use @ToolMemoryId for user context in multi-user scenarios
- Keep @P descriptions specific and concrete

Common Pitfalls:
- Swallowing errors instead of throwing ToolExecutionException

LangChain4j Core is built around several key architectural components:
- Model interfaces (ChatModel, EmbeddingModel, etc.) - See: Chat Models | Embedding Models | Language Models
- Data types - See: Messages | Documents | Embeddings
- Retrieval - See: RAG System
- Vector search - See: Embeddings and Vector Search
- Tools (@Tool) - See: Tools and Agents
- Guardrails - See: Guardrails
- Observability - See: Observability
- Memory - See: Chat Memory
Primary Interface: ChatModel (synchronous), StreamingChatModel (streaming)
interface ChatModel {
ChatResponse chat(ChatRequest request); // Full control
ChatResponse chat(List<ChatMessage> messages); // Message history
String chat(String userMessage); // Simple text
}
interface StreamingChatModel {
StreamingHandle chat(ChatRequest request, StreamingChatResponseHandler handler);
StreamingHandle chat(List<ChatMessage> messages, StreamingChatResponseHandler handler);
}
Thread-Safety: Implementation-dependent, usually thread-safe See: Chat Models
Primary Interface: EmbeddingModel, DimensionAwareEmbeddingModel
interface EmbeddingModel {
Response<Embedding> embed(String text); // Single text
Response<Embedding> embed(TextSegment textSegment); // With metadata
Response<List<Embedding>> embedAll(List<TextSegment> textSegments); // Batch
}
Performance: Always prefer embedAll() for multiple items
Thread-Safety: Implementation-dependent, usually thread-safe
See: Embedding Models
Primary Interface: LanguageModel (text completion without chat structure)
interface LanguageModel {
Response<String> generate(String prompt);
}
interface StreamingLanguageModel {
StreamingHandle generate(String prompt, StreamingResponseHandler<String> handler);
}

Use Cases: Simple text completion, no conversation context needed
See: Language Models
interface ImageModel {
Response<Image> generate(String prompt);
Response<Image> edit(Image image, String prompt);
}
interface AudioTranscriptionModel {
Response<String> transcribe(Audio audio);
}
interface ModerationModel {
Response<Moderation> moderate(String text);
Response<Moderation> moderate(List<ChatMessage> messages);
}
interface ScoringModel {
Response<Double> score(String text, String query);
Response<List<Double>> scoreAll(List<String> texts, String query);
}

See: Other Model Types
LangChain4j provides a comprehensive exception hierarchy for proper error handling:
class LangChain4jException extends RuntimeException { } // Base exception
// Retriable errors (transient failures - can retry)
class RetriableException extends LangChain4jException { }
class TimeoutException extends RetriableException { } // Retry with backoff
class RateLimitException extends RetriableException { } // Retry after delay
class InternalServerException extends RetriableException { } // Retry with backoff
// Non-retriable errors (permanent failures - do not retry)
class NonRetriableException extends LangChain4jException { }
class AuthenticationException extends NonRetriableException { } // Fix credentials
class InvalidRequestException extends NonRetriableException { } // Fix request
class ContentFilteredException extends NonRetriableException { } // Change content
class ModelNotFoundException extends NonRetriableException { } // Use valid model

import dev.langchain4j.exception.*;
public class RobustChatClient {
private final ChatModel chatModel;
private final int maxRetries = 3;
public String chatWithRetry(String message) {
int attempt = 0;
long backoff = 1000; // Start with 1 second
while (attempt < maxRetries) {
try {
return chatModel.chat(message);
} catch (TimeoutException | InternalServerException e) {
// Retriable - retry with exponential backoff
attempt++;
if (attempt < maxRetries) {
sleep(backoff);
backoff *= 2;
}
} catch (RateLimitException e) {
// Rate limit - retry after longer delay
attempt++;
if (attempt < maxRetries) {
sleep(60000); // Wait 1 minute
}
} catch (AuthenticationException e) {
// Non-retriable - fail fast
throw new RuntimeException("Invalid API credentials", e);
} catch (InvalidRequestException e) {
// Non-retriable - fix request
throw new RuntimeException("Malformed request", e);
} catch (ContentFilteredException e) {
// Non-retriable - content policy violation
return "I cannot generate that content due to safety policies.";
} catch (ModelNotFoundException e) {
// Non-retriable - configuration error
throw new RuntimeException("Model not available", e);
}
}
throw new RuntimeException("Max retries exceeded");
}
private void sleep(long ms) {
try {
Thread.sleep(ms);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
throw new RuntimeException(e);
}
}
}

See: Exception Hierarchy
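The retry example above uses deterministic exponential backoff; many clients additionally randomize the delay ("jitter") so that concurrent callers hitting the same rate limit don't retry in lockstep. A self-contained sketch of full-jitter backoff - the base delay and cap are illustrative choices, not library defaults:

```java
import java.util.concurrent.ThreadLocalRandom;

public class Backoff {
    // Full jitter: a random delay in [0, base * 2^attempt], capped at maxDelayMs.
    static long delayMs(int attempt, long baseMs, long maxDelayMs) {
        long exp = Math.min(maxDelayMs, baseMs * (1L << Math.min(attempt, 30)));
        return ThreadLocalRandom.current().nextLong(exp + 1);
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt < 5; attempt++) {
            // Window doubles each attempt (1s, 2s, 4s, ...) up to the 30s cap.
            System.out.println("attempt " + attempt + ": sleeping "
                    + delayMs(attempt, 1000, 30_000) + " ms");
        }
    }
}
```

A chosen delay would replace the fixed sleep(backoff) in the retry loop; the cap keeps worst-case waits bounded.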
| Component | Thread-Safety | Notes |
|---|---|---|
| ChatModel | Usually safe | Check provider docs |
| EmbeddingModel | Usually safe | Check provider docs |
| EmbeddingStore | Implementation-specific | InMemory uses ConcurrentHashMap |
| ChatMemory | Not safe | Synchronize externally |
| Message Types | Immutable | Always thread-safe |
| Tool Instances | Make stateless | Or synchronize access |
// Models are typically thread-safe
ExecutorService executor = Executors.newFixedThreadPool(10);
for (String query : queries) {
executor.submit(() -> {
try {
String response = chatModel.chat(query); // Safe if model is thread-safe
processResponse(response);
} catch (Exception e) {
handleError(e);
}
});
}
executor.shutdown();
executor.awaitTermination(1, TimeUnit.HOURS);

// ❌ BAD: Sharing mutable state in tools without synchronization
public class StatefulTool {
private int callCount = 0; // NOT thread-safe
@Tool("Count calls")
public int countCalls() {
return callCount++; // Race condition!
}
}
// ✅ GOOD: Use AtomicInteger or synchronization
public class ThreadSafeTool {
private final AtomicInteger callCount = new AtomicInteger(0);
@Tool("Count calls")
public int countCalls() {
return callCount.incrementAndGet(); // Thread-safe
}
}

// ❌ BAD: Individual calls in loop (very slow)
for (String text : texts) {
Embedding emb = embeddingModel.embed(text).content();
// Process embedding
}
// ✅ GOOD: Batch operation (10-100x faster)
List<TextSegment> segments = texts.stream()
.map(TextSegment::from)
.collect(Collectors.toList());
Response<List<Embedding>> response = embeddingModel.embedAll(segments);
List<Embedding> embeddings = response.content();

Most provider implementations use connection pooling internally. For custom implementations:
// Configure HTTP client with connection pooling
OkHttpClient httpClient = new OkHttpClient.Builder()
.connectionPool(new ConnectionPool(20, 5, TimeUnit.MINUTES))
.build();

// For long responses, use streaming to start processing sooner
StreamingChatModel streamingModel = /* ... */;
streamingModel.chat(request, new StreamingChatResponseHandler() {
@Override
public void onPartialResponse(PartialResponse response) {
// Process tokens as they arrive (lower latency)
processToken(response.partialText());
}
@Override
public void onCompleteResponse(ChatResponse response) {
// Finalize processing
}
@Override
public void onError(Throwable error) {
// Handle streaming failures
}
});

// ✅ GOOD: Use filters to reduce search space
Filter filter = new IsEqualTo("category", "technical"); // dev.langchain4j.store.embedding.filter.comparison.IsEqualTo
EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
.queryEmbedding(queryEmbedding)
.maxResults(10)
.filter(filter) // Much faster than scanning all vectors
.build();

| Pitfall | Solution |
|---|---|
| Not handling null TokenUsage | Always check if (tokenUsage != null) |
| Using individual embed() in loop | Use embedAll() for batch operations |
| Not validating tool inputs | LLMs can hallucinate - always validate |
| Ignoring exception hierarchy | Use retriable vs non-retriable classification |
| Mixing embeddings from different models | Embeddings are model-specific |
| Not normalizing embeddings | Normalize for cosine similarity |
| Using in-memory stores in production | Use persistent stores (Redis, Postgres, etc.) |
| Not setting minScore in search | May return irrelevant results |
| Storing state in tool instances | Keep tools stateless |
| Not handling empty search results | Always check matches().isEmpty() |
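Two of the pitfalls above - mixing embeddings from different models and skipping normalization - come down to vector hygiene. This pure-Java sketch shows L2 normalization; once vectors are normalized, cosine similarity reduces to a plain dot product:

```java
public class NormalizeExample {
    // L2-normalize a vector in place; afterwards, dot product == cosine similarity.
    static void normalize(float[] v) {
        double norm = 0;
        for (float x : v) norm += x * x;
        norm = Math.sqrt(norm);
        if (norm == 0) return; // leave zero vectors untouched
        for (int i = 0; i < v.length; i++) v[i] /= norm;
    }

    static double dot(float[] a, float[] b) {
        double d = 0;
        for (int i = 0; i < a.length; i++) d += a[i] * b[i];
        return d;
    }

    public static void main(String[] args) {
        float[] a = {3, 4}; // norm 5 -> (0.6, 0.8)
        float[] b = {4, 3}; // norm 5 -> (0.8, 0.6)
        normalize(a);
        normalize(b);
        System.out.println(dot(a, b)); // cosine similarity, ≈ 0.96
    }
}
```

Note that normalizing only helps when both vectors come from the same embedding model; vectors from different models live in unrelated spaces and their similarity scores are meaningless.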
Install with Tessl CLI
npx tessl i tessl/maven-dev-langchain4j--langchain4j-core