Java integration library enabling LangChain4j applications to use Ollama's local language models with support for chat, streaming, embeddings, and advanced reasoning features
The langchain4j-ollama module provides a comprehensive integration with Ollama's local LLM API, organized around several key architectural patterns and components.
The module implements standard LangChain4j interfaces for different model types:
- OllamaChatModel for synchronous conversational AI
- OllamaStreamingChatModel for real-time streaming responses
- OllamaLanguageModel for stateless text completion
- OllamaStreamingLanguageModel for streaming text generation
- OllamaEmbeddingModel for vector embeddings

This interface-based design allows seamless integration with the broader LangChain4j ecosystem while providing Ollama-specific capabilities.
Thread Safety: All implemented models are immutable and thread-safe after construction.
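Because every model implements a generic LangChain4j interface, client code can depend on the interface alone. A minimal embedding sketch, assuming a local Ollama server and an already pulled embedding model (the URL and model name are illustrative):

```java
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.ollama.OllamaEmbeddingModel;
import dev.langchain4j.model.output.Response;

public class EmbeddingSketch {
    public static void main(String[] args) {
        // Build against the generic interface; only construction is Ollama-specific
        EmbeddingModel embedder = OllamaEmbeddingModel.builder()
                .baseUrl("http://localhost:11434")  // local Ollama server
                .modelName("nomic-embed-text")      // any pulled embedding model
                .build();

        Response<Embedding> response = embedder.embed("Hello, Ollama!");
        float[] vector = response.content().vector();
        System.out.println("Dimensions: " + vector.length);
    }
}
```

Swapping in a different provider's EmbeddingModel requires changing only the builder call.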
All model classes use the Builder pattern with fluent configuration:
```java
OllamaChatModel model = OllamaChatModel.builder()
    .baseUrl("http://localhost:11434")
    .modelName("llama2")
    .temperature(0.7)
    .build();
```

Builders provide:
Thread Safety: Builders are not thread-safe; each thread must use its own builder instance.
Validation: Invalid configurations throw IllegalArgumentException or IllegalStateException at build time.
OllamaBaseChatModel serves as the abstract base class for both OllamaChatModel and OllamaStreamingChatModel, providing:
Inheritance Hierarchy:
```java
abstract class OllamaBaseChatModel {
    // Shared configuration and functionality
}

class OllamaChatModel extends OllamaBaseChatModel implements ChatModel {
    // Synchronous implementation with retry logic
}

class OllamaStreamingChatModel extends OllamaBaseChatModel implements StreamingChatModel {
    // Streaming implementation without retry
}
```

Thread Safety: The abstract base class is stateless; subclasses are immutable and thread-safe.
The module uses LangChain4j's HttpClient interface for communication, configured through HttpClientBuilder.

HTTP Configuration:
```java
builder()
    .httpClientBuilder(customHttpClientBuilder) // Custom client
    .timeout(Duration.ofMinutes(5))             // Request timeout
    .customHeaders(headerSupplier)              // Dynamic headers
    .maxRetries(3)                              // Retry attempts (non-streaming)
```

Thread Safety: The HTTP client implementation is thread-safe with connection pooling.
Error Handling:
- HttpTimeoutException - Timeout exceeded
- IOException - Network connectivity issues

Streaming models use Server-Sent Events (SSE) for real-time token delivery:

Client → HTTP Request → Ollama API → SSE Stream → Parser → Handler Callbacks

Components:
This enables responsive UX for chat applications without blocking.
Threading: SSE parsing occurs on the HTTP client thread; handler callbacks execute on that same thread
Error Handling: Errors during streaming trigger StreamingResponseHandler.onError()
No Retry: Streaming operations do not retry on failure
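The callback flow above can be sketched as follows. This is a minimal example assuming LangChain4j's StreamingChatResponseHandler callback names (onPartialResponse, onCompleteResponse, onError); adjust to the API version in use:

```java
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.model.chat.response.StreamingChatResponseHandler;
import dev.langchain4j.model.ollama.OllamaStreamingChatModel;

public class StreamingSketch {
    public static void main(String[] args) {
        OllamaStreamingChatModel model = OllamaStreamingChatModel.builder()
                .baseUrl("http://localhost:11434")
                .modelName("llama2")
                .build();

        model.chat("Why is the sky blue?", new StreamingChatResponseHandler() {
            @Override
            public void onPartialResponse(String token) {
                System.out.print(token); // tokens arrive on the HTTP client thread
            }

            @Override
            public void onCompleteResponse(ChatResponse response) {
                System.out.println("\nDone: " + response.tokenUsage());
            }

            @Override
            public void onError(Throwable error) {
                // No retry for streaming; handle the failure here
                error.printStackTrace();
            }
        });
    }
}
```

Because callbacks run on the HTTP client thread, long-running work inside a callback delays subsequent tokens; hand off to another thread if needed.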
Request parameters follow a layered architecture:
Parameters flow through: Model Defaults → Request Defaults → Per-Request Overrides
Example:
```java
// 1. Model defaults
OllamaChatModel model = OllamaChatModel.builder()
    .temperature(0.7) // Default for all requests
    .numCtx(2048)     // Default context
    .build();

// 2. Per-request override
OllamaChatRequestParameters params = OllamaChatRequestParameters.builder()
    .temperature(0.9) // Overrides model default
    .build();

ChatRequest request = ChatRequest.builder()
    .parameters(params) // Apply override
    .build();
```

Immutability: All parameter objects are immutable; overrides create new instances
Nullability: Null parameter values mean "use default from previous layer"
OllamaModels provides administrative operations, with model metadata exposed through OllamaModelCard. This enables dynamic model discovery and management without hardcoding model names.
API Operations:
```java
OllamaModels ollamaModels = OllamaModels.builder().build();

// Available models
Response<List<OllamaModel>> models = ollamaModels.availableModels();

// Model details
Response<OllamaModelCard> card = ollamaModels.modelCard("llama2");

// Running models
Response<List<RunningOllamaModel>> running = ollamaModels.runningModels();

// Delete model
ollamaModels.deleteModel("old-model"); // No return value; throws on error
```

Thread Safety: Immutable and thread-safe; safe for concurrent operations
Error Handling:
- RuntimeException - Ollama server errors (model not found, etc.)
- IOException - Network issues

Supporting types provide rich model metadata:
Mutability: Type objects are mutable (have setters); use defensive copying if shared
Nullability: All fields can be null; check before accessing
Factory interfaces enable dependency injection and framework integration:
```
// Factory pattern
OllamaChatModelBuilderFactory → provides → OllamaChatModel.Builder
OllamaEmbeddingModelBuilderFactory → provides → OllamaEmbeddingModel.Builder
// ... (5 factory interfaces total)
```

This follows the Java ServiceLoader pattern for extensibility.
Usage:
```java
ServiceLoader<OllamaChatModelBuilderFactory> loader =
    ServiceLoader.load(OllamaChatModelBuilderFactory.class);
OllamaChatModelBuilderFactory factory = loader.findFirst()
    .orElseThrow(() -> new IllegalStateException("No factory found"));
OllamaChatModel model = factory.get()
    .modelName("llama2")
    .build();
```

Thread Safety: Factory instances should be stateless and thread-safe
Fluent API for configuring models with sensible defaults and extensive customization options.
Benefits: Type safety, immutability, fluent configuration
Different sampling strategies (standard, mirostat) configured via parameters.
Implementation: Parameter objects define strategy; model executes
OllamaBaseChatModel defines common structure with doChat() specializations.
Benefits: Code reuse, consistent behavior, specialized implementations
ChatModelListener interfaces for request/response monitoring.
Benefits: Decoupled observability, extensible monitoring
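A sketch of attaching such a listener at construction time. The context accessor names follow LangChain4j's listener API and may differ between versions; the logging behavior is illustrative:

```java
import java.util.List;
import dev.langchain4j.model.chat.listener.ChatModelErrorContext;
import dev.langchain4j.model.chat.listener.ChatModelListener;
import dev.langchain4j.model.chat.listener.ChatModelRequestContext;
import dev.langchain4j.model.chat.listener.ChatModelResponseContext;
import dev.langchain4j.model.ollama.OllamaChatModel;

public class ListenerSketch {
    public static void main(String[] args) {
        ChatModelListener logging = new ChatModelListener() {
            @Override
            public void onRequest(ChatModelRequestContext ctx) {
                System.out.println("Request: " + ctx.chatRequest().messages());
            }

            @Override
            public void onResponse(ChatModelResponseContext ctx) {
                System.out.println("Tokens: " + ctx.chatResponse().tokenUsage());
            }

            @Override
            public void onError(ChatModelErrorContext ctx) {
                System.err.println("Failed: " + ctx.error().getMessage());
            }
        };

        OllamaChatModel model = OllamaChatModel.builder()
                .baseUrl("http://localhost:11434")
                .modelName("llama2")
                .listeners(List.of(logging)) // decoupled observability
                .build();
    }
}
```

The model invokes the listener around each request, so monitoring code stays out of the call sites.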
SPI factory interfaces for custom model instantiation.
Benefits: Dependency injection, framework integration, testability
The module integrates with LangChain4j through:
- Model interfaces (ChatModel, LanguageModel, EmbeddingModel)
- Core data types (ChatRequest, ChatResponse, Embedding, TextSegment)
- Infrastructure types (HttpClient, Response<T>, TokenUsage)

Compatibility: Fully compatible with the LangChain4j ecosystem; models are drop-in replacements
Communication with Ollama follows its REST API:
Endpoints:
- POST /api/chat - Chat completions (streaming and non-streaming)
- POST /api/generate - Text generation (streaming and non-streaming)
- POST /api/embeddings - Vector embeddings
- GET /api/tags - List models
- POST /api/show - Model information
- DELETE /api/delete - Remove models
- GET /api/ps - Running models

Protocol: HTTP/1.1 with JSON request/response bodies and SSE for streaming
Error Responses: HTTP error codes mapped to exceptions
Support for reasoning models like DeepSeek R1:
- AiMessage.thinking() - returns the reasoning text
- onPartialThinking() callback - streams reasoning tokens (streaming models)

API:
```java
OllamaChatModel model = OllamaChatModel.builder()
    .think(true)          // Enable thinking
    .returnThinking(true) // Return thinking text
    .build();

ChatResponse response = model.chat(request);
String thinking = response.aiMessage().thinking(); // May be null
```

Nullability: thinking() returns null if thinking is not enabled or not available
Advanced perplexity control for consistent output quality:
Valid Values:
- mirostat: 0 (disabled), 1 (Mirostat), 2 (Mirostat 2.0)
- mirostatEta: > 0.0 (typically 0.01-1.0)
- mirostatTau: > 0.0 (typically 1.0-10.0)

Integration with LangChain4j's tool system:
Support: Requires an Ollama model with tool-calling capability
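A sketch of tool calling through the standard ChatRequest API. The tool name, schema, and model name here are illustrative, and the exact builder methods may vary by LangChain4j version:

```java
import dev.langchain4j.agent.tool.ToolSpecification;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.request.json.JsonObjectSchema;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.model.ollama.OllamaChatModel;

public class ToolSketch {
    public static void main(String[] args) {
        // Describe a tool the model may choose to call
        ToolSpecification weather = ToolSpecification.builder()
                .name("get_weather")
                .description("Returns current weather for a city")
                .parameters(JsonObjectSchema.builder()
                        .addStringProperty("city")
                        .required("city")
                        .build())
                .build();

        OllamaChatModel model = OllamaChatModel.builder()
                .baseUrl("http://localhost:11434")
                .modelName("llama3.1") // a model with tool support
                .build();

        ChatRequest request = ChatRequest.builder()
                .messages(UserMessage.from("What's the weather in Paris?"))
                .toolSpecifications(weather)
                .build();

        ChatResponse response = model.chat(request);
        if (response.aiMessage().hasToolExecutionRequests()) {
            response.aiMessage().toolExecutionRequests()
                    .forEach(t -> System.out.println(t.name() + " " + t.arguments()));
        }
    }
}
```

The caller executes any requested tool itself and sends the result back as a ToolExecutionResultMessage in a follow-up request.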
Guarantees:
Exceptions at build():
- IllegalArgumentException - Invalid parameter values
- IllegalStateException - Required parameters missing (e.g., modelName)

Example:
```java
try {
    OllamaChatModel model = OllamaChatModel.builder()
        .build(); // Missing modelName
} catch (IllegalStateException e) {
    // Handle missing required parameter
}
```

Exceptions during requests:
- HttpTimeoutException - Request timeout exceeded
- IOException - Network connectivity issues
- RuntimeException - Ollama server errors (model not found, invalid request)

Retry Logic:

- Non-streaming models retry failed requests up to the configured maxRetries
- Streaming models do not retry; errors are delivered via the onError() callback

Example:
```java
import java.io.IOException;
import java.net.http.HttpTimeoutException;

try {
    ChatResponse response = model.chat(request);
} catch (HttpTimeoutException e) {
    // Timeout - request took too long
} catch (IOException e) {
    // Network error - server unreachable
} catch (RuntimeException e) {
    // Server error - model not found, etc.
}
```

- keepAlive controls how long a model stays loaded in memory between requests
- embedAll() sends multiple texts in a single request (more efficient than individual embed() calls)
- A larger numCtx uses more memory and is slower; choose the minimum required size

Optimization Tips:
```java
// Keep models loaded longer for frequent use
.keepAlive(600) // 10 minutes

// Batch embeddings
embedModel.embedAll(segments); // Single request

// Smaller context for faster inference
.numCtx(2048) // Instead of 4096 if sufficient
```

Provide a custom HttpClientBuilder for special networking needs (proxies, authentication, etc.).

Example:
```java
// Custom header supplier for token refresh
Supplier<Map<String, String>> headers = () ->
    Map.of("Authorization", "Bearer " + tokenManager.getToken());

OllamaChatModel model = OllamaChatModel.builder()
    .customHeaders(headers)
    .build();
```

This architecture enables both simple usage for basic cases and extensive customization for advanced scenarios.
Install with Tessl CLI
```shell
npx tessl i tessl/maven-dev-langchain4j--langchain4j-ollama@1.11.0
```