Java integration library enabling LangChain4j applications to use Ollama's local language models with support for chat, streaming, embeddings, and advanced reasoning features
Langchain4j-Ollama provides Java integration for Ollama, enabling local LLM interactions through a complete set of model interfaces including chat, language, streaming, and embedding models.
Maven Coordinates:

```xml
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-ollama</artifactId>
    <version>${langchain4j.version}</version>
</dependency>
```

Package: dev.langchain4j.model.ollama
Java Version: Java 8+ (source and target compatibility level 8)
Base URL: http://localhost:11434 (default Ollama server endpoint)
```java
// Main model classes
import dev.langchain4j.model.ollama.OllamaChatModel;
import dev.langchain4j.model.ollama.OllamaStreamingChatModel;
import dev.langchain4j.model.ollama.OllamaLanguageModel;
import dev.langchain4j.model.ollama.OllamaStreamingLanguageModel;
import dev.langchain4j.model.ollama.OllamaEmbeddingModel;

// Model management
import dev.langchain4j.model.ollama.OllamaModels;

// Request parameters
import dev.langchain4j.model.ollama.OllamaChatRequestParameters;

// Supporting types
import dev.langchain4j.model.ollama.OllamaModel;
import dev.langchain4j.model.ollama.OllamaModelCard;
import dev.langchain4j.model.ollama.OllamaModelDetails;
import dev.langchain4j.model.ollama.RunningOllamaModel;

// LangChain4j core types
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.model.chat.response.StreamingChatResponseHandler;
import dev.langchain4j.model.output.Response;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
```

```java
import dev.langchain4j.model.ollama.OllamaChatModel;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.data.message.UserMessage;

// Create chat model
OllamaChatModel model = OllamaChatModel.builder()
        .baseUrl("http://localhost:11434")
        .modelName("llama2")
        .temperature(0.7)
        .build();

// Send message
ChatRequest request = ChatRequest.builder()
        .messages(UserMessage.from("What is the capital of France?"))
        .build();
ChatResponse response = model.doChat(request);
System.out.println(response.aiMessage().text());
```

```java
import dev.langchain4j.model.ollama.OllamaStreamingChatModel;
import dev.langchain4j.model.chat.response.StreamingChatResponseHandler;

OllamaStreamingChatModel streamingModel = OllamaStreamingChatModel.builder()
        .baseUrl("http://localhost:11434")
        .modelName("llama2")
        .build();

streamingModel.doChat(request, new StreamingChatResponseHandler() {
    @Override
    public void onPartialResponse(String partialResponse) {
        System.out.print(partialResponse);
    }

    @Override
    public void onCompleteResponse(ChatResponse response) {
        System.out.println("\nDone!");
    }

    @Override
    public void onError(Throwable error) {
        error.printStackTrace();
    }
});
```

```java
import dev.langchain4j.model.ollama.OllamaEmbeddingModel;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import java.util.List;

OllamaEmbeddingModel embeddingModel = OllamaEmbeddingModel.builder()
        .baseUrl("http://localhost:11434")
        .modelName("nomic-embed-text")
        .build();

List<TextSegment> segments = List.of(
        TextSegment.from("Hello world"),
        TextSegment.from("Goodbye world")
);
Response<List<Embedding>> embeddings = embeddingModel.embedAll(segments);
```

The langchain4j-ollama module is built around several key architectural patterns that enable flexible integration with Ollama's local LLM capabilities.
Key architectural components:

- Implementations of LangChain4j core interfaces (ChatModel, LanguageModel, EmbeddingModel) for ecosystem compatibility
- OllamaBaseChatModel base class provides common functionality for chat models

Learn more: Architecture Documentation
Synchronous and streaming chat interactions with full conversation context support.
```java
// Synchronous chat
OllamaChatModel chatModel = OllamaChatModel.builder()
        .modelName("llama2")
        .temperature(0.8)
        .maxRetries(3)
        .build();

ChatResponse response = chatModel.doChat(request);
```

Thread Safety: Immutable after build(); safe for concurrent requests
Learn more: Chat Models Documentation
Simple text completion for prompts without conversation context.
```java
// Synchronous completion
OllamaLanguageModel languageModel = OllamaLanguageModel.builder()
        .modelName("llama2")
        .numPredict(100)
        .build();

Response<String> completion = languageModel.generate("Once upon a time");
```

Thread Safety: Immutable after build(); safe for concurrent requests
Learn more: Language Models Documentation
Generate vector embeddings for text segments.
```java
OllamaEmbeddingModel embeddingModel = OllamaEmbeddingModel.builder()
        .modelName("nomic-embed-text")
        .build();

Response<List<Embedding>> embeddings = embeddingModel.embedAll(textSegments);
String modelName = embeddingModel.modelName();
```

Returns: Deterministic embeddings (same input → same output for same model)
Thread Safety: Immutable after build(); safe for concurrent requests
Learn more: Embedding Model Documentation
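Embeddings are typically compared via cosine similarity (langchain4j-core also ships a `CosineSimilarity` utility for `Embedding` pairs). A minimal, self-contained sketch of the math, assuming `Embedding.vector()` supplies the raw `float[]`; the `cosine` helper below is illustrative, not part of the library:

```java
class CosineDemo {
    // Illustrative helper: cosine similarity between two raw vectors,
    // e.g. the float[] arrays returned by Embedding.vector()
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Toy vectors standing in for real embedding outputs
        float[] v1 = {1f, 0f, 0f};
        float[] v2 = {0f, 1f, 0f};
        System.out.println(cosine(v1, v1)); // identical vectors -> 1.0
        System.out.println(cosine(v1, v2)); // orthogonal vectors -> 0.0
    }
}
```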
List, inspect, and manage Ollama models.
```java
OllamaModels ollamaModels = OllamaModels.builder()
        .baseUrl("http://localhost:11434")
        .build();

// List available models
Response<List<OllamaModel>> models = ollamaModels.availableModels();

// Get model details
Response<OllamaModelCard> card = ollamaModels.modelCard("llama2");

// List running models
Response<List<RunningOllamaModel>> running = ollamaModels.runningModels();

// Delete a model
ollamaModels.deleteModel("old-model");
```

Thread Safety: Immutable after build(); safe for concurrent operations
Learn more: Model Management Documentation
Ollama-specific parameters for fine-tuned control.
```java
// Mirostat sampling
OllamaChatModel model = OllamaChatModel.builder()
        .modelName("llama2")
        .mirostat(2)      // Mirostat 2.0
        .mirostatEta(0.1) // Learning rate
        .mirostatTau(5.0) // Tau parameter
        .build();

// Reasoning/thinking mode
OllamaChatModel reasoningModel = OllamaChatModel.builder()
        .modelName("deepseek-r1")
        .think(true)          // Enable thinking
        .returnThinking(true) // Return thinking text
        .build();

// Context window and repetition control
OllamaChatModel configuredModel = OllamaChatModel.builder()
        .modelName("llama2")
        .numCtx(4096)       // Context window size
        .repeatPenalty(1.1) // Repetition penalty
        .repeatLastN(64)    // Check last N tokens
        .minP(0.05)         // Minimum probability
        .seed(42)           // Reproducibility
        .build();
```

Parameter Validation: Invalid values throw IllegalArgumentException at build time
Learn more: Request Parameters Documentation
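When `think` is left at its default of null, reasoning models return their chain of thought inline, wrapped in `<think>` tags ahead of the final answer, so callers who do not set `returnThinking(true)` may want to separate the two themselves. A stdlib-only sketch of that split; the class and method names are illustrative, not library API:

```java
class ThinkSplitter {
    // Splits raw model output of the form "<think>...</think>answer"
    // into {thinking, answer}; thinking is "" when no tags are present.
    static String[] split(String raw) {
        int open = raw.indexOf("<think>");
        int close = raw.indexOf("</think>");
        if (open < 0 || close < 0 || close < open) {
            return new String[] {"", raw.trim()};
        }
        String thinking = raw.substring(open + "<think>".length(), close).trim();
        String answer = raw.substring(close + "</think>".length()).trim();
        return new String[] {thinking, answer};
    }

    public static void main(String[] args) {
        String raw = "<think>Recall geography.</think>The capital of France is Paris.";
        String[] parts = split(raw);
        System.out.println("thinking: " + parts[0]);
        System.out.println("answer:   " + parts[1]);
    }
}
```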
Complete type definitions for Ollama model metadata.
```java
// Model information
OllamaModel model = OllamaModel.builder()
        .name("llama2")
        .size(3826793677L)
        .digest("sha256:...")
        .build();

// Model details
OllamaModelDetails details = OllamaModelDetails.builder()
        .format("gguf")
        .family("llama")
        .parameterSize("7B")
        .quantizationLevel("Q4_0")
        .build();

// Model card with full metadata
OllamaModelCard card = OllamaModelCard.builder()
        .license("Apache 2.0")
        .template("{{ .Prompt }}")
        .details(details)
        .build();
```

Nullability: All fields can be null except where noted in type documentation
Learn more: Types Documentation
Support for reasoning models like DeepSeek R1:
```java
OllamaChatModel model = OllamaChatModel.builder()
        .modelName("deepseek-r1")
        .think(true)          // Enable structured thinking
        .returnThinking(true) // Return thinking in AiMessage
        .build();
```

Thinking modes:

- think(true): LLM thinks and returns thoughts in a separate field
- think(false): LLM does not think
- think(null) (default): Reasoning LLMs prepend thoughts with <think> tags

Advanced perplexity control:
```java
OllamaChatModel model = OllamaChatModel.builder()
        .mirostat(2)      // 0=disabled, 1=Mirostat, 2=Mirostat 2.0
        .mirostatEta(0.1) // Learning rate (default: 0.1)
        .mirostatTau(5.0) // Coherence/diversity balance (default: 5.0)
        .build();
```

Valid Values:

- mirostat: 0, 1, or 2 only
- mirostatEta: > 0.0 (typically 0.01 to 1.0)
- mirostatTau: > 0.0 (typically 1.0 to 10.0)

Automatic retry with configurable attempts:
```java
OllamaChatModel model = OllamaChatModel.builder()
        .maxRetries(3) // Default: 2
        .build();
```

Note: Retry only applies to non-streaming models; streaming models do not retry
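Because streaming models never retry, callers who need resilience for streaming must wrap the call themselves. A minimal sketch of the exponential backoff schedule such a wrapper could sleep through between manual re-invocations; the helper is illustrative, not library API:

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

class Backoff {
    // Illustrative helper: delay of base * 2^attempt for each retry attempt
    static List<Duration> schedule(Duration base, int maxRetries) {
        List<Duration> delays = new ArrayList<>();
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            delays.add(base.multipliedBy(1L << attempt));
        }
        return delays;
    }

    public static void main(String[] args) {
        // 3 retries starting at 500 ms doubles each time: 500 ms, 1 s, 2 s
        System.out.println(schedule(Duration.ofMillis(500), 3));
    }
}
```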
Static or dynamic HTTP headers:
```java
// Static headers
Map<String, String> headers = Map.of("Authorization", "Bearer token");
OllamaChatModel model = OllamaChatModel.builder()
        .customHeaders(headers)
        .build();

// Dynamic headers (e.g., for token refresh)
Supplier<Map<String, String>> headerSupplier = () ->
        Map.of("Authorization", "Bearer " + getToken());
OllamaChatModel dynamicModel = OllamaChatModel.builder()
        .customHeaders(headerSupplier)
        .build();
```

Nullability: Both customHeaders methods accept null (means no custom headers)
Request/response logging and model listeners:
```java
OllamaChatModel model = OllamaChatModel.builder()
        .logRequests(true)
        .logResponses(true)
        .logger(customLogger)
        .listeners(List.of(chatModelListener))
        .build();
```

Nullability: logger defaults to SLF4J logger for the class; listeners defaults to empty list
Connection settings:

```java
builder()
    .baseUrl("http://localhost:11434")          // Ollama server URL (default: http://localhost:11434)
    .timeout(Duration.ofMinutes(5))             // Request timeout (default: no timeout)
    .httpClientBuilder(customHttpClientBuilder) // Custom HTTP client (default: LangChain4j default)
    .customHeaders(headers)                     // Custom headers (default: none)
```

Generation parameters:

```java
builder()
    .modelName("llama2")  // Model name (required for all models except OllamaModels)
    .temperature(0.7)     // Sampling temperature 0.0-2.0+ (default: model-specific)
    .topP(0.9)            // Nucleus sampling 0.0-1.0 (default: model-specific)
    .topK(40)             // Top-K sampling > 0 (default: model-specific)
    .numPredict(512)      // Max output tokens > 0 (default: model-specific)
    .numCtx(2048)         // Context window size > 0 (default: model-specific)
    .stop(List.of("END")) // Stop sequences (default: none)
    .seed(42)             // Random seed (default: random)
```

Ollama-specific parameters:

```java
builder()
    .mirostat(2)          // Mirostat mode: 0, 1, 2 (default: 0)
    .mirostatEta(0.1)     // Mirostat learning rate > 0.0 (default: 0.1)
    .mirostatTau(5.0)     // Mirostat tau > 0.0 (default: 5.0)
    .repeatPenalty(1.1)   // Repetition penalty >= 0.0 (default: 1.0)
    .repeatLastN(64)      // Repeat check window >= 0 (default: 64)
    .minP(0.05)           // Minimum probability 0.0-1.0 (default: 0.0)
    .think(true)          // Thinking mode (default: null)
    .returnThinking(true) // Return thinking text (default: false)
```

Operational settings:

```java
builder()
    .maxRetries(3)               // Max retry attempts >= 0 (default: 2, N/A for streaming)
    .logRequests(true)           // Log requests (default: false)
    .logResponses(true)          // Log responses (default: false)
    .logger(customLogger)        // Custom logger (default: SLF4J logger)
    .listeners(listeners)        // Chat model listeners (default: empty list)
    .supportedCapabilities(caps) // Declare capabilities (default: empty set)
```

Model Configuration:
- IllegalArgumentException - Invalid parameter values at build time
- IllegalStateException - Required parameters (e.g., modelName) not set at build time

Runtime Errors:

- HttpTimeoutException - Request timeout exceeded
- IOException - Network connectivity issues
- RuntimeException - Ollama server errors (wrapped server error responses)

```java
import java.io.IOException;
import java.net.http.HttpTimeoutException;

try {
    ChatResponse response = model.doChat(request);
    // Process response
} catch (HttpTimeoutException e) {
    // Handle timeout - request took too long
    logger.error("Request timed out", e);
} catch (IOException e) {
    // Handle network errors - server unreachable
    logger.error("Network error", e);
} catch (RuntimeException e) {
    // Handle server errors - model not found, invalid request, etc.
    logger.error("Ollama server error", e);
}
```

| Parameter | Default Value | Description | Valid Range |
|---|---|---|---|
| baseUrl | http://localhost:11434 | Ollama server URL | Valid URL |
| maxRetries | 2 | Maximum retry attempts | >= 0 |
| mirostat | 0 | Mirostat sampling mode | 0, 1, 2 |
| mirostatEta | 0.1 | Mirostat learning rate | > 0.0 |
| mirostatTau | 5.0 | Mirostat tau parameter | > 0.0 |
| minP | 0.0 | Minimum probability threshold | 0.0-1.0 |
| repeatPenalty | 1.0 | Repetition penalty | >= 0.0 |
| repeatLastN | 64 | Repetition check window | >= 0 |
| keepAlive | 300 (5m) | Model keep-alive duration (seconds) | >= 0 |
| returnThinking | false | Return thinking text in response | true/false |
| logRequests | false | Log outgoing requests | true/false |
| logResponses | false | Log incoming responses | true/false |
Model Instances: All built model instances (OllamaChatModel, OllamaLanguageModel, OllamaEmbeddingModel, etc.) are immutable and thread-safe after calling build(). Multiple threads can safely share and use the same model instance for concurrent requests.
Builders: Builder instances are not thread-safe. Each thread should use its own builder instance or synchronize access.
Stateless Operations: All model operations are stateless (conversation history must be managed by caller). No shared mutable state between requests.
Connection Pooling: HTTP client reuses connections safely across concurrent requests.
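Since all operations are stateless, multi-turn conversations work by appending each UserMessage and the returned AiMessage to a caller-owned list and passing the whole list in the next ChatRequest; LangChain4j's ChatMemory implementations (e.g., MessageWindowChatMemory) manage this for you. A stdlib-only sketch of the windowing logic, with a generic element type standing in for ChatMessage:

```java
import java.util.ArrayList;
import java.util.List;

class HistoryWindow {
    // Illustrative sliding window over conversation history: keeps only the
    // most recent maxMessages entries, similar to what a message-window
    // chat memory does for ChatMessage instances.
    static <T> List<T> lastN(List<T> history, int maxMessages) {
        int from = Math.max(0, history.size() - maxMessages);
        return new ArrayList<>(history.subList(from, history.size()));
    }

    public static void main(String[] args) {
        List<String> history = new ArrayList<>(List.of("u1", "a1", "u2", "a2", "u3"));
        // Trim to the last 3 messages before building the next ChatRequest
        System.out.println(lastN(history, 3)); // [u2, a2, u3]
    }
}
```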
Install with Tessl CLI

```shell
npx tessl i tessl/maven-dev-langchain4j--langchain4j-ollama
```