Java integration library enabling LangChain4j applications to use Ollama's local language models with support for chat, streaming, embeddings, and advanced reasoning features
Chat models provide conversational AI capabilities with full context management, supporting both synchronous and streaming interactions.
Synchronous chat model for blocking request/response interactions.
package dev.langchain4j.model.ollama;
public class OllamaChatModel
extends OllamaBaseChatModel
implements ChatModel
Thread Safety: Immutable after build(); safe for concurrent requests from multiple threads
Nullability: Instance never null after successful build()
public static OllamaChatModel.OllamaChatModelBuilder builder()
Returns a builder for creating OllamaChatModel instances.
Returns: Fresh OllamaChatModelBuilder instance
Example:
OllamaChatModel model = OllamaChatModel.builder()
.baseUrl("http://localhost:11434")
.modelName("llama2")
.temperature(0.7)
.maxRetries(3)
    .build();
public ChatResponse doChat(ChatRequest chatRequest)
Executes a synchronous chat request and returns the complete response.
Parameters:
chatRequest - The chat request containing messages and parameters (must not be null)
Returns: ChatResponse containing the AI's response message and metadata
Throws:
IllegalArgumentException - If chatRequest is null
HttpTimeoutException - If request exceeds configured timeout
IOException - If network connectivity fails
RuntimeException - If Ollama server returns an error (model not found, etc.)
Thread Safety: Safe for concurrent calls; no shared mutable state
Retry Behavior: Automatically retries on transient failures up to maxRetries times
Example:
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.data.message.UserMessage;
import java.io.IOException;
import java.net.http.HttpTimeoutException;
try {
ChatRequest request = ChatRequest.builder()
.messages(UserMessage.from("Tell me a joke"))
.build();
ChatResponse response = model.doChat(request);
System.out.println(response.aiMessage().text());
} catch (HttpTimeoutException e) {
System.err.println("Request timed out: " + e.getMessage());
} catch (IOException e) {
System.err.println("Network error: " + e.getMessage());
} catch (RuntimeException e) {
System.err.println("Server error: " + e.getMessage());
}
public OllamaChatRequestParameters defaultRequestParameters()
Returns the default request parameters configured for this model.
Returns: OllamaChatRequestParameters - The default parameters
Example:
OllamaChatRequestParameters defaults = model.defaultRequestParameters();
Double temperature = defaults.temperature(); // May be null if not set
public List<ChatModelListener> listeners()
Returns the list of registered chat model listeners for observability.
Returns: List<ChatModelListener> - Registered listeners
public ModelProvider provider()
Returns the model provider identifier.
Returns: ModelProvider.OLLAMA - Always returns OLLAMA constant
public Set<Capability> supportedCapabilities()
Returns the set of capabilities supported by this model.
Returns: Set<Capability> - Supported capabilities
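A capability set is typically checked before issuing a request that relies on an optional feature such as JSON-schema structured output. The sketch below shows only the consumption pattern; it defines a stand-in Capability enum because the real enum lives in LangChain4j core, so treat the enum constants here as illustrative assumptions.

```java
import java.util.EnumSet;
import java.util.Set;

public class CapabilityCheck {
    // Stand-in for dev.langchain4j.model.chat.Capability (illustration only)
    enum Capability { RESPONSE_FORMAT_JSON_SCHEMA, TOOL_CALLING }

    static boolean supportsJsonSchema(Set<Capability> caps) {
        return caps.contains(Capability.RESPONSE_FORMAT_JSON_SCHEMA);
    }

    public static void main(String[] args) {
        // In real code this set would come from model.supportedCapabilities()
        Set<Capability> caps = EnumSet.of(Capability.RESPONSE_FORMAT_JSON_SCHEMA);
        // Branch on capability before building a schema-constrained request
        System.out.println(supportsJsonSchema(caps) ? "json-schema" : "plain");
    }
}
```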
Builder for configuring and creating OllamaChatModel instances.
public static class OllamaChatModelBuilder
extends OllamaBaseChatModel.Builder<OllamaChatModel, OllamaChatModelBuilder>
Thread Safety: Not thread-safe; each thread must use its own builder instance
public OllamaChatModelBuilder maxRetries(Integer maxRetries)
Sets the maximum number of retry attempts for failed requests.
Parameters:
maxRetries - Maximum retry attempts; must be >= 0 (default: 2)
Returns: This builder instance (never null)
Throws:
IllegalArgumentException - If maxRetries < 0
Note: Retry only applies to transient failures (network errors, timeouts); server errors (model not found) are not retried
Example:
OllamaChatModel model = OllamaChatModel.builder()
.maxRetries(5)
    .build();
public OllamaChatModelBuilder modelName(String modelName)
Sets the name of the Ollama model to use.
Parameters:
modelName - Model name (e.g., "llama2", "mistral", "deepseek-r1")
Returns: This builder instance (never null)
Throws:
IllegalStateException at build() - If modelName not set
RuntimeException at runtime - If model not found on server
Example:
OllamaChatModel model = OllamaChatModel.builder()
.modelName("llama2:13b")
    .build();
public OllamaChatModelBuilder temperature(Double temperature)
Sets the sampling temperature for randomness control.
Parameters:
temperature - Temperature value
Range: 0.0-2.0+ (higher values possible but not recommended)
0.0 = deterministic (with same seed)
0.7-0.9 = balanced
> 1.0 = very creative/random
Returns: This builder instance (never null)
Example:
OllamaChatModel model = OllamaChatModel.builder()
.temperature(0.7)
    .build();
public OllamaChatModelBuilder think(Boolean think)
Controls thinking/reasoning mode for models like DeepSeek R1.
Parameters:
think - Thinking mode
true: LLM thinks and returns thoughts in a separate thinking field
false: LLM does not think
null (default): Reasoning LLMs prepend thoughts delimited by <think> tags
Returns: This builder instance (never null)
See also: returnThinking() to control whether thinking text is returned
Example:
OllamaChatModel model = OllamaChatModel.builder()
.modelName("deepseek-r1")
.think(true)
    .build();
public OllamaChatModelBuilder returnThinking(Boolean returnThinking)
Controls whether to return thinking/reasoning text in AiMessage.thinking() and invoke streaming thinking callbacks.
Parameters:
returnThinking - Whether to parse and return thinking text
Default: false
true: Thinking text returned in AiMessage.thinking()
false: Thinking text not returned
Returns: This builder instance (never null)
Note: This only controls whether to return thinking text; it does not enable thinking. Use think() to enable thinking.
Example:
OllamaChatModel model = OllamaChatModel.builder()
.think(true)
.returnThinking(true)
.build();
ChatResponse response = model.doChat(request);
String thinking = response.aiMessage().thinking(); // Non-null if thinking occurred
public OllamaChatModel build()
Builds and returns the configured OllamaChatModel instance.
Returns: Configured OllamaChatModel
Throws:
IllegalStateException - If required parameters missing (e.g., modelName)
IllegalArgumentException - If parameter values invalid
Example:
try {
OllamaChatModel model = OllamaChatModel.builder()
.modelName("llama2")
.build();
} catch (IllegalStateException e) {
System.err.println("Missing required parameter: " + e.getMessage());
}
Streaming chat model for real-time token-by-token responses.
package dev.langchain4j.model.ollama;
public class OllamaStreamingChatModel
extends OllamaBaseChatModel
implements StreamingChatModel
Thread Safety: Immutable after build(); safe for concurrent requests
Streaming Threading: Callbacks invoked on HTTP client thread; ensure thread-safe callback implementations
public static OllamaStreamingChatModel.OllamaStreamingChatModelBuilder builder()
Returns a builder for creating OllamaStreamingChatModel instances.
Returns: Fresh OllamaStreamingChatModelBuilder instance (not thread-safe)
Example:
OllamaStreamingChatModel model = OllamaStreamingChatModel.builder()
.baseUrl("http://localhost:11434")
.modelName("llama2")
    .build();
public void doChat(ChatRequest chatRequest, StreamingChatResponseHandler handler)
Executes a streaming chat request, invoking the handler as tokens arrive.
Parameters:
chatRequest - The chat request containing messages and parameters (must not be null)
handler - Handler for streaming response callbacks (must not be null)
Returns: void - Method returns immediately; response arrives via callbacks
Throws:
IllegalArgumentException - If chatRequest or handler is null
Error Handling: Errors during streaming trigger handler.onError(Throwable)
Thread Safety: Callbacks are invoked on the HTTP client thread; handler implementations must be thread-safe
No Retry: Streaming operations do not automatically retry on failure
Example:
import dev.langchain4j.model.chat.response.StreamingChatResponseHandler;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.model.chat.response.PartialThinking;
model.doChat(request, new StreamingChatResponseHandler() {
@Override
public void onPartialResponse(String partialResponse) {
System.out.print(partialResponse); // Must be thread-safe
}
@Override
public void onPartialThinking(PartialThinking thinking) {
// Called when thinking text arrives (if returnThinking=true)
System.out.print("[thinking] " + thinking.text());
}
@Override
public void onCompleteResponse(ChatResponse response) {
System.out.println("\n[Done]");
}
@Override
public void onError(Throwable error) {
System.err.println("Error: " + error.getMessage());
}
});
import dev.langchain4j.model.ollama.OllamaChatModel;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.data.message.UserMessage;
import java.io.IOException;
OllamaChatModel model = OllamaChatModel.builder()
.baseUrl("http://localhost:11434")
.modelName("llama2")
.temperature(0.7)
.build();
try {
ChatRequest request = ChatRequest.builder()
.messages(UserMessage.from("What is 2+2?"))
.build();
ChatResponse response = model.doChat(request);
System.out.println(response.aiMessage().text());
} catch (IOException e) {
System.err.println("Network error: " + e.getMessage());
}
import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.data.message.AiMessage;
import java.util.ArrayList;
import java.util.List;
OllamaChatModel model = OllamaChatModel.builder()
.modelName("llama2")
.build();
List<ChatMessage> history = new ArrayList<>();
// First turn
history.add(UserMessage.from("Hi, I'm learning Java."));
ChatRequest request1 = ChatRequest.builder().messages(history).build();
ChatResponse response1 = model.doChat(request1);
history.add(response1.aiMessage());
System.out.println("AI: " + response1.aiMessage().text());
// Second turn
history.add(UserMessage.from("Can you recommend a good book?"));
ChatRequest request2 = ChatRequest.builder().messages(history).build();
ChatResponse response2 = model.doChat(request2);
history.add(response2.aiMessage());
System.out.println("AI: " + response2.aiMessage().text());
Note: Models are stateless; caller must maintain conversation history
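Because the caller owns the history, the list grows with every turn unless it is trimmed. A minimal sliding-window sketch of the idea is below; in real applications LangChain4j's ChatMemory implementations (e.g. MessageWindowChatMemory) are the idiomatic route, and the generic helper here is only an illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class HistoryWindow {
    // Keep only the last n entries of a conversation history
    static <T> List<T> lastN(List<T> history, int n) {
        int from = Math.max(0, history.size() - n);
        return new ArrayList<>(history.subList(from, history.size()));
    }

    public static void main(String[] args) {
        // Stand-ins for ChatMessage instances (u = user, a = AI)
        List<String> history = new ArrayList<>(List.of("u1", "a1", "u2", "a2", "u3"));
        // Trim before each request so the prompt stays bounded
        System.out.println(lastN(history, 3)); // [u2, a2, u3]
    }
}
```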
// For simple request/response
OllamaChatModel model = OllamaChatModel.builder().modelName("llama2").build();
// For real-time user interfaces
OllamaStreamingChatModel streamingModel = OllamaStreamingChatModel.builder().modelName("llama2").build();
OllamaChatModel model = OllamaChatModel.builder()
    .modelName("llama2")
.maxRetries(3) // For production, use 3-5 retries
.timeout(Duration.ofMinutes(2))
    .build();
import java.io.IOException;
import java.net.http.HttpTimeoutException;
try {
ChatResponse response = model.doChat(request);
// Process response
} catch (HttpTimeoutException e) {
logger.error("Request timed out", e);
// Retry or fallback
} catch (IOException e) {
logger.error("Network error", e);
// Check connectivity
} catch (RuntimeException e) {
logger.error("Server error", e);
// Check model availability
}
// Handler must be thread-safe if shared
import java.util.concurrent.atomic.AtomicReference;
class ThreadSafeHandler implements StreamingChatResponseHandler {
private final AtomicReference<String> response = new AtomicReference<>("");
@Override
public void onPartialResponse(String partial) {
    // AtomicReference.updateAndGet is atomic; no extra locking needed
    response.updateAndGet(current -> current + partial);
}
}
// Other methods...
}
Install with Tessl CLI
npx tessl i tessl/maven-dev-langchain4j--langchain4j-ollama