tessl/maven-dev-langchain4j--langchain4j-ollama

Java integration library enabling LangChain4j applications to use Ollama's local language models with support for chat, streaming, embeddings, and advanced reasoning features

docs/chat-models.md

Chat Models

Chat models provide conversational AI capabilities with full context management, supporting both synchronous and streaming interactions.

OllamaChatModel

Synchronous chat model for blocking request/response interactions.

Class Signature

package dev.langchain4j.model.ollama;

public class OllamaChatModel
    extends OllamaBaseChatModel
    implements ChatModel

Thread Safety: Immutable after build(); safe for concurrent requests from multiple threads

Nullability: Instance never null after successful build()

Creating an Instance

public static OllamaChatModel.OllamaChatModelBuilder builder()

Returns a builder for creating OllamaChatModel instances.

Returns: Fresh OllamaChatModelBuilder instance

  • Never null
  • Not thread-safe (each thread needs own builder)

Example:

OllamaChatModel model = OllamaChatModel.builder()
    .baseUrl("http://localhost:11434")
    .modelName("llama2")
    .temperature(0.7)
    .maxRetries(3)
    .build();

Public Methods

doChat

public ChatResponse doChat(ChatRequest chatRequest)

Executes a synchronous chat request and returns the complete response.

Parameters:

  • chatRequest - The chat request containing messages and parameters (must not be null)

Returns: ChatResponse containing the AI's response message and metadata

  • Never null
  • Always contains at least one message
  • May contain thinking text if enabled

Throws:

  • IllegalArgumentException - If chatRequest is null
  • RuntimeException - If the request times out, network connectivity fails, or the Ollama server returns an error (model not found, etc.). Because doChat declares no checked exceptions, timeout and I/O failures surface wrapped in unchecked exceptions.

Thread Safety: Safe for concurrent calls; no shared mutable state

Retry Behavior: Automatically retries on transient failures up to maxRetries times

Example:

import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.data.message.UserMessage;

try {
    ChatRequest request = ChatRequest.builder()
        .messages(UserMessage.from("Tell me a joke"))
        .build();

    ChatResponse response = model.doChat(request);
    System.out.println(response.aiMessage().text());

} catch (RuntimeException e) {
    // Timeouts, network failures, and server errors (e.g., model not found)
    // all surface as unchecked exceptions, since doChat declares no checked ones
    System.err.println("Request failed: " + e.getMessage());
}
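The retry behavior described above can be approximated with a plain-Java sketch. `withRetries` here is a hypothetical helper for illustration only, not the library's internal implementation; with `maxRetries = 2` there are up to three attempts in total.

```java
import java.util.concurrent.Callable;

public class RetrySketch {
    // Hypothetical sketch: retry up to maxRetries times, then rethrow.
    // A real implementation would retry only transient failures.
    static <T> T withRetries(Callable<T> call, int maxRetries) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Fails twice, then succeeds: maxRetries=2 allows up to 3 attempts
        String result = withRetries(() -> {
            if (++calls[0] < 3) throw new RuntimeException("transient");
            return "ok";
        }, 2);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```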

defaultRequestParameters

public OllamaChatRequestParameters defaultRequestParameters()

Returns the default request parameters configured for this model.

Returns: OllamaChatRequestParameters - The default parameters

  • Never null
  • Immutable
  • Returns copy; modifications don't affect model

Example:

OllamaChatRequestParameters defaults = model.defaultRequestParameters();
Double temperature = defaults.temperature();  // May be null if not set

listeners

public List<ChatModelListener> listeners()

Returns the list of registered chat model listeners for observability.

Returns: List<ChatModelListener> - Registered listeners

  • Never null
  • May be empty
  • Immutable list
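For observability, listeners are registered at build time and then visible through this accessor. A hedged sketch, assuming the builder's `listeners(...)` method and the `ChatModelListener` interface from `dev.langchain4j.model.chat.listener` as in upstream LangChain4j:

```java
import dev.langchain4j.model.chat.listener.ChatModelListener;
import dev.langchain4j.model.chat.listener.ChatModelRequestContext;
import dev.langchain4j.model.ollama.OllamaChatModel;
import java.util.List;

ChatModelListener logging = new ChatModelListener() {
    @Override
    public void onRequest(ChatModelRequestContext context) {
        System.out.println("Chat request sent");
    }
};

OllamaChatModel model = OllamaChatModel.builder()
    .baseUrl("http://localhost:11434")
    .modelName("llama2")
    .listeners(List.of(logging))
    .build();

List<ChatModelListener> registered = model.listeners();  // contains the listener above
```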

provider

public ModelProvider provider()

Returns the model provider identifier.

Returns: ModelProvider.OLLAMA - Always returns OLLAMA constant

supportedCapabilities

public Set<Capability> supportedCapabilities()

Returns the set of capabilities supported by this model.

Returns: Set<Capability> - Supported capabilities

  • Never null
  • May be empty if not configured
  • Immutable set
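A hedged sketch of branching on capabilities and checking the provider, assuming the `Capability` enum from `dev.langchain4j.model.chat` and `ModelProvider` from `dev.langchain4j.model` as in upstream LangChain4j:

```java
import dev.langchain4j.model.ModelProvider;
import dev.langchain4j.model.chat.Capability;
import java.util.Set;

Set<Capability> caps = model.supportedCapabilities();
if (caps.contains(Capability.RESPONSE_FORMAT_JSON_SCHEMA)) {
    // Safe to request JSON-schema-constrained responses
}

assert model.provider() == ModelProvider.OLLAMA;
```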

OllamaChatModel.OllamaChatModelBuilder

Builder for configuring and creating OllamaChatModel instances.

Class Signature

public static class OllamaChatModelBuilder
    extends OllamaBaseChatModel.Builder<OllamaChatModel, OllamaChatModelBuilder>

Thread Safety: Not thread-safe; each thread must use its own builder instance

Builder Methods

maxRetries

public OllamaChatModelBuilder maxRetries(Integer maxRetries)

Sets the maximum number of retry attempts for failed requests.

Parameters:

  • maxRetries - Maximum retry attempts
    • Valid range: >= 0
    • Default if not set: 2
    • Null means use default

Returns: This builder instance (never null)

Throws:

  • IllegalArgumentException - If maxRetries < 0

Note: Retry only applies to transient failures (network errors, timeouts); server errors (model not found) are not retried

Example:

OllamaChatModel model = OllamaChatModel.builder()
    .maxRetries(5)
    .build();

modelName

public OllamaChatModelBuilder modelName(String modelName)

Sets the name of the Ollama model to use.

Parameters:

  • modelName - Model name (e.g., "llama2", "mistral", "deepseek-r1")
    • Must not be null or empty
    • Must be available in Ollama server
    • Can include tag (e.g., "llama2:13b")

Returns: This builder instance (never null)

Throws:

  • IllegalStateException at build() - If modelName not set
  • RuntimeException at runtime - If model not found on server

Example:

OllamaChatModel model = OllamaChatModel.builder()
    .modelName("llama2:13b")
    .build();

temperature

public OllamaChatModelBuilder temperature(Double temperature)

Sets the sampling temperature for randomness control.

Parameters:

  • temperature - Temperature value
    • Valid range: 0.0-2.0+ (higher values possible but not recommended)
    • Default if not set: Model-specific default
    • 0.0 = deterministic (with same seed)
    • 0.7-0.9 = balanced
    • > 1.0 = very creative/random
    • Null means use model default

Returns: This builder instance (never null)

Example:

OllamaChatModel model = OllamaChatModel.builder()
    .temperature(0.7)
    .build();

think

public OllamaChatModelBuilder think(Boolean think)

Controls thinking/reasoning mode for models like DeepSeek R1.

Parameters:

  • think - Thinking mode
    • true: LLM thinks and returns thoughts in separate thinking field
    • false: LLM does not think
    • null (default): Reasoning LLMs prepend thoughts delimited by <think> tags
    • Only effective for reasoning-capable models

Returns: This builder instance (never null)

See also: returnThinking() to control whether thinking text is returned

Example:

OllamaChatModel model = OllamaChatModel.builder()
    .modelName("deepseek-r1")
    .think(true)
    .build();

returnThinking

public OllamaChatModelBuilder returnThinking(Boolean returnThinking)

Controls whether thinking/reasoning text is returned in AiMessage.thinking() and whether streaming onPartialThinking callbacks are invoked.

Parameters:

  • returnThinking - Whether to parse and return thinking text
    • Default if not set: false
    • true: Thinking text returned in AiMessage.thinking()
    • false: Thinking text not returned
    • Null means use default

Returns: This builder instance (never null)

Note: This only controls whether to return thinking text; it does not enable thinking. Use think() to enable thinking.

Example:

OllamaChatModel model = OllamaChatModel.builder()
    .think(true)
    .returnThinking(true)
    .build();

ChatResponse response = model.doChat(request);
String thinking = response.aiMessage().thinking();  // Non-null if thinking occurred

build

public OllamaChatModel build()

Builds and returns the configured OllamaChatModel instance.

Returns: Configured OllamaChatModel

  • Never null
  • Immutable
  • Thread-safe

Throws:

  • IllegalStateException - If required parameters missing (e.g., modelName)
  • IllegalArgumentException - If parameter values invalid

Example:

try {
    OllamaChatModel model = OllamaChatModel.builder()
        .modelName("llama2")
        .build();
} catch (IllegalStateException e) {
    System.err.println("Missing required parameter: " + e.getMessage());
}

OllamaStreamingChatModel

Streaming chat model for real-time token-by-token responses.

Class Signature

package dev.langchain4j.model.ollama;

public class OllamaStreamingChatModel
    extends OllamaBaseChatModel
    implements StreamingChatModel

Thread Safety: Immutable after build(); safe for concurrent requests

Streaming Threading: Callbacks invoked on HTTP client thread; ensure thread-safe callback implementations

Creating an Instance

public static OllamaStreamingChatModel.OllamaStreamingChatModelBuilder builder()

Returns a builder for creating OllamaStreamingChatModel instances.

Returns: Fresh OllamaStreamingChatModelBuilder instance (not thread-safe)

Example:

OllamaStreamingChatModel model = OllamaStreamingChatModel.builder()
    .baseUrl("http://localhost:11434")
    .modelName("llama2")
    .build();

Public Methods

doChat

public void doChat(ChatRequest chatRequest, StreamingChatResponseHandler handler)

Executes a streaming chat request, invoking the handler as tokens arrive.

Parameters:

  • chatRequest - The chat request containing messages and parameters (must not be null)
  • handler - Handler for streaming response callbacks (must not be null)

Returns: void - Method returns immediately; response arrives via callbacks

Throws:

  • IllegalArgumentException - If chatRequest or handler is null

Error Handling: Errors during streaming trigger handler.onError(Throwable)

Thread Safety:

  • Safe for concurrent calls
  • Handler callbacks invoked on HTTP client thread
  • Ensure handler implementation is thread-safe if shared

No Retry: Streaming operations do not automatically retry on failure

Example:

import dev.langchain4j.model.chat.response.StreamingChatResponseHandler;
import dev.langchain4j.model.chat.response.PartialThinking;
import dev.langchain4j.model.chat.response.ChatResponse;

model.doChat(request, new StreamingChatResponseHandler() {
    @Override
    public void onPartialResponse(String partialResponse) {
        System.out.print(partialResponse);  // Must be thread-safe
    }

    @Override
    public void onPartialThinking(PartialThinking thinking) {
        // Called when thinking text arrives (if returnThinking=true)
        System.out.print("[thinking] " + thinking.text());
    }

    @Override
    public void onCompleteResponse(ChatResponse response) {
        System.out.println("\n[Done]");
    }

    @Override
    public void onError(Throwable error) {
        System.err.println("Error: " + error.getMessage());
    }
});
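A common pattern is bridging the callback API to a blocking result with CompletableFuture. This self-contained sketch uses a stand-in `TokenHandler` interface and a `fakeStream` method so it runs without a server; with the real model you would complete the future from `onCompleteResponse` and `onError` instead.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

interface TokenHandler {
    void onToken(String token);
    void onComplete(String full);
    void onError(Throwable t);
}

public class StreamingBridge {
    // Stand-in for model.doChat(request, handler): delivers tokens, then completes
    static void fakeStream(List<String> tokens, TokenHandler handler) {
        StringBuilder sb = new StringBuilder();
        for (String t : tokens) {
            handler.onToken(t);
            sb.append(t);
        }
        handler.onComplete(sb.toString());
    }

    public static void main(String[] args) {
        CompletableFuture<String> future = new CompletableFuture<>();
        fakeStream(List.of("Hello", ", ", "world"), new TokenHandler() {
            @Override public void onToken(String token) { System.out.print(token); }
            @Override public void onComplete(String full) { future.complete(full); }
            @Override public void onError(Throwable t) { future.completeExceptionally(t); }
        });
        String result = future.join();  // blocks until the stream completes
        System.out.println();
        System.out.println("result=" + result);
    }
}
```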

Usage Examples

Basic Synchronous Chat

import dev.langchain4j.model.ollama.OllamaChatModel;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.data.message.UserMessage;

OllamaChatModel model = OllamaChatModel.builder()
    .baseUrl("http://localhost:11434")
    .modelName("llama2")
    .temperature(0.7)
    .build();

try {
    ChatRequest request = ChatRequest.builder()
        .messages(UserMessage.from("What is 2+2?"))
        .build();

    ChatResponse response = model.doChat(request);
    System.out.println(response.aiMessage().text());

} catch (RuntimeException e) {
    // Network and server failures surface as unchecked exceptions
    System.err.println("Request failed: " + e.getMessage());
}

Multi-turn Conversation

import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.data.message.AiMessage;
import java.util.ArrayList;
import java.util.List;

OllamaChatModel model = OllamaChatModel.builder()
    .modelName("llama2")
    .build();

List<ChatMessage> history = new ArrayList<>();

// First turn
history.add(UserMessage.from("Hi, I'm learning Java."));
ChatRequest request1 = ChatRequest.builder().messages(history).build();
ChatResponse response1 = model.doChat(request1);
history.add(response1.aiMessage());

System.out.println("AI: " + response1.aiMessage().text());

// Second turn
history.add(UserMessage.from("Can you recommend a good book?"));
ChatRequest request2 = ChatRequest.builder().messages(history).build();
ChatResponse response2 = model.doChat(request2);
history.add(response2.aiMessage());

System.out.println("AI: " + response2.aiMessage().text());

Note: Models are stateless; the caller must maintain conversation history
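Because the model is stateless, an unbounded history grows the prompt with every turn. A minimal self-contained sketch of window-trimming (plain strings stand in for ChatMessage to keep it runnable; LangChain4j's ChatMemory implementations provide a production version of this idea):

```java
import java.util.ArrayList;
import java.util.List;

public class WindowMemory {
    // Keep only the most recent maxMessages entries
    static List<String> trim(List<String> history, int maxMessages) {
        int from = Math.max(0, history.size() - maxMessages);
        return new ArrayList<>(history.subList(from, history.size()));
    }

    public static void main(String[] args) {
        List<String> history = new ArrayList<>(List.of("u1", "a1", "u2", "a2", "u3"));
        List<String> window = trim(history, 4);
        System.out.println(window);  // keeps the 4 most recent messages
    }
}
```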

Best Practices

1. Choose the Right Model Type

// For simple request/response
OllamaChatModel model = OllamaChatModel.builder()
    .modelName("llama2")
    .build();

// For real-time user interfaces
OllamaStreamingChatModel streamingModel = OllamaStreamingChatModel.builder()
    .modelName("llama2")
    .build();

2. Configure Retry Logic

import java.time.Duration;

OllamaChatModel model = OllamaChatModel.builder()
    .modelName("llama2")
    .maxRetries(3)  // For production, use 3-5 retries
    .timeout(Duration.ofMinutes(2))
    .build();

3. Implement Proper Error Handling

import java.io.IOException;
import java.net.http.HttpTimeoutException;

try {
    ChatResponse response = model.doChat(request);
    // Process response
} catch (RuntimeException e) {
    // doChat declares no checked exceptions; inspect the cause to
    // distinguish failure modes (cause types depend on the HTTP client)
    if (e.getCause() instanceof HttpTimeoutException) {
        logger.error("Request timed out", e);
        // Retry or fall back
    } else if (e.getCause() instanceof IOException) {
        logger.error("Network error", e);
        // Check connectivity
    } else {
        logger.error("Server error", e);
        // Check model availability
    }
}

4. Thread-Safe Handler Implementation

import dev.langchain4j.model.chat.response.StreamingChatResponseHandler;
import dev.langchain4j.model.chat.response.ChatResponse;
import java.util.concurrent.atomic.AtomicReference;

// Handler must be thread-safe if shared across requests
class ThreadSafeHandler implements StreamingChatResponseHandler {
    private final AtomicReference<String> response = new AtomicReference<>("");

    @Override
    public void onPartialResponse(String partial) {
        // AtomicReference.updateAndGet is atomic; no extra locking needed
        response.updateAndGet(current -> current + partial);
    }

    @Override
    public void onCompleteResponse(ChatResponse completeResponse) {
        System.out.println("Complete: " + response.get());
    }

    @Override
    public void onError(Throwable error) {
        System.err.println("Error: " + error.getMessage());
    }
}

See Also

  • Request Parameters - Detailed parameter documentation
  • Language Models - Simple text completion
  • Model Management - Managing Ollama models

Install with Tessl CLI

npx tessl i tessl/maven-dev-langchain4j--langchain4j-ollama
