Langchain4j-Ollama

Langchain4j-Ollama provides Java integration for Ollama, enabling local LLM interactions through a complete set of model interfaces including chat, language, streaming, and embedding models.

Package Information

Maven Coordinates:

<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-ollama</artifactId>
    <version>${langchain4j.version}</version>
</dependency>

Package: dev.langchain4j.model.ollama

Java Version: Java 8+ (source and target compatibility level 8)

Base URL: http://localhost:11434 (default Ollama server endpoint)

Core Imports

// Main model classes
import dev.langchain4j.model.ollama.OllamaChatModel;
import dev.langchain4j.model.ollama.OllamaStreamingChatModel;
import dev.langchain4j.model.ollama.OllamaLanguageModel;
import dev.langchain4j.model.ollama.OllamaStreamingLanguageModel;
import dev.langchain4j.model.ollama.OllamaEmbeddingModel;

// Model management
import dev.langchain4j.model.ollama.OllamaModels;

// Request parameters
import dev.langchain4j.model.ollama.OllamaChatRequestParameters;

// Supporting types
import dev.langchain4j.model.ollama.OllamaModel;
import dev.langchain4j.model.ollama.OllamaModelCard;
import dev.langchain4j.model.ollama.OllamaModelDetails;
import dev.langchain4j.model.ollama.RunningOllamaModel;

// Langchain4j core types
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.model.chat.response.StreamingChatResponseHandler;
import dev.langchain4j.model.output.Response;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;

Basic Usage

Quick Start - Synchronous Chat

import dev.langchain4j.model.ollama.OllamaChatModel;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.data.message.UserMessage;

// Create chat model
OllamaChatModel model = OllamaChatModel.builder()
    .baseUrl("http://localhost:11434")
    .modelName("llama2")
    .temperature(0.7)
    .build();

// Send message
ChatRequest request = ChatRequest.builder()
    .messages(UserMessage.from("What is the capital of France?"))
    .build();

ChatResponse response = model.chat(request);
System.out.println(response.aiMessage().text());

Streaming Chat

import dev.langchain4j.model.ollama.OllamaStreamingChatModel;
import dev.langchain4j.model.chat.response.StreamingChatResponseHandler;

OllamaStreamingChatModel streamingModel = OllamaStreamingChatModel.builder()
    .baseUrl("http://localhost:11434")
    .modelName("llama2")
    .build();

streamingModel.chat(request, new StreamingChatResponseHandler() {
    @Override
    public void onPartialResponse(String partialResponse) {
        System.out.print(partialResponse);
    }

    @Override
    public void onCompleteResponse(ChatResponse response) {
        System.out.println("\nDone!");
    }

    @Override
    public void onError(Throwable error) {
        error.printStackTrace();
    }
});

Text Embeddings

import dev.langchain4j.model.ollama.OllamaEmbeddingModel;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import java.util.List;

OllamaEmbeddingModel embeddingModel = OllamaEmbeddingModel.builder()
    .baseUrl("http://localhost:11434")
    .modelName("nomic-embed-text")
    .build();

List<TextSegment> segments = List.of(
    TextSegment.from("Hello world"),
    TextSegment.from("Goodbye world")
);

Response<List<Embedding>> embeddings = embeddingModel.embedAll(segments);

Architecture

The langchain4j-ollama module is built around several key architectural patterns that enable flexible integration with Ollama's local LLM capabilities.

Key architectural components:

  • Interface Implementation: Implements standard LangChain4j interfaces (ChatModel, LanguageModel, EmbeddingModel) for ecosystem compatibility
  • Builder Pattern: All models use fluent builders with sensible defaults and extensive customization options
  • Base Chat Model: Shared OllamaBaseChatModel base class provides common functionality for chat models
  • HTTP Client Abstraction: Pluggable HTTP client with custom headers, logging, and retry logic
  • Streaming Architecture: Server-Sent Events (SSE) for real-time token delivery in streaming models
  • Parameter Layering: Three-tier parameter system (model defaults, request defaults, per-request overrides)
  • SPI Factories: Service Provider Interface for dependency injection and framework integration
  • Thread Safety: All built model instances are immutable and thread-safe; builders are not thread-safe
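
As a sketch of the parameter layering, a per-request override (the third tier) can be passed through OllamaChatRequestParameters — builder method names here are assumed from the imports listed above, so verify against your langchain4j version:

```java
import dev.langchain4j.model.ollama.OllamaChatModel;
import dev.langchain4j.model.ollama.OllamaChatRequestParameters;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.data.message.UserMessage;

// Tier 1: model-level default set in the builder
OllamaChatModel model = OllamaChatModel.builder()
    .modelName("llama2")
    .temperature(0.7)
    .build();

// Tier 3: per-request override wins over the model default for this call only
ChatRequest request = ChatRequest.builder()
    .messages(UserMessage.from("Summarize this text"))
    .parameters(OllamaChatRequestParameters.builder()
        .temperature(0.2)
        .build())
    .build();
```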

Learn more: Architecture Documentation

Capabilities

1. Chat Models

Synchronous and streaming chat interactions with full conversation context support.

// Synchronous chat
OllamaChatModel chatModel = OllamaChatModel.builder()
    .modelName("llama2")
    .temperature(0.8)
    .maxRetries(3)
    .build();

ChatResponse response = chatModel.chat(request);

Thread Safety: Immutable after build(); safe for concurrent requests

Learn more: Chat Models Documentation

2. Language Models

Simple text completion for prompts without conversation context.

// Synchronous completion
OllamaLanguageModel languageModel = OllamaLanguageModel.builder()
    .modelName("llama2")
    .numPredict(100)
    .build();

Response<String> completion = languageModel.generate("Once upon a time");

Thread Safety: Immutable after build(); safe for concurrent requests

Learn more: Language Models Documentation

3. Embedding Models

Generate vector embeddings for text segments.

OllamaEmbeddingModel embeddingModel = OllamaEmbeddingModel.builder()
    .modelName("nomic-embed-text")
    .build();

Response<List<Embedding>> embeddings = embeddingModel.embedAll(textSegments);
String modelName = embeddingModel.modelName();

Returns: Deterministic embeddings (same input → same output for same model)
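
Because embeddings are deterministic, they can be compared directly; below is a minimal cosine-similarity helper in plain Java (independent of the library — the `vector()` accessor mentioned in the usage comment is langchain4j's float[] accessor on Embedding):

```java
// Cosine similarity between two embedding vectors.
static double cosineSimilarity(float[] a, float[] b) {
    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (int i = 0; i < a.length; i++) {
        dot   += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Usage with the embeddings from above:
// double score = cosineSimilarity(
//     embeddings.content().get(0).vector(),
//     embeddings.content().get(1).vector());
```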

Thread Safety: Immutable after build(); safe for concurrent requests

Learn more: Embedding Model Documentation

4. Model Management

List, inspect, and manage Ollama models.

OllamaModels ollamaModels = OllamaModels.builder()
    .baseUrl("http://localhost:11434")
    .build();

// List available models
Response<List<OllamaModel>> models = ollamaModels.availableModels();

// Get model details
Response<OllamaModelCard> card = ollamaModels.modelCard("llama2");

// List running models
Response<List<RunningOllamaModel>> running = ollamaModels.runningModels();

// Delete a model
ollamaModels.deleteModel("old-model");

Thread Safety: Immutable after build(); safe for concurrent operations

Learn more: Model Management Documentation

5. Advanced Configuration

Ollama-specific parameters for fine-tuned control.

// Mirostat sampling
OllamaChatModel model = OllamaChatModel.builder()
    .modelName("llama2")
    .mirostat(2)              // Mirostat 2.0
    .mirostatEta(0.1)         // Learning rate
    .mirostatTau(5.0)         // Tau parameter
    .build();

// Reasoning/thinking mode
OllamaChatModel reasoningModel = OllamaChatModel.builder()
    .modelName("deepseek-r1")
    .think(true)              // Enable thinking
    .returnThinking(true)     // Return thinking text
    .build();

// Context window and repetition control
OllamaChatModel configuredModel = OllamaChatModel.builder()
    .modelName("llama2")
    .numCtx(4096)            // Context window size
    .repeatPenalty(1.1)      // Repetition penalty
    .repeatLastN(64)         // Check last N tokens
    .minP(0.05)              // Minimum probability
    .seed(42)                // Reproducibility
    .build();

Parameter Validation: Invalid values throw IllegalArgumentException at build time

Learn more: Request Parameters Documentation

6. Type System

Complete type definitions for Ollama model metadata.

// Model information
OllamaModel model = OllamaModel.builder()
    .name("llama2")
    .size(3826793677L)
    .digest("sha256:...")
    .build();

// Model details
OllamaModelDetails details = OllamaModelDetails.builder()
    .format("gguf")
    .family("llama")
    .parameterSize("7B")
    .quantizationLevel("Q4_0")
    .build();

// Model card with full metadata
OllamaModelCard card = OllamaModelCard.builder()
    .license("Apache 2.0")
    .template("{{ .Prompt }}")
    .details(details)
    .build();

Nullability: All fields can be null except where noted in type documentation

Learn more: Types Documentation

Key Features

Thinking/Reasoning Mode

Support for reasoning models like DeepSeek R1:

OllamaChatModel model = OllamaChatModel.builder()
    .modelName("deepseek-r1")
    .think(true)              // Enable structured thinking
    .returnThinking(true)     // Return thinking in AiMessage
    .build();

Thinking modes:

  • think(true): the LLM thinks, and its thoughts are returned in a separate field
  • think(false): thinking is disabled
  • think(null) (default): reasoning LLMs return their thoughts inline, wrapped in <think> tags
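
With returnThinking(true), the reasoning text can be read from the response — a sketch assuming AiMessage exposes a thinking() accessor (present in recent langchain4j versions; verify against yours):

```java
ChatResponse response = model.chat(request);
String thinking = response.aiMessage().thinking(); // reasoning trace
String answer   = response.aiMessage().text();     // final answer
```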

Mirostat Sampling

Advanced perplexity control:

OllamaChatModel model = OllamaChatModel.builder()
    .modelName("llama2")
    .mirostat(2)              // 0=disabled, 1=Mirostat, 2=Mirostat 2.0
    .mirostatEta(0.1)         // Learning rate (default: 0.1)
    .mirostatTau(5.0)         // Coherence/diversity balance (default: 5.0)
    .build();

Valid Values:

  • mirostat: 0, 1, or 2 only
  • mirostatEta: > 0.0 (typically 0.01 to 1.0)
  • mirostatTau: > 0.0 (typically 1.0 to 10.0)

Retry Logic

Automatic retry with configurable attempts:

OllamaChatModel model = OllamaChatModel.builder()
    .modelName("llama2")
    .maxRetries(3)            // Default: 2
    .build();

Note: Retry only applies to non-streaming models; streaming models do not retry
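
Conceptually, the built-in behavior is a bounded retry loop. A plain-Java sketch of the idea follows (illustrative only — the real logic lives inside langchain4j and is not configured this way; the helper name is hypothetical):

```java
import java.util.function.Supplier;

// Retry a call up to maxAttempts times, rethrowing the last failure.
static <T> T withRetries(int maxAttempts, Supplier<T> call) {
    RuntimeException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            return call.get();
        } catch (RuntimeException e) {
            last = e; // remember the failure and try again
        }
    }
    throw last;
}
```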

Custom Headers

Static or dynamic HTTP headers:

// Static headers
Map<String, String> headers = Map.of("Authorization", "Bearer token");
OllamaChatModel model = OllamaChatModel.builder()
    .modelName("llama2")
    .customHeaders(headers)
    .build();

// Dynamic headers (e.g., for token refresh)
Supplier<Map<String, String>> headerSupplier = () ->
    Map.of("Authorization", "Bearer " + getToken());
OllamaChatModel model = OllamaChatModel.builder()
    .modelName("llama2")
    .customHeaders(headerSupplier)
    .build();

Nullability: Both customHeaders overloads accept null (meaning no custom headers)

Observability

Request/response logging and model listeners:

OllamaChatModel model = OllamaChatModel.builder()
    .modelName("llama2")
    .logRequests(true)
    .logResponses(true)
    .logger(customLogger)
    .listeners(List.of(chatModelListener))
    .build();

Nullability: logger defaults to SLF4J logger for the class; listeners defaults to empty list
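
A listener can be sketched against the ChatModelListener interface from langchain4j core; method and context-type names below are assumed from dev.langchain4j.model.chat.listener (all hooks have default no-op implementations, so only the ones you need must be overridden):

```java
import dev.langchain4j.model.chat.listener.ChatModelListener;
import dev.langchain4j.model.chat.listener.ChatModelRequestContext;
import dev.langchain4j.model.chat.listener.ChatModelResponseContext;
import dev.langchain4j.model.chat.listener.ChatModelErrorContext;

ChatModelListener chatModelListener = new ChatModelListener() {
    @Override
    public void onRequest(ChatModelRequestContext ctx) {
        System.out.println("request sent to Ollama");
    }

    @Override
    public void onResponse(ChatModelResponseContext ctx) {
        System.out.println("response received");
    }

    @Override
    public void onError(ChatModelErrorContext ctx) {
        ctx.error().printStackTrace();
    }
};
```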

Common Configuration

Connection Settings

builder()
    .baseUrl("http://localhost:11434")           // Ollama server URL (default: http://localhost:11434)
    .timeout(Duration.ofMinutes(5))              // Request timeout (default: no timeout)
    .httpClientBuilder(customHttpClientBuilder)  // Custom HTTP client (default: LangChain4j default)
    .customHeaders(headers)                       // Custom headers (default: none)

Model Parameters

builder()
    .modelName("llama2")           // Model name (required for all models except OllamaModels)
    .temperature(0.7)              // Sampling temperature 0.0-2.0+ (default: model-specific)
    .topP(0.9)                     // Nucleus sampling 0.0-1.0 (default: model-specific)
    .topK(40)                      // Top-K sampling > 0 (default: model-specific)
    .numPredict(512)               // Max output tokens > 0 (default: model-specific)
    .numCtx(2048)                  // Context window size > 0 (default: model-specific)
    .stop(List.of("END"))          // Stop sequences (default: none)
    .seed(42)                      // Random seed (default: random)

Ollama-Specific Parameters

builder()
    .mirostat(2)                   // Mirostat mode: 0, 1, 2 (default: 0)
    .mirostatEta(0.1)             // Mirostat learning rate > 0.0 (default: 0.1)
    .mirostatTau(5.0)             // Mirostat tau > 0.0 (default: 5.0)
    .repeatPenalty(1.1)           // Repetition penalty >= 0.0 (default: 1.0)
    .repeatLastN(64)              // Repeat check window >= 0 (default: 64)
    .minP(0.05)                   // Minimum probability 0.0-1.0 (default: 0.0)
    .think(true)                  // Thinking mode (default: null)
    .returnThinking(true)         // Return thinking text (default: false)

Operational Settings

builder()
    .maxRetries(3)                // Max retry attempts >= 0 (default: 2, N/A for streaming)
    .logRequests(true)            // Log requests (default: false)
    .logResponses(true)           // Log responses (default: false)
    .logger(customLogger)         // Custom logger (default: SLF4J logger)
    .listeners(listeners)         // Chat model listeners (default: empty list)
    .supportedCapabilities(caps)  // Declare capabilities (default: empty set)

API Documentation

Core Classes

Configuration

Types

Service Provider Interface

  • SPI Interfaces - Factory interfaces for dependency injection and framework integration

Error Handling

Common Exceptions

Model Configuration:

  • IllegalArgumentException - Invalid parameter values at build time
  • IllegalStateException - Required parameters (e.g., modelName) not set at build time

Runtime Errors:

  • HttpTimeoutException - Request timeout exceeded (surfaces as the cause of a RuntimeException)
  • IOException - Network connectivity issues (surfaces as the cause of a RuntimeException)
  • RuntimeException - Ollama server errors (wrapped server error responses)

Exception Handling Example

import java.io.IOException;
import java.net.http.HttpTimeoutException;

try {
    ChatResponse response = model.chat(request);
    // Process response
} catch (RuntimeException e) {
    if (e.getCause() instanceof HttpTimeoutException) {
        // Handle timeout - request took too long
        logger.error("Request timed out", e);
    } else if (e.getCause() instanceof IOException) {
        // Handle network errors - server unreachable
        logger.error("Network error", e);
    } else {
        // Handle server errors - model not found, invalid request, etc.
        logger.error("Ollama server error", e);
    }
}

Note: chat() does not declare checked exceptions, so timeouts and network failures arrive wrapped in a RuntimeException; inspect getCause() to distinguish them.

References

  • Ollama API Documentation
  • Ollama Model Parameters
  • Ollama Thinking/Reasoning
  • Langchain4j Documentation

Default Values

| Parameter      | Default Value          | Description                         | Valid Range |
|----------------|------------------------|-------------------------------------|-------------|
| baseUrl        | http://localhost:11434 | Ollama server URL                   | Valid URL   |
| maxRetries     | 2                      | Maximum retry attempts              | >= 0        |
| mirostat       | 0                      | Mirostat sampling mode              | 0, 1, 2     |
| mirostatEta    | 0.1                    | Mirostat learning rate              | > 0.0       |
| mirostatTau    | 5.0                    | Mirostat tau parameter              | > 0.0       |
| minP           | 0.0                    | Minimum probability threshold       | 0.0-1.0     |
| repeatPenalty  | 1.0                    | Repetition penalty                  | >= 0.0      |
| repeatLastN    | 64                     | Repetition check window             | >= 0        |
| keepAlive      | 300 (5 min)            | Model keep-alive duration (seconds) | >= 0        |
| returnThinking | false                  | Return thinking text in response    | true/false  |
| logRequests    | false                  | Log outgoing requests               | true/false  |
| logResponses   | false                  | Log incoming responses              | true/false  |

Thread Safety

  • Model Instances: All built model instances (OllamaChatModel, OllamaLanguageModel, OllamaEmbeddingModel, etc.) are immutable and thread-safe after calling build(). Multiple threads can safely share and use the same model instance for concurrent requests.

  • Builders: Builder instances are not thread-safe. Each thread should use its own builder instance or synchronize access.

  • Stateless Operations: All model operations are stateless (conversation history must be managed by caller). No shared mutable state between requests.

  • Connection Pooling: HTTP client reuses connections safely across concurrent requests.
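
Because requests are stateless, multi-turn conversations are built by resending the accumulated history on each turn — a sketch:

```java
import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.data.message.UserMessage;
import java.util.ArrayList;
import java.util.List;

List<ChatMessage> history = new ArrayList<>();
history.add(UserMessage.from("What is the capital of France?"));

ChatResponse first = model.chat(ChatRequest.builder().messages(history).build());
history.add(first.aiMessage());  // keep the model's answer in context

history.add(UserMessage.from("And what is its population?"));
ChatResponse second = model.chat(ChatRequest.builder().messages(history).build());
```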

Install with Tessl CLI

npx tessl i tessl/maven-dev-langchain4j--langchain4j-ollama
