Langchain4j-Ollama

Langchain4j-Ollama provides Java integration for Ollama, enabling local LLM interactions through a complete set of model interfaces including chat, language, streaming, and embedding models.

Package Information

Maven Coordinates:

<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-ollama</artifactId>
    <version>${langchain4j.version}</version>
</dependency>

Package: dev.langchain4j.model.ollama

Java Version: Java 8+ (source and target compatibility level 8)

Base URL: http://localhost:11434 (default Ollama server endpoint)

Core Imports

// Main model classes
import dev.langchain4j.model.ollama.OllamaChatModel;
import dev.langchain4j.model.ollama.OllamaStreamingChatModel;
import dev.langchain4j.model.ollama.OllamaLanguageModel;
import dev.langchain4j.model.ollama.OllamaStreamingLanguageModel;
import dev.langchain4j.model.ollama.OllamaEmbeddingModel;

// Model management
import dev.langchain4j.model.ollama.OllamaModels;

// Request parameters
import dev.langchain4j.model.ollama.OllamaChatRequestParameters;

// Supporting types
import dev.langchain4j.model.ollama.OllamaModel;
import dev.langchain4j.model.ollama.OllamaModelCard;
import dev.langchain4j.model.ollama.OllamaModelDetails;
import dev.langchain4j.model.ollama.RunningOllamaModel;

// Langchain4j core types
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.model.chat.response.StreamingChatResponseHandler;
import dev.langchain4j.model.output.Response;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;

Basic Usage

Quick Start - Synchronous Chat

import dev.langchain4j.model.ollama.OllamaChatModel;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.data.message.UserMessage;

// Create chat model
OllamaChatModel model = OllamaChatModel.builder()
    .baseUrl("http://localhost:11434")
    .modelName("llama2")
    .temperature(0.7)
    .build();

// Send message
ChatRequest request = ChatRequest.builder()
    .messages(UserMessage.from("What is the capital of France?"))
    .build();

ChatResponse response = model.chat(request);
System.out.println(response.aiMessage().text());

Streaming Chat

import dev.langchain4j.model.ollama.OllamaStreamingChatModel;
import dev.langchain4j.model.chat.response.StreamingChatResponseHandler;

OllamaStreamingChatModel streamingModel = OllamaStreamingChatModel.builder()
    .baseUrl("http://localhost:11434")
    .modelName("llama2")
    .build();

streamingModel.chat(request, new StreamingChatResponseHandler() {
    @Override
    public void onPartialResponse(String partialResponse) {
        System.out.print(partialResponse);
    }

    @Override
    public void onCompleteResponse(ChatResponse response) {
        System.out.println("\nDone!");
    }

    @Override
    public void onError(Throwable error) {
        error.printStackTrace();
    }
});

Text Embeddings

import dev.langchain4j.model.ollama.OllamaEmbeddingModel;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import java.util.List;

OllamaEmbeddingModel embeddingModel = OllamaEmbeddingModel.builder()
    .baseUrl("http://localhost:11434")
    .modelName("nomic-embed-text")
    .build();

List<TextSegment> segments = List.of(
    TextSegment.from("Hello world"),
    TextSegment.from("Goodbye world")
);

Response<List<Embedding>> embeddings = embeddingModel.embedAll(segments);

Architecture

The langchain4j-ollama module is built around several key architectural patterns that enable flexible integration with Ollama's local LLM capabilities.

Key architectural components:

  • Interface Implementation: Implements standard LangChain4j interfaces (ChatModel, LanguageModel, EmbeddingModel) for ecosystem compatibility
  • Builder Pattern: All models use fluent builders with sensible defaults and extensive customization options
  • Base Chat Model: Shared OllamaBaseChatModel base class provides common functionality for chat models
  • HTTP Client Abstraction: Pluggable HTTP client with custom headers, logging, and retry logic
  • Streaming Architecture: Server-Sent Events (SSE) for real-time token delivery in streaming models
  • Parameter Layering: Three-tier parameter system (model defaults, request defaults, per-request overrides)
  • SPI Factories: Service Provider Interface for dependency injection and framework integration
  • Thread Safety: All built model instances are immutable and thread-safe; builders are not thread-safe
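
As a sketch of the parameter layering, a per-request override (the third tier) can be passed through OllamaChatRequestParameters — builder method names here are assumed from the imports listed above, so verify against your langchain4j version:

```java
import dev.langchain4j.model.ollama.OllamaChatModel;
import dev.langchain4j.model.ollama.OllamaChatRequestParameters;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.data.message.UserMessage;

// Tier 1: model-level default set in the builder
OllamaChatModel model = OllamaChatModel.builder()
    .modelName("llama2")
    .temperature(0.7)
    .build();

// Tier 3: per-request override wins over the model default for this call only
ChatRequest request = ChatRequest.builder()
    .messages(UserMessage.from("Summarize this text"))
    .parameters(OllamaChatRequestParameters.builder()
        .temperature(0.2)
        .build())
    .build();
```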

Learn more: Architecture Documentation

Capabilities

1. Chat Models

Synchronous and streaming chat interactions with full conversation context support.

// Synchronous chat
OllamaChatModel chatModel = OllamaChatModel.builder()
    .modelName("llama2")
    .temperature(0.8)
    .maxRetries(3)
    .build();

ChatResponse response = chatModel.chat(request);

Thread Safety: Immutable after build(); safe for concurrent requests

Learn more: Chat Models Documentation

2. Language Models

Simple text completion for prompts without conversation context.

// Synchronous completion
OllamaLanguageModel languageModel = OllamaLanguageModel.builder()
    .modelName("llama2")
    .numPredict(100)
    .build();

Response<String> completion = languageModel.generate("Once upon a time");

Thread Safety: Immutable after build(); safe for concurrent requests

Learn more: Language Models Documentation

3. Embedding Models

Generate vector embeddings for text segments.

OllamaEmbeddingModel embeddingModel = OllamaEmbeddingModel.builder()
    .modelName("nomic-embed-text")
    .build();

Response<List<Embedding>> embeddings = embeddingModel.embedAll(textSegments);
String modelName = embeddingModel.modelName();

Returns: Deterministic embeddings (same input → same output for same model)
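
Because embeddings are deterministic, they can be compared directly; below is a minimal cosine-similarity helper in plain Java (independent of the library — the `vector()` accessor mentioned in the usage comment is langchain4j's float[] accessor on Embedding):

```java
// Cosine similarity between two embedding vectors.
static double cosineSimilarity(float[] a, float[] b) {
    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (int i = 0; i < a.length; i++) {
        dot   += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Usage with the embeddings from above:
// double score = cosineSimilarity(
//     embeddings.content().get(0).vector(),
//     embeddings.content().get(1).vector());
```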

Thread Safety: Immutable after build(); safe for concurrent requests

Learn more: Embedding Model Documentation

4. Model Management

List, inspect, and manage Ollama models.

OllamaModels ollamaModels = OllamaModels.builder()
    .baseUrl("http://localhost:11434")
    .build();

// List available models
Response<List<OllamaModel>> models = ollamaModels.availableModels();

// Get model details
Response<OllamaModelCard> card = ollamaModels.modelCard("llama2");

// List running models
Response<List<RunningOllamaModel>> running = ollamaModels.runningModels();

// Delete a model
ollamaModels.deleteModel("old-model");

Thread Safety: Immutable after build(); safe for concurrent operations

Learn more: Model Management Documentation

5. Advanced Configuration

Ollama-specific parameters for fine-tuned control.

// Mirostat sampling
OllamaChatModel model = OllamaChatModel.builder()
    .modelName("llama2")
    .mirostat(2)              // Mirostat 2.0
    .mirostatEta(0.1)         // Learning rate
    .mirostatTau(5.0)         // Tau parameter
    .build();

// Reasoning/thinking mode
OllamaChatModel reasoningModel = OllamaChatModel.builder()
    .modelName("deepseek-r1")
    .think(true)              // Enable thinking
    .returnThinking(true)     // Return thinking text
    .build();

// Context window and repetition control
OllamaChatModel configuredModel = OllamaChatModel.builder()
    .modelName("llama2")
    .numCtx(4096)            // Context window size
    .repeatPenalty(1.1)      // Repetition penalty
    .repeatLastN(64)         // Check last N tokens
    .minP(0.05)              // Minimum probability
    .seed(42)                // Reproducibility
    .build();

Parameter Validation: Invalid values throw IllegalArgumentException at build time

Learn more: Request Parameters Documentation

6. Type System

Complete type definitions for Ollama model metadata.

// Model information
OllamaModel model = OllamaModel.builder()
    .name("llama2")
    .size(3826793677L)
    .digest("sha256:...")
    .build();

// Model details
OllamaModelDetails details = OllamaModelDetails.builder()
    .format("gguf")
    .family("llama")
    .parameterSize("7B")
    .quantizationLevel("Q4_0")
    .build();

// Model card with full metadata
OllamaModelCard card = OllamaModelCard.builder()
    .license("Apache 2.0")
    .template("{{ .Prompt }}")
    .details(details)
    .build();

Nullability: All fields can be null except where noted in type documentation

Learn more: Types Documentation

Key Features

Thinking/Reasoning Mode

Support for reasoning models like DeepSeek R1:

OllamaChatModel model = OllamaChatModel.builder()
    .modelName("deepseek-r1")
    .think(true)              // Enable structured thinking
    .returnThinking(true)     // Return thinking in AiMessage
    .build();

Thinking modes:

  • think(true): the LLM thinks, and its thoughts are returned in a separate field
  • think(false): thinking is disabled
  • think(null) (default): reasoning LLMs return their thoughts inline, wrapped in <think> tags
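
With returnThinking(true), the reasoning text can be read from the response — a sketch assuming AiMessage exposes a thinking() accessor (present in recent langchain4j versions; verify against yours):

```java
ChatResponse response = model.chat(request);
String thinking = response.aiMessage().thinking(); // reasoning trace
String answer   = response.aiMessage().text();     // final answer
```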

Mirostat Sampling

Advanced perplexity control:

OllamaChatModel model = OllamaChatModel.builder()
    .modelName("llama2")
    .mirostat(2)              // 0=disabled, 1=Mirostat, 2=Mirostat 2.0
    .mirostatEta(0.1)         // Learning rate (default: 0.1)
    .mirostatTau(5.0)         // Coherence/diversity balance (default: 5.0)
    .build();

Valid Values:

  • mirostat: 0, 1, or 2 only
  • mirostatEta: > 0.0 (typically 0.01 to 1.0)
  • mirostatTau: > 0.0 (typically 1.0 to 10.0)

Retry Logic

Automatic retry with configurable attempts:

OllamaChatModel model = OllamaChatModel.builder()
    .modelName("llama2")
    .maxRetries(3)            // Default: 2
    .build();

Note: Retry only applies to non-streaming models; streaming models do not retry
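
Conceptually, the built-in behavior is a bounded retry loop. A plain-Java sketch of the idea follows (illustrative only — the real logic lives inside langchain4j and is not configured this way; the helper name is hypothetical):

```java
import java.util.function.Supplier;

// Retry a call up to maxAttempts times, rethrowing the last failure.
static <T> T withRetries(int maxAttempts, Supplier<T> call) {
    RuntimeException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            return call.get();
        } catch (RuntimeException e) {
            last = e; // remember the failure and try again
        }
    }
    throw last;
}
```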

Custom Headers

Static or dynamic HTTP headers:

// Static headers
Map<String, String> headers = Map.of("Authorization", "Bearer token");
OllamaChatModel model = OllamaChatModel.builder()
    .modelName("llama2")
    .customHeaders(headers)
    .build();

// Dynamic headers (e.g., for token refresh)
Supplier<Map<String, String>> headerSupplier = () ->
    Map.of("Authorization", "Bearer " + getToken());
OllamaChatModel model = OllamaChatModel.builder()
    .modelName("llama2")
    .customHeaders(headerSupplier)
    .build();

Nullability: Both customHeaders overloads accept null (meaning no custom headers)

Observability

Request/response logging and model listeners:

OllamaChatModel model = OllamaChatModel.builder()
    .modelName("llama2")
    .logRequests(true)
    .logResponses(true)
    .logger(customLogger)
    .listeners(List.of(chatModelListener))
    .build();

Nullability: logger defaults to SLF4J logger for the class; listeners defaults to empty list
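
A listener can be sketched against the ChatModelListener interface from langchain4j core; method and context-type names below are assumed from dev.langchain4j.model.chat.listener (all hooks have default no-op implementations, so only the ones you need must be overridden):

```java
import dev.langchain4j.model.chat.listener.ChatModelListener;
import dev.langchain4j.model.chat.listener.ChatModelRequestContext;
import dev.langchain4j.model.chat.listener.ChatModelResponseContext;
import dev.langchain4j.model.chat.listener.ChatModelErrorContext;

ChatModelListener chatModelListener = new ChatModelListener() {
    @Override
    public void onRequest(ChatModelRequestContext ctx) {
        System.out.println("request sent to Ollama");
    }

    @Override
    public void onResponse(ChatModelResponseContext ctx) {
        System.out.println("response received");
    }

    @Override
    public void onError(ChatModelErrorContext ctx) {
        ctx.error().printStackTrace();
    }
};
```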

Common Configuration

Connection Settings

builder()
    .baseUrl("http://localhost:11434")           // Ollama server URL (default: http://localhost:11434)
    .timeout(Duration.ofMinutes(5))              // Request timeout (default: no timeout)
    .httpClientBuilder(customHttpClientBuilder)  // Custom HTTP client (default: LangChain4j default)
    .customHeaders(headers)                       // Custom headers (default: none)

Model Parameters

builder()
    .modelName("llama2")           // Model name (required for all models except OllamaModels)
    .temperature(0.7)              // Sampling temperature 0.0-2.0+ (default: model-specific)
    .topP(0.9)                     // Nucleus sampling 0.0-1.0 (default: model-specific)
    .topK(40)                      // Top-K sampling > 0 (default: model-specific)
    .numPredict(512)               // Max output tokens > 0 (default: model-specific)
    .numCtx(2048)                  // Context window size > 0 (default: model-specific)
    .stop(List.of("END"))          // Stop sequences (default: none)
    .seed(42)                      // Random seed (default: random)

Ollama-Specific Parameters

builder()
    .mirostat(2)                   // Mirostat mode: 0, 1, 2 (default: 0)
    .mirostatEta(0.1)             // Mirostat learning rate > 0.0 (default: 0.1)
    .mirostatTau(5.0)             // Mirostat tau > 0.0 (default: 5.0)
    .repeatPenalty(1.1)           // Repetition penalty >= 0.0 (default: 1.0)
    .repeatLastN(64)              // Repeat check window >= 0 (default: 64)
    .minP(0.05)                   // Minimum probability 0.0-1.0 (default: 0.0)
    .think(true)                  // Thinking mode (default: null)
    .returnThinking(true)         // Return thinking text (default: false)

Operational Settings

builder()
    .maxRetries(3)                // Max retry attempts >= 0 (default: 2, N/A for streaming)
    .logRequests(true)            // Log requests (default: false)
    .logResponses(true)           // Log responses (default: false)
    .logger(customLogger)         // Custom logger (default: SLF4J logger)
    .listeners(listeners)         // Chat model listeners (default: empty list)
    .supportedCapabilities(caps)  // Declare capabilities (default: empty set)

API Documentation

Core Classes

Configuration

Types

Service Provider Interface

  • SPI Interfaces - Factory interfaces for dependency injection and framework integration

Error Handling

Common Exceptions

Model Configuration:

  • IllegalArgumentException - Invalid parameter values at build time
  • IllegalStateException - Required parameters (e.g., modelName) not set at build time

Runtime Errors:

  • HttpTimeoutException - Request timeout exceeded (surfaces as the cause of a RuntimeException)
  • IOException - Network connectivity issues (surfaces as the cause of a RuntimeException)
  • RuntimeException - Ollama server errors (wrapped server error responses)

Exception Handling Example

import java.io.IOException;
import java.net.http.HttpTimeoutException;

try {
    ChatResponse response = model.chat(request);
    // Process response
} catch (RuntimeException e) {
    if (e.getCause() instanceof HttpTimeoutException) {
        // Handle timeout - request took too long
        logger.error("Request timed out", e);
    } else if (e.getCause() instanceof IOException) {
        // Handle network errors - server unreachable
        logger.error("Network error", e);
    } else {
        // Handle server errors - model not found, invalid request, etc.
        logger.error("Ollama server error", e);
    }
}

Note: chat() does not declare checked exceptions, so timeouts and network failures arrive wrapped in a RuntimeException; inspect getCause() to distinguish them.

References

  • Ollama API Documentation
  • Ollama Model Parameters
  • Ollama Thinking/Reasoning
  • Langchain4j Documentation

Default Values

| Parameter      | Default Value          | Description                         | Valid Range |
|----------------|------------------------|-------------------------------------|-------------|
| baseUrl        | http://localhost:11434 | Ollama server URL                   | Valid URL   |
| maxRetries     | 2                      | Maximum retry attempts              | >= 0        |
| mirostat       | 0                      | Mirostat sampling mode              | 0, 1, 2     |
| mirostatEta    | 0.1                    | Mirostat learning rate              | > 0.0       |
| mirostatTau    | 5.0                    | Mirostat tau parameter              | > 0.0       |
| minP           | 0.0                    | Minimum probability threshold       | 0.0-1.0     |
| repeatPenalty  | 1.0                    | Repetition penalty                  | >= 0.0      |
| repeatLastN    | 64                     | Repetition check window             | >= 0        |
| keepAlive      | 300 (5 min)            | Model keep-alive duration (seconds) | >= 0        |
| returnThinking | false                  | Return thinking text in response    | true/false  |
| logRequests    | false                  | Log outgoing requests               | true/false  |
| logResponses   | false                  | Log incoming responses              | true/false  |

Thread Safety

  • Model Instances: All built model instances (OllamaChatModel, OllamaLanguageModel, OllamaEmbeddingModel, etc.) are immutable and thread-safe after calling build(). Multiple threads can safely share and use the same model instance for concurrent requests.

  • Builders: Builder instances are not thread-safe. Each thread should use its own builder instance or synchronize access.

  • Stateless Operations: All model operations are stateless (conversation history must be managed by caller). No shared mutable state between requests.

  • Connection Pooling: HTTP client reuses connections safely across concurrent requests.
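
Because requests are stateless, multi-turn conversations are built by resending the accumulated history on each turn — a sketch:

```java
import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.data.message.UserMessage;
import java.util.ArrayList;
import java.util.List;

List<ChatMessage> history = new ArrayList<>();
history.add(UserMessage.from("What is the capital of France?"));

ChatResponse first = model.chat(ChatRequest.builder().messages(history).build());
history.add(first.aiMessage());  // keep the model's answer in context

history.add(UserMessage.from("And what is its population?"));
ChatResponse second = model.chat(ChatRequest.builder().messages(history).build());
```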

Install with Tessl CLI

npx tessl i tessl/maven-dev-langchain4j--langchain4j-ollama
