tessl/maven-dev-langchain4j--langchain4j-ollama

Java integration library enabling LangChain4j applications to use Ollama's local language models with support for chat, streaming, embeddings, and advanced reasoning features

Architecture

The langchain4j-ollama module provides a comprehensive integration with Ollama's local LLM API, organized around several key architectural patterns and components.

Core Components

Model Interfaces

The module implements standard LangChain4j interfaces for different model types:

  • ChatModel - Implemented by OllamaChatModel for synchronous conversational AI
  • StreamingChatModel - Implemented by OllamaStreamingChatModel for real-time streaming responses
  • LanguageModel - Implemented by OllamaLanguageModel for stateless text completion
  • StreamingLanguageModel - Implemented by OllamaStreamingLanguageModel for streaming text generation
  • DimensionAwareEmbeddingModel - Extended by OllamaEmbeddingModel for vector embeddings

This interface-based design allows seamless integration with the broader LangChain4j ecosystem while providing Ollama-specific capabilities.

Thread Safety: All implemented models are immutable and thread-safe after construction.

Builder Pattern

All model classes use the Builder pattern with fluent configuration:

OllamaChatModel model = OllamaChatModel.builder()
    .baseUrl("http://localhost:11434")
    .modelName("llama2")
    .temperature(0.7)
    .build();

Builders provide:

  • Sensible defaults - Works out-of-the-box with a local Ollama installation
  • Flexible configuration - Extensive parameters for fine-tuning behavior
  • Type safety - Compile-time validation of configurations
  • Immutability - Built models are immutable and thread-safe

Thread Safety: Builders are not thread-safe; each thread must use its own builder instance.

Validation: Invalid configurations throw IllegalArgumentException or IllegalStateException at build time.

Base Chat Model

OllamaBaseChatModel serves as the abstract base class for both OllamaChatModel and OllamaStreamingChatModel, providing:

  • Shared configuration - Common parameters like temperature, topP, numCtx
  • Parameter management - Default request parameters and overrides
  • Capability declarations - Supported features (tools, thinking, etc.)
  • Observability - Listener infrastructure for monitoring

Inheritance Hierarchy:

abstract class OllamaBaseChatModel {
    // Shared configuration and functionality
}

class OllamaChatModel extends OllamaBaseChatModel implements ChatModel {
    // Synchronous implementation with retry logic
}

class OllamaStreamingChatModel extends OllamaBaseChatModel implements StreamingChatModel {
    // Streaming implementation without retry
}

Thread Safety: Abstract base class is stateless; subclasses are immutable and thread-safe.

HTTP Client Abstraction

The module uses LangChain4j's HttpClient interface for communication:

  • Pluggable implementation - Custom HTTP clients via HttpClientBuilder
  • Request/response logging - Configurable debug logging
  • Custom headers - Static or dynamic header injection
  • Timeout management - Configurable request timeouts (default: no timeout)
  • Retry logic - Automatic retry with exponential backoff (non-streaming only)

HTTP Configuration:

builder()
    .httpClientBuilder(customHttpClientBuilder)  // Custom client
    .timeout(Duration.ofMinutes(5))              // Request timeout
    .customHeaders(headerSupplier)               // Dynamic headers
    .maxRetries(3)                               // Retry attempts (non-streaming)

Thread Safety: HTTP client implementation is thread-safe with connection pooling.

Error Handling:

  • HttpTimeoutException - Timeout exceeded
  • IOException - Network connectivity issues
  • Retries occur on transient failures (non-streaming only)

Streaming Architecture

Streaming models use Server-Sent Events (SSE) for real-time token delivery:

Client → HTTP Request → Ollama API → SSE Stream → Parser → Handler Callbacks

Components:

  • OllamaServerSentEventParser - Parses SSE data stream
  • OllamaStreamingResponseBuilder - Accumulates streaming responses
  • Handler callbacks - User-provided handlers receive tokens as they arrive

This enables responsive UX for chat applications without blocking.

Threading: SSE parsing occurs on the HTTP client thread; handler callbacks execute on that same thread

Error Handling: Errors during streaming trigger StreamingResponseHandler.onError()

No Retry: Streaming operations do not retry on failure
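
The accumulate-while-delivering pattern described above can be sketched with plain JDK types. This is an illustration of the pattern only; `StreamingAccumulator` is a hypothetical stand-in, not the library's actual parser or response builder:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of the streaming accumulation pattern: each token is handed to a
// callback as soon as it is parsed, while a builder collects the full text
// for the final, complete response.
final class StreamingAccumulator {
    private final StringBuilder complete = new StringBuilder();
    private final Consumer<String> onPartial;

    StreamingAccumulator(Consumer<String> onPartial) {
        this.onPartial = onPartial;
    }

    void onToken(String token) {
        onPartial.accept(token);   // immediate delivery (responsive UX)
        complete.append(token);    // accumulation for the final response
    }

    String completeText() {
        return complete.toString();
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        StreamingAccumulator acc = new StreamingAccumulator(seen::add);
        for (String token : new String[] {"Hel", "lo", "!"}) {
            acc.onToken(token);
        }
        System.out.println(seen);               // [Hel, lo, !]
        System.out.println(acc.completeText()); // Hello!
    }
}
```

The same shape underlies the real flow: partial callbacks fire per token, and the accumulated result backs the final complete-response callback.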

Parameter System

Request parameters follow a layered architecture:

  1. DefaultChatRequestParameters - Base LangChain4j parameters (temperature, topP, etc.)
  2. OllamaChatRequestParameters - Extends with Ollama-specific options (mirostat, numCtx, thinking)
  3. Model defaults - Set via builder for all requests
  4. Per-request overrides - Specified in individual requests

Parameters flow through: Model Defaults → Request Defaults → Per-Request Overrides

Example:

// 1. Model defaults
OllamaChatModel model = OllamaChatModel.builder()
    .temperature(0.7)      // Default for all requests
    .numCtx(2048)          // Default context
    .build();

// 2. Per-request override
OllamaChatRequestParameters params = OllamaChatRequestParameters.builder()
    .temperature(0.9)      // Overrides model default
    .build();

ChatRequest request = ChatRequest.builder()
    .parameters(params)    // Apply override
    .build();

Immutability: All parameter objects are immutable; overrides create new instances

Nullability: Null parameter values mean "use default from previous layer"
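
The "null means inherit" rule can be sketched with a stdlib-only merge helper. This is illustrative only, not the library's actual implementation:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the layered override rule: each layer replaces a value only
// where it supplies a non-null entry; a null value means "inherit from the
// previous layer".
final class LayeredParams {
    static Map<String, Object> effective(List<Map<String, Object>> layers) {
        Map<String, Object> merged = new HashMap<>();
        for (Map<String, Object> layer : layers) {   // defaults first, overrides last
            for (Map.Entry<String, Object> e : layer.entrySet()) {
                if (e.getValue() != null) {
                    merged.put(e.getKey(), e.getValue());
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> modelDefaults = new HashMap<>();
        modelDefaults.put("temperature", 0.7);
        modelDefaults.put("numCtx", 2048);

        Map<String, Object> perRequest = new HashMap<>();
        perRequest.put("temperature", 0.9);  // override
        perRequest.put("numCtx", null);      // inherit the model default

        Map<String, Object> result = effective(List.of(modelDefaults, perRequest));
        System.out.println(result.get("temperature") + " / " + result.get("numCtx"));
        // 0.9 / 2048
    }
}
```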

Model Management

OllamaModels provides administrative operations:

  • List available models - Query installed models
  • Model metadata - Retrieve detailed model information via OllamaModelCard
  • Running models - Check currently loaded models
  • Model lifecycle - Delete unused models

This enables dynamic model discovery and management without hardcoding model names.

API Operations:

OllamaModels ollamaModels = OllamaModels.builder().build();

// Available models
Response<List<OllamaModel>> models = ollamaModels.availableModels();

// Model details
Response<OllamaModelCard> card = ollamaModels.modelCard("llama2");

// Running models
Response<List<RunningOllamaModel>> running = ollamaModels.runningModels();

// Delete model
ollamaModels.deleteModel("old-model");  // No return value; throws on error

Thread Safety: Immutable and thread-safe; safe for concurrent operations

Error Handling:

  • RuntimeException - Ollama server errors (model not found, etc.)
  • IOException - Network issues

Type System

Supporting types provide rich model metadata:

  • OllamaModel - Basic model info (name, size, digest, modified date)
  • OllamaModelDetails - Technical details (format, family, parameter size, quantization)
  • OllamaModelCard - Complete model information (license, template, capabilities)
  • RunningOllamaModel - Runtime state (VRAM usage, expiration time)

Mutability: Type objects are mutable (have setters); use defensive copying if shared

Nullability: All fields can be null; check before accessing

Service Provider Interface (SPI)

Factory interfaces enable dependency injection and framework integration:

// Factory pattern
OllamaChatModelBuilderFactory → provides → OllamaChatModel.Builder
OllamaEmbeddingModelBuilderFactory → provides → OllamaEmbeddingModel.Builder
// ... (5 factory interfaces total)

This follows the Java ServiceLoader pattern for extensibility.

Usage:

ServiceLoader<OllamaChatModelBuilderFactory> loader =
    ServiceLoader.load(OllamaChatModelBuilderFactory.class);

OllamaChatModelBuilderFactory factory = loader.findFirst()
    .orElseThrow(() -> new IllegalStateException("No factory found"));

OllamaChatModel model = factory.get()
    .modelName("llama2")
    .build();

Thread Safety: Factory instances should be stateless and thread-safe

Design Patterns

1. Builder Pattern

Fluent API for configuring models with sensible defaults and extensive customization options.

Benefits: Type safety, immutability, fluent configuration

2. Strategy Pattern

Different sampling strategies (standard, mirostat) configured via parameters.

Implementation: Parameter objects define strategy; model executes

3. Template Method Pattern

OllamaBaseChatModel defines common structure with doChat() specializations.

Benefits: Code reuse, consistent behavior, specialized implementations

4. Observer Pattern

ChatModelListener interfaces for request/response monitoring.

Benefits: Decoupled observability, extensible monitoring

5. Factory Pattern

SPI factory interfaces for custom model instantiation.

Benefits: Dependency injection, framework integration, testability

Integration Points

LangChain4j Core

The module integrates with LangChain4j through:

  • Standard model interfaces (ChatModel, LanguageModel, EmbeddingModel)
  • Shared types (ChatRequest, ChatResponse, Embedding, TextSegment)
  • Common infrastructure (HttpClient, Response<T>, TokenUsage)

Compatibility: Fully compatible with LangChain4j ecosystem; models are drop-in replacements

Ollama API

Communication with Ollama follows its REST API:

Endpoints:

  • POST /api/chat - Chat completions (streaming and non-streaming)
  • POST /api/generate - Text generation (streaming and non-streaming)
  • POST /api/embeddings - Vector embeddings
  • GET /api/tags - List models
  • POST /api/show - Model information
  • DELETE /api/delete - Remove models
  • GET /api/ps - Running models

Protocol: HTTP/1.1 with JSON request/response bodies and SSE for streaming

Error Responses: HTTP error codes mapped to exceptions
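
Because the protocol is plain HTTP with JSON bodies, a raw non-streaming chat request can be assembled with the JDK's own HTTP types. This illustrates the wire format only; the library builds and sends such requests internally, and actually sending this one would require a running Ollama server:

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Assembles (but does not send) a raw non-streaming request against the
// POST /api/chat endpoint from the table above.
public class RawChatRequest {
    public static HttpRequest build() {
        String body = "{\"model\":\"llama2\","
                + "\"messages\":[{\"role\":\"user\",\"content\":\"Hi\"}],"
                + "\"stream\":false}";
        return HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:11434/api/chat"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build();
        System.out.println(request.method() + " " + request.uri());
        // POST http://localhost:11434/api/chat
    }
}
```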

Advanced Features

Thinking/Reasoning Mode

Support for reasoning models like DeepSeek R1:

  • think(true) - Model generates structured thinking before answering
  • returnThinking(true) - Thinking text returned in AiMessage.thinking()
  • Streaming support - Thinking tokens streamed via onPartialThinking() callback

API:

OllamaChatModel model = OllamaChatModel.builder()
    .think(true)              // Enable thinking
    .returnThinking(true)     // Return thinking text
    .build();

ChatResponse response = model.chat(request);
String thinking = response.aiMessage().thinking();  // May be null

Nullability: thinking() returns null if thinking not enabled or not available

Mirostat Sampling

Advanced perplexity control for consistent output quality:

  • mirostat(2) - Enable Mirostat 2.0 algorithm
  • mirostatEta - Learning rate for dynamic adjustment (0.0-1.0)
  • mirostatTau - Target perplexity, balancing coherence against diversity (> 0.0)

Valid Values:

  • mirostat: 0 (disabled), 1 (Mirostat), 2 (Mirostat 2.0)
  • mirostatEta: > 0.0 (typically 0.01-1.0)
  • mirostatTau: > 0.0 (typically 1.0-10.0)
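
The documented ranges can be expressed as a small validation helper. This is a hypothetical illustration of the constraints above, not part of the library's API:

```java
// Illustrative validation of the documented Mirostat parameter ranges.
public class MirostatValidation {
    static void validate(int mirostat, double eta, double tau) {
        if (mirostat < 0 || mirostat > 2) {
            throw new IllegalArgumentException("mirostat must be 0, 1, or 2");
        }
        if (eta <= 0.0) {
            throw new IllegalArgumentException("mirostatEta must be > 0.0");
        }
        if (tau <= 0.0) {
            throw new IllegalArgumentException("mirostatTau must be > 0.0");
        }
    }

    public static void main(String[] args) {
        validate(2, 0.1, 5.0);        // valid: Mirostat 2.0 with typical eta/tau
        try {
            validate(3, 0.1, 5.0);    // invalid mode
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```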

Tool/Function Calling

Integration with LangChain4j's tool system:

  • Tool specifications - Define available functions
  • Tool choice - Control which tools to use
  • Tool execution - Automatic function call handling

Support: Requires Ollama model with tool support capability

Thread Safety

  • Immutable models - Built model instances are thread-safe
  • Stateless operations - No shared mutable state between requests
  • Concurrent requests - Multiple threads can safely share model instances
  • Builder isolation - Each builder is independent (not thread-safe itself)
  • HTTP client - Connection pooling is thread-safe
  • Listeners - Listener callbacks may be invoked concurrently; ensure thread-safe implementations

Guarantees:

  • Same model instance can be used from multiple threads
  • Concurrent requests to same model are safe
  • Request ordering not guaranteed (concurrent execution)
  • Conversation history must be managed by caller (stateless models)
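
The sharing guarantee can be demonstrated with a stdlib-only sketch: an immutable, stateless object standing in for a built model is safely shared across a thread pool, because each call carries its own state. `EchoModel` is a hypothetical stand-in, not a library class:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// One immutable instance, many concurrent callers: no shared mutable state
// means no synchronization is needed on the caller's side.
public class SharedInstanceDemo {
    record EchoModel(String name) {
        String chat(String prompt) {              // reads only immutable state
            return name + ": " + prompt;
        }
    }

    public static void main(String[] args) throws Exception {
        EchoModel model = new EchoModel("llama2");   // single shared instance
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<String>> futures = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            final int n = i;
            futures.add(pool.submit(() -> model.chat("request " + n)));
        }
        for (Future<String> f : futures) {
            System.out.println(f.get());             // collected in submission order
        }
        pool.shutdown();
    }
}
```

Note the last guarantee above still applies: since the model is stateless, each submitted request would need to carry its own conversation history.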

Error Handling

Build-Time Validation

Exceptions at build():

  • IllegalArgumentException - Invalid parameter values
  • IllegalStateException - Required parameters missing (e.g., modelName)

Example:

try {
    OllamaChatModel model = OllamaChatModel.builder()
        .build();  // Missing modelName
} catch (IllegalStateException e) {
    // Handle missing required parameter
}

Runtime Errors

Exceptions during requests:

  • HttpTimeoutException - Request timeout exceeded
  • IOException - Network connectivity issues
  • RuntimeException - Ollama server errors (model not found, invalid request)

Retry Logic:

  • Non-streaming: Automatic retry with exponential backoff (configurable via maxRetries)
  • Streaming: No automatic retry; errors trigger onError() callback

Example:

import java.io.IOException;
import java.net.http.HttpTimeoutException;

try {
    ChatResponse response = model.chat(request);
} catch (HttpTimeoutException e) {
    // Timeout - request took too long
} catch (IOException e) {
    // Network error - server unreachable
} catch (RuntimeException e) {
    // Server error - model not found, etc.
}
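
The non-streaming retry behavior can be sketched with a stdlib-only backoff helper; the library's actual retry internals may differ:

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

// Retry with exponential backoff: transient failures are retried up to
// maxRetries times, with the delay doubling after each attempt.
public class RetryDemo {
    static <T> T withBackoff(Callable<T> call, int maxRetries, long baseDelayMs)
            throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {                   // transient failure
                if (attempt >= maxRetries) throw e;     // retries exhausted
                Thread.sleep(baseDelayMs << attempt);   // 1x, 2x, 4x, ...
            }
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger attempts = new AtomicInteger();
        String result = withBackoff(() -> {
            if (attempts.incrementAndGet() < 3) {
                throw new IOException("server unreachable");
            }
            return "ok";
        }, 3, 10);
        System.out.println(result + " after " + attempts.get() + " attempts");
        // ok after 3 attempts
    }
}
```

Streaming calls skip this path entirely: a failure mid-stream surfaces through the handler's error callback instead.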

Performance Considerations

  • Model keep-alive - Models stay loaded in Ollama for fast subsequent requests (configurable via keepAlive)
  • Batch embedding - embedAll() sends multiple texts in single request (more efficient than individual embed() calls)
  • Streaming - Reduces perceived latency for long generations (first token arrives faster)
  • Connection pooling - HTTP client reuses connections across requests (reduces connection overhead)
  • Context window - Larger numCtx uses more memory and is slower; choose minimum required size

Optimization Tips:

// Keep models loaded longer for frequent use
.keepAlive(600)  // 10 minutes

// Batch embeddings
embedModel.embedAll(segments);  // Single request

// Smaller context for faster inference
.numCtx(2048)  // Instead of 4096 if sufficient

Extension Points

  1. Custom HTTP clients - Provide HttpClientBuilder for special networking needs (proxies, authentication, etc.)
  2. Custom headers - Static or dynamic header injection via suppliers (token refresh, authentication)
  3. SPI factories - Override default builder creation for DI frameworks
  4. Model listeners - Monitor all requests and responses for logging/metrics
  5. Custom loggers - Plug in application-specific logging (SLF4J compatible)

Example:

// Custom header supplier for token refresh
Supplier<Map<String, String>> headers = () ->
    Map.of("Authorization", "Bearer " + tokenManager.getToken());

OllamaChatModel model = OllamaChatModel.builder()
    .customHeaders(headers)
    .build();

This architecture enables both simple usage for basic cases and extensive customization for advanced scenarios.

See Also

  • Chat Models - Chat model implementations
  • Language Models - Language model implementations
  • Embedding Model - Embedding model implementation
  • Model Management - Model management utilities
  • Request Parameters - Parameter configuration
  • SPI - Service Provider Interface for extensibility
