tessl/maven-dev-langchain4j--langchain4j-ollama

Java integration library enabling LangChain4j applications to use Ollama's local language models with support for chat, streaming, embeddings, and advanced reasoning features

Architecture

The langchain4j-ollama module provides a comprehensive integration with Ollama's local LLM API, organized around several key architectural patterns and components.

Core Components

Model Interfaces

The module implements standard LangChain4j interfaces for different model types:

  • ChatModel - Implemented by OllamaChatModel for synchronous conversational AI
  • StreamingChatModel - Implemented by OllamaStreamingChatModel for real-time streaming responses
  • LanguageModel - Implemented by OllamaLanguageModel for stateless text completion
  • StreamingLanguageModel - Implemented by OllamaStreamingLanguageModel for streaming text generation
  • DimensionAwareEmbeddingModel - Extended by OllamaEmbeddingModel for vector embeddings

This interface-based design allows seamless integration with the broader LangChain4j ecosystem while providing Ollama-specific capabilities.

Thread Safety: All implemented models are immutable and thread-safe after construction.

Builder Pattern

All model classes use the Builder pattern with fluent configuration:

OllamaChatModel model = OllamaChatModel.builder()
    .baseUrl("http://localhost:11434")
    .modelName("llama2")
    .temperature(0.7)
    .build();

Builders provide:

  • Sensible defaults - Works out-of-the-box with a local Ollama installation
  • Flexible configuration - Extensive parameters for fine-tuning behavior
  • Type safety - Compile-time validation of configurations
  • Immutability - Built models are immutable and thread-safe

Thread Safety: Builders are not thread-safe; each thread must use its own builder instance.

Validation: Invalid configurations throw IllegalArgumentException or IllegalStateException at build time.

Base Chat Model

OllamaBaseChatModel serves as the abstract base class for both OllamaChatModel and OllamaStreamingChatModel, providing:

  • Shared configuration - Common parameters like temperature, topP, numCtx
  • Parameter management - Default request parameters and overrides
  • Capability declarations - Supported features (tools, thinking, etc.)
  • Observability - Listener infrastructure for monitoring

Inheritance Hierarchy:

abstract class OllamaBaseChatModel {
    // Shared configuration and functionality
}

class OllamaChatModel extends OllamaBaseChatModel implements ChatModel {
    // Synchronous implementation with retry logic
}

class OllamaStreamingChatModel extends OllamaBaseChatModel implements StreamingChatModel {
    // Streaming implementation without retry
}

Thread Safety: Abstract base class is stateless; subclasses are immutable and thread-safe.

HTTP Client Abstraction

The module uses LangChain4j's HttpClient interface for communication:

  • Pluggable implementation - Custom HTTP clients via HttpClientBuilder
  • Request/response logging - Configurable debug logging
  • Custom headers - Static or dynamic header injection
  • Timeout management - Configurable request timeouts (default: no timeout)
  • Retry logic - Automatic retry with exponential backoff (non-streaming only)

HTTP Configuration:

builder()
    .httpClientBuilder(customHttpClientBuilder)  // Custom client
    .timeout(Duration.ofMinutes(5))              // Request timeout
    .customHeaders(headerSupplier)               // Dynamic headers
    .maxRetries(3)                               // Retry attempts (non-streaming)

Thread Safety: HTTP client implementation is thread-safe with connection pooling.

Error Handling:

  • HttpTimeoutException - Timeout exceeded
  • IOException - Network connectivity issues
  • Retries occur on transient failures (non-streaming only)

Streaming Architecture

Streaming models use Server-Sent Events (SSE) for real-time token delivery:

Client → HTTP Request → Ollama API → SSE Stream → Parser → Handler Callbacks

Components:

  • OllamaServerSentEventParser - Parses SSE data stream
  • OllamaStreamingResponseBuilder - Accumulates streaming responses
  • Handler callbacks - User-provided handlers receive tokens as they arrive

This enables responsive UX for chat applications without blocking.

Threading: SSE parsing occurs on the HTTP client thread; handler callbacks execute on that same thread

Error Handling: Errors during streaming trigger StreamingResponseHandler.onError()

No Retry: Streaming operations do not retry on failure
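
The accumulate-while-delivering pattern described above can be sketched with plain JDK types. This is an illustration of the pattern only; `StreamingAccumulator` is a hypothetical stand-in, not the library's actual parser or response builder:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of the streaming accumulation pattern: each token is handed to a
// callback as soon as it is parsed, while a builder collects the full text
// for the final, complete response.
final class StreamingAccumulator {
    private final StringBuilder complete = new StringBuilder();
    private final Consumer<String> onPartial;

    StreamingAccumulator(Consumer<String> onPartial) {
        this.onPartial = onPartial;
    }

    void onToken(String token) {
        onPartial.accept(token);   // immediate delivery (responsive UX)
        complete.append(token);    // accumulation for the final response
    }

    String completeText() {
        return complete.toString();
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        StreamingAccumulator acc = new StreamingAccumulator(seen::add);
        for (String token : new String[] {"Hel", "lo", "!"}) {
            acc.onToken(token);
        }
        System.out.println(seen);               // [Hel, lo, !]
        System.out.println(acc.completeText()); // Hello!
    }
}
```

The same shape underlies the real flow: partial callbacks fire per token, and the accumulated result backs the final complete-response callback.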

Parameter System

Request parameters follow a layered architecture:

  1. DefaultChatRequestParameters - Base LangChain4j parameters (temperature, topP, etc.)
  2. OllamaChatRequestParameters - Extends with Ollama-specific options (mirostat, numCtx, thinking)
  3. Model defaults - Set via builder for all requests
  4. Per-request overrides - Specified in individual requests

Parameters flow through: Model Defaults → Request Defaults → Per-Request Overrides

Example:

// 1. Model defaults
OllamaChatModel model = OllamaChatModel.builder()
    .temperature(0.7)      // Default for all requests
    .numCtx(2048)          // Default context
    .build();

// 2. Per-request override
OllamaChatRequestParameters params = OllamaChatRequestParameters.builder()
    .temperature(0.9)      // Overrides model default
    .build();

ChatRequest request = ChatRequest.builder()
    .parameters(params)    // Apply override
    .build();

Immutability: All parameter objects are immutable; overrides create new instances

Nullability: Null parameter values mean "use default from previous layer"
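
The "null means inherit" rule can be sketched with a stdlib-only merge helper. This is illustrative only, not the library's actual implementation:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the layered override rule: each layer replaces a value only
// where it supplies a non-null entry; a null value means "inherit from the
// previous layer".
final class LayeredParams {
    static Map<String, Object> effective(List<Map<String, Object>> layers) {
        Map<String, Object> merged = new HashMap<>();
        for (Map<String, Object> layer : layers) {   // defaults first, overrides last
            for (Map.Entry<String, Object> e : layer.entrySet()) {
                if (e.getValue() != null) {
                    merged.put(e.getKey(), e.getValue());
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> modelDefaults = new HashMap<>();
        modelDefaults.put("temperature", 0.7);
        modelDefaults.put("numCtx", 2048);

        Map<String, Object> perRequest = new HashMap<>();
        perRequest.put("temperature", 0.9);  // override
        perRequest.put("numCtx", null);      // inherit the model default

        Map<String, Object> result = effective(List.of(modelDefaults, perRequest));
        System.out.println(result.get("temperature") + " / " + result.get("numCtx"));
        // 0.9 / 2048
    }
}
```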

Model Management

OllamaModels provides administrative operations:

  • List available models - Query installed models
  • Model metadata - Retrieve detailed model information via OllamaModelCard
  • Running models - Check currently loaded models
  • Model lifecycle - Delete unused models

This enables dynamic model discovery and management without hardcoding model names.

API Operations:

OllamaModels ollamaModels = OllamaModels.builder().build();

// Available models
Response<List<OllamaModel>> models = ollamaModels.availableModels();

// Model details
Response<OllamaModelCard> card = ollamaModels.modelCard("llama2");

// Running models
Response<List<RunningOllamaModel>> running = ollamaModels.runningModels();

// Delete model
ollamaModels.deleteModel("old-model");  // No return value; throws on error

Thread Safety: Immutable and thread-safe; safe for concurrent operations

Error Handling:

  • RuntimeException - Ollama server errors (model not found, etc.)
  • IOException - Network issues

Type System

Supporting types provide rich model metadata:

  • OllamaModel - Basic model info (name, size, digest, modified date)
  • OllamaModelDetails - Technical details (format, family, parameter size, quantization)
  • OllamaModelCard - Complete model information (license, template, capabilities)
  • RunningOllamaModel - Runtime state (VRAM usage, expiration time)

Mutability: Type objects are mutable (have setters); use defensive copying if shared

Nullability: All fields can be null; check before accessing

Service Provider Interface (SPI)

Factory interfaces enable dependency injection and framework integration:

// Factory pattern
OllamaChatModelBuilderFactory → provides → OllamaChatModel.Builder
OllamaEmbeddingModelBuilderFactory → provides → OllamaEmbeddingModel.Builder
// ... (5 factory interfaces total)

This follows the Java ServiceLoader pattern for extensibility.

Usage:

ServiceLoader<OllamaChatModelBuilderFactory> loader =
    ServiceLoader.load(OllamaChatModelBuilderFactory.class);

OllamaChatModelBuilderFactory factory = loader.findFirst()
    .orElseThrow(() -> new IllegalStateException("No factory found"));

OllamaChatModel model = factory.get()
    .modelName("llama2")
    .build();

Thread Safety: Factory instances should be stateless and thread-safe

Design Patterns

1. Builder Pattern

Fluent API for configuring models with sensible defaults and extensive customization options.

Benefits: Type safety, immutability, fluent configuration

2. Strategy Pattern

Different sampling strategies (standard, mirostat) configured via parameters.

Implementation: Parameter objects define strategy; model executes

3. Template Method Pattern

OllamaBaseChatModel defines common structure with doChat() specializations.

Benefits: Code reuse, consistent behavior, specialized implementations

4. Observer Pattern

ChatModelListener interfaces for request/response monitoring.

Benefits: Decoupled observability, extensible monitoring

5. Factory Pattern

SPI factory interfaces for custom model instantiation.

Benefits: Dependency injection, framework integration, testability

Integration Points

LangChain4j Core

The module integrates with LangChain4j through:

  • Standard model interfaces (ChatModel, LanguageModel, EmbeddingModel)
  • Shared types (ChatRequest, ChatResponse, Embedding, TextSegment)
  • Common infrastructure (HttpClient, Response<T>, TokenUsage)

Compatibility: Fully compatible with LangChain4j ecosystem; models are drop-in replacements

Ollama API

Communication with Ollama follows its REST API:

Endpoints:

  • POST /api/chat - Chat completions (streaming and non-streaming)
  • POST /api/generate - Text generation (streaming and non-streaming)
  • POST /api/embeddings - Vector embeddings
  • GET /api/tags - List models
  • POST /api/show - Model information
  • DELETE /api/delete - Remove models
  • GET /api/ps - Running models

Protocol: HTTP/1.1 with JSON request/response bodies and SSE for streaming

Error Responses: HTTP error codes mapped to exceptions
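
Because the protocol is plain HTTP with JSON bodies, a raw non-streaming chat request can be assembled with the JDK's own HTTP types. This illustrates the wire format only; the library builds and sends such requests internally, and actually sending this one would require a running Ollama server:

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Assembles (but does not send) a raw non-streaming request against the
// POST /api/chat endpoint from the table above.
public class RawChatRequest {
    public static HttpRequest build() {
        String body = "{\"model\":\"llama2\","
                + "\"messages\":[{\"role\":\"user\",\"content\":\"Hi\"}],"
                + "\"stream\":false}";
        return HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:11434/api/chat"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build();
        System.out.println(request.method() + " " + request.uri());
        // POST http://localhost:11434/api/chat
    }
}
```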

Advanced Features

Thinking/Reasoning Mode

Support for reasoning models like DeepSeek R1:

  • think(true) - Model generates structured thinking before answering
  • returnThinking(true) - Thinking text returned in AiMessage.thinking()
  • Streaming support - Thinking tokens streamed via onPartialThinking() callback

API:

OllamaChatModel model = OllamaChatModel.builder()
    .think(true)              // Enable thinking
    .returnThinking(true)     // Return thinking text
    .build();

ChatResponse response = model.chat(request);
String thinking = response.aiMessage().thinking();  // May be null

Nullability: thinking() returns null if thinking not enabled or not available

Mirostat Sampling

Advanced perplexity control for consistent output quality:

  • mirostat(2) - Enable Mirostat 2.0 algorithm
  • mirostatEta - Learning rate for dynamic adjustment (0.0-1.0)
  • mirostatTau - Target perplexity, balancing coherence against diversity (> 0.0)

Valid Values:

  • mirostat: 0 (disabled), 1 (Mirostat), 2 (Mirostat 2.0)
  • mirostatEta: > 0.0 (typically 0.01-1.0)
  • mirostatTau: > 0.0 (typically 1.0-10.0)
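
The documented ranges can be expressed as a small validation helper. This is a hypothetical illustration of the constraints above, not part of the library's API:

```java
// Illustrative validation of the documented Mirostat parameter ranges.
public class MirostatValidation {
    static void validate(int mirostat, double eta, double tau) {
        if (mirostat < 0 || mirostat > 2) {
            throw new IllegalArgumentException("mirostat must be 0, 1, or 2");
        }
        if (eta <= 0.0) {
            throw new IllegalArgumentException("mirostatEta must be > 0.0");
        }
        if (tau <= 0.0) {
            throw new IllegalArgumentException("mirostatTau must be > 0.0");
        }
    }

    public static void main(String[] args) {
        validate(2, 0.1, 5.0);        // valid: Mirostat 2.0 with typical eta/tau
        try {
            validate(3, 0.1, 5.0);    // invalid mode
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```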

Tool/Function Calling

Integration with LangChain4j's tool system:

  • Tool specifications - Define available functions
  • Tool choice - Control which tools to use
  • Tool execution - Automatic function call handling

Support: Requires Ollama model with tool support capability

Thread Safety

  • Immutable models - Built model instances are thread-safe
  • Stateless operations - No shared mutable state between requests
  • Concurrent requests - Multiple threads can safely share model instances
  • Builder isolation - Each builder is independent (not thread-safe itself)
  • HTTP client - Connection pooling is thread-safe
  • Listeners - Listener callbacks may be invoked concurrently; ensure thread-safe implementations

Guarantees:

  • Same model instance can be used from multiple threads
  • Concurrent requests to same model are safe
  • Request ordering not guaranteed (concurrent execution)
  • Conversation history must be managed by caller (stateless models)
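
The sharing guarantee can be demonstrated with a stdlib-only sketch: an immutable, stateless object standing in for a built model is safely shared across a thread pool, because each call carries its own state. `EchoModel` is a hypothetical stand-in, not a library class:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// One immutable instance, many concurrent callers: no shared mutable state
// means no synchronization is needed on the caller's side.
public class SharedInstanceDemo {
    record EchoModel(String name) {
        String chat(String prompt) {              // reads only immutable state
            return name + ": " + prompt;
        }
    }

    public static void main(String[] args) throws Exception {
        EchoModel model = new EchoModel("llama2");   // single shared instance
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<String>> futures = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            final int n = i;
            futures.add(pool.submit(() -> model.chat("request " + n)));
        }
        for (Future<String> f : futures) {
            System.out.println(f.get());             // collected in submission order
        }
        pool.shutdown();
    }
}
```

Note the last guarantee above still applies: since the model is stateless, each submitted request would need to carry its own conversation history.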

Error Handling

Build-Time Validation

Exceptions at build():

  • IllegalArgumentException - Invalid parameter values
  • IllegalStateException - Required parameters missing (e.g., modelName)

Example:

try {
    OllamaChatModel model = OllamaChatModel.builder()
        .build();  // Missing modelName
} catch (IllegalStateException e) {
    // Handle missing required parameter
}

Runtime Errors

Exceptions during requests:

  • HttpTimeoutException - Request timeout exceeded
  • IOException - Network connectivity issues
  • RuntimeException - Ollama server errors (model not found, invalid request)

Retry Logic:

  • Non-streaming: Automatic retry with exponential backoff (configurable via maxRetries)
  • Streaming: No automatic retry; errors trigger onError() callback

Example:

import java.io.IOException;
import java.net.http.HttpTimeoutException;

try {
    ChatResponse response = model.chat(request);
} catch (HttpTimeoutException e) {
    // Timeout - request took too long
} catch (IOException e) {
    // Network error - server unreachable
} catch (RuntimeException e) {
    // Server error - model not found, etc.
}
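
The non-streaming retry behavior can be sketched with a stdlib-only backoff helper; the library's actual retry internals may differ:

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

// Retry with exponential backoff: transient failures are retried up to
// maxRetries times, with the delay doubling after each attempt.
public class RetryDemo {
    static <T> T withBackoff(Callable<T> call, int maxRetries, long baseDelayMs)
            throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {                   // transient failure
                if (attempt >= maxRetries) throw e;     // retries exhausted
                Thread.sleep(baseDelayMs << attempt);   // 1x, 2x, 4x, ...
            }
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger attempts = new AtomicInteger();
        String result = withBackoff(() -> {
            if (attempts.incrementAndGet() < 3) {
                throw new IOException("server unreachable");
            }
            return "ok";
        }, 3, 10);
        System.out.println(result + " after " + attempts.get() + " attempts");
        // ok after 3 attempts
    }
}
```

Streaming calls skip this path entirely: a failure mid-stream surfaces through the handler's error callback instead.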

Performance Considerations

  • Model keep-alive - Models stay loaded in Ollama for fast subsequent requests (configurable via keepAlive)
  • Batch embedding - embedAll() sends multiple texts in single request (more efficient than individual embed() calls)
  • Streaming - Reduces perceived latency for long generations (first token arrives faster)
  • Connection pooling - HTTP client reuses connections across requests (reduces connection overhead)
  • Context window - Larger numCtx uses more memory and is slower; choose minimum required size

Optimization Tips:

// Keep models loaded longer for frequent use
.keepAlive(600)  // 10 minutes

// Batch embeddings
embedModel.embedAll(segments);  // Single request

// Smaller context for faster inference
.numCtx(2048)  // Instead of 4096 if sufficient

Extension Points

  1. Custom HTTP clients - Provide HttpClientBuilder for special networking needs (proxies, authentication, etc.)
  2. Custom headers - Static or dynamic header injection via suppliers (token refresh, authentication)
  3. SPI factories - Override default builder creation for DI frameworks
  4. Model listeners - Monitor all requests and responses for logging/metrics
  5. Custom loggers - Plug in application-specific logging (SLF4J compatible)

Example:

// Custom header supplier for token refresh
Supplier<Map<String, String>> headers = () ->
    Map.of("Authorization", "Bearer " + tokenManager.getToken());

OllamaChatModel model = OllamaChatModel.builder()
    .customHeaders(headers)
    .build();

This architecture enables both simple usage for basic cases and extensive customization for advanced scenarios.

See Also

  • Chat Models - Chat model implementations
  • Language Models - Language model implementations
  • Embedding Model - Embedding model implementation
  • Model Management - Model management utilities
  • Request Parameters - Parameter configuration
  • SPI - Service Provider Interface for extensibility
