Spring Boot-compatible Ollama integration providing ChatModel and EmbeddingModel implementations for running large language models locally with support for streaming, tool calling, model management, and observability.
Configuration options for Ollama chat model operations.
OllamaChatOptions provides comprehensive configuration for chat model behavior, including model selection, generation parameters, GPU/memory management, sampling control, and tool calling capabilities.
```java
package org.springframework.ai.ollama.api;

public class OllamaChatOptions implements ToolCallingChatOptions
```

Implements: `org.springframework.ai.model.tool.ToolCallingChatOptions`
```java
// Using builder
OllamaChatOptions options = OllamaChatOptions.builder()
    .model(OllamaModel.LLAMA3.id())
    .temperature(0.7)
    .build();

// Copy from existing options
OllamaChatOptions copy = OllamaChatOptions.fromOptions(existingOptions);
```

The builder provides overloaded methods for convenient configuration.
Model Selection:

```java
// Accepts String model name
public Builder model(String model);

// Accepts OllamaModel enum
public Builder model(OllamaModel model);
```

```java
// Using String
OllamaChatOptions options = OllamaChatOptions.builder()
    .model("llama3")
    .build();

// Using enum (recommended)
OllamaChatOptions options = OllamaChatOptions.builder()
    .model(OllamaModel.MISTRAL)
    .build();
```

Tool Callbacks:

```java
// Accepts List
public Builder toolCallbacks(List<ToolCallback> toolCallbacks);

// Accepts varargs
public Builder toolCallbacks(ToolCallback... toolCallbacks);
```

```java
// Using List
.toolCallbacks(List.of(callback1, callback2))

// Using varargs
.toolCallbacks(callback1, callback2, callback3)
```

Tool Names:

```java
// Accepts Set
public Builder toolNames(Set<String> toolNames);

// Accepts varargs
public Builder toolNames(String... toolNames);
```

```java
// Using Set
.toolNames(Set.of("getTool1", "getTool2"))

// Using varargs
.toolNames("getTool1", "getTool2")
```

Controls which model to use and the response format.
```java
OllamaChatOptions options = OllamaChatOptions.builder()
    // Model name (required)
    .model("llama3")
    // Response format: "json" or a JSON Schema Map
    .format("json")
    // How long to keep the model in memory (e.g., "5m", "1h")
    .keepAlive("10m")
    // Truncate inputs to fit the context length
    .truncate(true)
    .build();
```

Parameters:

- model (String): Model name from the Ollama library
- format (Object): Response format - the String "json" or a Map containing a JSON Schema (see the sketch after this list)
- keepAlive (String): Duration in Go format (e.g., "5m", "30s", "1h")
- truncate (Boolean): Auto-truncate to context length (default: true)
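Since `format` also accepts a Map containing a JSON Schema, here is a minimal sketch of the Map variant; the schema layout follows Ollama's structured-outputs convention, and the exact keys shown are illustrative assumptions:

```java
// Illustrative JSON Schema passed as a Map to constrain the response structure
Map<String, Object> schema = Map.of(
    "type", "object",
    "properties", Map.of(
        "name", Map.of("type", "string"),
        "hex", Map.of("type", "string")),
    "required", List.of("name", "hex"));

OllamaChatOptions options = OllamaChatOptions.builder()
    .model("llama3")
    .format(schema) // Map variant instead of the String "json"
    .build();
```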
Control text generation behavior.

```java
OllamaChatOptions options = OllamaChatOptions.builder()
    // Sampling temperature (0.0 - 2.0)
    .temperature(0.8)
    // Maximum tokens to generate
    .numPredict(256)
    // Random seed for reproducibility
    .seed(42)
    // Top-k sampling
    .topK(40)
    // Top-p (nucleus) sampling
    .topP(0.9)
    // Minimum probability threshold
    .minP(0.05)
    // Repetition penalties
    .repeatPenalty(1.1)
    .presencePenalty(0.0)
    .frequencyPenalty(0.0)
    // Stop sequences
    .stop(List.of("Human:", "Assistant:"))
    .build();
```

Key Parameters:

- temperature (Double): Creativity control (default: 0.8)
- numPredict (Integer): Max tokens (default: 128; -1 = infinite, -2 = fill context)
- seed (Integer): Random seed (default: -1 for random; see the reproducibility sketch below)
- topK (Integer): Consider top K tokens (default: 40)
- topP (Double): Nucleus sampling threshold (default: 0.9)
- minP (Double): Minimum probability relative to the top token (default: 0.0)
- repeatPenalty (Double): Penalize repetitions (default: 1.1)
- presencePenalty (Double): Presence penalty (default: 0.0)
- frequencyPenalty (Double): Frequency penalty (default: 0.0)
- stop (List<String>): Stop sequences
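Because `seed` defaults to -1 (random), pinning it alongside a low temperature is the usual way to get repeatable output. A brief sketch; note that exact reproducibility across hardware is not guaranteed, so treat this as best-effort:

```java
// Best-effort reproducibility: fixed seed plus near-greedy sampling
OllamaChatOptions deterministic = OllamaChatOptions.builder()
    .model("llama3")
    .temperature(0.0)
    .seed(12345)
    .build();
```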
Fine-grained control over token sampling.

```java
OllamaChatOptions options = OllamaChatOptions.builder()
    // Tail-free sampling
    .tfsZ(1.0f)
    // Typical sampling
    .typicalP(1.0f)
    // Repetition context window
    .repeatLastN(64)
    // Mirostat sampling (0=disabled, 1=Mirostat, 2=Mirostat 2.0)
    .mirostat(0)
    .mirostatTau(5.0f)
    .mirostatEta(0.1f)
    // Penalize newlines in output
    .penalizeNewline(true)
    // Number of tokens to keep from the prompt
    .numKeep(4)
    .build();
```

Advanced Parameters:

- tfsZ (Float): Tail-free sampling (default: 1.0 = disabled)
- typicalP (Float): Typical sampling (default: 1.0)
- repeatLastN (Integer): Look-back window for penalties (default: 64; 0 = disabled, -1 = numCtx)
- mirostat (Integer): Mirostat mode (0/1/2; enabled in the sketch below)
- mirostatTau (Float): Target entropy (default: 5.0)
- mirostatEta (Float): Learning rate (default: 0.1)
- penalizeNewline (Boolean): Penalize newlines (default: true)
- numKeep (Integer): Tokens to keep from the prompt (default: 4)
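The example above leaves Mirostat disabled (`mirostat(0)`); setting mode 2 switches sampling to the adaptive Mirostat 2.0 algorithm, with `mirostatTau` and `mirostatEta` controlling target entropy and learning rate:

```java
// Enable Mirostat 2.0 adaptive sampling (tau/eta shown at their defaults)
OllamaChatOptions mirostatOptions = OllamaChatOptions.builder()
    .model("llama3")
    .mirostat(2)
    .mirostatTau(5.0f)
    .mirostatEta(0.1f)
    .build();
```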
Configure hardware resource usage.

```java
OllamaChatOptions options = OllamaChatOptions.builder()
    // Context window size
    .numCtx(4096)
    // Batch size for prompt processing
    .numBatch(512)
    // GPU layers (-1 = auto, 0 = CPU only)
    .numGPU(-1)
    // Main GPU for multi-GPU setups
    .mainGPU(0)
    // Low VRAM mode
    .lowVRAM(false)
    // FP16 for the KV cache
    .f16KV(true)
    // Return logits for all tokens
    .logitsAll(false)
    // Load only the vocabulary
    .vocabOnly(false)
    // Memory mapping
    .useMMap(true)
    .useMLock(false)
    // NUMA support
    .useNUMA(false)
    // Thread count (default: auto-detect)
    .numThread(8)
    .build();
```

Hardware Parameters:

- numCtx (Integer): Context window tokens (default: 2048)
- numBatch (Integer): Prompt batch size (default: 512)
- numGPU (Integer): GPU layers (default: -1 = auto, 0 = CPU only)
- mainGPU (Integer): Primary GPU index (default: 0)
- lowVRAM (Boolean): Low VRAM mode (default: false)
- f16KV (Boolean): Use FP16 for the KV cache (default: true)
- logitsAll (Boolean): Return logits for all tokens, not just the last one; required for completions to return logprobs (default: not set/null)
- vocabOnly (Boolean): Load only the vocabulary, not the weights (default: not set/null)
- useMMap (Boolean): Memory-map the model (default: null)
- useMLock (Boolean): Lock the model in RAM (default: false)
- useNUMA (Boolean): Enable NUMA (default: false)
- numThread (Integer): CPU threads (default: auto)

Enable thinking mode for reasoning models.
```java
// Boolean enable/disable (Qwen 3, DeepSeek-v3.1, DeepSeek R1)
OllamaChatOptions options = OllamaChatOptions.builder()
    .model("qwen3:4b-thinking")
    .enableThinking() // Enable reasoning traces
    .build();
```

```java
// Disable thinking explicitly
OllamaChatOptions options = OllamaChatOptions.builder()
    .model("qwen3:4b-thinking")
    .disableThinking()
    .build();
```

```java
// String levels (GPT-OSS model)
OllamaChatOptions options = OllamaChatOptions.builder()
    .model("gpt-oss")
    .thinkHigh() // or .thinkLow(), .thinkMedium()
    .build();
```

Thinking Methods:

- enableThinking(): Enable reasoning (returns ThinkOption.ThinkBoolean.ENABLED)
- disableThinking(): Disable reasoning
- thinkLow(): Low thinking level (GPT-OSS)
- thinkMedium(): Medium thinking level (GPT-OSS)
- thinkHigh(): High thinking level (GPT-OSS)
- thinkOption(ThinkOption): Set a custom think option (see the sketch below)

See thinking.md for detailed usage.
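For cases the convenience methods don't cover, `thinkOption(ThinkOption)` takes the option value directly. A sketch assuming the `ThinkOption.ThinkBoolean` constant named in the list above (its exact import path may differ):

```java
// Equivalent to .enableThinking(), passing the ThinkOption constant directly
// (ThinkOption.ThinkBoolean.ENABLED per the method list above)
OllamaChatOptions options = OllamaChatOptions.builder()
    .model("qwen3:4b-thinking")
    .thinkOption(ThinkOption.ThinkBoolean.ENABLED)
    .build();
```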
Configure tools that the model can use.
```java
OllamaChatOptions options = OllamaChatOptions.builder()
    .model(OllamaModel.LLAMA3)
    // Register tool callbacks
    .toolCallbacks(List.of(
        FunctionToolCallback.builder("getWeather", weatherService)
            .description("Get weather for a location")
            .inputType(WeatherRequest.class)
            .build()
    ))
    // Specify which tools to enable
    .toolNames("getWeather", "getTime")
    // Enable internal tool execution
    .internalToolExecutionEnabled(true)
    // Tool context (shared data)
    .toolContext(Map.of("apiKey", "xyz123"))
    .build();
```

Tool Parameters:

- toolCallbacks (List<ToolCallback>): Tool implementations
- toolNames (Set<String>): Enabled tool names
- internalToolExecutionEnabled (Boolean): Auto-execute tools
- toolContext (Map<String, Object>): Shared context data

See tool-calling.md for detailed usage.
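The `weatherService` referenced above can be a plain `java.util.function.Function`. A hypothetical sketch of types that would satisfy the `getWeather` callback (the names and fields are illustrative, not part of the library):

```java
// Hypothetical request/response types backing the "getWeather" callback above
record WeatherRequest(String location) {}
record WeatherResponse(double temperature, String conditions) {}

// Any Function<WeatherRequest, WeatherResponse> can be passed to the builder
Function<WeatherRequest, WeatherResponse> weatherService =
    request -> new WeatherResponse(22.5, "Sunny in " + request.location());
```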
```java
OllamaChatOptions options = OllamaChatOptions.builder()
    .model(OllamaModel.LLAMA3.id())
    .temperature(0.7)
    .numPredict(512)
    .build();

OllamaChatModel chatModel = OllamaChatModel.builder()
    .ollamaApi(ollamaApi)
    .defaultOptions(options)
    .build();

ChatResponse response = chatModel.call(new Prompt("Hello!"));
```

```java
OllamaChatOptions options = OllamaChatOptions.builder()
    .model("llama3")
    .format("json")
    .build();

String prompt = "List 3 colors as JSON array with 'name' and 'hex' fields";
ChatResponse response = chatModel.call(new Prompt(prompt, options));
// Response will be valid JSON
```
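Since the response body is plain JSON text, it can be mapped onto Java types with any JSON library. A sketch using Jackson; the `Color` record is an illustrative assumption matching the fields requested in the prompt above:

```java
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;

// Illustrative record mirroring the 'name'/'hex' fields from the prompt
record Color(String name, String hex) {}

ObjectMapper mapper = new ObjectMapper();
String json = response.getResult().getOutput().getText();
// readValue throws JsonProcessingException; handle or declare it
List<Color> colors = mapper.readValue(json, new TypeReference<List<Color>>() {});
```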
.model("llama3")
.numCtx(8192) // Large context window
.numBatch(1024) // Large batch size
.numGPU(-1) // Use all GPU layers
.useMLock(true) // Lock in RAM for speed
.numThread(16) // Use 16 CPU threads
.keepAlive("30m") // Keep model loaded longer
.build();// Default options
OllamaChatOptions defaultOptions = OllamaChatOptions.builder()
.model("llama3")
.temperature(0.7)
.build();
// Override for specific request
OllamaChatOptions requestOptions = OllamaChatOptions.builder()
.temperature(0.2) // More deterministic for this request
.numPredict(100)
.build();
ChatResponse response = chatModel.call(
new Prompt("Summarize this text...", requestOptions)
);OllamaChatOptions provides several static and instance utility methods for working with options.
```java
// Filter non-supported fields from an options map
public static Map<String, Object> filterNonSupportedFields(Map<String, Object> options);

// Create from existing options (deep copy)
public static OllamaChatOptions fromOptions(OllamaChatOptions options);
```

```java
// Convert options to a Map for API requests
public Map<String, Object> toMap();

// Create a copy of these options
public OllamaChatOptions copy();
```

```java
OllamaChatOptions options = OllamaChatOptions.builder()
    .temperature(0.8)
    .topP(0.9)
    .build();

Map<String, Object> optionsMap = options.toMap();
// Use in API requests or serialization
```
.model("llama3")
.temperature(0.7)
.build();
// Create a copy (instance method)
OllamaChatOptions copy = original.copy();
// Or use static fromOptions method
OllamaChatOptions copy2 = OllamaChatOptions.fromOptions(original);
// Modify the copy
copy.setTemperature(0.9);Removes fields that are not part of the Ollama options API but are managed separately in the request (model, format, keep_alive, truncate).
```java
Map<String, Object> allOptions = Map.of(
    "temperature", 0.8,
    "model", "llama3",   // Non-supported - part of request
    "format", "json",    // Non-supported - part of request
    "keep_alive", "5m",  // Non-supported - part of request
    "truncate", true,    // Non-supported - part of request
    "top_p", 0.9
);

// Remove fields that aren't part of the Ollama options API
Map<String, Object> filtered = OllamaChatOptions.filterNonSupportedFields(allOptions);
// Returns only: {"temperature": 0.8, "top_p": 0.9}
```

Non-Supported Fields: The following fields are filtered out because they are part of the ChatRequest, not the options:

- model
- format
- keep_alive
- truncate
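Because `toMap()` serializes every field, including these synthetic ones, combining the two utilities is one way to hand-build a raw Ollama options payload. This pairing is an assumption, not a documented recipe:

```java
// Assumed pattern: serialize all options, then strip the request-level
// fields (model, format, keep_alive, truncate) before sending as "options"
Map<String, Object> payload =
    OllamaChatOptions.filterNonSupportedFields(options.toMap());
```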
The following defaults are used when options are not explicitly set:

- numCtx: 2048
- numBatch: 512
- numGPU: -1 (auto)
- mainGPU: 0
- lowVRAM: false
- f16KV: true
- numKeep: 4
- seed: -1
- numPredict: 128
- topK: 40
- topP: 0.9
- minP: 0.0
- temperature: 0.8
- repeatPenalty: 1.1
- presencePenalty: 0.0
- frequencyPenalty: 0.0
- mirostat: 0
- mirostatTau: 5.0
- mirostatEta: 0.1
- penalizeNewline: true
- truncate: true

```java
// Factual, deterministic responses
.temperature(0.1)

// Balanced (default)
.temperature(0.8)

// Creative writing
.temperature(1.2)
```

```java
// Small context for simple queries (faster, less memory)
.numCtx(2048)

// Large context for long conversations or documents
.numCtx(8192)

// Match context to model capabilities
.numCtx(32768) // For models that support it
```

```java
// CPU-only execution
.numGPU(0)

// Auto-detect optimal GPU layers
.numGPU(-1)

// Manual GPU allocation for specific models
.numGPU(35) // Load a specific number of layers onto the GPU
```

Notes:

- OllamaChatOptions are separate from request-level options in ChatRequest
- Some options (model, format, keepAlive, truncate) are "synthetic" - they are part of the request but managed through options for convenience
- Tool-calling options (toolCallbacks, toolNames, etc.) are inherited from ToolCallingChatOptions
- Thinking models (e.g., qwen3:4b-thinking) auto-enable thinking by default in Ollama 0.12+