tessl/maven-org-springframework-ai--spring-ai-ollama

Spring Boot-compatible Ollama integration providing ChatModel and EmbeddingModel implementations for running large language models locally with support for streaming, tool calling, model management, and observability.


Thinking and Reasoning Models

Enable models to show their reasoning process before providing answers.

Overview

Thinking/reasoning models (like Qwen3, DeepSeek, GPT-OSS) can emit their internal reasoning traces before generating a final answer. This improves answer quality for complex problems and provides transparency into the model's thought process.

Supported Models

Thinking capabilities are available in specific models:

  • Qwen 3 - qwen3:4b-thinking (auto-enables thinking in Ollama 0.12+)
  • DeepSeek R1 - deepseek-r1
  • DeepSeek v3.1 - deepseek-v3.1
  • GPT-OSS - gpt-oss (requires level: low/medium/high)
  • QwQ - qwq (Qwen reasoning model)

ThinkOption Interface

A sealed interface that controls thinking behavior, with two implementations.

Class Information

package org.springframework.ai.ollama.api;

public sealed interface ThinkOption {
    Object toJsonValue();
}

ThinkOption.ThinkBoolean

Boolean enable/disable for most thinking models.

public record ThinkBoolean(boolean enabled) implements ThinkOption

Constants:

  • ThinkBoolean.ENABLED - Enable thinking
  • ThinkBoolean.DISABLED - Disable thinking

Supported by:

  • Qwen 3 models
  • DeepSeek R1
  • DeepSeek v3.1

ThinkOption.ThinkLevel

String-based level control for the GPT-OSS model.

public record ThinkLevel(String level) implements ThinkOption

Constants:

  • ThinkLevel.LOW - Low thinking intensity
  • ThinkLevel.MEDIUM - Medium thinking intensity
  • ThinkLevel.HIGH - High thinking intensity

Supported by:

  • GPT-OSS model only

Valid values: "low", "medium", "high"
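The sealed hierarchy described above can be sketched as a self-contained model to show how each variant serializes for the request's think field. This is an illustrative reimplementation, not the actual org.springframework.ai.ollama.api source:

```java
import java.util.List;

// Illustrative sketch of the ThinkOption hierarchy described above --
// a stand-in for demonstration, not the library's own source.
public class ThinkOptionSketch {

    sealed interface ThinkOption permits ThinkBoolean, ThinkLevel {
        // Value placed in the request's think field: a Boolean or a String
        Object toJsonValue();
    }

    record ThinkBoolean(boolean enabled) implements ThinkOption {
        static final ThinkBoolean ENABLED = new ThinkBoolean(true);
        static final ThinkBoolean DISABLED = new ThinkBoolean(false);

        @Override
        public Object toJsonValue() {
            return enabled;
        }
    }

    record ThinkLevel(String level) implements ThinkOption {
        static final ThinkLevel LOW = new ThinkLevel("low");
        static final ThinkLevel MEDIUM = new ThinkLevel("medium");
        static final ThinkLevel HIGH = new ThinkLevel("high");

        ThinkLevel {
            // Only the three documented values are valid
            if (!List.of("low", "medium", "high").contains(level)) {
                throw new IllegalArgumentException("level must be low, medium, or high");
            }
        }

        @Override
        public Object toJsonValue() {
            return level;
        }
    }

    public static void main(String[] args) {
        System.out.println(ThinkBoolean.ENABLED.toJsonValue());  // true
        System.out.println(ThinkLevel.HIGH.toJsonValue());       // high
    }
}
```

This mirrors why the custom serialization noted later can handle both formats with one field: the boolean variant produces a JSON boolean, the level variant a JSON string.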

Basic Usage

Enabling Thinking (Boolean)

For Qwen 3, DeepSeek models:

OllamaChatOptions options = OllamaChatOptions.builder()
    .model("qwen3:4b-thinking")
    .enableThinking()  // Enable reasoning traces
    .build();

ChatResponse response = chatModel.call(
    new Prompt("Solve: If a car travels 60 miles in 90 minutes, what is its speed in mph?", options)
);

// Response includes thinking process

Disabling Thinking

OllamaChatOptions options = OllamaChatOptions.builder()
    .model("qwen3:4b-thinking")
    .disableThinking()  // Disable for this request
    .build();

Setting Thinking Level (GPT-OSS)

For GPT-OSS model:

// Low intensity
OllamaChatOptions lowOptions = OllamaChatOptions.builder()
    .model("gpt-oss")
    .thinkLow()
    .build();

// Medium intensity
OllamaChatOptions mediumOptions = OllamaChatOptions.builder()
    .model("gpt-oss")
    .thinkMedium()
    .build();

// High intensity
OllamaChatOptions highOptions = OllamaChatOptions.builder()
    .model("gpt-oss")
    .thinkHigh()
    .build();

Accessing Thinking Output

From ChatResponse

The thinking process is included in the response message:

ChatResponse response = chatModel.call(new Prompt(complexQuestion, options));

// Get the main response
String answer = response.getResult().getOutput().getText();

// Access thinking trace (if available)
AssistantMessage message = response.getResult().getOutput();
String thinkingTrace = (String) message.getMetadata().get("thinking");

if (thinkingTrace != null) {
    System.out.println("Model's reasoning:");
    System.out.println(thinkingTrace);
    System.out.println("\nFinal answer:");
    System.out.println(answer);
}

From Low-Level API

Using OllamaApi directly:

ChatRequest request = ChatRequest.builder("qwen3:4b-thinking")
    .messages(List.of(
        Message.builder(Role.USER)
            .content("What is 15% of 240?")
            .build()
    ))
    .enableThinking()
    .build();

ChatResponse response = ollamaApi.chat(request);

// Thinking is in the message
String thinking = response.message().thinking();
String content = response.message().content();

System.out.println("Reasoning: " + thinking);
System.out.println("Answer: " + content);

Configuration Options

OllamaChatOptions Methods

enableThinking()

Enable thinking mode (boolean true).

public Builder enableThinking()

Returns: Builder

Example:

OllamaChatOptions options = OllamaChatOptions.builder()
    .model("qwen3:4b-thinking")
    .enableThinking()
    .build();

disableThinking()

Disable thinking mode (boolean false).

public Builder disableThinking()

Returns: Builder

Example:

OllamaChatOptions options = OllamaChatOptions.builder()
    .model("qwen3:4b-thinking")
    .disableThinking()
    .build();

thinkLow()

Set thinking level to "low" (GPT-OSS only).

public Builder thinkLow()

Returns: Builder

thinkMedium()

Set thinking level to "medium" (GPT-OSS only).

public Builder thinkMedium()

Returns: Builder

thinkHigh()

Set thinking level to "high" (GPT-OSS only).

public Builder thinkHigh()

Returns: Builder

thinkOption()

Set think option explicitly.

public Builder thinkOption(ThinkOption thinkOption)

Parameters:

  • thinkOption (ThinkOption): ThinkBoolean or ThinkLevel instance

Returns: Builder

Example:

OllamaChatOptions options = OllamaChatOptions.builder()
    .model("qwen3:4b-thinking")
    .thinkOption(ThinkOption.ThinkBoolean.ENABLED)
    .build();

ChatRequest Methods

Same methods available on ChatRequest.Builder:

ChatRequest request = ChatRequest.builder("qwen3:4b-thinking")
    .messages(messages)
    .enableThinking()  // or .disableThinking(), .thinkLow(), etc.
    .build();

Default Behavior

In Ollama 0.12+, thinking-capable models auto-enable thinking:

// These models auto-enable thinking by default:
// - qwen3:*-thinking
// - deepseek-r1
// - deepseek-v3.1

// No need to call enableThinking() - already enabled
OllamaChatOptions options = OllamaChatOptions.builder()
    .model("qwen3:4b-thinking")
    .build();

// Thinking is enabled by default

Standard models (llama3, mistral, etc.) don't support thinking:

// This has no effect - model doesn't support thinking
OllamaChatOptions options = OllamaChatOptions.builder()
    .model("llama3")
    .enableThinking()  // Ignored
    .build();
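When a service accepts arbitrary model names, it can be useful to check whether a model belongs to one of the thinking-capable families before setting a think option. The helper below is a hypothetical sketch based only on the models listed in this document; it is not part of Spring AI:

```java
import java.util.List;

// Hypothetical helper (not part of Spring AI) that matches a model name
// against the thinking-capable families listed in this document.
public class ThinkingModels {

    private static final List<String> THINKING_PREFIXES = List.of(
        "deepseek-r1", "deepseek-v3.1", "gpt-oss", "qwq");

    public static boolean supportsThinking(String model) {
        // Qwen 3 thinking variants carry a "-thinking" tag, e.g. qwen3:4b-thinking
        if (model.startsWith("qwen3") && model.contains("-thinking")) {
            return true;
        }
        return THINKING_PREFIXES.stream().anyMatch(model::startsWith);
    }

    public static void main(String[] args) {
        System.out.println(supportsThinking("qwen3:4b-thinking")); // true
        System.out.println(supportsThinking("llama3"));            // false
    }
}
```

Note this list reflects the models named above and would need updating as new thinking-capable models are released.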

Complete Examples

Problem Solving with Thinking

@Service
public class MathSolver {

    private final OllamaChatModel thinkingModel;

    public MathSolver(OllamaApi ollamaApi) {
        this.thinkingModel = OllamaChatModel.builder()
            .ollamaApi(ollamaApi)
            .defaultOptions(OllamaChatOptions.builder()
                .model("qwen3:4b-thinking")
                .enableThinking()
                .temperature(0.2)  // Lower temp for math
                .build())
            .build();
    }

    public record Solution(String reasoning, String answer) {}

    public Solution solve(String problem) {
        ChatResponse response = thinkingModel.call(new Prompt(problem));

        AssistantMessage message = response.getResult().getOutput();
        String reasoning = (String) message.getMetadata().get("thinking");
        String answer = message.getText();

        return new Solution(reasoning, answer);
    }
}

// Usage
MathSolver solver = new MathSolver(ollamaApi);
Solution solution = solver.solve(
    "A train leaves Chicago at 3pm traveling at 60mph. " +
    "Another train leaves New York (800 miles away) at 4pm traveling at 70mph. " +
    "When do they meet?"
);

System.out.println("Reasoning:");
System.out.println(solution.reasoning());
System.out.println("\nAnswer:");
System.out.println(solution.answer());

Comparing Thinking vs Non-Thinking

public class ModelComparison {

    public void compareModels(String question) {
        OllamaApi ollamaApi = OllamaApi.builder().build();

        // Standard model
        OllamaChatModel standardModel = OllamaChatModel.builder()
            .ollamaApi(ollamaApi)
            .defaultOptions(OllamaChatOptions.builder()
                .model("qwen3:4b")  // Non-thinking version
                .build())
            .build();

        // Thinking model
        OllamaChatModel thinkingModel = OllamaChatModel.builder()
            .ollamaApi(ollamaApi)
            .defaultOptions(OllamaChatOptions.builder()
                .model("qwen3:4b-thinking")
                .enableThinking()
                .build())
            .build();

        // Compare responses
        Prompt prompt = new Prompt(question);

        System.out.println("=== Standard Model ===");
        ChatResponse standardResponse = standardModel.call(prompt);
        System.out.println(standardResponse.getResult().getOutput().getText());

        System.out.println("\n=== Thinking Model ===");
        ChatResponse thinkingResponse = thinkingModel.call(prompt);
        AssistantMessage message = thinkingResponse.getResult().getOutput();

        String reasoning = (String) message.getMetadata().get("thinking");
        if (reasoning != null) {
            System.out.println("Reasoning: " + reasoning);
        }
        System.out.println("Answer: " + message.getText());
    }
}

Dynamic Thinking Control

public class AdaptiveChat {

    private final OllamaChatModel chatModel;

    public AdaptiveChat(OllamaApi ollamaApi) {
        this.chatModel = OllamaChatModel.builder()
            .ollamaApi(ollamaApi)
            .defaultOptions(OllamaChatOptions.builder()
                .model("qwen3:4b-thinking")
                .build())
            .build();
    }

    public String chat(String message, boolean needsReasoning) {
        OllamaChatOptions options = OllamaChatOptions.builder()
            .thinkOption(needsReasoning
                ? ThinkOption.ThinkBoolean.ENABLED
                : ThinkOption.ThinkBoolean.DISABLED)
            .build();

        ChatResponse response = chatModel.call(new Prompt(message, options));
        return response.getResult().getOutput().getText();
    }

    public String complexQuestion(String question) {
        return chat(question, true);  // Enable thinking
    }

    public String simpleQuestion(String question) {
        return chat(question, false);  // Disable thinking for speed
    }
}

// Usage
AdaptiveChat chat = new AdaptiveChat(ollamaApi);

// Simple question - fast response without thinking
String greeting = chat.simpleQuestion("Hello, how are you?");

// Complex question - detailed reasoning
String solution = chat.complexQuestion(
    "If I invest $10,000 at 5% annual interest, " +
    "compounded quarterly, how much will I have after 3 years?"
);

GPT-OSS with Thinking Levels

public class GPTOSSThinking {

    private final OllamaApi ollamaApi;

    public GPTOSSThinking(OllamaApi ollamaApi) {
        this.ollamaApi = ollamaApi;
    }

    public String solveWithLevel(String problem, ThinkOption.ThinkLevel level) {
        OllamaChatModel model = OllamaChatModel.builder()
            .ollamaApi(ollamaApi)
            .defaultOptions(OllamaChatOptions.builder()
                .model("gpt-oss")
                .thinkOption(level)
                .build())
            .build();

        ChatResponse response = model.call(new Prompt(problem));
        return response.getResult().getOutput().getText();
    }

    public void compareThinkingLevels(String problem) {
        System.out.println("=== Low Thinking ===");
        System.out.println(solveWithLevel(problem, ThinkOption.ThinkLevel.LOW));

        System.out.println("\n=== Medium Thinking ===");
        System.out.println(solveWithLevel(problem, ThinkOption.ThinkLevel.MEDIUM));

        System.out.println("\n=== High Thinking ===");
        System.out.println(solveWithLevel(problem, ThinkOption.ThinkLevel.HIGH));
    }
}

Use Cases

Complex Problem Solving

Enable thinking for math, logic, and reasoning problems:

OllamaChatOptions options = OllamaChatOptions.builder()
    .model("qwen3:4b-thinking")
    .enableThinking()
    .build();

// Math problems
// Logic puzzles
// Multi-step reasoning
// Strategy planning

Code Analysis

Use thinking for code review and debugging:

String codeReview = """
Review this code and explain any issues:

public int divide(int a, int b) {
    return a / b;
}
""";

// Model will reason about edge cases, errors, etc.

Decision Making

Help users understand decision reasoning:

String decision = """
I need to choose between two job offers:
- Job A: $100k salary, 2 weeks vacation, close to home
- Job B: $120k salary, 3 weeks vacation, 1 hour commute
What factors should I consider?
""";

// Model shows reasoning about trade-offs

Educational Content

Show step-by-step problem solving:

String teaching = """
Explain how to solve this algebra problem step by step:
Solve for x: 3x + 5 = 2x + 12
""";

// Thinking trace shows each step of the solution

Best Practices

  1. Enable for Complex Tasks: Use thinking for problems requiring multi-step reasoning
  2. Disable for Simple Queries: Skip thinking for fast, simple responses
  3. Lower Temperature: Use lower temperature (0.1-0.3) for logical/math problems
  4. Extract Reasoning: Always check for thinking traces in responses
  5. Model Selection: Choose thinking-capable models for reasoning tasks
  6. Level Selection: Start with low/medium levels for GPT-OSS, increase if needed
  7. Monitor Performance: Thinking increases response time and token usage
  8. Validate Reasoning: Review thinking traces for logic errors
  9. User Transparency: Show reasoning to users for trust and understanding
  10. Fallback Logic: Handle models that don't support thinking gracefully
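Practices 4 and 10 can be combined in a small defensive extractor. The sketch below operates on a plain Map&lt;String, Object&gt; (the shape returned by AssistantMessage.getMetadata()); the helper itself is illustrative, not a Spring AI API:

```java
import java.util.Map;
import java.util.Optional;

// Illustrative helper for best practices 4 and 10: extract the thinking
// trace if present, and degrade gracefully when the model did not emit one.
public class ThinkingExtractor {

    // metadata has the shape of AssistantMessage.getMetadata(): Map<String, Object>
    public static Optional<String> extractThinking(Map<String, Object> metadata) {
        Object value = metadata.get("thinking");
        // Models without thinking support simply omit the key, so an
        // instanceof check avoids a ClassCastException on unexpected values
        return value instanceof String s && !s.isBlank()
            ? Optional.of(s)
            : Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, Object> withTrace = Map.of("thinking", "Step 1: convert minutes to hours...");
        Map<String, Object> withoutTrace = Map.of();

        System.out.println(extractThinking(withTrace).orElse("<no reasoning>"));
        System.out.println(extractThinking(withoutTrace).orElse("<no reasoning>"));
    }
}
```

Returning Optional rather than null keeps callers honest about the case where a standard model (or a disabled think option) produced no trace.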

Performance Considerations

  1. Response Time: Thinking adds latency due to additional tokens
  2. Token Usage: Reasoning traces consume more tokens
  3. Quality vs Speed: Balance reasoning depth with response time
  4. Model Size: Larger models generally produce better reasoning
  5. Temperature: Lower temperature improves reasoning consistency

Limitations

  1. Model Support: Limited to specific thinking-capable models
  2. Boolean vs Levels: Different models require different formats
  3. Quality Variance: Reasoning quality varies by model and problem
  4. No Guarantee: Models may not always produce useful reasoning
  5. Streaming: Thinking in streaming responses requires special handling
  6. Not All Problems: Some problems don't benefit from explicit reasoning

Related Documentation

  • OllamaChatOptions - Thinking configuration options
  • OllamaModel - Thinking-capable model constants
  • OllamaChatModel - Using the chat model
  • API Types - ThinkOption types

Notes

  1. Thinking is auto-enabled in Ollama 0.12+ for thinking models
  2. ThinkOption is a sealed interface with two implementations
  3. Boolean format works for most models (Qwen 3, DeepSeek)
  4. Level format is GPT-OSS specific
  5. Thinking field is in the Message record
  6. Not all Ollama models support thinking
  7. Thinking traces appear before the final answer
  8. Custom serialization/deserialization handles both boolean and string formats