tessl/maven-org-springframework-ai--spring-ai-ollama

Spring Boot-compatible Ollama integration providing ChatModel and EmbeddingModel implementations for running large language models locally with support for streaming, tool calling, model management, and observability.


Thinking and Reasoning Models

Enable models to show their reasoning process before providing answers.

Overview

Thinking/reasoning models (like Qwen3, DeepSeek, GPT-OSS) can emit their internal reasoning traces before generating a final answer. This improves answer quality for complex problems and provides transparency into the model's thought process.

Supported Models

Thinking capabilities are available in specific models:

  • Qwen 3 - qwen3:4b-thinking (auto-enables thinking in Ollama 0.12+)
  • DeepSeek R1 - deepseek-r1
  • DeepSeek v3.1 - deepseek-v3.1
  • GPT-OSS - gpt-oss (requires level: low/medium/high)
  • QwQ - qwq (Qwen reasoning model)

ThinkOption Interface

A sealed interface that controls thinking behavior, with two implementations.

Class Information

package org.springframework.ai.ollama.api;

public sealed interface ThinkOption {
    Object toJsonValue();
}

ThinkOption.ThinkBoolean

Boolean enable/disable for most thinking models.

public record ThinkBoolean(boolean enabled) implements ThinkOption

Constants:

  • ThinkBoolean.ENABLED - Enable thinking
  • ThinkBoolean.DISABLED - Disable thinking

Supported by:

  • Qwen 3 models
  • DeepSeek R1
  • DeepSeek v3.1

ThinkOption.ThinkLevel

String-based level control for the GPT-OSS model.

public record ThinkLevel(String level) implements ThinkOption

Constants:

  • ThinkLevel.LOW - Low thinking intensity
  • ThinkLevel.MEDIUM - Medium thinking intensity
  • ThinkLevel.HIGH - High thinking intensity

Supported by:

  • GPT-OSS model only

Valid values: "low", "medium", "high"
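The sealed hierarchy described above can be sketched as a self-contained model to show how each variant serializes for the request's think field. This is an illustrative reimplementation, not the actual org.springframework.ai.ollama.api source:

```java
import java.util.List;

// Illustrative sketch of the ThinkOption hierarchy described above --
// a stand-in for demonstration, not the library's own source.
public class ThinkOptionSketch {

    sealed interface ThinkOption permits ThinkBoolean, ThinkLevel {
        // Value placed in the request's think field: a Boolean or a String
        Object toJsonValue();
    }

    record ThinkBoolean(boolean enabled) implements ThinkOption {
        static final ThinkBoolean ENABLED = new ThinkBoolean(true);
        static final ThinkBoolean DISABLED = new ThinkBoolean(false);

        @Override
        public Object toJsonValue() {
            return enabled;
        }
    }

    record ThinkLevel(String level) implements ThinkOption {
        static final ThinkLevel LOW = new ThinkLevel("low");
        static final ThinkLevel MEDIUM = new ThinkLevel("medium");
        static final ThinkLevel HIGH = new ThinkLevel("high");

        ThinkLevel {
            // Only the three documented values are valid
            if (!List.of("low", "medium", "high").contains(level)) {
                throw new IllegalArgumentException("level must be low, medium, or high");
            }
        }

        @Override
        public Object toJsonValue() {
            return level;
        }
    }

    public static void main(String[] args) {
        System.out.println(ThinkBoolean.ENABLED.toJsonValue());  // true
        System.out.println(ThinkLevel.HIGH.toJsonValue());       // high
    }
}
```

This mirrors why the custom serialization noted later can handle both formats with one field: the boolean variant produces a JSON boolean, the level variant a JSON string.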

Basic Usage

Enabling Thinking (Boolean)

For Qwen 3, DeepSeek models:

OllamaChatOptions options = OllamaChatOptions.builder()
    .model("qwen3:4b-thinking")
    .enableThinking()  // Enable reasoning traces
    .build();

ChatResponse response = chatModel.call(
    new Prompt("Solve: If a car travels 60 miles in 90 minutes, what is its speed in mph?", options)
);

// Response includes thinking process

Disabling Thinking

OllamaChatOptions options = OllamaChatOptions.builder()
    .model("qwen3:4b-thinking")
    .disableThinking()  // Disable for this request
    .build();

Setting Thinking Level (GPT-OSS)

For GPT-OSS model:

// Low intensity
OllamaChatOptions lowOptions = OllamaChatOptions.builder()
    .model("gpt-oss")
    .thinkLow()
    .build();

// Medium intensity
OllamaChatOptions mediumOptions = OllamaChatOptions.builder()
    .model("gpt-oss")
    .thinkMedium()
    .build();

// High intensity
OllamaChatOptions highOptions = OllamaChatOptions.builder()
    .model("gpt-oss")
    .thinkHigh()
    .build();

Accessing Thinking Output

From ChatResponse

The thinking process is included in the response message:

ChatResponse response = chatModel.call(new Prompt(complexQuestion, options));

// Get the main response
String answer = response.getResult().getOutput().getText();

// Access thinking trace (if available)
AssistantMessage message = response.getResult().getOutput();
String thinkingTrace = (String) message.getMetadata().get("thinking");

if (thinkingTrace != null) {
    System.out.println("Model's reasoning:");
    System.out.println(thinkingTrace);
    System.out.println("\nFinal answer:");
    System.out.println(answer);
}

From Low-Level API

Using OllamaApi directly:

ChatRequest request = ChatRequest.builder("qwen3:4b-thinking")
    .messages(List.of(
        Message.builder(Role.USER)
            .content("What is 15% of 240?")
            .build()
    ))
    .enableThinking()
    .build();

ChatResponse response = ollamaApi.chat(request);

// Thinking is in the message
String thinking = response.message().thinking();
String content = response.message().content();

System.out.println("Reasoning: " + thinking);
System.out.println("Answer: " + content);

Configuration Options

OllamaChatOptions Methods

enableThinking()

Enable thinking mode (boolean true).

public Builder enableThinking()

Returns: Builder

Example:

OllamaChatOptions options = OllamaChatOptions.builder()
    .model("qwen3:4b-thinking")
    .enableThinking()
    .build();

disableThinking()

Disable thinking mode (boolean false).

public Builder disableThinking()

Returns: Builder

Example:

OllamaChatOptions options = OllamaChatOptions.builder()
    .model("qwen3:4b-thinking")
    .disableThinking()
    .build();

thinkLow()

Set thinking level to "low" (GPT-OSS only).

public Builder thinkLow()

Returns: Builder

thinkMedium()

Set thinking level to "medium" (GPT-OSS only).

public Builder thinkMedium()

Returns: Builder

thinkHigh()

Set thinking level to "high" (GPT-OSS only).

public Builder thinkHigh()

Returns: Builder

thinkOption()

Set think option explicitly.

public Builder thinkOption(ThinkOption thinkOption)

Parameters:

  • thinkOption (ThinkOption): ThinkBoolean or ThinkLevel instance

Returns: Builder

Example:

OllamaChatOptions options = OllamaChatOptions.builder()
    .model("qwen3:4b-thinking")
    .thinkOption(ThinkOption.ThinkBoolean.ENABLED)
    .build();

ChatRequest Methods

Same methods available on ChatRequest.Builder:

ChatRequest request = ChatRequest.builder("qwen3:4b-thinking")
    .messages(messages)
    .enableThinking()  // or .disableThinking(), .thinkLow(), etc.
    .build();

Default Behavior

In Ollama 0.12+, thinking-capable models auto-enable thinking:

// These models auto-enable thinking by default:
// - qwen3:*-thinking
// - deepseek-r1
// - deepseek-v3.1

// No need to call enableThinking() - already enabled
OllamaChatOptions options = OllamaChatOptions.builder()
    .model("qwen3:4b-thinking")
    .build();

// Thinking is enabled by default

Standard models (llama3, mistral, etc.) don't support thinking:

// This has no effect - model doesn't support thinking
OllamaChatOptions options = OllamaChatOptions.builder()
    .model("llama3")
    .enableThinking()  // Ignored
    .build();
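When a service accepts arbitrary model names, it can be useful to check whether a model belongs to one of the thinking-capable families before setting a think option. The helper below is a hypothetical sketch based only on the models listed in this document; it is not part of Spring AI:

```java
import java.util.List;

// Hypothetical helper (not part of Spring AI) that matches a model name
// against the thinking-capable families listed in this document.
public class ThinkingModels {

    private static final List<String> THINKING_PREFIXES = List.of(
        "deepseek-r1", "deepseek-v3.1", "gpt-oss", "qwq");

    public static boolean supportsThinking(String model) {
        // Qwen 3 thinking variants carry a "-thinking" tag, e.g. qwen3:4b-thinking
        if (model.startsWith("qwen3") && model.contains("-thinking")) {
            return true;
        }
        return THINKING_PREFIXES.stream().anyMatch(model::startsWith);
    }

    public static void main(String[] args) {
        System.out.println(supportsThinking("qwen3:4b-thinking")); // true
        System.out.println(supportsThinking("llama3"));            // false
    }
}
```

Note this list reflects the models named above and would need updating as new thinking-capable models are released.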

Complete Examples

Problem Solving with Thinking

@Service
public class MathSolver {

    private final OllamaChatModel thinkingModel;

    public MathSolver(OllamaApi ollamaApi) {
        this.thinkingModel = OllamaChatModel.builder()
            .ollamaApi(ollamaApi)
            .defaultOptions(OllamaChatOptions.builder()
                .model("qwen3:4b-thinking")
                .enableThinking()
                .temperature(0.2)  // Lower temp for math
                .build())
            .build();
    }

    public record Solution(String reasoning, String answer) {}

    public Solution solve(String problem) {
        ChatResponse response = thinkingModel.call(new Prompt(problem));

        AssistantMessage message = response.getResult().getOutput();
        String reasoning = (String) message.getMetadata().get("thinking");
        String answer = message.getText();

        return new Solution(reasoning, answer);
    }
}

// Usage
MathSolver solver = new MathSolver(ollamaApi);
Solution solution = solver.solve(
    "A train leaves Chicago at 3pm traveling at 60mph. " +
    "Another train leaves New York (800 miles away) at 4pm traveling at 70mph. " +
    "When do they meet?"
);

System.out.println("Reasoning:");
System.out.println(solution.reasoning());
System.out.println("\nAnswer:");
System.out.println(solution.answer());

Comparing Thinking vs Non-Thinking

public class ModelComparison {

    public void compareModels(String question) {
        OllamaApi ollamaApi = OllamaApi.builder().build();

        // Standard model
        OllamaChatModel standardModel = OllamaChatModel.builder()
            .ollamaApi(ollamaApi)
            .defaultOptions(OllamaChatOptions.builder()
                .model("qwen3:4b")  // Non-thinking version
                .build())
            .build();

        // Thinking model
        OllamaChatModel thinkingModel = OllamaChatModel.builder()
            .ollamaApi(ollamaApi)
            .defaultOptions(OllamaChatOptions.builder()
                .model("qwen3:4b-thinking")
                .enableThinking()
                .build())
            .build();

        // Compare responses
        Prompt prompt = new Prompt(question);

        System.out.println("=== Standard Model ===");
        ChatResponse standardResponse = standardModel.call(prompt);
        System.out.println(standardResponse.getResult().getOutput().getText());

        System.out.println("\n=== Thinking Model ===");
        ChatResponse thinkingResponse = thinkingModel.call(prompt);
        AssistantMessage message = thinkingResponse.getResult().getOutput();

        String reasoning = (String) message.getMetadata().get("thinking");
        if (reasoning != null) {
            System.out.println("Reasoning: " + reasoning);
        }
        System.out.println("Answer: " + message.getText());
    }
}

Dynamic Thinking Control

public class AdaptiveChat {

    private final OllamaChatModel chatModel;

    public AdaptiveChat(OllamaApi ollamaApi) {
        this.chatModel = OllamaChatModel.builder()
            .ollamaApi(ollamaApi)
            .defaultOptions(OllamaChatOptions.builder()
                .model("qwen3:4b-thinking")
                .build())
            .build();
    }

    public String chat(String message, boolean needsReasoning) {
        OllamaChatOptions options = OllamaChatOptions.builder()
            .thinkOption(needsReasoning
                ? ThinkOption.ThinkBoolean.ENABLED
                : ThinkOption.ThinkBoolean.DISABLED)
            .build();

        ChatResponse response = chatModel.call(new Prompt(message, options));
        return response.getResult().getOutput().getText();
    }

    public String complexQuestion(String question) {
        return chat(question, true);  // Enable thinking
    }

    public String simpleQuestion(String question) {
        return chat(question, false);  // Disable thinking for speed
    }
}

// Usage
AdaptiveChat chat = new AdaptiveChat(ollamaApi);

// Simple question - fast response without thinking
String greeting = chat.simpleQuestion("Hello, how are you?");

// Complex question - detailed reasoning
String solution = chat.complexQuestion(
    "If I invest $10,000 at 5% annual interest, " +
    "compounded quarterly, how much will I have after 3 years?"
);

GPT-OSS with Thinking Levels

public class GPTOSSThinking {

    private final OllamaApi ollamaApi;

    public GPTOSSThinking(OllamaApi ollamaApi) {
        this.ollamaApi = ollamaApi;
    }

    public String solveWithLevel(String problem, ThinkOption.ThinkLevel level) {
        OllamaChatModel model = OllamaChatModel.builder()
            .ollamaApi(ollamaApi)
            .defaultOptions(OllamaChatOptions.builder()
                .model("gpt-oss")
                .thinkOption(level)
                .build())
            .build();

        ChatResponse response = model.call(new Prompt(problem));
        return response.getResult().getOutput().getText();
    }

    public void compareThinkingLevels(String problem) {
        System.out.println("=== Low Thinking ===");
        System.out.println(solveWithLevel(problem, ThinkOption.ThinkLevel.LOW));

        System.out.println("\n=== Medium Thinking ===");
        System.out.println(solveWithLevel(problem, ThinkOption.ThinkLevel.MEDIUM));

        System.out.println("\n=== High Thinking ===");
        System.out.println(solveWithLevel(problem, ThinkOption.ThinkLevel.HIGH));
    }
}

Use Cases

Complex Problem Solving

Enable thinking for math, logic, and reasoning problems:

OllamaChatOptions options = OllamaChatOptions.builder()
    .model("qwen3:4b-thinking")
    .enableThinking()
    .build();

// Math problems
// Logic puzzles
// Multi-step reasoning
// Strategy planning

Code Analysis

Use thinking for code review and debugging:

String codeReview = """
Review this code and explain any issues:

public int divide(int a, int b) {
    return a / b;
}
""";

// Model will reason about edge cases, errors, etc.

Decision Making

Help users understand decision reasoning:

String decision = """
I need to choose between two job offers:
- Job A: $100k salary, 2 weeks vacation, close to home
- Job B: $120k salary, 3 weeks vacation, 1 hour commute
What factors should I consider?
""";

// Model shows reasoning about trade-offs

Educational Content

Show step-by-step problem solving:

String teaching = """
Explain how to solve this algebra problem step by step:
Solve for x: 3x + 5 = 2x + 12
""";

// Thinking trace shows each step of the solution

Best Practices

  1. Enable for Complex Tasks: Use thinking for problems requiring multi-step reasoning
  2. Disable for Simple Queries: Skip thinking for fast, simple responses
  3. Lower Temperature: Use lower temperature (0.1-0.3) for logical/math problems
  4. Extract Reasoning: Always check for thinking traces in responses
  5. Model Selection: Choose thinking-capable models for reasoning tasks
  6. Level Selection: Start with low/medium levels for GPT-OSS, increase if needed
  7. Monitor Performance: Thinking increases response time and token usage
  8. Validate Reasoning: Review thinking traces for logic errors
  9. User Transparency: Show reasoning to users for trust and understanding
  10. Fallback Logic: Handle models that don't support thinking gracefully
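Practices 4 and 10 can be combined in a small defensive extractor. The sketch below operates on a plain Map&lt;String, Object&gt; (the shape returned by AssistantMessage.getMetadata()); the helper itself is illustrative, not a Spring AI API:

```java
import java.util.Map;
import java.util.Optional;

// Illustrative helper for best practices 4 and 10: extract the thinking
// trace if present, and degrade gracefully when the model did not emit one.
public class ThinkingExtractor {

    // metadata has the shape of AssistantMessage.getMetadata(): Map<String, Object>
    public static Optional<String> extractThinking(Map<String, Object> metadata) {
        Object value = metadata.get("thinking");
        // Models without thinking support simply omit the key, so an
        // instanceof check avoids a ClassCastException on unexpected values
        return value instanceof String s && !s.isBlank()
            ? Optional.of(s)
            : Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, Object> withTrace = Map.of("thinking", "Step 1: convert minutes to hours...");
        Map<String, Object> withoutTrace = Map.of();

        System.out.println(extractThinking(withTrace).orElse("<no reasoning>"));
        System.out.println(extractThinking(withoutTrace).orElse("<no reasoning>"));
    }
}
```

Returning Optional rather than null keeps callers honest about the case where a standard model (or a disabled think option) produced no trace.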

Performance Considerations

  1. Response Time: Thinking adds latency due to additional tokens
  2. Token Usage: Reasoning traces consume more tokens
  3. Quality vs Speed: Balance reasoning depth with response time
  4. Model Size: Larger models generally produce better reasoning
  5. Temperature: Lower temperature improves reasoning consistency

Limitations

  1. Model Support: Limited to specific thinking-capable models
  2. Boolean vs Levels: Different models require different formats
  3. Quality Variance: Reasoning quality varies by model and problem
  4. No Guarantee: Models may not always produce useful reasoning
  5. Streaming: Thinking in streaming responses requires special handling
  6. Not All Problems: Some problems don't benefit from explicit reasoning

Related Documentation

  • OllamaChatOptions - Thinking configuration options
  • OllamaModel - Thinking-capable model constants
  • OllamaChatModel - Using the chat model
  • API Types - ThinkOption types

Notes

  1. Thinking is auto-enabled in Ollama 0.12+ for thinking models
  2. ThinkOption is a sealed interface with two implementations
  3. Boolean format works for most models (Qwen 3, DeepSeek)
  4. Level format is GPT-OSS specific
  5. Thinking field is in the Message record
  6. Not all Ollama models support thinking
  7. Thinking traces appear before the final answer
  8. Custom serialization/deserialization handles both boolean and string formats