tessl/maven-org-springframework-ai--spring-ai-ollama

Spring Boot-compatible Ollama integration providing ChatModel and EmbeddingModel implementations for running large language models locally with support for streaming, tool calling, model management, and observability.

docs/reference/models.md

OllamaModel Enum

Pre-configured model identifiers for Ollama.

Overview

OllamaModel is an enum providing type-safe constants for popular Ollama models. It implements ChatModelDescription and provides consistent model names across your application.

Class Information

package org.springframework.ai.ollama.api;

public enum OllamaModel implements ChatModelDescription

Implements: org.springframework.ai.model.ChatModelDescription
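
Because OllamaModel is a standard Java enum, you can enumerate all constants together with their wire identifiers. A minimal sketch using only standard enum methods plus the id() accessor documented below:

import org.springframework.ai.ollama.api.OllamaModel;

// Print every constant alongside the identifier sent to the Ollama API
for (OllamaModel model : OllamaModel.values()) {
    System.out.println(model.name() + " -> " + model.id());
}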

Using Model Constants

In Options

// With chat options
OllamaChatOptions options = OllamaChatOptions.builder()
    .model(OllamaModel.LLAMA3)  // Type-safe model selection
    .temperature(0.7)
    .build();

// With embedding options
OllamaEmbeddingOptions options = OllamaEmbeddingOptions.builder()
    .model(OllamaModel.NOMIC_EMBED_TEXT)
    .build();

Getting Model ID

String modelId = OllamaModel.LLAMA3.id();  // "llama3"
String modelName = OllamaModel.LLAMA3.getName();  // Same as id()

Direct String Use

// Using string directly
.model(OllamaModel.MISTRAL.id())

// Or let the builder handle it
.model(OllamaModel.MISTRAL)

Available Models

Qwen Family

Alibaba's Qwen models, offering strong multilingual capabilities.

// Qwen 2.5 models
OllamaModel.QWEN_2_5_3B          // "qwen2.5:3b" - 3B parameter model
OllamaModel.QWEN_2_5_7B          // "qwen2.5" - 7B model (default)

// Vision-language model
OllamaModel.QWEN2_5_VL           // "qwen2.5vl" - Multimodal model

// Qwen 3 models
OllamaModel.QWEN3_7B             // "qwen3:7b" - Latest generation 7B
OllamaModel.QWEN3_4B             // "qwen3:4b" - 4B model
OllamaModel.QWEN3_4B_THINKING    // "qwen3:4b-thinking" - With reasoning
OllamaModel.QWEN_3_1_7_B         // "qwen3:1.7b" - 1.7B model
OllamaModel.QWEN_3_06B           // "qwen3:0.6b" - Smallest Qwen3

// Reasoning model
OllamaModel.QWQ                  // "qwq" - Qwen reasoning model

Key Features:

  • Strong multilingual support (especially Chinese/English)
  • Vision capabilities (Qwen2.5VL)
  • Reasoning/thinking support (Qwen3 thinking variants)
  • Range of sizes for different use cases

Llama Family

Meta's openly available models, widely used and well supported.

// Standard models
OllamaModel.LLAMA2               // "llama2" - 7B-70B range
OllamaModel.LLAMA3               // "llama3" - 8B-70B range
OllamaModel.LLAMA3_1             // "llama3.1" - 8B model

// Llama 3.2 variants
OllamaModel.LLAMA3_2             // "llama3.2" - 3B model
OllamaModel.LLAMA3_2_1B          // "llama3.2:1b" - 1B model
OllamaModel.LLAMA3_2_3B          // "llama3.2:3b" - 3B model

// Vision models
OllamaModel.LLAMA3_2_VISION_11b  // "llama3.2-vision" - 11B vision model
OllamaModel.LLAMA3_2_VISION_90b  // "llama3.2-vision:90b" - 90B vision model

// Uncensored variant
OllamaModel.LLAMA2_UNCENSORED    // "llama2-uncensored"

// Code-specialized
OllamaModel.CODELLAMA            // "codellama" - Code generation

Key Features:

  • Excellent general-purpose models
  • Strong instruction following
  • Vision capabilities (Llama 3.2 Vision)
  • Code specialization (CodeLlama)
  • Wide range of sizes

Mistral Family

High-performance models from Mistral AI.

OllamaModel.MISTRAL              // "mistral" - 7B model
OllamaModel.MISTRAL_NEMO         // "mistral-nemo" - 12B with 128k context

Key Features:

  • High quality output
  • Long context (Mistral Nemo: 128k tokens)
  • Efficient inference
  • Strong tool calling support (see the sketch after this list)
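
To illustrate the tool-calling point, here is a minimal sketch using Spring AI's ChatClient and @Tool annotation. WeatherTools and getCurrentWeather are hypothetical names, and chatModel is assumed to be an OllamaChatModel configured with OllamaModel.MISTRAL:

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.tool.annotation.Tool;

class WeatherTools {
    // Hypothetical tool the model can invoke during generation
    @Tool(description = "Get the current weather for a city")
    String getCurrentWeather(String city) {
        return "Sunny, 22C in " + city;  // stub implementation
    }
}

// chatModel: an OllamaChatModel configured with a tool-capable model (assumed)
String answer = ChatClient.create(chatModel)
    .prompt()
    .user("What's the weather in Paris?")
    .tools(new WeatherTools())
    .call()
    .content();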

Phi Family

Microsoft's compact, efficient models.

OllamaModel.PHI                  // "phi" - Phi-2 2.7B
OllamaModel.PHI3                 // "phi3" - Phi-3 3.8B
OllamaModel.DOLPHIN_PHI          // "dolphin-phi" - Uncensored 2.7B

Key Features:

  • Small but capable
  • Fast inference
  • Good for resource-constrained environments

Gemma Family

Google's lightweight models.

OllamaModel.GEMMA                // "gemma" - 2B-7B range
OllamaModel.GEMMA3               // "gemma3" - Latest generation

Key Features:

  • Lightweight and fast
  • Strong performance for size
  • Good for edge deployment

Vision/Multimodal Models

Models with image understanding capabilities.

// Dedicated vision models
OllamaModel.LLAVA                // "llava" - LLaVA vision model
OllamaModel.MOONDREAM            // "moondream" - Efficient edge vision model

// Vision-capable variants (see Qwen and Llama sections)
OllamaModel.QWEN2_5_VL
OllamaModel.LLAMA3_2_VISION_11b
OllamaModel.LLAMA3_2_VISION_90b

Usage:

OllamaChatOptions options = OllamaChatOptions.builder()
    .model(OllamaModel.LLAVA)
    .build();

// Use with images in messages
UserMessage message = UserMessage.builder()
    .text("What's in this image?")
    .media(List.of(new Media(MimeTypeUtils.IMAGE_PNG, imageResource)))
    .build();

Embedding Models

Specialized models for generating embeddings.

OllamaModel.NOMIC_EMBED_TEXT     // "nomic-embed-text" - Large context
OllamaModel.MXBAI_EMBED_LARGE    // "mxbai-embed-large" - State-of-the-art

Usage:

OllamaEmbeddingOptions options = OllamaEmbeddingOptions.builder()
    .model(OllamaModel.NOMIC_EMBED_TEXT)
    .build();

OllamaEmbeddingModel embeddingModel = OllamaEmbeddingModel.builder()
    .ollamaApi(ollamaApi)
    .defaultOptions(options)
    .build();

Features:

  • Nomic Embed Text: High-quality, large context (8192 tokens)
  • MxBAI Embed Large: State-of-the-art embeddings from MixedBread AI
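
To illustrate working with the returned vectors, the sketch below computes cosine similarity between two embeddings. The similarity math is inlined (it is not a library helper), and embeddingModel is the model built in the usage example above:

// Embed two texts with the embedding model configured above
float[] a = embeddingModel.embed("The cat sits on the mat");
float[] b = embeddingModel.embed("A feline rests on a rug");

// Plain cosine similarity over the raw float vectors
double dot = 0, normA = 0, normB = 0;
for (int i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
}
double similarity = dot / (Math.sqrt(normA) * Math.sqrt(normB));
System.out.println("Cosine similarity: " + similarity);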

Specialized Models

Models fine-tuned for specific tasks.

OllamaModel.NEURAL_CHAT          // "neural-chat" - Conversational
OllamaModel.STARLING_LM          // "starling-lm" - Starling-7B
OllamaModel.ORCA_MINI            // "orca-mini" - 3B-70B range

Model Selection Guide

By Size

Tiny (< 1B parameters)

  • QWEN_3_06B - 0.6B

Small (1-3B parameters)

  • LLAMA3_2_1B - 1B
  • QWEN_3_1_7_B - 1.7B
  • PHI - 2.7B
  • LLAMA3_2_3B - 3B
  • QWEN_2_5_3B - 3B
  • GEMMA - 2B

Medium (4-8B parameters)

  • QWEN3_4B - 4B
  • MISTRAL - 7B
  • LLAMA3 - 8B
  • QWEN_2_5_7B - 7B

Large (10B+ parameters)

  • LLAMA3_2_VISION_11b - 11B
  • MISTRAL_NEMO - 12B
  • LLAMA3_2_VISION_90b - 90B
  • LLAMA2 - up to 70B

By Capability

General Chat

  • LLAMA3 - Excellent all-around
  • MISTRAL - High quality
  • QWEN3_7B - Strong multilingual

Code Generation

  • CODELLAMA - Specialized for code
  • LLAMA3 - Good general coding
  • MISTRAL - Strong logical reasoning

Long Context

  • MISTRAL_NEMO - 128k tokens
  • NOMIC_EMBED_TEXT - 8192 tokens (embeddings)

Vision/Multimodal

  • LLAVA - Dedicated vision
  • LLAMA3_2_VISION_11b - Balance of size/capability
  • QWEN2_5_VL - Multimodal + multilingual
  • MOONDREAM - Efficient edge vision

Reasoning/Thinking

  • QWQ - Qwen reasoning model
  • QWEN3_4B_THINKING - With thinking traces

Embeddings

  • NOMIC_EMBED_TEXT - Large context
  • MXBAI_EMBED_LARGE - State-of-the-art

Multilingual

  • QWEN3_7B - Strong Chinese/English
  • QWEN_2_5_7B - Multilingual
  • LLAMA3 - Good multilingual support

By Resource Requirements

Edge/Mobile (< 2GB RAM)

  • QWEN_3_06B
  • LLAMA3_2_1B
  • MOONDREAM (vision)

Consumer Hardware (4-8GB RAM)

  • PHI3
  • QWEN3_4B
  • MISTRAL
  • LLAMA3_2_3B

Workstation (16GB+ RAM)

  • LLAMA3 (8B)
  • QWEN3_7B
  • MISTRAL_NEMO

Server (32GB+ RAM)

  • LLAMA3 (70B)
  • LLAMA3_2_VISION_90b

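One way to act on this guide in code is a small selection helper. The tiers and mappings below simply restate the lists above; ResourceTier and pickChatModelId are illustrative names, not part of this library:

enum ResourceTier { EDGE, CONSUMER, WORKSTATION, SERVER }

static String pickChatModelId(ResourceTier tier) {
    return switch (tier) {
        case EDGE -> OllamaModel.LLAMA3_2_1B.id();    // < 2GB RAM
        case CONSUMER -> OllamaModel.QWEN3_4B.id();   // 4-8GB RAM
        case WORKSTATION -> OllamaModel.LLAMA3.id();  // 16GB+ RAM (8B default tag)
        case SERVER -> "llama3:70b";                  // 32GB+ RAM, explicit size tag
    };
}
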
Usage Examples

Basic Model Selection

OllamaChatOptions options = OllamaChatOptions.builder()
    .model(OllamaModel.LLAMA3)
    .temperature(0.7)
    .build();

OllamaChatModel chatModel = OllamaChatModel.builder()
    .ollamaApi(ollamaApi)
    .defaultOptions(options)
    .build();

Model Switching

// Default model
OllamaChatModel chatModel = OllamaChatModel.builder()
    .ollamaApi(ollamaApi)
    .defaultOptions(OllamaChatOptions.builder()
        .model(OllamaModel.LLAMA3)
        .build())
    .build();

// Override for specific request
OllamaChatOptions requestOptions = OllamaChatOptions.builder()
    .model(OllamaModel.QWEN3_4B_THINKING)  // Use thinking model
    .enableThinking()
    .build();

ChatResponse response = chatModel.call(
    new Prompt("Solve this puzzle...", requestOptions)
);

Vision Model Usage

OllamaChatModel visionModel = OllamaChatModel.builder()
    .ollamaApi(ollamaApi)
    .defaultOptions(OllamaChatOptions.builder()
        .model(OllamaModel.LLAVA)
        .build())
    .build();

UserMessage message = UserMessage.builder()
    .text("Describe this image")
    .media(List.of(new Media(MimeTypeUtils.IMAGE_PNG, imageResource)))
    .build();

ChatResponse response = visionModel.call(new Prompt(message));

Code Generation

OllamaChatModel codeModel = OllamaChatModel.builder()
    .ollamaApi(ollamaApi)
    .defaultOptions(OllamaChatOptions.builder()
        .model(OllamaModel.CODELLAMA)
        .temperature(0.2)  // Lower temp for more deterministic code
        .build())
    .build();

String code = codeModel.call(
    new Prompt("Write a function to sort an array")
).getResult().getOutput().getText();

Embedding Generation

OllamaEmbeddingModel embeddingModel = OllamaEmbeddingModel.builder()
    .ollamaApi(ollamaApi)
    .defaultOptions(OllamaEmbeddingOptions.builder()
        .model(OllamaModel.NOMIC_EMBED_TEXT)
        .build())
    .build();

float[] embedding = embeddingModel.embed("Hello, world!");

Model Comparison

List<OllamaModel> modelsToTest = List.of(
    OllamaModel.LLAMA3,
    OllamaModel.MISTRAL,
    OllamaModel.QWEN3_7B
);

String prompt = "Explain quantum computing";

for (OllamaModel model : modelsToTest) {
    OllamaChatOptions options = OllamaChatOptions.builder()
        .model(model)
        .build();

    ChatResponse response = chatModel.call(new Prompt(prompt, options));
    System.out.println(model.id() + ": " + response.getResult().getOutput().getText());
}

Methods

id()

Get the model identifier string.

String id = OllamaModel.LLAMA3.id();  // "llama3"

Returns: String model identifier

getName()

Get the model name (same as id()).

String name = OllamaModel.LLAMA3.getName();  // "llama3"

Returns: String model name

Note: This method comes from the ChatModelDescription interface.
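
Because the method is inherited, any constant can be passed where a ChatModelDescription is expected. A minimal sketch:

import org.springframework.ai.model.ChatModelDescription;

// The enum constant satisfies the interface directly
ChatModelDescription description = OllamaModel.LLAMA3;
System.out.println(description.getName());  // "llama3"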

Best Practices

  1. Use Constants: Prefer enum constants over string literals

    // Good
    .model(OllamaModel.LLAMA3)
    
    // Avoid
    .model("llama3")
  2. Select Appropriate Size: Match model size to your resources

    // Edge device
    .model(OllamaModel.QWEN_3_06B)
    
    // Workstation
    .model(OllamaModel.LLAMA3)
  3. Use Specialized Models: Choose models optimized for your task

    // Code generation
    .model(OllamaModel.CODELLAMA)
    
    // Vision tasks
    .model(OllamaModel.LLAVA)
    
    // Embeddings
    .model(OllamaModel.NOMIC_EMBED_TEXT)
  4. Consider Context Length: For long documents, use models with large context windows

    .model(OllamaModel.MISTRAL_NEMO)  // 128k context
  5. Model Management: Ensure models are available before use (wired into the builder in the sketch after this list)

    ModelManagementOptions options = ModelManagementOptions.builder()
        .pullModelStrategy(PullModelStrategy.WHEN_MISSING)
        .additionalModels(List.of(
            OllamaModel.LLAMA3.id(),
            OllamaModel.NOMIC_EMBED_TEXT.id()
        ))
        .build();
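
Continuing item 5, the management options are passed when constructing the model. This sketch assumes the builder's modelManagementOptions(...) method described in the Model Management docs:

// Attach management options so missing models are pulled automatically
OllamaChatModel chatModel = OllamaChatModel.builder()
    .ollamaApi(ollamaApi)
    .defaultOptions(OllamaChatOptions.builder()
        .model(OllamaModel.LLAMA3)
        .build())
    .modelManagementOptions(options)
    .build();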

Notes

  1. Model availability depends on your Ollama installation
  2. Not all models support all features (e.g., tool calling, vision, thinking)
  3. Model names may include version tags (e.g., "llama3:70b" vs "llama3")
  4. The enum provides common models - you can still use custom model names as strings (see the sketch after this list)
  5. Model performance and size vary - check Ollama documentation for details
  6. Some models require significant disk space and RAM
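
As note 4 mentions, identifiers outside the enum can be passed as plain strings, which is useful for version tags the constants don't cover. A minimal sketch:

// Raw model identifier with an explicit size tag (no enum constant for this)
OllamaChatOptions options = OllamaChatOptions.builder()
    .model("llama3:70b")
    .build();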

Related Documentation

  • OllamaChatOptions - Model configuration options
  • OllamaEmbeddingOptions - Embedding model options
  • Model Management - Pulling and managing models
  • Multimodal Support - Using vision models
  • Thinking Models - Reasoning model usage
  • Tool Calling - Models that support function calling