tessl/maven-org-springframework-ai--spring-ai-ollama

Spring Boot-compatible Ollama integration providing ChatModel and EmbeddingModel implementations for running large language models locally with support for streaming, tool calling, model management, and observability.

docs/reference/models.md

OllamaModel Enum

Pre-configured model identifiers for Ollama.

Overview

OllamaModel is an enum providing type-safe constants for popular Ollama models. It implements ChatModelDescription and provides consistent model names across your application.

Class Information

package org.springframework.ai.ollama.api;

public enum OllamaModel implements ChatModelDescription

Implements: org.springframework.ai.model.ChatModelDescription
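
Because OllamaModel is a standard Java enum, you can enumerate all constants together with their wire identifiers. A minimal sketch using only standard enum methods plus the id() accessor documented below:

import org.springframework.ai.ollama.api.OllamaModel;

// Print every constant alongside the identifier sent to the Ollama API
for (OllamaModel model : OllamaModel.values()) {
    System.out.println(model.name() + " -> " + model.id());
}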

Using Model Constants

In Options

// With chat options
OllamaChatOptions options = OllamaChatOptions.builder()
    .model(OllamaModel.LLAMA3)  // Type-safe model selection
    .temperature(0.7)
    .build();

// With embedding options
OllamaEmbeddingOptions options = OllamaEmbeddingOptions.builder()
    .model(OllamaModel.NOMIC_EMBED_TEXT)
    .build();

Getting Model ID

String modelId = OllamaModel.LLAMA3.id();  // "llama3"
String modelName = OllamaModel.LLAMA3.getName();  // Same as id()

Direct String Use

// Using string directly
.model(OllamaModel.MISTRAL.id())

// Or let the builder handle it
.model(OllamaModel.MISTRAL)

Available Models

Qwen Family

Alibaba's Qwen models, offering strong multilingual capabilities.

// Qwen 2.5 models
OllamaModel.QWEN_2_5_3B          // "qwen2.5:3b" - 3B parameter model
OllamaModel.QWEN_2_5_7B          // "qwen2.5" - 7B model (default)

// Vision-language model
OllamaModel.QWEN2_5_VL           // "qwen2.5vl" - Multimodal model

// Qwen 3 models
OllamaModel.QWEN3_7B             // "qwen3:7b" - Latest generation 7B
OllamaModel.QWEN3_4B             // "qwen3:4b" - 4B model
OllamaModel.QWEN3_4B_THINKING    // "qwen3:4b-thinking" - With reasoning
OllamaModel.QWEN_3_1_7_B         // "qwen3:1.7b" - 1.7B model
OllamaModel.QWEN_3_06B           // "qwen3:0.6b" - Smallest Qwen3

// Reasoning model
OllamaModel.QWQ                  // "qwq" - Qwen reasoning model

Key Features:

  • Strong multilingual support (especially Chinese/English)
  • Vision capabilities (Qwen2.5VL)
  • Reasoning/thinking support (Qwen3 thinking variants)
  • Range of sizes for different use cases

Llama Family

Meta's openly available models, widely used and well supported.

// Standard models
OllamaModel.LLAMA2               // "llama2" - 7B-70B range
OllamaModel.LLAMA3               // "llama3" - 8B-70B range
OllamaModel.LLAMA3_1             // "llama3.1" - 8B model

// Llama 3.2 variants
OllamaModel.LLAMA3_2             // "llama3.2" - 3B model
OllamaModel.LLAMA3_2_1B          // "llama3.2:1b" - 1B model
OllamaModel.LLAMA3_2_3B          // "llama3.2:3b" - 3B model

// Vision models
OllamaModel.LLAMA3_2_VISION_11b  // "llama3.2-vision" - 11B vision model
OllamaModel.LLAMA3_2_VISION_90b  // "llama3.2-vision:90b" - 90B vision model

// Uncensored variant
OllamaModel.LLAMA2_UNCENSORED    // "llama2-uncensored"

// Code-specialized
OllamaModel.CODELLAMA            // "codellama" - Code generation

Key Features:

  • Excellent general-purpose models
  • Strong instruction following
  • Vision capabilities (Llama 3.2 Vision)
  • Code specialization (CodeLlama)
  • Wide range of sizes

Mistral Family

High-performance models from Mistral AI.

OllamaModel.MISTRAL              // "mistral" - 7B model
OllamaModel.MISTRAL_NEMO         // "mistral-nemo" - 12B with 128k context

Key Features:

  • High quality output
  • Long context (Mistral Nemo: 128k tokens)
  • Efficient inference
  • Strong tool calling support (see the sketch after this list)
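
To illustrate the tool-calling point, here is a minimal sketch using Spring AI's ChatClient and @Tool annotation. WeatherTools and getCurrentWeather are hypothetical names, and chatModel is assumed to be an OllamaChatModel configured with OllamaModel.MISTRAL:

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.tool.annotation.Tool;

class WeatherTools {
    // Hypothetical tool the model can invoke during generation
    @Tool(description = "Get the current weather for a city")
    String getCurrentWeather(String city) {
        return "Sunny, 22C in " + city;  // stub implementation
    }
}

// chatModel: an OllamaChatModel configured with a tool-capable model (assumed)
String answer = ChatClient.create(chatModel)
    .prompt()
    .user("What's the weather in Paris?")
    .tools(new WeatherTools())
    .call()
    .content();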

Phi Family

Microsoft's compact, efficient models.

OllamaModel.PHI                  // "phi" - Phi-2 2.7B
OllamaModel.PHI3                 // "phi3" - Phi-3 3.8B
OllamaModel.DOLPHIN_PHI          // "dolphin-phi" - Uncensored 2.7B

Key Features:

  • Small but capable
  • Fast inference
  • Good for resource-constrained environments

Gemma Family

Google's lightweight models.

OllamaModel.GEMMA                // "gemma" - 2B-7B range
OllamaModel.GEMMA3               // "gemma3" - Latest generation

Key Features:

  • Lightweight and fast
  • Strong performance for size
  • Good for edge deployment

Vision/Multimodal Models

Models with image understanding capabilities.

// Dedicated vision models
OllamaModel.LLAVA                // "llava" - LLaVA vision model
OllamaModel.MOONDREAM            // "moondream" - Efficient edge vision model

// Vision-capable variants (see Qwen and Llama sections)
OllamaModel.QWEN2_5_VL
OllamaModel.LLAMA3_2_VISION_11b
OllamaModel.LLAMA3_2_VISION_90b

Usage:

OllamaChatOptions options = OllamaChatOptions.builder()
    .model(OllamaModel.LLAVA)
    .build();

// Use with images in messages
UserMessage message = UserMessage.builder()
    .text("What's in this image?")
    .media(List.of(new Media(MimeTypeUtils.IMAGE_PNG, imageResource)))
    .build();

Embedding Models

Specialized models for generating embeddings.

OllamaModel.NOMIC_EMBED_TEXT     // "nomic-embed-text" - Large context
OllamaModel.MXBAI_EMBED_LARGE    // "mxbai-embed-large" - State-of-the-art

Usage:

OllamaEmbeddingOptions options = OllamaEmbeddingOptions.builder()
    .model(OllamaModel.NOMIC_EMBED_TEXT)
    .build();

OllamaEmbeddingModel embeddingModel = OllamaEmbeddingModel.builder()
    .ollamaApi(ollamaApi)
    .defaultOptions(options)
    .build();

Features:

  • Nomic Embed Text: High-quality, large context (8192 tokens)
  • MxBAI Embed Large: State-of-the-art embeddings from MixedBread AI
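
To illustrate working with the returned vectors, the sketch below computes cosine similarity between two embeddings. The similarity math is inlined (it is not a library helper), and embeddingModel is the model built in the usage example above:

// Embed two texts with the embedding model configured above
float[] a = embeddingModel.embed("The cat sits on the mat");
float[] b = embeddingModel.embed("A feline rests on a rug");

// Plain cosine similarity over the raw float vectors
double dot = 0, normA = 0, normB = 0;
for (int i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
}
double similarity = dot / (Math.sqrt(normA) * Math.sqrt(normB));
System.out.println("Cosine similarity: " + similarity);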

Specialized Models

Models fine-tuned for specific tasks.

OllamaModel.NEURAL_CHAT          // "neural-chat" - Conversational
OllamaModel.STARLING_LM          // "starling-lm" - Starling-7B
OllamaModel.ORCA_MINI            // "orca-mini" - 3B-70B range

Model Selection Guide

By Size

Tiny (< 1B parameters)

  • QWEN_3_06B - 0.6B

Small (1-3B parameters)

  • LLAMA3_2_1B - 1B
  • QWEN_3_1_7_B - 1.7B
  • PHI - 2.7B
  • LLAMA3_2_3B - 3B
  • QWEN_2_5_3B - 3B
  • GEMMA - 2B

Medium (4-8B parameters)

  • QWEN3_4B - 4B
  • MISTRAL - 7B
  • LLAMA3 - 8B
  • QWEN_2_5_7B - 7B

Large (10B+ parameters)

  • LLAMA3_2_VISION_11b - 11B
  • MISTRAL_NEMO - 12B
  • LLAMA3_2_VISION_90b - 90B
  • LLAMA2 - up to 70B

By Capability

General Chat

  • LLAMA3 - Excellent all-around
  • MISTRAL - High quality
  • QWEN3_7B - Strong multilingual

Code Generation

  • CODELLAMA - Specialized for code
  • LLAMA3 - Good general coding
  • MISTRAL - Strong logical reasoning

Long Context

  • MISTRAL_NEMO - 128k tokens
  • NOMIC_EMBED_TEXT - 8192 tokens (embeddings)

Vision/Multimodal

  • LLAVA - Dedicated vision
  • LLAMA3_2_VISION_11b - Balance of size/capability
  • QWEN2_5_VL - Multimodal + multilingual
  • MOONDREAM - Efficient edge vision

Reasoning/Thinking

  • QWQ - Qwen reasoning model
  • QWEN3_4B_THINKING - With thinking traces

Embeddings

  • NOMIC_EMBED_TEXT - Large context
  • MXBAI_EMBED_LARGE - State-of-the-art

Multilingual

  • QWEN3_7B - Strong Chinese/English
  • QWEN_2_5_7B - Multilingual
  • LLAMA3 - Good multilingual support

By Resource Requirements

Edge/Mobile (< 2GB RAM)

  • QWEN_3_06B
  • LLAMA3_2_1B
  • MOONDREAM (vision)

Consumer Hardware (4-8GB RAM)

  • PHI3
  • QWEN3_4B
  • MISTRAL
  • LLAMA3_2_3B

Workstation (16GB+ RAM)

  • LLAMA3 (8B)
  • QWEN3_7B
  • MISTRAL_NEMO

Server (32GB+ RAM)

  • LLAMA3 (70B)
  • LLAMA3_2_VISION_90b

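One way to act on this guide in code is a small selection helper. The tiers and mappings below simply restate the lists above; ResourceTier and pickChatModelId are illustrative names, not part of this library:

enum ResourceTier { EDGE, CONSUMER, WORKSTATION, SERVER }

static String pickChatModelId(ResourceTier tier) {
    return switch (tier) {
        case EDGE -> OllamaModel.LLAMA3_2_1B.id();    // < 2GB RAM
        case CONSUMER -> OllamaModel.QWEN3_4B.id();   // 4-8GB RAM
        case WORKSTATION -> OllamaModel.LLAMA3.id();  // 16GB+ RAM (8B default tag)
        case SERVER -> "llama3:70b";                  // 32GB+ RAM, explicit size tag
    };
}
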
Usage Examples

Basic Model Selection

OllamaChatOptions options = OllamaChatOptions.builder()
    .model(OllamaModel.LLAMA3)
    .temperature(0.7)
    .build();

OllamaChatModel chatModel = OllamaChatModel.builder()
    .ollamaApi(ollamaApi)
    .defaultOptions(options)
    .build();

Model Switching

// Default model
OllamaChatModel chatModel = OllamaChatModel.builder()
    .ollamaApi(ollamaApi)
    .defaultOptions(OllamaChatOptions.builder()
        .model(OllamaModel.LLAMA3)
        .build())
    .build();

// Override for specific request
OllamaChatOptions requestOptions = OllamaChatOptions.builder()
    .model(OllamaModel.QWEN3_4B_THINKING)  // Use thinking model
    .enableThinking()
    .build();

ChatResponse response = chatModel.call(
    new Prompt("Solve this puzzle...", requestOptions)
);

Vision Model Usage

OllamaChatModel visionModel = OllamaChatModel.builder()
    .ollamaApi(ollamaApi)
    .defaultOptions(OllamaChatOptions.builder()
        .model(OllamaModel.LLAVA)
        .build())
    .build();

UserMessage message = UserMessage.builder()
    .text("Describe this image")
    .media(List.of(new Media(MimeTypeUtils.IMAGE_PNG, imageResource)))
    .build();

ChatResponse response = visionModel.call(new Prompt(message));

Code Generation

OllamaChatModel codeModel = OllamaChatModel.builder()
    .ollamaApi(ollamaApi)
    .defaultOptions(OllamaChatOptions.builder()
        .model(OllamaModel.CODELLAMA)
        .temperature(0.2)  // Lower temp for more deterministic code
        .build())
    .build();

String code = codeModel.call(
    new Prompt("Write a function to sort an array")
).getResult().getOutput().getText();

Embedding Generation

OllamaEmbeddingModel embeddingModel = OllamaEmbeddingModel.builder()
    .ollamaApi(ollamaApi)
    .defaultOptions(OllamaEmbeddingOptions.builder()
        .model(OllamaModel.NOMIC_EMBED_TEXT)
        .build())
    .build();

float[] embedding = embeddingModel.embed("Hello, world!");

Model Comparison

List<OllamaModel> modelsToTest = List.of(
    OllamaModel.LLAMA3,
    OllamaModel.MISTRAL,
    OllamaModel.QWEN3_7B
);

String prompt = "Explain quantum computing";

for (OllamaModel model : modelsToTest) {
    OllamaChatOptions options = OllamaChatOptions.builder()
        .model(model)
        .build();

    ChatResponse response = chatModel.call(new Prompt(prompt, options));
    System.out.println(model.id() + ": " + response.getResult().getOutput().getText());
}

Methods

id()

Get the model identifier string.

String id = OllamaModel.LLAMA3.id();  // "llama3"

Returns: String model identifier

getName()

Get the model name (same as id()).

String name = OllamaModel.LLAMA3.getName();  // "llama3"

Returns: String model name

Note: This method comes from the ChatModelDescription interface.
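
Because the method is inherited, any constant can be passed where a ChatModelDescription is expected. A minimal sketch:

import org.springframework.ai.model.ChatModelDescription;

// The enum constant satisfies the interface directly
ChatModelDescription description = OllamaModel.LLAMA3;
System.out.println(description.getName());  // "llama3"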

Best Practices

  1. Use Constants: Prefer enum constants over string literals

    // Good
    .model(OllamaModel.LLAMA3)
    
    // Avoid
    .model("llama3")
  2. Select Appropriate Size: Match model size to your resources

    // Edge device
    .model(OllamaModel.QWEN_3_06B)
    
    // Workstation
    .model(OllamaModel.LLAMA3)
  3. Use Specialized Models: Choose models optimized for your task

    // Code generation
    .model(OllamaModel.CODELLAMA)
    
    // Vision tasks
    .model(OllamaModel.LLAVA)
    
    // Embeddings
    .model(OllamaModel.NOMIC_EMBED_TEXT)
  4. Consider Context Length: For long documents, use models with large context windows

    .model(OllamaModel.MISTRAL_NEMO)  // 128k context
  5. Model Management: Ensure models are available before use (wired into the builder in the sketch after this list)

    ModelManagementOptions options = ModelManagementOptions.builder()
        .pullModelStrategy(PullModelStrategy.WHEN_MISSING)
        .additionalModels(List.of(
            OllamaModel.LLAMA3.id(),
            OllamaModel.NOMIC_EMBED_TEXT.id()
        ))
        .build();
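
Continuing item 5, the management options are passed when constructing the model. This sketch assumes the builder's modelManagementOptions(...) method described in the Model Management docs:

// Attach management options so missing models are pulled automatically
OllamaChatModel chatModel = OllamaChatModel.builder()
    .ollamaApi(ollamaApi)
    .defaultOptions(OllamaChatOptions.builder()
        .model(OllamaModel.LLAMA3)
        .build())
    .modelManagementOptions(options)
    .build();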

Notes

  1. Model availability depends on your Ollama installation
  2. Not all models support all features (e.g., tool calling, vision, thinking)
  3. Model names may include version tags (e.g., "llama3:70b" vs "llama3")
  4. The enum provides common models - you can still use custom model names as strings (see the sketch after this list)
  5. Model performance and size vary - check Ollama documentation for details
  6. Some models require significant disk space and RAM
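
As note 4 mentions, identifiers outside the enum can be passed as plain strings, which is useful for version tags the constants don't cover. A minimal sketch:

// Raw model identifier with an explicit size tag (no enum constant for this)
OllamaChatOptions options = OllamaChatOptions.builder()
    .model("llama3:70b")
    .build();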

Related Documentation

  • OllamaChatOptions - Model configuration options
  • OllamaEmbeddingOptions - Embedding model options
  • Model Management - Pulling and managing models
  • Multimodal Support - Using vision models
  • Thinking Models - Reasoning model usage
  • Tool Calling - Models that support function calling