Spring Boot-compatible Ollama integration providing ChatModel and EmbeddingModel implementations for running large language models locally with support for streaming, tool calling, model management, and observability.
Pre-configured model identifiers for Ollama.
OllamaModel is an enum providing type-safe constants for popular Ollama models. It implements ChatModelDescription and provides consistent model names across your application.
```java
package org.springframework.ai.ollama.api;

public enum OllamaModel implements ChatModelDescription
```

Implements: org.springframework.ai.model.ChatModelDescription
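Because OllamaModel is a plain Java enum, the full set of predefined identifiers can be enumerated with the standard enum API; a minimal sketch:

```java
// List every predefined model constant and its Ollama identifier string
for (OllamaModel model : OllamaModel.values()) {
    System.out.println(model.name() + " -> " + model.id());
}
```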
```java
// With chat options
OllamaChatOptions options = OllamaChatOptions.builder()
.model(OllamaModel.LLAMA3) // Type-safe model selection
.temperature(0.7)
.build();
// With embedding options
OllamaEmbeddingOptions options = OllamaEmbeddingOptions.builder()
.model(OllamaModel.NOMIC_EMBED_TEXT)
.build();
```

```java
String modelId = OllamaModel.LLAMA3.id();        // "llama3"
String modelName = OllamaModel.LLAMA3.getName(); // Same as id()

// Using string directly
.model(OllamaModel.MISTRAL.id())
// Or let the builder handle it
.model(OllamaModel.MISTRAL)
```

Chinese language models with strong multilingual capabilities.

```java
// Qwen 2.5 models
OllamaModel.QWEN_2_5_3B // "qwen2.5:3b" - 3B parameter model
OllamaModel.QWEN_2_5_7B // "qwen2.5" - 7B model (default)
// Vision-language model
OllamaModel.QWEN2_5_VL // "qwen2.5vl" - Multimodal model
// Qwen 3 models
OllamaModel.QWEN3_7B // "qwen3:7b" - Latest generation 7B
OllamaModel.QWEN3_4B // "qwen3:4b" - 4B model
OllamaModel.QWEN3_4B_THINKING // "qwen3:4b-thinking" - With reasoning
OllamaModel.QWEN_3_1_7_B // "qwen3:1.7b" - 1.7B model
OllamaModel.QWEN_3_06B // "qwen3:0.6b" - Smallest Qwen3
// Reasoning model
OllamaModel.QWQ              // "qwq" - Qwen reasoning model
```
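For the thinking variants, reasoning output is requested through the chat options; a minimal sketch using the enableThinking() builder method shown in the thinking example later in this document:

```java
OllamaChatOptions thinkingOptions = OllamaChatOptions.builder()
    .model(OllamaModel.QWEN3_4B_THINKING)
    .enableThinking() // request reasoning traces alongside the answer
    .build();
```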
Meta's open-source models, widely used and well-supported.

```java
// Standard models
OllamaModel.LLAMA2 // "llama2" - 7B-70B range
OllamaModel.LLAMA3 // "llama3" - 8B-70B range
OllamaModel.LLAMA3_1 // "llama3.1" - 8B model
// Llama 3.2 variants
OllamaModel.LLAMA3_2 // "llama3.2" - 3B model
OllamaModel.LLAMA3_2_1B // "llama3.2:1b" - 1B model
OllamaModel.LLAMA3_2_3B // "llama3.2:3b" - 3B model
// Vision models
OllamaModel.LLAMA3_2_VISION_11b // "llama3.2-vision" - 11B vision model
OllamaModel.LLAMA3_2_VISION_90b // "llama3.2-vision:90b" - 90B vision model
// Uncensored variant
OllamaModel.LLAMA2_UNCENSORED // "llama2-uncensored"
// Code-specialized
OllamaModel.CODELLAMA    // "codellama" - Code generation
```
High-performance models from Mistral AI.

```java
OllamaModel.MISTRAL // "mistral" - 7B model
OllamaModel.MISTRAL_NEMO // "mistral-nemo" - 12B with 128k context
```
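To actually benefit from MISTRAL_NEMO's 128k window, the request's context length typically has to be raised as well; a sketch, assuming OllamaChatOptions exposes Ollama's num_ctx parameter as numCtx(...) the way OllamaOptions does:

```java
OllamaChatOptions longContextOptions = OllamaChatOptions.builder()
    .model(OllamaModel.MISTRAL_NEMO)
    .numCtx(32768) // raise Ollama's num_ctx above the small default (assumes numCtx(...) is exposed here)
    .build();
```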
Microsoft's compact, efficient models.

```java
OllamaModel.PHI // "phi" - Phi-2 2.7B
OllamaModel.PHI3 // "phi3" - Phi-3 3.8B
OllamaModel.DOLPHIN_PHI // "dolphin-phi" - Uncensored 2.7B
```
Google's lightweight models.

```java
OllamaModel.GEMMA // "gemma" - 2B-7B range
OllamaModel.GEMMA3 // "gemma3" - Latest generation
```
Models with image understanding capabilities.

```java
// Dedicated vision models
OllamaModel.LLAVA // "llava" - LLaVA vision model
OllamaModel.MOONDREAM // "moondream" - Efficient edge vision model
// Vision-capable variants (see Qwen and Llama sections)
OllamaModel.QWEN2_5_VL
OllamaModel.LLAMA3_2_VISION_11b
OllamaModel.LLAMA3_2_VISION_90b
```

Usage:

```java
OllamaChatOptions options = OllamaChatOptions.builder()
.model(OllamaModel.LLAVA)
.build();
// Use with images in messages
UserMessage message = UserMessage.builder()
.text("What's in this image?")
.media(List.of(new Media(MimeTypeUtils.IMAGE_PNG, imageResource)))
.build();
```

Specialized models for generating embeddings.

```java
OllamaModel.NOMIC_EMBED_TEXT // "nomic-embed-text" - Large context
OllamaModel.MXBAI_EMBED_LARGE // "mxbai-embed-large" - State-of-the-art
```

Usage:

```java
OllamaEmbeddingOptions options = OllamaEmbeddingOptions.builder()
.model(OllamaModel.NOMIC_EMBED_TEXT)
.build();
OllamaEmbeddingModel embeddingModel = OllamaEmbeddingModel.builder()
.ollamaApi(ollamaApi)
.defaultOptions(options)
.build();
```
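A common follow-up is comparing two texts via their vectors. A minimal sketch, reusing the embeddingModel built above; the cosine similarity helper below is plain Java for illustration, not a Spring AI API:

```java
float[] a = embeddingModel.embed("The cat sat on the mat");
float[] b = embeddingModel.embed("A feline rested on the rug");

// Plain cosine similarity: dot(a, b) / (|a| * |b|)
double dot = 0.0, normA = 0.0, normB = 0.0;
for (int i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
}
double similarity = dot / (Math.sqrt(normA) * Math.sqrt(normB));
```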
Models fine-tuned for specific tasks.

```java
OllamaModel.NEURAL_CHAT // "neural-chat" - Conversational
OllamaModel.STARLING_LM // "starling-lm" - Starling-7B
OllamaModel.ORCA_MINI   // "orca-mini" - 3B-70B range
```

Models grouped by parameter count:

Tiny (< 1B parameters)
- QWEN_3_06B - 0.6B

Small (1-3B parameters)
- LLAMA3_2_1B - 1B
- QWEN_3_1_7_B - 1.7B
- PHI - 2.7B
- LLAMA3_2_3B - 3B
- QWEN_2_5_3B - 3B
- GEMMA - 2B

Medium (4-8B parameters)
- QWEN3_4B - 4B
- MISTRAL - 7B
- LLAMA3 - 8B
- QWEN_2_5_7B - 7B

Large (10B+ parameters)
- LLAMA3_2_VISION_11b - 11B
- MISTRAL_NEMO - 12B
- LLAMA3_2_VISION_90b - 90B
- LLAMA2 - up to 70B

Recommended models by task:

General Chat
- LLAMA3 - Excellent all-around
- MISTRAL - High quality
- QWEN3_7B - Strong multilingual

Code Generation
- CODELLAMA - Specialized for code
- LLAMA3 - Good general coding
- MISTRAL - Strong logical reasoning

Long Context
- MISTRAL_NEMO - 128k tokens
- NOMIC_EMBED_TEXT - 8192 tokens (embeddings)

Vision/Multimodal
- LLAVA - Dedicated vision
- LLAMA3_2_VISION_11b - Balance of size/capability
- QWEN2_5_VL - Multimodal + multilingual
- MOONDREAM - Efficient edge vision

Reasoning/Thinking
- QWQ - Qwen reasoning model
- QWEN3_4B_THINKING - With thinking traces

Embeddings
- NOMIC_EMBED_TEXT - Large context
- MXBAI_EMBED_LARGE - State-of-the-art

Multilingual
- QWEN3_7B - Strong Chinese/English
- QWEN_2_5_7B - Multilingual
- LLAMA3 - Good multilingual support

Recommended models by available hardware:

Edge/Mobile (< 2GB RAM)
- QWEN_3_06B
- LLAMA3_2_1B
- MOONDREAM (vision)

Consumer Hardware (4-8GB RAM)
- PHI3
- QWEN3_4B
- MISTRAL
- LLAMA3_2_3B

Workstation (16GB+ RAM)
- LLAMA3 (8B)
- QWEN3_7B
- MISTRAL_NEMO

Server (32GB+ RAM)
- LLAMA3 (70B)
- LLAMA3_2_VISION_90b

Basic usage with a chat model:

```java
OllamaChatOptions options = OllamaChatOptions.builder()
.model(OllamaModel.LLAMA3)
.temperature(0.7)
.build();
OllamaChatModel chatModel = OllamaChatModel.builder()
.ollamaApi(ollamaApi)
.defaultOptions(options)
.build();
```

Setting a default model and overriding it per request:

```java
// Default model
OllamaChatModel chatModel = OllamaChatModel.builder()
.ollamaApi(ollamaApi)
.defaultOptions(OllamaChatOptions.builder()
.model(OllamaModel.LLAMA3)
.build())
.build();
// Override for specific request
OllamaChatOptions requestOptions = OllamaChatOptions.builder()
.model(OllamaModel.QWEN3_4B_THINKING) // Use thinking model
.enableThinking()
.build();
ChatResponse response = chatModel.call(
new Prompt("Solve this puzzle...", requestOptions)
);
```

Vision model usage:

```java
OllamaChatModel visionModel = OllamaChatModel.builder()
.ollamaApi(ollamaApi)
.defaultOptions(OllamaChatOptions.builder()
.model(OllamaModel.LLAVA)
.build())
.build();
UserMessage message = UserMessage.builder()
.text("Describe this image")
.media(List.of(new Media(MimeTypeUtils.IMAGE_PNG, imageResource)))
.build();
ChatResponse response = visionModel.call(new Prompt(message));
```

Code generation with a code-specialized model:

```java
OllamaChatModel codeModel = OllamaChatModel.builder()
.ollamaApi(ollamaApi)
.defaultOptions(OllamaChatOptions.builder()
.model(OllamaModel.CODELLAMA)
.temperature(0.2) // Lower temp for more deterministic code
.build())
.build();
String code = codeModel.call(
new Prompt("Write a function to sort an array")
).getResult().getOutput().getText();
```

Embedding model usage:

```java
OllamaEmbeddingModel embeddingModel = OllamaEmbeddingModel.builder()
.ollamaApi(ollamaApi)
.defaultOptions(OllamaEmbeddingOptions.builder()
.model(OllamaModel.NOMIC_EMBED_TEXT)
.build())
.build();
float[] embedding = embeddingModel.embed("Hello, world!");
```

Comparing responses across several models:

```java
List<OllamaModel> modelsToTest = List.of(
OllamaModel.LLAMA3,
OllamaModel.MISTRAL,
OllamaModel.QWEN3_7B
);
String prompt = "Explain quantum computing";
for (OllamaModel model : modelsToTest) {
OllamaChatOptions options = OllamaChatOptions.builder()
.model(model)
.build();
ChatResponse response = chatModel.call(new Prompt(prompt, options));
System.out.println(model.id() + ": " + response.getResult().getOutput().getText());
}
```

id()

Get the model identifier string.
```java
String id = OllamaModel.LLAMA3.id(); // "llama3"
```

Returns: String model identifier
getName()

Get the model name (same as id()).
```java
String name = OllamaModel.LLAMA3.getName(); // "llama3"
```

Returns: String model name
Note: This method comes from the ChatModelDescription interface.
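The chat examples above all use the blocking call(...) API. The streaming support mentioned in the overview works with any of these models; a minimal sketch using the reactive stream(...) method from the Spring AI ChatModel API (Reactor's Flux is on the classpath with Spring AI):

```java
// Stream tokens as they are generated instead of waiting for the full response
chatModel.stream(new Prompt("Tell me a story", OllamaChatOptions.builder()
        .model(OllamaModel.LLAMA3)
        .build()))
    .mapNotNull(response -> response.getResult().getOutput().getText())
    .subscribe(System.out::print);
```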
Use Constants: Prefer enum constants over string literals.

```java
// Good
.model(OllamaModel.LLAMA3)
// Avoid
.model("llama3")Select Appropriate Size: Match model size to your resources
// Edge device
.model(OllamaModel.QWEN_3_06B)
// Workstation
.model(OllamaModel.LLAMA3)
```

Use Specialized Models: Choose models optimized for your task.

```java
// Code generation
.model(OllamaModel.CODELLAMA)
// Vision tasks
.model(OllamaModel.LLAVA)
// Embeddings
.model(OllamaModel.NOMIC_EMBED_TEXT)
```

Consider Context Length: For long documents, use models with large context windows.

```java
.model(OllamaModel.MISTRAL_NEMO) // 128k context
```

Model Management: Ensure models are available before use.

```java
ModelManagementOptions options = ModelManagementOptions.builder()
.pullModelStrategy(PullModelStrategy.WHEN_MISSING)
.additionalModels(List.of(
OllamaModel.LLAMA3.id(),
OllamaModel.NOMIC_EMBED_TEXT.id()
))
.build();
```
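These options can then be handed to the model builder so missing models are pulled before first use; a sketch, assuming the OllamaChatModel builder exposes modelManagementOptions(...) as in recent Spring AI releases:

```java
OllamaChatModel managedModel = OllamaChatModel.builder()
    .ollamaApi(ollamaApi)
    .defaultOptions(OllamaChatOptions.builder()
        .model(OllamaModel.LLAMA3)
        .build())
    .modelManagementOptions(options) // pulls "llama3" and "nomic-embed-text" when missing
    .build();
```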