Spring Boot-compatible Ollama integration providing ChatModel and EmbeddingModel implementations for running large language models locally with support for streaming, tool calling, model management, and observability.
Configuration options for Ollama embedding model operations.
OllamaEmbeddingOptions provides configuration for embedding generation, including model selection, memory management, and GPU allocation. It's a simpler configuration than chat options since embeddings don't require generation parameters.
package org.springframework.ai.ollama.api;
public class OllamaEmbeddingOptions implements EmbeddingOptions
Implements: org.springframework.ai.embedding.EmbeddingOptions
// Using builder
OllamaEmbeddingOptions options = OllamaEmbeddingOptions.builder()
.model(OllamaModel.NOMIC_EMBED_TEXT.id())
.keepAlive("10m")
.build();
// Copy from existing options
OllamaEmbeddingOptions copy = OllamaEmbeddingOptions.fromOptions(existingOptions);
The builder provides overloaded methods for convenient model selection.
Model Selection:
// Accepts String model name
public Builder model(String model);
// Accepts OllamaModel enum
public Builder model(OllamaModel model);
// Using String
OllamaEmbeddingOptions options = OllamaEmbeddingOptions.builder()
.model("nomic-embed-text")
.build();
// Using enum (recommended)
OllamaEmbeddingOptions options = OllamaEmbeddingOptions.builder()
.model(OllamaModel.MXBAI_EMBED_LARGE)
.build();
Controls which embedding model to use.
OllamaEmbeddingOptions options = OllamaEmbeddingOptions.builder()
// Model name (required)
.model("nomic-embed-text")
// How long to keep model in memory
.keepAlive("5m")
// Truncate inputs to fit context length
.truncate(true)
.build();
Parameters:
- model (String): Embedding model name from the Ollama library
- keepAlive (String): Duration in Go format (e.g., "5m", "30s", "1h")
- truncate (Boolean): Auto-truncate to context length (default: true)

Configure hardware resource usage for embedding generation.
OllamaEmbeddingOptions options = OllamaEmbeddingOptions.builder()
// GPU configuration
.numGPU(-1) // -1 = auto, 0 = CPU only
.mainGPU(0) // Main GPU for multi-GPU setups
.lowVRAM(false) // Low VRAM mode
// Memory management
.useMMap(true) // Memory-map model files
.useMLock(false) // Lock model in RAM
.useNUMA(false) // Enable NUMA
// Processing configuration
.numBatch(512) // Batch size for processing
.numThread(8) // CPU threads to use
.vocabOnly(false) // Load only vocabulary
.build();
Hardware Parameters:
- numGPU (Integer): Number of GPU layers (default: -1 auto, 0 = CPU)
- mainGPU (Integer): Primary GPU index for multi-GPU (default: 0)
- lowVRAM (Boolean): Enable low VRAM mode (default: false)
- useMMap (Boolean): Memory-map model files (default: null)
- useMLock (Boolean): Lock model in RAM (default: false)
- useNUMA (Boolean): Enable NUMA support (default: false)
- numBatch (Integer): Batch size (default: 512)
- numThread (Integer): CPU threads (default: auto-detect)
- vocabOnly (Boolean): Load only vocabulary, not weights

OllamaEmbeddingOptions options = OllamaEmbeddingOptions.builder()
.model(OllamaModel.NOMIC_EMBED_TEXT.id())
.build();
OllamaEmbeddingModel embeddingModel = OllamaEmbeddingModel.builder()
.ollamaApi(ollamaApi)
.defaultOptions(options)
.build();
float[] embedding = embeddingModel.embed("Hello, world!");
OllamaEmbeddingOptions options = OllamaEmbeddingOptions.builder()
.model("nomic-embed-text")
.numBatch(1024) // Larger batch size for throughput
.keepAlive("15m") // Keep model loaded for batch jobs
.build();
List<String> texts = List.of("text1", "text2", "text3");
EmbeddingResponse response = embeddingModel.call(
new EmbeddingRequest(texts, options)
);
OllamaEmbeddingOptions options = OllamaEmbeddingOptions.builder()
.model("nomic-embed-text")
.numGPU(0) // Use CPU only
.numThread(16) // Use 16 CPU threads
.useMMap(true) // Memory-map for efficient loading
.build();
OllamaEmbeddingOptions options = OllamaEmbeddingOptions.builder()
.model("mxbai-embed-large")
.numGPU(-1) // Use all available GPU layers
.numBatch(2048) // Large batch size
.useMLock(true) // Lock in RAM for faster access
.keepAlive("30m") // Keep loaded for multiple operations
.build();
// Default options
OllamaEmbeddingOptions defaultOptions = OllamaEmbeddingOptions.builder()
.model("nomic-embed-text")
.numBatch(512)
.build();
OllamaEmbeddingModel embeddingModel = OllamaEmbeddingModel.builder()
.ollamaApi(ollamaApi)
.defaultOptions(defaultOptions)
.build();
// Override for specific request
OllamaEmbeddingOptions requestOptions = OllamaEmbeddingOptions.builder()
.truncate(false) // Don't truncate for this request
.build();
EmbeddingResponse response = embeddingModel.call(
new EmbeddingRequest(List.of("long text..."), requestOptions)
);
OllamaEmbeddingOptions provides several static and instance utility methods for working with options.
// Filter non-supported fields from options map
public static Map<String, Object> filterNonSupportedFields(Map<String, Object> options);
// Create from existing options (deep copy)
public static OllamaEmbeddingOptions fromOptions(OllamaEmbeddingOptions options);
// Convert options to Map for API requests
public Map<String, Object> toMap();
// Create a copy of these options
public OllamaEmbeddingOptions copy();
// Get embedding dimensions (always returns null for Ollama)
public Integer getDimensions();
OllamaEmbeddingOptions options = OllamaEmbeddingOptions.builder()
.model("nomic-embed-text")
.numGPU(-1)
.build();
Map<String, Object> optionsMap = options.toMap();
// Use in API requests or serialization
OllamaEmbeddingOptions original = OllamaEmbeddingOptions.builder()
.model("nomic-embed-text")
.numBatch(512)
.build();
// Create a copy (instance method)
OllamaEmbeddingOptions copy = original.copy();
// Or use static fromOptions method
OllamaEmbeddingOptions copy2 = OllamaEmbeddingOptions.fromOptions(original);
// Modify the copy
copy.setNumBatch(1024);
Removes fields that are not part of the Ollama options API but are managed separately in the request (model, keep_alive, truncate).
Map<String, Object> allOptions = Map.of(
"num_batch", 512,
"model", "nomic-embed-text", // Non-supported - part of request
"keep_alive", "5m", // Non-supported - part of request
"truncate", true, // Non-supported - part of request
"num_gpu", -1
);
// Remove fields that aren't part of Ollama options API
Map<String, Object> filtered = OllamaEmbeddingOptions.filterNonSupportedFields(allOptions);
// Returns only: {"num_batch": 512, "num_gpu": -1}
Non-Supported Fields: The following fields are filtered out because they're part of the EmbeddingsRequest, not the options:
- model
- keep_alive
- truncate

OllamaEmbeddingOptions options = OllamaEmbeddingOptions.builder()
.model("nomic-embed-text")
.build();
// getDimensions() always returns null for Ollama
// Ollama determines dimensions based on the model
Integer dimensions = options.getDimensions(); // null
When options are not explicitly set:
- numBatch: 512
- numGPU: -1 (auto-detect)
- mainGPU: 0
- lowVRAM: false
- useMMap: null (Ollama default)
- useMLock: false
- useNUMA: false
- numThread: auto-detect
- vocabOnly: false
- truncate: true

OllamaEmbeddingOptions options = OllamaEmbeddingOptions.builder()
.model(OllamaModel.NOMIC_EMBED_TEXT.id()) // "nomic-embed-text"
.build();
// High-quality, large context window (8192 tokens)
OllamaEmbeddingOptions options = OllamaEmbeddingOptions.builder()
.model(OllamaModel.MXBAI_EMBED_LARGE.id()) // "mxbai-embed-large"
.build();
// State-of-the-art embeddings
// For large batches or long-running processes
OllamaEmbeddingOptions options = OllamaEmbeddingOptions.builder()
.model("nomic-embed-text")
.useMLock(true) // Lock in RAM to avoid swapping
.keepAlive("1h") // Keep loaded for extended operations
.build();
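The keepAlive values above use Go-style duration strings. A framework-free sketch that validates the format described earlier ("5m", "30s", "1h"); the regex is illustrative, not Ollama's actual parser:

```java
// Illustrative check for Go-style duration strings such as "5m", "30s",
// "1h", or compound values like "1h30m". A sketch of the format only --
// Ollama's real parser also accepts other forms.
public class KeepAliveFormat {
    private static final java.util.regex.Pattern GO_DURATION =
            java.util.regex.Pattern.compile("([0-9]+(\\.[0-9]+)?(ns|us|ms|s|m|h))+");

    public static boolean isGoDuration(String value) {
        return value != null && GO_DURATION.matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(isGoDuration("5m"));    // true
        System.out.println(isGoDuration("1h30m")); // true
        System.out.println(isGoDuration("ten"));   // false
    }
}
```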
// For memory-constrained environments
OllamaEmbeddingOptions options = OllamaEmbeddingOptions.builder()
.model("nomic-embed-text")
.lowVRAM(true) // Reduce VRAM usage
.numGPU(0) // Use CPU if needed
.keepAlive("1m") // Unload quickly after use
.build();
// Small batches (lower memory, higher latency)
.numBatch(256)
// Default batch size
.numBatch(512)
// Large batches (higher memory, better throughput)
.numBatch(2048)
// CPU-only (no GPU required)
.numGPU(0)
.numThread(8)
// Auto-detect optimal GPU usage
.numGPU(-1)
// Multi-GPU setup
.numGPU(-1)
.mainGPU(0) // Use GPU 0 for small tensors
// Auto-truncate long texts (default, safer)
.truncate(true)
// Error on texts exceeding context length (stricter)
.truncate(false)
OllamaEmbeddingOptions is simpler than OllamaChatOptions because embeddings don't require generation parameters.
Both share common hardware and resource parameters such as numGPU, numBatch, numThread, useMMap, useMLock, useNUMA, and keepAlive.
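The camelCase builder names correspond to snake_case keys in the Ollama request map (num_gpu, keep_alive, use_mmap, as the filterNonSupportedFields example shows). A framework-free sketch of that naming convention, illustrative only and not the library's actual serializer:

```java
// Illustrative camelCase-to-snake_case mapping between builder method names
// (numGPU, keepAlive, useMMap) and Ollama request keys (num_gpu, keep_alive,
// use_mmap). Not the library's real serialization logic.
public class OptionKeyNames {
    public static String toSnakeCase(String camel) {
        // Insert "_" before each run of uppercase letters, then lowercase all
        return camel.replaceAll("([a-z0-9])([A-Z])", "$1_$2").toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(toSnakeCase("numGPU"));    // num_gpu
        System.out.println(toSnakeCase("keepAlive")); // keep_alive
        System.out.println(toSnakeCase("useMMap"));   // use_mmap
    }
}
```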
- getDimensions() always returns null - Ollama determines embedding dimensions based on the model (e.g., nomic-embed-text)
- vocabOnly option is rarely used - it loads only vocabulary without weights
- model, keepAlive, and truncate are "synthetic" - they're part of the request but managed through options

tessl i tessl/maven-org-springframework-ai--spring-ai-ollama@1.1.1
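The float[] vectors returned by embed(...) in the examples above can be compared with plain vector math; a minimal, framework-free sketch of cosine similarity using made-up stand-in vectors (not real model output):

```java
// Cosine similarity over float[] vectors, as returned by embed(...).
// The sample vectors below are made-up stand-ins for real embeddings.
public class CosineSimilarity {
    public static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        float[] v1 = {1f, 0f, 1f};
        float[] v2 = {1f, 0f, 1f};
        float[] v3 = {0f, 1f, 0f};
        System.out.println(cosine(v1, v2)); // 1.0 (identical direction)
        System.out.println(cosine(v1, v3)); // 0.0 (orthogonal)
    }
}
```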