Quarkus extension deployment module for integrating Ollama LLMs with Quarkus applications through the LangChain4j framework.
The Quarkus LangChain4j Ollama extension provides comprehensive runtime configuration for connecting to Ollama services and configuring model behavior. Runtime configuration properties are processed when the application starts and can be changed without rebuilding the application.
Configuration is handled in two phases:
- Runtime (`RUN_TIME`): properties that can change between application restarts (base URL, timeouts, logging, model parameters)
- Fixed (`BUILD_AND_RUN_TIME_FIXED`): properties that are read during build but can also be accessed at runtime (model IDs)
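For example, `quarkus.langchain4j.ollama.timeout` can be overridden at startup (for instance via `-Dquarkus.langchain4j.ollama.timeout=30s` or the corresponding environment variable) without a rebuild, whereas changing `quarkus.langchain4j.ollama.chat-model.model-id` requires rebuilding the application.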
The root configuration interface provides access to the default and named Ollama configurations.

```java
package io.quarkiverse.langchain4j.ollama.runtime.config;
import java.util.Map;
import io.quarkus.runtime.annotations.ConfigRoot;
import io.smallrye.config.ConfigMapping;
import static io.quarkus.runtime.annotations.ConfigPhase.RUN_TIME;
@ConfigRoot(phase = RUN_TIME)
@ConfigMapping(prefix = "quarkus.langchain4j.ollama")
public interface LangChain4jOllamaConfig {
/**
* Default Ollama configuration
*/
OllamaConfig defaultConfig();
/**
* Named Ollama configurations
*/
Map<String, OllamaConfig> namedConfig();
}
```

Configuration Prefix: `quarkus.langchain4j.ollama`
Phase: `RUN_TIME` - properties are read at application startup
Methods:
- `defaultConfig()` - returns the default (unnamed) Ollama configuration
- `namedConfig()` - returns a map of named Ollama configurations keyed by configuration name
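Because `@ConfigMapping` interfaces are exposed as CDI beans in Quarkus, the root configuration can be injected and inspected at runtime. A minimal sketch (the bean class and fallback value are illustrative):

```java
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import io.quarkiverse.langchain4j.ollama.runtime.config.LangChain4jOllamaConfig;

@ApplicationScoped
public class OllamaConfigInspector {

    @Inject
    LangChain4jOllamaConfig config;

    // Resolves the effective base URL, falling back to the documented default.
    public String effectiveBaseUrl() {
        return config.defaultConfig().baseUrl().orElse("http://localhost:11434");
    }
}
```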
The `OllamaConfig` interface defines connection and behavioral settings for an Ollama instance.

```java
import java.time.Duration;
import java.util.Optional;

import io.smallrye.config.WithDefault;

public interface OllamaConfig {
/**
* Base URL where Ollama is running
*/
Optional<String> baseUrl();
/**
* Named TLS configuration to apply
*/
Optional<String> tlsConfigurationName();
/**
* Request timeout
*/
Optional<Duration> timeout();
/**
* Whether to log requests
*/
Optional<Boolean> logRequests();
/**
* Whether to log responses
*/
Optional<Boolean> logResponses();
/**
* Whether to log requests as cURL commands
*/
Optional<Boolean> logRequestsCurl();
/**
* Whether to enable the integration
*/
@WithDefault("true")
Boolean enableIntegration();
/**
* Chat model configuration
*/
ChatModelConfig chatModel();
/**
* Embedding model configuration
*/
EmbeddingModelConfig embeddingModel();
}
```

Connection Settings:
- `baseUrl()` - Ollama service endpoint (default: `http://localhost:11434` if not configured)
- `tlsConfigurationName()` - named TLS configuration for secure connections
- `timeout()` - request timeout duration (default: inherited from `quarkus.langchain4j.timeout`, typically 10s)

Logging Settings:
- `logRequests()` - enable request logging (default: inherited from `quarkus.langchain4j.log-requests`, typically false)
- `logResponses()` - enable response logging (default: inherited from `quarkus.langchain4j.log-responses`, typically false)
- `logRequestsCurl()` - log requests as cURL commands for debugging (default: inherited from `quarkus.langchain4j.log-requests-curl`, typically false)

Integration Control:
- `enableIntegration()` - master switch to enable/disable the Ollama integration (default: true)

Model Configuration:
- `chatModel()` - returns the chat model runtime configuration
- `embeddingModel()` - returns the embedding model runtime configuration
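The model objects built from this configuration can be used directly. A minimal sketch, assuming the langchain4j `ChatLanguageModel` API (newer langchain4j releases rename it to `ChatModel` with a `chat(...)` method):

```java
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import dev.langchain4j.model.chat.ChatLanguageModel;

@ApplicationScoped
public class ChatService {

    // Backed by the default Ollama instance configured above.
    @Inject
    ChatLanguageModel chatModel;

    public String answer(String question) {
        // Sends a single user message and returns the model's reply.
        return chatModel.generate(question);
    }
}
```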
Runtime configuration for chat model behavior and parameters.

```java
package io.quarkiverse.langchain4j.ollama.runtime.config;
import java.util.List;
import java.util.Optional;
import java.util.OptionalInt;
import io.quarkus.runtime.annotations.ConfigDocDefault;
import io.quarkus.runtime.annotations.ConfigGroup;
import io.smallrye.config.WithDefault;
@ConfigGroup
public interface ChatModelConfig {
/**
* Model temperature (0-2). Controls randomness in responses.
* Lower values make responses more deterministic, higher values more creative.
*/
@WithDefault("${quarkus.langchain4j.temperature:0.8}")
@ConfigDocDefault("0.8")
Double temperature();
/**
* Maximum number of tokens to predict
*/
OptionalInt numPredict();
/**
* Stop sequences - model stops generating when these sequences are encountered
*/
Optional<List<String>> stop();
/**
* Top-p sampling parameter (0-1). Controls diversity via nucleus sampling.
*/
@WithDefault("0.9")
Double topP();
/**
* Top-k sampling parameter. Limits vocabulary to top k tokens by probability.
*/
@WithDefault("40")
Integer topK();
/**
* Random seed for reproducible results. Same seed produces same output.
*/
Optional<Integer> seed();
/**
* Response format. Use "json" for JSON output or provide JSON schema.
*/
Optional<String> format();
/**
* Whether to log requests for this model
*/
@WithDefault("false")
Optional<Boolean> logRequests();
/**
* Whether to log responses for this model
*/
@WithDefault("false")
Optional<Boolean> logResponses();
}
```

Sampling Parameters:
- `temperature()` - controls randomness (0.0 = deterministic, 2.0 = very creative). Default: 0.8
- `topP()` - nucleus sampling threshold (default: 0.9)
- `topK()` - top-k sampling limit (default: 40)
- `seed()` - random seed for reproducibility (default: none, non-deterministic)

Generation Control:
- `numPredict()` - maximum tokens to generate (default: no limit)
- `stop()` - stop sequences that halt generation (default: none)
- `format()` - response format specification (default: none, free text)

Logging:
- `logRequests()` - override request logging for this model (default: false)
- `logResponses()` - override response logging for this model (default: false)
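These parameters also apply to declarative AI services, which route their calls through the configured chat model. A minimal sketch using quarkus-langchain4j's `@RegisterAiService` (the interface name and prompt text are illustrative):

```java
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;

// Uses the default Ollama chat model, so temperature, top-p, top-k,
// seed, and format from this configuration all apply.
@RegisterAiService
public interface SummaryAiService {

    @SystemMessage("You are a concise technical writer.")
    @UserMessage("Summarize the following text: {text}")
    String summarize(String text);
}
```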
Runtime configuration for embedding model behavior.

```java
package io.quarkiverse.langchain4j.ollama.runtime.config;
import java.util.List;
import java.util.Optional;
import io.quarkus.runtime.annotations.ConfigDocDefault;
import io.quarkus.runtime.annotations.ConfigGroup;
import io.smallrye.config.WithDefault;
@ConfigGroup
public interface EmbeddingModelConfig {
/**
* Model temperature (0-2)
*/
@WithDefault("${quarkus.langchain4j.temperature:0.8}")
@ConfigDocDefault("0.8")
Double temperature();
/**
* Maximum number of tokens to predict
*/
@WithDefault("128")
Integer numPredict();
/**
* Stop sequences
*/
Optional<List<String>> stop();
/**
* Top-p sampling parameter (0-1)
*/
@WithDefault("0.9")
Double topP();
/**
* Top-k sampling parameter
*/
@WithDefault("40")
Integer topK();
/**
* Whether to log requests for this model
*/
@WithDefault("false")
Optional<Boolean> logRequests();
/**
* Whether to log responses for this model
*/
@WithDefault("false")
Optional<Boolean> logResponses();
}
```

Sampling Parameters:
- `temperature()` - controls randomness (default: 0.8)
- `topP()` - nucleus sampling threshold (default: 0.9)
- `topK()` - top-k sampling limit (default: 40)

Generation Control:
- `numPredict()` - maximum tokens to process (default: 128)
- `stop()` - stop sequences (default: none)

Logging:
- `logRequests()` - override request logging for this model (default: false)
- `logResponses()` - override response logging for this model (default: false)
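The configured embedding model can be injected in the same way. A minimal sketch using the langchain4j `EmbeddingModel` API:

```java
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.model.embedding.EmbeddingModel;

@ApplicationScoped
public class EmbeddingService {

    // Backed by the configured Ollama embedding model (nomic-embed-text by default).
    @Inject
    EmbeddingModel embeddingModel;

    public float[] embed(String text) {
        // embed(...) returns a Response wrapper; content() unwraps the embedding.
        Embedding embedding = embeddingModel.embed(text).content();
        return embedding.vector();
    }
}
```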
Fixed runtime configuration is read during build but can be accessed at runtime. It contains properties that require an application rebuild to change.

```java
package io.quarkiverse.langchain4j.ollama.runtime.config;
import java.util.Map;
import io.quarkus.runtime.annotations.ConfigPhase;
import io.quarkus.runtime.annotations.ConfigRoot;
import io.smallrye.config.ConfigMapping;
@ConfigRoot(phase = ConfigPhase.BUILD_AND_RUN_TIME_FIXED)
@ConfigMapping(prefix = "quarkus.langchain4j.ollama")
public interface LangChain4jOllamaFixedRuntimeConfig {
/**
* Default Ollama fixed configuration
*/
OllamaConfig defaultConfig();
/**
* Named Ollama fixed configurations
*/
Map<String, OllamaConfig> namedConfig();
}
```

Phase: `BUILD_AND_RUN_TIME_FIXED` - read during build, accessible at runtime
Inner Configuration:
```java
public interface OllamaConfig {
/**
* Chat model fixed configuration
*/
ChatModelFixedRuntimeConfig chatModel();
/**
* Embedding model fixed configuration
*/
EmbeddingModelFixedRuntimeConfig embeddingModel();
}
```

Fixed runtime configuration for chat models.

```java
package io.quarkiverse.langchain4j.ollama.runtime.config;
import io.quarkus.runtime.annotations.ConfigGroup;
import io.smallrye.config.WithDefault;
@ConfigGroup
public interface ChatModelFixedRuntimeConfig {
/**
* Model ID to use for chat. Default is llama3.2.
*/
@WithDefault("llama3.2")
String modelId();
}
```

Property: `quarkus.langchain4j.ollama.chat-model.model-id`
Type: `String`
Default: `llama3.2`
Description: Specifies which Ollama model to use for chat operations. Common values include:
- `llama3.2` - default Llama 3.2 model
- `llama3.1` - Llama 3.1 model
- `llama3` - Llama 3 base model
- `mistral` - Mistral model
- `codellama` - code-specialized Llama model
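Whichever model ID is configured must already be available in the target Ollama instance, for example by pulling it beforehand with `ollama pull llama3.2`.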
Fixed runtime configuration for embedding models.

```java
package io.quarkiverse.langchain4j.ollama.runtime.config;
import io.quarkus.runtime.annotations.ConfigGroup;
import io.smallrye.config.WithDefault;
@ConfigGroup
public interface EmbeddingModelFixedRuntimeConfig {
/**
* Model ID to use for embeddings. Default is nomic-embed-text.
*/
@WithDefault("nomic-embed-text")
String modelId();
}
```

Property: `quarkus.langchain4j.ollama.embedding-model.model-id`
Type: `String`
Default: `nomic-embed-text`
Description: Specifies which Ollama model to use for generating embeddings. Common values include:
- `nomic-embed-text` - default Nomic embedding model
- `mxbai-embed-large` - MixedBread.ai large embedding model
- `all-minilm` - sentence transformer model

The following examples show typical configurations.

```properties
# Default Ollama instance
quarkus.langchain4j.ollama.base-url=http://localhost:11434
quarkus.langchain4j.ollama.chat-model.model-id=llama3.2
quarkus.langchain4j.ollama.chat-model.temperature=0.7
quarkus.langchain4j.ollama.embedding-model.model-id=nomic-embed-text
```
```properties
# Default instance
quarkus.langchain4j.ollama.base-url=http://localhost:11434
quarkus.langchain4j.ollama.chat-model.model-id=llama3.2
# Production instance
quarkus.langchain4j.ollama.prod.base-url=https://ollama.example.com
quarkus.langchain4j.ollama.prod.chat-model.model-id=llama3.1
quarkus.langchain4j.ollama.prod.timeout=30s
# Code generation instance
quarkus.langchain4j.ollama.codegen.base-url=http://localhost:11434
quarkus.langchain4j.ollama.codegen.chat-model.model-id=codellama
quarkus.langchain4j.ollama.codegen.chat-model.temperature=0.2
```
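Named instances are then selected by model name in code. A minimal sketch, assuming quarkus-langchain4j's `modelName` attribute on `@RegisterAiService`:

```java
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;

// Routed to the "codegen" instance configured above (codellama, temperature 0.2).
@RegisterAiService(modelName = "codegen")
public interface CodeGenAiService {

    @UserMessage("Write a Java method that {task}")
    String generateCode(String task);
}
```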
```properties
# Connection settings
quarkus.langchain4j.ollama.base-url=https://ollama.example.com:11434
quarkus.langchain4j.ollama.tls-configuration-name=ollama-tls
quarkus.langchain4j.ollama.timeout=60s
# Chat model settings
quarkus.langchain4j.ollama.chat-model.model-id=llama3.2
quarkus.langchain4j.ollama.chat-model.temperature=0.8
quarkus.langchain4j.ollama.chat-model.top-p=0.9
quarkus.langchain4j.ollama.chat-model.top-k=40
quarkus.langchain4j.ollama.chat-model.num-predict=2048
quarkus.langchain4j.ollama.chat-model.seed=42
quarkus.langchain4j.ollama.chat-model.format=json
quarkus.langchain4j.ollama.chat-model.stop=</s>,<|endoftext|>
# Embedding model settings
quarkus.langchain4j.ollama.embedding-model.model-id=nomic-embed-text
quarkus.langchain4j.ollama.embedding-model.temperature=0.0
quarkus.langchain4j.ollama.embedding-model.num-predict=512
# Logging
quarkus.langchain4j.ollama.log-requests=true
quarkus.langchain4j.ollama.log-responses=false
quarkus.langchain4j.ollama.log-requests-curl=true
```
```properties
# Development (application.properties)
quarkus.langchain4j.ollama.chat-model.model-id=llama3.2
quarkus.langchain4j.ollama.log-requests=true
# Production (application-prod.properties or environment variables)
%prod.quarkus.langchain4j.ollama.base-url=https://ollama-prod.example.com
%prod.quarkus.langchain4j.ollama.timeout=30s
%prod.quarkus.langchain4j.ollama.log-requests=false
%prod.quarkus.langchain4j.ollama.log-responses=false
```

Connection and integration properties:

| Property | Type | Default | Phase | Description |
|---|---|---|---|---|
| `quarkus.langchain4j.ollama.base-url` | `Optional<String>` | `http://localhost:11434` | RUN_TIME | Ollama service URL |
| `quarkus.langchain4j.ollama.tls-configuration-name` | `Optional<String>` | (none) | RUN_TIME | Named TLS configuration |
| `quarkus.langchain4j.ollama.timeout` | `Optional<Duration>` | 10s | RUN_TIME | Request timeout |
| `quarkus.langchain4j.ollama.enable-integration` | `Boolean` | true | RUN_TIME | Enable Ollama integration |
Chat model properties:

| Property | Type | Default | Phase | Description |
|---|---|---|---|---|
| `quarkus.langchain4j.ollama.chat-model.model-id` | `String` | `llama3.2` | FIXED | Model identifier |
| `quarkus.langchain4j.ollama.chat-model.temperature` | `Double` | 0.8 | RUN_TIME | Sampling temperature (0-2) |
| `quarkus.langchain4j.ollama.chat-model.top-p` | `Double` | 0.9 | RUN_TIME | Nucleus sampling threshold |
| `quarkus.langchain4j.ollama.chat-model.top-k` | `Integer` | 40 | RUN_TIME | Top-k sampling limit |
| `quarkus.langchain4j.ollama.chat-model.num-predict` | `OptionalInt` | (unlimited) | RUN_TIME | Max tokens to generate |
| `quarkus.langchain4j.ollama.chat-model.seed` | `Optional<Integer>` | (none) | RUN_TIME | Random seed |
| `quarkus.langchain4j.ollama.chat-model.format` | `Optional<String>` | (none) | RUN_TIME | Response format |
| `quarkus.langchain4j.ollama.chat-model.stop` | `Optional<List<String>>` | (none) | RUN_TIME | Stop sequences |
Embedding model properties:

| Property | Type | Default | Phase | Description |
|---|---|---|---|---|
| `quarkus.langchain4j.ollama.embedding-model.model-id` | `String` | `nomic-embed-text` | FIXED | Model identifier |
| `quarkus.langchain4j.ollama.embedding-model.temperature` | `Double` | 0.8 | RUN_TIME | Sampling temperature |
| `quarkus.langchain4j.ollama.embedding-model.top-p` | `Double` | 0.9 | RUN_TIME | Nucleus sampling threshold |
| `quarkus.langchain4j.ollama.embedding-model.top-k` | `Integer` | 40 | RUN_TIME | Top-k sampling limit |
| `quarkus.langchain4j.ollama.embedding-model.num-predict` | `Integer` | 128 | RUN_TIME | Max tokens to process |
| `quarkus.langchain4j.ollama.embedding-model.stop` | `Optional<List<String>>` | (none) | RUN_TIME | Stop sequences |
Logging properties:

| Property | Type | Default | Phase | Description |
|---|---|---|---|---|
| `quarkus.langchain4j.ollama.log-requests` | `Optional<Boolean>` | false | RUN_TIME | Log all requests |
| `quarkus.langchain4j.ollama.log-responses` | `Optional<Boolean>` | false | RUN_TIME | Log all responses |
| `quarkus.langchain4j.ollama.log-requests-curl` | `Optional<Boolean>` | false | RUN_TIME | Log requests as cURL |
| `quarkus.langchain4j.ollama.chat-model.log-requests` | `Optional<Boolean>` | false | RUN_TIME | Log chat requests |
| `quarkus.langchain4j.ollama.chat-model.log-responses` | `Optional<Boolean>` | false | RUN_TIME | Log chat responses |
| `quarkus.langchain4j.ollama.embedding-model.log-requests` | `Optional<Boolean>` | false | RUN_TIME | Log embedding requests |
| `quarkus.langchain4j.ollama.embedding-model.log-responses` | `Optional<Boolean>` | false | RUN_TIME | Log embedding responses |
All properties support named instances using the pattern:
`quarkus.langchain4j.ollama.<config-name>.<property>`

For example:

```properties
quarkus.langchain4j.ollama.my-instance.base-url=http://localhost:11434
quarkus.langchain4j.ollama.my-instance.chat-model.model-id=llama3.2
quarkus.langchain4j.ollama.my-instance.chat-model.temperature=0.7
```
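A named instance's models can also be injected directly, assuming quarkus-langchain4j's `@ModelName` CDI qualifier:

```java
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import dev.langchain4j.model.chat.ChatLanguageModel;
import io.quarkiverse.langchain4j.ModelName;

@ApplicationScoped
public class MyInstanceClient {

    // Bound to the "my-instance" configuration shown above.
    @Inject
    @ModelName("my-instance")
    ChatLanguageModel chatModel;
}
```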
All properties can be configured via environment variables using uppercase names with underscores:

```
QUARKUS_LANGCHAIN4J_OLLAMA_BASE_URL=http://ollama:11434
QUARKUS_LANGCHAIN4J_OLLAMA_CHAT_MODEL_MODEL_ID=llama3.2
QUARKUS_LANGCHAIN4J_OLLAMA_CHAT_MODEL_TEMPERATURE=0.7
QUARKUS_LANGCHAIN4J_OLLAMA_TIMEOUT=30s
```

Notes:

- Fixed-phase properties (`model-id`) require an application rebuild to change
- Dev Services can configure `base-url` automatically when enabled in development mode

Install with Tessl CLI
```
npx tessl i tessl/maven-io-quarkiverse-langchain4j--quarkus-langchain4j-ollama-deployment
```