Runtime Configuration

The Quarkus LangChain4j Ollama extension provides runtime configuration for connecting to Ollama services and tuning model behavior. Runtime configuration properties are read when the application starts and can be changed without rebuilding the application.

Configuration Overview

Runtime configuration operates in two phases:

  1. Runtime Phase (RUN_TIME): properties read at startup that can change between application restarts (base URL, timeouts, logging, model parameters)
  2. Fixed Runtime Phase (BUILD_AND_RUN_TIME_FIXED): properties fixed at build time but still readable at runtime (model IDs)
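
For example, quarkus.langchain4j.ollama.chat-model.temperature can be changed for the next restart with no rebuild, while quarkus.langchain4j.ollama.chat-model.model-id is fixed at build time and requires a rebuild to change.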

Runtime Configuration Root

The root configuration interface provides access to default and named Ollama configurations.

package io.quarkiverse.langchain4j.ollama.runtime.config;

import java.util.Map;
import io.quarkus.runtime.annotations.ConfigRoot;
import io.smallrye.config.ConfigMapping;
import static io.quarkus.runtime.annotations.ConfigPhase.RUN_TIME;

@ConfigRoot(phase = RUN_TIME)
@ConfigMapping(prefix = "quarkus.langchain4j.ollama")
public interface LangChain4jOllamaConfig {
    /**
     * Default Ollama configuration
     */
    OllamaConfig defaultConfig();

    /**
     * Named Ollama configurations
     */
    Map<String, OllamaConfig> namedConfig();
}

Configuration Prefix: quarkus.langchain4j.ollama

Phase: RUN_TIME - Properties are read at application startup

Methods:

  • defaultConfig() - Returns the default (unnamed) Ollama configuration
  • namedConfig() - Returns map of named Ollama configurations keyed by configuration name
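
A minimal sketch of reading these mappings from application code; it assumes the @ConfigMapping interface is injectable as a CDI bean (standard Quarkus behavior for config mappings), and the OllamaConfigInspector class is purely illustrative:

package org.acme;

import java.util.Map;

import io.quarkiverse.langchain4j.ollama.runtime.config.LangChain4jOllamaConfig;
import io.quarkiverse.langchain4j.ollama.runtime.config.OllamaConfig;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

@ApplicationScoped
public class OllamaConfigInspector {

    @Inject
    LangChain4jOllamaConfig config;

    public void logEndpoints() {
        // Default (unnamed) configuration; base URL falls back to http://localhost:11434
        String defaultUrl = config.defaultConfig().baseUrl().orElse("http://localhost:11434");
        System.out.println("default -> " + defaultUrl);

        // Named configurations, keyed by the <config-name> segment of the property
        for (Map.Entry<String, OllamaConfig> entry : config.namedConfig().entrySet()) {
            System.out.println(entry.getKey() + " -> "
                    + entry.getValue().baseUrl().orElse("http://localhost:11434"));
        }
    }
}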

Ollama Configuration

The OllamaConfig interface defines connection and behavioral settings for an Ollama instance.

package io.quarkiverse.langchain4j.ollama.runtime.config;

import java.time.Duration;
import java.util.Optional;
import io.smallrye.config.WithDefault;

public interface OllamaConfig {
    /**
     * Base URL where Ollama is running
     */
    Optional<String> baseUrl();

    /**
     * Named TLS configuration to apply
     */
    Optional<String> tlsConfigurationName();

    /**
     * Request timeout
     */
    Optional<Duration> timeout();

    /**
     * Whether to log requests
     */
    Optional<Boolean> logRequests();

    /**
     * Whether to log responses
     */
    Optional<Boolean> logResponses();

    /**
     * Whether to log requests as cURL commands
     */
    Optional<Boolean> logRequestsCurl();

    /**
     * Whether to enable the integration
     */
    @WithDefault("true")
    Boolean enableIntegration();

    /**
     * Chat model configuration
     */
    ChatModelConfig chatModel();

    /**
     * Embedding model configuration
     */
    EmbeddingModelConfig embeddingModel();
}

Connection Settings:

  • baseUrl() - Ollama service endpoint (default: http://localhost:11434 if not configured)
  • tlsConfigurationName() - Named TLS configuration for secure connections
  • timeout() - Request timeout duration (default: inherited from quarkus.langchain4j.timeout, typically 10s)

Logging Settings:

  • logRequests() - Enable request logging (default: inherited from quarkus.langchain4j.log-requests, typically false)
  • logResponses() - Enable response logging (default: inherited from quarkus.langchain4j.log-responses, typically false)
  • logRequestsCurl() - Log requests as cURL commands for debugging (default: inherited from quarkus.langchain4j.log-requests-curl, typically false)

Integration Control:

  • enableIntegration() - Master switch to enable/disable Ollama integration (default: true)
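
For example, the integration can be switched off per profile without touching code (standard Quarkus profile syntax):

%test.quarkus.langchain4j.ollama.enable-integration=false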

Model Configuration:

  • chatModel() - Returns chat model runtime configuration
  • embeddingModel() - Returns embedding model runtime configuration

Chat Model Configuration

Runtime configuration for chat model behavior and parameters.

package io.quarkiverse.langchain4j.ollama.runtime.config;

import java.util.List;
import java.util.Optional;
import java.util.OptionalInt;
import io.quarkus.runtime.annotations.ConfigDocDefault;
import io.quarkus.runtime.annotations.ConfigGroup;
import io.smallrye.config.WithDefault;

@ConfigGroup
public interface ChatModelConfig {
    /**
     * Model temperature (0-2). Controls randomness in responses.
     * Lower values make responses more deterministic, higher values more creative.
     */
    @WithDefault("${quarkus.langchain4j.temperature:0.8}")
    @ConfigDocDefault("0.8")
    Double temperature();

    /**
     * Maximum number of tokens to predict
     */
    OptionalInt numPredict();

    /**
     * Stop sequences - model stops generating when these sequences are encountered
     */
    Optional<List<String>> stop();

    /**
     * Top-p sampling parameter (0-1). Controls diversity via nucleus sampling.
     */
    @WithDefault("0.9")
    Double topP();

    /**
     * Top-k sampling parameter. Limits vocabulary to top k tokens by probability.
     */
    @WithDefault("40")
    Integer topK();

    /**
     * Random seed for reproducible results. Same seed produces same output.
     */
    Optional<Integer> seed();

    /**
     * Response format. Use "json" for JSON output or provide a JSON schema.
     */
    Optional<String> format();

    /**
     * Whether to log requests for this model
     */
    @WithDefault("false")
    Optional<Boolean> logRequests();

    /**
     * Whether to log responses for this model
     */
    @WithDefault("false")
    Optional<Boolean> logResponses();
}

Sampling Parameters:

  • temperature() - Controls randomness (0.0 = deterministic, 2.0 = very creative). Default: 0.8
  • topP() - Nucleus sampling threshold (default: 0.9)
  • topK() - Top-k sampling limit (default: 40)
  • seed() - Random seed for reproducibility (default: none, non-deterministic)

Generation Control:

  • numPredict() - Maximum tokens to generate (default: no limit)
  • stop() - Stop sequences that halt generation (default: none)
  • format() - Response format specification (default: none, free text)

Logging:

  • logRequests() - Override request logging for this model (default: false)
  • logResponses() - Override response logging for this model (default: false)
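
Combining several of these parameters, a sketch of a near-deterministic, JSON-producing chat model, using only the properties documented above:

quarkus.langchain4j.ollama.chat-model.temperature=0.0
quarkus.langchain4j.ollama.chat-model.seed=42
quarkus.langchain4j.ollama.chat-model.format=json
quarkus.langchain4j.ollama.chat-model.num-predict=512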

Embedding Model Configuration

Runtime configuration for embedding model behavior.

package io.quarkiverse.langchain4j.ollama.runtime.config;

import java.util.List;
import java.util.Optional;
import io.quarkus.runtime.annotations.ConfigDocDefault;
import io.quarkus.runtime.annotations.ConfigGroup;
import io.smallrye.config.WithDefault;

@ConfigGroup
public interface EmbeddingModelConfig {
    /**
     * Model temperature (0-2)
     */
    @WithDefault("${quarkus.langchain4j.temperature:0.8}")
    @ConfigDocDefault("0.8")
    Double temperature();

    /**
     * Maximum number of tokens to predict
     */
    @WithDefault("128")
    Integer numPredict();

    /**
     * Stop sequences
     */
    Optional<List<String>> stop();

    /**
     * Top-p sampling parameter (0-1)
     */
    @WithDefault("0.9")
    Double topP();

    /**
     * Top-k sampling parameter
     */
    @WithDefault("40")
    Integer topK();

    /**
     * Whether to log requests for this model
     */
    @WithDefault("false")
    Optional<Boolean> logRequests();

    /**
     * Whether to log responses for this model
     */
    @WithDefault("false")
    Optional<Boolean> logResponses();
}

Sampling Parameters:

  • temperature() - Controls randomness (default: 0.8)
  • topP() - Nucleus sampling threshold (default: 0.9)
  • topK() - Top-k sampling limit (default: 40)

Generation Control:

  • numPredict() - Maximum tokens to process (default: 128)
  • stop() - Stop sequences (default: none)

Logging:

  • logRequests() - Override request logging for this model (default: false)
  • logResponses() - Override response logging for this model (default: false)
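
Once configured, the embedding model is consumed through langchain4j's EmbeddingModel API; a minimal sketch (the EmbeddingService class is illustrative, not part of the extension):

package org.acme;

import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.model.embedding.EmbeddingModel;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

@ApplicationScoped
public class EmbeddingService {

    @Inject
    EmbeddingModel embeddingModel;

    public float[] embed(String text) {
        // Uses the model configured via quarkus.langchain4j.ollama.embedding-model.*
        Embedding embedding = embeddingModel.embed(text).content();
        return embedding.vector();
    }
}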

Fixed Runtime Configuration

Fixed runtime configuration is read at build time but remains accessible at runtime. It contains properties that require an application rebuild to change.

package io.quarkiverse.langchain4j.ollama.runtime.config;

import java.util.Map;
import io.quarkus.runtime.annotations.ConfigPhase;
import io.quarkus.runtime.annotations.ConfigRoot;
import io.smallrye.config.ConfigMapping;

@ConfigRoot(phase = ConfigPhase.BUILD_AND_RUN_TIME_FIXED)
@ConfigMapping(prefix = "quarkus.langchain4j.ollama")
public interface LangChain4jOllamaFixedRuntimeConfig {
    /**
     * Default Ollama fixed configuration
     */
    OllamaConfig defaultConfig();

    /**
     * Named Ollama fixed configurations
     */
    Map<String, OllamaConfig> namedConfig();
}

Phase: BUILD_AND_RUN_TIME_FIXED - Read during build, accessible at runtime

Inner Configuration:

public interface OllamaConfig {
    /**
     * Chat model fixed configuration
     */
    ChatModelFixedRuntimeConfig chatModel();

    /**
     * Embedding model fixed configuration
     */
    EmbeddingModelFixedRuntimeConfig embeddingModel();
}

Chat Model Fixed Configuration

Fixed runtime configuration for chat models.

package io.quarkiverse.langchain4j.ollama.runtime.config;

import io.quarkus.runtime.annotations.ConfigGroup;
import io.smallrye.config.WithDefault;

@ConfigGroup
public interface ChatModelFixedRuntimeConfig {
    /**
     * Model ID to use for chat. Default is llama3.2.
     */
    @WithDefault("llama3.2")
    String modelId();
}

Property: quarkus.langchain4j.ollama.chat-model.model-id

Type: String

Default: llama3.2

Description: Specifies which Ollama model to use for chat operations. Common values include:

  • llama3.2 - Default Llama 3.2 model
  • llama3.1 - Llama 3.1 model
  • llama3 - Llama 3 base model
  • mistral - Mistral model
  • codellama - Code-specialized Llama model
  • Any other model available in your Ollama installation
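
The configured model must be available in your Ollama installation; for example, using the standard Ollama CLI:

ollama pull llama3.2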

Embedding Model Fixed Configuration

Fixed runtime configuration for embedding models.

package io.quarkiverse.langchain4j.ollama.runtime.config;

import io.quarkus.runtime.annotations.ConfigGroup;
import io.smallrye.config.WithDefault;

@ConfigGroup
public interface EmbeddingModelFixedRuntimeConfig {
    /**
     * Model ID to use for embeddings. Default is nomic-embed-text.
     */
    @WithDefault("nomic-embed-text")
    String modelId();
}

Property: quarkus.langchain4j.ollama.embedding-model.model-id

Type: String

Default: nomic-embed-text

Description: Specifies which Ollama model to use for generating embeddings. Common values include:

  • nomic-embed-text - Default Nomic embedding model
  • mxbai-embed-large - MixedBread.ai large embedding model
  • all-minilm - Sentence transformer model
  • Any other embedding model available in your Ollama installation

Configuration Examples

Basic Configuration

# Default Ollama instance
quarkus.langchain4j.ollama.base-url=http://localhost:11434
quarkus.langchain4j.ollama.chat-model.model-id=llama3.2
quarkus.langchain4j.ollama.chat-model.temperature=0.7
quarkus.langchain4j.ollama.embedding-model.model-id=nomic-embed-text

Named Configuration

# Default instance
quarkus.langchain4j.ollama.base-url=http://localhost:11434
quarkus.langchain4j.ollama.chat-model.model-id=llama3.2

# Production instance
quarkus.langchain4j.ollama.prod.base-url=https://ollama.example.com
quarkus.langchain4j.ollama.prod.chat-model.model-id=llama3.1
quarkus.langchain4j.ollama.prod.timeout=30s

# Code generation instance
quarkus.langchain4j.ollama.codegen.base-url=http://localhost:11434
quarkus.langchain4j.ollama.codegen.chat-model.model-id=codellama
quarkus.langchain4j.ollama.codegen.chat-model.temperature=0.2

Advanced Configuration

# Connection settings
quarkus.langchain4j.ollama.base-url=https://ollama.example.com:11434
quarkus.langchain4j.ollama.tls-configuration-name=ollama-tls
quarkus.langchain4j.ollama.timeout=60s

# Chat model settings
quarkus.langchain4j.ollama.chat-model.model-id=llama3.2
quarkus.langchain4j.ollama.chat-model.temperature=0.8
quarkus.langchain4j.ollama.chat-model.top-p=0.9
quarkus.langchain4j.ollama.chat-model.top-k=40
quarkus.langchain4j.ollama.chat-model.num-predict=2048
quarkus.langchain4j.ollama.chat-model.seed=42
quarkus.langchain4j.ollama.chat-model.format=json
quarkus.langchain4j.ollama.chat-model.stop=</s>,<|endoftext|>

# Embedding model settings
quarkus.langchain4j.ollama.embedding-model.model-id=nomic-embed-text
quarkus.langchain4j.ollama.embedding-model.temperature=0.0
quarkus.langchain4j.ollama.embedding-model.num-predict=512

# Logging
quarkus.langchain4j.ollama.log-requests=true
quarkus.langchain4j.ollama.log-responses=false
quarkus.langchain4j.ollama.log-requests-curl=true

Development vs Production Configuration

# Development (application.properties)
quarkus.langchain4j.ollama.chat-model.model-id=llama3.2
quarkus.langchain4j.ollama.log-requests=true

# Production (application-prod.properties or environment variables)
%prod.quarkus.langchain4j.ollama.base-url=https://ollama-prod.example.com
%prod.quarkus.langchain4j.ollama.timeout=30s
%prod.quarkus.langchain4j.ollama.log-requests=false
%prod.quarkus.langchain4j.ollama.log-responses=false

Complete Property Reference

Connection Properties

| Property | Type | Default | Phase | Description |
| --- | --- | --- | --- | --- |
| quarkus.langchain4j.ollama.base-url | Optional<String> | http://localhost:11434 | RUN_TIME | Ollama service URL |
| quarkus.langchain4j.ollama.tls-configuration-name | Optional<String> | (none) | RUN_TIME | Named TLS configuration |
| quarkus.langchain4j.ollama.timeout | Optional<Duration> | 10s | RUN_TIME | Request timeout |
| quarkus.langchain4j.ollama.enable-integration | Boolean | true | RUN_TIME | Enable Ollama integration |

Chat Model Properties

| Property | Type | Default | Phase | Description |
| --- | --- | --- | --- | --- |
| quarkus.langchain4j.ollama.chat-model.model-id | String | llama3.2 | FIXED | Model identifier |
| quarkus.langchain4j.ollama.chat-model.temperature | Double | 0.8 | RUN_TIME | Sampling temperature (0-2) |
| quarkus.langchain4j.ollama.chat-model.top-p | Double | 0.9 | RUN_TIME | Nucleus sampling threshold |
| quarkus.langchain4j.ollama.chat-model.top-k | Integer | 40 | RUN_TIME | Top-k sampling limit |
| quarkus.langchain4j.ollama.chat-model.num-predict | OptionalInt | (unlimited) | RUN_TIME | Max tokens to generate |
| quarkus.langchain4j.ollama.chat-model.seed | Optional<Integer> | (none) | RUN_TIME | Random seed |
| quarkus.langchain4j.ollama.chat-model.format | Optional<String> | (none) | RUN_TIME | Response format |
| quarkus.langchain4j.ollama.chat-model.stop | Optional<List<String>> | (none) | RUN_TIME | Stop sequences |

Embedding Model Properties

| Property | Type | Default | Phase | Description |
| --- | --- | --- | --- | --- |
| quarkus.langchain4j.ollama.embedding-model.model-id | String | nomic-embed-text | FIXED | Model identifier |
| quarkus.langchain4j.ollama.embedding-model.temperature | Double | 0.8 | RUN_TIME | Sampling temperature |
| quarkus.langchain4j.ollama.embedding-model.top-p | Double | 0.9 | RUN_TIME | Nucleus sampling threshold |
| quarkus.langchain4j.ollama.embedding-model.top-k | Integer | 40 | RUN_TIME | Top-k sampling limit |
| quarkus.langchain4j.ollama.embedding-model.num-predict | Integer | 128 | RUN_TIME | Max tokens to process |
| quarkus.langchain4j.ollama.embedding-model.stop | Optional<List<String>> | (none) | RUN_TIME | Stop sequences |

Logging Properties

| Property | Type | Default | Phase | Description |
| --- | --- | --- | --- | --- |
| quarkus.langchain4j.ollama.log-requests | Optional<Boolean> | false | RUN_TIME | Log all requests |
| quarkus.langchain4j.ollama.log-responses | Optional<Boolean> | false | RUN_TIME | Log all responses |
| quarkus.langchain4j.ollama.log-requests-curl | Optional<Boolean> | false | RUN_TIME | Log requests as cURL |
| quarkus.langchain4j.ollama.chat-model.log-requests | Optional<Boolean> | false | RUN_TIME | Log chat requests |
| quarkus.langchain4j.ollama.chat-model.log-responses | Optional<Boolean> | false | RUN_TIME | Log chat responses |
| quarkus.langchain4j.ollama.embedding-model.log-requests | Optional<Boolean> | false | RUN_TIME | Log embedding requests |
| quarkus.langchain4j.ollama.embedding-model.log-responses | Optional<Boolean> | false | RUN_TIME | Log embedding responses |

Named Configuration Pattern

All properties support named instances using the pattern:

quarkus.langchain4j.ollama.<config-name>.<property>

For example:

quarkus.langchain4j.ollama.my-instance.base-url=http://localhost:11434
quarkus.langchain4j.ollama.my-instance.chat-model.model-id=llama3.2
quarkus.langchain4j.ollama.my-instance.chat-model.temperature=0.7
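
Named configurations are typically consumed through qualifier-based injection; a sketch assuming the @ModelName qualifier from the quarkus-langchain4j core extension (the ModelClients class is illustrative):

package org.acme;

import dev.langchain4j.model.chat.ChatLanguageModel;
import io.quarkiverse.langchain4j.ModelName;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

@ApplicationScoped
public class ModelClients {

    // Backed by the default quarkus.langchain4j.ollama.* properties
    @Inject
    ChatLanguageModel defaultModel;

    // Backed by the quarkus.langchain4j.ollama.my-instance.* properties
    @Inject
    @ModelName("my-instance")
    ChatLanguageModel namedModel;
}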

Environment Variable Configuration

All properties can also be set as environment variables by uppercasing the property name and replacing dots and dashes with underscores:

QUARKUS_LANGCHAIN4J_OLLAMA_BASE_URL=http://ollama:11434
QUARKUS_LANGCHAIN4J_OLLAMA_CHAT_MODEL_MODEL_ID=llama3.2
QUARKUS_LANGCHAIN4J_OLLAMA_CHAT_MODEL_TEMPERATURE=0.7
QUARKUS_LANGCHAIN4J_OLLAMA_TIMEOUT=30s

Notes

  • Runtime configuration properties can be changed without rebuilding the application
  • Fixed runtime properties (model-id) require an application rebuild to change
  • Named configurations allow multiple Ollama instances with different settings
  • Logging properties can be overridden at the model level for fine-grained control
  • Temperature values typically range from 0.0 (deterministic) to 1.0 (creative), but values up to 2.0 are accepted
  • Dev Services automatically configures the base-url when enabled in development mode

