tessl/maven-io-quarkiverse-langchain4j--quarkus-langchain4j-ollama-deployment

Quarkus extension deployment module for integrating Ollama LLM models with Quarkus applications through the LangChain4j framework


Runtime Model Types

The Ollama runtime module provides type definitions and data structures used for interacting with Ollama models. These types are part of the runtime API and are used internally by the model implementations.

This document covers:

  1. Runtime Bean APIs - The public API methods available on injected ChatModel, StreamingChatModel, and EmbeddingModel beans
  2. Internal Model Types - The Role enum and Options record used internally by the implementation

Runtime Bean APIs

When you inject ChatModel, StreamingChatModel, or EmbeddingModel beans into your application, you interact with the LangChain4j interfaces. These interfaces provide the actual methods you'll call to use the models.

ChatModel Interface

The ChatModel interface provides synchronous text generation methods.

package dev.langchain4j.model.chat;

import dev.langchain4j.data.message.*;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.response.ChatResponse;

/**
 * Synchronous chat model interface for text generation.
 * Beans of this type are created by the Ollama deployment module
 * and can be injected into your application.
 */
public interface ChatModel {
    /**
     * Simple text-to-text generation.
     * Takes a user message string and returns the AI response as a string.
     *
     * @param userMessage the user's input text
     * @return the AI-generated response text
     */
    String chat(String userMessage);

    /**
     * Structured request-based generation with advanced options.
     * Supports tool specifications, response formats, and detailed parameters.
     *
     * @param request the chat request with messages and parameters
     * @return the chat response with AI message and metadata
     */
    ChatResponse chat(ChatRequest request);

    /**
     * Multi-message generation with system and user context.
     * Allows explicit system message for instructions/context.
     *
     * @param systemMessage instructions or context for the model
     * @param userMessage the user's input
     * @return the chat response with AI message and metadata
     */
    ChatResponse chat(SystemMessage systemMessage, UserMessage userMessage);
}

Package: dev.langchain4j.model.chat

Methods:

| Method | Parameters | Return Type | Description |
|--------|------------|-------------|-------------|
| chat | String userMessage | String | Simple text input/output. Most common use case. |
| chat | ChatRequest request | ChatResponse | Advanced features: tools, formats, parameters. |
| chat | SystemMessage, UserMessage | ChatResponse | Explicit system context with user message. |

Usage Examples:

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.request.ChatRequestParameters;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.data.message.SystemMessage;
import dev.langchain4j.data.message.UserMessage;
import java.util.List;

@ApplicationScoped
public class MyService {
    @Inject
    ChatModel chatModel;

    // Simple text generation
    public String simpleChat(String userInput) {
        return chatModel.chat(userInput);
    }

    // With system message
    public String chatWithContext(String context, String userInput) {
        SystemMessage system = SystemMessage.from(context);
        UserMessage user = UserMessage.from(userInput);
        ChatResponse response = chatModel.chat(system, user);
        return response.aiMessage().text();
    }

    // Advanced request with parameters
    public String advancedChat(String userInput) {
        ChatRequest request = ChatRequest.builder()
            .messages(List.of(UserMessage.from(userInput)))
            .parameters(ChatRequestParameters.builder()
                .temperature(0.7)
                .maxOutputTokens(500)
                .build())
            .build();
        ChatResponse response = chatModel.chat(request);
        return response.aiMessage().text();
    }
}

Required Imports:

import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.model.chat.request.ChatRequestParameters;
import dev.langchain4j.data.message.SystemMessage;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.data.message.AiMessage;

StreamingChatModel Interface

The StreamingChatModel interface provides asynchronous streaming text generation.

package dev.langchain4j.model.chat;

import dev.langchain4j.data.message.*;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.response.StreamingChatResponseHandler;
import java.util.List;

/**
 * Streaming chat model interface for real-time text generation.
 * Beans of this type are created by the Ollama deployment module
 * and can be injected into your application.
 */
public interface StreamingChatModel {
    /**
     * Simple streaming generation from user message.
     * Response is delivered incrementally via the handler callback.
     *
     * @param userMessage the user's input text
     * @param handler callback to receive streaming response chunks
     */
    void chat(String userMessage, StreamingChatResponseHandler handler);

    /**
     * Structured streaming request with advanced options.
     * Supports tool specifications, response formats, and detailed parameters.
     *
     * @param request the chat request with messages and parameters
     * @param handler callback to receive streaming response chunks
     */
    void chat(ChatRequest request, StreamingChatResponseHandler handler);

    /**
     * Streaming generation from a list of chat messages.
     *
     * @param messages list of messages in the conversation
     * @param handler callback to receive streaming response chunks
     */
    void chat(List<ChatMessage> messages, StreamingChatResponseHandler handler);
}

Package: dev.langchain4j.model.chat

Methods:

| Method | Parameters | Return Type | Description |
|--------|------------|-------------|-------------|
| chat | String, StreamingChatResponseHandler | void | Simple streaming text generation. |
| chat | ChatRequest, StreamingChatResponseHandler | void | Advanced streaming with parameters. |
| chat | List<ChatMessage>, StreamingChatResponseHandler | void | Stream from message list. |

StreamingChatResponseHandler Interface:

package dev.langchain4j.model.chat.response;

/**
 * Handler for receiving streaming chat responses.
 */
public interface StreamingChatResponseHandler {
    /**
     * Called for each partial response chunk as it arrives.
     * @param partialResponse the text chunk received
     */
    void onPartialResponse(String partialResponse);

    /**
     * Called when the complete response is ready.
     * @param completeResponse the final chat response with metadata
     */
    void onCompleteResponse(ChatResponse completeResponse);

    /**
     * Called if an error occurs during streaming.
     * @param error the error that occurred
     */
    void onError(Throwable error);
}

Usage Examples:

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import dev.langchain4j.model.chat.StreamingChatModel;
import dev.langchain4j.model.chat.response.StreamingChatResponseHandler;
import dev.langchain4j.model.chat.response.ChatResponse;
import java.util.concurrent.CompletableFuture;

@ApplicationScoped
public class MyStreamingService {
    @Inject
    StreamingChatModel streamingChatModel;

    // Simple streaming
    public void streamResponse(String userInput) {
        streamingChatModel.chat(userInput, new StreamingChatResponseHandler() {
            @Override
            public void onPartialResponse(String partialResponse) {
                // Process each chunk as it arrives
                System.out.print(partialResponse);
            }

            @Override
            public void onCompleteResponse(ChatResponse completeResponse) {
                // Called when generation completes
                System.out.println("\nComplete!");
                System.out.println("Full text: " + completeResponse.aiMessage().text());
            }

            @Override
            public void onError(Throwable error) {
                // Handle errors
                System.err.println("Error: " + error.getMessage());
            }
        });
    }

    // Accumulating streaming response
    public CompletableFuture<String> streamAndAccumulate(String userInput) {
        CompletableFuture<String> future = new CompletableFuture<>();
        StringBuilder accumulated = new StringBuilder();

        streamingChatModel.chat(userInput, new StreamingChatResponseHandler() {
            @Override
            public void onPartialResponse(String partialResponse) {
                accumulated.append(partialResponse);
            }

            @Override
            public void onCompleteResponse(ChatResponse completeResponse) {
                future.complete(accumulated.toString());
            }

            @Override
            public void onError(Throwable error) {
                future.completeExceptionally(error);
            }
        });

        return future;
    }
}

Required Imports:

import dev.langchain4j.model.chat.StreamingChatModel;
import dev.langchain4j.model.chat.response.StreamingChatResponseHandler;
import dev.langchain4j.model.chat.response.ChatResponse;
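In a Quarkus application you may want to expose the callback-based streaming API as a reactive stream. The following is a hedged sketch, assuming the SmallRye Mutiny API that ships with Quarkus; `StreamingBridge` and `asMulti` are illustrative names, not part of either library:

```java
import io.smallrye.mutiny.Multi;
import dev.langchain4j.model.chat.StreamingChatModel;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.model.chat.response.StreamingChatResponseHandler;

public class StreamingBridge {

    // Bridges the callback-based handler onto a Mutiny Multi.
    // Each partial response becomes one stream item; completion and
    // failure are propagated to the subscriber.
    public static Multi<String> asMulti(StreamingChatModel model, String userInput) {
        return Multi.createFrom().emitter(emitter ->
            model.chat(userInput, new StreamingChatResponseHandler() {
                @Override
                public void onPartialResponse(String partialResponse) {
                    emitter.emit(partialResponse);
                }

                @Override
                public void onCompleteResponse(ChatResponse completeResponse) {
                    emitter.complete();
                }

                @Override
                public void onError(Throwable error) {
                    emitter.fail(error);
                }
            }));
    }
}
```

A `Multi<String>` produced this way can be returned from a Quarkus REST resource to stream chunks directly to the client.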

EmbeddingModel Interface

The EmbeddingModel interface provides text-to-vector embedding generation.

package dev.langchain4j.model.embedding;

import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.output.Response;
import java.util.List;

/**
 * Embedding model interface for converting text to vector embeddings.
 * Beans of this type are created by the Ollama deployment module
 * and can be injected into your application.
 */
public interface EmbeddingModel {
    /**
     * Generate embedding for a single text.
     *
     * @param text the text to embed
     * @return response containing the embedding vector
     */
    Response<Embedding> embed(String text);

    /**
     * Generate embeddings for multiple text segments in a batch.
     * More efficient than calling embed() multiple times.
     *
     * @param textSegments list of text segments to embed
     * @return response containing list of embeddings
     */
    Response<List<Embedding>> embedAll(List<TextSegment> textSegments);
}

Package: dev.langchain4j.model.embedding

Methods:

| Method | Parameters | Return Type | Description |
|--------|------------|-------------|-------------|
| embed | String text | Response<Embedding> | Single text to embedding vector. |
| embedAll | List<TextSegment> textSegments | Response<List<Embedding>> | Batch embedding generation. |

Embedding Type:

public class Embedding {
    public float[] vector();        // Get embedding as float array
    public List<Float> vectorAsList(); // Get embedding as Float list
    public int dimension();         // Get embedding dimension
}

Response Type:

public class Response<T> {
    public T content();            // Get the main content (Embedding or List<Embedding>)
    public TokenUsage tokenUsage(); // Get token usage statistics
    public FinishReason finishReason(); // Get completion reason
}
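The `tokenUsage()` and `finishReason()` accessors are easy to overlook. A hedged sketch of reading them, assuming an injected `embeddingModel` (some providers may return null usage data):

```java
// Fragment, not a complete class: inspect metadata on an embedding response
Response<Embedding> response = embeddingModel.embed("some text");

TokenUsage usage = response.tokenUsage();
if (usage != null) {
    // Number of input tokens consumed by the embedding request
    System.out.println("Input tokens: " + usage.inputTokenCount());
}

// For embeddings the finish reason is typically null; it is mainly
// meaningful for chat responses (e.g. STOP or LENGTH)
FinishReason reason = response.finishReason();
```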

Usage Examples:

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.output.Response;
import java.util.List;

@ApplicationScoped
public class MyEmbeddingService {
    @Inject
    EmbeddingModel embeddingModel;

    // Embed single text
    public float[] embedText(String text) {
        Response<Embedding> response = embeddingModel.embed(text);
        Embedding embedding = response.content();
        return embedding.vector();
    }

    // Embed multiple texts
    public List<float[]> embedTexts(List<String> texts) {
        List<TextSegment> segments = texts.stream()
            .map(TextSegment::from)
            .toList();
        Response<List<Embedding>> response = embeddingModel.embedAll(segments);
        return response.content().stream()
            .map(Embedding::vector)
            .toList();
    }

    // Get embedding dimension
    public int getEmbeddingDimension() {
        Response<Embedding> response = embeddingModel.embed("test");
        return response.content().dimension();
    }

    // Compute similarity between two texts
    public double cosineSimilarity(String text1, String text2) {
        float[] vec1 = embedText(text1);
        float[] vec2 = embedText(text2);
        return computeCosineSimilarity(vec1, vec2);
    }

    private double computeCosineSimilarity(float[] a, float[] b) {
        double dotProduct = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dotProduct += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}

Required Imports:

import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.output.Response;
import dev.langchain4j.model.output.TokenUsage;
import dev.langchain4j.model.output.FinishReason;

Named Model Configuration

When using named configurations, inject models with the @ModelName qualifier:

import jakarta.inject.Inject;
import io.quarkiverse.langchain4j.ModelName;
import dev.langchain4j.model.chat.ChatModel;

@ApplicationScoped
public class MultiModelService {
    // Default model
    @Inject
    ChatModel defaultModel;

    // Named model "creative"
    @Inject
    @ModelName("creative")
    ChatModel creativeModel;

    // Named model "precise"
    @Inject
    @ModelName("precise")
    ChatModel preciseModel;

    public String generateCreativeResponse(String input) {
        return creativeModel.chat(input);
    }

    public String generatePreciseResponse(String input) {
        return preciseModel.chat(input);
    }
}

Configuration for named models:

# Default model
quarkus.langchain4j.ollama.chat-model.model-id=llama3.2
quarkus.langchain4j.ollama.chat-model.temperature=0.7

# Named model "creative"
quarkus.langchain4j.ollama.creative.chat-model.model-id=llama3.2
quarkus.langchain4j.ollama.creative.chat-model.temperature=1.2

# Named model "precise"
quarkus.langchain4j.ollama.precise.chat-model.model-id=llama3.2
quarkus.langchain4j.ollama.precise.chat-model.temperature=0.1

Internal Model Types

Role Enum

The Role enum defines the possible roles for messages in chat conversations.

package io.quarkiverse.langchain4j.ollama;

import com.fasterxml.jackson.databind.annotation.JsonDeserialize;
import com.fasterxml.jackson.databind.annotation.JsonSerialize;

@JsonDeserialize(using = RoleDeserializer.class)
@JsonSerialize(using = RoleSerializer.class)
public enum Role {
    /**
     * System message role - for system instructions and context
     */
    SYSTEM,

    /**
     * User message role - for user input and questions
     */
    USER,

    /**
     * Assistant message role - for AI model responses
     */
    ASSISTANT,

    /**
     * Tool message role - for tool/function call results
     */
    TOOL
}

Package: io.quarkiverse.langchain4j.ollama

Values:

  • SYSTEM - System messages provide instructions, context, or behavior guidelines to the model
  • USER - User messages contain input from the end user
  • ASSISTANT - Assistant messages are responses generated by the AI model
  • TOOL - Tool messages contain results from function/tool executions

Serialization: Custom Jackson serializers/deserializers handle JSON conversion

Usage Context: This enum is used internally by the Ollama chat implementation when constructing message payloads. Application developers typically use LangChain4j's higher-level message types (SystemMessage, UserMessage, AiMessage, ToolExecutionResultMessage) which are automatically mapped to the appropriate Role values.

Options Record

The Options record encapsulates advanced model parameters for Ollama requests.

package io.quarkiverse.langchain4j.ollama;

import java.util.List;

/**
 * Advanced options for Ollama model requests
 */
public record Options(
    Double temperature,
    Integer topK,
    Double topP,
    Double repeatPenalty,
    Integer seed,
    Integer numPredict,
    Integer numCtx,
    List<String> stop
) {
    /**
     * Creates a new builder for Options
     */
    public static Builder builder() {
        return new Builder();
    }

    /**
     * Builder for constructing Options instances
     */
    public static class Builder {
        private Double temperature;
        private Integer topK;
        private Double topP;
        private Double repeatPenalty;
        private Integer seed;
        private Integer numPredict;
        private Integer numCtx;
        private List<String> stop;

        /**
         * Sets the temperature parameter (0-2).
         * Lower values make responses more deterministic, higher values more creative.
         *
         * @param temperature the temperature value
         * @return this builder
         */
        public Builder temperature(Double temperature) {
            this.temperature = temperature;
            return this;
        }

        /**
         * Sets the top-k sampling parameter.
         * Limits vocabulary to the top k most probable tokens.
         *
         * @param topK the top-k value
         * @return this builder
         */
        public Builder topK(Integer topK) {
            this.topK = topK;
            return this;
        }

        /**
         * Sets the top-p sampling parameter (0-1).
         * Controls diversity via nucleus sampling.
         *
         * @param topP the top-p value
         * @return this builder
         */
        public Builder topP(Double topP) {
            this.topP = topP;
            return this;
        }

        /**
         * Sets the repeat penalty parameter.
         * Penalizes repetition in generated text (1.0 = no penalty).
         *
         * @param repeatPenalty the repeat penalty value
         * @return this builder
         */
        public Builder repeatPenalty(Double repeatPenalty) {
            this.repeatPenalty = repeatPenalty;
            return this;
        }

        /**
         * Sets the random seed for reproducible results.
         * Same seed with same inputs produces same output.
         *
         * @param seed the seed value
         * @return this builder
         */
        public Builder seed(Integer seed) {
            this.seed = seed;
            return this;
        }

        /**
         * Sets the maximum number of tokens to predict/generate.
         *
         * @param numPredict the maximum token count
         * @return this builder
         */
        public Builder numPredict(Integer numPredict) {
            this.numPredict = numPredict;
            return this;
        }

        /**
         * Sets the context window size in tokens.
         * Determines how much previous context the model considers.
         *
         * @param numCtx the context window size
         * @return this builder
         */
        public Builder numCtx(Integer numCtx) {
            this.numCtx = numCtx;
            return this;
        }

        /**
         * Sets the stop sequences.
         * Model stops generating when any of these sequences is encountered.
         *
         * @param stop list of stop sequences
         * @return this builder
         */
        public Builder stop(List<String> stop) {
            this.stop = stop;
            return this;
        }

        /**
         * Builds the Options instance
         *
         * @return the constructed Options
         */
        public Options build() {
            return new Options(
                temperature,
                topK,
                topP,
                repeatPenalty,
                seed,
                numPredict,
                numCtx,
                stop
            );
        }
    }
}

Package: io.quarkiverse.langchain4j.ollama

Record Components:

| Component | Type | Description |
|-----------|------|-------------|
| temperature | Double | Sampling temperature (0-2). Controls randomness. |
| topK | Integer | Top-k sampling limit. Restricts token selection to k most probable. |
| topP | Double | Top-p/nucleus sampling threshold (0-1). Controls diversity. |
| repeatPenalty | Double | Repeat penalty factor. Reduces repetition (1.0 = no penalty). |
| seed | Integer | Random seed for reproducibility. |
| numPredict | Integer | Maximum tokens to generate. |
| numCtx | Integer | Context window size in tokens. |
| stop | List<String> | Stop sequences that halt generation. |

Static Methods:

  • builder() - Creates a new Builder instance for fluent construction

Builder Methods:

  • temperature(Double) - Sets temperature parameter
  • topK(Integer) - Sets top-k sampling limit
  • topP(Double) - Sets top-p sampling threshold
  • repeatPenalty(Double) - Sets repetition penalty
  • seed(Integer) - Sets random seed
  • numPredict(Integer) - Sets max tokens to generate
  • numCtx(Integer) - Sets context window size
  • stop(List<String>) - Sets stop sequences
  • build() - Constructs the Options instance

Usage Context: The Options record is used internally when the Ollama client constructs API requests. Most developers configure these parameters through the runtime configuration properties (ChatModelConfig, EmbeddingModelConfig) rather than constructing Options instances directly. The configuration system automatically builds Options instances from the configured properties.

Usage Examples

Working with Options (Internal Usage)

While most configuration is done through application properties, understanding the Options structure helps when debugging or implementing custom integrations:

// Example of how Options might be constructed (internal use)
Options options = Options.builder()
    .temperature(0.8)
    .topK(40)
    .topP(0.9)
    .seed(42)
    .numPredict(2048)
    .stop(List.of("</s>", "<|endoftext|>"))
    .build();

Configuration-Based Approach (Recommended)

Instead of working with Options directly, configure through application properties:

# These properties are automatically converted to Options internally
quarkus.langchain4j.ollama.chat-model.temperature=0.8
quarkus.langchain4j.ollama.chat-model.top-k=40
quarkus.langchain4j.ollama.chat-model.top-p=0.9
quarkus.langchain4j.ollama.chat-model.seed=42
quarkus.langchain4j.ollama.chat-model.num-predict=2048
quarkus.langchain4j.ollama.chat-model.stop=</s>,<|endoftext|>

Understanding Role Usage

The Role enum is used internally when processing messages:

// Application code uses LangChain4j message types
import dev.langchain4j.data.message.SystemMessage;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.data.message.AiMessage;

// These are automatically mapped to appropriate Role values:
// SystemMessage -> Role.SYSTEM
// UserMessage -> Role.USER
// AiMessage -> Role.ASSISTANT
// ToolExecutionResultMessage -> Role.TOOL

// Application developers don't typically use Role directly

Parameter Guidelines

Temperature

  • Range: 0.0 to 2.0
  • Low values (0.0-0.3): More deterministic and focused responses. Best for tasks requiring accuracy.
  • Medium values (0.4-0.7): Balanced creativity and consistency. Good for most use cases.
  • High values (0.8-2.0): More creative and varied responses. Best for creative writing or brainstorming.

Top-K

  • Range: 1 to vocabulary size
  • Low values (1-10): Very focused, limited vocabulary. More predictable.
  • Medium values (20-50): Balanced selection. Good default.
  • High values (100+): Wider vocabulary selection. More diverse but potentially less coherent.

Top-P

  • Range: 0.0 to 1.0
  • Low values (0.1-0.5): Conservative token selection. More focused.
  • Medium values (0.6-0.9): Standard nucleus sampling. Good balance.
  • High values (0.95-1.0): Allows more diverse tokens. More creative.

Repeat Penalty

  • Range: 0.0 to 2.0 (1.0 = no penalty)
  • Below 1.0: Encourages repetition (rarely used).
  • 1.0: No penalty (default).
  • Above 1.0 (1.1-1.5): Discourages repetition. Helps avoid repetitive text.

Context Window (numCtx)

  • Range: Model-dependent (typically 2048-32768 tokens)
  • Smaller windows: Faster, less memory, but limited context.
  • Larger windows: Can consider more context, but slower and more memory-intensive.
  • Default: Model's default context size (varies by model).

Max Tokens (numPredict)

  • Range: 1 to model's maximum
  • Small values (50-200): Short responses, faster generation.
  • Medium values (500-1000): Standard responses.
  • Large values (2000+): Long-form content generation.
  • Unset: No limit (generates until natural stop point).
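Pulling these guidelines together, a balanced starting point might look like the following. The values are illustrative, not prescriptive, and the property names follow the kebab-case pattern shown earlier; verify each against the runtime-configuration reference for your extension version:

```properties
# Illustrative balanced defaults based on the guidelines above
quarkus.langchain4j.ollama.chat-model.temperature=0.7
quarkus.langchain4j.ollama.chat-model.top-k=40
quarkus.langchain4j.ollama.chat-model.top-p=0.9
quarkus.langchain4j.ollama.chat-model.repeat-penalty=1.1
quarkus.langchain4j.ollama.chat-model.num-predict=1000
```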

Type Conversion

The Ollama extension handles type conversion between:

  1. Configuration Properties → Options Record

    • Runtime configuration properties are read from application.properties
    • Converted to Options instances when creating model clients
    • Happens automatically during bean creation
  2. LangChain4j Messages → Role Enum

    • LangChain4j message types (SystemMessage, UserMessage, etc.) are mapped to Role values
    • Conversion happens in the Ollama chat model implementation
    • Transparent to application code
  3. Java Types → JSON

    • Options record serialized to JSON for Ollama API requests
    • Role enum serialized using custom serializers
    • Jackson handles serialization/deserialization

Notes

  • The Options record uses the Java record feature (Java 16+) for concise immutable data
  • All Options fields are nullable - only specified parameters are sent to Ollama API
  • The Role enum uses custom Jackson serializers for proper JSON conversion
  • Configuration-based approach (via application.properties) is strongly recommended over programmatic Options construction
  • Model-specific defaults may vary depending on the Ollama model being used
  • Context window size and max tokens are constrained by the specific model's capabilities

Install with Tessl CLI

npx tessl i tessl/maven-io-quarkiverse-langchain4j--quarkus-langchain4j-ollama-deployment@1.7.0
