Quarkus extension deployment module for integrating Ollama LLM models with Quarkus applications through the LangChain4j framework
The Ollama runtime module provides type definitions and data structures used for interacting with Ollama models. These types are part of the runtime API and are used internally by the model implementations.
This document covers:
- The ChatModel, StreamingChatModel, and EmbeddingModel interfaces injected into applications
- Injecting named models with the @ModelName qualifier
- The Role enum and Options record used internally by the Ollama client
When you inject ChatModel, StreamingChatModel, or EmbeddingModel beans into your application, you interact with the LangChain4j interfaces. These interfaces provide the actual methods you'll call to use the models.
The ChatModel interface provides synchronous text generation methods.
package dev.langchain4j.model.chat;
import dev.langchain4j.data.message.*;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.response.ChatResponse;
/**
* Synchronous chat model interface for text generation.
* Beans of this type are created by the Ollama deployment module
* and can be injected into your application.
*/
public interface ChatModel {
/**
* Simple text-to-text generation.
* Takes a user message string and returns the AI response as a string.
*
* @param userMessage the user's input text
* @return the AI-generated response text
*/
String chat(String userMessage);
/**
* Structured request-based generation with advanced options.
* Supports tool specifications, response formats, and detailed parameters.
*
* @param request the chat request with messages and parameters
* @return the chat response with AI message and metadata
*/
ChatResponse chat(ChatRequest request);
/**
* Multi-message generation with system and user context.
* Allows explicit system message for instructions/context.
*
* @param systemMessage instructions or context for the model
* @param userMessage the user's input
* @return the chat response with AI message and metadata
*/
ChatResponse chat(SystemMessage systemMessage, UserMessage userMessage);
}

Package: dev.langchain4j.model.chat
Methods:
| Method | Parameters | Return Type | Description |
|---|---|---|---|
| chat | String userMessage | String | Simple text input/output. Most common use case. |
| chat | ChatRequest request | ChatResponse | Advanced features: tools, formats, parameters. |
| chat | SystemMessage, UserMessage | ChatResponse | Explicit system context with user message. |
Usage Examples:
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import dev.langchain4j.model.chat.ChatModel;
@ApplicationScoped
public class MyService {
@Inject
ChatModel chatModel;
// Simple text generation
public String simpleChat(String userInput) {
return chatModel.chat(userInput);
}
// With system message
public String chatWithContext(String context, String userInput) {
SystemMessage system = SystemMessage.from(context);
UserMessage user = UserMessage.from(userInput);
ChatResponse response = chatModel.chat(system, user);
return response.aiMessage().text();
}
// Advanced request with parameters
public String advancedChat(String userInput) {
ChatRequest request = ChatRequest.builder()
.messages(List.of(UserMessage.from(userInput)))
.parameters(ChatRequestParameters.builder()
.temperature(0.7)
.maxTokens(500)
.build())
.build();
ChatResponse response = chatModel.chat(request);
return response.aiMessage().text();
}
}

Required Imports:
import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.model.chat.request.ChatRequestParameters;
import dev.langchain4j.data.message.SystemMessage;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.data.message.AiMessage;
import java.util.List;

The StreamingChatModel interface provides asynchronous streaming text generation.
package dev.langchain4j.model.chat;
import dev.langchain4j.data.message.*;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.response.StreamingChatResponseHandler;
import java.util.List;
/**
* Streaming chat model interface for real-time text generation.
* Beans of this type are created by the Ollama deployment module
* and can be injected into your application.
*/
public interface StreamingChatModel {
/**
* Simple streaming generation from user message.
* Response is delivered incrementally via the handler callback.
*
* @param userMessage the user's input text
* @param handler callback to receive streaming response chunks
*/
void chat(String userMessage, StreamingChatResponseHandler handler);
/**
* Structured streaming request with advanced options.
* Supports tool specifications, response formats, and detailed parameters.
*
* @param request the chat request with messages and parameters
* @param handler callback to receive streaming response chunks
*/
void chat(ChatRequest request, StreamingChatResponseHandler handler);
/**
* Streaming generation from a list of chat messages.
*
* @param messages list of messages in the conversation
* @param handler callback to receive streaming response chunks
*/
void chat(List<ChatMessage> messages, StreamingChatResponseHandler handler);
}

Package: dev.langchain4j.model.chat
Methods:
| Method | Parameters | Return Type | Description |
|---|---|---|---|
| chat | String, StreamingChatResponseHandler | void | Simple streaming text generation. |
| chat | ChatRequest, StreamingChatResponseHandler | void | Advanced streaming with parameters. |
| chat | List<ChatMessage>, StreamingChatResponseHandler | void | Stream from message list. |
StreamingChatResponseHandler Interface:
package dev.langchain4j.model.chat.response;
/**
* Handler for receiving streaming chat responses.
*/
public interface StreamingChatResponseHandler {
/**
* Called for each partial response chunk as it arrives.
* @param partialResponse the text chunk received
*/
void onPartialResponse(String partialResponse);
/**
* Called when the complete response is ready.
* @param completeResponse the final chat response with metadata
*/
void onCompleteResponse(ChatResponse completeResponse);
/**
* Called if an error occurs during streaming.
* @param error the error that occurred
*/
void onError(Throwable error);
}

Usage Examples:
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import dev.langchain4j.model.chat.StreamingChatModel;
import dev.langchain4j.model.chat.response.StreamingChatResponseHandler;
import dev.langchain4j.model.chat.response.ChatResponse;
import java.util.concurrent.CompletableFuture;
@ApplicationScoped
public class MyStreamingService {
@Inject
StreamingChatModel streamingChatModel;
// Simple streaming
public void streamResponse(String userInput) {
streamingChatModel.chat(userInput, new StreamingChatResponseHandler() {
@Override
public void onPartialResponse(String partialResponse) {
// Process each chunk as it arrives
System.out.print(partialResponse);
}
@Override
public void onCompleteResponse(ChatResponse completeResponse) {
// Called when generation completes
System.out.println("\nComplete!");
System.out.println("Full text: " + completeResponse.aiMessage().text());
}
@Override
public void onError(Throwable error) {
// Handle errors
System.err.println("Error: " + error.getMessage());
}
});
}
// Accumulating streaming response
public CompletableFuture<String> streamAndAccumulate(String userInput) {
CompletableFuture<String> future = new CompletableFuture<>();
StringBuilder accumulated = new StringBuilder();
streamingChatModel.chat(userInput, new StreamingChatResponseHandler() {
@Override
public void onPartialResponse(String partialResponse) {
accumulated.append(partialResponse);
}
@Override
public void onCompleteResponse(ChatResponse completeResponse) {
future.complete(accumulated.toString());
}
@Override
public void onError(Throwable error) {
future.completeExceptionally(error);
}
});
return future;
}
}

Required Imports:
import dev.langchain4j.model.chat.StreamingChatModel;
import dev.langchain4j.model.chat.response.StreamingChatResponseHandler;
import dev.langchain4j.model.chat.response.ChatResponse;
import java.util.concurrent.CompletableFuture;

The EmbeddingModel interface provides text-to-vector embedding generation.
package dev.langchain4j.model.embedding;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.model.output.Response;
import java.util.List;
/**
* Embedding model interface for converting text to vector embeddings.
* Beans of this type are created by the Ollama deployment module
* and can be injected into your application.
*/
public interface EmbeddingModel {
/**
* Generate embedding for a single text.
*
* @param text the text to embed
* @return response containing the embedding vector
*/
Response<Embedding> embed(String text);
/**
* Generate embeddings for multiple texts in a batch.
* More efficient than calling embed() multiple times.
*
* @param texts list of texts to embed
* @return response containing list of embeddings
*/
Response<List<Embedding>> embedAll(List<String> texts);
}

Package: dev.langchain4j.model.embedding
Methods:
| Method | Parameters | Return Type | Description |
|---|---|---|---|
| embed | String text | Response<Embedding> | Single text to embedding vector. |
| embedAll | List<String> texts | Response<List<Embedding>> | Batch embedding generation. |
Embedding Type:
public class Embedding {
public float[] vector(); // Get embedding as float array
public List<Float> vectorAsList(); // Get embedding as Float list
public int dimension(); // Get embedding dimension
}

Response Type:
public class Response<T> {
public T content(); // Get the main content (Embedding or List<Embedding>)
public TokenUsage tokenUsage(); // Get token usage statistics
public FinishReason finishReason(); // Get completion reason
}

Usage Examples:
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.model.output.Response;
import java.util.List;
@ApplicationScoped
public class MyEmbeddingService {
@Inject
EmbeddingModel embeddingModel;
// Embed single text
public float[] embedText(String text) {
Response<Embedding> response = embeddingModel.embed(text);
Embedding embedding = response.content();
return embedding.vector();
}
// Embed multiple texts
public List<float[]> embedTexts(List<String> texts) {
Response<List<Embedding>> response = embeddingModel.embedAll(texts);
List<Embedding> embeddings = response.content();
return embeddings.stream()
.map(Embedding::vector)
.toList();
}
// Get embedding dimension
public int getEmbeddingDimension() {
Response<Embedding> response = embeddingModel.embed("test");
return response.content().dimension();
}
// Compute similarity between two texts
public double cosineSimilarity(String text1, String text2) {
float[] vec1 = embedText(text1);
float[] vec2 = embedText(text2);
return computeCosineSimilarity(vec1, vec2);
}
private double computeCosineSimilarity(float[] a, float[] b) {
double dotProduct = 0.0;
double normA = 0.0;
double normB = 0.0;
for (int i = 0; i < a.length; i++) {
dotProduct += a[i] * b[i];
normA += a[i] * a[i];
normB += b[i] * b[i];
}
return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
}
}

Required Imports:
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.model.output.Response;
import dev.langchain4j.model.output.TokenUsage;
import dev.langchain4j.model.output.FinishReason;
import java.util.List;

When using named configurations, inject models with the @ModelName qualifier:
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import io.quarkiverse.langchain4j.ModelName;
import dev.langchain4j.model.chat.ChatModel;
@ApplicationScoped
public class MultiModelService {
// Default model
@Inject
ChatModel defaultModel;
// Named model "creative"
@Inject
@ModelName("creative")
ChatModel creativeModel;
// Named model "precise"
@Inject
@ModelName("precise")
ChatModel preciseModel;
public String generateCreativeResponse(String input) {
return creativeModel.chat(input);
}
public String generatePreciseResponse(String input) {
return preciseModel.chat(input);
}
}

Configuration for named models:
# Default model
quarkus.langchain4j.ollama.chat-model.model-id=llama3.2
quarkus.langchain4j.ollama.chat-model.temperature=0.7
# Named model "creative"
quarkus.langchain4j.ollama.creative.chat-model.model-id=llama3.2
quarkus.langchain4j.ollama.creative.chat-model.temperature=1.2
# Named model "precise"
quarkus.langchain4j.ollama.precise.chat-model.model-id=llama3.2
quarkus.langchain4j.ollama.precise.chat-model.temperature=0.1

The Role enum defines the possible roles for messages in chat conversations.
package io.quarkiverse.langchain4j.ollama;
import com.fasterxml.jackson.databind.annotation.JsonDeserialize;
import com.fasterxml.jackson.databind.annotation.JsonSerialize;
@JsonDeserialize(using = RoleDeserializer.class)
@JsonSerialize(using = RoleSerializer.class)
public enum Role {
/**
* System message role - for system instructions and context
*/
SYSTEM,
/**
* User message role - for user input and questions
*/
USER,
/**
* Assistant message role - for AI model responses
*/
ASSISTANT,
/**
* Tool message role - for tool/function call results
*/
TOOL
}

Package: io.quarkiverse.langchain4j.ollama
Values:
- SYSTEM - System messages provide instructions, context, or behavior guidelines to the model
- USER - User messages contain input from the end user
- ASSISTANT - Assistant messages are responses generated by the AI model
- TOOL - Tool messages contain results from function/tool executions

Serialization: Custom Jackson serializers/deserializers handle JSON conversion.
Usage Context: This enum is used internally by the Ollama chat implementation when constructing message payloads. Application developers typically use LangChain4j's higher-level message types (SystemMessage, UserMessage, AiMessage, ToolExecutionResultMessage) which are automatically mapped to the appropriate Role values.
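As a conceptual illustration of that mapping, here is a pure-JDK sketch. The enum mirrors io.quarkiverse.langchain4j.ollama.Role, but the roleFor helper is hypothetical and not part of the extension's API; the real mapping happens inside the extension when it converts LangChain4j messages into Ollama request payloads.

```java
// Illustrative only: mirrors io.quarkiverse.langchain4j.ollama.Role.
enum Role { SYSTEM, USER, ASSISTANT, TOOL }

public class RoleMappingSketch {
    // Hypothetical helper showing how LangChain4j message types
    // correspond to Ollama roles; the actual conversion is internal
    // to the extension and not exposed to application code.
    static Role roleFor(String messageType) {
        return switch (messageType) {
            case "SystemMessage" -> Role.SYSTEM;
            case "UserMessage" -> Role.USER;
            case "AiMessage" -> Role.ASSISTANT;
            case "ToolExecutionResultMessage" -> Role.TOOL;
            default -> throw new IllegalArgumentException("Unknown message type: " + messageType);
        };
    }

    public static void main(String[] args) {
        System.out.println(roleFor("SystemMessage")); // SYSTEM
        System.out.println(roleFor("AiMessage"));     // ASSISTANT
    }
}
```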
The Options record encapsulates advanced model parameters for Ollama requests.
package io.quarkiverse.langchain4j.ollama;
import java.util.List;
/**
* Advanced options for Ollama model requests
*/
public record Options(
Double temperature,
Integer topK,
Double topP,
Double repeatPenalty,
Integer seed,
Integer numPredict,
Integer numCtx,
List<String> stop
) {
/**
* Creates a new builder for Options
*/
public static Builder builder() {
return new Builder();
}
/**
* Builder for constructing Options instances
*/
public static class Builder {
private Double temperature;
private Integer topK;
private Double topP;
private Double repeatPenalty;
private Integer seed;
private Integer numPredict;
private Integer numCtx;
private List<String> stop;
/**
* Sets the temperature parameter (0-2).
* Lower values make responses more deterministic, higher values more creative.
*
* @param temperature the temperature value
* @return this builder
*/
public Builder temperature(Double temperature) {
this.temperature = temperature;
return this;
}
/**
* Sets the top-k sampling parameter.
* Limits vocabulary to the top k most probable tokens.
*
* @param topK the top-k value
* @return this builder
*/
public Builder topK(Integer topK) {
this.topK = topK;
return this;
}
/**
* Sets the top-p sampling parameter (0-1).
* Controls diversity via nucleus sampling.
*
* @param topP the top-p value
* @return this builder
*/
public Builder topP(Double topP) {
this.topP = topP;
return this;
}
/**
* Sets the repeat penalty parameter.
* Penalizes repetition in generated text (1.0 = no penalty).
*
* @param repeatPenalty the repeat penalty value
* @return this builder
*/
public Builder repeatPenalty(Double repeatPenalty) {
this.repeatPenalty = repeatPenalty;
return this;
}
/**
* Sets the random seed for reproducible results.
* Same seed with same inputs produces same output.
*
* @param seed the seed value
* @return this builder
*/
public Builder seed(Integer seed) {
this.seed = seed;
return this;
}
/**
* Sets the maximum number of tokens to predict/generate.
*
* @param numPredict the maximum token count
* @return this builder
*/
public Builder numPredict(Integer numPredict) {
this.numPredict = numPredict;
return this;
}
/**
* Sets the context window size in tokens.
* Determines how much previous context the model considers.
*
* @param numCtx the context window size
* @return this builder
*/
public Builder numCtx(Integer numCtx) {
this.numCtx = numCtx;
return this;
}
/**
* Sets the stop sequences.
* Model stops generating when any of these sequences is encountered.
*
* @param stop list of stop sequences
* @return this builder
*/
public Builder stop(List<String> stop) {
this.stop = stop;
return this;
}
/**
* Builds the Options instance
*
* @return the constructed Options
*/
public Options build() {
return new Options(
temperature,
topK,
topP,
repeatPenalty,
seed,
numPredict,
numCtx,
stop
);
}
}
}

Package: io.quarkiverse.langchain4j.ollama
Record Components:
| Component | Type | Description |
|---|---|---|
| temperature | Double | Sampling temperature (0-2). Controls randomness. |
| topK | Integer | Top-k sampling limit. Restricts token selection to k most probable. |
| topP | Double | Top-p/nucleus sampling threshold (0-1). Controls diversity. |
| repeatPenalty | Double | Repeat penalty factor. Reduces repetition (1.0 = no penalty). |
| seed | Integer | Random seed for reproducibility. |
| numPredict | Integer | Maximum tokens to generate. |
| numCtx | Integer | Context window size in tokens. |
| stop | List<String> | Stop sequences that halt generation. |
Static Methods:
- builder() - Creates a new Builder instance for fluent construction

Builder Methods:
- temperature(Double) - Sets temperature parameter
- topK(Integer) - Sets top-k sampling limit
- topP(Double) - Sets top-p sampling threshold
- repeatPenalty(Double) - Sets repetition penalty
- seed(Integer) - Sets random seed
- numPredict(Integer) - Sets max tokens to generate
- numCtx(Integer) - Sets context window size
- stop(List<String>) - Sets stop sequences
- build() - Constructs the Options instance

Usage Context: The Options record is used internally when the Ollama client constructs API requests. Most developers configure these parameters through the runtime configuration properties (ChatModelConfig, EmbeddingModelConfig) rather than constructing Options instances directly. The configuration system automatically builds Options instances from the configured properties.
While most configuration is done through application properties, understanding the Options structure helps when debugging or implementing custom integrations:
// Example of how Options might be constructed (internal use)
Options options = Options.builder()
.temperature(0.8)
.topK(40)
.topP(0.9)
.seed(42)
.numPredict(2048)
.stop(List.of("</s>", "<|endoftext|>"))
.build();

Instead of working with Options directly, configure through application properties:
# These properties are automatically converted to Options internally
quarkus.langchain4j.ollama.chat-model.temperature=0.8
quarkus.langchain4j.ollama.chat-model.top-k=40
quarkus.langchain4j.ollama.chat-model.top-p=0.9
quarkus.langchain4j.ollama.chat-model.seed=42
quarkus.langchain4j.ollama.chat-model.num-predict=2048
quarkus.langchain4j.ollama.chat-model.stop=</s>,<|endoftext|>

The Role enum is used internally when processing messages:
// Application code uses LangChain4j message types
import dev.langchain4j.data.message.SystemMessage;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.data.message.AiMessage;
// These are automatically mapped to appropriate Role values:
// SystemMessage -> Role.SYSTEM
// UserMessage -> Role.USER
// AiMessage -> Role.ASSISTANT
// ToolExecutionResultMessage -> Role.TOOL
// Application developers don't typically use Role directly

The Ollama extension handles type conversion between:
Configuration Properties → Options Record
- Properties in application.properties are converted to Options instances when creating model clients

LangChain4j Messages → Role Enum
- Message types (SystemMessage, UserMessage, etc.) are mapped to Role values

Java Types → JSON
- The Options record is serialized to JSON for Ollama API requests
- The Role enum is serialized using custom serializers

Notes:
- The Options record uses the Java record feature (Java 16+) for concise, immutable data
- Options fields are nullable - only explicitly set parameters are sent to the Ollama API
- The Role enum uses custom Jackson serializers for proper JSON conversion

Install with Tessl CLI
npx tessl i tessl/maven-io-quarkiverse-langchain4j--quarkus-langchain4j-ollama-deployment