Quarkus extension for integrating local Ollama language models with LangChain4j
Type-safe data models for requests, responses, messages, and options. Most models are Java records for immutability, and all include builder patterns for convenient construction.
Request object for chat API containing model, messages, tools, and options.
record ChatRequest(
String model,
List<Message> messages,
List<Tool> tools,
Options options,
@JsonSerialize(using = FormatJsonSerializer.class) String format,
Boolean stream
) {
static Builder builder();
}
class ChatRequest.Builder {
Builder model(String model);
Builder messages(List<Message> messages);
Builder tools(List<Tool> tools);
Builder options(Options options);
Builder format(String format);
Builder stream(Boolean stream);
Builder from(ChatRequest request);
ChatRequest build();
}
Components:
model - Model name (e.g., "llama3.2", "mistral")
messages - Conversation history
tools - Available tools for function calling
options - Model options (temperature, topK, etc.)
format - Response format: "json" or a JSON schema string
stream - Whether to stream the response
Usage:
ChatRequest request = ChatRequest.builder()
.model("llama3.2")
.messages(List.of(
Message.builder()
.role(Role.SYSTEM)
.content("You are a helpful assistant.")
.build(),
Message.builder()
.role(Role.USER)
.content("Hello!")
.build()
))
.options(Options.builder()
.temperature(0.7)
.build())
.stream(false)
.build();
Copy and modify an existing request:
ChatRequest newRequest = ChatRequest.builder()
.from(existingRequest)
.options(Options.builder()
.temperature(0.9)
.build())
.build();
Response object from chat API containing model output and metadata.
record ChatResponse(
String model,
String createdAt,
Message message,
Boolean done,
Integer promptEvalCount,
Integer evalCount
) {
static ChatResponse emptyNotDone();
}
Components:
model - Model name that generated the response
createdAt - Timestamp in ISO 8601 format
message - Response message with role ASSISTANT
done - Whether generation is complete (false for streaming chunks)
promptEvalCount - Number of input tokens processed
evalCount - Number of output tokens generated
Usage:
ChatResponse response = ollamaClient.chat(request);
String content = response.message().content();
System.out.println("Tokens: " + response.evalCount());
Streaming responses:
Multi<ChatResponse> stream = ollamaClient.streamingChat(request);
stream.subscribe().with(
chunk -> {
if (!chunk.done()) {
System.out.print(chunk.message().content());
} else {
System.out.println("\nTotal tokens: " + chunk.evalCount());
}
}
);
Empty response for incomplete streaming:
ChatResponse emptyChunk = ChatResponse.emptyNotDone();
// Returns a response with done=false, useful for internal buffering
Chat message representing user, assistant, system, or tool messages.
record Message(
Role role,
String content,
List<ToolCall> toolCalls,
List<String> images,
@JsonIgnore Map<String, Object> additionalFields
) {
static Builder builder();
@JsonAnyGetter
Map<String, Object> getAdditionalFields();
}
class Message.Builder {
Builder role(Role role);
Builder content(String content);
Builder toolCalls(List<ToolCall> toolCalls);
Builder images(List<String> images);
@JsonAnySetter
Builder additionalFields(Map<String, Object> additionalFields);
Message build();
}
Components:
role - Message role (SYSTEM, USER, ASSISTANT, TOOL)
content - Message text content
toolCalls - Tool calls requested by the assistant (for function calling)
images - Base64-encoded images (for vision models)
additionalFields - Extensibility for future Ollama features
Usage:
// System message
Message systemMsg = Message.builder()
.role(Role.SYSTEM)
.content("You are a helpful assistant.")
.build();
// User message
Message userMsg = Message.builder()
.role(Role.USER)
.content("What is 2+2?")
.build();
// Assistant message
Message assistantMsg = Message.builder()
.role(Role.ASSISTANT)
.content("2+2 equals 4.")
.build();
// User message with image (vision model)
Message visionMsg = Message.builder()
.role(Role.USER)
.content("Describe this image")
.images(List.of(base64EncodedImage))
.build();
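The base64EncodedImage value in the vision example above is a plain Base64 string of the raw image bytes; it can be produced with the JDK alone (the stub bytes here stand in for real image data, which would typically come from Files.readAllBytes):

```java
import java.util.Base64;

public class EncodeImage {
    public static void main(String[] args) {
        // In practice: byte[] imageBytes = Files.readAllBytes(Path.of("photo.png"));
        // A stub of the first bytes of a PNG file is used here for illustration
        byte[] imageBytes = {(byte) 0x89, 'P', 'N', 'G'};
        // Ollama expects standard Base64, without a "data:image/..." URI prefix
        String base64EncodedImage = Base64.getEncoder().encodeToString(imageBytes);
        System.out.println(base64EncodedImage); // prints "iVBORw=="
    }
}
```

The resulting string is passed directly to Message.Builder.images(List.of(base64EncodedImage)).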
// Assistant message with tool calls
Message toolMsg = Message.builder()
.role(Role.ASSISTANT)
.toolCalls(List.of(
ToolCall.fromFunctionCall("getWeather", Map.of("location", "Paris"))
))
.build();
// Tool execution result
Message toolResultMsg = Message.builder()
.role(Role.TOOL)
.content("{\"temperature\": 18, \"condition\": \"sunny\"}")
.build();
Enum representing message role types.
@JsonDeserialize(using = RoleDeserializer.class)
@JsonSerialize(using = RoleSerializer.class)
enum Role {
SYSTEM, // System instructions/prompt
USER, // User input message
ASSISTANT, // AI-generated response
TOOL // Tool execution result
}
Usage:
Message msg = Message.builder()
.role(Role.USER)
.content("Hello")
.build();
Role role = msg.role();
if (role == Role.ASSISTANT) {
// Process AI response
}
Configuration options for model behavior during generation.
record Options(
Double temperature,
Integer topK,
Double topP,
Double repeatPenalty,
Integer seed,
Integer numPredict,
Integer numCtx,
List<String> stop
) {
static Builder builder();
}
class Options.Builder {
Builder temperature(Double temperature);
Builder topK(Integer topK);
Builder topP(Double topP);
Builder repeatPenalty(Double repeatPenalty);
Builder seed(Integer seed);
Builder numPredict(Integer numPredict);
Builder numCtx(Integer numCtx);
Builder stop(List<String> stop);
Options build();
}
Components:
temperature - Randomness (0.0 = deterministic, 2.0 = very random)
topK - Limit token selection to the top K tokens
topP - Nucleus sampling threshold (0.0-1.0)
repeatPenalty - Penalty for repeating tokens (1.0 = no penalty)
seed - Random seed for reproducibility
numPredict - Maximum number of tokens to generate
numCtx - Context window size
stop - Stop sequences that end generation
Usage:
// Deterministic, precise responses
Options precise = Options.builder()
.temperature(0.1)
.topP(0.7)
.seed(42)
.build();
// Creative, varied responses
Options creative = Options.builder()
.temperature(1.2)
.topP(0.95)
.topK(50)
.build();
// Limited generation with stop sequences
Options limited = Options.builder()
.numPredict(100)
.stop(List.of("\n\n", "END"))
.build();
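Conceptually, a stop sequence ends generation at its first occurrence, and the matched text is excluded from the output. A plain-Java illustration of that truncation behavior (the actual cut happens server-side in Ollama; this sketch only mimics it for clarity):

```java
public class StopSequenceDemo {
    public static void main(String[] args) {
        String generated = "First paragraph.\n\nSecond paragraph.";
        // Truncate at the earliest matching stop sequence
        for (String stop : new String[]{"\n\n", "END"}) {
            int idx = generated.indexOf(stop);
            if (idx >= 0) {
                generated = generated.substring(0, idx);
            }
        }
        System.out.println(generated); // prints "First paragraph."
    }
}
```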
// Use in request
ChatRequest request = ChatRequest.builder()
.model("llama3.2")
.messages(messages)
.options(precise)
.build();
Common patterns:
// Code generation
Options codeOptions = Options.builder()
.temperature(0.2)
.topP(0.8)
.numPredict(2048)
.stop(List.of("```\n"))
.build();
// Story writing
Options storyOptions = Options.builder()
.temperature(1.3)
.topP(0.95)
.topK(60)
.numPredict(4096)
.build();
// Reproducible testing
Options testOptions = Options.builder()
.temperature(0.0)
.seed(12345)
.build();
Request object for embedding API.
class EmbeddingRequest {
static Builder builder();
String getModel();
String getInput();
}
class EmbeddingRequest.Builder {
Builder model(String val);
Builder input(String val);
EmbeddingRequest build();
}
Properties:
model - Embedding model name (default: "llama2")
input - Text to embed
Usage:
EmbeddingRequest request = EmbeddingRequest.builder()
.model("nomic-embed-text")
.input("Text to embed")
.build();
EmbeddingResponse response = ollamaClient.embedding(request);
Response object from embedding API containing vector representations.
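The embedding vectors this response carries are typically compared with cosine similarity for semantic search. A self-contained sketch in plain Java (the stub vectors stand in for rows of response.getEmbeddings()):

```java
public class CosineSimilarity {
    // Cosine similarity between two vectors: dot(a,b) / (|a| * |b|)
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Stub vectors; in practice these come from response.getEmbeddings()
        float[] v1 = {1f, 0f, 1f};
        float[] v2 = {1f, 0f, 1f};
        float[] v3 = {0f, 1f, 0f};
        System.out.println(cosine(v1, v2)); // close to 1.0 (same direction)
        System.out.println(cosine(v1, v3)); // 0.0 (orthogonal)
    }
}
```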
@JsonDeserialize(builder = EmbeddingResponse.Builder.class)
class EmbeddingResponse {
float[][] getEmbeddings();
void setEmbeddings(float[][] embeddings);
static Builder builder();
}
@JsonPOJOBuilder(withPrefix = "")
class EmbeddingResponse.Builder {
Builder embeddings(float[][] val);
EmbeddingResponse build();
}
Properties:
embeddings - 2D array of embedding vectors (first dimension = number of inputs, second = embedding dimensions)
Usage:
EmbeddingResponse response = ollamaClient.embedding(request);
float[][] embeddings = response.getEmbeddings();
float[] vector = embeddings[0]; // First embedding
int dimensions = vector.length; // e.g., 768 for nomic-embed-text
System.out.println("Embedding dimensions: " + dimensions);
System.out.println("First value: " + vector[0]);
Custom JSON serializers and deserializers for Ollama-specific formats.
// Role serialization
class RoleSerializer extends StdSerializer<Role> {
RoleSerializer();
void serialize(Role role, JsonGenerator jsonGenerator, SerializerProvider serializerProvider);
}
class RoleDeserializer extends StdDeserializer<Role> {
RoleDeserializer();
Role deserialize(JsonParser jp, DeserializationContext deserializationContext);
}
// Tool type serialization
class ToolTypeSerializer extends StdSerializer<Tool.Type> {
ToolTypeSerializer();
void serialize(Tool.Type toolType, JsonGenerator jsonGenerator, SerializerProvider serializerProvider);
}
class ToolTypeDeserializer extends StdDeserializer<Tool.Type> {
ToolTypeDeserializer();
Tool.Type deserialize(JsonParser jp, DeserializationContext deserializationContext);
}
// Format field serialization
class FormatJsonSerializer extends JsonSerializer<String> {
void serialize(String value, JsonGenerator gen, SerializerProvider serializers);
}
Purpose:
RoleSerializer / RoleDeserializer - Convert the Role enum to and from lowercase strings ("user", "assistant", etc.)
ToolTypeSerializer / ToolTypeDeserializer - Convert the Tool.Type enum to and from lowercase strings
FormatJsonSerializer - Serializes the format field as either a JSON object or a plain string
These are used automatically by Jackson and don't require manual invocation.
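The lowercase mapping the role serializers implement can be mimicked with plain Java; a sketch of the wire-format convention (not the actual serializer code):

```java
public class RoleWireFormat {
    enum Role { SYSTEM, USER, ASSISTANT, TOOL }

    // Ollama's JSON uses lowercase role names, e.g. {"role": "assistant", ...}
    static String toWire(Role role) {
        return role.name().toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(toWire(Role.ASSISTANT)); // prints "assistant"
    }
}
```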
import io.quarkiverse.langchain4j.ollama.*;
// Build comprehensive chat request
Options options = Options.builder()
.temperature(0.8)
.topP(0.9)
.topK(40)
.numPredict(1024)
.stop(List.of("\n\n"))
.seed(42)
.build();
List<Message> messages = List.of(
Message.builder()
.role(Role.SYSTEM)
.content("You are a helpful coding assistant.")
.build(),
Message.builder()
.role(Role.USER)
.content("Write a function to calculate factorial")
.build()
);
ChatRequest request = ChatRequest.builder()
.model("codellama")
.messages(messages)
.options(options)
.format("json")
.stream(false)
.build();
// Make request
OllamaClient client = new OllamaClient(
"http://localhost:11434",
Duration.ofSeconds(30),
true, false, false,
null, null
);
ChatResponse response = client.chat(request);
// Process response
Message assistantMessage = response.message();
System.out.println("Role: " + assistantMessage.role());
System.out.println("Content: " + assistantMessage.content());
System.out.println("Input tokens: " + response.promptEvalCount());
System.out.println("Output tokens: " + response.evalCount());
All data models use the builder pattern, supporting both minimal and fully specified construction:
// Minimal request
ChatRequest minimal = ChatRequest.builder()
.model("llama3.2")
.messages(messages)
.build();
// Full request with all options
ChatRequest full = ChatRequest.builder()
.model("llama3.2")
.messages(messages)
.tools(tools)
.options(options)
.format("json")
.stream(true)
.build();
Message includes additionalFields for forward compatibility with future Ollama features:
Message msg = Message.builder()
.role(Role.USER)
.content("Hello")
.additionalFields(Map.of("customField", "customValue"))
.build();
Map<String, Object> fields = msg.getAdditionalFields();
This allows handling of new Ollama API features without breaking existing code.
Install with Tessl CLI
npx tessl i tessl/maven-io-quarkiverse-langchain4j--quarkus-langchain4j-ollama