Quarkus extension for integrating local Ollama language models with LangChain4j
Chat models provide conversational AI capabilities with support for both synchronous and streaming responses, including advanced features like function calling and structured output.
Inject chat models as CDI beans for automatic lifecycle management and configuration.
```java
import jakarta.inject.Inject;
import jakarta.inject.Named;

import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.chat.StreamingChatModel;

// Default configuration
@Inject
ChatModel chatModel;

@Inject
StreamingChatModel streamingChatModel;

// Named configuration
@Inject
@Named("custom-model")
ChatModel customChatModel;

@Inject
@Named("custom-model")
StreamingChatModel customStreamingModel;
```

The blocking/synchronous chat model interface for Ollama (backed by dev.langchain4j.model.ollama.OllamaChatModel from the upstream LangChain4j library).
```java
interface ChatModel {
    // Simple string-based chat
    String chat(String message);

    // Advanced chat with full control
    ChatResponse doChat(ChatRequest chatRequest);

    // Listener management
    List<ChatModelListener> listeners();

    // Default parameters
    ChatRequestParameters defaultRequestParameters();

    // Supported capabilities
    Set<Capability> supportedCapabilities();
}

// ChatRequest structure
class ChatRequest {
    List<ChatMessage> messages();
    List<ToolSpecification> toolSpecifications();
    ChatRequestParameters parameters();
}

// ChatResponse structure
class ChatResponse {
    AiMessage aiMessage();
    ChatResponseMetadata metadata();
}
```

Usage:
```java
import java.util.List;

import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.model.chat.response.StreamingChatResponseHandler;

// Synchronous chat
String response = chatModel.chat("Hello, how are you?");

// Streaming chat
ChatRequest request = ChatRequest.builder()
        .messages(List.of(UserMessage.from("Tell me a story")))
        .build();

streamingChatModel.doChat(request, new StreamingChatResponseHandler() {
    @Override
    public void onPartialResponse(String token) {
        System.out.print(token);
    }

    @Override
    public void onCompleteResponse(ChatResponse response) {
        System.out.println("\n[Complete]");
    }

    @Override
    public void onError(Throwable error) {
        error.printStackTrace();
    }
});
```

Build streaming chat model instances programmatically for fine-grained control.
```java
class OllamaStreamingChatLanguageModel implements StreamingChatModel {
    static Builder builder();
    void doChat(dev.langchain4j.model.chat.request.ChatRequest chatRequest,
                StreamingChatResponseHandler handler);
}

class OllamaStreamingChatLanguageModel.Builder {
    Builder baseUrl(String val);
    Builder tlsConfigurationName(String tlsConfigurationName);
    Builder timeout(Duration val);
    Builder model(String val);
    Builder format(String val);
    Builder options(Options val);
    Builder logRequests(boolean logRequests);
    Builder logResponses(boolean logResponses);
    Builder logCurl(boolean logCurl);
    Builder configName(String configName);
    Builder listeners(List<ChatModelListener> listeners);
    OllamaStreamingChatLanguageModel build();
}
```

Parameters:
- `baseUrl` - Ollama server URL (default: "http://localhost:11434")
- `tlsConfigurationName` - Named TLS configuration for HTTPS
- `timeout` - Request timeout (default: 10 seconds)
- `model` - Model name (e.g., "llama3.2", "mistral")
- `format` - Response format: "json" for JSON mode, or a JSON schema string for structured output
- `options` - Model options (temperature, topK, topP, etc.)
- `logRequests` - Log request payloads
- `logResponses` - Log response payloads
- `logCurl` - Log equivalent cURL commands
- `configName` - Named configuration reference
- `listeners` - Chat model event listeners for observability

Usage:
```java
import java.time.Duration;
import java.util.List;

import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.model.chat.response.StreamingChatResponseHandler;

OllamaStreamingChatLanguageModel model = OllamaStreamingChatLanguageModel.builder()
        .baseUrl("http://localhost:11434")
        .model("llama3.2")
        .timeout(Duration.ofSeconds(30))
        .options(Options.builder()
                .temperature(0.7)
                .topP(0.9)
                .build())
        .logRequests(true)
        .build();

ChatRequest request = ChatRequest.builder()
        .messages(List.of(UserMessage.from("Explain quantum computing")))
        .build();

model.doChat(request, new StreamingChatResponseHandler() {
    @Override
    public void onPartialResponse(String token) {
        System.out.print(token);
    }

    @Override
    public void onCompleteResponse(ChatResponse response) {
        System.out.println("\nTokens: " + response.tokenUsage());
    }

    @Override
    public void onError(Throwable error) {
        error.printStackTrace();
    }
});
```

Define AI-powered interfaces with annotations for clean, type-safe AI integration.
```java
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import dev.langchain4j.service.V;
import io.quarkiverse.langchain4j.RegisterAiService;

@RegisterAiService
public interface ChatService {

    @SystemMessage("You are a helpful assistant.")
    @UserMessage("Answer: {question}")
    String chat(String question);
}
```

Annotations:

- `@RegisterAiService` - Marks an interface as an AI service (supports a modelName parameter for named configurations)
- `@SystemMessage` - System prompt template
- `@UserMessage` - User message template
- `@V` - Variable injection in templates
- `@MemoryId` - Conversation memory identifier for multi-turn chats

Usage:
```java
@Inject
ChatService chatService;

String answer = chatService.chat("What is Quarkus?");
```

Advanced AI Service:
```java
@RegisterAiService(modelName = "creative")
public interface ContentGenerator {

    @SystemMessage("You are a creative writing assistant specializing in {genre}.")
    @UserMessage("Write a {wordCount} word story about: {topic}")
    String generateStory(@V("genre") String genre,
                         @V("wordCount") int wordCount,
                         @V("topic") String topic);
}

// Usage
String story = generator.generateStory("science fiction", 500, "time travel");
```

Force models to respond with valid JSON for structured data extraction.
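An AI service method can also declare a plain Java type as its return value and let the extension map the model's JSON reply onto it. A sketch, assuming the configured model reliably emits JSON; the `Person` record and `PersonExtractor` interface are illustrative names, not part of this extension's API:

```java
import dev.langchain4j.service.UserMessage;
import dev.langchain4j.service.V;
import io.quarkiverse.langchain4j.RegisterAiService;

// Illustrative target type; field names guide what the model should extract
public record Person(String name, int age, String email) {}

@RegisterAiService
public interface PersonExtractor {

    @UserMessage("Extract the person described in: {text}")
    Person extract(@V("text") String text);
}
```

This keeps JSON parsing out of application code, at the cost of depending on the model producing output the mapper can deserialize.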
Configuration:

```properties
quarkus.langchain4j.ollama.chat-model.format=json
```

Programmatic:
```java
OllamaStreamingChatLanguageModel model = OllamaStreamingChatLanguageModel.builder()
        .model("llama3.2")
        .format("json")
        .build();
```

With JSON Schema:
```java
String schema = """
        {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "age": {"type": "integer"},
            "email": {"type": "string"}
          },
          "required": ["name", "age"]
        }
        """;

OllamaStreamingChatLanguageModel model = OllamaStreamingChatLanguageModel.builder()
        .model("llama3.2")
        .format(schema)
        .build();

ChatRequest request = ChatRequest.builder()
        .messages(List.of(UserMessage.from("Extract: John Doe is 30 years old, email john@example.com")))
        .build();

// Collect the streamed tokens into a single JSON string
StringBuilder jsonResponse = new StringBuilder();
model.doChat(request, new StreamingChatResponseHandler() {
    @Override
    public void onPartialResponse(String token) {
        jsonResponse.append(token);
    }

    @Override
    public void onCompleteResponse(ChatResponse response) {
        // jsonResponse contains: {"name":"John Doe","age":30,"email":"john@example.com"}
    }

    @Override
    public void onError(Throwable error) {
        error.printStackTrace();
    }
});
```

Maintain conversation history for context-aware interactions.
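For AI services, per-conversation memory is managed by the extension, and the default in-memory message window can be tuned in application.properties. A sketch assuming the standard quarkus-langchain4j chat-memory configuration keys; verify the exact names against your version's configuration reference:

```properties
# Keep a sliding window of the most recent messages per @MemoryId
quarkus.langchain4j.chat-memory.type=message-window
quarkus.langchain4j.chat-memory.memory-window.max-messages=20
```

Bounding the window keeps prompts within the model's context length at the cost of the model forgetting older turns.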
With AI Services:

```java
import dev.langchain4j.service.MemoryId;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;

@RegisterAiService
public interface Conversation {
    String chat(@MemoryId String userId, @UserMessage String message);
}

// Each user gets a separate conversation history
String response1 = conversation.chat("user123", "My name is Alice");
String response2 = conversation.chat("user123", "What's my name?"); // "Your name is Alice"
```

Low-Level API:
```java
import java.util.ArrayList;
import java.util.List;

import dev.langchain4j.data.message.*;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.response.ChatResponse;

List<ChatMessage> history = new ArrayList<>();
history.add(SystemMessage.from("You are a helpful assistant."));

// Turn 1
history.add(UserMessage.from("My favorite color is blue"));
ChatRequest request1 = ChatRequest.builder()
        .messages(new ArrayList<>(history))
        .build();
ChatResponse response1 = chatModel.doChat(request1);
history.add(response1.aiMessage());

// Turn 2
history.add(UserMessage.from("What's my favorite color?"));
ChatRequest request2 = ChatRequest.builder()
        .messages(new ArrayList<>(history))
        .build();
ChatResponse response2 = chatModel.doChat(request2);
// Model remembers: "Your favorite color is blue"
```

Send images to vision-capable models for analysis.
Images must be base64-encoded strings:

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;
import java.util.List;

byte[] imageBytes = Files.readAllBytes(Paths.get("image.jpg"));
String base64Image = Base64.getEncoder().encodeToString(imageBytes);

// Using the low-level Ollama client API
Message message = Message.builder()
        .role(Role.USER)
        .content("Describe this image")
        .images(List.of(base64Image))
        .build();

ChatRequest request = ChatRequest.builder()
        .model("llava") // Vision model
        .messages(List.of(message))
        .build();

ChatResponse response = ollamaClient.chat(request);
```

Monitor chat model interactions for logging, metrics, and debugging.
```java
import java.util.List;

import dev.langchain4j.model.chat.listener.ChatModelErrorContext;
import dev.langchain4j.model.chat.listener.ChatModelListener;
import dev.langchain4j.model.chat.listener.ChatModelRequestContext;
import dev.langchain4j.model.chat.listener.ChatModelResponseContext;

public class MetricsListener implements ChatModelListener {

    @Override
    public void onRequest(ChatModelRequestContext requestContext) {
        // Log request, start timer, etc.
    }

    @Override
    public void onResponse(ChatModelResponseContext responseContext) {
        // Log response, record metrics, etc.
    }

    @Override
    public void onError(ChatModelErrorContext errorContext) {
        // Log error, increment error counter, etc.
    }
}

OllamaStreamingChatLanguageModel model = OllamaStreamingChatLanguageModel.builder()
        .model("llama3.2")
        .listeners(List.of(new MetricsListener()))
        .build();
```

See Configuration for complete chat model configuration options, including:
- `model-id` (model selection)
- `num-predict` (maximum tokens to generate)

Chat operations may throw exceptions:

- `RuntimeException` - Connection failures, timeouts, and model errors
- Streaming errors are delivered to the `onError()` callback

Always handle errors appropriately:
```java
try {
    String response = chatModel.chat("Hello");
} catch (Exception e) {
    logger.error("Chat failed", e);
    return "I'm sorry, I couldn't process that request.";
}
```

Use `StreamingChatModel` for long responses to provide immediate feedback.

Install with Tessl CLI
```shell
npx tessl i tessl/maven-io-quarkiverse-langchain4j--quarkus-langchain4j-ollama
```