tessl/maven-io-quarkiverse-langchain4j--quarkus-langchain4j-ollama

Quarkus extension for integrating local Ollama language models with LangChain4j

Chat Models

Chat models provide conversational AI capabilities with support for both synchronous and streaming responses, including advanced features like function calling and structured output.

Capabilities

CDI Injection (Recommended)

Inject chat models as CDI beans for automatic lifecycle management and configuration.

import jakarta.inject.Inject;
import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.chat.StreamingChatModel;

// Default configuration
@Inject
ChatModel chatModel;

@Inject
StreamingChatModel streamingChatModel;

// Named configuration
@Inject
@Named("custom-model")
ChatModel customChatModel;

@Inject
@Named("custom-model")
StreamingChatModel customStreamingModel;

ChatModel Interface

The blocking/synchronous chat model interface for Ollama (the dev.langchain4j.model.chat.ChatModel contract from LangChain4j).

interface ChatModel {
    // Simple string-based chat
    String chat(String message);

    // Advanced chat with full control
    ChatResponse doChat(ChatRequest chatRequest);

    // Listener management
    List<ChatModelListener> listeners();

    // Default parameters
    ChatRequestParameters defaultRequestParameters();

    // Supported capabilities
    Set<Capability> supportedCapabilities();
}

// ChatRequest structure
class ChatRequest {
    List<ChatMessage> messages();
    List<ToolSpecification> toolSpecifications();
    ChatRequestParameters parameters();
}

// ChatResponse structure
class ChatResponse {
    AiMessage aiMessage();
    ChatResponseMetadata metadata();
}

Usage:

import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.response.StreamingChatResponseHandler;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.data.message.UserMessage;
import java.util.List;

// Synchronous chat
String response = chatModel.chat("Hello, how are you?");

// Streaming chat
ChatRequest request = ChatRequest.builder()
    .messages(List.of(UserMessage.from("Tell me a story")))
    .build();

streamingChatModel.doChat(request, new StreamingChatResponseHandler() {
    @Override
    public void onPartialResponse(String token) {
        System.out.print(token);
    }

    @Override
    public void onCompleteResponse(ChatResponse response) {
        System.out.println("\n[Complete]");
    }

    @Override
    public void onError(Throwable error) {
        error.printStackTrace();
    }
});
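Callers that need the full reply as a single blocking value can bridge the streaming callbacks into a CompletableFuture. The collector below is a plain-JDK sketch; the commented wiring shows one way it could plug into the StreamingChatResponseHandler above.

```java
import java.util.concurrent.CompletableFuture;

// Accumulates streamed tokens and exposes the final text as a blocking result.
class TokenCollector {
    private final StringBuilder text = new StringBuilder();
    private final CompletableFuture<String> done = new CompletableFuture<>();

    public void onToken(String token)      { text.append(token); }
    public void onComplete()               { done.complete(text.toString()); }
    public void onFailure(Throwable error) { done.completeExceptionally(error); }

    // Blocks until onComplete() or onFailure() has been called.
    public String await() { return done.join(); }

    // Wiring into the handler above would look like:
    //   TokenCollector collector = new TokenCollector();
    //   streamingChatModel.doChat(request, new StreamingChatResponseHandler() {
    //       public void onPartialResponse(String t)       { collector.onToken(t); }
    //       public void onCompleteResponse(ChatResponse r){ collector.onComplete(); }
    //       public void onError(Throwable e)              { collector.onFailure(e); }
    //   });
    //   String fullText = collector.await();
}
```

This keeps the streaming transport (useful for timeouts and progress logging) while presenting a simple blocking API to the rest of the application.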

Programmatic Streaming Chat Model

Build streaming chat model instances programmatically for fine-grained control.

class OllamaStreamingChatLanguageModel implements StreamingChatModel {
    static Builder builder();
    void doChat(dev.langchain4j.model.chat.request.ChatRequest chatRequest, StreamingChatResponseHandler handler);
}

class OllamaStreamingChatLanguageModel.Builder {
    Builder baseUrl(String val);
    Builder tlsConfigurationName(String tlsConfigurationName);
    Builder timeout(Duration val);
    Builder model(String val);
    Builder format(String val);
    Builder options(Options val);
    Builder logRequests(boolean logRequests);
    Builder logResponses(boolean logResponses);
    Builder logCurl(boolean logCurl);
    Builder configName(String configName);
    Builder listeners(List<ChatModelListener> listeners);
    OllamaStreamingChatLanguageModel build();
}

Parameters:

  • baseUrl - Ollama server URL (default: "http://localhost:11434")
  • tlsConfigurationName - Named TLS configuration for HTTPS
  • timeout - Request timeout (default: 10 seconds)
  • model - Model name (e.g., "llama3.2", "mistral")
  • format - Response format: "json" for JSON mode, or a JSON schema string for structured output
  • options - Model options (temperature, topK, topP, etc.)
  • logRequests - Log request payloads
  • logResponses - Log response payloads
  • logCurl - Log equivalent cURL commands
  • configName - Named configuration reference
  • listeners - Chat model event listeners for observability

Usage:

import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.response.StreamingChatResponseHandler;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.data.message.UserMessage;
import java.time.Duration;
import java.util.List;

OllamaStreamingChatLanguageModel model = OllamaStreamingChatLanguageModel.builder()
    .baseUrl("http://localhost:11434")
    .model("llama3.2")
    .timeout(Duration.ofSeconds(30))
    .options(Options.builder()
        .temperature(0.7)
        .topP(0.9)
        .build())
    .logRequests(true)
    .build();

ChatRequest request = ChatRequest.builder()
    .messages(List.of(UserMessage.from("Explain quantum computing")))
    .build();

model.doChat(request, new StreamingChatResponseHandler() {
    @Override
    public void onPartialResponse(String token) {
        System.out.print(token);
    }

    @Override
    public void onCompleteResponse(ChatResponse response) {
        System.out.println("\nTokens: " + response.tokenUsage());
    }

    @Override
    public void onError(Throwable error) {
        error.printStackTrace();
    }
});

Declarative AI Services

Define AI-powered interfaces with annotations for clean, type-safe AI integration.

import io.quarkiverse.langchain4j.RegisterAiService;
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import dev.langchain4j.service.V;

@RegisterAiService
public interface ChatService {
    @SystemMessage("You are a helpful assistant.")
    @UserMessage("Answer: {question}")
    String chat(String question);
}

Annotations:

  • @RegisterAiService - Marks interface as AI service (supports modelName parameter for named configs)
  • @SystemMessage - System prompt template
  • @UserMessage - User message template
  • @V - Variable injection in templates
  • @MemoryId - Conversation memory identifier for multi-turn chats

Usage:

@Inject
ChatService chatService;

String answer = chatService.chat("What is Quarkus?");

Advanced AI Service:

@RegisterAiService(modelName = "creative")
public interface ContentGenerator {
    @SystemMessage("You are a creative writing assistant specializing in {genre}.")
    @UserMessage("Write a {wordCount} word story about: {topic}")
    String generateStory(@V("genre") String genre,
                        @V("wordCount") int wordCount,
                        @V("topic") String topic);
}

// Usage
String story = generator.generateStory("science fiction", 500, "time travel");
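To make the template mechanics concrete: the {genre}, {wordCount}, and {topic} placeholders are replaced with the @V-bound arguments before the prompt reaches the model. The snippet below is only a plain-Java illustration of that substitution, not the extension's actual template engine:

```java
import java.util.Map;

// Minimal illustration of {placeholder} substitution as used by
// @SystemMessage/@UserMessage templates. The real implementation uses a
// full template engine; this sketch only shows the idea.
class TemplateSketch {
    public static String render(String template, Map<String, Object> vars) {
        String out = template;
        for (Map.Entry<String, Object> e : vars.entrySet()) {
            out = out.replace("{" + e.getKey() + "}", String.valueOf(e.getValue()));
        }
        return out;
    }
}
```

For the ContentGenerator above, render("Write a {wordCount} word story about: {topic}", Map.of("wordCount", 500, "topic", "time travel")) yields the user message the model actually receives.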

Structured Output with JSON Mode

Force models to respond with valid JSON for structured data extraction.

Configuration:

quarkus.langchain4j.ollama.chat-model.format=json

Programmatic:

OllamaStreamingChatLanguageModel model = OllamaStreamingChatLanguageModel.builder()
    .model("llama3.2")
    .format("json")
    .build();

With JSON Schema:

String schema = """
{
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "age": {"type": "integer"},
    "email": {"type": "string"}
  },
  "required": ["name", "age"]
}
""";

OllamaStreamingChatLanguageModel model = OllamaStreamingChatLanguageModel.builder()
    .model("llama3.2")
    .format(schema)
    .build();

ChatRequest request = ChatRequest.builder()
    .messages(List.of(UserMessage.from("Extract: John Doe is 30 years old, email john@example.com")))
    .build();

// Collect response via handler
StringBuilder jsonResponse = new StringBuilder();
model.doChat(request, new StreamingChatResponseHandler() {
    @Override
    public void onPartialResponse(String token) {
        jsonResponse.append(token);
    }

    @Override
    public void onCompleteResponse(ChatResponse response) {
        // jsonResponse contains: {"name":"John Doe","age":30,"email":"john@example.com"}
    }

    @Override
    public void onError(Throwable error) {
        error.printStackTrace();
    }
});

Multi-Turn Conversations

Maintain conversation history for context-aware interactions.

With AI Services:

@RegisterAiService
public interface Conversation {
    String chat(@MemoryId String userId, @UserMessage String message);
}

// Each user gets separate conversation history
String response1 = conversation.chat("user123", "My name is Alice");
String response2 = conversation.chat("user123", "What's my name?"); // "Your name is Alice"

Low-Level API:

import dev.langchain4j.data.message.*;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.response.ChatResponse;
import java.util.ArrayList;
import java.util.List;

List<ChatMessage> history = new ArrayList<>();
history.add(SystemMessage.from("You are a helpful assistant."));

// Turn 1
history.add(UserMessage.from("My favorite color is blue"));
ChatRequest request1 = ChatRequest.builder()
    .messages(new ArrayList<>(history))
    .build();
ChatResponse response1 = chatModel.doChat(request1);
history.add(response1.aiMessage());

// Turn 2
history.add(UserMessage.from("What's my favorite color?"));
ChatRequest request2 = ChatRequest.builder()
    .messages(new ArrayList<>(history))
    .build();
ChatResponse response2 = chatModel.doChat(request2);
// Model remembers: "Your favorite color is blue"
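Because each turn resends the entire history, long conversations will eventually exceed the model's context window. One common mitigation is to keep the system message plus only the most recent messages. A generic plain-Java sketch of that policy (LangChain4j also ships MessageWindowChatMemory, which handles this for you):

```java
import java.util.ArrayList;
import java.util.List;

// Keeps the first entry (the system message) plus the last `maxRecent`
// entries. Generic so it works with any message type, e.g. List<ChatMessage>.
class HistoryTrimmer {
    public static <T> List<T> trim(List<T> history, int maxRecent) {
        if (history.size() <= maxRecent + 1) {
            return new ArrayList<>(history);
        }
        List<T> trimmed = new ArrayList<>();
        trimmed.add(history.get(0)); // preserve the system message
        trimmed.addAll(history.subList(history.size() - maxRecent, history.size()));
        return trimmed;
    }
}
```

In the low-level loop above, you would call trim(history, N) before each ChatRequest.builder().messages(...) to bound the prompt size. Note that dropping old turns also drops whatever facts they contained.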

Image Inputs (Vision Models)

Send images to vision-capable models for analysis.

Images must be base64-encoded strings:

import java.util.Base64;
import java.nio.file.Files;
import java.nio.file.Paths;

byte[] imageBytes = Files.readAllBytes(Paths.get("image.jpg"));
String base64Image = Base64.getEncoder().encodeToString(imageBytes);

// Using the extension's low-level Ollama client API (this Message/ChatRequest
// pair is distinct from the LangChain4j ChatRequest used in the earlier examples)
Message message = Message.builder()
    .role(Role.USER)
    .content("Describe this image")
    .images(List.of(base64Image))
    .build();

ChatRequest request = ChatRequest.builder()
    .model("llava")  // Vision model
    .messages(List.of(message))
    .build();

ChatResponse response = ollamaClient.chat(request);

Model Listeners for Observability

Monitor chat model interactions for logging, metrics, and debugging.

import dev.langchain4j.model.chat.listener.ChatModelListener;
import dev.langchain4j.model.chat.listener.ChatModelRequestContext;
import dev.langchain4j.model.chat.listener.ChatModelResponseContext;
import dev.langchain4j.model.chat.listener.ChatModelErrorContext;

public class MetricsListener implements ChatModelListener {
    @Override
    public void onRequest(ChatModelRequestContext requestContext) {
        // Log the outgoing request, start a timer, etc.
    }

    @Override
    public void onResponse(ChatModelResponseContext responseContext) {
        // Log the response, record metrics, etc.
    }

    @Override
    public void onError(ChatModelErrorContext errorContext) {
        // Log the error, increment an error counter, etc.
    }
}

OllamaStreamingChatLanguageModel model = OllamaStreamingChatLanguageModel.builder()
    .model("llama3.2")
    .listeners(List.of(new MetricsListener()))
    .build();

Configuration

See Configuration for complete chat model configuration options including:

  • Model selection (model-id)
  • Temperature and sampling parameters
  • Token limits (num-predict)
  • Stop sequences
  • Logging options
  • Named configurations
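As an illustration, the @Named("custom-model") beans injected earlier would typically be wired with properties of the following shape. The exact keys below are assumptions based on the extension's quarkus.langchain4j.ollama.<name>.* naming convention; see the Configuration doc for the authoritative list.

```properties
# Default model
quarkus.langchain4j.ollama.chat-model.model-id=llama3.2
quarkus.langchain4j.ollama.chat-model.temperature=0.3

# Named configuration "custom-model"
quarkus.langchain4j.ollama.custom-model.chat-model.model-id=mistral
quarkus.langchain4j.ollama.custom-model.chat-model.num-predict=512
```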

Error Handling

Chat operations may throw exceptions:

  • RuntimeException - Connection failures, timeouts, model errors
  • Model may return empty or incomplete responses if interrupted
  • Streaming handlers receive errors via onError() callback

Always handle errors appropriately:

try {
    String response = chatModel.chat("Hello");
} catch (Exception e) {
    logger.error("Chat failed", e);
    return "I'm sorry, I couldn't process that request.";
}
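Transient failures (a model that is still loading, a dropped connection) are often worth a bounded retry before giving up. A minimal dependency-free wrapper, sketched here as one possible approach:

```java
import java.util.function.Supplier;

// Retries a call up to maxAttempts times, rethrowing the last failure.
class Retry {
    public static <T> T withRetry(Supplier<T> call, int maxAttempts) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e;
                // Optionally back off here, e.g. Thread.sleep(attempt * 500L)
            }
        }
        throw last != null ? last : new IllegalArgumentException("maxAttempts must be >= 1");
    }
}
```

A call site would then read String answer = Retry.withRetry(() -> chatModel.chat("Hello"), 3);, reaching the catch block above only after every attempt has failed.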

Performance Considerations

  • Streaming: Use StreamingChatModel for long responses to provide immediate feedback
  • Timeouts: Adjust timeout based on model size and expected response length
  • Model size: Smaller models (e.g., "llama3.2:1b") respond faster but with lower quality
  • Temperature: Lower temperature (0.1-0.3) for faster, more deterministic responses
  • Context window: Limit conversation history to prevent context overflow

Install with Tessl CLI

npx tessl i tessl/maven-io-quarkiverse-langchain4j--quarkus-langchain4j-ollama
