
tessl/maven-io-quarkiverse-langchain4j--quarkus-langchain4j-ollama

Quarkus extension for integrating local Ollama language models with LangChain4j


HTTP Client API

Low-level HTTP client for direct Ollama API access when CDI injection is not available or when fine-grained control is needed.

Capabilities

OllamaClient

Direct HTTP client for synchronous and streaming communication with Ollama server.

class OllamaClient {
    OllamaClient(
        String baseUrl,
        Duration timeout,
        boolean logRequests,
        boolean logResponses,
        boolean logCurl,
        String configName,
        String tlsConfigurationName
    );

    ChatResponse chat(ChatRequest request);
    Multi<ChatResponse> streamingChat(ChatRequest request);
    EmbeddingResponse embedding(EmbeddingRequest request);
}

Constructor Parameters:

  • baseUrl - Ollama server URL (e.g., "http://localhost:11434")
  • timeout - Request timeout duration
  • logRequests - Enable request payload logging
  • logResponses - Enable response payload logging
  • logCurl - Enable cURL command logging for debugging
  • configName - Named configuration reference (can be null)
  • tlsConfigurationName - Named TLS configuration for HTTPS (can be null)

Methods:

  • chat(ChatRequest) - Synchronous chat request, blocks until complete
  • streamingChat(ChatRequest) - Streaming chat request, returns reactive stream
  • embedding(EmbeddingRequest) - Embedding request, returns vector representation

Usage:

import io.quarkiverse.langchain4j.ollama.*;
import java.time.Duration;

// Create client
OllamaClient client = new OllamaClient(
    "http://localhost:11434",
    Duration.ofSeconds(30),
    true,  // log requests
    true,  // log responses
    false, // log cURL
    null,  // no named config
    null   // no TLS config
);

// Synchronous chat
ChatRequest chatRequest = ChatRequest.builder()
    .model("llama3.2")
    .messages(List.of(
        Message.builder()
            .role(Role.USER)
            .content("Hello!")
            .build()
    ))
    .build();

ChatResponse response = client.chat(chatRequest);
System.out.println(response.message().content());

// Streaming chat
Multi<ChatResponse> stream = client.streamingChat(chatRequest);
stream.subscribe().with(
    chunk -> System.out.print(chunk.message().content()),
    error -> System.err.println("Error: " + error),
    () -> System.out.println("\n[Complete]")
);

// Embeddings
EmbeddingRequest embeddingRequest = EmbeddingRequest.builder()
    .model("nomic-embed-text")
    .input("Text to embed")
    .build();

EmbeddingResponse embeddingResponse = client.embedding(embeddingRequest);
float[] vector = embeddingResponse.getEmbeddings()[0];

OllamaRestApi

MicroProfile REST Client interface for Ollama API endpoints.

interface OllamaRestApi {
    @POST
    @Path("/api/chat")
    ChatResponse chat(ChatRequest request);

    @POST
    @Path("/api/chat")
    @RestStreamElementType(MediaType.APPLICATION_JSON)
    Multi<ChatResponse> streamingChat(ChatRequest request);

    @POST
    @Path("/api/embed")
    EmbeddingResponse embeddings(EmbeddingRequest request);

    static ObjectMapper objectMapper(ObjectMapper defaultObjectMapper);
}

Nested Classes:

  • OllamaRestApiReaderInterceptor - Handles incomplete JSON chunks in streaming responses
  • OpenAiRestApiWriterInterceptor - Automatically sets stream parameter in requests
  • OllamaLogger - Custom logger for requests/responses with cURL support

This interface is typically used internally by OllamaClient, but can be used directly for advanced scenarios with the MicroProfile REST Client.

Usage:

import org.eclipse.microprofile.rest.client.RestClientBuilder;
import java.net.URI;

// Build REST client
OllamaRestApi api = RestClientBuilder.newBuilder()
    .baseUri(URI.create("http://localhost:11434"))
    .build(OllamaRestApi.class);

// Make requests
ChatResponse response = api.chat(request);
Multi<ChatResponse> stream = api.streamingChat(request);
EmbeddingResponse embeddings = api.embeddings(embeddingRequest);

Synchronous Chat

Make blocking chat requests that wait for the complete response.

OllamaClient client = new OllamaClient(
    "http://localhost:11434",
    Duration.ofSeconds(30),
    false, false, false,
    null, null
);

List<Message> messages = List.of(
    Message.builder()
        .role(Role.SYSTEM)
        .content("You are a helpful assistant.")
        .build(),
    Message.builder()
        .role(Role.USER)
        .content("Explain quantum computing in simple terms.")
        .build()
);

ChatRequest request = ChatRequest.builder()
    .model("llama3.2")
    .messages(messages)
    .options(Options.builder()
        .temperature(0.7)
        .numPredict(500)
        .build())
    .build();

try {
    ChatResponse response = client.chat(request);
    System.out.println("Response: " + response.message().content());
    System.out.println("Tokens: " + response.evalCount());
} catch (Exception e) {
    System.err.println("Request failed: " + e.getMessage());
}

Streaming Chat

Process responses incrementally as they are generated.

import io.smallrye.mutiny.Multi;

OllamaClient client = new OllamaClient(
    "http://localhost:11434",
    Duration.ofSeconds(60),
    false, false, false,
    null, null
);

ChatRequest request = ChatRequest.builder()
    .model("llama3.2")
    .messages(messages)
    .stream(true)  // Explicitly enable streaming
    .build();

Multi<ChatResponse> stream = client.streamingChat(request);

StringBuilder fullResponse = new StringBuilder();

stream.subscribe().with(
    chunk -> {
        if (!chunk.done()) {
            String content = chunk.message().content();
            if (content != null) {
                fullResponse.append(content);
                System.out.print(content);
            }
        } else {
            // Final chunk with metadata
            System.out.println("\n\n--- Metadata ---");
            System.out.println("Model: " + chunk.model());
            System.out.println("Input tokens: " + chunk.promptEvalCount());
            System.out.println("Output tokens: " + chunk.evalCount());
            System.out.println("Full response: " + fullResponse.toString());
        }
    },
    error -> {
        System.err.println("Streaming error: " + error.getMessage());
        error.printStackTrace();
    },
    () -> System.out.println("Stream complete")
);

Batch Processing

Process multiple requests efficiently.

import java.util.Objects;
import java.util.stream.Collectors;

OllamaClient client = new OllamaClient(
    "http://localhost:11434",
    Duration.ofSeconds(20),
    false, false, false,
    null, null
);

List<String> inputs = List.of(
    "Summarize: The quick brown fox jumps over the lazy dog.",
    "Translate to French: Hello, how are you?",
    "What is 15 * 23?"
);

List<ChatResponse> responses = inputs.stream()
    .map(input -> {
        ChatRequest request = ChatRequest.builder()
            .model("llama3.2")
            .messages(List.of(
                Message.builder()
                    .role(Role.USER)
                    .content(input)
                    .build()
            ))
            .build();
        try {
            return client.chat(request);
        } catch (Exception e) {
            System.err.println("Request failed: " + input + " - " + e.getMessage());
            return null;
        }
    })
    .filter(Objects::nonNull)
    .collect(Collectors.toList());

responses.forEach(response ->
    System.out.println(response.message().content())
);

HTTPS with TLS

Connect to Ollama server over HTTPS with TLS configuration.

Configuration:

# Define TLS configuration
quarkus.tls.ollama-tls.trust-store.pem.certs=server-cert.pem
quarkus.tls.ollama-tls.key-store.pem.keys=client-key.pem
quarkus.tls.ollama-tls.key-store.pem.certs=client-cert.pem

Usage:

OllamaClient client = new OllamaClient(
    "https://ollama.example.com",
    Duration.ofSeconds(30),
    false, false, false,
    null,
    "ollama-tls"  // TLS configuration name
);

ChatResponse response = client.chat(request);

Request Logging

Enable detailed logging for debugging.

// Enable all logging
OllamaClient client = new OllamaClient(
    "http://localhost:11434",
    Duration.ofSeconds(30),
    true,  // log requests
    true,  // log responses
    true,  // log cURL commands
    null,
    null
);

// Make request - logs will be printed
ChatResponse response = client.chat(request);

Example log output:

--- Request ---
POST http://localhost:11434/api/chat
{
  "model": "llama3.2",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": false
}

--- cURL equivalent ---
curl -X POST http://localhost:11434/api/chat \
  -H 'Content-Type: application/json' \
  -d '{"model":"llama3.2","messages":[{"role":"user","content":"Hello"}],"stream":false}'

--- Response ---
{
  "model": "llama3.2",
  "created_at": "2026-02-25T08:00:00Z",
  "message": {"role": "assistant", "content": "Hello! How can I help you?"},
  "done": true,
  "eval_count": 12
}

Error Handling

Handle various error scenarios gracefully.

OllamaClient client = new OllamaClient(
    "http://localhost:11434",
    Duration.ofSeconds(10),
    false, false, false,
    null, null
);

try {
    ChatResponse response = client.chat(request);
    // Process response
} catch (jakarta.ws.rs.ProcessingException e) {
    // Connection error (server not running, network issues) or timeout.
    // Timeouts typically surface as a wrapped TimeoutException cause,
    // not as a directly thrown checked exception.
    if (e.getCause() instanceof java.util.concurrent.TimeoutException) {
        System.err.println("Request timed out");
    } else {
        System.err.println("Cannot connect to Ollama server: " + e.getMessage());
    }
} catch (jakarta.ws.rs.WebApplicationException e) {
    // HTTP error (404, 500, etc.)
    int status = e.getResponse().getStatus();
    System.err.println("Server error: " + status);
    if (status == 404) {
        System.err.println("Model not found - ensure it's downloaded with 'ollama pull'");
    }
} catch (Exception e) {
    // Other errors
    System.err.println("Unexpected error: " + e.getMessage());
    e.printStackTrace();
}
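Transient connection errors are often worth retrying before giving up. A minimal generic retry helper with exponential backoff is sketched below; RetryHelper and withRetry are hypothetical names, not part of the extension:

```java
import java.util.concurrent.Callable;

// Hypothetical helper (not part of the extension): retries a call up to
// maxAttempts times, doubling the delay between attempts.
class RetryHelper {
    static <T> T withRetry(Callable<T> call, int maxAttempts, long initialDelayMs)
            throws Exception {
        Exception last = null;
        long delay = initialDelayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);
                    delay *= 2; // exponential backoff
                }
            }
        }
        throw last;
    }
}
```

A chat request could then be wrapped as `RetryHelper.withRetry(() -> client.chat(request), 3, 500)`. Reserve retries for connection-level failures; a 404 from a missing model will not succeed on retry.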

Connection Pooling

For high-throughput applications, reuse client instances.

// Create once, reuse across requests
@ApplicationScoped
public class OllamaClientProvider {
    private final OllamaClient client;

    public OllamaClientProvider() {
        this.client = new OllamaClient(
            "http://localhost:11434",
            Duration.ofSeconds(30),
            false, false, false,
            null, null
        );
    }

    public OllamaClient getClient() {
        return client;
    }
}

// Inject and use
@Inject
OllamaClientProvider clientProvider;

ChatResponse response = clientProvider.getClient().chat(request);

Authentication

Add authentication headers using custom filter.

class OllamaModelAuthProviderFilter implements ClientRequestFilter {
    OllamaModelAuthProviderFilter(ModelAuthProvider authorizer);
    void filter(ClientRequestContext context);
}

Usage:

import io.quarkiverse.langchain4j.ModelAuthProvider;

// Implement auth provider
public class CustomAuthProvider implements ModelAuthProvider {
    @Override
    public String getAuthorization(String modelName) {
        return "Bearer " + getApiToken();
    }
}

// Register filter (typically done via CDI)
OllamaModelAuthProviderFilter filter = new OllamaModelAuthProviderFilter(
    new CustomAuthProvider()
);

// Filter is automatically applied to requests

Reactive Streams Integration

Integrate with reactive streams for non-blocking processing.

import io.smallrye.mutiny.Multi;
import io.smallrye.mutiny.Uni;

OllamaClient client = new OllamaClient(
    "http://localhost:11434",
    Duration.ofSeconds(30),
    false, false, false,
    null, null
);

// Non-blocking request with Uni
Uni<ChatResponse> uniResponse = Uni.createFrom().item(() -> client.chat(request));

uniResponse
    .onItem().transform(response -> response.message().content())
    .subscribe().with(
        content -> System.out.println("Response: " + content),
        error -> System.err.println("Error: " + error)
    );

// Stream processing with Multi
Multi<ChatResponse> stream = client.streamingChat(request);

stream
    .select().where(chunk -> !chunk.done())
    .onItem().transform(chunk -> chunk.message().content())
    .collect().asList()
    .subscribe().with(
        chunks -> {
            String fullResponse = String.join("", chunks);
            System.out.println("Full response: " + fullResponse);
        }
    );

Client Configuration Best Practices

  1. Timeout tuning: Set timeout based on model size and expected response length

    • Small models: 10-15 seconds
    • Medium models: 20-30 seconds
    • Large models: 45-60 seconds
    • Streaming: Longer timeouts (60-120 seconds)
  2. Connection reuse: Create client once and reuse across requests

  3. Error handling: Always handle network errors, timeouts, and HTTP errors

  4. Logging: Enable logging in development, disable in production for performance

  5. TLS: Use HTTPS with proper certificate validation in production

  6. Streaming: Prefer streaming for long responses to provide immediate feedback

  7. Resource cleanup: Close streams properly to free resources

  8. Rate limiting: Implement client-side rate limiting for external Ollama servers

  9. Health checks: Verify Ollama availability before processing requests

  10. Monitoring: Track request latency, errors, and token usage
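Client-side rate limiting (item 8 above) can be sketched with a plain `java.util.concurrent.Semaphore` that caps the number of in-flight requests; SimpleRateLimiter and its methods are hypothetical names, not part of the extension:

```java
import java.util.concurrent.Semaphore;

// Hypothetical helper (not part of the extension): limits how many
// requests may be in flight against a shared Ollama server at once.
class SimpleRateLimiter {
    private final Semaphore permits;

    SimpleRateLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    /** Returns true if a request slot is available, without blocking. */
    boolean tryAcquire() {
        return permits.tryAcquire();
    }

    /** Releases a slot once the request completes (success or failure). */
    void release() {
        permits.release();
    }
}
```

Each `client.chat(...)` call would then be bracketed by `tryAcquire()` and a `finally { limiter.release(); }` so that failed requests also free their slot.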

Performance Optimization

// Use connection pooling (automatic with REST Client)
OllamaClient client = new OllamaClient(...);

// Batch embeddings for efficiency
List<String> texts = loadTexts(); // e.g., 1000 texts
int batchSize = 50;

for (int i = 0; i < texts.size(); i += batchSize) {
    List<String> batch = texts.subList(i, Math.min(i + batchSize, texts.size()));

    // Process batch
    for (String text : batch) {
        EmbeddingRequest req = EmbeddingRequest.builder()
            .model("nomic-embed-text")
            .input(text)
            .build();
        EmbeddingResponse resp = client.embedding(req);
        // Store embedding
    }
}

// Use streaming for immediate feedback
Multi<ChatResponse> stream = client.streamingChat(request);
stream.subscribe().with(
    chunk -> updateUI(chunk.message().content()),
    error -> showError(error)
);

Comparison: CDI vs Direct Client

Use CDI injection when:

  • Building Quarkus applications
  • Need automatic configuration management
  • Want declarative AI services
  • Require CDI features (interceptors, qualifiers, etc.)

Use direct client when:

  • Not using Quarkus
  • Need fine-grained control over requests
  • Building custom abstractions
  • Testing without CDI container
  • Programmatic configuration required

// CDI approach (recommended for Quarkus)
@Inject
ChatModel chatModel;
String response = chatModel.chat("Hello");

// Direct client approach
OllamaClient client = new OllamaClient(...);
ChatRequest request = ChatRequest.builder()...build();
ChatResponse response = client.chat(request);

Install with Tessl CLI

npx tessl i tessl/maven-io-quarkiverse-langchain4j--quarkus-langchain4j-ollama
