
tessl/maven-io-quarkiverse-langchain4j--quarkus-langchain4j-ollama

Quarkus extension for integrating local Ollama language models with LangChain4j


HTTP Client API

Low-level HTTP client for direct Ollama API access when CDI injection is not available or when fine-grained control is needed.

Capabilities

OllamaClient

Direct HTTP client for synchronous and streaming communication with Ollama server.

class OllamaClient {
    OllamaClient(
        String baseUrl,
        Duration timeout,
        boolean logRequests,
        boolean logResponses,
        boolean logCurl,
        String configName,
        String tlsConfigurationName
    );

    ChatResponse chat(ChatRequest request);
    Multi<ChatResponse> streamingChat(ChatRequest request);
    EmbeddingResponse embedding(EmbeddingRequest request);
}

Constructor Parameters:

  • baseUrl - Ollama server URL (e.g., "http://localhost:11434")
  • timeout - Request timeout duration
  • logRequests - Enable request payload logging
  • logResponses - Enable response payload logging
  • logCurl - Enable cURL command logging for debugging
  • configName - Named configuration reference (can be null)
  • tlsConfigurationName - Named TLS configuration for HTTPS (can be null)

Methods:

  • chat(ChatRequest) - Synchronous chat request, blocks until complete
  • streamingChat(ChatRequest) - Streaming chat request, returns reactive stream
  • embedding(EmbeddingRequest) - Embedding request, returns vector representation

Usage:

import io.quarkiverse.langchain4j.ollama.*;
import java.time.Duration;

// Create client
OllamaClient client = new OllamaClient(
    "http://localhost:11434",
    Duration.ofSeconds(30),
    true,  // log requests
    true,  // log responses
    false, // log cURL
    null,  // no named config
    null   // no TLS config
);

// Synchronous chat
ChatRequest chatRequest = ChatRequest.builder()
    .model("llama3.2")
    .messages(List.of(
        Message.builder()
            .role(Role.USER)
            .content("Hello!")
            .build()
    ))
    .build();

ChatResponse response = client.chat(chatRequest);
System.out.println(response.message().content());

// Streaming chat
Multi<ChatResponse> stream = client.streamingChat(chatRequest);
stream.subscribe().with(
    chunk -> System.out.print(chunk.message().content()),
    error -> System.err.println("Error: " + error),
    () -> System.out.println("\n[Complete]")
);

// Embeddings
EmbeddingRequest embeddingRequest = EmbeddingRequest.builder()
    .model("nomic-embed-text")
    .input("Text to embed")
    .build();

EmbeddingResponse embeddingResponse = client.embedding(embeddingRequest);
float[] vector = embeddingResponse.getEmbeddings()[0];

OllamaRestApi

MicroProfile REST Client interface for Ollama API endpoints.

interface OllamaRestApi {
    @POST
    @Path("/api/chat")
    ChatResponse chat(ChatRequest request);

    @POST
    @Path("/api/chat")
    @RestStreamElementType(MediaType.APPLICATION_JSON)
    Multi<ChatResponse> streamingChat(ChatRequest request);

    @POST
    @Path("/api/embed")
    EmbeddingResponse embeddings(EmbeddingRequest request);

    static ObjectMapper objectMapper(ObjectMapper defaultObjectMapper);
}

Nested Classes:

  • OllamaRestApiReaderInterceptor - Handles incomplete JSON chunks in streaming responses
  • OpenAiRestApiWriterInterceptor - Automatically sets stream parameter in requests
  • OllamaLogger - Custom logger for requests/responses with cURL support

This interface is typically used internally by OllamaClient, but can be used directly for advanced scenarios with the MicroProfile REST Client.

Usage:

import org.eclipse.microprofile.rest.client.RestClientBuilder;
import java.net.URI;

// Build REST client
OllamaRestApi api = RestClientBuilder.newBuilder()
    .baseUri(URI.create("http://localhost:11434"))
    .build(OllamaRestApi.class);

// Make requests
ChatResponse response = api.chat(request);
Multi<ChatResponse> stream = api.streamingChat(request);
EmbeddingResponse embeddings = api.embeddings(embeddingRequest);

Synchronous Chat

Make blocking chat requests that wait for the complete response.

OllamaClient client = new OllamaClient(
    "http://localhost:11434",
    Duration.ofSeconds(30),
    false, false, false,
    null, null
);

List<Message> messages = List.of(
    Message.builder()
        .role(Role.SYSTEM)
        .content("You are a helpful assistant.")
        .build(),
    Message.builder()
        .role(Role.USER)
        .content("Explain quantum computing in simple terms.")
        .build()
);

ChatRequest request = ChatRequest.builder()
    .model("llama3.2")
    .messages(messages)
    .options(Options.builder()
        .temperature(0.7)
        .numPredict(500)
        .build())
    .build();

try {
    ChatResponse response = client.chat(request);
    System.out.println("Response: " + response.message().content());
    System.out.println("Tokens: " + response.evalCount());
} catch (Exception e) {
    System.err.println("Request failed: " + e.getMessage());
}

Streaming Chat

Process responses incrementally as they are generated.

import io.smallrye.mutiny.Multi;

OllamaClient client = new OllamaClient(
    "http://localhost:11434",
    Duration.ofSeconds(60),
    false, false, false,
    null, null
);

ChatRequest request = ChatRequest.builder()
    .model("llama3.2")
    .messages(messages)
    .stream(true)  // Explicitly enable streaming
    .build();

Multi<ChatResponse> stream = client.streamingChat(request);

StringBuilder fullResponse = new StringBuilder();

stream.subscribe().with(
    chunk -> {
        if (!chunk.done()) {
            String content = chunk.message().content();
            if (content != null) {
                fullResponse.append(content);
                System.out.print(content);
            }
        } else {
            // Final chunk with metadata
            System.out.println("\n\n--- Metadata ---");
            System.out.println("Model: " + chunk.model());
            System.out.println("Input tokens: " + chunk.promptEvalCount());
            System.out.println("Output tokens: " + chunk.evalCount());
            System.out.println("Full response: " + fullResponse.toString());
        }
    },
    error -> {
        System.err.println("Streaming error: " + error.getMessage());
        error.printStackTrace();
    },
    () -> System.out.println("Stream complete")
);

Batch Processing

Process multiple requests efficiently.

import java.util.Objects;
import java.util.stream.Collectors;

OllamaClient client = new OllamaClient(
    "http://localhost:11434",
    Duration.ofSeconds(20),
    false, false, false,
    null, null
);

List<String> inputs = List.of(
    "Summarize: The quick brown fox jumps over the lazy dog.",
    "Translate to French: Hello, how are you?",
    "What is 15 * 23?"
);

List<ChatResponse> responses = inputs.stream()
    .map(input -> {
        ChatRequest request = ChatRequest.builder()
            .model("llama3.2")
            .messages(List.of(
                Message.builder()
                    .role(Role.USER)
                    .content(input)
                    .build()
            ))
            .build();
        try {
            return client.chat(request);
        } catch (Exception e) {
            System.err.println("Request failed: " + input + " - " + e.getMessage());
            return null;
        }
    })
    .filter(Objects::nonNull)
    .collect(Collectors.toList());

responses.forEach(response ->
    System.out.println(response.message().content())
);

HTTPS with TLS

Connect to Ollama server over HTTPS with TLS configuration.

Configuration:

# Define TLS configuration
quarkus.tls.ollama-tls.trust-store.pem.certs=server-cert.pem
quarkus.tls.ollama-tls.key-store.pem.keys=client-key.pem
quarkus.tls.ollama-tls.key-store.pem.certs=client-cert.pem

Usage:

OllamaClient client = new OllamaClient(
    "https://ollama.example.com",
    Duration.ofSeconds(30),
    false, false, false,
    null,
    "ollama-tls"  // TLS configuration name
);

ChatResponse response = client.chat(request);

Request Logging

Enable detailed logging for debugging.

// Enable all logging
OllamaClient client = new OllamaClient(
    "http://localhost:11434",
    Duration.ofSeconds(30),
    true,  // log requests
    true,  // log responses
    true,  // log cURL commands
    null,
    null
);

// Make request - logs will be printed
ChatResponse response = client.chat(request);

Example log output:

--- Request ---
POST http://localhost:11434/api/chat
{
  "model": "llama3.2",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": false
}

--- cURL equivalent ---
curl -X POST http://localhost:11434/api/chat \
  -H 'Content-Type: application/json' \
  -d '{"model":"llama3.2","messages":[{"role":"user","content":"Hello"}],"stream":false}'

--- Response ---
{
  "model": "llama3.2",
  "created_at": "2026-02-25T08:00:00Z",
  "message": {"role": "assistant", "content": "Hello! How can I help you?"},
  "done": true,
  "eval_count": 12
}

Error Handling

Handle various error scenarios gracefully.

OllamaClient client = new OllamaClient(
    "http://localhost:11434",
    Duration.ofSeconds(10),
    false, false, false,
    null, null
);

try {
    ChatResponse response = client.chat(request);
    // Process response
} catch (jakarta.ws.rs.ProcessingException e) {
    // Connection error (server not running, network issues) or timeout.
    // Timeouts typically surface as a wrapped TimeoutException cause,
    // not as a directly thrown checked exception.
    if (e.getCause() instanceof java.util.concurrent.TimeoutException) {
        System.err.println("Request timed out");
    } else {
        System.err.println("Cannot connect to Ollama server: " + e.getMessage());
    }
} catch (jakarta.ws.rs.WebApplicationException e) {
    // HTTP error (404, 500, etc.)
    int status = e.getResponse().getStatus();
    System.err.println("Server error: " + status);
    if (status == 404) {
        System.err.println("Model not found - ensure it's downloaded with 'ollama pull'");
    }
} catch (Exception e) {
    // Other errors
    System.err.println("Unexpected error: " + e.getMessage());
    e.printStackTrace();
}
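Transient connection errors are often worth retrying before giving up. A minimal generic retry helper with exponential backoff is sketched below; RetryHelper and withRetry are hypothetical names, not part of the extension:

```java
import java.util.concurrent.Callable;

// Hypothetical helper (not part of the extension): retries a call up to
// maxAttempts times, doubling the delay between attempts.
class RetryHelper {
    static <T> T withRetry(Callable<T> call, int maxAttempts, long initialDelayMs)
            throws Exception {
        Exception last = null;
        long delay = initialDelayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);
                    delay *= 2; // exponential backoff
                }
            }
        }
        throw last;
    }
}
```

A chat request could then be wrapped as `RetryHelper.withRetry(() -> client.chat(request), 3, 500)`. Reserve retries for connection-level failures; a 404 from a missing model will not succeed on retry.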

Connection Pooling

For high-throughput applications, reuse client instances.

// Create once, reuse across requests
@ApplicationScoped
public class OllamaClientProvider {
    private final OllamaClient client;

    public OllamaClientProvider() {
        this.client = new OllamaClient(
            "http://localhost:11434",
            Duration.ofSeconds(30),
            false, false, false,
            null, null
        );
    }

    public OllamaClient getClient() {
        return client;
    }
}

// Inject and use
@Inject
OllamaClientProvider clientProvider;

ChatResponse response = clientProvider.getClient().chat(request);

Authentication

Add authentication headers using custom filter.

class OllamaModelAuthProviderFilter implements ClientRequestFilter {
    OllamaModelAuthProviderFilter(ModelAuthProvider authorizer);
    void filter(ClientRequestContext context);
}

Usage:

import io.quarkiverse.langchain4j.ModelAuthProvider;

// Implement auth provider
public class CustomAuthProvider implements ModelAuthProvider {
    @Override
    public String getAuthorization(String modelName) {
        return "Bearer " + getApiToken();
    }
}

// Register filter (typically done via CDI)
OllamaModelAuthProviderFilter filter = new OllamaModelAuthProviderFilter(
    new CustomAuthProvider()
);

// Filter is automatically applied to requests

Reactive Streams Integration

Integrate with reactive streams for non-blocking processing.

import io.smallrye.mutiny.Multi;
import io.smallrye.mutiny.Uni;

OllamaClient client = new OllamaClient(
    "http://localhost:11434",
    Duration.ofSeconds(30),
    false, false, false,
    null, null
);

// Non-blocking request with Uni
Uni<ChatResponse> uniResponse = Uni.createFrom().item(() -> client.chat(request));

uniResponse
    .onItem().transform(response -> response.message().content())
    .subscribe().with(
        content -> System.out.println("Response: " + content),
        error -> System.err.println("Error: " + error)
    );

// Stream processing with Multi
Multi<ChatResponse> stream = client.streamingChat(request);

stream
    .select().where(chunk -> !chunk.done())
    .onItem().transform(chunk -> chunk.message().content())
    .collect().asList()
    .subscribe().with(
        chunks -> {
            String fullResponse = String.join("", chunks);
            System.out.println("Full response: " + fullResponse);
        }
    );

Client Configuration Best Practices

  1. Timeout tuning: Set timeout based on model size and expected response length

    • Small models: 10-15 seconds
    • Medium models: 20-30 seconds
    • Large models: 45-60 seconds
    • Streaming: Longer timeouts (60-120 seconds)
  2. Connection reuse: Create client once and reuse across requests

  3. Error handling: Always handle network errors, timeouts, and HTTP errors

  4. Logging: Enable logging in development, disable in production for performance

  5. TLS: Use HTTPS with proper certificate validation in production

  6. Streaming: Prefer streaming for long responses to provide immediate feedback

  7. Resource cleanup: Close streams properly to free resources

  8. Rate limiting: Implement client-side rate limiting for external Ollama servers

  9. Health checks: Verify Ollama availability before processing requests

  10. Monitoring: Track request latency, errors, and token usage
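Client-side rate limiting (item 8 above) can be sketched with a plain `java.util.concurrent.Semaphore` that caps the number of in-flight requests; SimpleRateLimiter and its methods are hypothetical names, not part of the extension:

```java
import java.util.concurrent.Semaphore;

// Hypothetical helper (not part of the extension): limits how many
// requests may be in flight against a shared Ollama server at once.
class SimpleRateLimiter {
    private final Semaphore permits;

    SimpleRateLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    /** Returns true if a request slot is available, without blocking. */
    boolean tryAcquire() {
        return permits.tryAcquire();
    }

    /** Releases a slot once the request completes (success or failure). */
    void release() {
        permits.release();
    }
}
```

Each `client.chat(...)` call would then be bracketed by `tryAcquire()` and a `finally { limiter.release(); }` so that failed requests also free their slot.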

Performance Optimization

// Use connection pooling (automatic with REST Client)
OllamaClient client = new OllamaClient(...);

// Batch embeddings for efficiency
List<String> texts = loadTexts(); // e.g., 1000 texts
int batchSize = 50;

for (int i = 0; i < texts.size(); i += batchSize) {
    List<String> batch = texts.subList(i, Math.min(i + batchSize, texts.size()));

    // Process batch
    for (String text : batch) {
        EmbeddingRequest req = EmbeddingRequest.builder()
            .model("nomic-embed-text")
            .input(text)
            .build();
        EmbeddingResponse resp = client.embedding(req);
        // Store embedding
    }
}

// Use streaming for immediate feedback
Multi<ChatResponse> stream = client.streamingChat(request);
stream.subscribe().with(
    chunk -> updateUI(chunk.message().content()),
    error -> showError(error)
);

Comparison: CDI vs Direct Client

Use CDI injection when:

  • Building Quarkus applications
  • Need automatic configuration management
  • Want declarative AI services
  • Require CDI features (interceptors, qualifiers, etc.)

Use direct client when:

  • Not using Quarkus
  • Need fine-grained control over requests
  • Building custom abstractions
  • Testing without CDI container
  • Programmatic configuration required

// CDI approach (recommended for Quarkus)
@Inject
ChatModel chatModel;
String response = chatModel.chat("Hello");

// Direct client approach
OllamaClient client = new OllamaClient(...);
ChatRequest request = ChatRequest.builder()...build();
ChatResponse response = client.chat(request);

Install with Tessl CLI

npx tessl i tessl/maven-io-quarkiverse-langchain4j--quarkus-langchain4j-ollama
