Quarkus extension for integrating local Ollama language models with LangChain4j
Low-level HTTP client for direct Ollama API access when CDI injection is not available or when fine-grained control is needed.
Direct HTTP client for synchronous and streaming communication with Ollama server.
class OllamaClient {
OllamaClient(
String baseUrl,
Duration timeout,
boolean logRequests,
boolean logResponses,
boolean logCurl,
String configName,
String tlsConfigurationName
);
ChatResponse chat(ChatRequest request);
Multi<ChatResponse> streamingChat(ChatRequest request);
EmbeddingResponse embedding(EmbeddingRequest request);
}

Constructor Parameters:
baseUrl - Ollama server URL (e.g., "http://localhost:11434")
timeout - Request timeout duration
logRequests - Enable request payload logging
logResponses - Enable response payload logging
logCurl - Enable cURL command logging for debugging
configName - Named configuration reference (can be null)
tlsConfigurationName - Named TLS configuration for HTTPS (can be null)

Methods:
chat(ChatRequest) - Synchronous chat request, blocks until complete
streamingChat(ChatRequest) - Streaming chat request, returns a reactive stream
embedding(EmbeddingRequest) - Embedding request, returns a vector representation

Usage:
import io.quarkiverse.langchain4j.ollama.*;
import java.time.Duration;
// Create client
OllamaClient client = new OllamaClient(
"http://localhost:11434",
Duration.ofSeconds(30),
true, // log requests
true, // log responses
false, // log cURL
null, // no named config
null // no TLS config
);
// Synchronous chat
ChatRequest chatRequest = ChatRequest.builder()
.model("llama3.2")
.messages(List.of(
Message.builder()
.role(Role.USER)
.content("Hello!")
.build()
))
.build();
ChatResponse response = client.chat(chatRequest);
System.out.println(response.message().content());
// Streaming chat
Multi<ChatResponse> stream = client.streamingChat(chatRequest);
stream.subscribe().with(
chunk -> System.out.print(chunk.message().content()),
error -> System.err.println("Error: " + error),
() -> System.out.println("\n[Complete]")
);
// Embeddings
EmbeddingRequest embeddingRequest = EmbeddingRequest.builder()
.model("nomic-embed-text")
.input("Text to embed")
.build();
EmbeddingResponse embeddingResponse = client.embedding(embeddingRequest);
float[] vector = embeddingResponse.getEmbeddings()[0];

MicroProfile REST Client interface for Ollama API endpoints.
interface OllamaRestApi {
@POST
@Path("/api/chat")
ChatResponse chat(ChatRequest request);
@POST
@Path("/api/chat")
@RestStreamElementType(MediaType.APPLICATION_JSON)
Multi<ChatResponse> streamingChat(ChatRequest request);
@POST
@Path("/api/embed")
EmbeddingResponse embeddings(EmbeddingRequest request);
static ObjectMapper objectMapper(ObjectMapper defaultObjectMapper);
}

Nested Classes:
OllamaRestApiReaderInterceptor - Handles incomplete JSON chunks in streaming responses
OpenAiRestApiWriterInterceptor - Automatically sets the stream parameter in requests
OllamaLogger - Custom logger for requests/responses with cURL support

This interface is typically used internally by OllamaClient, but can be used directly for advanced scenarios with the MicroProfile REST Client.
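The reader interceptor's core task (reassembling newline-delimited JSON from partial network chunks) can be illustrated with a small stdlib-only sketch. The NdjsonBuffer class below is hypothetical, not the actual interceptor implementation:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the buffering idea: Ollama streams
// newline-delimited JSON, and a chunk read from the wire may end
// mid-object, so partial lines are held back until a newline arrives.
class NdjsonBuffer {
    private final StringBuilder partial = new StringBuilder();

    // Feed a raw chunk; return only the complete JSON lines it yields.
    List<String> feed(String chunk) {
        List<String> complete = new ArrayList<>();
        partial.append(chunk);
        int newline;
        while ((newline = partial.indexOf("\n")) >= 0) {
            String line = partial.substring(0, newline).trim();
            if (!line.isEmpty()) {
                complete.add(line);
            }
            partial.delete(0, newline + 1);
        }
        return complete;
    }
}
```

Each complete line can then be deserialized into a ChatResponse chunk.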
Usage:
import org.eclipse.microprofile.rest.client.RestClientBuilder;
import java.net.URI;
// Build REST client
OllamaRestApi api = RestClientBuilder.newBuilder()
.baseUri(URI.create("http://localhost:11434"))
.build(OllamaRestApi.class);
// Make requests
ChatResponse response = api.chat(request);
Multi<ChatResponse> stream = api.streamingChat(request);
EmbeddingResponse embeddings = api.embeddings(embeddingRequest);

Make blocking chat requests that wait for the complete response.
OllamaClient client = new OllamaClient(
"http://localhost:11434",
Duration.ofSeconds(30),
false, false, false,
null, null
);
List<Message> messages = List.of(
Message.builder()
.role(Role.SYSTEM)
.content("You are a helpful assistant.")
.build(),
Message.builder()
.role(Role.USER)
.content("Explain quantum computing in simple terms.")
.build()
);
ChatRequest request = ChatRequest.builder()
.model("llama3.2")
.messages(messages)
.options(Options.builder()
.temperature(0.7)
.numPredict(500)
.build())
.build();
try {
ChatResponse response = client.chat(request);
System.out.println("Response: " + response.message().content());
System.out.println("Tokens: " + response.evalCount());
} catch (Exception e) {
System.err.println("Request failed: " + e.getMessage());
}

Process responses incrementally as they are generated.
import io.smallrye.mutiny.Multi;
OllamaClient client = new OllamaClient(
"http://localhost:11434",
Duration.ofSeconds(60),
false, false, false,
null, null
);
ChatRequest request = ChatRequest.builder()
.model("llama3.2")
.messages(messages)
.stream(true) // Explicitly enable streaming
.build();
Multi<ChatResponse> stream = client.streamingChat(request);
StringBuilder fullResponse = new StringBuilder();
stream.subscribe().with(
chunk -> {
if (!chunk.done()) {
String content = chunk.message().content();
if (content != null) {
fullResponse.append(content);
System.out.print(content);
}
} else {
// Final chunk with metadata
System.out.println("\n\n--- Metadata ---");
System.out.println("Model: " + chunk.model());
System.out.println("Input tokens: " + chunk.promptEvalCount());
System.out.println("Output tokens: " + chunk.evalCount());
System.out.println("Full response: " + fullResponse.toString());
}
},
error -> {
System.err.println("Streaming error: " + error.getMessage());
error.printStackTrace();
},
() -> System.out.println("Stream complete")
);

Process multiple requests efficiently.
OllamaClient client = new OllamaClient(
"http://localhost:11434",
Duration.ofSeconds(20),
false, false, false,
null, null
);
List<String> inputs = List.of(
"Summarize: The quick brown fox jumps over the lazy dog.",
"Translate to French: Hello, how are you?",
"What is 15 * 23?"
);
List<ChatResponse> responses = inputs.stream()
.map(input -> {
ChatRequest request = ChatRequest.builder()
.model("llama3.2")
.messages(List.of(
Message.builder()
.role(Role.USER)
.content(input)
.build()
))
.build();
try {
return client.chat(request);
} catch (Exception e) {
System.err.println("Request failed for input: " + input + " - " + e.getMessage());
return null;
}
})
.filter(Objects::nonNull)
.collect(Collectors.toList());
responses.forEach(response ->
System.out.println(response.message().content())
);

Connect to Ollama server over HTTPS with TLS configuration.
Configuration:
# Define TLS configuration
quarkus.tls.ollama-tls.trust-store.pem.certs=server-cert.pem
quarkus.tls.ollama-tls.key-store.pem.keys=client-key.pem
quarkus.tls.ollama-tls.key-store.pem.certs=client-cert.pem

Usage:
OllamaClient client = new OllamaClient(
"https://ollama.example.com",
Duration.ofSeconds(30),
false, false, false,
null,
"ollama-tls" // TLS configuration name
);
ChatResponse response = client.chat(request);

Enable detailed logging for debugging.
// Enable all logging
OllamaClient client = new OllamaClient(
"http://localhost:11434",
Duration.ofSeconds(30),
true, // log requests
true, // log responses
true, // log cURL commands
null,
null
);
// Make request - logs will be printed
ChatResponse response = client.chat(request);

Example log output:
--- Request ---
POST http://localhost:11434/api/chat
{
"model": "llama3.2",
"messages": [{"role": "user", "content": "Hello"}],
"stream": false
}
--- cURL equivalent ---
curl -X POST http://localhost:11434/api/chat \
-H 'Content-Type: application/json' \
-d '{"model":"llama3.2","messages":[{"role":"user","content":"Hello"}],"stream":false}'
--- Response ---
{
"model": "llama3.2",
"created_at": "2026-02-25T08:00:00Z",
"message": {"role": "assistant", "content": "Hello! How can I help you?"},
"done": true,
"eval_count": 12
}

Handle various error scenarios gracefully.
OllamaClient client = new OllamaClient(
"http://localhost:11434",
Duration.ofSeconds(10),
false, false, false,
null, null
);
try {
ChatResponse response = client.chat(request);
// Process response
} catch (jakarta.ws.rs.ProcessingException e) {
// Connection error (server not running, network issues)
System.err.println("Cannot connect to Ollama server: " + e.getMessage());
} catch (java.util.concurrent.TimeoutException e) {
// Request timeout
System.err.println("Request timed out");
} catch (jakarta.ws.rs.WebApplicationException e) {
// HTTP error (404, 500, etc.)
int status = e.getResponse().getStatus();
System.err.println("Server error: " + status);
if (status == 404) {
System.err.println("Model not found - ensure it's downloaded with 'ollama pull'");
}
} catch (Exception e) {
// Other errors
System.err.println("Unexpected error: " + e.getMessage());
e.printStackTrace();
}

For high-throughput applications, reuse client instances.
// Create once, reuse across requests
@ApplicationScoped
public class OllamaClientProvider {
private final OllamaClient client;
public OllamaClientProvider() {
this.client = new OllamaClient(
"http://localhost:11434",
Duration.ofSeconds(30),
false, false, false,
null, null
);
}
public OllamaClient getClient() {
return client;
}
}
// Inject and use
@Inject
OllamaClientProvider clientProvider;
ChatResponse response = clientProvider.getClient().chat(request);

Add authentication headers using a custom filter.
class OllamaModelAuthProviderFilter implements ClientRequestFilter {
OllamaModelAuthProviderFilter(ModelAuthProvider authorizer);
void filter(ClientRequestContext context);
}

Usage:
import io.quarkiverse.langchain4j.ModelAuthProvider;
// Implement auth provider
public class CustomAuthProvider implements ModelAuthProvider {
@Override
public String getAuthorization(String modelName) {
return "Bearer " + getApiToken();
}
}
// Register filter (typically done via CDI)
OllamaModelAuthProviderFilter filter = new OllamaModelAuthProviderFilter(
new CustomAuthProvider()
);
// Filter is automatically applied to requests

Integrate with reactive streams for non-blocking processing.
import io.smallrye.mutiny.Multi;
import io.smallrye.mutiny.Uni;
import io.smallrye.mutiny.infrastructure.Infrastructure;
OllamaClient client = new OllamaClient(
"http://localhost:11434",
Duration.ofSeconds(30),
false, false, false,
null, null
);
// Non-blocking request with Uni: run the blocking chat() call on a worker pool
// so subscription does not block the caller thread
Uni<ChatResponse> uniResponse = Uni.createFrom()
.item(() -> client.chat(request))
.runSubscriptionOn(Infrastructure.getDefaultWorkerPool());
uniResponse
.onItem().transform(response -> response.message().content())
.subscribe().with(
content -> System.out.println("Response: " + content),
error -> System.err.println("Error: " + error)
);
// Stream processing with Multi
Multi<ChatResponse> stream = client.streamingChat(request);
stream
.select().where(chunk -> !chunk.done())
.onItem().transform(chunk -> chunk.message().content())
.collect().asList()
.subscribe().with(
chunks -> {
String fullResponse = String.join("", chunks);
System.out.println("Full response: " + fullResponse);
}
);

Timeout tuning: Set timeout based on model size and expected response length
Connection reuse: Create client once and reuse across requests
Error handling: Always handle network errors, timeouts, and HTTP errors
Logging: Enable logging in development, disable in production for performance
TLS: Use HTTPS with proper certificate validation in production
Streaming: Prefer streaming for long responses to provide immediate feedback
Resource cleanup: Close streams properly to free resources
Rate limiting: Implement client-side rate limiting for external Ollama servers
Health checks: Verify Ollama availability before processing requests
Monitoring: Track request latency, errors, and token usage
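The rate-limiting practice above has no example in this guide. As a sketch, a minimal fixed-window limiter (a hypothetical helper, not part of the extension) can gate calls to a shared Ollama server:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical client-side rate limiter: at most maxPerWindow requests
// per fixed time window. Callers invoke acquire() before each request.
class SimpleRateLimiter {
    private final int maxPerWindow;
    private final long windowNanos;
    private long windowStart = System.nanoTime();
    private int used = 0;

    SimpleRateLimiter(int maxPerWindow, long windowMillis) {
        this.maxPerWindow = maxPerWindow;
        this.windowNanos = TimeUnit.MILLISECONDS.toNanos(windowMillis);
    }

    // Blocks until a permit is available in the current window.
    synchronized void acquire() {
        while (true) {
            long now = System.nanoTime();
            if (now - windowStart >= windowNanos) {
                windowStart = now; // roll over to a fresh window
                used = 0;
            }
            if (used < maxPerWindow) {
                used++;
                return;
            }
            long sleepMillis = TimeUnit.NANOSECONDS.toMillis(
                    windowNanos - (now - windowStart)) + 1;
            try {
                wait(sleepMillis); // wait until the window can roll over
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return; // give up waiting if interrupted
            }
        }
    }
}
```

Call acquire() before each client.chat(...) or client.embedding(...) when several workers share one server.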
// Use connection pooling (automatic with REST Client)
OllamaClient client = new OllamaClient(...);
// Batch embeddings for efficiency
List<String> texts = loadTexts(); // e.g., 1000 texts
int batchSize = 50;
for (int i = 0; i < texts.size(); i += batchSize) {
List<String> batch = texts.subList(i, Math.min(i + batchSize, texts.size()));
// Process batch
for (String text : batch) {
EmbeddingRequest req = EmbeddingRequest.builder()
.model("nomic-embed-text")
.input(text)
.build();
EmbeddingResponse resp = client.embedding(req);
// Store embedding
}
}
// Use streaming for immediate feedback
Multi<ChatResponse> stream = client.streamingChat(request);
stream.subscribe().with(
chunk -> updateUI(chunk.message().content()),
error -> showError(error)
);

Use CDI injection when: you are inside a Quarkus application and the extension's managed beans (such as ChatModel) cover your needs.
Use direct client when: CDI injection is not available or fine-grained control over requests is needed.
// CDI approach (recommended for Quarkus)
@Inject
ChatModel chatModel;
String response = chatModel.chat("Hello");
// Direct client approach
OllamaClient client = new OllamaClient(...);
ChatRequest request = ChatRequest.builder()...build();
ChatResponse response = client.chat(request);

Install with Tessl CLI
npx tessl i tessl/maven-io-quarkiverse-langchain4j--quarkus-langchain4j-ollama