Quarkus extension for integrating local Ollama language models with LangChain4j
Embedding models generate vector representations of text for semantic search, similarity comparison, clustering, and Retrieval-Augmented Generation (RAG) applications.
Inject embedding models as CDI beans for automatic configuration and lifecycle management.
import jakarta.inject.Inject;
import jakarta.inject.Named;
import dev.langchain4j.model.embedding.EmbeddingModel;
// Default configuration
@Inject
EmbeddingModel embeddingModel;
// Named configuration
@Inject
@Named("custom-embeddings")
EmbeddingModel customEmbeddingModel;

Usage:
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.output.Response;
// Embed single text
Response<Embedding> response = embeddingModel.embed("Some text to embed");
Embedding embedding = response.content();
float[] vector = embedding.vector();
// Embed multiple texts
Response<List<Embedding>> batchResponse = embeddingModel.embedAll(List.of(
TextSegment.from("First document"),
TextSegment.from("Second document"),
TextSegment.from("Third document")
));
List<Embedding> embeddings = batchResponse.content();

Build embedding model instances programmatically for fine-grained control.
class OllamaEmbeddingModel implements EmbeddingModel {
static Builder builder();
Response<Embedding> embed(String text);
Response<Embedding> embed(TextSegment textSegment);
Response<List<Embedding>> embedAll(List<TextSegment> textSegments);
}
class OllamaEmbeddingModel.Builder {
Builder baseUrl(String val);
Builder tlsConfigurationName(String tlsConfigurationName);
Builder timeout(Duration val);
Builder model(String val);
Builder logRequests(boolean logRequests);
Builder logResponses(boolean logResponses);
Builder configName(String configName);
OllamaEmbeddingModel build();
}

Parameters:
baseUrl - Ollama server URL (default: "http://localhost:11434")
tlsConfigurationName - Named TLS configuration for HTTPS
timeout - Request timeout (default: 10 seconds)
model - Embedding model name (e.g., "nomic-embed-text", "mxbai-embed-large")
logRequests - Log request payloads for debugging
logResponses - Log response payloads for debugging
configName - Named configuration reference

Usage:
OllamaEmbeddingModel model = OllamaEmbeddingModel.builder()
.baseUrl("http://localhost:11434")
.model("nomic-embed-text")
.timeout(Duration.ofSeconds(15))
.logRequests(true)
.build();
Response<Embedding> response = model.embed("The quick brown fox");
float[] vector = response.content().vector();
System.out.println("Embedding dimensions: " + vector.length);

Use TextSegment for metadata-rich embeddings, particularly useful in RAG applications.
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.data.document.Metadata;
// TextSegment with metadata
TextSegment segment = TextSegment.from("Document content",
Metadata.from(Map.of(
"source", "document.pdf",
"page", 5,
"title", "Introduction"
)));
Response<Embedding> response = embeddingModel.embed(segment);

Batch embedding:
List<TextSegment> segments = List.of(
TextSegment.from("First paragraph", Metadata.from("source", "doc1.txt")),
TextSegment.from("Second paragraph", Metadata.from("source", "doc2.txt")),
TextSegment.from("Third paragraph", Metadata.from("source", "doc3.txt"))
);
Response<List<Embedding>> response = embeddingModel.embedAll(segments);
List<Embedding> embeddings = response.content();

Combine with embedding stores for persistent vector search.
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.EmbeddingMatch;
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;
@Inject
EmbeddingModel embeddingModel;
// Create embedding store
EmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();
// Add documents
List<TextSegment> documents = List.of(
TextSegment.from("Quarkus is a Kubernetes-native Java framework."),
TextSegment.from("Ollama runs large language models locally."),
TextSegment.from("LangChain4j is a Java library for LLM integration.")
);
Response<List<Embedding>> embeddings = embeddingModel.embedAll(documents);
for (int i = 0; i < documents.size(); i++) {
embeddingStore.add(embeddings.content().get(i), documents.get(i));
}
// Search
Embedding queryEmbedding = embeddingModel.embed("What is Quarkus?").content();
EmbeddingSearchRequest searchRequest = EmbeddingSearchRequest.builder()
.queryEmbedding(queryEmbedding)
.maxResults(3)
.build();
List<EmbeddingMatch<TextSegment>> matches = embeddingStore.search(searchRequest).matches();
for (EmbeddingMatch<TextSegment> match : matches) {
System.out.println("Score: " + match.score());
System.out.println("Text: " + match.embedded().text());
}

Use embeddings for context retrieval in RAG applications.
// 1. Embed and store knowledge base
List<TextSegment> knowledgeBase = loadDocuments();
Response<List<Embedding>> embeddings = embeddingModel.embedAll(knowledgeBase);
embeddingStore.addAll(embeddings.content(), knowledgeBase);
// 2. Retrieve relevant context for query
String userQuestion = "How do I configure Quarkus?";
Embedding questionEmbedding = embeddingModel.embed(userQuestion).content();
EmbeddingSearchRequest searchRequest = EmbeddingSearchRequest.builder()
.queryEmbedding(questionEmbedding)
.maxResults(5)
.build();
List<EmbeddingMatch<TextSegment>> relevant = embeddingStore.search(searchRequest).matches();
// 3. Build context from retrieved documents
String context = relevant.stream()
.map(match -> match.embedded().text())
.collect(Collectors.joining("\n\n"));
// 4. Generate answer with context
String prompt = String.format("""
Context: %s
Question: %s
Answer based on the context above:
""", context, userQuestion);
String answer = chatModel.chat(prompt);

Compute semantic similarity between texts.
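Cosine similarity is the dot product of the two vectors divided by the product of their magnitudes. As a sketch of what the library computes, here is a dependency-free version over raw float[] vectors (the cosine helper below is illustrative, not part of the LangChain4j API):

```java
// Cosine similarity between two raw embedding vectors:
// dot(a, b) / (|a| * |b|), in the range [-1, 1].
static double cosine(float[] a, float[] b) {
    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (int i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Orthogonal vectors score 0, parallel vectors score 1:
// cosine(new float[]{1, 0}, new float[]{0, 1}) -> 0.0
// cosine(new float[]{1, 2}, new float[]{2, 4}) -> 1.0
```

In practice you would call the library utility shown in this section rather than hand-rolling this, but the arithmetic is the same.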
import dev.langchain4j.store.embedding.CosineSimilarity;
// Embed two texts
Embedding embedding1 = embeddingModel.embed("I love programming").content();
Embedding embedding2 = embeddingModel.embed("I enjoy coding").content();
Embedding embedding3 = embeddingModel.embed("I like pizza").content();
// Compute cosine similarity (range: -1 to 1, higher = more similar)
double similarity12 = CosineSimilarity.between(embedding1, embedding2); // ~0.85
double similarity13 = CosineSimilarity.between(embedding1, embedding3); // ~0.45
System.out.println("'programming' vs 'coding': " + similarity12);
System.out.println("'programming' vs 'pizza': " + similarity13);

Use embeddings for document clustering and classification.
// Embed document collection
List<String> documents = List.of(
"The stock market rose today",
"Sports team won championship",
"New AI model released",
"Economy shows growth",
"Player scored winning goal",
"Machine learning breakthrough"
);
List<Embedding> embeddings = documents.stream()
.map(doc -> embeddingModel.embed(doc).content())
.collect(Collectors.toList());
// Cluster by similarity (simplified example)
// In practice, use k-means or other clustering algorithms
for (int i = 0; i < embeddings.size(); i++) {
for (int j = i + 1; j < embeddings.size(); j++) {
double similarity = CosineSimilarity.between(embeddings.get(i), embeddings.get(j));
if (similarity > 0.7) {
System.out.println("Similar: " + documents.get(i) + " <-> " + documents.get(j));
}
}
}

Low-level embedding request/response objects for direct API access.
class EmbeddingRequest {
static Builder builder();
String getModel();
String getInput();
}
class EmbeddingRequest.Builder {
Builder model(String val);
Builder input(String val);
EmbeddingRequest build();
}
class EmbeddingResponse {
float[][] getEmbeddings();
void setEmbeddings(float[][] embeddings);
static Builder builder();
}
class EmbeddingResponse.Builder {
Builder embeddings(float[][] val);
EmbeddingResponse build();
}

Usage with OllamaClient:
EmbeddingRequest request = EmbeddingRequest.builder()
.model("nomic-embed-text")
.input("Text to embed")
.build();
EmbeddingResponse response = ollamaClient.embedding(request);
float[][] embeddings = response.getEmbeddings();
float[] vector = embeddings[0]; // First (and only) embedding

Common Ollama embedding models:
| Model | Dimensions | Description | Use Case |
|---|---|---|---|
| nomic-embed-text | 768 | General purpose, efficient | General text embedding, RAG |
| mxbai-embed-large | 1024 | High quality, larger | Semantic search, classification |
| all-minilm | 384 | Fast, compact | Real-time search, large datasets |
| snowflake-arctic-embed | 1024 | High performance | Enterprise applications |
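A vector store schema is usually created with a fixed dimension, and it must match the chosen model's output exactly. A small guard against model/schema mismatches, using the dimensions from the table above (the EXPECTED_DIMS map and checkDimension helper are illustrative, not part of the extension):

```java
import java.util.Map;

// Expected output dimensions per model (values from the table above)
static final Map<String, Integer> EXPECTED_DIMS = Map.of(
    "nomic-embed-text", 768,
    "mxbai-embed-large", 1024,
    "all-minilm", 384,
    "snowflake-arctic-embed", 1024
);

// Fail fast if an embedding does not match the dimension the store expects
static void checkDimension(String model, float[] vector) {
    Integer expected = EXPECTED_DIMS.get(model);
    if (expected != null && expected != vector.length) {
        throw new IllegalStateException("Model " + model + " should produce "
                + expected + "-dimensional vectors, got " + vector.length);
    }
}
```

Calling such a check once at startup, with a single test embedding, surfaces a misconfigured model before documents are ingested.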
Install with Ollama:
ollama pull nomic-embed-text
ollama pull mxbai-embed-large

See Configuration for complete embedding model configuration options including:
Model selection (model-id)
Connection settings (base-url, timeout)

Example configuration:
# Default embedding model
quarkus.langchain4j.ollama.embedding-model.model-id=nomic-embed-text
# Named configuration for different model
quarkus.langchain4j.ollama.large-embeddings.embedding-model.model-id=mxbai-embed-large
quarkus.langchain4j.ollama.large-embeddings.timeout=20s

Use embedAll() for multiple texts to reduce network overhead.

Batch size optimization:
// Process large document collections in batches
List<TextSegment> allDocuments = loadLargeDataset(); // e.g., 10,000 documents
int batchSize = 100;
for (int i = 0; i < allDocuments.size(); i += batchSize) {
List<TextSegment> batch = allDocuments.subList(
i,
Math.min(i + batchSize, allDocuments.size())
);
Response<List<Embedding>> embeddings = embeddingModel.embedAll(batch);
embeddingStore.addAll(embeddings.content(), batch); // Store embeddings
}

Embedding operations may throw exceptions:
RuntimeException - Connection failures, timeouts, model errors

try {
Response<Embedding> response = embeddingModel.embed("text");
float[] vector = response.content().vector();
} catch (Exception e) {
logger.error("Embedding failed", e);
// Fallback: use cached embedding or retry
}

Install with Tessl CLI
npx tessl i tessl/maven-io-quarkiverse-langchain4j--quarkus-langchain4j-ollama@1.7.0