A Quarkus extension that integrates Hugging Face language models with Quarkus applications through LangChain4j.
The QuarkusHuggingFaceEmbeddingModel class provides text embedding capabilities using Hugging Face feature extraction models. It converts text into vector representations for semantic search, document retrieval, and retrieval-augmented generation (RAG) workflows. It implements the LangChain4j EmbeddingModel interface.
Main class for Hugging Face embedding model integration.
package io.quarkiverse.langchain4j.huggingface;
/**
* Quarkus-specific implementation of Hugging Face embedding model.
* Implements dev.langchain4j.model.embedding.EmbeddingModel interface.
*/
public class QuarkusHuggingFaceEmbeddingModel implements dev.langchain4j.model.embedding.EmbeddingModel {
/**
* Shared client factory instance for creating Hugging Face REST clients.
*/
public static final QuarkusHuggingFaceClientFactory CLIENT_FACTORY;
/**
* Creates a new builder for configuring the embedding model.
*
* @return A new Builder instance
*/
public static QuarkusHuggingFaceEmbeddingModel.Builder builder();
/**
* Converts a list of text segments into vector embeddings.
*
* @param textSegments List of text segments to embed
* @return Response containing list of embeddings (one per text segment)
*/
public dev.langchain4j.model.output.Response<java.util.List<dev.langchain4j.data.embedding.Embedding>> embedAll(
java.util.List<dev.langchain4j.data.segment.TextSegment> textSegments
);
/**
* Converts a single text segment into a vector embedding.
* Convenience method that calls embedAll() with a single-item list.
*
* @param textSegment Text segment to embed
* @return Response containing the embedding
*/
public dev.langchain4j.model.output.Response<dev.langchain4j.data.embedding.Embedding> embed(
dev.langchain4j.data.segment.TextSegment textSegment
);
/**
* Converts a plain text string into a vector embedding.
* Convenience method that wraps the text in a TextSegment.
*
* @param text Text to embed
* @return Response containing the embedding
*/
public dev.langchain4j.model.output.Response<dev.langchain4j.data.embedding.Embedding> embed(
String text
);
}

The Builder class provides a fluent API for configuring embedding model instances programmatically.
/**
* Builder for creating QuarkusHuggingFaceEmbeddingModel instances with custom configuration.
*/
public static class Builder {
/**
* Sets the Hugging Face API access token.
* Required when using Hugging Face hosted inference API.
*
* @param accessToken The Hugging Face API token (starts with "hf_")
* @return This builder instance
*/
public Builder accessToken(String accessToken);
/**
* Sets the inference endpoint URL.
* Can be Hugging Face Hub API, private endpoint, or local deployment.
*
* @param url The endpoint URL
* @return This builder instance
*/
public Builder url(java.net.URL url);
/**
* Sets the timeout duration for API calls.
*
* @param timeout Timeout duration (default: 15 seconds)
* @return This builder instance
*/
public Builder timeout(java.time.Duration timeout);
/**
* Sets whether to wait for the model to be ready.
* If true, the request waits for the model to load. If false, it may fail with a 503 error when the model is not yet loaded.
*
* @param waitForModel true to wait for model (default), false to fail fast
* @return This builder instance
*/
public Builder waitForModel(Boolean waitForModel);
/**
* Builds and returns the configured embedding model instance.
*
* @return Configured QuarkusHuggingFaceEmbeddingModel instance
* @throws IllegalArgumentException if required configuration is missing
*/
public QuarkusHuggingFaceEmbeddingModel build();
}

Basic configuration:

import io.quarkiverse.langchain4j.huggingface.QuarkusHuggingFaceEmbeddingModel;
import java.net.URL;
QuarkusHuggingFaceEmbeddingModel embeddingModel = QuarkusHuggingFaceEmbeddingModel.builder()
.accessToken("hf_your_token_here")
.url(new URL("https://api-inference.huggingface.co/pipeline/feature-extraction/sentence-transformers/all-MiniLM-L6-v2"))
.build();

Full configuration with timeout and model-loading behavior:

import io.quarkiverse.langchain4j.huggingface.QuarkusHuggingFaceEmbeddingModel;
import java.net.URL;
import java.time.Duration;
QuarkusHuggingFaceEmbeddingModel embeddingModel = QuarkusHuggingFaceEmbeddingModel.builder()
.accessToken("hf_your_token_here")
.url(new URL("https://api-inference.huggingface.co/pipeline/feature-extraction/sentence-transformers/all-MiniLM-L6-v2"))
.timeout(Duration.ofSeconds(30))
.waitForModel(true)
.build();

Embedding a single string:

import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.model.output.Response;
// Embed a simple string
Response<Embedding> response = embeddingModel.embed("This is a sample text to embed");
// Extract the embedding vector
Embedding embedding = response.content();
float[] vector = embedding.vector();
System.out.println("Embedding dimension: " + vector.length);

Embedding multiple text segments in one call:

import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.model.output.Response;
import java.util.List;
// Create text segments
List<TextSegment> segments = List.of(
TextSegment.from("First document about machine learning"),
TextSegment.from("Second document about natural language processing"),
TextSegment.from("Third document about computer vision")
);
// Generate embeddings for all segments
Response<List<Embedding>> response = embeddingModel.embedAll(segments);
// Extract embeddings
List<Embedding> embeddings = response.content();
for (int i = 0; i < embeddings.size(); i++) {
float[] vector = embeddings.get(i).vector();
System.out.println("Document " + (i + 1) + " embedding dimension: " + vector.length);
}

Alternative hosted models:

// Use multilingual model
QuarkusHuggingFaceEmbeddingModel multilingualModel = QuarkusHuggingFaceEmbeddingModel.builder()
.accessToken("hf_your_token_here")
.url(new URL("https://api-inference.huggingface.co/pipeline/feature-extraction/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"))
.build();
// Use model optimized for semantic search
QuarkusHuggingFaceEmbeddingModel semanticSearchModel = QuarkusHuggingFaceEmbeddingModel.builder()
.accessToken("hf_your_token_here")
.url(new URL("https://api-inference.huggingface.co/pipeline/feature-extraction/sentence-transformers/multi-qa-mpnet-base-dot-v1"))
.build();

Private and local endpoints:

// Use locally deployed embedding model
QuarkusHuggingFaceEmbeddingModel localModel = QuarkusHuggingFaceEmbeddingModel.builder()
.accessToken("dummy") // May not need real token for local deployment
.url(new URL("http://localhost:8085"))
.build();
// Use AWS-hosted Hugging Face endpoint
QuarkusHuggingFaceEmbeddingModel awsModel = QuarkusHuggingFaceEmbeddingModel.builder()
.accessToken("your_endpoint_token")
.url(new URL("https://your-endpoint.endpoints.huggingface.cloud"))
.build();

A minimal retrieval-augmented generation (RAG) flow:

import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.model.output.Response;
import java.util.List;
// 1. Embed your document corpus
List<TextSegment> documents = List.of(
TextSegment.from("The capital of France is Paris."),
TextSegment.from("Python is a programming language."),
TextSegment.from("Quantum computing uses qubits.")
);
Response<List<Embedding>> docEmbeddings = embeddingModel.embedAll(documents);
// 2. Embed the user query
String userQuery = "What is the capital of France?";
Response<Embedding> queryEmbedding = embeddingModel.embed(userQuery);
// 3. Find the most similar document using cosine similarity
// (CosineSimilarity.between is provided by langchain4j-core)
List<Embedding> docVectors = docEmbeddings.content();
int bestIndex = 0;
double bestScore = Double.NEGATIVE_INFINITY;
for (int i = 0; i < docVectors.size(); i++) {
double score = dev.langchain4j.store.embedding.CosineSimilarity.between(
queryEmbedding.content(), docVectors.get(i));
if (score > bestScore) {
bestScore = score;
bestIndex = i;
}
}
// 4. Use documents.get(bestIndex) as context for LLM generation

Pairing the model with an embedding store:

import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;
import dev.langchain4j.data.segment.TextSegment;
// Create embedding store
EmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();
// Add documents to store (embeddings generated automatically)
TextSegment doc1 = TextSegment.from("Machine learning is a subset of AI");
Embedding embedding1 = embeddingModel.embed(doc1).content();
embeddingStore.add(embedding1, doc1);
TextSegment doc2 = TextSegment.from("Deep learning uses neural networks");
Embedding embedding2 = embeddingModel.embed(doc2).content();
embeddingStore.add(embedding2, doc2);
// Search for similar documents
String query = "Tell me about neural networks";
Embedding queryEmbedding = embeddingModel.embed(query).content();
List<dev.langchain4j.store.embedding.EmbeddingMatch<TextSegment>> matches =
embeddingStore.findRelevant(queryEmbedding, 3);

When using declarative configuration, the following properties are available:
# API Key (required for Hugging Face Hub API)
quarkus.langchain4j.huggingface.api-key=hf_your_token_here
# Inference endpoint URL (default shown below)
quarkus.langchain4j.huggingface.embedding-model.inference-endpoint-url=https://api-inference.huggingface.co/pipeline/feature-extraction/sentence-transformers/all-MiniLM-L6-v2
# Timeout (default: 10s)
quarkus.langchain4j.huggingface.timeout=30s
# Wait for model to be ready (default: true)
quarkus.langchain4j.huggingface.embedding-model.wait-for-model=true

When using LangChain4j embedding capabilities in Quarkus, the configured embedding model is automatically used:
import io.quarkiverse.langchain4j.RegisterAiService;
import dev.langchain4j.service.MemoryId;
import dev.langchain4j.service.UserMessage;
@RegisterAiService
public interface ChatWithDocumentsService {
@UserMessage("Answer based on the documents: {question}")
String answerQuestion(@MemoryId String sessionId, String question);
}
// Quarkus automatically injects configured Hugging Face embedding model
// for document retrieval and RAG workflows

The default embedding model when using the Hugging Face Hub API:
sentence-transformers/all-MiniLM-L6-v2

The corresponding default endpoint URL:

https://api-inference.huggingface.co/pipeline/feature-extraction/sentence-transformers/all-MiniLM-L6-v2

Recommended models:

- sentence-transformers/all-MiniLM-L6-v2 (384 dim, fast, good quality)
- sentence-transformers/all-mpnet-base-v2 (768 dim, higher quality)
- sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 (384 dim)
- sentence-transformers/paraphrase-multilingual-mpnet-base-v2 (768 dim)
- sentence-transformers/multi-qa-mpnet-base-dot-v1 (768 dim)
- sentence-transformers/msmarco-distilbert-base-v4 (768 dim)
- sentence-transformers/paraphrase-MiniLM-L3-v2 (384 dim, optimized for short texts)

Common use cases:

- Semantic search: convert documents and queries to embeddings, then find the most similar documents using cosine similarity or dot product.
- Clustering: group similar documents together based on embedding similarity.
- Retrieval-augmented generation: retrieve relevant context documents for LLM prompts using embedding similarity.
- Deduplication: find duplicate or near-duplicate content by comparing embeddings.
- Question matching: match user questions with FAQ entries or knowledge base articles.
- Recommendations: recommend similar content based on embedding proximity.
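Several of these use cases reduce to comparing embedding vectors. As a reference point, cosine similarity and dot product can be computed directly on the float[] arrays returned by Embedding.vector(); the sketch below is JDK-only and independent of this extension (class and method names are illustrative):

```java
// JDK-only sketch: compares embedding vectors the way a similarity search would.
// Works on plain float[] arrays, e.g. the result of Embedding.vector().
public class VectorSimilarity {

    // Dot product: appropriate for models trained for it,
    // e.g. sentence-transformers/multi-qa-mpnet-base-dot-v1.
    static double dotProduct(float[] a, float[] b) {
        double dot = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
        }
        return dot;
    }

    // Cosine similarity: dot product divided by the product of the
    // two vectors' magnitudes; ranges from -1 to 1.
    static double cosineSimilarity(float[] a, float[] b) {
        double normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dotProduct(a, b) / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        float[] v1 = {1.0f, 0.0f, 1.0f};
        float[] v2 = {0.0f, 1.0f, 0.0f};
        System.out.println(cosineSimilarity(v1, v1)); // identical direction -> 1.0
        System.out.println(cosineSimilarity(v1, v2)); // orthogonal -> 0.0
    }
}
```

In production, an EmbeddingStore implementation performs this comparison for you; the manual version is mainly useful for understanding scores and for quick experiments.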
Install with Tessl CLI
npx tessl i tessl/maven-io-quarkiverse-langchain4j--quarkus-langchain4j-hugging-face