A Quarkus extension that integrates Hugging Face language models into Quarkus applications through LangChain4j.
The Quarkus LangChain4j Hugging Face extension provides a comprehensive configuration system supporting both declarative configuration via application.properties and programmatic configuration via builder patterns. The extension supports default configurations and named configurations for managing multiple model instances.
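Every property in this reference can also be supplied as an environment variable (examples appear later in this document). The name mapping follows the standard MicroProfile Config rule: replace each non-alphanumeric character with an underscore and upper-case the result. A small sketch of that conversion, with a hypothetical helper name chosen for illustration:

```java
import java.util.Locale;

public class EnvVarNameSketch {

    // MicroProfile Config rule: every non-alphanumeric character becomes '_',
    // then the whole name is upper-cased. (Hypothetical helper, for illustration.)
    static String toEnvVar(String propertyName) {
        return propertyName.replaceAll("[^A-Za-z0-9]", "_").toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Prints QUARKUS_LANGCHAIN4J_HUGGINGFACE_CHAT_MODEL_TEMPERATURE
        System.out.println(toEnvVar("quarkus.langchain4j.huggingface.chat-model.temperature"));
    }
}
```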
Root configuration interface for the Hugging Face extension.
package io.quarkiverse.langchain4j.huggingface.runtime.config;
/**
* Root configuration for Hugging Face integration.
* Configured via quarkus.langchain4j.huggingface.* properties.
*/
@io.smallrye.config.ConfigMapping(prefix = "quarkus.langchain4j.huggingface")
@io.quarkus.runtime.annotations.ConfigRoot(phase = io.quarkus.runtime.annotations.ConfigPhase.RUN_TIME)
public interface LangChain4jHuggingFaceConfig {
/**
* Returns the default Hugging Face configuration.
* Configured via quarkus.langchain4j.huggingface.* (without a name).
*
* @return Default configuration
*/
@io.smallrye.config.WithParentName
HuggingFaceConfig defaultConfig();
/**
* Returns a map of named Hugging Face configurations.
* Allows multiple model configurations with different settings.
* Configured via quarkus.langchain4j.huggingface.{name}.* properties.
*
* @return Map of named configurations
*/
@io.smallrye.config.WithDefaults
@io.smallrye.config.WithParentName
java.util.Map<String, HuggingFaceConfig> namedConfig();
}

Configuration group for Hugging Face model settings.
/**
* Configuration for a specific Hugging Face model instance.
* Can be default or named configuration.
*/
@io.smallrye.config.ConfigGroup
public interface HuggingFaceConfig {
/**
* Hugging Face API key for authentication.
* Required when using Hugging Face Hub hosted inference API.
* Can also be set via QUARKUS_LANGCHAIN4J_HUGGINGFACE_API_KEY environment variable.
*
* @return API key (default: "dummy")
*/
@io.smallrye.config.WithDefault("dummy")
String apiKey();
/**
* Timeout for Hugging Face API calls.
* Falls back to quarkus.langchain4j.timeout if not specified.
*
* @return Timeout duration (default: 10s via global config)
*/
@io.smallrye.config.WithDefault("${quarkus.langchain4j.timeout}")
java.util.Optional<java.time.Duration> timeout();
/**
* Chat model specific configuration.
*
* @return Chat model configuration
*/
ChatModelConfig chatModel();
/**
* Embedding model specific configuration.
*
* @return Embedding model configuration
*/
EmbeddingModelConfig embeddingModel();
/**
* Whether to log requests to Hugging Face API.
* API keys are automatically masked in logs.
* Falls back to quarkus.langchain4j.log-requests if not specified.
*
* @return true to log requests (default: false via global config)
*/
@io.smallrye.config.WithDefault("${quarkus.langchain4j.log-requests}")
java.util.Optional<Boolean> logRequests();
/**
* Whether to log responses from Hugging Face API.
* Falls back to quarkus.langchain4j.log-responses if not specified.
*
* @return true to log responses (default: false via global config)
*/
@io.smallrye.config.WithDefault("${quarkus.langchain4j.log-responses}")
java.util.Optional<Boolean> logResponses();
/**
* Whether to enable the Hugging Face integration.
* When false, returns a disabled model that throws exceptions on use.
*
* @return true to enable integration (default: true)
*/
@io.smallrye.config.WithDefault("true")
Boolean enableIntegration();
}

Configuration for Hugging Face chat models.
package io.quarkiverse.langchain4j.huggingface.runtime.config;
/**
* Configuration for Hugging Face chat model settings.
*/
@io.smallrye.config.ConfigGroup
public interface ChatModelConfig {
/**
* Default Hugging Face inference endpoint for chat models.
*/
String DEFAULT_INFERENCE_ENDPOINT = "https://api-inference.huggingface.co/models/tiiuae/falcon-7b-instruct";
/**
* Inference endpoint URL for the chat model.
* Can be Hugging Face Hub API, private endpoint, or local deployment.
*
* @return Endpoint URL (default: falcon-7b-instruct)
*/
@io.smallrye.config.WithDefault(DEFAULT_INFERENCE_ENDPOINT)
java.net.URL inferenceEndpointUrl();
/**
* Sampling temperature for text generation.
* Controls randomness. Higher values make output more random.
* Falls back to quarkus.langchain4j.temperature if not specified.
*
* @return Temperature (default: 1.0 via global config)
*/
@io.smallrye.config.WithDefault("${quarkus.langchain4j.temperature:1.0}")
Double temperature();
/**
* Maximum number of new tokens to generate.
* Model-dependent, typically 0-250 for most models.
*
* @return Maximum new tokens (no default)
*/
java.util.Optional<Integer> maxNewTokens();
/**
* Whether to return the full text including the input prompt.
*
* @return true to include input in response (default: false)
*/
@io.smallrye.config.WithDefault("false")
Boolean returnFullText();
/**
* Whether to wait for the model to be ready.
* If true, waits for model loading. If false, may receive 503 error.
*
* @return true to wait for model (default: true)
*/
@io.smallrye.config.WithDefault("true")
Boolean waitForModel();
/**
* Whether to use sampling or greedy decoding.
* When true, uses probabilistic sampling. When false, uses greedy.
*
* @return Sampling strategy (no default)
*/
java.util.Optional<Boolean> doSample();
/**
* Top-K filtering value.
* Limits sampling to the K most likely next tokens.
*
* @return Top-K value (no default)
*/
java.util.OptionalInt topK();
/**
* Top-P (nucleus sampling) value.
* Limits sampling to tokens whose cumulative probability is below P.
*
* @return Top-P value between 0.0 and 1.0 (no default)
*/
java.util.OptionalDouble topP();
/**
* Repetition penalty.
* Values > 1.0 penalize repetition, 1.0 is neutral, < 1.0 encourages it.
*
* @return Repetition penalty (no default, 1.0 is neutral)
*/
java.util.OptionalDouble repetitionPenalty();
/**
* Whether to log requests for this specific chat model.
* Overrides parent configuration if specified.
*
* @return true to log requests (falls back to parent config)
*/
java.util.Optional<Boolean> logRequests();
/**
* Whether to log responses for this specific chat model.
* Overrides parent configuration if specified.
*
* @return true to log responses (falls back to parent config)
*/
java.util.Optional<Boolean> logResponses();
}

Configuration for Hugging Face embedding models.
package io.quarkiverse.langchain4j.huggingface.runtime.config;
/**
* Configuration for Hugging Face embedding model settings.
*/
@io.smallrye.config.ConfigGroup
public interface EmbeddingModelConfig {
/**
* Default Hugging Face inference endpoint for embedding models.
*/
String DEFAULT_INFERENCE_ENDPOINT_EMBEDDING = "https://api-inference.huggingface.co/pipeline/feature-extraction/sentence-transformers/all-MiniLM-L6-v2";
/**
* Inference endpoint URL for the embedding model.
* Can be Hugging Face Hub API, private endpoint, or local deployment.
*
* @return Endpoint URL (default: all-MiniLM-L6-v2)
*/
@io.smallrye.config.WithDefault(DEFAULT_INFERENCE_ENDPOINT_EMBEDDING)
java.net.URL inferenceEndpointUrl();
/**
* Whether to wait for the model to be ready.
* If true, waits for model loading. If false, may receive 503 error.
*
* @return true to wait for model (default: true)
*/
@io.smallrye.config.WithDefault("true")
Boolean waitForModel();
}

Minimal configuration in application.properties:
# Required: Hugging Face API key
quarkus.langchain4j.huggingface.api-key=hf_your_token_here

Full configuration with all options in application.properties:
# API Key
quarkus.langchain4j.huggingface.api-key=hf_your_token_here
# Global timeout (default: 10s)
quarkus.langchain4j.huggingface.timeout=30s
# Global logging
quarkus.langchain4j.huggingface.log-requests=true
quarkus.langchain4j.huggingface.log-responses=true
# Enable/disable integration
quarkus.langchain4j.huggingface.enable-integration=true
# Chat model configuration
quarkus.langchain4j.huggingface.chat-model.inference-endpoint-url=https://api-inference.huggingface.co/models/tiiuae/falcon-7b-instruct
quarkus.langchain4j.huggingface.chat-model.temperature=0.7
quarkus.langchain4j.huggingface.chat-model.max-new-tokens=150
quarkus.langchain4j.huggingface.chat-model.return-full-text=false
quarkus.langchain4j.huggingface.chat-model.wait-for-model=true
quarkus.langchain4j.huggingface.chat-model.do-sample=true
quarkus.langchain4j.huggingface.chat-model.top-k=50
quarkus.langchain4j.huggingface.chat-model.top-p=0.95
quarkus.langchain4j.huggingface.chat-model.repetition-penalty=1.1
quarkus.langchain4j.huggingface.chat-model.log-requests=true
quarkus.langchain4j.huggingface.chat-model.log-responses=true
# Embedding model configuration
quarkus.langchain4j.huggingface.embedding-model.inference-endpoint-url=https://api-inference.huggingface.co/pipeline/feature-extraction/sentence-transformers/all-MiniLM-L6-v2
quarkus.langchain4j.huggingface.embedding-model.wait-for-model=true

Configure multiple Hugging Face model instances with different settings:
# Default configuration
quarkus.langchain4j.huggingface.api-key=hf_default_token
# Named configuration: "creative"
quarkus.langchain4j.huggingface.creative.api-key=hf_creative_token
quarkus.langchain4j.huggingface.creative.chat-model.inference-endpoint-url=https://api-inference.huggingface.co/models/tiiuae/falcon-7b-instruct
quarkus.langchain4j.huggingface.creative.chat-model.temperature=1.2
quarkus.langchain4j.huggingface.creative.chat-model.top-p=0.95
# Named configuration: "precise"
quarkus.langchain4j.huggingface.precise.api-key=hf_precise_token
quarkus.langchain4j.huggingface.precise.chat-model.inference-endpoint-url=https://api-inference.huggingface.co/models/google/flan-t5-small
quarkus.langchain4j.huggingface.precise.chat-model.temperature=0.3
quarkus.langchain4j.huggingface.precise.chat-model.top-k=30
# Named configuration: "local"
quarkus.langchain4j.huggingface.local.api-key=dummy
quarkus.langchain4j.huggingface.local.chat-model.inference-endpoint-url=http://localhost:8085
quarkus.langchain4j.huggingface.local.timeout=60s

All properties can be set via environment variables using uppercase and underscores:
# API Key
export QUARKUS_LANGCHAIN4J_HUGGINGFACE_API_KEY=hf_your_token_here
# Timeout
export QUARKUS_LANGCHAIN4J_HUGGINGFACE_TIMEOUT=30s
# Chat model settings
export QUARKUS_LANGCHAIN4J_HUGGINGFACE_CHAT_MODEL_TEMPERATURE=0.7
export QUARKUS_LANGCHAIN4J_HUGGINGFACE_CHAT_MODEL_MAX_NEW_TOKENS=150
# Embedding model settings
export QUARKUS_LANGCHAIN4J_HUGGINGFACE_EMBEDDING_MODEL_INFERENCE_ENDPOINT_URL=https://api-inference.huggingface.co/pipeline/feature-extraction/sentence-transformers/all-MiniLM-L6-v2
# Named configuration
export QUARKUS_LANGCHAIN4J_HUGGINGFACE_CREATIVE_API_KEY=hf_creative_token
export QUARKUS_LANGCHAIN4J_HUGGINGFACE_CREATIVE_CHAT_MODEL_TEMPERATURE=1.2

Use Quarkus profiles for different environments:
# Default configuration (all profiles)
quarkus.langchain4j.huggingface.api-key=hf_default
# Development profile
%dev.quarkus.langchain4j.huggingface.chat-model.inference-endpoint-url=http://localhost:8085
%dev.quarkus.langchain4j.huggingface.log-requests=true
%dev.quarkus.langchain4j.huggingface.log-responses=true
# Production profile
%prod.quarkus.langchain4j.huggingface.api-key=hf_production_token
%prod.quarkus.langchain4j.huggingface.timeout=60s
%prod.quarkus.langchain4j.huggingface.log-requests=false
%prod.quarkus.langchain4j.huggingface.log-responses=false

# Different Hugging Face Hub model
quarkus.langchain4j.huggingface.chat-model.inference-endpoint-url=https://api-inference.huggingface.co/models/google/flan-t5-large
# Private Hugging Face endpoint
quarkus.langchain4j.huggingface.chat-model.inference-endpoint-url=https://your-endpoint.endpoints.huggingface.cloud
# AWS-hosted Hugging Face endpoint
quarkus.langchain4j.huggingface.chat-model.inference-endpoint-url=https://your-sagemaker-endpoint.amazonaws.com
# Locally deployed model
quarkus.langchain4j.huggingface.chat-model.inference-endpoint-url=http://localhost:8085
quarkus.langchain4j.huggingface.api-key=dummy

The configuration system follows this priority order (highest to lowest):
1. Model-specific setting (quarkus.langchain4j.huggingface.chat-model.log-requests)
2. Provider-level setting (quarkus.langchain4j.huggingface.log-requests)
3. Global LangChain4j setting (quarkus.langchain4j.log-requests)

Example:
# Global default
quarkus.langchain4j.log-requests=false
# Hugging Face override
quarkus.langchain4j.huggingface.log-requests=true
# Chat model specific override (highest priority)
quarkus.langchain4j.huggingface.chat-model.log-requests=false

Runtime recorder for creating model suppliers.
package io.quarkiverse.langchain4j.huggingface.runtime;
/**
* Quarkus recorder for creating chat and embedding model suppliers at runtime.
* Used internally by Quarkus build-time processing.
*/
@io.quarkus.runtime.annotations.Recorder
public class HuggingFaceRecorder {
/**
* Creates a new recorder with runtime configuration.
*
* @param runtimeConfig Runtime configuration value
*/
public HuggingFaceRecorder(
io.quarkus.runtime.RuntimeValue<LangChain4jHuggingFaceConfig> runtimeConfig
);
/**
* Creates a supplier for a chat model with the given configuration name.
*
* @param configName Configuration name ("default" for unnamed config, or named config key)
* @return Supplier that creates the configured chat model
*/
public java.util.function.Supplier<dev.langchain4j.model.chat.ChatModel> chatModel(String configName);
/**
* Creates a supplier for an embedding model with the given configuration name.
*
* @param configName Configuration name ("default" for unnamed config, or named config key)
* @return Supplier that creates the configured embedding model
*/
public java.util.function.Supplier<dev.langchain4j.model.embedding.EmbeddingModel> embeddingModel(String configName);
}

The extension validates configuration at startup and fails fast with clear error messages.

Default values:
| Property | Default Value | Source |
|---|---|---|
| api-key | "dummy" | Extension |
| timeout | 10s | Global LangChain4j config |
| log-requests | false | Global LangChain4j config |
| log-responses | false | Global LangChain4j config |
| enable-integration | true | Extension |
| chat-model.inference-endpoint-url | https://api-inference.huggingface.co/models/tiiuae/falcon-7b-instruct | Extension |
| chat-model.temperature | 1.0 | Global LangChain4j config |
| chat-model.return-full-text | false | Extension |
| chat-model.wait-for-model | true | Extension |
| embedding-model.inference-endpoint-url | https://api-inference.huggingface.co/pipeline/feature-extraction/sentence-transformers/all-MiniLM-L6-v2 | Extension |
| embedding-model.wait-for-model | true | Extension |
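The "Global LangChain4j config" rows reflect the fallback chain described earlier: a model-level setting wins over the provider-level setting, which wins over the global default. A toy sketch of that resolution (not the extension's actual code):

```java
import java.util.Optional;

public class FallbackSketch {

    // Most specific value wins: model-level, then provider-level, then global default.
    static boolean resolveLogRequests(Optional<Boolean> modelLevel,
                                      Optional<Boolean> providerLevel,
                                      boolean globalDefault) {
        return modelLevel.orElseGet(() -> providerLevel.orElse(globalDefault));
    }

    public static void main(String[] args) {
        // chat-model.log-requests=false overrides huggingface.log-requests=true,
        // regardless of the global default.
        boolean effective = resolveLogRequests(Optional.of(false), Optional.of(true), false);
        System.out.println(effective); // false
    }
}
```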
Disable the integration at runtime so that model calls are rejected:
quarkus.langchain4j.huggingface.enable-integration=false

When disabled, the extension returns disabled model instances that throw exceptions on use. This is useful for testing without live API calls and for temporarily turning the integration off without rebuilding the application.
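The "disabled model" behavior can be pictured as a stand-in bean whose every method fails fast. This sketch is illustrative only — the interface and message below are not the extension's actual classes:

```java
public class DisabledModelSketch {

    // Stand-in for the real LangChain4j chat model interface (illustrative only).
    interface ChatModel {
        String chat(String prompt);
    }

    // The bean still exists and can be injected, but any use fails fast
    // with a clear message instead of silently calling the API.
    static ChatModel disabledChatModel() {
        return prompt -> {
            throw new UnsupportedOperationException(
                    "Hugging Face integration is disabled (enable-integration=false)");
        };
    }

    public static void main(String[] args) {
        ChatModel model = disabledChatModel();
        try {
            model.chat("hello");
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```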
Build-time configuration controls whether models are included in the application build. These settings are processed during the Quarkus build phase and affect which beans are created.
Build-time configuration root for the Hugging Face extension.
package io.quarkiverse.langchain4j.huggingface.deployment;
/**
* Build-time configuration for Hugging Face integration.
* Configured via quarkus.langchain4j.huggingface.* properties.
*/
@io.quarkus.runtime.annotations.ConfigRoot(phase = io.quarkus.runtime.annotations.ConfigPhase.BUILD_TIME)
@io.smallrye.config.ConfigMapping(prefix = "quarkus.langchain4j.huggingface")
public interface LangChain4jHuggingFaceBuildConfig {
/**
* Chat model build-time settings.
*
* @return Chat model build configuration
*/
ChatModelBuildConfig chatModel();
/**
* Embedding model build-time settings.
*
* @return Embedding model build configuration
*/
EmbeddingModelBuildConfig embeddingModel();
/**
* Moderation model build-time settings.
* Infrastructure for future moderation capability.
*
* @return Moderation model build configuration
*/
ModerationModelBuildConfig moderationModel();
}

Build-time configuration for chat models.
package io.quarkiverse.langchain4j.huggingface.deployment;
/**
* Build-time configuration for chat model.
*/
@io.smallrye.config.ConfigGroup
public interface ChatModelBuildConfig {
/**
* Whether the chat model should be enabled at build time.
* When false, the chat model bean will not be created during the build.
*
* @return true to enable chat model (default), false to disable
*/
@io.quarkus.runtime.annotations.ConfigDocDefault("true")
java.util.Optional<Boolean> enabled();
}

Build-time configuration for embedding models.
package io.quarkiverse.langchain4j.huggingface.deployment;
/**
* Build-time configuration for embedding model.
*/
@io.smallrye.config.ConfigGroup
public interface EmbeddingModelBuildConfig {
/**
* Whether the embedding model should be enabled at build time.
* When false, the embedding model bean will not be created during the build.
*
* @return true to enable embedding model (default), false to disable
*/
@io.quarkus.runtime.annotations.ConfigDocDefault("true")
java.util.Optional<Boolean> enabled();
}

Build-time configuration for moderation models (infrastructure for a future feature).
package io.quarkiverse.langchain4j.huggingface.deployment;
/**
* Build-time configuration for moderation model.
* This is infrastructure for a future moderation capability.
*/
@io.smallrye.config.ConfigGroup
public interface ModerationModelBuildConfig {
/**
* Whether the moderation model should be enabled at build time.
* When false, the moderation model bean will not be created during the build.
*
* @return true to enable moderation model (default), false to disable
*/
@io.quarkus.runtime.annotations.ConfigDocDefault("true")
java.util.Optional<Boolean> enabled();
}

Disable the chat model at build time:
# Chat model will not be included in the build
quarkus.langchain4j.huggingface.chat-model.enabled=false

Disable the embedding model at build time:
# Embedding model will not be included in the build
quarkus.langchain4j.huggingface.embedding-model.enabled=false

Use with profiles to conditionally include models:
# Default: include both models
quarkus.langchain4j.huggingface.chat-model.enabled=true
quarkus.langchain4j.huggingface.embedding-model.enabled=true
# Production: only include chat model
%prod.quarkus.langchain4j.huggingface.embedding-model.enabled=false
# Test: disable both models
%test.quarkus.langchain4j.huggingface.chat-model.enabled=false
%test.quarkus.langchain4j.huggingface.embedding-model.enabled=falseBuild-Time vs Runtime Configuration:
Build-time (chat-model.enabled, embedding-model.enabled): Controls whether the model beans are created during the Quarkus build. When false, the model will not be available at runtime at all. This reduces application size and startup time.
Runtime (enable-integration): Controls whether the models are active at runtime. When false, the beans exist but throw exceptions on use. Useful for temporarily disabling without rebuilding.
Use build-time configuration when you want to permanently exclude models from specific builds (e.g., different features in different environments). Use runtime configuration for temporary disabling or testing.
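As a footnote on the chat-model generation options documented earlier: temperature rescales the model's raw scores before sampling (lower values sharpen the distribution toward greedy decoding), and top-k then restricts sampling to the k most likely tokens. A toy illustration of both mechanisms, unrelated to the extension's internals:

```java
import java.util.Arrays;

public class SamplingParamsSketch {

    // Softmax over logits divided by temperature. Lower temperature
    // concentrates probability mass on the highest-scoring token.
    static double[] softmax(double[] logits, double temperature) {
        double[] p = new double[logits.length];
        double max = Arrays.stream(logits).max().orElse(0.0);
        double sum = 0.0;
        for (int i = 0; i < logits.length; i++) {
            p[i] = Math.exp((logits[i] - max) / temperature);
            sum += p[i];
        }
        for (int i = 0; i < p.length; i++) {
            p[i] /= sum;
        }
        return p;
    }

    // Top-k: keep only the k largest probabilities (ties at the cutoff
    // may keep a few more), zero the rest, and renormalize.
    static double[] topK(double[] probs, int k) {
        double[] sorted = probs.clone();
        Arrays.sort(sorted); // ascending
        double threshold = sorted[sorted.length - k];
        double[] out = new double[probs.length];
        double sum = 0.0;
        for (int i = 0; i < probs.length; i++) {
            if (probs[i] >= threshold) {
                out[i] = probs[i];
                sum += probs[i];
            }
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= sum;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] logits = {2.0, 1.0, 0.5, 0.1};
        System.out.println(Arrays.toString(softmax(logits, 0.3))); // sharp, near-greedy
        System.out.println(Arrays.toString(softmax(logits, 1.2))); // flatter, more random
        System.out.println(Arrays.toString(topK(softmax(logits, 1.0), 2)));
    }
}
```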
Install with Tessl CLI
npx tessl i tessl/maven-io-quarkiverse-langchain4j--quarkus-langchain4j-hugging-face@1.7.0