tessl/maven-io-quarkiverse-langchain4j--quarkus-langchain4j-hugging-face

Quarkus extension that integrates Hugging Face language models with Quarkus applications through LangChain4j


Configuration

The Quarkus LangChain4j Hugging Face extension can be configured declaratively through application.properties or programmatically through builder patterns. It supports a default configuration as well as named configurations for managing multiple model instances.

Capabilities

LangChain4jHuggingFaceConfig

Root configuration interface for the Hugging Face extension.

package io.quarkiverse.langchain4j.huggingface.runtime.config;

/**
 * Root configuration for Hugging Face integration.
 * Configured via quarkus.langchain4j.huggingface.* properties.
 */
@io.smallrye.config.ConfigMapping(prefix = "quarkus.langchain4j.huggingface")
@io.quarkus.runtime.annotations.ConfigRoot(phase = io.quarkus.runtime.annotations.ConfigPhase.RUN_TIME)
public interface LangChain4jHuggingFaceConfig {

    /**
     * Returns the default Hugging Face configuration.
     * Configured via quarkus.langchain4j.huggingface.* (without a name).
     *
     * @return Default configuration
     */
    @io.smallrye.config.WithParentName
    HuggingFaceConfig defaultConfig();

    /**
     * Returns a map of named Hugging Face configurations.
     * Allows multiple model configurations with different settings.
     * Configured via quarkus.langchain4j.huggingface.{name}.* properties.
     *
     * @return Map of named configurations
     */
    @io.smallrye.config.WithDefaults
    @io.smallrye.config.WithParentName
    java.util.Map<String, HuggingFaceConfig> namedConfig();
}

HuggingFaceConfig

Configuration group for Hugging Face model settings.

package io.quarkiverse.langchain4j.huggingface.runtime.config;

/**
 * Configuration for a specific Hugging Face model instance.
 * Can be default or named configuration.
 */
@io.smallrye.config.ConfigGroup
public interface HuggingFaceConfig {

    /**
     * Hugging Face API key for authentication.
     * Required when using Hugging Face Hub hosted inference API.
     * Can also be set via QUARKUS_LANGCHAIN4J_HUGGINGFACE_API_KEY environment variable.
     *
     * @return API key (default: "dummy")
     */
    @io.smallrye.config.WithDefault("dummy")
    String apiKey();

    /**
     * Timeout for Hugging Face API calls.
     * Falls back to quarkus.langchain4j.timeout if not specified.
     *
     * @return Timeout duration (default: 10s via global config)
     */
    @io.smallrye.config.WithDefault("${quarkus.langchain4j.timeout}")
    java.util.Optional<java.time.Duration> timeout();

    /**
     * Chat model specific configuration.
     *
     * @return Chat model configuration
     */
    ChatModelConfig chatModel();

    /**
     * Embedding model specific configuration.
     *
     * @return Embedding model configuration
     */
    EmbeddingModelConfig embeddingModel();

    /**
     * Whether to log requests to Hugging Face API.
     * API keys are automatically masked in logs.
     * Falls back to quarkus.langchain4j.log-requests if not specified.
     *
     * @return true to log requests (default: false via global config)
     */
    @io.smallrye.config.WithDefault("${quarkus.langchain4j.log-requests}")
    java.util.Optional<Boolean> logRequests();

    /**
     * Whether to log responses from Hugging Face API.
     * Falls back to quarkus.langchain4j.log-responses if not specified.
     *
     * @return true to log responses (default: false via global config)
     */
    @io.smallrye.config.WithDefault("${quarkus.langchain4j.log-responses}")
    java.util.Optional<Boolean> logResponses();

    /**
     * Whether to enable the Hugging Face integration.
     * When false, returns a disabled model that throws exceptions on use.
     *
     * @return true to enable integration (default: true)
     */
    @io.smallrye.config.WithDefault("true")
    Boolean enableIntegration();
}

ChatModelConfig

Configuration for Hugging Face chat models.

package io.quarkiverse.langchain4j.huggingface.runtime.config;

/**
 * Configuration for Hugging Face chat model settings.
 */
@io.smallrye.config.ConfigGroup
public interface ChatModelConfig {

    /**
     * Default Hugging Face inference endpoint for chat models.
     */
    String DEFAULT_INFERENCE_ENDPOINT = "https://api-inference.huggingface.co/models/tiiuae/falcon-7b-instruct";

    /**
     * Inference endpoint URL for the chat model.
     * Can be Hugging Face Hub API, private endpoint, or local deployment.
     *
     * @return Endpoint URL (default: falcon-7b-instruct)
     */
    @io.smallrye.config.WithDefault(DEFAULT_INFERENCE_ENDPOINT)
    java.net.URL inferenceEndpointUrl();

    /**
     * Sampling temperature for text generation.
     * Controls randomness. Higher values make output more random.
     * Falls back to quarkus.langchain4j.temperature if not specified.
     *
     * @return Temperature (default: 1.0 via global config)
     */
    @io.smallrye.config.WithDefault("${quarkus.langchain4j.temperature:1.0}")
    Double temperature();

    /**
     * Maximum number of new tokens to generate.
     * Model-dependent, typically 0-250 for most models.
     *
     * @return Maximum new tokens (no default)
     */
    java.util.Optional<Integer> maxNewTokens();

    /**
     * Whether to return the full text including the input prompt.
     *
     * @return true to include input in response (default: false)
     */
    @io.smallrye.config.WithDefault("false")
    Boolean returnFullText();

    /**
     * Whether to wait for the model to be ready.
 * If true, blocks until the model is loaded. If false, requests may fail with a 503 error while the model loads.
     *
     * @return true to wait for model (default: true)
     */
    @io.smallrye.config.WithDefault("true")
    Boolean waitForModel();

    /**
     * Whether to use sampling or greedy decoding.
     * When true, uses probabilistic sampling. When false, uses greedy.
     *
     * @return Sampling strategy (no default)
     */
    java.util.Optional<Boolean> doSample();

    /**
     * Top-K filtering value.
     * Limits sampling to the K most likely next tokens.
     *
     * @return Top-K value (no default)
     */
    java.util.OptionalInt topK();

    /**
     * Top-P (nucleus sampling) value.
     * Limits sampling to tokens whose cumulative probability is below P.
     *
     * @return Top-P value between 0.0 and 1.0 (no default)
     */
    java.util.OptionalDouble topP();

    /**
     * Repetition penalty.
     * Values > 1.0 penalize repetition, 1.0 is neutral, < 1.0 encourages it.
     *
     * @return Repetition penalty (no default, 1.0 is neutral)
     */
    java.util.OptionalDouble repetitionPenalty();

    /**
     * Whether to log requests for this specific chat model.
     * Overrides parent configuration if specified.
     *
     * @return true to log requests (falls back to parent config)
     */
    java.util.Optional<Boolean> logRequests();

    /**
     * Whether to log responses for this specific chat model.
     * Overrides parent configuration if specified.
     *
     * @return true to log responses (falls back to parent config)
     */
    java.util.Optional<Boolean> logResponses();
}

EmbeddingModelConfig

Configuration for Hugging Face embedding models.

package io.quarkiverse.langchain4j.huggingface.runtime.config;

/**
 * Configuration for Hugging Face embedding model settings.
 */
@io.smallrye.config.ConfigGroup
public interface EmbeddingModelConfig {

    /**
     * Default Hugging Face inference endpoint for embedding models.
     */
    String DEFAULT_INFERENCE_ENDPOINT_EMBEDDING = "https://api-inference.huggingface.co/pipeline/feature-extraction/sentence-transformers/all-MiniLM-L6-v2";

    /**
     * Inference endpoint URL for the embedding model.
     * Can be Hugging Face Hub API, private endpoint, or local deployment.
     *
     * @return Endpoint URL (default: all-MiniLM-L6-v2)
     */
    @io.smallrye.config.WithDefault(DEFAULT_INFERENCE_ENDPOINT_EMBEDDING)
    java.net.URL inferenceEndpointUrl();

    /**
     * Whether to wait for the model to be ready.
 * If true, blocks until the model is loaded. If false, requests may fail with a 503 error while the model loads.
     *
     * @return true to wait for model (default: true)
     */
    @io.smallrye.config.WithDefault("true")
    Boolean waitForModel();
}

Configuration Examples

Basic Configuration

Minimal configuration in application.properties:

# Required: Hugging Face API key
quarkus.langchain4j.huggingface.api-key=hf_your_token_here

Complete Default Configuration

Full configuration with all options in application.properties:

# API Key
quarkus.langchain4j.huggingface.api-key=hf_your_token_here

# Global timeout (default: 10s)
quarkus.langchain4j.huggingface.timeout=30s

# Global logging
quarkus.langchain4j.huggingface.log-requests=true
quarkus.langchain4j.huggingface.log-responses=true

# Enable/disable integration
quarkus.langchain4j.huggingface.enable-integration=true

# Chat model configuration
quarkus.langchain4j.huggingface.chat-model.inference-endpoint-url=https://api-inference.huggingface.co/models/tiiuae/falcon-7b-instruct
quarkus.langchain4j.huggingface.chat-model.temperature=0.7
quarkus.langchain4j.huggingface.chat-model.max-new-tokens=150
quarkus.langchain4j.huggingface.chat-model.return-full-text=false
quarkus.langchain4j.huggingface.chat-model.wait-for-model=true
quarkus.langchain4j.huggingface.chat-model.do-sample=true
quarkus.langchain4j.huggingface.chat-model.top-k=50
quarkus.langchain4j.huggingface.chat-model.top-p=0.95
quarkus.langchain4j.huggingface.chat-model.repetition-penalty=1.1
quarkus.langchain4j.huggingface.chat-model.log-requests=true
quarkus.langchain4j.huggingface.chat-model.log-responses=true

# Embedding model configuration
quarkus.langchain4j.huggingface.embedding-model.inference-endpoint-url=https://api-inference.huggingface.co/pipeline/feature-extraction/sentence-transformers/all-MiniLM-L6-v2
quarkus.langchain4j.huggingface.embedding-model.wait-for-model=true
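The timeout above uses Quarkus's simplified duration format ("30s"). Such values are bound to java.time.Duration; for reference, the equivalent ISO-8601 string accepted by Duration.parse directly is shown below (DurationDemo is an illustrative class, not part of the extension):

```java
import java.time.Duration;

public class DurationDemo {
    public static void main(String[] args) {
        // Quarkus accepts simplified duration strings such as "30s" in
        // application.properties; the equivalent ISO-8601 form understood
        // by java.time.Duration.parse is "PT30S".
        Duration timeout = Duration.parse("PT30S");
        System.out.println(timeout.getSeconds()); // prints 30
    }
}
```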

Named Configurations

Configure multiple Hugging Face model instances with different settings:

# Default configuration
quarkus.langchain4j.huggingface.api-key=hf_default_token

# Named configuration: "creative"
quarkus.langchain4j.huggingface.creative.api-key=hf_creative_token
quarkus.langchain4j.huggingface.creative.chat-model.inference-endpoint-url=https://api-inference.huggingface.co/models/tiiuae/falcon-7b-instruct
quarkus.langchain4j.huggingface.creative.chat-model.temperature=1.2
quarkus.langchain4j.huggingface.creative.chat-model.top-p=0.95

# Named configuration: "precise"
quarkus.langchain4j.huggingface.precise.api-key=hf_precise_token
quarkus.langchain4j.huggingface.precise.chat-model.inference-endpoint-url=https://api-inference.huggingface.co/models/google/flan-t5-small
quarkus.langchain4j.huggingface.precise.chat-model.temperature=0.3
quarkus.langchain4j.huggingface.precise.chat-model.top-k=30

# Named configuration: "local"
quarkus.langchain4j.huggingface.local.api-key=dummy
quarkus.langchain4j.huggingface.local.chat-model.inference-endpoint-url=http://localhost:8085
quarkus.langchain4j.huggingface.local.timeout=60s

Environment Variables

All properties can be set via environment variables: uppercase the property name and replace each non-alphanumeric character with an underscore (the standard MicroProfile Config mapping):

# API Key
export QUARKUS_LANGCHAIN4J_HUGGINGFACE_API_KEY=hf_your_token_here

# Timeout
export QUARKUS_LANGCHAIN4J_HUGGINGFACE_TIMEOUT=30s

# Chat model settings
export QUARKUS_LANGCHAIN4J_HUGGINGFACE_CHAT_MODEL_TEMPERATURE=0.7
export QUARKUS_LANGCHAIN4J_HUGGINGFACE_CHAT_MODEL_MAX_NEW_TOKENS=150

# Embedding model settings
export QUARKUS_LANGCHAIN4J_HUGGINGFACE_EMBEDDING_MODEL_INFERENCE_ENDPOINT_URL=https://api-inference.huggingface.co/pipeline/feature-extraction/sentence-transformers/all-MiniLM-L6-v2

# Named configuration
export QUARKUS_LANGCHAIN4J_HUGGINGFACE_CREATIVE_API_KEY=hf_creative_token
export QUARKUS_LANGCHAIN4J_HUGGINGFACE_CREATIVE_CHAT_MODEL_TEMPERATURE=1.2
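The property-to-variable mapping rule (replace every non-alphanumeric character with an underscore, then uppercase) can be sketched in a few lines; EnvNameMapper is a hypothetical helper, not part of the extension:

```java
public class EnvNameMapper {
    // MicroProfile Config rule: replace each non-alphanumeric character
    // with '_' and uppercase the result.
    static String toEnvVar(String property) {
        return property.replaceAll("[^A-Za-z0-9]", "_").toUpperCase();
    }

    public static void main(String[] args) {
        System.out.println(toEnvVar("quarkus.langchain4j.huggingface.api-key"));
        // prints QUARKUS_LANGCHAIN4J_HUGGINGFACE_API_KEY
    }
}
```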

Profile-Specific Configuration

Use Quarkus profiles for different environments:

# Default configuration (all profiles)
quarkus.langchain4j.huggingface.api-key=hf_default

# Development profile
%dev.quarkus.langchain4j.huggingface.chat-model.inference-endpoint-url=http://localhost:8085
%dev.quarkus.langchain4j.huggingface.log-requests=true
%dev.quarkus.langchain4j.huggingface.log-responses=true

# Production profile
%prod.quarkus.langchain4j.huggingface.api-key=hf_production_token
%prod.quarkus.langchain4j.huggingface.timeout=60s
%prod.quarkus.langchain4j.huggingface.log-requests=false
%prod.quarkus.langchain4j.huggingface.log-responses=false

Custom Model Endpoints

# Different Hugging Face Hub model
quarkus.langchain4j.huggingface.chat-model.inference-endpoint-url=https://api-inference.huggingface.co/models/google/flan-t5-large

# Private Hugging Face endpoint
quarkus.langchain4j.huggingface.chat-model.inference-endpoint-url=https://your-endpoint.endpoints.huggingface.cloud

# AWS-hosted Hugging Face endpoint
quarkus.langchain4j.huggingface.chat-model.inference-endpoint-url=https://your-sagemaker-endpoint.amazonaws.com

# Locally deployed model
quarkus.langchain4j.huggingface.chat-model.inference-endpoint-url=http://localhost:8085
quarkus.langchain4j.huggingface.api-key=dummy

Configuration Hierarchy

The configuration system follows this priority order (highest to lowest):

  1. Model-specific properties (e.g., chat-model.log-requests)
  2. Parent configuration properties (e.g., huggingface.log-requests)
  3. Global LangChain4j properties (e.g., langchain4j.log-requests)
  4. Default values

Example:

# Global default
quarkus.langchain4j.log-requests=false

# Hugging Face override
quarkus.langchain4j.huggingface.log-requests=true

# Chat model specific override (highest priority)
quarkus.langchain4j.huggingface.chat-model.log-requests=false
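The layered fallback can be pictured with Optional chaining; resolve below is illustrative only, not the extension's actual code:

```java
import java.util.Optional;

public class FallbackDemo {
    // Resolves a flag the way the hierarchy layers values:
    // model-specific first, then provider-level, then global, then a default.
    static boolean resolve(Optional<Boolean> modelSpecific,
                           Optional<Boolean> provider,
                           Optional<Boolean> global) {
        return modelSpecific.or(() -> provider).or(() -> global).orElse(false);
    }

    public static void main(String[] args) {
        // chat-model.log-requests=false beats huggingface.log-requests=true
        System.out.println(resolve(Optional.of(false), Optional.of(true), Optional.of(false))); // false
        // with no model-specific value, the provider-level setting applies
        System.out.println(resolve(Optional.empty(), Optional.of(true), Optional.empty())); // true
    }
}
```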

HuggingFaceRecorder

Runtime recorder for creating model suppliers.

package io.quarkiverse.langchain4j.huggingface.runtime;

/**
 * Quarkus recorder for creating chat and embedding model suppliers at runtime.
 * Used internally by Quarkus build-time processing.
 */
@io.quarkus.runtime.annotations.Recorder
public class HuggingFaceRecorder {

    /**
     * Creates a new recorder with runtime configuration.
     *
     * @param runtimeConfig Runtime configuration value
     */
    public HuggingFaceRecorder(
        io.quarkus.runtime.RuntimeValue<LangChain4jHuggingFaceConfig> runtimeConfig
    );

    /**
     * Creates a supplier for a chat model with the given configuration name.
     *
     * @param configName Configuration name ("default" for unnamed config, or named config key)
     * @return Supplier that creates the configured chat model
     */
    public java.util.function.Supplier<dev.langchain4j.model.chat.ChatModel> chatModel(String configName);

    /**
     * Creates a supplier for an embedding model with the given configuration name.
     *
     * @param configName Configuration name ("default" for unnamed config, or named config key)
     * @return Supplier that creates the configured embedding model
     */
    public java.util.function.Supplier<dev.langchain4j.model.embedding.EmbeddingModel> embeddingModel(String configName);
}

Configuration Validation

The extension validates configuration at startup and fails fast with clear error messages:

  • Missing API Key: When using Hugging Face Hub API without a valid API key (not "dummy")
  • Invalid URL: When the endpoint URL is malformed
  • Invalid Parameters: When parameter values are out of valid ranges
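A hypothetical sketch of the first two fail-fast checks (the real implementation differs; ValidationSketch and its messages are illustrative):

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

public class ValidationSketch {
    // Illustrative startup checks: reject malformed endpoint URLs, and
    // require a real API key (not the "dummy" default) for hub-hosted endpoints.
    static void validate(String apiKey, String endpointUrl) throws MalformedURLException {
        URL url = URI.create(endpointUrl).toURL(); // throws on malformed URLs
        if (url.getHost().endsWith("huggingface.co") && "dummy".equals(apiKey)) {
            throw new IllegalStateException(
                    "A Hugging Face API key is required for hub-hosted inference");
        }
    }

    public static void main(String[] args) throws MalformedURLException {
        // passes: real-looking key for a hub-hosted endpoint
        validate("hf_your_token_here",
                "https://api-inference.huggingface.co/models/tiiuae/falcon-7b-instruct");
        System.out.println("ok");
    }
}
```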

Default Values Reference

| Property | Default Value | Source |
| --- | --- | --- |
| `api-key` | `"dummy"` | Extension |
| `timeout` | `10s` | Global LangChain4j config |
| `log-requests` | `false` | Global LangChain4j config |
| `log-responses` | `false` | Global LangChain4j config |
| `enable-integration` | `true` | Extension |
| `chat-model.inference-endpoint-url` | `https://api-inference.huggingface.co/models/tiiuae/falcon-7b-instruct` | Extension |
| `chat-model.temperature` | `1.0` | Global LangChain4j config |
| `chat-model.return-full-text` | `false` | Extension |
| `chat-model.wait-for-model` | `true` | Extension |
| `embedding-model.inference-endpoint-url` | `https://api-inference.huggingface.co/pipeline/feature-extraction/sentence-transformers/all-MiniLM-L6-v2` | Extension |
| `embedding-model.wait-for-model` | `true` | Extension |

Integration Toggle

Disable the integration to prevent model creation:

quarkus.langchain4j.huggingface.enable-integration=false

When disabled, the extension returns disabled model instances that throw exceptions on use. This is useful for:

  • Disabling Hugging Face in certain profiles
  • Testing without actual API calls
  • Temporarily disabling without removing configuration
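The disabled-model behavior can be pictured as follows; ChatModel here is a stand-in interface, and the extension's actual disabled instances come from LangChain4j:

```java
public class DisabledModelDemo {
    // Stand-in for the chat model contract; illustrative only.
    interface ChatModel { String chat(String message); }

    // When enable-integration=false, the extension wires a model whose
    // every call fails fast instead of reaching the API.
    static ChatModel disabledModel() {
        return message -> {
            throw new UnsupportedOperationException(
                    "Hugging Face integration is disabled");
        };
    }

    public static void main(String[] args) {
        try {
            disabledModel().chat("hello");
        } catch (UnsupportedOperationException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```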

Build-Time Configuration

Build-time configuration controls whether models are included in the application build. These settings are processed during the Quarkus build phase and affect which beans are created.

LangChain4jHuggingFaceBuildConfig

Build-time configuration root for the Hugging Face extension.

package io.quarkiverse.langchain4j.huggingface.deployment;

/**
 * Build-time configuration for Hugging Face integration.
 * Configured via quarkus.langchain4j.huggingface.* properties.
 */
@io.quarkus.runtime.annotations.ConfigRoot(phase = io.quarkus.runtime.annotations.ConfigPhase.BUILD_TIME)
@io.smallrye.config.ConfigMapping(prefix = "quarkus.langchain4j.huggingface")
public interface LangChain4jHuggingFaceBuildConfig {

    /**
     * Chat model build-time settings.
     *
     * @return Chat model build configuration
     */
    ChatModelBuildConfig chatModel();

    /**
     * Embedding model build-time settings.
     *
     * @return Embedding model build configuration
     */
    EmbeddingModelBuildConfig embeddingModel();

    /**
     * Moderation model build-time settings.
     * Infrastructure for future moderation capability.
     *
     * @return Moderation model build configuration
     */
    ModerationModelBuildConfig moderationModel();
}

ChatModelBuildConfig

Build-time configuration for chat models.

package io.quarkiverse.langchain4j.huggingface.deployment;

/**
 * Build-time configuration for chat model.
 */
@io.smallrye.config.ConfigGroup
public interface ChatModelBuildConfig {

    /**
     * Whether the chat model should be enabled at build time.
     * When false, the chat model bean will not be created during the build.
     *
     * @return true to enable chat model (default), false to disable
     */
    @io.quarkus.runtime.annotations.ConfigDocDefault("true")
    java.util.Optional<Boolean> enabled();
}

EmbeddingModelBuildConfig

Build-time configuration for embedding models.

package io.quarkiverse.langchain4j.huggingface.deployment;

/**
 * Build-time configuration for embedding model.
 */
@io.smallrye.config.ConfigGroup
public interface EmbeddingModelBuildConfig {

    /**
     * Whether the embedding model should be enabled at build time.
     * When false, the embedding model bean will not be created during the build.
     *
     * @return true to enable embedding model (default), false to disable
     */
    @io.quarkus.runtime.annotations.ConfigDocDefault("true")
    java.util.Optional<Boolean> enabled();
}

ModerationModelBuildConfig

Build-time configuration for moderation models (infrastructure for future feature).

package io.quarkiverse.langchain4j.huggingface.deployment;

/**
 * Build-time configuration for moderation model.
 * This is infrastructure for a future moderation capability.
 */
@io.smallrye.config.ConfigGroup
public interface ModerationModelBuildConfig {

    /**
     * Whether the moderation model should be enabled at build time.
     * When false, the moderation model bean will not be created during the build.
     *
     * @return true to enable moderation model (default), false to disable
     */
    @io.quarkus.runtime.annotations.ConfigDocDefault("true")
    java.util.Optional<Boolean> enabled();
}

Build-Time Configuration Examples

Disable chat model at build time:

# Chat model will not be included in the build
quarkus.langchain4j.huggingface.chat-model.enabled=false

Disable embedding model at build time:

# Embedding model will not be included in the build
quarkus.langchain4j.huggingface.embedding-model.enabled=false

Use with profiles to conditionally include models:

# Default: include both models
quarkus.langchain4j.huggingface.chat-model.enabled=true
quarkus.langchain4j.huggingface.embedding-model.enabled=true

# Production: only include chat model
%prod.quarkus.langchain4j.huggingface.embedding-model.enabled=false

# Test: disable both models
%test.quarkus.langchain4j.huggingface.chat-model.enabled=false
%test.quarkus.langchain4j.huggingface.embedding-model.enabled=false

Build-Time vs Runtime Configuration:

  • Build-time (chat-model.enabled, embedding-model.enabled): Controls whether the model beans are created during the Quarkus build. When false, the model will not be available at runtime at all. This reduces application size and startup time.

  • Runtime (enable-integration): Controls whether the models are active at runtime. When false, the beans exist but throw exceptions on use. Useful for temporarily disabling without rebuilding.

Use build-time configuration when you want to permanently exclude models from specific builds (e.g., different features in different environments). Use runtime configuration for temporary disabling or testing.

Install with Tessl CLI

npx tessl i tessl/maven-io-quarkiverse-langchain4j--quarkus-langchain4j-hugging-face
