tessl/maven-dev-langchain4j--langchain4j-hugging-face

LangChain4j integration library for Hugging Face inference capabilities including chat, language, and embedding models


docs/configuration.md

Configuration Guide

Complete reference for all configuration options across all model types.

Common Configuration

All model builders (ChatModel, LanguageModel, EmbeddingModel) share these configuration options:

accessToken (Required)

Hugging Face API access token for authentication.

public Builder accessToken(String accessToken)

Get Token: https://huggingface.co/settings/tokens

.accessToken(System.getenv("HF_API_KEY"))

Required: Yes
Default: None
Environment Variable: Recommended to use HF_API_KEY
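
A fail-fast check makes a missing token obvious at startup rather than at the first API call. The sketch below uses nothing from the library; `requireToken` is a hypothetical helper name:

```java
// Hypothetical helper: fail fast when the access token is missing or blank.
class Tokens {
    static String requireToken(String token) {
        if (token == null || token.isBlank()) {
            throw new IllegalStateException(
                "HF_API_KEY is not set; create a token at https://huggingface.co/settings/tokens");
        }
        return token;
    }
}
```

It can then be used inline: `.accessToken(Tokens.requireToken(System.getenv("HF_API_KEY")))`.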

modelId

Identifier of the Hugging Face model to use.

public Builder modelId(String modelId)

Examples:

  • Embeddings: "sentence-transformers/all-MiniLM-L6-v2"
  • Chat/Language: "tiiuae/falcon-7b-instruct"
.modelId("sentence-transformers/all-MiniLM-L6-v2")

Required: Recommended (uses default if not specified)
Default: Model-specific default
Format: organization/model-name
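
An invalid identifier is typically only rejected at request time, so a cheap format check can catch typos earlier. This is a sketch; `isValidModelId` is a hypothetical helper, and the allowed character set is an assumption based on common Hugging Face repository names:

```java
// Hypothetical helper: sanity-check the organization/model-name format.
class ModelIds {
    static boolean isValidModelId(String id) {
        // Two non-empty segments separated by exactly one slash.
        return id != null && id.matches("[A-Za-z0-9._-]+/[A-Za-z0-9._-]+");
    }
}
```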

baseUrl

Custom API base URL.

public Builder baseUrl(String baseUrl)
.baseUrl("https://custom-endpoint.example.com/")

Required: No
Default: "https://router.huggingface.co/hf-inference/"
Use Cases:

  • Custom inference endpoints
  • Private deployments
  • Regional endpoints
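
The default endpoint ends with a trailing slash, so it is safest to keep one on custom URLs as well. A small normalization sketch (`normalize` is a hypothetical helper, not part of the library):

```java
import java.net.URI;

// Hypothetical helper: validate and normalize a custom base URL.
class BaseUrls {
    static String normalize(String url) {
        URI uri = URI.create(url); // throws IllegalArgumentException on malformed input
        if (!"https".equals(uri.getScheme())) {
            throw new IllegalArgumentException("baseUrl should use https: " + url);
        }
        return url.endsWith("/") ? url : url + "/"; // match the default endpoint's trailing slash
    }
}
```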

timeout

Request timeout duration.

public Builder timeout(java.time.Duration timeout)
import java.time.Duration;

.timeout(Duration.ofSeconds(30))
.timeout(Duration.ofMinutes(2))
.timeout(Duration.ofMillis(5000))

Required: No
Default: 15 seconds
Recommended: Increase for large models or slow networks
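
To make the timeout tunable without code changes, it can be read from an environment variable with the library default of 15 seconds as a fallback. `HF_TIMEOUT_SECONDS` and `fromEnv` are assumed names for this sketch, not anything the library reads itself:

```java
import java.time.Duration;

// Hypothetical helper: parse a timeout in seconds, defaulting to 15s.
class Timeouts {
    static Duration fromEnv(String raw) {
        if (raw == null || raw.isBlank()) {
            return Duration.ofSeconds(15); // library default
        }
        long seconds = Long.parseLong(raw.trim());
        if (seconds <= 0) {
            throw new IllegalArgumentException("timeout must be positive, got: " + raw);
        }
        return Duration.ofSeconds(seconds);
    }
}
```

Usage: `.timeout(Timeouts.fromEnv(System.getenv("HF_TIMEOUT_SECONDS")))`.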

waitForModel

Whether to wait if the model is still loading.

public Builder waitForModel(Boolean waitForModel)
.waitForModel(true)   // Wait for model (recommended)
.waitForModel(false)  // Fail immediately if not ready

Required: No
Default: true
Recommendation: Keep true to avoid 503 errors

Text Generation Configuration

Options specific to HuggingFaceChatModel and HuggingFaceLanguageModel (both deprecated).

temperature

Sampling temperature for generation randomness.

public Builder temperature(Double temperature)
.temperature(0.2)   // Deterministic, focused
.temperature(0.7)   // Balanced
.temperature(1.5)   // Creative, random

Required: No
Default: Model default (varies)
Range: 0.0 to 2.0 (typical)
Guidelines:

  • 0.0-0.3: Deterministic output (code, factual)
  • 0.4-0.9: Balanced creativity and coherence
  • 1.0-2.0: High creativity (stories, brainstorming)
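
Out-of-range values are only rejected, if at all, by the remote model, so clamping into the typical range is a cheap local guard. `clamp` is a hypothetical helper:

```java
// Hypothetical helper: keep temperature inside the typical 0.0-2.0 range.
class Temperatures {
    static double clamp(double temperature) {
        return Math.max(0.0, Math.min(2.0, temperature));
    }
}
```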

maxNewTokens

Maximum number of new tokens to generate.

public Builder maxNewTokens(Integer maxNewTokens)
.maxNewTokens(100)   // Short responses
.maxNewTokens(500)   // Medium responses
.maxNewTokens(2000)  // Long responses

Required: No
Default: Model default (varies)
Note: Does not include input tokens, only generated tokens

returnFullText

Whether to return full text including prompt.

public Builder returnFullText(Boolean returnFullText)
.returnFullText(false)  // Only generated text (default)
.returnFullText(true)   // Prompt + generated text

Required: No
Default: false
Recommendation: Keep false for cleaner responses

Configuration Matrix

EmbeddingModel Configuration

| Parameter | Type | Default | Required | Notes |
| --- | --- | --- | --- | --- |
| accessToken | String | - | ✅ Yes | API authentication |
| modelId | String | default | ⚠️ Recommended | Model identifier |
| baseUrl | String | HF endpoint | ❌ No | Custom endpoint |
| timeout | Duration | 15s | ❌ No | Request timeout |
| waitForModel | Boolean | true | ❌ No | Wait if loading |

ChatModel Configuration (Deprecated)

| Parameter | Type | Default | Required | Notes |
| --- | --- | --- | --- | --- |
| accessToken | String | - | ✅ Yes | API authentication |
| modelId | String | default | ⚠️ Recommended | Model identifier |
| baseUrl | String | HF endpoint | ❌ No | Custom endpoint |
| timeout | Duration | 15s | ❌ No | Request timeout |
| waitForModel | Boolean | true | ❌ No | Wait if loading |
| temperature | Double | model default | ❌ No | Sampling temperature |
| maxNewTokens | Integer | model default | ❌ No | Max tokens to generate |
| returnFullText | Boolean | false | ❌ No | Include prompt in response |

LanguageModel Configuration (Deprecated)

Same as ChatModel configuration above.

Configuration Examples

Minimal Configuration

HuggingFaceEmbeddingModel model = HuggingFaceEmbeddingModel.builder()
    .accessToken(System.getenv("HF_API_KEY"))
    .build();

Standard Configuration

HuggingFaceEmbeddingModel model = HuggingFaceEmbeddingModel.builder()
    .accessToken(System.getenv("HF_API_KEY"))
    .modelId("sentence-transformers/all-MiniLM-L6-v2")
    .build();

Production Configuration

import java.time.Duration;

HuggingFaceEmbeddingModel model = HuggingFaceEmbeddingModel.builder()
    .accessToken(System.getenv("HF_API_KEY"))
    .modelId("sentence-transformers/all-MiniLM-L6-v2")
    .timeout(Duration.ofSeconds(30))
    .waitForModel(true)
    .build();

Custom Endpoint Configuration

HuggingFaceEmbeddingModel model = HuggingFaceEmbeddingModel.builder()
    .accessToken(System.getenv("HF_API_KEY"))
    .modelId("custom-model")
    .baseUrl("https://custom-endpoint.example.com/")
    .timeout(Duration.ofMinutes(1))
    .build();

Chat Model Configuration (Deprecated)

HuggingFaceChatModel model = HuggingFaceChatModel.builder()
    .accessToken(System.getenv("HF_API_KEY"))
    .modelId("tiiuae/falcon-7b-instruct")
    .temperature(0.7)
    .maxNewTokens(200)
    .returnFullText(false)
    .waitForModel(true)
    .timeout(Duration.ofSeconds(30))
    .build();

Language Model Configuration (Deprecated)

HuggingFaceLanguageModel model = HuggingFaceLanguageModel.builder()
    .accessToken(System.getenv("HF_API_KEY"))
    .modelId("microsoft/Phi-3-mini-4k-instruct")
    .temperature(0.8)
    .maxNewTokens(150)
    .returnFullText(false)
    .waitForModel(true)
    .timeout(Duration.ofSeconds(30))
    .build();

Configuration Patterns

Environment-Based Configuration

public class ModelConfig {
    private static final String API_KEY =
        System.getenv("HF_API_KEY");
    private static final String MODEL_ID =
        System.getenv().getOrDefault(
            "HF_MODEL_ID",
            "sentence-transformers/all-MiniLM-L6-v2"
        );
    private static final Duration TIMEOUT = Duration.ofSeconds(
        Integer.parseInt(
            System.getenv().getOrDefault("HF_TIMEOUT", "15")
        )
    );

    public static HuggingFaceEmbeddingModel createModel() {
        return HuggingFaceEmbeddingModel.builder()
            .accessToken(API_KEY)
            .modelId(MODEL_ID)
            .timeout(TIMEOUT)
            .build();
    }
}

Configuration Object Pattern

public class HuggingFaceConfig {
    private String accessToken;
    private String modelId;
    private Duration timeout;
    private boolean waitForModel;

    // Getters and setters...

    public HuggingFaceEmbeddingModel buildEmbeddingModel() {
        return HuggingFaceEmbeddingModel.builder()
            .accessToken(accessToken)
            .modelId(modelId)
            .timeout(timeout)
            .waitForModel(waitForModel)
            .build();
    }
}

Spring Boot Configuration

@Configuration
public class HuggingFaceConfiguration {

    @Value("${huggingface.api.key}")
    private String apiKey;

    @Value("${huggingface.model.id:sentence-transformers/all-MiniLM-L6-v2}")
    private String modelId;

    @Value("${huggingface.timeout:30}")
    private int timeoutSeconds;

    @Bean
    public HuggingFaceEmbeddingModel embeddingModel() {
        return HuggingFaceEmbeddingModel.builder()
            .accessToken(apiKey)
            .modelId(modelId)
            .timeout(Duration.ofSeconds(timeoutSeconds))
            .waitForModel(true)
            .build();
    }
}

Quick Construction Methods

All models provide quick construction methods with minimal configuration:

withAccessToken()

public static HuggingFaceEmbeddingModel withAccessToken(String accessToken)
public static HuggingFaceChatModel withAccessToken(String accessToken)
public static HuggingFaceLanguageModel withAccessToken(String accessToken)

Creates a model with only an access token; all other options use defaults:

HuggingFaceEmbeddingModel model =
    HuggingFaceEmbeddingModel.withAccessToken(System.getenv("HF_API_KEY"));

HuggingFaceChatModel chatModel =
    HuggingFaceChatModel.withAccessToken(System.getenv("HF_API_KEY"));

HuggingFaceLanguageModel langModel =
    HuggingFaceLanguageModel.withAccessToken(System.getenv("HF_API_KEY"));

Legacy Constructors

Models also provide public constructors (not recommended; prefer the builders):

EmbeddingModel Constructors

public HuggingFaceEmbeddingModel(
    String accessToken,
    String modelId,
    Boolean waitForModel,
    java.time.Duration timeout
)

public HuggingFaceEmbeddingModel(
    String baseUrl,
    String accessToken,
    String modelId,
    Boolean waitForModel,
    java.time.Duration timeout
)

ChatModel Constructors (Deprecated)

public HuggingFaceChatModel(
    String accessToken,
    String modelId,
    java.time.Duration timeout,
    Double temperature,
    Integer maxNewTokens,
    Boolean returnFullText,
    Boolean waitForModel
)

public HuggingFaceChatModel(
    String baseUrl,
    String accessToken,
    String modelId,
    java.time.Duration timeout,
    Double temperature,
    Integer maxNewTokens,
    Boolean returnFullText,
    Boolean waitForModel
)

LanguageModel Constructors (Deprecated)

Same signatures as ChatModel constructors.

Configuration Validation

Validation occurs at build time:

try {
    HuggingFaceEmbeddingModel model = HuggingFaceEmbeddingModel.builder()
        // Missing accessToken
        .modelId("sentence-transformers/all-MiniLM-L6-v2")
        .build();
} catch (IllegalArgumentException e) {
    System.err.println("Configuration error: " + e.getMessage());
}

Common Validation Errors:

  • Missing or null accessToken
  • Invalid timeout duration (negative or zero)
  • Invalid URL format for baseUrl

Best Practices

  1. Use Environment Variables for Secrets:
.accessToken(System.getenv("HF_API_KEY"))  // ✅ Good
.accessToken("hf_...")                      // ❌ Bad (hardcoded)
  2. Set Appropriate Timeouts:
// For embeddings (usually fast)
.timeout(Duration.ofSeconds(15))

// For large language models
.timeout(Duration.ofSeconds(60))
  3. Always Enable waitForModel:
.waitForModel(true)  // ✅ Recommended (avoid 503 errors)
  4. Use Builder Pattern:
// ✅ Good: Builder pattern
HuggingFaceEmbeddingModel model = HuggingFaceEmbeddingModel.builder()
    .accessToken(token)
    .build();

// ❌ Avoid: Direct constructor
HuggingFaceEmbeddingModel model = new HuggingFaceEmbeddingModel(
    token, null, true, Duration.ofSeconds(15)
);
  5. Configure Appropriate Temperature:
// For factual, deterministic tasks
.temperature(0.2)

// For creative tasks
.temperature(0.9)

Performance Tuning

Timeout Optimization

// Fast embeddings: shorter timeout
HuggingFaceEmbeddingModel fastModel = HuggingFaceEmbeddingModel.builder()
    .accessToken(apiKey)
    .timeout(Duration.ofSeconds(10))
    .build();

// Large models: longer timeout
HuggingFaceChatModel slowModel = HuggingFaceChatModel.builder()
    .accessToken(apiKey)
    .timeout(Duration.ofMinutes(2))
    .build();

Regional Endpoints

// Use regional endpoint for lower latency
HuggingFaceEmbeddingModel model = HuggingFaceEmbeddingModel.builder()
    .accessToken(apiKey)
    .baseUrl("https://region.huggingface.co/hf-inference/")
    .build();

Related Documentation

  • Quick Start Guide - Getting started examples
  • Embedding Model API - EmbeddingModel details
  • Chat Model API - ChatModel details (deprecated)
  • Language Model API - LanguageModel details (deprecated)
  • Error Handling - Error scenarios
  • SPI Extensions - Custom configuration via SPI

Install with Tessl CLI

npx tessl i tessl/maven-dev-langchain4j--langchain4j-hugging-face
