LangChain4j integration library for Hugging Face inference capabilities, including chat, language, and embedding models.
Complete reference for all configuration options across all model types.
All model builders (ChatModel, LanguageModel, EmbeddingModel) share these configuration options:
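To make the shared builder behavior concrete, here is a minimal, self-contained sketch of the same pattern: a hypothetical `HfConfig` class (not part of LangChain4j) with the shared options, their documented defaults, and build-time validation of the required token.

```java
import java.time.Duration;

// Hypothetical stand-in illustrating the shared builder options and
// build-time validation; NOT a LangChain4j class.
class HfConfig {
    final String accessToken;
    final String modelId;
    final String baseUrl;
    final Duration timeout;
    final boolean waitForModel;

    private HfConfig(Builder b) {
        // The real builders validate at build time; accessToken is required.
        if (b.accessToken == null || b.accessToken.isBlank()) {
            throw new IllegalArgumentException("accessToken is required");
        }
        this.accessToken = b.accessToken;
        this.modelId = b.modelId;
        // Defaults mirror the documented ones below.
        this.baseUrl = b.baseUrl != null ? b.baseUrl : "https://router.huggingface.co/hf-inference/";
        this.timeout = b.timeout != null ? b.timeout : Duration.ofSeconds(15);
        this.waitForModel = b.waitForModel;
    }

    static Builder builder() { return new Builder(); }

    static class Builder {
        private String accessToken;
        private String modelId;
        private String baseUrl;
        private Duration timeout;
        private boolean waitForModel = true; // documented default

        Builder accessToken(String v) { this.accessToken = v; return this; }
        Builder modelId(String v) { this.modelId = v; return this; }
        Builder baseUrl(String v) { this.baseUrl = v; return this; }
        Builder timeout(Duration v) { this.timeout = v; return this; }
        Builder waitForModel(boolean v) { this.waitForModel = v; return this; }
        HfConfig build() { return new HfConfig(this); }
    }
}
```

Unset options fall back to the defaults listed in the sections below; only the access token has no default.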
Hugging Face API access token for authentication.
```java
public Builder accessToken(String accessToken)
```

Get a token: https://huggingface.co/settings/tokens

```java
.accessToken(System.getenv("HF_API_KEY"))
```

Required: Yes
Default: None
Environment Variable: Recommended to use HF_API_KEY
Identifier of the Hugging Face model to use.
```java
public Builder modelId(String modelId)
```

Examples:
- "sentence-transformers/all-MiniLM-L6-v2"
- "tiiuae/falcon-7b-instruct"

```java
.modelId("sentence-transformers/all-MiniLM-L6-v2")
```

Required: Recommended (uses default if not specified)
Default: Model-specific default
Format: organization/model-name
Custom API base URL.

```java
public Builder baseUrl(String baseUrl)
```

```java
.baseUrl("https://custom-endpoint.example.com/")
```

Required: No
Default: "https://router.huggingface.co/hf-inference/"
Use Cases: self-hosted or custom inference endpoints, regional endpoints for lower latency
Request timeout duration.

```java
public Builder timeout(java.time.Duration timeout)
```

```java
import java.time.Duration;

.timeout(Duration.ofSeconds(30))
.timeout(Duration.ofMinutes(2))
.timeout(Duration.ofMillis(5000))
```

Required: No
Default: 15 seconds
Recommended: Increase for large models or slow networks
Whether to wait if the model is loading.

```java
public Builder waitForModel(Boolean waitForModel)
```

```java
.waitForModel(true)  // Wait for model (recommended)
.waitForModel(false) // Fail immediately if not ready
```

Required: No
Default: true
Recommendation: Keep true to avoid 503 errors
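If you do set `waitForModel(false)`, requests against a cold model fail instead of blocking, so the caller needs its own retry policy. The following is a generic retry sketch; the `Supplier` and the linear backoff policy are illustrative choices, not library APIs.

```java
import java.util.function.Supplier;

// Sketch: retry loop for handling "model is loading" failures (HTTP 503)
// when waitForModel(false) is used. Illustrative only, not a library API.
class ModelLoadingRetry {
    static <T> T withRetries(Supplier<T> call, int maxAttempts, long backoffMillis) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e;
                try {
                    Thread.sleep(backoffMillis * attempt); // linear backoff
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
            }
        }
        throw last;
    }
}
```

For most applications, keeping the default `waitForModel(true)` is simpler than managing retries yourself.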
Options specific to HuggingFaceChatModel and HuggingFaceLanguageModel (both deprecated).
Sampling temperature for generation randomness.

```java
public Builder temperature(Double temperature)
```

```java
.temperature(0.2) // Deterministic, focused
.temperature(0.7) // Balanced
.temperature(1.5) // Creative, random
```

Required: No
Default: Model default (varies)
Range: 0.0 to 2.0 (typical)
Guidelines: lower values (around 0.2) give focused, deterministic output; mid-range values (around 0.7) are balanced; higher values (1.0+) give more creative, varied output
Maximum number of new tokens to generate.

```java
public Builder maxNewTokens(Integer maxNewTokens)
```

```java
.maxNewTokens(100)  // Short responses
.maxNewTokens(500)  // Medium responses
.maxNewTokens(2000) // Long responses
```

Required: No
Default: Model default (varies)
Note: Does not include input tokens, only generated tokens
Whether to return full text including the prompt.

```java
public Builder returnFullText(Boolean returnFullText)
```

```java
.returnFullText(false) // Only generated text (default)
.returnFullText(true)  // Prompt + generated text
```

Required: No
Default: false
Recommendation: Keep false for cleaner responses
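If you do enable `returnFullText(true)`, the response begins with the prompt itself, and you may want to strip that prefix downstream. A minimal sketch of such a helper (hypothetical, not part of the library):

```java
// Hypothetical helper: recover only the generated continuation from a
// full-text response that echoes the prompt. Not a library API.
class FullTextUtil {
    static String stripPrompt(String fullText, String prompt) {
        // Only strip when the response actually starts with the prompt.
        return fullText.startsWith(prompt)
                ? fullText.substring(prompt.length())
                : fullText;
    }
}
```

Keeping `returnFullText(false)` avoids the need for this kind of post-processing entirely.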
| Parameter | Type | Default | Required | Notes |
|---|---|---|---|---|
| accessToken | String | - | ✅ Yes | API authentication |
| modelId | String | default | ⚠️ Recommended | Model identifier |
| baseUrl | String | HF endpoint | ❌ No | Custom endpoint |
| timeout | Duration | 15s | ❌ No | Request timeout |
| waitForModel | Boolean | true | ❌ No | Wait if loading |
| Parameter | Type | Default | Required | Notes |
|---|---|---|---|---|
| accessToken | String | - | ✅ Yes | API authentication |
| modelId | String | default | ⚠️ Recommended | Model identifier |
| baseUrl | String | HF endpoint | ❌ No | Custom endpoint |
| timeout | Duration | 15s | ❌ No | Request timeout |
| waitForModel | Boolean | true | ❌ No | Wait if loading |
| temperature | Double | model default | ❌ No | Sampling temperature |
| maxNewTokens | Integer | model default | ❌ No | Max tokens to generate |
| returnFullText | Boolean | false | ❌ No | Include prompt in response |
Same as ChatModel configuration above.
Minimal configuration:

```java
HuggingFaceEmbeddingModel model = HuggingFaceEmbeddingModel.builder()
    .accessToken(System.getenv("HF_API_KEY"))
    .build();
```

With an explicit model:

```java
HuggingFaceEmbeddingModel model = HuggingFaceEmbeddingModel.builder()
    .accessToken(System.getenv("HF_API_KEY"))
    .modelId("sentence-transformers/all-MiniLM-L6-v2")
    .build();
```

With timeout and model-loading behavior:

```java
import java.time.Duration;

HuggingFaceEmbeddingModel model = HuggingFaceEmbeddingModel.builder()
    .accessToken(System.getenv("HF_API_KEY"))
    .modelId("sentence-transformers/all-MiniLM-L6-v2")
    .timeout(Duration.ofSeconds(30))
    .waitForModel(true)
    .build();
```

With a custom endpoint:

```java
HuggingFaceEmbeddingModel model = HuggingFaceEmbeddingModel.builder()
    .accessToken(System.getenv("HF_API_KEY"))
    .modelId("custom-model")
    .baseUrl("https://custom-endpoint.example.com/")
    .timeout(Duration.ofMinutes(1))
    .build();
```

Chat model:

```java
HuggingFaceChatModel model = HuggingFaceChatModel.builder()
    .accessToken(System.getenv("HF_API_KEY"))
    .modelId("tiiuae/falcon-7b-instruct")
    .temperature(0.7)
    .maxNewTokens(200)
    .returnFullText(false)
    .waitForModel(true)
    .timeout(Duration.ofSeconds(30))
    .build();
```

Language model:

```java
HuggingFaceLanguageModel model = HuggingFaceLanguageModel.builder()
    .accessToken(System.getenv("HF_API_KEY"))
    .modelId("microsoft/Phi-3-mini-4k-instruct")
    .temperature(0.8)
    .maxNewTokens(150)
    .returnFullText(false)
    .waitForModel(true)
    .timeout(Duration.ofSeconds(30))
    .build();
```

Centralizing configuration in environment-driven constants:

```java
import java.time.Duration;

public class ModelConfig {
    private static final String API_KEY = System.getenv("HF_API_KEY");

    private static final String MODEL_ID = System.getenv().getOrDefault(
        "HF_MODEL_ID",
        "sentence-transformers/all-MiniLM-L6-v2"
    );

    private static final Duration TIMEOUT = Duration.ofSeconds(
        Integer.parseInt(System.getenv().getOrDefault("HF_TIMEOUT", "15"))
    );

    public static HuggingFaceEmbeddingModel createModel() {
        return HuggingFaceEmbeddingModel.builder()
            .accessToken(API_KEY)
            .modelId(MODEL_ID)
            .timeout(TIMEOUT)
            .build();
    }
}
```

As a configuration object:

```java
import java.time.Duration;

public class HuggingFaceConfig {
    private String accessToken;
    private String modelId;
    private Duration timeout;
    private boolean waitForModel;

    // Getters and setters...

    public HuggingFaceEmbeddingModel buildEmbeddingModel() {
        return HuggingFaceEmbeddingModel.builder()
            .accessToken(accessToken)
            .modelId(modelId)
            .timeout(timeout)
            .waitForModel(waitForModel)
            .build();
    }
}
```

As a Spring configuration class:

```java
import java.time.Duration;

@Configuration
public class HuggingFaceConfiguration {
    @Value("${huggingface.api.key}")
    private String apiKey;

    @Value("${huggingface.model.id:sentence-transformers/all-MiniLM-L6-v2}")
    private String modelId;

    @Value("${huggingface.timeout:30}")
    private int timeoutSeconds;

    @Bean
    public HuggingFaceEmbeddingModel embeddingModel() {
        return HuggingFaceEmbeddingModel.builder()
            .accessToken(apiKey)
            .modelId(modelId)
            .timeout(Duration.ofSeconds(timeoutSeconds))
            .waitForModel(true)
            .build();
    }
}
```

All models provide quick construction methods with minimal configuration:
```java
public static HuggingFaceEmbeddingModel withAccessToken(String accessToken)
public static HuggingFaceChatModel withAccessToken(String accessToken)
public static HuggingFaceLanguageModel withAccessToken(String accessToken)
```

Creates a model with only an access token, using all defaults:

```java
HuggingFaceEmbeddingModel model =
    HuggingFaceEmbeddingModel.withAccessToken(System.getenv("HF_API_KEY"));

HuggingFaceChatModel chatModel =
    HuggingFaceChatModel.withAccessToken(System.getenv("HF_API_KEY"));

HuggingFaceLanguageModel langModel =
    HuggingFaceLanguageModel.withAccessToken(System.getenv("HF_API_KEY"));
```

Models also provide public constructors (not recommended; use the builders):
```java
public HuggingFaceEmbeddingModel(
    String accessToken,
    String modelId,
    Boolean waitForModel,
    java.time.Duration timeout
)

public HuggingFaceEmbeddingModel(
    String baseUrl,
    String accessToken,
    String modelId,
    Boolean waitForModel,
    java.time.Duration timeout
)
```

```java
public HuggingFaceChatModel(
    String accessToken,
    String modelId,
    java.time.Duration timeout,
    Double temperature,
    Integer maxNewTokens,
    Boolean returnFullText,
    Boolean waitForModel
)

public HuggingFaceChatModel(
    String baseUrl,
    String accessToken,
    String modelId,
    java.time.Duration timeout,
    Double temperature,
    Integer maxNewTokens,
    Boolean returnFullText,
    Boolean waitForModel
)
```

HuggingFaceLanguageModel has the same constructor signatures as the ChatModel constructors.
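Whichever construction path you use, it can help to fail fast on misconfiguration before touching the builder. A minimal sketch of such a pre-check (the `ConfigCheck` helper is hypothetical, not part of the library):

```java
// Hypothetical pre-validation helper: fail fast with a clear message when the
// token is missing, instead of failing later at build or request time.
class ConfigCheck {
    static String requireToken(String token) {
        if (token == null || token.isBlank()) {
            throw new IllegalArgumentException(
                "HF_API_KEY is not set; get a token at https://huggingface.co/settings/tokens");
        }
        return token;
    }
}
```

A call site would pass the checked value straight into the builder, e.g. `.accessToken(ConfigCheck.requireToken(System.getenv("HF_API_KEY")))`.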
Validation occurs at build time:
```java
try {
    HuggingFaceEmbeddingModel model = HuggingFaceEmbeddingModel.builder()
        // Missing accessToken
        .modelId("sentence-transformers/all-MiniLM-L6-v2")
        .build();
} catch (IllegalArgumentException e) {
    System.err.println("Configuration error: " + e.getMessage());
}
```

Common Validation Errors: missing or blank accessToken; malformed baseUrl

Keep the access token out of source code:

```java
.accessToken(System.getenv("HF_API_KEY")) // ✅ Good
.accessToken("hf_...")                    // ❌ Bad (hardcoded)
```

Set timeouts appropriate to the model:

```java
// For embeddings (usually fast)
.timeout(Duration.ofSeconds(15))

// For large language models
.timeout(Duration.ofSeconds(60))
```

Wait for model loading:

```java
.waitForModel(true) // ✅ Recommended (avoid 503 errors)
```

Prefer the builder over direct constructors:

```java
// ✅ Good: Builder pattern
HuggingFaceEmbeddingModel model = HuggingFaceEmbeddingModel.builder()
    .accessToken(token)
    .build();

// ❌ Avoid: Direct constructor
HuggingFaceEmbeddingModel model = new HuggingFaceEmbeddingModel(
    token, null, true, Duration.ofSeconds(15)
);
```

Match temperature to the task:

```java
// For factual, deterministic tasks
.temperature(0.2)

// For creative tasks
.temperature(0.9)
```

Use different timeouts for different workloads:

```java
// Fast embeddings: shorter timeout
HuggingFaceEmbeddingModel fastModel = HuggingFaceEmbeddingModel.builder()
    .accessToken(apiKey)
    .timeout(Duration.ofSeconds(10))
    .build();

// Large models: longer timeout
HuggingFaceChatModel slowModel = HuggingFaceChatModel.builder()
    .accessToken(apiKey)
    .timeout(Duration.ofMinutes(2))
    .build();
```

Custom endpoints:

```java
// Use regional endpoint for lower latency
HuggingFaceEmbeddingModel model = HuggingFaceEmbeddingModel.builder()
    .accessToken(apiKey)
    .baseUrl("https://region.huggingface.co/hf-inference/")
    .build();
```

Install with the Tessl CLI:

```shell
npx tessl i tessl/maven-dev-langchain4j--langchain4j-hugging-face
```