Quarkus extension that integrates IBM watsonx.ai foundation models with LangChain4j. Provides chat, generation, streaming, embedding, and scoring models. Includes comprehensive configuration options, tool/function calling support, text extraction from documents in Cloud Object Storage, and experimental built-in services for Google search, weather, and web crawling. Designed for enterprise Java applications on the Quarkus framework, with built-in dependency injection and native compilation support.
Override default model parameters on a per-request basis with Watsonx-specific parameter classes. These classes extend LangChain4j's DefaultChatRequestParameters and provide additional Watsonx-specific options for fine-grained control over model behavior.
Watsonx-specific request parameters for chat models with tool support.
public class WatsonxChatRequestParameters extends dev.langchain4j.model.chat.request.DefaultChatRequestParameters {
public static Builder builder();
// Watsonx-specific methods
public Map<String, Integer> logitBias();
public Boolean logprobs();
public Integer topLogprobs();
public Integer n();
public Integer seed();
public String toolChoiceName();
public Duration timeLimit();
// Inherited from DefaultChatRequestParameters
public String modelName();
public Integer maxOutputTokens();
public Double temperature();
public Double topP();
public Integer topK();
public Double frequencyPenalty();
public Double presencePenalty();
public List<String> stopSequences();
public ToolChoice toolChoice();
public List<ToolSpecification> toolSpecifications();
public ResponseFormat responseFormat();
// Merge with other parameters
public ChatRequestParameters overrideWith(ChatRequestParameters that);
public static class Builder {
// Inherited parameters
public Builder modelName(String modelName);
public Builder maxOutputTokens(Integer maxOutputTokens);
public Builder temperature(Double temperature);
public Builder topP(Double topP);
public Builder topK(Integer topK);
public Builder frequencyPenalty(Double frequencyPenalty);
public Builder presencePenalty(Double presencePenalty);
public Builder stopSequences(List<String> stopSequences);
public Builder toolChoice(ToolChoice toolChoice);
public Builder toolSpecifications(List<ToolSpecification> toolSpecifications);
public Builder responseFormat(ResponseFormat responseFormat);
// Watsonx-specific parameters
public Builder logitBias(Map<String, Integer> logitBias);
public Builder logprobs(Boolean logprobs);
public Builder topLogprobs(Integer topLogprobs);
public Builder n(Integer n);
public Builder seed(Integer seed);
public Builder toolChoiceName(String toolChoiceName);
public Builder timeLimit(Duration timeLimit);
public WatsonxChatRequestParameters build();
}
}

Parameter Details:
modelName (String): Override model identifier for this request
maxOutputTokens (Integer): Maximum tokens to generate
temperature (Double): Sampling temperature
topP (Double): Nucleus sampling parameter
topK (Integer): Top-K sampling parameter
frequencyPenalty (Double): Penalize frequent tokens
presencePenalty (Double): Penalize tokens that have appeared
stopSequences (List<String>): Stop sequences (max 4)
toolChoice (ToolChoice): Tool selection strategy
toolSpecifications (List<ToolSpecification>): Available tools for this request
responseFormat (ResponseFormat): Structured output format
logitBias (Map<String, Integer>): Token bias adjustments
logprobs (Boolean): Return log probabilities
topLogprobs (Integer): Number of top log probabilities
n (Integer): Number of completions to generate
seed (Integer): Random seed for reproducibility
toolChoiceName (String): Specific tool name to call
timeLimit (Duration): Maximum time for request
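To make the logitBias semantics concrete, here is a hypothetical, self-contained sketch (not watsonx or LangChain4j code) of how a bias map conceptually shifts raw token scores before sampling. Token IDs are strings, matching the `Map<String, Integer>` shape of the logitBias parameter; positive values boost a token, negative values suppress it.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: conceptual effect of a logit-bias map on token scores.
public class LogitBiasSketch {

    // Add each token's bias (default 0) to its raw logit.
    static Map<String, Double> applyBias(Map<String, Double> logits,
                                         Map<String, Integer> bias) {
        Map<String, Double> adjusted = new HashMap<>();
        logits.forEach((token, logit) ->
                adjusted.put(token, logit + bias.getOrDefault(token, 0)));
        return adjusted;
    }

    public static void main(String[] args) {
        Map<String, Double> logits = Map.of("12345", 1.0, "67890", 1.0);
        // Same shape as logitBias(Map<String, Integer>)
        Map<String, Integer> bias = Map.of("12345", 10, "67890", -10);

        Map<String, Double> adjusted = applyBias(logits, bias);
        System.out.println(adjusted.get("12345")); // boosted by +10
        System.out.println(adjusted.get("67890")); // suppressed by -10
    }
}
```

The actual bias application inside the watsonx service may differ; this only illustrates the direction of the effect.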
Watsonx-specific request parameters for legacy generation models.
public class WatsonxGenerationRequestParameters extends dev.langchain4j.model.chat.request.DefaultChatRequestParameters {
public static Builder builder();
// Generation-specific methods
public String decodingMethod();
public LengthPenalty lengthPenalty();
public Integer minNewTokens();
public Integer randomSeed();
public Duration timeLimit();
public Double repetitionPenalty();
public Integer truncateInputTokens();
public Boolean includeStopSequence();
// Inherited from DefaultChatRequestParameters
public String modelName();
public Integer maxOutputTokens();
public Double temperature();
public Double topP();
public Integer topK();
public List<String> stopSequences();
// Merge with other parameters
public ChatRequestParameters overrideWith(ChatRequestParameters that);
public static class Builder {
// Inherited parameters
public Builder modelName(String modelName);
public Builder maxOutputTokens(Integer maxOutputTokens);
public Builder temperature(Double temperature);
public Builder topP(Double topP);
public Builder topK(Integer topK);
public Builder stopSequences(List<String> stopSequences);
// Generation-specific parameters
public Builder decodingMethod(String decodingMethod);
public Builder lengthPenalty(LengthPenalty lengthPenalty);
public Builder minNewTokens(Integer minNewTokens);
public Builder randomSeed(Integer randomSeed);
public Builder timeLimit(Duration timeLimit);
public Builder repetitionPenalty(Double repetitionPenalty);
public Builder truncateInputTokens(Integer truncateInputTokens);
public Builder includeStopSequence(Boolean includeStopSequence);
public WatsonxGenerationRequestParameters build();
}
}

Parameter Details:
modelName (String): Override model identifier for this request
maxOutputTokens (Integer): Maximum new tokens to generate
temperature (Double): Sampling temperature
topP (Double): Nucleus sampling parameter
topK (Integer): Top-K sampling parameter
stopSequences (List<String>): Stop sequences (max 6)
decodingMethod (String): Decoding strategy
lengthPenalty (LengthPenalty): Length penalty configuration
minNewTokens (Integer): Minimum tokens to generate
randomSeed (Integer): Random seed for reproducibility
timeLimit (Duration): Maximum time for request
repetitionPenalty (Double): Penalty for repeated tokens
truncateInputTokens (Integer): Truncate input if exceeds limit
includeStopSequence (Boolean): Include stop sequence in output
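As a concrete illustration of the lengthPenalty parameter, here is a hypothetical, self-contained sketch (not library code) of the documented calculation: for token positions at or past startIndex, penalty = decayFactor ^ (position - startIndex), and the token score is divided by that penalty.

```java
// Illustration only: the length-penalty formula described in this document.
public class LengthPenaltySketch {

    // Returns the penalized score for a token at the given position.
    static double penalize(double score, int position,
                           double decayFactor, int startIndex) {
        if (position < startIndex) {
            return score; // no penalty before startIndex
        }
        double penalty = Math.pow(decayFactor, position - startIndex);
        return score / penalty;
    }

    public static void main(String[] args) {
        // Same shape as new LengthPenalty(2.0, 10): decayFactor=2.0, startIndex=10
        System.out.println(penalize(8.0, 5, 2.0, 10));  // before startIndex: unchanged
        System.out.println(penalize(8.0, 13, 2.0, 10)); // divided by 2^3 = 8
    }
}
```

Larger decayFactor values or earlier startIndex values therefore push the model toward shorter outputs.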
Configuration for length-based penalties in generation models.
public record LengthPenalty(Double decayFactor, Integer startIndex) {
// decayFactor: > 1.0, controls penalty strength
// startIndex: >= 0, token position where penalty begins
}

Usage:
// Create length penalty
LengthPenalty penalty = new LengthPenalty(1.5, 10);
// Use in parameters
WatsonxGenerationRequestParameters params = WatsonxGenerationRequestParameters.builder()
.lengthPenalty(penalty)
.build();
// Penalty calculation:
// For token position >= startIndex:
// penalty = decayFactor ^ (position - startIndex)
// token_score = token_score / penalty

import io.quarkiverse.langchain4j.watsonx.WatsonxChatRequestParameters;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.model.chat.response.ChatResponse;
@ApplicationScoped
public class ChatService {
@Inject
ChatModel chatModel;
public String generateCreative(String prompt) {
// Override default parameters for creative generation
WatsonxChatRequestParameters params = WatsonxChatRequestParameters.builder()
.temperature(1.5)
.topP(0.95)
.frequencyPenalty(0.3)
.presencePenalty(0.5)
.maxOutputTokens(1000)
.build();
ChatRequest request = ChatRequest.builder()
.messages(List.of(UserMessage.from(prompt)))
.parameters(params)
.build();
ChatResponse response = chatModel.chat(request);
return response.aiMessage().text();
}
public String generateFactual(String query) {
// Override for factual, deterministic generation
WatsonxChatRequestParameters params = WatsonxChatRequestParameters.builder()
.temperature(0.1)
.topP(0.9)
.seed(42) // Reproducible
.maxOutputTokens(500)
.build();
ChatRequest request = ChatRequest.builder()
.messages(List.of(UserMessage.from(query)))
.parameters(params)
.build();
ChatResponse response = chatModel.chat(request);
return response.aiMessage().text();
}
}

import io.quarkiverse.langchain4j.watsonx.WatsonxGenerationRequestParameters;
import dev.langchain4j.model.chat.request.ChatRequest;
@ApplicationScoped
public class GenerationService {
@Inject
ChatModel generationModel; // WatsonxGenerationModel
public String generateLong(String prompt) {
// Force longer responses
LengthPenalty penalty = new LengthPenalty(1.2, 100);
WatsonxGenerationRequestParameters params = WatsonxGenerationRequestParameters.builder()
.decodingMethod("sample")
.temperature(0.7)
.topK(50)
.topP(0.9)
.minNewTokens(200)
.maxOutputTokens(1000)
.lengthPenalty(penalty)
.build();
ChatRequest request = ChatRequest.builder()
.messages(List.of(UserMessage.from(prompt)))
.parameters(params)
.build();
return generationModel.chat(request).aiMessage().text();
}
public String generateShort(String prompt) {
// Force shorter responses
LengthPenalty penalty = new LengthPenalty(2.0, 20);
WatsonxGenerationRequestParameters params = WatsonxGenerationRequestParameters.builder()
.decodingMethod("greedy") // Deterministic
.maxOutputTokens(100)
.lengthPenalty(penalty)
.stopSequences(List.of("\n\n", "END"))
.build();
ChatRequest request = ChatRequest.builder()
.messages(List.of(UserMessage.from(prompt)))
.parameters(params)
.build();
return generationModel.chat(request).aiMessage().text();
}
}

// Request log probabilities
WatsonxChatRequestParameters params = WatsonxChatRequestParameters.builder()
.logprobs(true)
.topLogprobs(5) // Get top 5 alternatives at each position
.build();
ChatRequest request = ChatRequest.builder()
.messages(List.of(UserMessage.from("What is AI?")))
.parameters(params)
.build();
ChatResponse response = chatModel.chat(request);
// Access log probabilities from response metadata
// (Implementation depends on response structure)

// Bias token probabilities
Map<String, Integer> biases = new HashMap<>();
biases.put("12345", 10); // Increase probability of token 12345
biases.put("67890", -10); // Decrease probability of token 67890
WatsonxChatRequestParameters params = WatsonxChatRequestParameters.builder()
.logitBias(biases)
.build();
// Useful for:
// - Steering output vocabulary
// - Avoiding specific words/phrases
// - Emphasizing domain-specific terms

import dev.langchain4j.model.chat.request.ToolChoice;
import dev.langchain4j.agent.tool.ToolSpecification;
// Force tool usage
WatsonxChatRequestParameters params = WatsonxChatRequestParameters.builder()
.toolChoice(ToolChoice.REQUIRED)
.build();
// Force specific tool
WatsonxChatRequestParameters specificTool = WatsonxChatRequestParameters.builder()
.toolChoiceName("get_weather")
.build();
// Add tools for this request only
// Requires: import dev.langchain4j.model.chat.request.json.JsonObjectSchema;
ToolSpecification weatherTool = ToolSpecification.builder()
.name("get_weather")
.description("Get current weather")
.parameters(JsonObjectSchema.builder()
.addStringProperty("city", "City name")
.build())
.build();
WatsonxChatRequestParameters withTools = WatsonxChatRequestParameters.builder()
.toolSpecifications(List.of(weatherTool))
.toolChoice(ToolChoice.AUTO)
.build();
ChatRequest request = ChatRequest.builder()
.messages(List.of(UserMessage.from("What's the weather in Paris?")))
.parameters(withTools)
.build();

import dev.langchain4j.model.chat.request.ResponseFormat;
import dev.langchain4j.model.chat.request.json.JsonSchema;
// JSON object mode
WatsonxChatRequestParameters jsonParams = WatsonxChatRequestParameters.builder()
.responseFormat(ResponseFormat.JSON)
.build();
// JSON schema mode
String schemaJson = """
{
"type": "object",
"properties": {
"name": {"type": "string"},
"age": {"type": "integer"}
},
"required": ["name", "age"]
}
""";
JsonSchema schema = JsonSchema.builder()
.name("person")
.schema(schemaJson)
.build();
WatsonxChatRequestParameters schemaParams = WatsonxChatRequestParameters.builder()
.responseFormat(ResponseFormat.jsonSchema(schema))
.build();

import java.time.Duration;
// Set time limit for request
WatsonxChatRequestParameters params = WatsonxChatRequestParameters.builder()
.timeLimit(Duration.ofSeconds(30))
.build();
// Request will fail if takes longer than 30 seconds
// Useful for:
// - Latency-sensitive applications
// - Preventing long-running requests
// - Resource management

// Generate multiple independent responses
WatsonxChatRequestParameters params = WatsonxChatRequestParameters.builder()
.n(3) // Generate 3 completions
.temperature(1.0)
.build();
ChatRequest request = ChatRequest.builder()
.messages(List.of(UserMessage.from("Tell me a joke")))
.parameters(params)
.build();
ChatResponse response = chatModel.chat(request);
// Response contains 3 different joke completions

// Deterministic generation with seed
WatsonxChatRequestParameters params = WatsonxChatRequestParameters.builder()
.seed(42)
.temperature(0.7) // Still uses sampling but deterministic
.build();
ChatRequest request = ChatRequest.builder()
.messages(List.of(UserMessage.from("Hello")))
.parameters(params)
.build();
ChatResponse response1 = chatModel.chat(request);
ChatResponse response2 = chatModel.chat(request);
// response1 and response2 should be identical (same seed, same input)

// Truncate long inputs
WatsonxGenerationRequestParameters params = WatsonxGenerationRequestParameters.builder()
.truncateInputTokens(2048) // Truncate from left if exceeds 2048 tokens
.maxOutputTokens(500)
.build();
// Useful for:
// - Handling variable-length inputs
// - Staying within context limits
// - Preventing token overflow errors

// Control stop sequences
WatsonxGenerationRequestParameters params = WatsonxGenerationRequestParameters.builder()
.stopSequences(List.of("\n\n", "END", "---"))
.includeStopSequence(false) // Exclude matched sequence from output
.build();
// With includeStopSequence=true
WatsonxGenerationRequestParameters includeParams = WatsonxGenerationRequestParameters.builder()
.stopSequences(List.of("END"))
.includeStopSequence(true) // Include "END" in output
.build();

// Default parameters
WatsonxChatRequestParameters defaults = WatsonxChatRequestParameters.builder()
.temperature(0.7)
.maxOutputTokens(1000)
.frequencyPenalty(0.5)
.build();
// Override specific parameters
WatsonxChatRequestParameters overrides = WatsonxChatRequestParameters.builder()
.temperature(1.2) // Override temperature
.seed(42) // Add seed
.build();
// Merge parameters
ChatRequestParameters merged = defaults.overrideWith(overrides);
// Result: temperature=1.2, maxOutputTokens=1000, frequencyPenalty=0.5, seed=42

// Validate parameter ranges
public WatsonxChatRequestParameters buildValidParams(double temperature) {
if (temperature < 0.0 || temperature > 2.0) {
throw new IllegalArgumentException("Temperature must be between 0 and 2");
}
return WatsonxChatRequestParameters.builder()
.temperature(temperature)
.build();
}

// Model-level: Default for all requests
WatsonxChatModel model = WatsonxChatModel.builder()
.temperature(0.7) // Default temperature
.maxTokens(1000) // Default max tokens
.build();
// Request-level: Override for specific request
WatsonxChatRequestParameters params = WatsonxChatRequestParameters.builder()
.temperature(1.5) // Override for this request only
.build();
ChatRequest request = ChatRequest.builder()
.messages(messages)
.parameters(params)
.build();
// This request uses temperature=1.5, maxTokens=1000

public class ParameterPresets {
// Creative generation
public static WatsonxChatRequestParameters creative() {
return WatsonxChatRequestParameters.builder()
.temperature(1.5)
.topP(0.95)
.frequencyPenalty(0.3)
.presencePenalty(0.5)
.maxOutputTokens(2000)
.build();
}
// Factual generation
public static WatsonxChatRequestParameters factual() {
return WatsonxChatRequestParameters.builder()
.temperature(0.1)
.topP(0.9)
.maxOutputTokens(500)
.build();
}
// Balanced generation
public static WatsonxChatRequestParameters balanced() {
return WatsonxChatRequestParameters.builder()
.temperature(0.7)
.topP(0.9)
.frequencyPenalty(0.5)
.maxOutputTokens(1000)
.build();
}
}
// Usage
ChatRequest request = ChatRequest.builder()
.messages(messages)
.parameters(ParameterPresets.creative())
.build();

public class DynamicParameterSelector {
public WatsonxChatRequestParameters selectParameters(String taskType) {
return switch (taskType) {
case "creative_writing" -> WatsonxChatRequestParameters.builder()
.temperature(1.5)
.topP(0.95)
.maxOutputTokens(2000)
.build();
case "code_generation" -> WatsonxChatRequestParameters.builder()
.temperature(0.2)
.topP(0.9)
.maxOutputTokens(1500)
.build();
case "summarization" -> WatsonxChatRequestParameters.builder()
.temperature(0.3)
.maxOutputTokens(500)
.frequencyPenalty(0.3)
.build();
case "conversation" -> WatsonxChatRequestParameters.builder()
.temperature(0.8)
.topP(0.9)
.presencePenalty(0.6)
.maxOutputTokens(1000)
.build();
default -> WatsonxChatRequestParameters.builder()
.temperature(0.7)
.maxOutputTokens(1000)
.build();
};
}
}

From LangChain4j:
public class DefaultChatRequestParameters implements ChatRequestParameters {
public String modelName();
public Integer maxOutputTokens();
public Double temperature();
public Double topP();
public Integer topK();
public Double frequencyPenalty();
public Double presencePenalty();
public List<String> stopSequences();
public ToolChoice toolChoice();
public List<ToolSpecification> toolSpecifications();
public ResponseFormat responseFormat();
}

From LangChain4j:
public enum ToolChoice {
AUTO, // Model decides whether to use tools
REQUIRED // Model must use at least one tool
}

From LangChain4j:
public class ResponseFormat {
public static ResponseFormat TEXT = new ResponseFormat("text");
public static ResponseFormat JSON = new ResponseFormat("json_object");
public static ResponseFormat jsonSchema(JsonSchema schema);
public String type();
public JsonSchema jsonSchema();
}

From LangChain4j:
public class JsonSchema {
public static Builder builder();
public String name();
public String schema();
public Boolean strict();
public static class Builder {
public Builder name(String name);
public Builder schema(String schema);
public Builder strict(Boolean strict);
public JsonSchema build();
}
}

Install with Tessl CLI
npx tessl i tessl/maven-io-quarkiverse-langchain4j--quarkus-langchain4j-watsonx@1.7.0