tessl/maven-io-quarkiverse-langchain4j--quarkus-langchain4j-watsonx

Quarkus extension for integrating IBM watsonx.ai foundation models with LangChain4j. Provides chat models, generation models, streaming models, embedding models, and scoring models for IBM watsonx.ai. Includes comprehensive configuration options, support for tool/function calling, text extraction from documents in Cloud Object Storage, and experimental built-in services for Google search, weather, and web crawling. Designed for enterprise Java applications using the Quarkus framework with built-in dependency injection and native compilation support.


docs/request-parameters.md

Request Parameters

Override default model parameters on a per-request basis with Watsonx-specific parameter classes. These classes extend LangChain4j's DefaultChatRequestParameters and provide additional Watsonx-specific options for fine-grained control over model behavior.

Capabilities

Chat Request Parameters

Watsonx-specific request parameters for chat models with tool support.

public class WatsonxChatRequestParameters extends dev.langchain4j.model.chat.request.DefaultChatRequestParameters {
    public static Builder builder();

    // Watsonx-specific methods
    public Map<String, Integer> logitBias();
    public Boolean logprobs();
    public Integer topLogprobs();
    public Integer n();
    public Integer seed();
    public String toolChoiceName();
    public Duration timeLimit();

    // Inherited from DefaultChatRequestParameters
    public String modelName();
    public Integer maxOutputTokens();
    public Double temperature();
    public Double topP();
    public Integer topK();
    public Double frequencyPenalty();
    public Double presencePenalty();
    public List<String> stopSequences();
    public ToolChoice toolChoice();
    public List<ToolSpecification> toolSpecifications();
    public ResponseFormat responseFormat();

    // Merge with other parameters
    public ChatRequestParameters overrideWith(ChatRequestParameters that);

    public static class Builder {
        // Inherited parameters
        public Builder modelName(String modelName);
        public Builder maxOutputTokens(Integer maxOutputTokens);
        public Builder temperature(Double temperature);
        public Builder topP(Double topP);
        public Builder topK(Integer topK);
        public Builder frequencyPenalty(Double frequencyPenalty);
        public Builder presencePenalty(Double presencePenalty);
        public Builder stopSequences(List<String> stopSequences);
        public Builder toolChoice(ToolChoice toolChoice);
        public Builder toolSpecifications(List<ToolSpecification> toolSpecifications);
        public Builder responseFormat(ResponseFormat responseFormat);

        // Watsonx-specific parameters
        public Builder logitBias(Map<String, Integer> logitBias);
        public Builder logprobs(Boolean logprobs);
        public Builder topLogprobs(Integer topLogprobs);
        public Builder n(Integer n);
        public Builder seed(Integer seed);
        public Builder toolChoiceName(String toolChoiceName);
        public Builder timeLimit(Duration timeLimit);

        public WatsonxChatRequestParameters build();
    }
}

Parameter Details:

  • modelName (String): Override model identifier for this request

    • Allows using different model than default configured model
  • maxOutputTokens (Integer): Maximum tokens to generate

    • Overrides the model's default maxTokens setting
    • Some models count prompt tokens against this limit
  • temperature (Double): Sampling temperature

    • Range: 0.0 to 2.0
    • Lower = more focused, Higher = more creative
  • topP (Double): Nucleus sampling parameter

    • Range: 0.0 to 1.0
    • Limits sampling to tokens in top P probability mass
  • topK (Integer): Top-K sampling parameter

    • Not used in Watsonx chat models
    • Provided for LangChain4j compatibility
  • frequencyPenalty (Double): Penalize frequent tokens

    • Range: -2.0 to 2.0
    • Positive values reduce repetition
  • presencePenalty (Double): Penalize tokens that have appeared

    • Range: -2.0 to 2.0
    • Positive values encourage topic diversity
  • stopSequences (List<String>): Stop sequences (max 4)

    • Generation stops when any sequence is encountered
  • toolChoice (ToolChoice): Tool selection strategy

    • ToolChoice.AUTO: Model decides whether to use tools
    • ToolChoice.REQUIRED: Model must use at least one tool
  • toolSpecifications (List<ToolSpecification>): Available tools for this request

    • Overrides or extends default tools
  • responseFormat (ResponseFormat): Structured output format

    • ResponseFormat.TEXT: Plain text
    • ResponseFormat.JSON: JSON object
    • ResponseFormat.jsonSchema(schema): JSON with schema validation
  • logitBias (Map<String, Integer>): Token bias adjustments

    • Map of token ID to bias value
    • Bias range: typically -100 to 100
    • Positive bias increases token probability
    • Negative bias decreases token probability
  • logprobs (Boolean): Return log probabilities

    • If true, response includes log probabilities for tokens
    • Useful for analyzing model confidence
  • topLogprobs (Integer): Number of top log probabilities

    • Range: 0 to 20
    • Requires logprobs=true
    • Returns top N most probable tokens at each position
  • n (Integer): Number of completions to generate

    • Default: 1
    • Generates multiple independent responses
  • seed (Integer): Random seed for reproducibility

    • Makes generations deterministic
    • Same seed + parameters = same output
  • toolChoiceName (String): Specific tool name to call

    • Forces model to use the named tool
    • Overrides toolChoice setting
  • timeLimit (Duration): Maximum time for the request

    • The request fails if it exceeds the time limit
    • Useful for controlling latency

Generation Request Parameters

Watsonx-specific request parameters for legacy generation models.

public class WatsonxGenerationRequestParameters extends dev.langchain4j.model.chat.request.DefaultChatRequestParameters {
    public static Builder builder();

    // Generation-specific methods
    public String decodingMethod();
    public LengthPenalty lengthPenalty();
    public Integer minNewTokens();
    public Integer randomSeed();
    public Duration timeLimit();
    public Double repetitionPenalty();
    public Integer truncateInputTokens();
    public Boolean includeStopSequence();

    // Inherited from DefaultChatRequestParameters
    public String modelName();
    public Integer maxOutputTokens();
    public Double temperature();
    public Double topP();
    public Integer topK();
    public List<String> stopSequences();

    // Merge with other parameters
    public ChatRequestParameters overrideWith(ChatRequestParameters that);

    public static class Builder {
        // Inherited parameters
        public Builder modelName(String modelName);
        public Builder maxOutputTokens(Integer maxOutputTokens);
        public Builder temperature(Double temperature);
        public Builder topP(Double topP);
        public Builder topK(Integer topK);
        public Builder stopSequences(List<String> stopSequences);

        // Generation-specific parameters
        public Builder decodingMethod(String decodingMethod);
        public Builder lengthPenalty(LengthPenalty lengthPenalty);
        public Builder minNewTokens(Integer minNewTokens);
        public Builder randomSeed(Integer randomSeed);
        public Builder timeLimit(Duration timeLimit);
        public Builder repetitionPenalty(Double repetitionPenalty);
        public Builder truncateInputTokens(Integer truncateInputTokens);
        public Builder includeStopSequence(Boolean includeStopSequence);

        public WatsonxGenerationRequestParameters build();
    }
}

Parameter Details:

  • modelName (String): Override model identifier for this request

  • maxOutputTokens (Integer): Maximum new tokens to generate

    • Maps to maxNewTokens in generation API
    • Does not include prompt tokens
  • temperature (Double): Sampling temperature

    • Range: 0.0 to 2.0
    • Only applies when decodingMethod="sample"
  • topP (Double): Nucleus sampling parameter

    • Range: 0.0 to 1.0
    • Only applies when decodingMethod="sample"
  • topK (Integer): Top-K sampling parameter

    • Range: 1 to 100
    • Only applies when decodingMethod="sample"
  • stopSequences (List<String>): Stop sequences (max 6)

    • Generation stops when any sequence is encountered
  • decodingMethod (String): Decoding strategy

    • Values: "greedy", "sample"
    • "greedy": Always select highest probability token
    • "sample": Sample from probability distribution
  • lengthPenalty (LengthPenalty): Length penalty configuration

    • Record with decayFactor (>1) and startIndex (>=0)
    • Penalizes longer sequences
  • minNewTokens (Integer): Minimum tokens to generate

    • Forces model to generate at least this many tokens
  • randomSeed (Integer): Random seed for reproducibility

    • Range: >= 1
    • Makes sampling deterministic
  • timeLimit (Duration): Maximum time for the request

    • The request fails if it exceeds the time limit
  • repetitionPenalty (Double): Penalty for repeated tokens

    • Range: 1.0 to 2.0
    • Higher values discourage repetition
  • truncateInputTokens (Integer): Truncate input if it exceeds a limit

    • Truncates from the left if the input exceeds this many tokens
  • includeStopSequence (Boolean): Include stop sequence in output

    • If true, matched stop sequence is included in generated text
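The difference between the two decoding methods can be illustrated with a plain-Java sketch (the probability distribution here is illustrative; the real decoding happens server-side):

```java
import java.util.Random;

public class DecodingDemo {
    // "greedy": always pick the index with the highest probability
    static int greedy(double[] probs) {
        int best = 0;
        for (int i = 1; i < probs.length; i++) {
            if (probs[i] > probs[best]) best = i;
        }
        return best;
    }

    // "sample": draw an index proportionally to its probability
    static int sample(double[] probs, Random rng) {
        double r = rng.nextDouble(), cum = 0.0;
        for (int i = 0; i < probs.length; i++) {
            cum += probs[i];
            if (r < cum) return i;
        }
        return probs.length - 1;
    }

    public static void main(String[] args) {
        double[] probs = {0.1, 0.6, 0.3};
        System.out.println(greedy(probs));                // always index 1
        System.out.println(sample(probs, new Random(7))); // depends on the seed
    }
}
```

Greedy decoding is deterministic; sampling varies per draw unless a randomSeed is fixed.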

Length Penalty

Configuration for length-based penalties in generation models.

public record LengthPenalty(Double decayFactor, Integer startIndex) {
    // decayFactor: > 1.0, controls penalty strength
    // startIndex: >= 0, token position where penalty begins
}

Usage:

// Create length penalty
LengthPenalty penalty = new LengthPenalty(1.5, 10);

// Use in parameters
WatsonxGenerationRequestParameters params = WatsonxGenerationRequestParameters.builder()
    .lengthPenalty(penalty)
    .build();

// Penalty calculation:
// For token position >= startIndex:
//   penalty = decayFactor ^ (position - startIndex)
//   token_score = token_score / penalty
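The decay formula above works out numerically as follows (a standalone sketch; the actual scoring is applied server-side):

```java
public class LengthPenaltyMath {
    // penalty = decayFactor ^ (position - startIndex) for position >= startIndex
    static double penaltyAt(double decayFactor, int startIndex, int position) {
        if (position < startIndex) return 1.0; // no penalty before startIndex
        return Math.pow(decayFactor, position - startIndex);
    }

    public static void main(String[] args) {
        // With decayFactor=1.5, startIndex=10:
        System.out.println(penaltyAt(1.5, 10, 10)); // 1.0  (penalty starts at 1)
        System.out.println(penaltyAt(1.5, 10, 12)); // 2.25 (1.5^2)
    }
}
```

The penalty grows exponentially with position, so even a decayFactor slightly above 1 ends generation fairly quickly once startIndex is passed.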

Usage Examples

Chat Model Parameter Override

import io.quarkiverse.langchain4j.watsonx.WatsonxChatRequestParameters;
import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.model.chat.response.ChatResponse;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import java.util.List;

@ApplicationScoped
public class ChatService {
    @Inject
    ChatModel chatModel;

    public String generateCreative(String prompt) {
        // Override default parameters for creative generation
        WatsonxChatRequestParameters params = WatsonxChatRequestParameters.builder()
            .temperature(1.5)
            .topP(0.95)
            .frequencyPenalty(0.3)
            .presencePenalty(0.5)
            .maxOutputTokens(1000)
            .build();

        ChatRequest request = ChatRequest.builder()
            .messages(List.of(UserMessage.from(prompt)))
            .parameters(params)
            .build();

        ChatResponse response = chatModel.chat(request);
        return response.aiMessage().text();
    }

    public String generateFactual(String query) {
        // Override for factual, deterministic generation
        WatsonxChatRequestParameters params = WatsonxChatRequestParameters.builder()
            .temperature(0.1)
            .topP(0.9)
            .seed(42)  // Reproducible
            .maxOutputTokens(500)
            .build();

        ChatRequest request = ChatRequest.builder()
            .messages(List.of(UserMessage.from(query)))
            .parameters(params)
            .build();

        ChatResponse response = chatModel.chat(request);
        return response.aiMessage().text();
    }
}

Generation Model Parameter Override

import io.quarkiverse.langchain4j.watsonx.WatsonxGenerationRequestParameters;
import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.data.message.UserMessage;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import java.util.List;

@ApplicationScoped
public class GenerationService {
    @Inject
    ChatModel generationModel;  // WatsonxGenerationModel

    public String generateLong(String prompt) {
        // Force longer responses
        LengthPenalty penalty = new LengthPenalty(1.2, 100);

        WatsonxGenerationRequestParameters params = WatsonxGenerationRequestParameters.builder()
            .decodingMethod("sample")
            .temperature(0.7)
            .topK(50)
            .topP(0.9)
            .minNewTokens(200)
            .maxOutputTokens(1000)
            .lengthPenalty(penalty)
            .build();

        ChatRequest request = ChatRequest.builder()
            .messages(List.of(UserMessage.from(prompt)))
            .parameters(params)
            .build();

        return generationModel.chat(request).aiMessage().text();
    }

    public String generateShort(String prompt) {
        // Force shorter responses
        LengthPenalty penalty = new LengthPenalty(2.0, 20);

        WatsonxGenerationRequestParameters params = WatsonxGenerationRequestParameters.builder()
            .decodingMethod("greedy")  // Deterministic
            .maxOutputTokens(100)
            .lengthPenalty(penalty)
            .stopSequences(List.of("\n\n", "END"))
            .build();

        ChatRequest request = ChatRequest.builder()
            .messages(List.of(UserMessage.from(prompt)))
            .parameters(params)
            .build();

        return generationModel.chat(request).aiMessage().text();
    }
}

Log Probabilities

// Request log probabilities
WatsonxChatRequestParameters params = WatsonxChatRequestParameters.builder()
    .logprobs(true)
    .topLogprobs(5)  // Get top 5 alternatives at each position
    .build();

ChatRequest request = ChatRequest.builder()
    .messages(List.of(UserMessage.from("What is AI?")))
    .parameters(params)
    .build();

ChatResponse response = chatModel.chat(request);

// Access log probabilities from response metadata
// (Implementation depends on response structure)

Token Bias

// Bias token probabilities
Map<String, Integer> biases = new HashMap<>();
biases.put("12345", 10);   // Increase probability of token 12345
biases.put("67890", -10);  // Decrease probability of token 67890

WatsonxChatRequestParameters params = WatsonxChatRequestParameters.builder()
    .logitBias(biases)
    .build();

// Useful for:
// - Steering output vocabulary
// - Avoiding specific words/phrases
// - Emphasizing domain-specific terms
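Because bias values outside roughly -100 to 100 are rejected or saturated by most model APIs, it can be worth clamping values defensively before building the map (the helper name is illustrative, not part of the extension):

```java
import java.util.HashMap;
import java.util.Map;

public class LogitBiasUtil {
    // Clamp every bias value into [-100, 100] before sending it to the API
    static Map<String, Integer> clampBiases(Map<String, Integer> raw) {
        Map<String, Integer> clamped = new HashMap<>();
        raw.forEach((token, bias) ->
            clamped.put(token, Math.max(-100, Math.min(100, bias))));
        return clamped;
    }

    public static void main(String[] args) {
        Map<String, Integer> raw = Map.of("12345", 150, "67890", -10);
        // 150 is clamped to 100; -10 passes through unchanged
        System.out.println(clampBiases(raw));
    }
}
```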

Tool Choice Override

import dev.langchain4j.model.chat.request.ToolChoice;
import dev.langchain4j.agent.tool.ToolSpecification;
import dev.langchain4j.model.chat.request.json.JsonObjectSchema;

// Force tool usage
WatsonxChatRequestParameters params = WatsonxChatRequestParameters.builder()
    .toolChoice(ToolChoice.REQUIRED)
    .build();

// Force specific tool
WatsonxChatRequestParameters specificTool = WatsonxChatRequestParameters.builder()
    .toolChoiceName("get_weather")
    .build();

// Add tools for this request only
ToolSpecification weatherTool = ToolSpecification.builder()
    .name("get_weather")
    .description("Get current weather")
    .parameters(JsonObjectSchema.builder()
        .addStringProperty("city", "City name")
        .required("city")
        .build())
    .build();

WatsonxChatRequestParameters withTools = WatsonxChatRequestParameters.builder()
    .toolSpecifications(List.of(weatherTool))
    .toolChoice(ToolChoice.AUTO)
    .build();

ChatRequest request = ChatRequest.builder()
    .messages(List.of(UserMessage.from("What's the weather in Paris?")))
    .parameters(withTools)
    .build();

Response Format Override

import dev.langchain4j.model.chat.request.ResponseFormat;
import dev.langchain4j.model.chat.request.json.JsonSchema;

// JSON object mode
WatsonxChatRequestParameters jsonParams = WatsonxChatRequestParameters.builder()
    .responseFormat(ResponseFormat.JSON)
    .build();

// JSON schema mode
String schemaJson = """
{
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "age": {"type": "integer"}
  },
  "required": ["name", "age"]
}
""";

JsonSchema schema = JsonSchema.builder()
    .name("person")
    .schema(schemaJson)
    .build();

WatsonxChatRequestParameters schemaParams = WatsonxChatRequestParameters.builder()
    .responseFormat(ResponseFormat.jsonSchema(schema))
    .build();

Time Limits

import java.time.Duration;

// Set time limit for request
WatsonxChatRequestParameters params = WatsonxChatRequestParameters.builder()
    .timeLimit(Duration.ofSeconds(30))
    .build();

// The request will fail if it takes longer than 30 seconds
// Useful for:
// - Latency-sensitive applications
// - Preventing long-running requests
// - Resource management

Multiple Completions

// Generate multiple independent responses
WatsonxChatRequestParameters params = WatsonxChatRequestParameters.builder()
    .n(3)  // Generate 3 completions
    .temperature(1.0)
    .build();

ChatRequest request = ChatRequest.builder()
    .messages(List.of(UserMessage.from("Tell me a joke")))
    .parameters(params)
    .build();

ChatResponse response = chatModel.chat(request);
// Response contains 3 different joke completions

Reproducible Generation

// Deterministic generation with seed
WatsonxChatRequestParameters params = WatsonxChatRequestParameters.builder()
    .seed(42)
    .temperature(0.7)  // Still uses sampling but deterministic
    .build();

ChatRequest request = ChatRequest.builder()
    .messages(List.of(UserMessage.from("Hello")))
    .parameters(params)
    .build();

ChatResponse response1 = chatModel.chat(request);
ChatResponse response2 = chatModel.chat(request);
// response1 and response2 should be identical; note that seed-based
// determinism is best-effort on most hosted model services
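The guarantee is analogous to seeded pseudo-random sampling: the same seed yields the same sequence of draws. A plain-Java illustration of that principle:

```java
import java.util.Random;

public class SeedDemo {
    public static void main(String[] args) {
        Random a = new Random(42);
        Random b = new Random(42);
        // Identical seeds produce identical sequences of "sampling" decisions
        for (int i = 0; i < 5; i++) {
            if (a.nextDouble() != b.nextDouble()) {
                throw new AssertionError("sequences diverged");
            }
        }
        System.out.println("same seed, same sequence");
    }
}
```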

Input Truncation (Generation Models)

// Truncate long inputs
WatsonxGenerationRequestParameters params = WatsonxGenerationRequestParameters.builder()
    .truncateInputTokens(2048)  // Truncate from left if exceeds 2048 tokens
    .maxOutputTokens(500)
    .build();

// Useful for:
// - Handling variable-length inputs
// - Staying within context limits
// - Preventing token overflow errors
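Truncation keeps the most recent tokens and drops from the left; a simplified sketch over a token list (the service truncates using its own tokenizer, not whitespace-split words):

```java
import java.util.List;

public class TruncateDemo {
    // Keep at most maxTokens tokens, dropping the oldest (leftmost) first
    static List<String> truncateLeft(List<String> tokens, int maxTokens) {
        if (tokens.size() <= maxTokens) return tokens;
        return tokens.subList(tokens.size() - maxTokens, tokens.size());
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("a", "b", "c", "d", "e");
        System.out.println(truncateLeft(tokens, 3)); // [c, d, e]
    }
}
```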

Stop Sequence Control (Generation Models)

// Control stop sequences
WatsonxGenerationRequestParameters params = WatsonxGenerationRequestParameters.builder()
    .stopSequences(List.of("\n\n", "END", "---"))
    .includeStopSequence(false)  // Exclude matched sequence from output
    .build();

// With includeStopSequence=true
WatsonxGenerationRequestParameters includeParams = WatsonxGenerationRequestParameters.builder()
    .stopSequences(List.of("END"))
    .includeStopSequence(true)  // Include "END" in output
    .build();
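The effect of includeStopSequence can be mimicked with simple string handling (a local sketch; the real matching happens server-side on tokens):

```java
import java.util.List;

public class StopSequenceDemo {
    // Cut the text at the earliest stop sequence; optionally keep the match
    static String applyStops(String text, List<String> stops, boolean includeStop) {
        int cutAt = text.length();
        String matched = "";
        for (String stop : stops) {
            int idx = text.indexOf(stop);
            if (idx >= 0 && idx < cutAt) {
                cutAt = idx;
                matched = stop;
            }
        }
        return includeStop ? text.substring(0, cutAt) + matched
                           : text.substring(0, cutAt);
    }

    public static void main(String[] args) {
        String text = "The answer is 42END and more text";
        System.out.println(applyStops(text, List.of("END"), false)); // "The answer is 42"
        System.out.println(applyStops(text, List.of("END"), true));  // "The answer is 42END"
    }
}
```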

Parameter Merging

Override with Another Parameters Object

// Default parameters
WatsonxChatRequestParameters defaults = WatsonxChatRequestParameters.builder()
    .temperature(0.7)
    .maxOutputTokens(1000)
    .frequencyPenalty(0.5)
    .build();

// Override specific parameters
WatsonxChatRequestParameters overrides = WatsonxChatRequestParameters.builder()
    .temperature(1.2)  // Override temperature
    .seed(42)           // Add seed
    .build();

// Merge parameters
ChatRequestParameters merged = defaults.overrideWith(overrides);
// Result: temperature=1.2, maxOutputTokens=1000, frequencyPenalty=0.5, seed=42
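The merge follows last-non-null-wins semantics per field; a simplified sketch of that rule with a hypothetical two-field record (not the extension's real types):

```java
public class MergeDemo {
    // Hypothetical, simplified stand-in for a request-parameters object
    record Params(Double temperature, Integer maxTokens) {
        // Fields set on `that` win; null fields fall back to `this`
        Params overrideWith(Params that) {
            return new Params(
                that.temperature != null ? that.temperature : this.temperature,
                that.maxTokens != null ? that.maxTokens : this.maxTokens);
        }
    }

    public static void main(String[] args) {
        Params defaults = new Params(0.7, 1000);
        Params overrides = new Params(1.2, null);
        Params merged = defaults.overrideWith(overrides);
        System.out.println(merged); // Params[temperature=1.2, maxTokens=1000]
    }
}
```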

Best Practices

Parameter Validation

// Validate parameter ranges
public WatsonxChatRequestParameters buildValidParams(double temperature) {
    if (temperature < 0.0 || temperature > 2.0) {
        throw new IllegalArgumentException("Temperature must be between 0 and 2");
    }

    return WatsonxChatRequestParameters.builder()
        .temperature(temperature)
        .build();
}

Request-Level vs Model-Level Configuration

// Model-level: Default for all requests
WatsonxChatModel model = WatsonxChatModel.builder()
    .temperature(0.7)  // Default temperature
    .maxTokens(1000)   // Default max tokens
    .build();

// Request-level: Override for specific request
WatsonxChatRequestParameters params = WatsonxChatRequestParameters.builder()
    .temperature(1.5)  // Override for this request only
    .build();

ChatRequest request = ChatRequest.builder()
    .messages(messages)
    .parameters(params)
    .build();

// This request uses temperature=1.5, maxTokens=1000

Parameter Presets

public class ParameterPresets {
    // Creative generation
    public static WatsonxChatRequestParameters creative() {
        return WatsonxChatRequestParameters.builder()
            .temperature(1.5)
            .topP(0.95)
            .frequencyPenalty(0.3)
            .presencePenalty(0.5)
            .maxOutputTokens(2000)
            .build();
    }

    // Factual generation
    public static WatsonxChatRequestParameters factual() {
        return WatsonxChatRequestParameters.builder()
            .temperature(0.1)
            .topP(0.9)
            .maxOutputTokens(500)
            .build();
    }

    // Balanced generation
    public static WatsonxChatRequestParameters balanced() {
        return WatsonxChatRequestParameters.builder()
            .temperature(0.7)
            .topP(0.9)
            .frequencyPenalty(0.5)
            .maxOutputTokens(1000)
            .build();
    }
}

// Usage
ChatRequest request = ChatRequest.builder()
    .messages(messages)
    .parameters(ParameterPresets.creative())
    .build();

Dynamic Parameter Selection

public class DynamicParameterSelector {
    public WatsonxChatRequestParameters selectParameters(String taskType) {
        return switch (taskType) {
            case "creative_writing" -> WatsonxChatRequestParameters.builder()
                .temperature(1.5)
                .topP(0.95)
                .maxOutputTokens(2000)
                .build();

            case "code_generation" -> WatsonxChatRequestParameters.builder()
                .temperature(0.2)
                .topP(0.9)
                .maxOutputTokens(1500)
                .build();

            case "summarization" -> WatsonxChatRequestParameters.builder()
                .temperature(0.3)
                .maxOutputTokens(500)
                .frequencyPenalty(0.3)
                .build();

            case "conversation" -> WatsonxChatRequestParameters.builder()
                .temperature(0.8)
                .topP(0.9)
                .presencePenalty(0.6)
                .maxOutputTokens(1000)
                .build();

            default -> WatsonxChatRequestParameters.builder()
                .temperature(0.7)
                .maxOutputTokens(1000)
                .build();
        };
    }
}

Type Reference

Default Chat Request Parameters

From LangChain4j:

public class DefaultChatRequestParameters implements ChatRequestParameters {
    public String modelName();
    public Integer maxOutputTokens();
    public Double temperature();
    public Double topP();
    public Integer topK();
    public Double frequencyPenalty();
    public Double presencePenalty();
    public List<String> stopSequences();
    public ToolChoice toolChoice();
    public List<ToolSpecification> toolSpecifications();
    public ResponseFormat responseFormat();
}

Tool Choice

From LangChain4j:

public enum ToolChoice {
    AUTO,      // Model decides whether to use tools
    REQUIRED   // Model must use at least one tool
}

Response Format

From LangChain4j:

public class ResponseFormat {
    public static final ResponseFormat TEXT = new ResponseFormat("text");
    public static final ResponseFormat JSON = new ResponseFormat("json_object");

    public static ResponseFormat jsonSchema(JsonSchema schema);

    public String type();
    public JsonSchema jsonSchema();
}

JSON Schema

From LangChain4j:

public class JsonSchema {
    public static Builder builder();

    public String name();
    public String schema();
    public Boolean strict();

    public static class Builder {
        public Builder name(String name);
        public Builder schema(String schema);
        public Builder strict(Boolean strict);
        public JsonSchema build();
    }
}

Install with Tessl CLI

npx tessl i tessl/maven-io-quarkiverse-langchain4j--quarkus-langchain4j-watsonx@1.7.0
