LangChain4j integration for Azure OpenAI providing chat, streaming, embeddings, image generation, audio transcription, and token counting capabilities
Common configuration patterns, authentication methods, and builder options shared across all Azure OpenAI models.
import com.azure.core.credential.TokenCredential;
import com.azure.identity.DefaultAzureCredentialBuilder;
import com.azure.identity.ManagedIdentityCredentialBuilder;
import com.azure.identity.ClientSecretCredentialBuilder;
import com.azure.identity.AzureCliCredentialBuilder;
import com.azure.core.http.policy.RetryOptions;
import com.azure.core.http.policy.ExponentialBackoffOptions;
import com.azure.core.http.ProxyOptions;
import com.azure.core.http.HttpClientProvider;
import com.azure.core.http.HttpClient;
import com.azure.ai.openai.OpenAIClient;
import com.azure.ai.openai.OpenAIAsyncClient;
import com.azure.ai.openai.OpenAIClientBuilder;
import com.azure.core.credential.AzureKeyCredential;
import java.time.Duration;
import java.util.Map;
import java.util.List;
import java.util.UUID;
import java.net.InetSocketAddress;

All model builders support three authentication methods. Exactly one must be specified. Choose based on your deployment scenario and security requirements.
Standard authentication using an API key from your Azure OpenAI resource:
/**
* Configure API key authentication.
* @param apiKey 32-character hexadecimal key from Azure Portal
* @throws IllegalArgumentException if apiKey is null or empty
*/
model.builder()
.endpoint("https://your-resource.openai.azure.com/")
.apiKey("your-api-key-from-azure-portal")
.build();

Best for:
Security note: Never hardcode API keys in source code. Use environment variables or secure secret storage:
// Recommended: Load from environment
String apiKey = System.getenv("AZURE_OPENAI_API_KEY");
if (apiKey == null || apiKey.isEmpty()) {
throw new IllegalStateException("AZURE_OPENAI_API_KEY environment variable not set");
}
model.builder()
.endpoint(System.getenv("AZURE_OPENAI_ENDPOINT"))
.apiKey(apiKey)
.build();

Key format: Azure OpenAI API keys are 32-character hexadecimal strings. Validate before use:
// Validation pattern (optional but recommended)
if (!apiKey.matches("[0-9a-fA-F]{32}")) {
throw new IllegalArgumentException("Invalid Azure OpenAI API key format");
}

Authenticate with the non-Azure OpenAI service (api.openai.com):
/**
* Configure non-Azure OpenAI authentication.
* Automatically sets endpoint to https://api.openai.com/v1.
* Do NOT call endpoint() when using this method.
* @param apiKey OpenAI API key starting with "sk-"
* @throws IllegalArgumentException if apiKey is null or empty
*/
model.builder()
.nonAzureApiKey("your-openai-api-key")
.deploymentName("gpt-4") // Use OpenAI model name
.serviceVersion("2024-02-15-preview")
.build();

Note: When using nonAzureApiKey(), the endpoint is automatically set to https://api.openai.com/v1. Do not call endpoint().
Best for:
Key format: OpenAI API keys start with "sk-" prefix. Example validation:
if (!openAiKey.startsWith("sk-")) {
throw new IllegalArgumentException("OpenAI API keys must start with 'sk-'");
}

Authenticate using Azure Active Directory (Microsoft Entra ID):
import com.azure.identity.DefaultAzureCredentialBuilder;
import com.azure.core.credential.TokenCredential;
/**
* Configure Azure AD authentication.
* Provides zero-secret authentication using managed identities or service principals.
* @param credential TokenCredential implementation
* @throws IllegalArgumentException if credential is null
*/
TokenCredential credential = new DefaultAzureCredentialBuilder().build();
model.builder()
.endpoint("https://your-resource.openai.azure.com/")
.tokenCredential(credential)
.deploymentName("gpt-4")
.serviceVersion("2024-02-15-preview")
.build();Best for:
Common credential types:
// Default credential chain (tries multiple auth methods in order)
// Order: Environment -> Managed Identity -> Azure CLI -> IntelliJ -> Visual Studio Code
TokenCredential credential = new DefaultAzureCredentialBuilder()
.build();
// User-assigned managed identity (specify client ID)
TokenCredential credential = new ManagedIdentityCredentialBuilder()
.clientId("your-managed-identity-client-id")
.build();
// System-assigned managed identity (no client ID needed)
TokenCredential credential = new ManagedIdentityCredentialBuilder()
.build();
// Service principal with client secret
TokenCredential credential = new ClientSecretCredentialBuilder()
.tenantId("your-tenant-id")
.clientId("your-client-id")
.clientSecret("your-client-secret")
.build();
// Azure CLI credential (for local development)
// Uses credentials from `az login`
TokenCredential credential = new AzureCliCredentialBuilder()
.build();

Required Azure RBAC roles:
Assign roles using Azure Portal, CLI, or ARM templates:
# Assign role to managed identity
az role assignment create \
--role "Cognitive Services OpenAI User" \
--assignee <managed-identity-client-id> \
--scope /subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.CognitiveServices/accounts/<resource-name>

All models require these three configuration parameters. All are mandatory unless using nonAzureApiKey().
/**
* Mandatory configuration interface.
* @param <T> Builder type for fluent chaining
*/
interface MandatoryConfiguration<T> {
/**
* Sets the Azure OpenAI resource endpoint.
* Required: Yes (except when using nonAzureApiKey())
* Format: https://{resource-name}.openai.azure.com/
* @param endpoint Full endpoint URL with trailing slash optional
* @return Builder instance for chaining
* @throws IllegalArgumentException if endpoint is null, empty, or malformed
*/
T endpoint(String endpoint);
/**
* Sets the Azure OpenAI API version.
* Required: Yes
* Examples: "2024-02-15-preview", "2023-12-01-preview"
* Recommendation: Use latest preview for development, latest stable for production
* @param serviceVersion API version string
* @return Builder instance for chaining
* @throws IllegalArgumentException if serviceVersion is null or empty
*/
T serviceVersion(String serviceVersion);
/**
* Sets the name of the deployed model in Azure OpenAI.
* Required: Yes
* This is YOUR deployment name in Azure, not the base model name.
* Examples: "gpt-4-deployment", "my-gpt35-turbo", "dall-e-3-prod"
* @param deploymentName Your Azure deployment name
* @return Builder instance for chaining
* @throws IllegalArgumentException if deploymentName is null or empty
*/
T deploymentName(String deploymentName);
}

Example with validation:
String endpoint = System.getenv("AZURE_OPENAI_ENDPOINT");
String deployment = System.getenv("AZURE_OPENAI_DEPLOYMENT");
String version = "2024-02-15-preview";
// Validate mandatory parameters
if (endpoint == null || endpoint.isEmpty()) {
throw new IllegalStateException("AZURE_OPENAI_ENDPOINT not configured");
}
if (deployment == null || deployment.isEmpty()) {
throw new IllegalStateException("AZURE_OPENAI_DEPLOYMENT not configured");
}
model.builder()
.endpoint(endpoint)
.serviceVersion(version)
.deploymentName(deployment)
.apiKey(System.getenv("AZURE_OPENAI_API_KEY"))
.build();

Set request timeout duration to prevent indefinite hangs.
/**
* Sets the request timeout.
* @param timeout Duration, must be positive
* @return Builder instance
* @default 60 seconds (chat, embedding, language models)
* @default 120 seconds (streaming models, image models)
* @throws IllegalArgumentException if timeout is null or non-positive
*/
model.builder()
.timeout(Duration.ofSeconds(60)) // 60 second timeout
.build();

Default timeouts by model type:
Recommended timeouts:
Timeout behavior:
Throws TimeoutException when the configured duration is exceeded.

// Example: Different timeouts for different use cases
AzureOpenAiChatModel fastModel = AzureOpenAiChatModel.builder()
.timeout(Duration.ofSeconds(30)) // Short timeout for quick responses
.maxTokens(100) // Limit response length
.build();
AzureOpenAiChatModel slowModel = AzureOpenAiChatModel.builder()
.timeout(Duration.ofSeconds(180)) // Longer timeout for detailed responses
.maxTokens(4000) // Allow long responses
.build();

Configure retry behavior for failed requests with automatic exponential backoff.
Simple retry count:
/**
* Sets simple retry count with default exponential backoff.
* Mutually exclusive with retryOptions().
* @param maxRetries Number of retries, 0-10
* @return Builder instance
* @default 3 retries
* @throws IllegalArgumentException if maxRetries < 0 or > 10
*/
model.builder()
.maxRetries(3) // Retry up to 3 times
.build();

Advanced retry options:
import com.azure.core.http.policy.RetryOptions;
import com.azure.core.http.policy.ExponentialBackoffOptions;
/**
* Sets advanced retry options with custom backoff strategy.
* Mutually exclusive with maxRetries().
* @param retryOptions Azure SDK retry configuration
* @return Builder instance
* @default 3 retries with exponential backoff (1s base, 10s max)
*/
RetryOptions retryOptions = new RetryOptions(
new ExponentialBackoffOptions()
.setMaxRetries(3) // Maximum 3 retry attempts
.setBaseDelay(Duration.ofSeconds(1)) // Start with 1s delay
.setMaxDelay(Duration.ofSeconds(10)) // Maximum 10s delay
);
model.builder()
.retryOptions(retryOptions)
.build();

Default retry behavior:
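As an illustration of how the exponential backoff delays grow (assuming the delay doubles from the 1s base each attempt and is capped at the 10s maximum, per the defaults above; the real SDK policy also applies jitter, so actual delays vary):

```java
// Sketch of an exponential backoff schedule: base delay doubling per
// attempt, capped at a maximum. Illustrative only; the Azure SDK's
// actual policy also adds jitter.
final class BackoffSchedule {

    static long delayMillis(int attempt, long baseMillis, long maxMillis) {
        // attempt 0 -> base, attempt 1 -> 2 * base, attempt 2 -> 4 * base, ...
        long delay = baseMillis << Math.min(attempt, 30);
        return Math.min(delay, maxMillis);
    }

    public static void main(String[] args) {
        // Defaults from above: 1s base delay, 10s max delay, 3 retries
        for (int attempt = 0; attempt < 3; attempt++) {
            System.out.println("retry " + (attempt + 1) + " after "
                    + delayMillis(attempt, 1_000, 10_000) + "ms");
        }
    }
}
```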
Retry triggers (automatically retried):
Non-retried errors (fail immediately):
Example custom retry strategy:
// Aggressive retry for production
RetryOptions aggressive = new RetryOptions(
new ExponentialBackoffOptions()
.setMaxRetries(5) // More retries
.setBaseDelay(Duration.ofMillis(500)) // Faster initial retry
.setMaxDelay(Duration.ofSeconds(30)) // Longer max delay
);
// Conservative retry for development
RetryOptions conservative = new RetryOptions(
new ExponentialBackoffOptions()
.setMaxRetries(1) // Fail fast
.setBaseDelay(Duration.ofSeconds(2))
.setMaxDelay(Duration.ofSeconds(5))
);

Route requests through an HTTP proxy server.
Basic proxy:
import com.azure.core.http.ProxyOptions;
import java.net.InetSocketAddress;
/**
* Sets HTTP proxy configuration.
* @param proxyOptions Proxy settings including type and address
* @return Builder instance
* @default No proxy
*/
ProxyOptions proxyOptions = new ProxyOptions(
ProxyOptions.Type.HTTP, // or Type.SOCKS
new InetSocketAddress("proxy.example.com", 8080)
);
model.builder()
.proxyOptions(proxyOptions)
.build();

Proxy with authentication:
ProxyOptions proxyOptions = new ProxyOptions(
ProxyOptions.Type.HTTP,
new InetSocketAddress("proxy.example.com", 8080)
)
.setCredentials("proxy-username", "proxy-password") // Optional authentication
.setNonProxyHosts("localhost|127.0.0.1"); // Bypass proxy for these hosts
model.builder()
.proxyOptions(proxyOptions)
.build();

Proxy types:
ProxyOptions.Type.HTTP: HTTP/HTTPS proxy (most common)
ProxyOptions.Type.SOCKS: SOCKS4/SOCKS5 proxy

Common proxy scenarios:
// Corporate proxy with authentication
ProxyOptions corpProxy = new ProxyOptions(
ProxyOptions.Type.HTTP,
new InetSocketAddress("proxy.corp.example.com", 8080)
)
.setCredentials(System.getenv("PROXY_USER"), System.getenv("PROXY_PASS"))
.setNonProxyHosts("localhost|*.internal.corp");
// Development proxy (e.g., Fiddler, Charles)
ProxyOptions debugProxy = new ProxyOptions(
ProxyOptions.Type.HTTP,
new InetSocketAddress("localhost", 8888) // Fiddler default
);

Provide a custom HTTP client implementation for advanced scenarios.
import com.azure.core.http.HttpClientProvider;
import com.azure.core.http.HttpClient;
/**
* Sets custom HTTP client provider.
* Allows full control over HTTP client configuration.
* @param httpClientProvider Provider for creating HTTP client
* @return Builder instance
* @default Azure SDK default HTTP client (Netty or OkHttp)
*/
HttpClientProvider customProvider = new HttpClientProvider() {
@Override
public HttpClient createInstance() {
// Return a custom-configured HTTP client. The core HttpClient interface
// has no timeout setters, so use a concrete builder such as the Netty one
// (import com.azure.core.http.netty.NettyAsyncHttpClientBuilder)
return new NettyAsyncHttpClientBuilder()
.connectTimeout(Duration.ofSeconds(30))
.readTimeout(Duration.ofSeconds(60))
.writeTimeout(Duration.ofSeconds(30))
.build();
}
};
model.builder()
.httpClientProvider(customProvider)
.build();

Use cases for custom HTTP client:
Add custom HTTP headers to all requests for tracking, authentication, or API versioning.
/**
* Sets custom HTTP headers added to all requests.
* @param customHeaders Immutable map of header name to value
* @return Builder instance
* @default Empty map (no custom headers)
*/
Map<String, String> customHeaders = Map.of(
"X-Custom-Header", "custom-value",
"X-Request-ID", "unique-request-id",
"X-API-Version", "v1",
"X-Correlation-ID", UUID.randomUUID().toString()
);
model.builder()
.customHeaders(customHeaders)
.build();

Common use cases:
Important notes:
// Example: Production tracking headers
Map<String, String> trackingHeaders = Map.of(
"X-Application-Name", "MyApp",
"X-Application-Version", "1.2.3",
"X-Environment", "production",
"X-Request-ID", UUID.randomUUID().toString()
);

Append a custom suffix to the User-Agent header for application identification.
/**
* Sets custom User-Agent suffix for request identification.
* @param userAgentSuffix Suffix string appended to SDK user agent
* @return Builder instance
* @default None (only SDK user agent)
*/
model.builder()
.userAgentSuffix("MyApp/1.0.0")
.build();

Resulting User-Agent format:
azsdk-java-azure-ai-openai/{sdk-version} ({os-info}) MyApp/1.0.0

Best for:
Recommended format: AppName/Version
// Example: Version-aware user agent
String appVersion = "1.2.3";
String buildNumber = "456";
model.builder()
.userAgentSuffix(String.format("MyApp/%s (build %s)", appVersion, buildNumber))
.build();

For advanced scenarios, provide a pre-configured OpenAI client instance instead of using builder configuration.
import com.azure.ai.openai.OpenAIClient;
import com.azure.ai.openai.OpenAIClientBuilder;
import com.azure.core.credential.AzureKeyCredential;
import com.azure.core.http.policy.HttpLogOptions;
import com.azure.core.http.policy.HttpLogDetailLevel;
/**
* Creates model using a custom OpenAI client.
* Useful for sharing a single client across multiple models or advanced configuration.
* @param client Pre-configured OpenAIClient instance
* @return Builder instance
*/
OpenAIClient customClient = new OpenAIClientBuilder()
.endpoint("https://your-resource.openai.azure.com/")
.credential(new AzureKeyCredential("your-api-key"))
.httpLogOptions(new HttpLogOptions().setLogLevel(HttpLogDetailLevel.BODY_AND_HEADERS))
.buildClient();
AzureOpenAiChatModel model = AzureOpenAiChatModel.builder()
.openAIClient(customClient)
.deploymentName("gpt-4")
.serviceVersion("2024-02-15-preview")
.build();

import com.azure.ai.openai.OpenAIAsyncClient;
import com.azure.ai.openai.OpenAIClientBuilder;
/**
* Creates streaming model using a custom async OpenAI client.
* @param client Pre-configured OpenAIAsyncClient instance
* @return Builder instance
*/
OpenAIAsyncClient asyncClient = new OpenAIClientBuilder()
.endpoint("https://your-resource.openai.azure.com/")
.credential(new AzureKeyCredential("your-api-key"))
.buildAsyncClient();
AzureOpenAiStreamingChatModel model = AzureOpenAiStreamingChatModel.builder()
.openAIAsyncClient(asyncClient)
.deploymentName("gpt-4")
.serviceVersion("2024-02-15-preview")
.build();

When to use custom client:
Benefits:
// Example: Shared client for multiple models
OpenAIClient sharedClient = new OpenAIClientBuilder()
.endpoint(endpoint)
.credential(new AzureKeyCredential(apiKey))
.buildClient();
// Create multiple models sharing the same client
AzureOpenAiChatModel chatModel = AzureOpenAiChatModel.builder()
.openAIClient(sharedClient)
.deploymentName("gpt-4")
.build();
AzureOpenAiEmbeddingModel embeddingModel = AzureOpenAiEmbeddingModel.builder()
.openAIClient(sharedClient)
.deploymentName("text-embedding-ada-002")
.build();

Enable detailed logging of HTTP requests and responses for debugging and monitoring.
/**
* Enables detailed HTTP logging of requests and responses.
* WARNING: Logs full request/response body including prompts, completions, and API keys in headers.
* Only enable in secure environments with proper log protection.
* @param logRequestsAndResponses true to enable logging
* @return Builder instance
* @default false (logging disabled)
*/
model.builder()
.logRequestsAndResponses(true)
.build();

Logs include:
Security warning:
Log output format:
--> POST https://resource.openai.azure.com/openai/deployments/gpt-4/chat/completions?api-version=2024-02-15-preview
api-key: ********************************
Content-Type: application/json
{"messages":[{"role":"user","content":"Hello"}],"temperature":0.7}
<-- 200 OK (1234ms)
Content-Type: application/json
{"choices":[{"message":{"role":"assistant","content":"Hi there!"},"finish_reason":"stop"}],"usage":{"prompt_tokens":8,"completion_tokens":4,"total_tokens":12}}

Monitor chat model events for metrics, cost tracking, and observability (chat models only).
import dev.langchain4j.model.chat.listener.ChatModelListener;
import dev.langchain4j.model.chat.listener.ChatModelRequest;
import dev.langchain4j.model.chat.listener.ChatModelResponse;
/**
* Registers chat model event listeners.
* Listeners receive callbacks for request, response, and error events.
* All listeners are called synchronously before request/after response.
* @param listeners List of listener implementations
* @return Builder instance
* @default Empty list (no listeners)
*/
ChatModelListener listener = new ChatModelListener() {
@Override
public void onRequest(ChatModelRequest request) {
// Called before sending request
System.out.println("Request: " + request.messages().size() + " messages");
System.out.println("Model: " + request.model());
System.out.println("Temperature: " + request.parameters().temperature());
}
@Override
public void onResponse(ChatModelResponse response) {
// Called after receiving successful response
System.out.println("Response: " + response.aiMessage().text());
System.out.println("Input tokens: " + response.tokenUsage().inputTokens());
System.out.println("Output tokens: " + response.tokenUsage().outputTokens());
System.out.println("Finish reason: " + response.metadata().finishReason());
}
@Override
public void onError(Throwable error) {
// Called on request failure
System.err.println("Error: " + error.getMessage());
}
};
model.builder()
.listeners(List.of(listener))
.build();

Use cases:
Multiple listeners:
// Metrics listener
ChatModelListener metricsListener = new ChatModelListener() {
@Override
public void onResponse(ChatModelResponse response) {
metrics.recordTokens(response.tokenUsage().totalTokens());
metrics.recordLatency(response.metadata().latency());
}
};
// Cost tracking listener
ChatModelListener costListener = new ChatModelListener() {
@Override
public void onResponse(ChatModelResponse response) {
double cost = calculateCost(response.tokenUsage());
costTracker.recordCost(userId, cost);
}
};
// Audit logging listener
ChatModelListener auditListener = new ChatModelListener() {
@Override
public void onRequest(ChatModelRequest request) {
auditLog.logRequest(userId, request);
}
@Override
public void onResponse(ChatModelResponse response) {
auditLog.logResponse(userId, response);
}
};
// Register all listeners
model.builder()
.listeners(List.of(metricsListener, costListener, auditListener))
.build();

Listener behavior:
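The calculateCost helper referenced in the cost tracking listener above is not provided by the library; a minimal sketch using placeholder per-1K-token prices (substitute your actual Azure OpenAI rates):

```java
// Hypothetical cost calculator for a cost tracking listener.
// The per-1K-token prices below are placeholders, not real Azure rates.
final class CostCalculator {
    static final double INPUT_PRICE_PER_1K = 0.03;   // placeholder rate
    static final double OUTPUT_PRICE_PER_1K = 0.06;  // placeholder rate

    static double calculateCost(int inputTokens, int outputTokens) {
        return inputTokens / 1000.0 * INPUT_PRICE_PER_1K
                + outputTokens / 1000.0 * OUTPUT_PRICE_PER_1K;
    }
}
```

In a listener, pass response.tokenUsage().inputTokens() and response.tokenUsage().outputTokens() to this helper.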
Secure, reliable configuration for production environments.
// Use managed identity (zero-secret authentication)
TokenCredential credential = new DefaultAzureCredentialBuilder()
.build();
// Load configuration from environment
String endpoint = System.getenv("AZURE_OPENAI_ENDPOINT");
String deployment = System.getenv("AZURE_OPENAI_DEPLOYMENT");
// Production-grade chat model
AzureOpenAiChatModel model = AzureOpenAiChatModel.builder()
// Authentication - use managed identity
.endpoint(endpoint)
.tokenCredential(credential)
.deploymentName(deployment)
.serviceVersion("2024-02-15-preview")
// Reliability - aggressive retries and reasonable timeout
.timeout(Duration.ofSeconds(60))
.maxRetries(3)
// Observability - use listeners, not full logging
.listeners(List.of(metricsListener, auditListener))
.userAgentSuffix("MyApp/1.0.0")
// Quality - set appropriate parameters
.temperature(0.7)
.maxTokens(2000)
.build();

Development-friendly configuration with enhanced debugging.
AzureOpenAiChatModel model = AzureOpenAiChatModel.builder()
// Authentication - use API key for simplicity
.endpoint("https://my-resource.openai.azure.com/")
.apiKey(System.getenv("AZURE_OPENAI_API_KEY"))
.deploymentName("gpt-4")
.serviceVersion("2024-02-15-preview")
// Debug - enable full logging, longer timeout
.logRequestsAndResponses(true)
.timeout(Duration.ofSeconds(120)) // Longer for debugging
// Conservative retries for faster failure
.maxRetries(1)
.build();

Enterprise deployment with corporate proxy and service principal.
// Corporate proxy with authentication
ProxyOptions proxy = new ProxyOptions(
ProxyOptions.Type.HTTP,
new InetSocketAddress("proxy.corp.example.com", 8080)
).setCredentials(
System.getenv("PROXY_USER"),
System.getenv("PROXY_PASSWORD")
);
// Service principal authentication
TokenCredential credential = new ClientSecretCredentialBuilder()
.tenantId(System.getenv("AZURE_TENANT_ID"))
.clientId(System.getenv("AZURE_CLIENT_ID"))
.clientSecret(System.getenv("AZURE_CLIENT_SECRET"))
.build();
AzureOpenAiChatModel model = AzureOpenAiChatModel.builder()
// Authentication
.endpoint(System.getenv("AZURE_OPENAI_ENDPOINT"))
.tokenCredential(credential)
.deploymentName(System.getenv("AZURE_OPENAI_DEPLOYMENT"))
.serviceVersion("2024-02-15-preview")
// Network - proxy and custom headers
.proxyOptions(proxy)
.customHeaders(Map.of(
"X-Corp-ID", System.getenv("CORP_ID"),
"X-Cost-Center", System.getenv("COST_CENTER")
))
// Reliability
.timeout(Duration.ofSeconds(90))
.maxRetries(5)
.build();

Share configuration across multiple model types for consistency.
// Shared configuration
String endpoint = "https://my-resource.openai.azure.com/";
String apiKey = System.getenv("AZURE_OPENAI_API_KEY");
String serviceVersion = "2024-02-15-preview";
Duration timeout = Duration.ofSeconds(60);
int maxRetries = 3;
// Chat model
AzureOpenAiChatModel chatModel = AzureOpenAiChatModel.builder()
.endpoint(endpoint)
.apiKey(apiKey)
.serviceVersion(serviceVersion)
.deploymentName("gpt-4")
.timeout(timeout)
.maxRetries(maxRetries)
.build();
// Embedding model
AzureOpenAiEmbeddingModel embeddingModel = AzureOpenAiEmbeddingModel.builder()
.endpoint(endpoint)
.apiKey(apiKey)
.serviceVersion(serviceVersion)
.deploymentName("text-embedding-ada-002")
.timeout(timeout)
.maxRetries(maxRetries)
.build();
// Image model (longer timeout)
AzureOpenAiImageModel imageModel = AzureOpenAiImageModel.builder()
.endpoint(endpoint)
.apiKey(apiKey)
.serviceVersion(serviceVersion)
.deploymentName("dall-e-3")
.timeout(Duration.ofSeconds(120)) // Longer for images
.maxRetries(maxRetries)
.build();

Standard environment variable names for configuration.
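A small fail-fast helper keeps missing variables from surfacing as confusing downstream errors. This sketch makes the lookup function injectable so it can be tested without real environment variables; the class and method names are illustrative:

```java
import java.util.function.Function;

// Fail fast when a required configuration value is missing.
// In production, pass System::getenv as the lookup.
final class EnvConfig {
    static String require(Function<String, String> lookup, String name) {
        String value = lookup.apply(name);
        if (value == null || value.isEmpty()) {
            throw new IllegalStateException(name + " environment variable not set");
        }
        return value;
    }
}
// Usage: String endpoint = EnvConfig.require(System::getenv, "AZURE_OPENAI_ENDPOINT");
```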
# Azure OpenAI configuration
export AZURE_OPENAI_ENDPOINT="https://my-resource.openai.azure.com/"
export AZURE_OPENAI_API_KEY="your-api-key"
export AZURE_OPENAI_DEPLOYMENT_NAME="gpt-4"
# Non-Azure OpenAI configuration
export OPENAI_API_KEY="your-openai-api-key"
# Azure AD / Service Principal configuration
export AZURE_TENANT_ID="your-tenant-id"
export AZURE_CLIENT_ID="your-client-id"
export AZURE_CLIENT_SECRET="your-client-secret"
# Proxy configuration
export HTTPS_PROXY="http://proxy.example.com:8080"
export HTTP_PROXY="http://proxy.example.com:8080"
export NO_PROXY="localhost,127.0.0.1,*.internal"
export PROXY_USER="proxy-username"
export PROXY_PASSWORD="proxy-password"

Usage in code:
model.builder()
.endpoint(System.getenv("AZURE_OPENAI_ENDPOINT"))
.apiKey(System.getenv("AZURE_OPENAI_API_KEY"))
.deploymentName(System.getenv("AZURE_OPENAI_DEPLOYMENT_NAME"))
.serviceVersion("2024-02-15-preview")
.build();

Azure OpenAI service versions ordered by release date (use latest for newest features).
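Because service versions are date-prefixed (YYYY-MM-DD, optionally followed by -preview), they can be compared chronologically; a hedged sketch:

```java
import java.time.LocalDate;

// Compare Azure OpenAI API version strings by their leading date.
// Assumed format: YYYY-MM-DD, optionally followed by "-preview".
final class ApiVersions {
    static LocalDate datePart(String version) {
        return LocalDate.parse(version.substring(0, 10));
    }

    static boolean isNewer(String a, String b) {
        return datePart(a).isAfter(datePart(b));
    }
}
```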
Available versions:
2024-02-15-preview - Latest preview - Newest features, GPT-4 Turbo, function calling v2
2023-12-01-preview - Stable release - GPT-4 Turbo, vision, DALL-E 3
2023-10-01-preview - Previous stable - GPT-4, function calling v1
2023-08-01-preview - Older version - GPT-3.5 Turbo 16K
2023-06-01-preview - Legacy - GPT-3.5 Turbo, function calling preview
2023-05-15 - GA version - GPT-3.5 Turbo, ChatGPT

Version selection guidelines:
Use the latest preview (e.g. 2024-02-15-preview) for newest features
Use a stable release (2023-12-01-preview or newer GA) for production

Feature availability by version:
Check Azure OpenAI API versions documentation for the current list and feature availability.
All models may throw these common exceptions during configuration and operation.
/**
* Thrown during model building if configuration is invalid.
* Common causes:
* - Missing required parameters (endpoint, deployment, auth)
* - Invalid parameter values (negative timeout, invalid URL)
* - Mutually exclusive options (maxRetries + retryOptions)
*/
class IllegalArgumentException extends RuntimeException {
// Examples:
// - "endpoint must not be null or empty"
// - "timeout must be positive"
// - "cannot specify both maxRetries and retryOptions"
}

import dev.langchain4j.exception.ContentFilteredException;
import java.util.concurrent.TimeoutException;
// Content filtered by Azure safety policies
// Not retried automatically
class ContentFilteredException extends RuntimeException {}
// Request timeout exceeded
// Automatically retried if retry policy allows
class TimeoutException extends Exception {}
// Invalid request parameters or state
// Not retried
class IllegalArgumentException extends RuntimeException {}
// Network, API, authentication errors
// Retry behavior depends on HTTP status code
class RuntimeException extends Exception {}

Error handling example:
try {
Response<?> response = model.generate(input);
} catch (ContentFilteredException e) {
// Content violated safety policy - do not retry
logger.warn("Content filtered: {}", e.getMessage());
// Prompt user to modify input or handle gracefully
} catch (TimeoutException e) {
// Request timed out - safe to retry with exponential backoff
logger.error("Request timed out after {}ms", timeout.toMillis());
// Implement retry with backoff or increase timeout
} catch (IllegalArgumentException e) {
// Invalid configuration or parameters - fix code
logger.error("Invalid configuration: {}", e.getMessage());
// Do not retry, fix the issue in code
} catch (RuntimeException e) {
// Network, API, or authentication error
logger.error("Unexpected error", e);
// Check if retryable based on cause and implement retry logic
}

Never hardcode secrets:
// BAD - Hardcoded API key
.apiKey("1234567890abcdef1234567890abcdef")
// GOOD - Load from environment
.apiKey(System.getenv("AZURE_OPENAI_API_KEY"))
// BETTER - Use managed identity (no secrets)
.tokenCredential(new DefaultAzureCredentialBuilder().build())

Recommendations:
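If a key must appear in diagnostics at all, mask it first; a minimal sketch (the helper name is illustrative):

```java
// Mask a secret for safe logging, keeping only the last 4 characters.
final class SecretMask {
    static String mask(String secret) {
        if (secret == null || secret.length() <= 4) {
            return "****";
        }
        return "*".repeat(secret.length() - 4)
                + secret.substring(secret.length() - 4);
    }
}
```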
Set appropriate timeouts:
// Match timeout to expected response time
.timeout(Duration.ofSeconds(60)) // Standard requests
.timeout(Duration.ofSeconds(180)) // Long-form generation
.timeout(Duration.ofSeconds(30)) // Quick responses

Configure retry policies:
// Production: Aggressive retries for high availability
.maxRetries(5)
// Development: Fast failure for rapid iteration
.maxRetries(1)

Recommendations:
Reuse model instances:
// GOOD - Create once, reuse across requests
private static final AzureOpenAiChatModel MODEL =
AzureOpenAiChatModel.builder()
.endpoint(endpoint)
.apiKey(apiKey)
.build();
// Use MODEL for all requests

Don't create per-request:
// BAD - Creates new instance for each request
for (String prompt : prompts) {
AzureOpenAiChatModel model = AzureOpenAiChatModel.builder()
.endpoint(endpoint)
.apiKey(apiKey)
.build();
model.generate(prompt); // Wasteful!
}

Recommendations:
Use listeners over full logging:
// PREFERRED - Structured observability
.listeners(List.of(metricsListener, costListener))
// AVOID IN PRODUCTION - Full logging with sensitive data
.logRequestsAndResponses(true) // Only for development

Recommendations:
Estimate before requesting:
// Estimate tokens before making expensive request
AzureOpenAiTokenCountEstimator estimator =
new AzureOpenAiTokenCountEstimator(AzureOpenAiChatModelName.GPT_4);
int estimatedTokens = estimator.estimateTokenCountInMessages(messages);
if (estimatedTokens > budget) {
// Trim messages or reject request
}

Recommendations:
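When the langchain4j estimator is not on the classpath, a very rough pre-check (~4 characters per token for English text) can gate obviously oversized inputs; this heuristic is an approximation, not the real tokenizer:

```java
// Very rough token estimate: ~4 characters per token for English text.
// Use the real AzureOpenAiTokenCountEstimator for accurate budgeting.
final class RoughTokenEstimate {
    static int estimate(String text) {
        return (int) Math.ceil(text.length() / 4.0);
    }

    static boolean withinBudget(String text, int budget) {
        return estimate(text) <= budget;
    }
}
```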
Install with Tessl CLI
npx tessl i tessl/maven-dev-langchain4j--langchain4j-azure-open-ai