tessl/maven-dev-langchain4j--langchain4j-azure-open-ai

LangChain4j integration for Azure OpenAI providing chat, streaming, embeddings, image generation, audio transcription, and token counting capabilities

Configuration

Common configuration patterns, authentication methods, and builder options shared across all Azure OpenAI models.

Imports

import com.azure.core.credential.TokenCredential;
import com.azure.identity.DefaultAzureCredentialBuilder;
import com.azure.identity.ManagedIdentityCredentialBuilder;
import com.azure.identity.ClientSecretCredentialBuilder;
import com.azure.identity.AzureCliCredentialBuilder;
import com.azure.core.http.policy.RetryOptions;
import com.azure.core.http.policy.ExponentialBackoffOptions;
import com.azure.core.http.policy.HttpLogOptions;
import com.azure.core.http.policy.HttpLogDetailLevel;
import com.azure.core.http.ProxyOptions;
import com.azure.core.http.HttpClientProvider;
import com.azure.core.http.HttpClient;
import com.azure.ai.openai.OpenAIClient;
import com.azure.ai.openai.OpenAIAsyncClient;
import com.azure.ai.openai.OpenAIClientBuilder;
import com.azure.core.credential.AzureKeyCredential;
import java.time.Duration;
import java.util.Map;
import java.util.List;
import java.util.UUID;
import java.net.InetSocketAddress;

Authentication

All model builders support three authentication methods. Exactly one must be specified. Choose based on your deployment scenario and security requirements.

Azure OpenAI API Key

Standard authentication using an API key from your Azure OpenAI resource:

/**
 * Configure API key authentication.
 * @param apiKey 32-character hexadecimal key from Azure Portal
 * @throws IllegalArgumentException if apiKey is null or empty
 */
model.builder()
    .endpoint("https://your-resource.openai.azure.com/")
    .apiKey("your-api-key-from-azure-portal")
    .build();

Best for:

  • Development and testing environments
  • Simple deployments with centralized secret management
  • When using Azure Key Vault for secrets
  • Quick prototyping

Security note: Never hardcode API keys in source code. Use environment variables or secure secret storage:

// Recommended: Load from environment
String apiKey = System.getenv("AZURE_OPENAI_API_KEY");
if (apiKey == null || apiKey.isEmpty()) {
    throw new IllegalStateException("AZURE_OPENAI_API_KEY environment variable not set");
}

model.builder()
    .endpoint(System.getenv("AZURE_OPENAI_ENDPOINT"))
    .apiKey(apiKey)
    .build();

Key format: Azure OpenAI API keys are 32-character hexadecimal strings. Validate before use:

// Validation pattern (optional but recommended)
if (!apiKey.matches("[0-9a-fA-F]{32}")) {
    throw new IllegalArgumentException("Invalid Azure OpenAI API key format");
}

Non-Azure OpenAI API Key

Authenticate with the non-Azure OpenAI service (api.openai.com):

/**
 * Configure non-Azure OpenAI authentication.
 * Automatically sets endpoint to https://api.openai.com/v1.
 * Do NOT call endpoint() when using this method.
 * @param apiKey OpenAI API key starting with "sk-"
 * @throws IllegalArgumentException if apiKey is null or empty
 */
model.builder()
    .nonAzureApiKey("your-openai-api-key")
    .deploymentName("gpt-4")  // Use OpenAI model name
    .serviceVersion("2024-02-15-preview")
    .build();

Note: When using nonAzureApiKey(), the endpoint is automatically set to https://api.openai.com/v1. Do not call endpoint().

Best for:

  • Using OpenAI directly instead of Azure
  • Testing across both Azure and OpenAI services
  • Regions where Azure OpenAI is not available
  • Comparing Azure vs OpenAI behavior

Key format: OpenAI API keys start with "sk-" prefix. Example validation:

if (!openAiKey.startsWith("sk-")) {
    throw new IllegalArgumentException("OpenAI API keys must start with 'sk-'");
}

Azure AD / Entra ID Token Credential

Authenticate using Azure Active Directory (Microsoft Entra ID):

import com.azure.identity.DefaultAzureCredentialBuilder;
import com.azure.core.credential.TokenCredential;

/**
 * Configure Azure AD authentication.
 * Provides zero-secret authentication using managed identities or service principals.
 * @param credential TokenCredential implementation
 * @throws IllegalArgumentException if credential is null
 */
TokenCredential credential = new DefaultAzureCredentialBuilder().build();

model.builder()
    .endpoint("https://your-resource.openai.azure.com/")
    .tokenCredential(credential)
    .deploymentName("gpt-4")
    .serviceVersion("2024-02-15-preview")
    .build();

Best for:

  • Production deployments
  • Managed identity scenarios (Azure VMs, App Service, AKS)
  • Zero-secret authentication
  • Enterprise security requirements
  • Centralized identity management
  • Audit and compliance requirements

Common credential types:

// Default credential chain (tries multiple auth methods in order)
// Order: Environment -> Workload Identity -> Managed Identity -> IntelliJ -> Azure CLI -> Azure PowerShell
TokenCredential credential = new DefaultAzureCredentialBuilder()
    .build();

// User-assigned managed identity (specify client ID)
TokenCredential credential = new ManagedIdentityCredentialBuilder()
    .clientId("your-managed-identity-client-id")
    .build();

// System-assigned managed identity (no client ID needed)
TokenCredential credential = new ManagedIdentityCredentialBuilder()
    .build();

// Service principal with client secret
TokenCredential credential = new ClientSecretCredentialBuilder()
    .tenantId("your-tenant-id")
    .clientId("your-client-id")
    .clientSecret("your-client-secret")
    .build();

// Azure CLI credential (for local development)
// Uses credentials from `az login`
TokenCredential credential = new AzureCliCredentialBuilder()
    .build();

Required Azure RBAC roles:

  • Cognitive Services OpenAI User: Read access, can call API
  • Cognitive Services OpenAI Contributor: Read/write access
  • Cognitive Services Contributor: Full access to resource

Assign roles using Azure Portal, CLI, or ARM templates:

# Assign role to managed identity
az role assignment create \
  --role "Cognitive Services OpenAI User" \
  --assignee <managed-identity-client-id> \
  --scope /subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.CognitiveServices/accounts/<resource-name>

Mandatory Configuration

All models require these three configuration parameters. Only endpoint() may be omitted, and only when using nonAzureApiKey() (which sets it automatically).

/**
 * Mandatory configuration interface.
 * @param <T> Builder type for fluent chaining
 */
interface MandatoryConfiguration<T> {
    /**
     * Sets the Azure OpenAI resource endpoint.
     * Required: Yes (except when using nonAzureApiKey())
     * Format: https://{resource-name}.openai.azure.com/
     * @param endpoint Full endpoint URL with trailing slash optional
     * @return Builder instance for chaining
     * @throws IllegalArgumentException if endpoint is null, empty, or malformed
     */
    T endpoint(String endpoint);

    /**
     * Sets the Azure OpenAI API version.
     * Required: Yes
     * Examples: "2024-02-15-preview", "2023-12-01-preview"
     * Recommendation: Use latest preview for development, latest stable for production
     * @param serviceVersion API version string
     * @return Builder instance for chaining
     * @throws IllegalArgumentException if serviceVersion is null or empty
     */
    T serviceVersion(String serviceVersion);

    /**
     * Sets the name of the deployed model in Azure OpenAI.
     * Required: Yes
     * This is YOUR deployment name in Azure, not the base model name.
     * Examples: "gpt-4-deployment", "my-gpt35-turbo", "dall-e-3-prod"
     * @param deploymentName Your Azure deployment name
     * @return Builder instance for chaining
     * @throws IllegalArgumentException if deploymentName is null or empty
     */
    T deploymentName(String deploymentName);
}

Example with validation:

String endpoint = System.getenv("AZURE_OPENAI_ENDPOINT");
String deployment = System.getenv("AZURE_OPENAI_DEPLOYMENT");
String version = "2024-02-15-preview";

// Validate mandatory parameters
if (endpoint == null || endpoint.isEmpty()) {
    throw new IllegalStateException("AZURE_OPENAI_ENDPOINT not configured");
}
if (deployment == null || deployment.isEmpty()) {
    throw new IllegalStateException("AZURE_OPENAI_DEPLOYMENT not configured");
}

model.builder()
    .endpoint(endpoint)
    .serviceVersion(version)
    .deploymentName(deployment)
    .apiKey(System.getenv("AZURE_OPENAI_API_KEY"))
    .build();

HTTP Client Configuration

Timeout

Set request timeout duration to prevent indefinite hangs.

/**
 * Sets the request timeout.
 * @param timeout Duration, must be positive
 * @return Builder instance
 * @default 60 seconds (chat, embedding, language models)
 * @default 120 seconds (streaming models, image models)
 * @throws IllegalArgumentException if timeout is null or non-positive
 */
model.builder()
    .timeout(Duration.ofSeconds(60))  // 60 second timeout
    .build();

Default timeouts by model type:

  • Chat models: 60 seconds
  • Streaming chat models: 120 seconds
  • Embedding models: 60 seconds
  • Image models: 120 seconds (image generation is slower)
  • Audio transcription: 120 seconds (depends on audio length)
  • Language models: 60 seconds

Recommended timeouts:

  • Chat models: 30-60 seconds (short responses), 60-120 seconds (long responses with max_tokens)
  • Streaming models: 120-300 seconds (allow for long streaming responses)
  • Embedding models: 30-60 seconds (embeddings are fast)
  • Image models: 90-120 seconds (image generation takes longer)
  • Audio transcription: 120-300 seconds (proportional to audio file duration)

Timeout behavior:

  • Throws TimeoutException when exceeded
  • Automatically retried if retry policy allows
  • Includes network time + server processing time
  • Does not include time spent in retry backoff delays

// Example: Different timeouts for different use cases
AzureOpenAiChatModel fastModel = AzureOpenAiChatModel.builder()
    .timeout(Duration.ofSeconds(30))  // Short timeout for quick responses
    .maxTokens(100)  // Limit response length
    .build();

AzureOpenAiChatModel slowModel = AzureOpenAiChatModel.builder()
    .timeout(Duration.ofSeconds(180))  // Longer timeout for detailed responses
    .maxTokens(4000)  // Allow long responses
    .build();

Retry Configuration

Configure retry behavior for failed requests with automatic exponential backoff.

Simple retry count:

/**
 * Sets simple retry count with default exponential backoff.
 * Mutually exclusive with retryOptions().
 * @param maxRetries Number of retries, 0-10
 * @return Builder instance
 * @default 3 retries
 * @throws IllegalArgumentException if maxRetries < 0 or > 10
 */
model.builder()
    .maxRetries(3)  // Retry up to 3 times
    .build();

Advanced retry options:

import com.azure.core.http.policy.RetryOptions;
import com.azure.core.http.policy.ExponentialBackoffOptions;

/**
 * Sets advanced retry options with custom backoff strategy.
 * Mutually exclusive with maxRetries().
 * @param retryOptions Azure SDK retry configuration
 * @return Builder instance
 * @default 3 retries with exponential backoff (1s base, 10s max)
 */
RetryOptions retryOptions = new RetryOptions(
    new ExponentialBackoffOptions()
        .setMaxRetries(3)  // Maximum 3 retry attempts
        .setBaseDelay(Duration.ofSeconds(1))  // Start with 1s delay
        .setMaxDelay(Duration.ofSeconds(10))  // Maximum 10s delay
);

model.builder()
    .retryOptions(retryOptions)
    .build();

Default retry behavior:

  • Max retries: 3
  • Base delay: 1 second
  • Max delay: 10 seconds
  • Backoff: Exponential (1s, 2s, 4s, 8s capped at 10s)
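
The capped exponential sequence above (base delay doubled on each attempt, clamped at the max delay) can be reproduced in a few lines. This is an illustrative sketch of the documented defaults, not the SDK's internal code:

```java
import java.time.Duration;

public class BackoffSketch {
    /** delay(n) = min(baseDelay * 2^(n-1), maxDelay) for retry attempt n. */
    static Duration delayForAttempt(int attempt, Duration base, Duration max) {
        long millis = base.toMillis() * (1L << (attempt - 1));  // base * 2^(attempt-1)
        return millis < max.toMillis() ? Duration.ofMillis(millis) : max;
    }

    public static void main(String[] args) {
        Duration base = Duration.ofSeconds(1), max = Duration.ofSeconds(10);
        for (int attempt = 1; attempt <= 4; attempt++) {
            System.out.println("Retry " + attempt + ": "
                + delayForAttempt(attempt, base, max).toMillis() + "ms");
        }
        // The 1s, 2s, 4s, 8s sequence described above; attempt 5 would be capped at 10s
    }
}
```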

Retry triggers (automatically retried):

  • Network errors and connection failures
  • Timeout errors (TimeoutException)
  • 429 (Too Many Requests) - Rate limit hit, uses Retry-After header
  • 500-level server errors (500, 502, 503, 504)
  • Connection reset, socket timeout

Non-retried errors (fail immediately):

  • 400-level client errors (except 429): 400, 401, 403, 404
  • Content filter exceptions (ContentFilteredException)
  • Invalid authentication (401, 403)
  • Invalid request format (400)

Example custom retry strategy:

// Aggressive retry for production
RetryOptions aggressive = new RetryOptions(
    new ExponentialBackoffOptions()
        .setMaxRetries(5)  // More retries
        .setBaseDelay(Duration.ofMillis(500))  // Faster initial retry
        .setMaxDelay(Duration.ofSeconds(30))  // Longer max delay
);

// Conservative retry for development
RetryOptions conservative = new RetryOptions(
    new ExponentialBackoffOptions()
        .setMaxRetries(1)  // Fail fast
        .setBaseDelay(Duration.ofSeconds(2))
        .setMaxDelay(Duration.ofSeconds(5))
);

Proxy Configuration

Route requests through an HTTP proxy server.

Basic proxy:

import com.azure.core.http.ProxyOptions;
import java.net.InetSocketAddress;

/**
 * Sets HTTP proxy configuration.
 * @param proxyOptions Proxy settings including type and address
 * @return Builder instance
 * @default No proxy
 */
ProxyOptions proxyOptions = new ProxyOptions(
    ProxyOptions.Type.HTTP,  // or Type.SOCKS
    new InetSocketAddress("proxy.example.com", 8080)
);

model.builder()
    .proxyOptions(proxyOptions)
    .build();

Proxy with authentication:

ProxyOptions proxyOptions = new ProxyOptions(
    ProxyOptions.Type.HTTP,
    new InetSocketAddress("proxy.example.com", 8080)
)
.setCredentials("proxy-username", "proxy-password")  // Optional authentication
.setNonProxyHosts("localhost|127.0.0.1");  // Bypass proxy for these hosts

model.builder()
    .proxyOptions(proxyOptions)
    .build();

Proxy types:

  • ProxyOptions.Type.HTTP: HTTP/HTTPS proxy (most common)
  • ProxyOptions.Type.SOCKS: SOCKS4/SOCKS5 proxy

Common proxy scenarios:

// Corporate proxy with authentication
ProxyOptions corpProxy = new ProxyOptions(
    ProxyOptions.Type.HTTP,
    new InetSocketAddress("proxy.corp.example.com", 8080)
)
.setCredentials(System.getenv("PROXY_USER"), System.getenv("PROXY_PASS"))
.setNonProxyHosts("localhost|*.internal.corp");

// Development proxy (e.g., Fiddler, Charles)
ProxyOptions debugProxy = new ProxyOptions(
    ProxyOptions.Type.HTTP,
    new InetSocketAddress("localhost", 8888)  // Fiddler default
);

Custom HTTP Client Provider

Provide a custom HTTP client implementation for advanced scenarios.

import com.azure.core.http.HttpClientProvider;
import com.azure.core.http.HttpClient;
import com.azure.core.http.netty.NettyAsyncHttpClientBuilder;

/**
 * Sets custom HTTP client provider.
 * Allows full control over HTTP client configuration.
 * @param httpClientProvider Provider for creating HTTP client
 * @return Builder instance
 * @default Azure SDK default HTTP client (Netty or OkHttp)
 */
// Example using the Netty client builder (requires the azure-core-http-netty dependency)
HttpClientProvider customProvider = new HttpClientProvider() {
    @Override
    public HttpClient createInstance() {
        // Return custom-configured HTTP client
        return new NettyAsyncHttpClientBuilder()
            .connectTimeout(Duration.ofSeconds(30))
            .readTimeout(Duration.ofSeconds(60))
            .writeTimeout(Duration.ofSeconds(30))
            .build();
    }
};

model.builder()
    .httpClientProvider(customProvider)
    .build();

Use cases for custom HTTP client:

  • Custom connection pooling configuration
  • Special SSL/TLS requirements
  • Custom DNS resolution
  • Network monitoring and instrumentation
  • Testing with mock HTTP client

Custom Headers

Add custom HTTP headers to all requests for tracking, authentication, or API versioning.

/**
 * Sets custom HTTP headers added to all requests.
 * @param customHeaders Immutable map of header name to value
 * @return Builder instance
 * @default Empty map (no custom headers)
 */
Map<String, String> customHeaders = Map.of(
    "X-Custom-Header", "custom-value",
    "X-Request-ID", "unique-request-id",
    "X-API-Version", "v1",
    "X-Correlation-ID", UUID.randomUUID().toString()
);

model.builder()
    .customHeaders(customHeaders)
    .build();

Common use cases:

  • Request tracking: X-Request-ID, X-Correlation-ID for distributed tracing
  • Custom authentication: Additional auth tokens or session IDs
  • API versioning: X-API-Version for backwards compatibility
  • Analytics: X-User-ID, X-Session-ID for usage tracking
  • Debugging: X-Debug-Mode, X-Environment for troubleshooting

Important notes:

  • Headers are added to every request made by the model
  • Cannot override Azure-required headers (Authorization, Content-Type, etc.)
  • Header values must not contain newlines or control characters
  • Headers are case-insensitive per HTTP specification

// Example: Production tracking headers
Map<String, String> trackingHeaders = Map.of(
    "X-Application-Name", "MyApp",
    "X-Application-Version", "1.2.3",
    "X-Environment", "production",
    "X-Request-ID", UUID.randomUUID().toString()
);
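
Since header values must not contain newlines or control characters, a small pre-flight check can reject bad values before they reach the builder. The isValidHeaderValue helper below is a hypothetical utility, not part of the library:

```java
import java.util.Map;

public class HeaderCheck {
    /** Returns true if the value is safe for an HTTP header: no control characters (tab excepted). */
    static boolean isValidHeaderValue(String value) {
        if (value == null) return false;
        return value.chars().noneMatch(c -> (c < 0x20 && c != '\t') || c == 0x7F);
    }

    public static void main(String[] args) {
        Map<String, String> headers = Map.of("X-Environment", "production");
        headers.forEach((name, value) -> {
            if (!isValidHeaderValue(value)) {
                throw new IllegalArgumentException("Invalid value for header " + name);
            }
        });
        System.out.println("All header values valid");
    }
}
```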

User Agent Suffix

Append a custom suffix to the User-Agent header for application identification.

/**
 * Sets custom User-Agent suffix for request identification.
 * @param userAgentSuffix Suffix string appended to SDK user agent
 * @return Builder instance
 * @default None (only SDK user agent)
 */
model.builder()
    .userAgentSuffix("MyApp/1.0.0")
    .build();

Resulting User-Agent format:

azsdk-java-azure-ai-openai/{sdk-version} ({os-info}) MyApp/1.0.0

Best for:

  • Application identification in Azure logs
  • Usage analytics and monitoring
  • Version tracking across deployments
  • Support and debugging (identify client version)
  • Rate limit troubleshooting

Recommended format: AppName/Version

// Example: Version-aware user agent
String appVersion = "1.2.3";
String buildNumber = "456";
model.builder()
    .userAgentSuffix(String.format("MyApp/%s (build %s)", appVersion, buildNumber))
    .build();

Custom OpenAI Client

For advanced scenarios, provide a pre-configured OpenAI client instance instead of using builder configuration.

Synchronous Client

import com.azure.ai.openai.OpenAIClient;
import com.azure.ai.openai.OpenAIClientBuilder;
import com.azure.core.credential.AzureKeyCredential;
import com.azure.core.http.policy.HttpLogOptions;
import com.azure.core.http.policy.HttpLogDetailLevel;

/**
 * Creates model using a custom OpenAI client.
 * Useful for sharing a single client across multiple models or advanced configuration.
 * @param client Pre-configured OpenAIClient instance
 * @return Builder instance
 */
OpenAIClient customClient = new OpenAIClientBuilder()
    .endpoint("https://your-resource.openai.azure.com/")
    .credential(new AzureKeyCredential("your-api-key"))
    .httpLogOptions(new HttpLogOptions().setLogLevel(HttpLogDetailLevel.BODY_AND_HEADERS))
    .buildClient();

AzureOpenAiChatModel model = AzureOpenAiChatModel.builder()
    .openAIClient(customClient)
    .deploymentName("gpt-4")
    .serviceVersion("2024-02-15-preview")
    .build();

Asynchronous Client (for Streaming Models)

import com.azure.ai.openai.OpenAIAsyncClient;
import com.azure.ai.openai.OpenAIClientBuilder;

/**
 * Creates streaming model using a custom async OpenAI client.
 * @param client Pre-configured OpenAIAsyncClient instance
 * @return Builder instance
 */
OpenAIAsyncClient asyncClient = new OpenAIClientBuilder()
    .endpoint("https://your-resource.openai.azure.com/")
    .credential(new AzureKeyCredential("your-api-key"))
    .buildAsyncClient();

AzureOpenAiStreamingChatModel model = AzureOpenAiStreamingChatModel.builder()
    .openAIAsyncClient(asyncClient)
    .deploymentName("gpt-4")
    .serviceVersion("2024-02-15-preview")
    .build();

When to use custom client:

  • Need fine-grained control over Azure SDK configuration
  • Sharing a single client instance across multiple model instances
  • Custom pipeline policies or request/response interceptors
  • Advanced logging and diagnostics
  • Integration with Azure SDK monitoring tools

Benefits:

  • Reuse connection pool across models
  • Centralized client configuration
  • Advanced Azure SDK features (custom policies, interceptors)
  • Better resource management

// Example: Shared client for multiple models
OpenAIClient sharedClient = new OpenAIClientBuilder()
    .endpoint(endpoint)
    .credential(new AzureKeyCredential(apiKey))
    .buildClient();

// Create multiple models sharing the same client
AzureOpenAiChatModel chatModel = AzureOpenAiChatModel.builder()
    .openAIClient(sharedClient)
    .deploymentName("gpt-4")
    .build();

AzureOpenAiEmbeddingModel embeddingModel = AzureOpenAiEmbeddingModel.builder()
    .openAIClient(sharedClient)
    .deploymentName("text-embedding-ada-002")
    .build();

Observability

Request and Response Logging

Enable detailed logging of HTTP requests and responses for debugging and monitoring.

/**
 * Enables detailed HTTP logging of requests and responses.
 * WARNING: Logs full request/response body including prompts, completions, and API keys in headers.
 * Only enable in secure environments with proper log protection.
 * @param logRequestsAndResponses true to enable logging
 * @return Builder instance
 * @default false (logging disabled)
 */
model.builder()
    .logRequestsAndResponses(true)
    .build();

Logs include:

  • Full HTTP request (method, URL, headers, body)
  • Full HTTP response (status, headers, body)
  • Token usage (input tokens, output tokens, total)
  • Request latency and timing
  • Retry attempts and backoff delays

Security warning:

  • Logs contain full prompts and completions (potentially sensitive data)
  • Logs may contain API keys in Authorization headers
  • Only enable in development or with proper log security
  • Consider using chat model listeners instead for production monitoring

Log output format:

--> POST https://resource.openai.azure.com/openai/deployments/gpt-4/chat/completions?api-version=2024-02-15-preview
api-key: ********************************
Content-Type: application/json
{"messages":[{"role":"user","content":"Hello"}],"temperature":0.7}

<-- 200 OK (1234ms)
Content-Type: application/json
{"choices":[{"message":{"role":"assistant","content":"Hi there!"},"finish_reason":"stop"}],"usage":{"prompt_tokens":8,"completion_tokens":4,"total_tokens":12}}

Chat Model Listeners

Monitor chat model events for metrics, cost tracking, and observability (chat models only).

import dev.langchain4j.model.chat.listener.ChatModelListener;
import dev.langchain4j.model.chat.listener.ChatModelRequest;
import dev.langchain4j.model.chat.listener.ChatModelRequestContext;
import dev.langchain4j.model.chat.listener.ChatModelResponse;
import dev.langchain4j.model.chat.listener.ChatModelResponseContext;
import dev.langchain4j.model.chat.listener.ChatModelErrorContext;

/**
 * Registers chat model event listeners.
 * Listeners receive callbacks for request, response, and error events.
 * All listeners are called synchronously before request/after response.
 * @param listeners List of listener implementations
 * @return Builder instance
 * @default Empty list (no listeners)
 */
ChatModelListener listener = new ChatModelListener() {
    @Override
    public void onRequest(ChatModelRequestContext requestContext) {
        // Called before sending request
        ChatModelRequest request = requestContext.request();
        System.out.println("Request: " + request.messages().size() + " messages");
        System.out.println("Model: " + request.model());
        System.out.println("Temperature: " + request.temperature());
    }

    @Override
    public void onResponse(ChatModelResponseContext responseContext) {
        // Called after receiving successful response
        ChatModelResponse response = responseContext.response();
        System.out.println("Response: " + response.aiMessage().text());
        System.out.println("Input tokens: " + response.tokenUsage().inputTokens());
        System.out.println("Output tokens: " + response.tokenUsage().outputTokens());
        System.out.println("Finish reason: " + response.finishReason());
    }

    @Override
    public void onError(ChatModelErrorContext errorContext) {
        // Called on request failure
        System.err.println("Error: " + errorContext.error().getMessage());
    }
};

model.builder()
    .listeners(List.of(listener))
    .build();

Use cases:

  • Metrics collection: Track request count, latency, token usage
  • Cost tracking: Monitor token consumption per user/session
  • Performance monitoring: Identify slow requests, timeout patterns
  • Error tracking: Log and alert on failures
  • Audit logging: Record all AI interactions for compliance
  • Rate limit monitoring: Track usage against quotas
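
Cost tracking ultimately multiplies token counts by per-token rates. A minimal sketch of such a calculation, with placeholder prices (check your Azure pricing tier and deployment for real rates):

```java
public class CostSketch {
    // Placeholder prices per 1,000 tokens - substitute your deployment's actual rates.
    static final double INPUT_PRICE_PER_1K = 0.01;
    static final double OUTPUT_PRICE_PER_1K = 0.03;

    /** Estimated request cost in dollars from input and output token counts. */
    static double calculateCost(int inputTokens, int outputTokens) {
        return inputTokens / 1000.0 * INPUT_PRICE_PER_1K
             + outputTokens / 1000.0 * OUTPUT_PRICE_PER_1K;
    }

    public static void main(String[] args) {
        System.out.printf("Estimated cost: $%.6f%n", calculateCost(8, 4));
    }
}
```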

Multiple listeners:

// Metrics listener
ChatModelListener metricsListener = new ChatModelListener() {
    @Override
    public void onResponse(ChatModelResponseContext responseContext) {
        metrics.recordTokens(responseContext.response().tokenUsage().totalTokens());
        // Latency can be derived by recording a timestamp in onRequest and diffing here
    }
};

// Cost tracking listener
ChatModelListener costListener = new ChatModelListener() {
    @Override
    public void onResponse(ChatModelResponseContext responseContext) {
        double cost = calculateCost(responseContext.response().tokenUsage());
        costTracker.recordCost(userId, cost);
    }
};

// Audit logging listener
ChatModelListener auditListener = new ChatModelListener() {
    @Override
    public void onRequest(ChatModelRequestContext requestContext) {
        auditLog.logRequest(userId, requestContext.request());
    }
    @Override
    public void onResponse(ChatModelResponseContext responseContext) {
        auditLog.logResponse(userId, responseContext.response());
    }
};

// Register all listeners
model.builder()
    .listeners(List.of(metricsListener, costListener, auditListener))
    .build();

Listener behavior:

  • Called synchronously in request/response flow
  • Exceptions in listeners do not affect request execution
  • Listeners are called in registration order
  • Thread-safe: listener methods may be called from multiple threads

Configuration Patterns

Production Configuration

Secure, reliable configuration for production environments.

// Use managed identity (zero-secret authentication)
TokenCredential credential = new DefaultAzureCredentialBuilder()
    .build();

// Load configuration from environment
String endpoint = System.getenv("AZURE_OPENAI_ENDPOINT");
String deployment = System.getenv("AZURE_OPENAI_DEPLOYMENT");

// Production-grade chat model
AzureOpenAiChatModel model = AzureOpenAiChatModel.builder()
    // Authentication - use managed identity
    .endpoint(endpoint)
    .tokenCredential(credential)
    .deploymentName(deployment)
    .serviceVersion("2024-02-15-preview")

    // Reliability - aggressive retries and reasonable timeout
    .timeout(Duration.ofSeconds(60))
    .maxRetries(3)

    // Observability - use listeners, not full logging
    .listeners(List.of(metricsListener, auditListener))
    .userAgentSuffix("MyApp/1.0.0")

    // Quality - set appropriate parameters
    .temperature(0.7)
    .maxTokens(2000)

    .build();

Development Configuration

Development-friendly configuration with enhanced debugging.

AzureOpenAiChatModel model = AzureOpenAiChatModel.builder()
    // Authentication - use API key for simplicity
    .endpoint("https://my-resource.openai.azure.com/")
    .apiKey(System.getenv("AZURE_OPENAI_API_KEY"))
    .deploymentName("gpt-4")
    .serviceVersion("2024-02-15-preview")

    // Debug - enable full logging, longer timeout
    .logRequestsAndResponses(true)
    .timeout(Duration.ofSeconds(120))  // Longer for debugging

    // Conservative retries for faster failure
    .maxRetries(1)

    .build();

Enterprise Configuration with Proxy

Enterprise deployment with corporate proxy and service principal.

// Corporate proxy with authentication
ProxyOptions proxy = new ProxyOptions(
    ProxyOptions.Type.HTTP,
    new InetSocketAddress("proxy.corp.example.com", 8080)
).setCredentials(
    System.getenv("PROXY_USER"),
    System.getenv("PROXY_PASSWORD")
);

// Service principal authentication
TokenCredential credential = new ClientSecretCredentialBuilder()
    .tenantId(System.getenv("AZURE_TENANT_ID"))
    .clientId(System.getenv("AZURE_CLIENT_ID"))
    .clientSecret(System.getenv("AZURE_CLIENT_SECRET"))
    .build();

AzureOpenAiChatModel model = AzureOpenAiChatModel.builder()
    // Authentication
    .endpoint(System.getenv("AZURE_OPENAI_ENDPOINT"))
    .tokenCredential(credential)
    .deploymentName(System.getenv("AZURE_OPENAI_DEPLOYMENT"))
    .serviceVersion("2024-02-15-preview")

    // Network - proxy and custom headers
    .proxyOptions(proxy)
    .customHeaders(Map.of(
        "X-Corp-ID", System.getenv("CORP_ID"),
        "X-Cost-Center", System.getenv("COST_CENTER")
    ))

    // Reliability
    .timeout(Duration.ofSeconds(90))
    .maxRetries(5)

    .build();

Multi-Model Configuration

Share configuration across multiple model types for consistency.

// Shared configuration
String endpoint = "https://my-resource.openai.azure.com/";
String apiKey = System.getenv("AZURE_OPENAI_API_KEY");
String serviceVersion = "2024-02-15-preview";
Duration timeout = Duration.ofSeconds(60);
int maxRetries = 3;

// Chat model
AzureOpenAiChatModel chatModel = AzureOpenAiChatModel.builder()
    .endpoint(endpoint)
    .apiKey(apiKey)
    .serviceVersion(serviceVersion)
    .deploymentName("gpt-4")
    .timeout(timeout)
    .maxRetries(maxRetries)
    .build();

// Embedding model
AzureOpenAiEmbeddingModel embeddingModel = AzureOpenAiEmbeddingModel.builder()
    .endpoint(endpoint)
    .apiKey(apiKey)
    .serviceVersion(serviceVersion)
    .deploymentName("text-embedding-ada-002")
    .timeout(timeout)
    .maxRetries(maxRetries)
    .build();

// Image model (longer timeout)
AzureOpenAiImageModel imageModel = AzureOpenAiImageModel.builder()
    .endpoint(endpoint)
    .apiKey(apiKey)
    .serviceVersion(serviceVersion)
    .deploymentName("dall-e-3")
    .timeout(Duration.ofSeconds(120))  // Longer for images
    .maxRetries(maxRetries)
    .build();

Environment Variables

Standard environment variable names for configuration.

# Azure OpenAI configuration
export AZURE_OPENAI_ENDPOINT="https://my-resource.openai.azure.com/"
export AZURE_OPENAI_API_KEY="your-api-key"
export AZURE_OPENAI_DEPLOYMENT_NAME="gpt-4"

# Non-Azure OpenAI configuration
export OPENAI_API_KEY="your-openai-api-key"

# Azure AD / Service Principal configuration
export AZURE_TENANT_ID="your-tenant-id"
export AZURE_CLIENT_ID="your-client-id"
export AZURE_CLIENT_SECRET="your-client-secret"

# Proxy configuration
export HTTPS_PROXY="http://proxy.example.com:8080"
export HTTP_PROXY="http://proxy.example.com:8080"
export NO_PROXY="localhost,127.0.0.1,*.internal"
export PROXY_USER="proxy-username"
export PROXY_PASSWORD="proxy-password"

Usage in code:

model.builder()
    .endpoint(System.getenv("AZURE_OPENAI_ENDPOINT"))
    .apiKey(System.getenv("AZURE_OPENAI_API_KEY"))
    .deploymentName(System.getenv("AZURE_OPENAI_DEPLOYMENT_NAME"))
    .serviceVersion("2024-02-15-preview")
    .build();
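
A small fail-fast helper keeps the builder calls clean when every variable is required. requireEnv below is a hypothetical utility, not part of the library:

```java
public class EnvConfig {
    /** Reads a required environment variable, failing fast with a clear message. */
    static String requireEnv(String name) {
        String value = System.getenv(name);
        if (value == null || value.isEmpty()) {
            throw new IllegalStateException("Required environment variable not set: " + name);
        }
        return value;
    }

    public static void main(String[] args) {
        try {
            String endpoint = requireEnv("AZURE_OPENAI_ENDPOINT");
            System.out.println("Using endpoint: " + endpoint);
        } catch (IllegalStateException e) {
            // In real code, let this propagate so startup fails fast; printed here for illustration
            System.err.println(e.getMessage());
        }
    }
}
```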

API Versions

Azure OpenAI service versions ordered by release date (use latest for newest features).

Available versions:

  • 2024-02-15-preview - Latest preview - Newest features, GPT-4 Turbo, function calling v2
  • 2023-12-01-preview - Stable release - GPT-4 Turbo, vision, DALL-E 3
  • 2023-10-01-preview - Previous stable - GPT-4, function calling v1
  • 2023-08-01-preview - Older version - GPT-3.5 Turbo 16K
  • 2023-06-01-preview - Legacy - GPT-3.5 Turbo, function calling preview
  • 2023-05-15 - GA version - GPT-3.5 Turbo, ChatGPT

Version selection guidelines:

  • Development/Testing: Use latest preview (2024-02-15-preview) for newest features
  • Production: Use latest stable release (2023-12-01-preview or newer GA)
  • Legacy systems: Use specific version required by your deployment

Feature availability by version:

  • 2024-02-15-preview: GPT-4 Turbo, parallel function calling, Assistants API (preview)
  • 2023-12-01-preview: GPT-4 Vision, DALL-E 3, JSON mode, improved function calling
  • 2023-10-01-preview: GPT-4, function calling, system fingerprint
  • 2023-08-01-preview: GPT-3.5 Turbo 16K, function calling

Check Azure OpenAI API versions documentation for the current list and feature availability.
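
Because api-version strings begin with an ISO yyyy-MM-dd date, the newest version in a list can be selected by comparing the date prefix lexicographically. A quick sketch:

```java
import java.util.Comparator;
import java.util.List;

public class VersionSketch {
    /** api-version strings start with a yyyy-MM-dd date, so the 10-char prefix sorts chronologically. */
    static String newest(List<String> versions) {
        return versions.stream()
            .max(Comparator.comparing(v -> v.substring(0, 10)))  // compare the date prefix
            .orElseThrow();
    }

    public static void main(String[] args) {
        List<String> versions = List.of("2023-05-15", "2023-12-01-preview", "2024-02-15-preview");
        System.out.println(newest(versions));  // 2024-02-15-preview
    }
}
```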

Error Handling

All models may throw these common exceptions during configuration and operation.

Configuration Errors

/**
 * java.lang.IllegalArgumentException is thrown during model building
 * if the configuration is invalid. Common causes:
 * - Missing required parameters (endpoint, deployment, auth)
 * - Invalid parameter values (negative timeout, malformed URL)
 * - Mutually exclusive options (maxRetries + retryOptions)
 *
 * Example messages:
 * - "endpoint must not be null or empty"
 * - "timeout must be positive"
 * - "cannot specify both maxRetries and retryOptions"
 */

Runtime Errors

import dev.langchain4j.exception.ContentFilteredException;
// In recent LangChain4j versions, timeouts surface as an unchecked
// dev.langchain4j.exception.TimeoutException rather than the checked
// java.util.concurrent.TimeoutException, which generate() never declares.
import dev.langchain4j.exception.TimeoutException;

// ContentFilteredException: content blocked by Azure safety policies.
// Not retried automatically.

// TimeoutException: the configured request timeout was exceeded.
// Automatically retried if the retry policy allows.

// java.lang.IllegalArgumentException: invalid request parameters or state.
// Not retried.

// Any other RuntimeException: network, API, or authentication errors.
// Retry behavior depends on the HTTP status code.

Error handling example:

try {
    Response<?> response = model.generate(input);
} catch (ContentFilteredException e) {
    // Content violated a safety policy - do not retry
    logger.warn("Content filtered: {}", e.getMessage());
    // Prompt the user to modify the input, or handle gracefully
} catch (TimeoutException e) {
    // Request timed out - safe to retry with exponential backoff
    logger.error("Request timed out", e);
    // Retry with backoff or increase the configured timeout
} catch (IllegalArgumentException e) {
    // Invalid configuration or parameters - fix the code
    logger.error("Invalid configuration: {}", e.getMessage());
    // Do not retry; fix the issue in code
} catch (RuntimeException e) {
    // Network, API, or authentication error
    logger.error("Unexpected error", e);
    // Check whether it is retryable based on the cause before retrying
}
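The "retry with exponential backoff" advice above can be sketched as a small generic helper. The `RetryWithBackoff` class, its parameters, and its exception classification are illustrative assumptions, not part of the library:

```java
import java.time.Duration;
import java.util.function.Supplier;

public class RetryWithBackoff {
    // Calls the supplier up to maxAttempts times, doubling the delay after
    // each retryable failure. IllegalArgumentException is treated as a
    // configuration bug and rethrown immediately.
    public static <T> T call(Supplier<T> action, int maxAttempts, Duration initialDelay) {
        Duration delay = initialDelay;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (IllegalArgumentException e) {
                throw e; // not retryable: retrying cannot fix bad parameters
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(delay.toMillis());
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw new RuntimeException("interrupted during backoff", ie);
                    }
                    delay = delay.multipliedBy(2);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = call(() -> {
            if (++calls[0] < 3) throw new RuntimeException("transient failure");
            return "ok";
        }, 5, Duration.ofMillis(10));
        System.out.println(result + " after " + calls[0] + " attempts");  // ok after 3 attempts
    }
}
```

In production the builder's `maxRetries` already covers retryable HTTP errors; a wrapper like this is only needed for application-level retries around whole calls.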

Best Practices

Security

Never hardcode secrets:

// BAD - Hardcoded API key
.apiKey("1234567890abcdef1234567890abcdef")

// GOOD - Load from environment
.apiKey(System.getenv("AZURE_OPENAI_API_KEY"))

// BETTER - Use managed identity (no secrets)
.tokenCredential(new DefaultAzureCredentialBuilder().build())

Recommendations:

  • Use managed identity in Azure (VM, App Service, AKS, Functions)
  • Store API keys in Azure Key Vault, AWS Secrets Manager, or HashiCorp Vault
  • Use environment variables for local development
  • Rotate API keys regularly (every 30-90 days)
  • Enable logging only in secure environments with proper log protection
  • Use Azure RBAC for fine-grained access control

Reliability

Set appropriate timeouts:

// Match timeout to expected response time
.timeout(Duration.ofSeconds(30))   // Quick responses
.timeout(Duration.ofSeconds(60))   // Standard requests
.timeout(Duration.ofSeconds(180))  // Long-form generation

Configure retry policies:

// Production: Aggressive retries for high availability
.maxRetries(5)

// Development: Fast failure for rapid iteration
.maxRetries(1)

Recommendations:

  • Set timeouts based on expected response time + buffer
  • Configure retries based on availability requirements
  • Implement circuit breakers for cascading failures
  • Monitor rate limits and implement backoff strategies
  • Handle specific exceptions (ContentFilteredException, TimeoutException)
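The circuit-breaker recommendation can be sketched in a few lines. This minimal `CircuitBreaker` (the class name, threshold, and cooldown semantics are illustrative assumptions, not a library API) fails fast after repeated failures:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

public class CircuitBreaker {
    private final int threshold;
    private final Duration cooldown;
    private int failures = 0;
    private Instant openedAt = null;

    public CircuitBreaker(int threshold, Duration cooldown) {
        this.threshold = threshold;
        this.cooldown = cooldown;
    }

    // Runs the action unless the circuit is open. After `threshold`
    // consecutive failures the circuit opens and calls fail fast until
    // `cooldown` has elapsed, then one trial call is allowed through.
    public synchronized <T> T call(Supplier<T> action) {
        if (openedAt != null) {
            if (Instant.now().isBefore(openedAt.plus(cooldown))) {
                throw new IllegalStateException("circuit open - failing fast");
            }
            openedAt = null; // half-open: allow a trial call
            failures = 0;
        }
        try {
            T result = action.get();
            failures = 0; // success closes the circuit
            return result;
        } catch (RuntimeException e) {
            if (++failures >= threshold) {
                openedAt = Instant.now();
            }
            throw e;
        }
    }
}
```

Production systems typically use a library such as Resilience4j instead of hand-rolling this, but the failure-counting and fail-fast idea is the same.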

Performance

Reuse model instances:

// GOOD - Create once, reuse across requests
private static final AzureOpenAiChatModel MODEL =
    AzureOpenAiChatModel.builder()
        .endpoint(endpoint)
        .apiKey(apiKey)
        .build();

// Use MODEL for all requests

Don't create per-request:

// BAD - Creates new instance for each request
for (String prompt : prompts) {
    AzureOpenAiChatModel model = AzureOpenAiChatModel.builder()
        .endpoint(endpoint)
        .apiKey(apiKey)
        .build();
    model.generate(prompt);  // Wasteful!
}

Recommendations:

  • Create model instances once and reuse them (thread-safe)
  • Share model instances across threads
  • Use connection pooling (default in Azure SDK)
  • Set reasonable timeouts to avoid hanging requests
  • Consider caching responses for repeated identical requests
  • Use batch operations where available (embedAll for embeddings)
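As one way to apply the batch-operation advice, inputs can be partitioned before each `embedAll` call. The `Batching` helper below is a generic sketch, not a LangChain4j API; the actual per-request input limit varies by service and model:

```java
import java.util.ArrayList;
import java.util.List;

public class Batching {
    // Splits inputs into batches of at most `size` items, e.g. to respect a
    // per-request input limit before embedding each batch separately.
    public static <T> List<List<T>> partition(List<T> items, int size) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            batches.add(items.subList(i, Math.min(i + size, items.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<List<String>> batches = partition(List.of("a", "b", "c", "d", "e"), 2);
        System.out.println(batches);  // [[a, b], [c, d], [e]]
        // Each batch can then be passed to embeddingModel.embedAll(batch).
    }
}
```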

Observability

Use listeners over full logging:

// PREFERRED - Structured observability
.listeners(List.of(metricsListener, costListener))

// AVOID IN PRODUCTION - Full logging with sensitive data
.logRequestsAndResponses(true)  // Only for development

Recommendations:

  • Implement chat model listeners for metrics and cost tracking
  • Track token usage per user/session for billing
  • Monitor latency and error rates for SLA compliance
  • Use correlation IDs via custom headers for distributed tracing
  • Add User-Agent suffix for application identification
  • Log errors but avoid logging full prompts/completions in production

Cost Optimization

Estimate before requesting:

// Estimate tokens before making expensive request
AzureOpenAiTokenCountEstimator estimator =
    new AzureOpenAiTokenCountEstimator(AzureOpenAiChatModelName.GPT_4);

int estimatedTokens = estimator.estimateTokenCountInMessages(messages);
if (estimatedTokens > budget) {
    // Trim messages or reject request
}

Recommendations:

  • Use token estimation before requests to avoid surprises
  • Implement token budgets per user/session
  • Choose appropriate model sizes (GPT-3.5 vs GPT-4)
  • Set maxTokens to limit response length and cost
  • Cache responses where possible to avoid duplicate requests
  • Monitor usage patterns and optimize accordingly
  • Consider using embeddings for search instead of repeated completions
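The caching recommendation above can be sketched with a concurrent map keyed by prompt. The `ResponseCache` wrapper is illustrative; a real cache should also bound its size and expire entries (e.g. with Caffeine):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class ResponseCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> model;
    private int misses = 0;

    public ResponseCache(Function<String, String> model) {
        this.model = model;
    }

    // Returns the cached completion for an identical prompt; invokes the
    // underlying model only on a cache miss.
    public String generate(String prompt) {
        return cache.computeIfAbsent(prompt, p -> {
            misses++;
            return model.apply(p);
        });
    }

    public int missCount() {
        return misses;
    }

    public static void main(String[] args) {
        ResponseCache cached = new ResponseCache(prompt -> "completion for: " + prompt);
        cached.generate("What is RAG?");
        cached.generate("What is RAG?"); // served from cache, no model call
        System.out.println("model calls: " + cached.missCount());  // model calls: 1
    }
}
```

Caching only pays off for exactly repeated prompts; for near-duplicates, semantic caching over embeddings is the usual alternative.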

Install with Tessl CLI

npx tessl i tessl/maven-dev-langchain4j--langchain4j-azure-open-ai@1.11.0
