Spring AI integration for Azure OpenAI services providing chat completion, text embeddings, image generation, and audio transcription with GPT, DALL-E, and Whisper models
The chat completion API provides conversational AI capabilities using Azure OpenAI's GPT models. It supports both synchronous and streaming responses, tool calling, structured outputs, and comprehensive configuration options.
import org.springframework.ai.azure.openai.AzureOpenAiChatModel;
import org.springframework.ai.azure.openai.AzureOpenAiChatOptions;
import org.springframework.ai.azure.openai.AzureOpenAiResponseFormat;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.messages.Message;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.chat.messages.AssistantMessage;
import org.springframework.ai.chat.metadata.ChatResponseMetadata;
import com.azure.ai.openai.OpenAIClient;
import com.azure.ai.openai.OpenAIClientBuilder;
import com.azure.core.credential.AzureKeyCredential;
import reactor.core.publisher.Flux;

AzureOpenAiChatModel is the main class for chat completion operations.
Thread-Safe: AzureOpenAiChatModel is fully thread-safe and can be safely used across multiple threads concurrently. A single instance can handle multiple concurrent requests.
Recommendation: Create one instance and reuse it across your application rather than creating new instances for each request.
class AzureOpenAiChatModel {
static Builder builder();
class Builder {
Builder openAIClientBuilder(OpenAIClientBuilder openAIClientBuilder);
Builder defaultOptions(AzureOpenAiChatOptions defaultOptions);
Builder toolCallingManager(ToolCallingManager toolCallingManager);
Builder toolExecutionEligibilityPredicate(ToolExecutionEligibilityPredicate predicate);
Builder observationRegistry(ObservationRegistry observationRegistry);
AzureOpenAiChatModel build();
}
}

Builder Parameters:
- openAIClientBuilder: Azure OpenAI client builder for authentication and connection (required, non-null)
- defaultOptions: Default chat options applied to all requests (optional, can be overridden per-request)
- toolCallingManager: Manages tool/function calling capabilities (optional, null disables tool calling)
- toolExecutionEligibilityPredicate: Predicate to determine if a tool should be executed (optional, for advanced tool control)
- observationRegistry: Micrometer observation registry for metrics and distributed tracing (optional, null disables observability)

Example - Basic Builder:
AzureOpenAiChatModel chatModel = AzureOpenAiChatModel.builder()
.openAIClientBuilder(new OpenAIClientBuilder()
.credential(new AzureKeyCredential(apiKey))
.endpoint(endpoint))
.defaultOptions(AzureOpenAiChatOptions.builder()
.deploymentName("gpt-4o")
.temperature(0.7)
.build())
.observationRegistry(observationRegistry)
.build();

Example - With Tool Execution Control:
// Define custom predicate for tool execution eligibility.
// Note: ToolExecutionEligibilityPredicate is a BiPredicate over the prompt's
// ChatOptions and the resulting ChatResponse, so tool names are read from the response.
ToolExecutionEligibilityPredicate toolPredicate = (promptOptions, chatResponse) -> {
    // Only allow certain tools to execute
    Set<String> allowedTools = Set.of("get_weather", "search_database");
    return chatResponse.hasToolCalls()
            && chatResponse.getResult().getOutput().getToolCalls().stream()
                .allMatch(toolCall -> allowedTools.contains(toolCall.name()));
};
AzureOpenAiChatModel chatModel = AzureOpenAiChatModel.builder()
.openAIClientBuilder(new OpenAIClientBuilder()
.credential(new AzureKeyCredential(apiKey))
.endpoint(endpoint))
.defaultOptions(defaultOptions)
.toolCallingManager(toolCallingManager)
.toolExecutionEligibilityPredicate(toolPredicate)
.build();

Constructors:
class AzureOpenAiChatModel {
AzureOpenAiChatModel(
OpenAIClientBuilder openAIClientBuilder,
AzureOpenAiChatOptions defaultOptions,
ToolCallingManager toolCallingManager,
ObservationRegistry observationRegistry
);
AzureOpenAiChatModel(
OpenAIClientBuilder openAIClientBuilder,
AzureOpenAiChatOptions defaultOptions,
ToolCallingManager toolCallingManager,
ObservationRegistry observationRegistry,
ToolExecutionEligibilityPredicate toolExecutionEligibilityPredicate
);
}

Constructor Parameters:
- openAIClientBuilder: Required, throws NullPointerException if null
- defaultOptions: Optional, uses model defaults if null
- toolCallingManager: Optional, disables tool calling if null
- observationRegistry: Optional, disables observability if null
- toolExecutionEligibilityPredicate: Optional, all tools eligible if null

ChatResponse call(Prompt prompt);

Generate a chat response synchronously. Blocks until the complete response is received.
Parameters:
- prompt: The prompt containing messages and optional options (non-null, throws NullPointerException if null)

Returns: ChatResponse containing the generated response, metadata, and usage information (never null)
Throws:
- HttpResponseException: HTTP errors from the Azure API (400, 401, 403, 429, 500)
- ResourceNotFoundException: Deployment not found (404)
- NonTransientAiException: Permanent failures (invalid parameters, auth errors)
- TransientAiException: Temporary failures (rate limits, timeouts)
- NullPointerException: If prompt is null

Example:
Prompt prompt = new Prompt("What is the capital of France?");
ChatResponse response = chatModel.call(prompt);
String answer = response.getResult().getOutput().getText();

With Options Override:
AzureOpenAiChatOptions options = AzureOpenAiChatOptions.builder()
.temperature(0.9)
.maxTokens(100)
.build();
Prompt prompt = new Prompt("Write a creative story", options);
ChatResponse response = chatModel.call(prompt);

Multi-turn Conversation:
List<Message> messages = List.of(
new UserMessage("What is machine learning?"),
new AssistantMessage("Machine learning is..."),
new UserMessage("Can you give an example?")
);
Prompt prompt = new Prompt(messages);
ChatResponse response = chatModel.call(prompt);

Error Handling:
try {
ChatResponse response = chatModel.call(prompt);
} catch (HttpResponseException e) {
if (e.getResponse().getStatusCode() == 429) {
// Rate limit - implement retry with backoff
throw new RateLimitException("Rate limit exceeded", e);
} else if (e.getResponse().getStatusCode() == 401) {
// Auth error - check credentials
throw new AuthenticationException("Invalid credentials", e);
}
} catch (ResourceNotFoundException e) {
// Deployment not found
throw new ConfigurationException("Invalid deployment name", e);
}

Flux<ChatResponse> stream(Prompt prompt);

Generate a chat response as a reactive stream for real-time token delivery. Returns immediately with a Flux that emits tokens as they become available.
Parameters:
- prompt: The prompt containing messages and optional options (non-null)

Returns: Flux<ChatResponse> emitting partial responses as tokens arrive (never null, may complete immediately on error)
Throws (via Flux error signal):
Same exceptions as the call() method, signaled through the Flux error channel.

Stream Behavior:
- Each emitted ChatResponse contains one or more new tokens
- Access the incremental text via response.getResult().getOutput().getText()
- Usage metadata is included in the final chunk when streamUsage(true) is set

Example:
Prompt prompt = new Prompt("Explain quantum physics");
Flux<ChatResponse> responseStream = chatModel.stream(prompt);
responseStream.subscribe(
chatResponse -> {
String token = chatResponse.getResult().getOutput().getText();
System.out.print(token);
},
error -> System.err.println("Error: " + error),
() -> System.out.println("\nComplete")
);

Collecting Full Response:
String fullResponse = chatModel.stream(prompt)
.map(response -> response.getResult().getOutput().getText())
.collectList()
.map(tokens -> String.join("", tokens))
.block();

Streaming with Usage Tracking:
AzureOpenAiChatOptions optionsWithUsage = AzureOpenAiChatOptions.builder()
.streamUsage(true) // Enable usage reporting in stream
.build();
Prompt prompt = new Prompt("Tell me a story", optionsWithUsage);
Flux<ChatResponse> stream = chatModel.stream(prompt);
stream.subscribe(
chatResponse -> {
String token = chatResponse.getResult().getOutput().getText();
if (token != null) {
System.out.print(token);
}
// Check for usage metadata in final chunk
if (chatResponse.getMetadata() != null) {
Usage usage = chatResponse.getMetadata().getUsage();
if (usage != null) {
System.out.println("\nTotal tokens used: " + usage.getTotalTokens());
}
}
}
);

Error Handling in Streams:
chatModel.stream(prompt)
.onErrorResume(throwable -> {
if (throwable instanceof HttpResponseException) {
HttpResponseException httpEx = (HttpResponseException) throwable;
if (httpEx.getResponse().getStatusCode() == 429) {
// Retry after delay
return Mono.delay(Duration.ofSeconds(1))
.flatMapMany(tick -> chatModel.stream(prompt));
}
}
return Flux.error(throwable);
})
.subscribe(/* ... */);

AzureOpenAiChatOptions getDefaultOptions();
void setObservationConvention(ChatModelObservationConvention observationConvention);

getDefaultOptions(): Returns the default options configured for this model instance (never null).
setObservationConvention(): Sets a custom ChatModelObservationConvention for metrics and tracing, replacing the default convention.
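A minimal sketch of supplying a convention, assuming DefaultChatModelObservationConvention from org.springframework.ai.chat.observation is the stock implementation (verify against your Spring AI version):

import org.springframework.ai.chat.observation.DefaultChatModelObservationConvention;

// Replace the default convention; subclass it (or implement
// ChatModelObservationConvention) to customize metric names and key-values.
chatModel.setObservationConvention(new DefaultChatModelObservationConvention());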
Static factory methods for creating metadata from Azure responses:
static ChatResponseMetadata from(ChatCompletions chatCompletions, PromptMetadata promptFilterMetadata);
static ChatResponseMetadata from(ChatCompletions chatCompletions, PromptMetadata promptFilterMetadata, Usage usage);
static ChatResponseMetadata from(ChatCompletions chatCompletions, PromptMetadata promptFilterMetadata, CompletionsUsage usage);
static ChatResponseMetadata from(ChatResponse chatResponse, Usage usage);

These methods convert Azure SDK response objects into Spring AI metadata objects.
Method Descriptions:
- from(ChatCompletions, PromptMetadata): Create metadata from Azure chat completions with prompt filter metadata (returns non-null ChatResponseMetadata)
- from(ChatCompletions, PromptMetadata, Usage): Create metadata with a Spring AI Usage object (returns non-null ChatResponseMetadata)
- from(ChatCompletions, PromptMetadata, CompletionsUsage): Create metadata with an Azure CompletionsUsage object (returns non-null ChatResponseMetadata)
- from(ChatResponse, Usage): Create metadata from an existing ChatResponse (returns non-null ChatResponseMetadata)

Example - Accessing Response Metadata:
ChatResponse response = chatModel.call(prompt);
ChatResponseMetadata metadata = response.getMetadata();
// Access usage information
Usage usage = metadata.getUsage();
System.out.println("Prompt tokens: " + usage.getPromptTokens());
System.out.println("Generation tokens: " + usage.getGenerationTokens());
System.out.println("Total tokens: " + usage.getTotalTokens());
// Access finish reason
String finishReason = response.getResult().getMetadata().getFinishReason();
System.out.println("Finish reason: " + finishReason);Configuration class for chat completion requests.
class AzureOpenAiChatOptions {
static Builder builder();
static AzureOpenAiChatOptions fromOptions(AzureOpenAiChatOptions fromOptions);
AzureOpenAiChatOptions copy();
}

fromOptions(): Creates a new instance copying all settings from another instance (parameter non-null, returns non-null)
copy(): Creates deep copy of this instance (returns non-null)
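A short sketch showing how fromOptions() supports a copy-then-override pattern for per-request tweaks (values are illustrative):

AzureOpenAiChatOptions base = AzureOpenAiChatOptions.builder()
    .deploymentName("gpt-4o")
    .temperature(0.7)
    .build();

// Copy the shared defaults, then override just the temperature for this request
AzureOpenAiChatOptions creative = AzureOpenAiChatOptions.fromOptions(base);
creative.setTemperature(1.2);

Prompt prompt = new Prompt("Write a limerick", creative);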
class Builder {
Builder deploymentName(String deploymentName);
Builder temperature(Double temperature);
Builder topP(Double topP);
Builder maxTokens(Integer maxTokens);
Builder maxCompletionTokens(Integer maxCompletionTokens);
Builder N(Integer n);
Builder frequencyPenalty(Double frequencyPenalty);
Builder presencePenalty(Double presencePenalty);
Builder logitBias(Map<String, Integer> logitBias);
Builder stop(List<String> stop);
Builder user(String user);
Builder seed(Long seed);
Builder responseFormat(AzureOpenAiResponseFormat responseFormat);
Builder logprobs(Boolean logprobs);
Builder topLogprobs(Integer topLogprobs);
Builder reasoningEffort(String reasoningEffort);
Builder enhancements(AzureChatEnhancementConfiguration enhancements);
Builder streamOptions(ChatCompletionStreamOptions streamOptions);
Builder streamUsage(Boolean enableStreamUsage);
Builder toolCallbacks(List<ToolCallback> toolCallbacks);
Builder toolCallbacks(ToolCallback... toolCallbacks);
Builder toolNames(Set<String> toolNames);
Builder toolNames(String... toolNames);
Builder internalToolExecutionEnabled(Boolean internalToolExecutionEnabled);
Builder toolContext(Map<String, Object> toolContext);
AzureOpenAiChatOptions build();
}

Builder Methods:
- All builder methods return this for fluent chaining (never null)
- build() returns a non-null AzureOpenAiChatOptions instance

String getDeploymentName();
void setDeploymentName(String deploymentName);
String getModel();
void setModel(String model);

The deployment name specifies which Azure OpenAI deployment to use (e.g., "gpt-4o", "gpt-4", "gpt-35-turbo").
Constraints:
- Must match a deployment configured in your Azure OpenAI resource; throws IllegalArgumentException if invalid

Common Deployments:
- gpt-4o, gpt-4, gpt-35-turbo (standard chat models)
- o1, o3, o4-mini (reasoning models)
Example:
AzureOpenAiChatOptions options = AzureOpenAiChatOptions.builder()
.deploymentName("gpt-4o")
.build();

Integer getMaxTokens();
void setMaxTokens(Integer maxTokens);
Integer getMaxCompletionTokens();
void setMaxCompletionTokens(Integer maxCompletionTokens);

Important: These two parameters are mutually exclusive and serve different purposes:
- maxTokens: Maximum total tokens (prompt + completion). Used by non-reasoning models (GPT-4, GPT-3.5, etc.)
- maxCompletionTokens: Maximum tokens in the completion only. Required for reasoning models (o1, o3, o4-mini)

Constraints:
- Must be positive (throws IllegalArgumentException otherwise)
- Use maxTokens for standard models and maxCompletionTokens for reasoning models

Model Context Limits: Each deployment has a fixed context window; the prompt plus the generated completion must fit within it.
For Standard Models (GPT-4, GPT-3.5, etc.):
AzureOpenAiChatOptions options = AzureOpenAiChatOptions.builder()
.deploymentName("gpt-4o")
.maxTokens(1000) // Total tokens including prompt
.build();

For Reasoning Models (o1, o3, o4-mini):
AzureOpenAiChatOptions options = AzureOpenAiChatOptions.builder()
.deploymentName("o1")
.maxCompletionTokens(2000) // Completion tokens only
.build();

Do not use both parameters together - use maxTokens for standard models and maxCompletionTokens for reasoning models.
Double getTemperature();
void setTemperature(Double temperature);
Double getTopP();
void setTopP(Double topP);
Integer getN();
void setN(Integer n);

- temperature: Controls randomness (0.0 = deterministic, 2.0 = very random). Default: 0.7
- topP: Nucleus sampling threshold (0.0-1.0)
- n: Number of completions to generate per prompt

Constraints:
- temperature: Must be 0.0-2.0 (throws IllegalArgumentException if out of range)
- topP: Must be 0.0-1.0 (throws IllegalArgumentException if out of range)
- n: Must be >= 1 (throws IllegalArgumentException if < 1)

Temperature Guidelines:
- 0.0-0.3: Factual answers and deterministic output
- 0.5-0.8: General-purpose chat
- 1.0-2.0: Creative writing and brainstorming
Example:
AzureOpenAiChatOptions options = AzureOpenAiChatOptions.builder()
.temperature(0.8)
.topP(0.9)
.N(1)
.build();

Double getFrequencyPenalty();
void setFrequencyPenalty(Double frequencyPenalty);
Double getPresencePenalty();
void setPresencePenalty(Double presencePenalty);
Map<String, Integer> getLogitBias();
void setLogitBias(Map<String, Integer> logitBias);

- frequencyPenalty: Reduce repetition based on frequency (-2.0 to 2.0)
- presencePenalty: Reduce repetition based on presence (-2.0 to 2.0)
- logitBias: Modify likelihood of specific tokens

Constraints:
- frequencyPenalty: Must be -2.0 to 2.0
- presencePenalty: Must be -2.0 to 2.0
- logitBias: Map keys are token IDs (as strings), values are -100 to 100

Penalty Guidelines:
- Positive values discourage repetition; negative values encourage it
- 0.0 (the default) applies no penalty; around 0.5 is a common starting point for reducing repetition
Example:
AzureOpenAiChatOptions options = AzureOpenAiChatOptions.builder()
.frequencyPenalty(0.5)
.presencePenalty(0.5)
.build();

Logit Bias Example:
// Discourage specific words (get token IDs from tokenizer)
Map<String, Integer> bias = Map.of(
"1234", -100, // Completely ban token 1234
"5678", 50 // Boost likelihood of token 5678
);
AzureOpenAiChatOptions options = AzureOpenAiChatOptions.builder()
.logitBias(bias)
.build();

List<String> getStop();
void setStop(List<String> stop);
List<String> getStopSequences();
void setStopSequences(List<String> stopSequences);

Define sequences where the model will stop generating tokens.
Constraints:
- Up to 4 stop sequences per request
- Sequences must be non-empty strings; the matched sequence is not included in the output

Use Cases:
- Stopping at a delimiter such as a newline or "###"
- Truncating structured output at a known terminator

Example:
AzureOpenAiChatOptions options = AzureOpenAiChatOptions.builder()
.stop(List.of("\n", "END", "###"))
.build();

AzureOpenAiResponseFormat getResponseFormat();
void setResponseFormat(AzureOpenAiResponseFormat responseFormat);

Control the format of the model's output (text, JSON object, or JSON schema).
Format Types:
- TEXT: Plain text response (default)
- JSON_OBJECT: Valid JSON object (no schema enforcement)
- JSON_SCHEMA: JSON conforming to a specific schema (strict validation)

Constraints:
- With JSON_OBJECT and JSON_SCHEMA, the prompt itself must instruct the model to produce JSON
Example - JSON Object:
AzureOpenAiResponseFormat format = AzureOpenAiResponseFormat.builder()
.type(AzureOpenAiResponseFormat.Type.JSON_OBJECT)
.build();
AzureOpenAiChatOptions options = AzureOpenAiChatOptions.builder()
.responseFormat(format)
.build();
// Prompt must request JSON output
Prompt prompt = new Prompt("List 3 colors in JSON format", options);Example - JSON Schema:
Map<String, Object> schema = Map.of(
"type", "object",
"properties", Map.of(
"name", Map.of("type", "string"),
"age", Map.of("type", "number")
),
"required", List.of("name")
);
AzureOpenAiResponseFormat format = AzureOpenAiResponseFormat.builder()
.type(AzureOpenAiResponseFormat.Type.JSON_SCHEMA)
.jsonSchema(AzureOpenAiResponseFormat.JsonSchema.builder()
.name("PersonSchema")
.schema(schema)
.strict(true)
.build())
.build();

Long getSeed();
void setSeed(Long seed);
Boolean isLogprobs();
void setLogprobs(Boolean logprobs);
Integer getTopLogProbs();
void setTopLogProbs(Integer topLogProbs);
String getReasoningEffort();
void setReasoningEffort(String reasoningEffort);
AzureChatEnhancementConfiguration getEnhancements();
void setEnhancements(AzureChatEnhancementConfiguration enhancements);
ChatCompletionStreamOptions getStreamOptions();
void setStreamOptions(ChatCompletionStreamOptions streamOptions);
Boolean getStreamUsage();
void setStreamUsage(Boolean enableStreamUsage);
String getUser();
void setUser(String user);

- seed: Integer seed for deterministic sampling
- logprobs: Return log probabilities for tokens
- topLogProbs: Number of top log probabilities to return (1-20)
- reasoningEffort: Control reasoning effort for reasoning models (o1, o3, o4-mini). Valid values: "low", "medium", "high"
- enhancements: Azure-specific enhancements (e.g., grounding, OCR)
- streamOptions: Azure ChatCompletionStreamOptions for fine-grained streaming control
- streamUsage: Include usage token counts in streaming responses (convenience alternative to streamOptions)
- user: Identifier for the end-user (for abuse monitoring)

Constraints:
- seed: Any long value, null for non-deterministic
- logprobs: Boolean flag
- topLogProbs: 1-20 if logprobs is true, null otherwise
- reasoningEffort: Must be "low", "medium", or "high" (reasoning models only)
- user: Max 256 characters

Note: streamOptions and streamUsage are related but serve different purposes:
- streamUsage is a boolean convenience flag to enable usage reporting in streams
- streamOptions provides more detailed control via Azure's ChatCompletionStreamOptions object

Example - Standard Options:
AzureOpenAiChatOptions options = AzureOpenAiChatOptions.builder()
.seed(12345L)
.logprobs(true)
.topLogprobs(5)
.user("user-123")
.build();

Example - Reasoning Models:
AzureOpenAiChatOptions options = AzureOpenAiChatOptions.builder()
.deploymentName("o1")
.maxCompletionTokens(5000)
.reasoningEffort("high") // Increases thinking time for better results
.build();

The reasoningEffort parameter controls how much computational effort reasoning models invest in problem-solving:
"low": Faster responses with less reasoning depth"medium": Balanced speed and reasoning quality (default)"high": Maximum reasoning depth, slower but more thoroughOnly applicable to reasoning models: o1, o3, o4-mini. This parameter is ignored by standard models like GPT-4 and GPT-3.5.
List<ToolCallback> getToolCallbacks();
void setToolCallbacks(List<ToolCallback> toolCallbacks);
Set<String> getToolNames();
void setToolNames(Set<String> toolNames);
void setFunctions(Set<String> functions);
Boolean getInternalToolExecutionEnabled();
void setInternalToolExecutionEnabled(Boolean internalToolExecutionEnabled);
Map<String, Object> getToolContext();
void setToolContext(Map<String, Object> toolContext);

Configure tool/function calling for the model.
Parameters:
- toolCallbacks: List of tool implementations (can be empty or null)
- toolNames: Set of tool names to make available (must match callback names)
- internalToolExecutionEnabled: Whether to automatically execute tools (default: true)
- toolContext: Additional context passed to tool callbacks (can be null or empty)

Constraints:
- Each name in toolNames must have a corresponding callback in toolCallbacks

Example:
// Define a tool callback. Note: ToolCallback is an interface; a
// FunctionToolCallback (org.springframework.ai.tool.function.FunctionToolCallback)
// is one concrete way to build one. WeatherRequest and getWeatherData are
// illustrative application types.
record WeatherRequest(String location) {}

ToolCallback weatherTool = FunctionToolCallback
    .builder("get_weather", (WeatherRequest request) -> getWeatherData(request.location()))
    .description("Get current weather for a location")
    .inputType(WeatherRequest.class)
    .build();
AzureOpenAiChatOptions options = AzureOpenAiChatOptions.builder()
.toolCallbacks(weatherTool)
.toolNames("get_weather")
.internalToolExecutionEnabled(true)
.toolContext(Map.of("api_key", weatherApiKey))
.build();

AzureOpenAiResponseFormat defines the structure of the model's response.
class AzureOpenAiResponseFormat {
static Builder builder();
Type getType();
void setType(Type type);
JsonSchema getJsonSchema();
void setJsonSchema(JsonSchema jsonSchema);
String getSchema();
void setSchema(String schema);
}

class Builder {
Builder type(Type type);
Builder jsonSchema(JsonSchema jsonSchema);
Builder jsonSchema(String jsonSchema);
AzureOpenAiResponseFormat build();
}

enum Type {
TEXT,
JSON_OBJECT,
JSON_SCHEMA
}

- TEXT: Plain text response (default)
- JSON_OBJECT: Valid JSON object
- JSON_SCHEMA: JSON conforming to a specific schema

class JsonSchema {
static Builder builder();
String getName();
Map<String, Object> getSchema();
Boolean getStrict();
class Builder {
Builder name(String name);
Builder schema(Map<String, Object> schema);
Builder schema(String schema);
Builder strict(Boolean strict);
JsonSchema build();
}
}

JsonSchema Properties:
- name: Schema name (default: "custom_schema" if not specified, must not be null or empty)
- schema: The JSON schema definition as a Map or JSON string (non-null, must be valid JSON Schema Draft 7)
- strict: Whether to enforce strict schema matching (default: true)

Strict Mode:
- true: Model output must exactly match the schema (validation errors on mismatch)
- false: Model attempts to match the schema but may deviate

Builder Methods:
- name(String): Set the schema name (parameter non-null)
- schema(Map<String, Object>): Set schema from a Map (parameter non-null)
- schema(String): Set schema from a JSON string (convenience method, parameter non-null)
- strict(Boolean): Enable or disable strict schema enforcement
- build(): Build the JsonSchema instance (returns non-null)

Example - Using Map:
AzureOpenAiResponseFormat.JsonSchema schema =
AzureOpenAiResponseFormat.JsonSchema.builder()
.name("ProductInfo")
.schema(Map.of(
"type", "object",
"properties", Map.of(
"product_name", Map.of("type", "string"),
"price", Map.of("type", "number"),
"in_stock", Map.of("type", "boolean")
),
"required", List.of("product_name", "price")
))
.strict(true)
.build();
AzureOpenAiResponseFormat format = AzureOpenAiResponseFormat.builder()
.type(AzureOpenAiResponseFormat.Type.JSON_SCHEMA)
.jsonSchema(schema)
.build();

Example - Using JSON String:
String schemaJson = """
{
"type": "object",
"properties": {
"product_name": {"type": "string"},
"price": {"type": "number"},
"in_stock": {"type": "boolean"}
},
"required": ["product_name", "price"]
}
""";
AzureOpenAiResponseFormat.JsonSchema schema =
AzureOpenAiResponseFormat.JsonSchema.builder()
.name("ProductInfo")
.schema(schemaJson) // Convenience method accepts JSON string
.strict(true)
.build();
AzureOpenAiResponseFormat format = AzureOpenAiResponseFormat.builder()
.type(AzureOpenAiResponseFormat.Type.JSON_SCHEMA)
.jsonSchema(schema)
.build();

Example - Using setSchema Convenience Method:
AzureOpenAiResponseFormat format = AzureOpenAiResponseFormat.builder()
.type(AzureOpenAiResponseFormat.Type.JSON_SCHEMA)
.build();
// Set schema as JSON string directly
String schemaJson = "{\"type\":\"object\",\"properties\":{\"name\":{\"type\":\"string\"}}}";
format.setSchema(schemaJson);
// The schema will be automatically parsed and set

Example - Quick Start:
AzureOpenAiChatModel chatModel = AzureOpenAiChatModel.builder()
.openAIClientBuilder(new OpenAIClientBuilder()
.credential(new AzureKeyCredential(apiKey))
.endpoint(endpoint))
.defaultOptions(AzureOpenAiChatOptions.builder()
.deploymentName("gpt-4o")
.build())
.build();
ChatResponse response = chatModel.call(new Prompt("Hello, how are you?"));
System.out.println(response.getResult().getOutput().getText());

Example - Streaming:
Flux<ChatResponse> stream = chatModel.stream(new Prompt("Tell me a story"));
stream.subscribe(chunk -> {
String content = chunk.getResult().getOutput().getText();
if (content != null) {
System.out.print(content);
}
});

Example - Structured JSON Output:
Map<String, Object> schema = Map.of(
"type", "object",
"properties", Map.of(
"city", Map.of("type", "string"),
"country", Map.of("type", "string"),
"population", Map.of("type", "integer")
),
"required", List.of("city", "country")
);
AzureOpenAiResponseFormat format = AzureOpenAiResponseFormat.builder()
.type(AzureOpenAiResponseFormat.Type.JSON_SCHEMA)
.jsonSchema(AzureOpenAiResponseFormat.JsonSchema.builder()
.name("CityInfo")
.schema(schema)
.strict(true)
.build())
.build();
AzureOpenAiChatOptions options = AzureOpenAiChatOptions.builder()
.responseFormat(format)
.build();
Prompt prompt = new Prompt("Tell me about Paris", options);
ChatResponse response = chatModel.call(prompt);
String jsonResponse = response.getResult().getOutput().getText();
// jsonResponse will be valid JSON matching the schema

Example - Multi-turn Conversation:
List<Message> conversation = new ArrayList<>();
conversation.add(new UserMessage("What is the weather in Seattle?"));
// Get response
ChatResponse response1 = chatModel.call(new Prompt(conversation));
conversation.add(new AssistantMessage(response1.getResult().getOutput().getText()));
// Continue conversation
conversation.add(new UserMessage("What about tomorrow?"));
ChatResponse response2 = chatModel.call(new Prompt(conversation));

Example - Temperature Control:
// Creative writing (high temperature)
AzureOpenAiChatOptions creativeOptions = AzureOpenAiChatOptions.builder()
.temperature(1.5)
.build();
Prompt creativePrompt = new Prompt("Write a creative story", creativeOptions);
ChatResponse creativeResponse = chatModel.call(creativePrompt);
// Factual response (low temperature)
AzureOpenAiChatOptions factualOptions = AzureOpenAiChatOptions.builder()
.temperature(0.1)
.build();
Prompt factualPrompt = new Prompt("What is 2+2?", factualOptions);
ChatResponse factualResponse = chatModel.call(factualPrompt);

Example - Limiting Response Length:
AzureOpenAiChatOptions options = AzureOpenAiChatOptions.builder()
.maxTokens(100)
.build();
Prompt prompt = new Prompt("Explain quantum physics", options);
ChatResponse response = chatModel.call(prompt);
// Response will be limited to 100 tokens

Reasoning models (o1, o3, o4-mini) are specialized models that spend more time "thinking" before responding, making them ideal for complex problem-solving, mathematical reasoning, and coding tasks.
Key Differences from Standard Models:
- They use maxCompletionTokens instead of maxTokens
- They support the reasoningEffort parameter to control thinking depth

Basic Reasoning Model Usage:
AzureOpenAiChatOptions reasoningOptions = AzureOpenAiChatOptions.builder()
.deploymentName("o1")
.maxCompletionTokens(2000)
.build();
Prompt prompt = new Prompt(
"Solve this complex math problem: Find the derivative of f(x) = x^3 * ln(x)",
reasoningOptions
);
ChatResponse response = chatModel.call(prompt);

With Reasoning Effort Control:
// High effort for complex problems
AzureOpenAiChatOptions highEffortOptions = AzureOpenAiChatOptions.builder()
.deploymentName("o1")
.maxCompletionTokens(5000)
.reasoningEffort("high") // More thorough reasoning
.build();
Prompt complexPrompt = new Prompt(
"Design a distributed system architecture for a global e-commerce platform with 100M+ users",
highEffortOptions
);
ChatResponse detailedResponse = chatModel.call(complexPrompt);
// Low effort for simpler questions
AzureOpenAiChatOptions lowEffortOptions = AzureOpenAiChatOptions.builder()
.deploymentName("o1")
.maxCompletionTokens(1000)
.reasoningEffort("low") // Faster responses
.build();
Prompt simplePrompt = new Prompt("What is 15 * 24?", lowEffortOptions);
ChatResponse quickResponse = chatModel.call(simplePrompt);

Code Generation with Reasoning Models:
AzureOpenAiChatOptions codeOptions = AzureOpenAiChatOptions.builder()
.deploymentName("o1")
.maxCompletionTokens(3000)
.reasoningEffort("high")
.build();
Prompt codePrompt = new Prompt(
"Write a Java implementation of a thread-safe LRU cache with O(1) operations",
codeOptions
);
ChatResponse codeResponse = chatModel.call(codePrompt);

Mathematical Problem Solving:
AzureOpenAiChatOptions mathOptions = AzureOpenAiChatOptions.builder()
.deploymentName("o1")
.maxCompletionTokens(2000)
.reasoningEffort("medium")
.build();
Prompt mathPrompt = new Prompt(
"Prove that the square root of 2 is irrational using contradiction",
mathOptions
);
ChatResponse proofResponse = chatModel.call(mathPrompt);

Exception Types:
// Azure SDK exceptions
com.azure.core.exception.HttpResponseException // HTTP errors (400, 401, 403, 429, 500)
com.azure.core.exception.ResourceNotFoundException // Deployment not found (404)
// Spring AI exceptions
org.springframework.ai.retry.NonTransientAiException // Permanent failures
org.springframework.ai.retry.TransientAiException // Temporary failures (retry-able)

HttpResponseException Status Codes:
- 400: Bad request (invalid parameters)
- 401: Unauthorized (invalid API key)
- 403: Forbidden (content filter or quota exceeded)
- 404: Deployment not found, surfaced as ResourceNotFoundException
- 429: Rate limit exceeded
- 500/503: Azure service errors

Common Error Scenarios:

Rate Limiting (429):
try {
response = chatModel.call(prompt);
} catch (HttpResponseException e) {
if (e.getResponse().getStatusCode() == 429) {
// Wait and retry with exponential backoff
int retryAfterSeconds = extractRetryAfter(e.getResponse());
Thread.sleep(retryAfterSeconds * 1000);
response = chatModel.call(prompt);
}
}

Deployment Not Found (404):
try {
response = chatModel.call(prompt);
} catch (ResourceNotFoundException e) {
// Deployment name is incorrect or doesn't exist
throw new ConfigurationException(
"Deployment '" + options.getDeploymentName() + "' not found. " +
"Check Azure portal for valid deployment names.", e
);
}

Authentication Failure (401):
try {
response = chatModel.call(prompt);
} catch (HttpResponseException e) {
if (e.getResponse().getStatusCode() == 401) {
// API key is invalid or expired
throw new AuthenticationException(
"Invalid Azure OpenAI credentials. Check API key and endpoint.", e
);
}
}

Content Filter or Quota (403):
try {
response = chatModel.call(prompt);
} catch (HttpResponseException e) {
if (e.getResponse().getStatusCode() == 403) {
// Content filtered or quota exceeded
String errorBody = e.getResponse().getBodyAsString().block();
if (errorBody.contains("content_filter")) {
throw new ContentFilterException(
"Request blocked by content filter. Review prompt content.", e
);
} else {
throw new QuotaException("Quota exceeded. Check Azure usage limits.", e);
}
}
}

Context Length Exceeded (400):
try {
response = chatModel.call(prompt);
} catch (HttpResponseException e) {
if (e.getResponse().getStatusCode() == 400) {
String errorMessage = e.getMessage();
if (errorMessage.contains("maximum context length")) {
// Prompt + max_tokens exceeds model limit
throw new TokenLimitException(
"Total tokens exceed model limit. Reduce prompt length or maxTokens.", e
);
}
}
}

Exponential Backoff with Jitter:
public ChatResponse callWithRetry(AzureOpenAiChatModel model, Prompt prompt) throws InterruptedException {
int maxRetries = 3;
int baseDelayMs = 1000;
for (int attempt = 0; attempt < maxRetries; attempt++) {
try {
return model.call(prompt);
} catch (HttpResponseException e) {
int statusCode = e.getResponse().getStatusCode();
// Only retry on transient errors
if (statusCode == 429 || statusCode == 500 || statusCode == 503) {
if (attempt < maxRetries - 1) {
// Exponential backoff with jitter
int delayMs = baseDelayMs * (1 << attempt);
int jitter = ThreadLocalRandom.current().nextInt(0, delayMs / 2);
Thread.sleep(delayMs + jitter);
continue;
}
}
// Non-retryable error or max retries exceeded
throw e;
}
}
throw new RuntimeException("Max retries exceeded");
}

Parameter Constraints Quick Reference:
- Deployment Name: Must match a deployment configured in your Azure OpenAI resource
- Temperature: 0.0-2.0 (default 0.7)
- Top P: 0.0-1.0
- Max Tokens: Positive integer; standard models only (prompt + completion)
- Max Completion Tokens: Positive integer; reasoning models only (completion)
- N (Number of Completions): >= 1
- Frequency Penalty: -2.0 to 2.0
- Presence Penalty: -2.0 to 2.0
- Logit Bias: Token-ID keys (as strings) with values -100 to 100
- Stop Sequences: Up to 4 non-empty strings
- Seed: Any long; null for non-deterministic sampling
- Top Log Probs: 1-20 (requires logprobs enabled)
- Reasoning Effort: "low", "medium", or "high" (reasoning models only)
- User Identifier: Max 256 characters
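As a worked illustration of these constraints, here is a hypothetical pre-flight check (validateOptions is not part of the library; the checks simply mirror the documented ranges):

// Hypothetical helper: validates options against the documented constraints
// before sending a request. Not part of Spring AI.
static void validateOptions(AzureOpenAiChatOptions options) {
    Double temp = options.getTemperature();
    if (temp != null && (temp < 0.0 || temp > 2.0)) {
        throw new IllegalArgumentException("temperature must be 0.0-2.0");
    }
    Double topP = options.getTopP();
    if (topP != null && (topP < 0.0 || topP > 1.0)) {
        throw new IllegalArgumentException("topP must be 0.0-1.0");
    }
    if (options.getMaxTokens() != null && options.getMaxCompletionTokens() != null) {
        throw new IllegalArgumentException(
                "maxTokens and maxCompletionTokens are mutually exclusive");
    }
    String user = options.getUser();
    if (user != null && user.length() > 256) {
        throw new IllegalArgumentException("user must be at most 256 characters");
    }
}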
Troubleshooting:

Issue - Invalid Parameter Combinations:
Symptom: 400 Bad Request with parameter validation error
Common Causes:
- Using maxTokens with reasoning models (use maxCompletionTokens instead)
- Setting both maxTokens and maxCompletionTokens on the same request

Solution: Validate parameters against the constraints documented above; see the sketch below.
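A minimal sketch of the most common mistake and its fix (the deployment name is illustrative):

// Wrong: maxTokens is rejected by reasoning models such as o1
AzureOpenAiChatOptions broken = AzureOpenAiChatOptions.builder()
    .deploymentName("o1")
    .maxTokens(1000) // triggers 400 Bad Request on reasoning models
    .build();

// Right: reasoning models take maxCompletionTokens instead
AzureOpenAiChatOptions fixed = AzureOpenAiChatOptions.builder()
    .deploymentName("o1")
    .maxCompletionTokens(1000)
    .build();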
Issue - Streaming Stops Early:
Symptom: Flux completes before full response generated
Common Causes:
- The subscription is cancelled before the stream finishes
- An error terminates the Flux without an attached error handler

Solution: Attach completion and error handlers to observe the stream's lifecycle:
chatModel.stream(prompt)
.doOnComplete(() -> System.out.println("Stream completed"))
.doOnError(error -> System.err.println("Stream error: " + error))
.subscribe(/* ... */);

Issue - Invalid JSON Output:
Symptom: Model doesn't return valid JSON matching schema
Solutions:
- Enable strict(true) in the JsonSchema builder
- Instruct the model in the prompt to respond in JSON

Issue - Slow Responses:
Symptom: Requests take longer than expected
Causes & Solutions:
reasoningEffort("low")Recommended:
// Create once at application startup
@Bean
public AzureOpenAiChatModel chatModel() {
return AzureOpenAiChatModel.builder()
.openAIClientBuilder(clientBuilder)
.defaultOptions(defaultOptions)
.build();
}
// Inject and reuse
@Autowired
private AzureOpenAiChatModel chatModel;
public void handleRequest() {
chatModel.call(prompt); // Reuse same instance
}

Avoid:
// Don't create new instance per request
public void handleRequest() {
AzureOpenAiChatModel model = AzureOpenAiChatModel.builder()...build();
model.call(prompt); // Inefficient
}

Best Practice - Use Streaming for Long Responses:
Use streaming for:
- Interactive chat UIs that render tokens as they arrive
- Long generations where time-to-first-token matters
Flux<ChatResponse> stream = chatModel.stream(prompt);
stream.subscribe(
chunk -> updateUI(chunk.getResult().getOutput().getText()),
error -> handleError(error),
() -> markComplete()
);

Best Practice - Concurrent Requests:
Models are thread-safe and can handle concurrent requests:
ExecutorService executor = Executors.newFixedThreadPool(10);
List<CompletableFuture<ChatResponse>> futures = new ArrayList<>();
for (Prompt prompt : prompts) {
CompletableFuture<ChatResponse> future = CompletableFuture.supplyAsync(
() -> chatModel.call(prompt),
executor
);
futures.add(future);
}
// Wait for all to complete
List<ChatResponse> responses = futures.stream()
.map(CompletableFuture::join)
.collect(Collectors.toList());

Best Practice - Response Caching:
Implement caching for repeated identical requests:
private final Map<String, ChatResponse> responseCache = new ConcurrentHashMap<>();
public ChatResponse getCachedResponse(String promptText) {
return responseCache.computeIfAbsent(promptText, key -> {
Prompt prompt = new Prompt(key);
return chatModel.call(prompt);
});
}
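One caveat with the ConcurrentHashMap cache above: it grows without bound. A sketch of a size-bounded LRU variant using java.util.LinkedHashMap (the capacity of 256 is illustrative):

// Bounded LRU cache: an access-order LinkedHashMap evicts the least-recently-used
// entry once the size exceeds 256. Wrapped for thread safety.
private final Map<String, ChatResponse> responseCache =
    Collections.synchronizedMap(new LinkedHashMap<>(16, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, ChatResponse> eldest) {
            return size() > 256;
        }
    });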