Spring AI integration for Azure OpenAI services, providing chat completion, text embeddings, image generation, and audio transcription with GPT, DALL-E, and Whisper models.
The embeddings API converts text into vector representations for semantic search, clustering, similarity comparison, and retrieval-augmented generation (RAG) applications.
import org.springframework.ai.azure.openai.AzureOpenAiEmbeddingModel;
import org.springframework.ai.azure.openai.AzureOpenAiEmbeddingOptions;
import org.springframework.ai.embedding.EmbeddingRequest;
import org.springframework.ai.embedding.EmbeddingResponse;
import org.springframework.ai.embedding.Embedding;
import org.springframework.ai.document.Document;
import org.springframework.ai.document.MetadataMode;
import com.azure.ai.openai.OpenAIClient;
import com.azure.ai.openai.OpenAIClientBuilder;
import com.azure.core.credential.AzureKeyCredential;
import io.micrometer.observation.ObservationRegistry;
The main class for generating text embeddings.
Thread-Safe: AzureOpenAiEmbeddingModel is fully thread-safe; a single instance can safely serve multiple concurrent embedding requests from different threads.
Recommendation: Create one instance and reuse it across your application rather than creating new instances for each request.
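As a minimal sketch of concurrent reuse (assuming the embeddingModel configured in the examples below, plus the standard java.util.concurrent classes, which are not part of the import list above):
ExecutorService pool = Executors.newFixedThreadPool(2);
// Both tasks call the same shared AzureOpenAiEmbeddingModel instance concurrently.
CompletableFuture<EmbeddingResponse> first = CompletableFuture.supplyAsync(
    () -> embeddingModel.call(new EmbeddingRequest(List.of("alpha", "beta"), null)), pool);
CompletableFuture<EmbeddingResponse> second = CompletableFuture.supplyAsync(
    () -> embeddingModel.call(new EmbeddingRequest(List.of("gamma", "delta"), null)), pool);
int totalEmbeddings = first.join().getResults().size() + second.join().getResults().size();
pool.shutdown();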
class AzureOpenAiEmbeddingModel extends AbstractEmbeddingModel {
AzureOpenAiEmbeddingModel(OpenAIClient azureOpenAiClient);
AzureOpenAiEmbeddingModel(
OpenAIClient azureOpenAiClient,
MetadataMode metadataMode
);
AzureOpenAiEmbeddingModel(
OpenAIClient azureOpenAiClient,
MetadataMode metadataMode,
AzureOpenAiEmbeddingOptions options
);
AzureOpenAiEmbeddingModel(
OpenAIClient azureOpenAiClient,
MetadataMode metadataMode,
AzureOpenAiEmbeddingOptions options,
ObservationRegistry observationRegistry
);
}
Parameters:
azureOpenAiClient: Azure OpenAI client instance (required, non-null, throws NullPointerException if null)
metadataMode: How to handle document metadata (optional, defaults to NONE if not specified)
options: Default embedding options (optional, uses model defaults if null)
observationRegistry: Micrometer observation registry for metrics (optional, disables observability if null)
Metadata Mode Values:
NONE: Exclude document metadata from embedding
EMBED: Include metadata in the text to be embedded
ALL: Include all metadata fields
Example:
OpenAIClient openAIClient = new OpenAIClientBuilder()
.credential(new AzureKeyCredential(apiKey))
.endpoint(endpoint)
.buildClient();
AzureOpenAiEmbeddingOptions options = AzureOpenAiEmbeddingOptions.builder()
.deploymentName("text-embedding-ada-002")
.build();
AzureOpenAiEmbeddingModel embeddingModel = new AzureOpenAiEmbeddingModel(
openAIClient,
MetadataMode.EMBED,
options
);
EmbeddingResponse call(EmbeddingRequest request);
Generate embeddings for one or more text inputs.
Parameters:
request: The embedding request containing texts and optional options (non-null, throws NullPointerException if null)
Returns: EmbeddingResponse containing embeddings and metadata (never null)
Throws:
HttpResponseException: HTTP errors from Azure API (400, 401, 403, 429, 500)
ResourceNotFoundException: Deployment not found (404)
NonTransientAiException: Permanent failures (invalid parameters, auth errors)
TransientAiException: Temporary failures (rate limits, timeouts)
NullPointerException: If request is null
IllegalArgumentException: If request contains an empty text list or more than 2048 texts
Constraints:
At most 2048 texts per request; each text must stay within the model's 8191-token input limit.
Example - Single Text:
List<String> texts = List.of("Machine learning is fascinating");
EmbeddingRequest request = new EmbeddingRequest(texts, null);
EmbeddingResponse response = embeddingModel.call(request);
float[] embedding = response.getResults().get(0).getOutput();
System.out.println("Embedding dimension: " + embedding.length);Example - Multiple Texts:
List<String> texts = List.of(
"Natural language processing",
"Computer vision",
"Deep learning"
);
EmbeddingRequest request = new EmbeddingRequest(texts, null);
EmbeddingResponse response = embeddingModel.call(request);
for (int i = 0; i < response.getResults().size(); i++) {
float[] embedding = response.getResults().get(i).getOutput();
System.out.println("Text " + i + " embedding: " + embedding.length + " dimensions");
}
Example - With Options:
AzureOpenAiEmbeddingOptions options = AzureOpenAiEmbeddingOptions.builder()
.deploymentName("text-embedding-3-small")
.dimensions(512)
.build();
List<String> texts = List.of("Semantic search example");
EmbeddingRequest request = new EmbeddingRequest(texts, options);
EmbeddingResponse response = embeddingModel.call(request);
Error Handling:
try {
EmbeddingResponse response = embeddingModel.call(request);
} catch (HttpResponseException e) {
if (e.getResponse().getStatusCode() == 429) {
// Rate limit exceeded - implement retry with backoff
throw new RateLimitException("Rate limit exceeded", e);
} else if (e.getResponse().getStatusCode() == 400) {
// Invalid request - check input text length and count
throw new ValidationException("Invalid embedding request", e);
} else {
throw e; // Rethrow unexpected HTTP errors
}
} catch (IllegalArgumentException e) {
// Empty text list or too many texts
throw new ValidationException("Invalid input texts: " + e.getMessage(), e);
}
float[] embed(Document document);
Generate embedding for a single document, returning a float array.
Parameters:
document: The document to embed (non-null, throws NullPointerException if null)
Returns: float[] containing the embedding vector (never null, length depends on model)
Throws:
Same exceptions as the call() method
NullPointerException: If document is null
Behavior:
If MetadataMode is EMBED or ALL, document metadata is included in the embedded text; if MetadataMode is NONE, only the document content is embedded.
Example:
Document doc = new Document("This is a sample document for embedding");
float[] embedding = embeddingModel.embed(doc);
System.out.println("Embedding length: " + embedding.length);Example - Document with Metadata:
Document doc = new Document(
"Sample text",
Map.of("source", "article", "date", "2024-01-01")
);
float[] embedding = embeddingModel.embed(doc);
AzureOpenAiEmbeddingOptions getDefaultOptions();
void setObservationConvention(EmbeddingModelObservationConvention observationConvention);
getDefaultOptions(): Returns the default embedding options configured for this model instance.
setObservationConvention(): Sets a custom Micrometer observation convention used when recording embedding observations.
Example:
AzureOpenAiEmbeddingOptions currentOptions = embeddingModel.getDefaultOptions();
embeddingModel.setObservationConvention(customConvention);
Configuration class for embedding requests.
class AzureOpenAiEmbeddingOptions implements EmbeddingOptions {
static Builder builder();
}
class Builder {
Builder from(AzureOpenAiEmbeddingOptions fromOptions);
Builder merge(EmbeddingOptions from);
Builder from(com.azure.ai.openai.models.EmbeddingsOptions azureOptions);
Builder user(String user);
Builder deploymentName(String model);
Builder inputType(String inputType);
Builder dimensions(Integer dimensions);
AzureOpenAiEmbeddingOptions build();
}
Builder Methods:
All builder methods return this for fluent chaining (never null).
from(): Copy settings from another options instance (parameter non-null)
merge(): Merge settings from a generic EmbeddingOptions (parameter non-null)
build(): Returns a non-null AzureOpenAiEmbeddingOptions instance
String getDeploymentName();
void setDeploymentName(String deploymentName);
String getModel();
void setModel(String model);
Specifies which Azure OpenAI embedding deployment to use.
Common Deployments:
text-embedding-ada-002: OpenAI's ada-002 model (1536 dimensions)
text-embedding-3-small: Smaller, faster model (configurable dimensions up to 1536)
text-embedding-3-large: Larger, more capable model (configurable dimensions up to 3072)
Constraints:
Must match the name of an embedding deployment created in your Azure OpenAI resource; deployment names are defined in Azure and may differ from the underlying model name.
Example:
AzureOpenAiEmbeddingOptions options = AzureOpenAiEmbeddingOptions.builder()
.deploymentName("text-embedding-3-large")
.build();
String getUser();
void setUser(String user);
Optional identifier for the end-user, used for abuse monitoring.
Constraints:
Example:
AzureOpenAiEmbeddingOptions options = AzureOpenAiEmbeddingOptions.builder()
.user("user-123")
.build();
String getInputType();
void setInputType(String inputType);
Hint about the type of input being embedded. Helps the model optimize the embedding for your use case.
Common Values:
query: Text is a search query (for semantic search queries)
document: Text is a document to be searched (for indexing documents)
Constraints:
Use Cases:
Example:
// For query embeddings
AzureOpenAiEmbeddingOptions queryOptions = AzureOpenAiEmbeddingOptions.builder()
.inputType("query")
.build();
// For document embeddings
AzureOpenAiEmbeddingOptions docOptions = AzureOpenAiEmbeddingOptions.builder()
.inputType("document")
.build();
Integer getDimensions();
void setDimensions(Integer dimensions);
Number of dimensions for the output embeddings. Only supported by newer models (e.g., text-embedding-3-small, text-embedding-3-large).
Constraints:
Must not exceed the model's maximum dimensions; throws IllegalArgumentException if out of range.
Benefits of Reducing Dimensions:
Lower storage requirements and faster similarity search, at the cost of some embedding quality.
Example:
AzureOpenAiEmbeddingOptions options = AzureOpenAiEmbeddingOptions.builder()
.deploymentName("text-embedding-3-small")
.dimensions(512) // Reduce from default 1536
.build();
com.azure.ai.openai.models.EmbeddingsOptions toAzureOptions(List<String> instructions);
Convert to Azure SDK's native options format.
Parameters:
instructions: List of text strings to embed (non-null, can be empty)
Returns: Azure SDK EmbeddingsOptions object (never null)
Usage: Internal method used by the model to convert Spring AI options to Azure SDK format.
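For illustration only (the method is internal, as noted), a minimal sketch of the conversion using the signature documented above; the variable names here are arbitrary:
AzureOpenAiEmbeddingOptions embeddingOptions = AzureOpenAiEmbeddingOptions.builder()
    .deploymentName("text-embedding-ada-002")
    .build();
// Convert the Spring AI options plus the input texts into the Azure SDK request object.
com.azure.ai.openai.models.EmbeddingsOptions azureOptions =
    embeddingOptions.toAzureOptions(List.of("Hello world"));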
OpenAIClient client = new OpenAIClientBuilder()
.credential(new AzureKeyCredential(apiKey))
.endpoint(endpoint)
.buildClient();
AzureOpenAiEmbeddingOptions options = AzureOpenAiEmbeddingOptions.builder()
.deploymentName("text-embedding-ada-002")
.build();
AzureOpenAiEmbeddingModel embeddingModel = new AzureOpenAiEmbeddingModel(
client,
MetadataMode.EMBED,
options
);
List<String> texts = List.of("Hello world");
EmbeddingRequest request = new EmbeddingRequest(texts, null);
EmbeddingResponse response = embeddingModel.call(request);
float[] embedding = response.getResults().get(0).getOutput();
List<String> documents = List.of(
"First document about AI",
"Second document about machine learning",
"Third document about neural networks",
"Fourth document about deep learning"
);
EmbeddingRequest request = new EmbeddingRequest(documents, null);
EmbeddingResponse response = embeddingModel.call(request);
List<Embedding> embeddings = response.getResults();
for (int i = 0; i < embeddings.size(); i++) {
float[] vector = embeddings.get(i).getOutput();
System.out.println("Document " + i + ": " + vector.length + " dimensions");
}
// Embed documents
List<String> documents = List.of(
"The quick brown fox jumps over the lazy dog",
"Machine learning is a subset of artificial intelligence",
"Python is a popular programming language"
);
EmbeddingRequest docRequest = new EmbeddingRequest(documents, null);
EmbeddingResponse docResponse = embeddingModel.call(docRequest);
List<float[]> docEmbeddings = docResponse.getResults().stream()
.map(Embedding::getOutput)
.toList();
// Embed query
String query = "AI and ML concepts";
EmbeddingRequest queryRequest = new EmbeddingRequest(List.of(query), null);
EmbeddingResponse queryResponse = embeddingModel.call(queryRequest);
float[] queryEmbedding = queryResponse.getResults().get(0).getOutput();
// Calculate cosine similarity
for (int i = 0; i < docEmbeddings.size(); i++) {
double similarity = cosineSimilarity(queryEmbedding, docEmbeddings.get(i));
System.out.println("Document " + i + " similarity: " + similarity);
}
AzureOpenAiEmbeddingOptions options = AzureOpenAiEmbeddingOptions.builder()
.deploymentName("text-embedding-3-small")
.dimensions(256) // Reduce from default for faster processing
.build();
AzureOpenAiEmbeddingModel model = new AzureOpenAiEmbeddingModel(
client,
MetadataMode.EMBED,
options
);
List<String> texts = List.of("Sample text for embedding");
EmbeddingResponse response = model.call(new EmbeddingRequest(texts, null));
float[] embedding = response.getResults().get(0).getOutput();
System.out.println("Dimensions: " + embedding.length); // 256AzureOpenAiEmbeddingModel model = new AzureOpenAiEmbeddingModel(
client,
MetadataMode.EMBED // Include metadata in embedding
);
Document doc = new Document(
"Spring AI makes it easy to build AI applications",
Map.of(
"source", "documentation",
"category", "tutorial",
"date", "2024-01-15"
)
);
float[] embedding = model.embed(doc);
// Configure for query embeddings
AzureOpenAiEmbeddingOptions queryOptions = AzureOpenAiEmbeddingOptions.builder()
.deploymentName("text-embedding-3-large")
.inputType("query")
.build();
EmbeddingRequest queryRequest = new EmbeddingRequest(
List.of("What is machine learning?"),
queryOptions
);
EmbeddingResponse queryResponse = embeddingModel.call(queryRequest);
// Configure for document embeddings
AzureOpenAiEmbeddingOptions docOptions = AzureOpenAiEmbeddingOptions.builder()
.deploymentName("text-embedding-3-large")
.inputType("document")
.build();
EmbeddingRequest docRequest = new EmbeddingRequest(
List.of("Machine learning is a method of data analysis..."),
docOptions
);
EmbeddingResponse docResponse = embeddingModel.call(docRequest);
ObservationRegistry observationRegistry = ObservationRegistry.create();
AzureOpenAiEmbeddingModel model = new AzureOpenAiEmbeddingModel(
client,
MetadataMode.EMBED,
options,
observationRegistry
);
// Set custom observation convention
model.setObservationConvention(new CustomEmbeddingObservationConvention());
// Embeddings will now be observable
EmbeddingResponse response = model.call(request);
// Default options
AzureOpenAiEmbeddingOptions defaultOptions = AzureOpenAiEmbeddingOptions.builder()
.deploymentName("text-embedding-ada-002")
.user("default-user")
.build();
// Override specific options
AzureOpenAiEmbeddingOptions overrideOptions = AzureOpenAiEmbeddingOptions.builder()
.from(defaultOptions)
.dimensions(512)
.build();
// Use overridden options
EmbeddingRequest request = new EmbeddingRequest(texts, overrideOptions);
// 1. Embed and store documents
List<String> knowledge = List.of(
"Spring AI is a framework for AI applications",
"It provides abstractions for AI models",
"Supports OpenAI, Azure OpenAI, and more"
);
EmbeddingRequest request = new EmbeddingRequest(knowledge, null);
EmbeddingResponse response = embeddingModel.call(request);
// Store embeddings in vector database
// 2. Embed user query
String userQuery = "What is Spring AI?";
EmbeddingResponse queryResponse = embeddingModel.call(
new EmbeddingRequest(List.of(userQuery), null)
);
float[] queryVector = queryResponse.getResults().get(0).getOutput();
// 3. Retrieve relevant documents using similarity search
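// Hypothetical in-memory sketch of step 3: assumes the embeddings from step 1 were kept in a
// List<float[]> named knowledgeEmbeddings instead of a vector database, and uses the
// cosineSimilarity helper defined in the utilities section below.
int bestIndex = -1;
double bestScore = -1.0;
for (int i = 0; i < knowledgeEmbeddings.size(); i++) {
    double score = cosineSimilarity(queryVector, knowledgeEmbeddings.get(i));
    if (score > bestScore) {
        bestScore = score;
        bestIndex = i;
    }
}
String retrievedContext = knowledge.get(bestIndex);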
// 4. Use retrieved context with chat model
List<String> documents = loadDocuments();
EmbeddingRequest request = new EmbeddingRequest(documents, null);
EmbeddingResponse response = embeddingModel.call(request);
List<float[]> embeddings = response.getResults().stream()
.map(Embedding::getOutput)
.toList();
// Apply clustering algorithm (k-means, hierarchical, etc.)
List<Cluster> clusters = clusterEmbeddings(embeddings);
List<String> texts = List.of(
"This is the original text",
"This is the original text", // Exact duplicate
"This is an original text", // Near duplicate
"Completely different content"
);
EmbeddingRequest request = new EmbeddingRequest(texts, null);
EmbeddingResponse response = embeddingModel.call(request);
// Compare embeddings to find duplicates
for (int i = 0; i < texts.size(); i++) {
for (int j = i + 1; j < texts.size(); j++) {
double similarity = cosineSimilarity(
response.getResults().get(i).getOutput(),
response.getResults().get(j).getOutput()
);
if (similarity > 0.95) {
System.out.println("Potential duplicate: " + i + " and " + j);
}
}
}
// Azure SDK exceptions
com.azure.core.exception.HttpResponseException // HTTP errors (400, 401, 403, 429, 500)
com.azure.core.exception.ResourceNotFoundException // Deployment not found (404)
// Spring AI exceptions
org.springframework.ai.retry.NonTransientAiException // Permanent failures
org.springframework.ai.retry.TransientAiException // Temporary failures (retry-able)
// Java exceptions
java.lang.IllegalArgumentException // Invalid parameters
java.lang.NullPointerException // Null required parameters
1. Text Too Long (400):
// Text exceeds 8191 tokens; declared outside the try so the catch block can reuse it
String veryLongText = generateLongText(10000);
try {
EmbeddingResponse response = embeddingModel.call(new EmbeddingRequest(List.of(veryLongText), null));
} catch (HttpResponseException e) {
if (e.getResponse().getStatusCode() == 400 &&
e.getMessage().contains("maximum context length")) {
// Split text into chunks
List<String> chunks = splitIntoChunks(veryLongText, 8000);
for (String chunk : chunks) {
embeddingModel.call(new EmbeddingRequest(List.of(chunk), null));
}
}
}
2. Too Many Texts (400):
// Declared outside the try so the catch block can batch the same list
List<String> tooManyTexts = generateTexts(3000); // Exceeds the 2048-per-request limit
try {
EmbeddingResponse response = embeddingModel.call(new EmbeddingRequest(tooManyTexts, null));
} catch (IllegalArgumentException e) {
// Batch into groups of 2048
for (int i = 0; i < tooManyTexts.size(); i += 2048) {
List<String> batch = tooManyTexts.subList(
i,
Math.min(i + 2048, tooManyTexts.size())
);
embeddingModel.call(new EmbeddingRequest(batch, null));
}
}
3. Rate Limiting (429):
public EmbeddingResponse callWithRetry(List<String> texts) {
int maxRetries = 3;
int baseDelayMs = 1000;
for (int attempt = 0; attempt < maxRetries; attempt++) {
try {
return embeddingModel.call(new EmbeddingRequest(texts, null));
} catch (HttpResponseException e) {
if (e.getResponse().getStatusCode() == 429 && attempt < maxRetries - 1) {
int delayMs = baseDelayMs * (1 << attempt);
try {
Thread.sleep(delayMs);
} catch (InterruptedException ie) {
Thread.currentThread().interrupt();
throw new RuntimeException("Interrupted during retry backoff", ie);
}
continue;
}
throw e;
}
}
throw new RuntimeException("Max retries exceeded");
}
4. Invalid Dimensions (400):
AzureOpenAiEmbeddingOptions options;
try {
options = AzureOpenAiEmbeddingOptions.builder()
.deploymentName("text-embedding-3-small")
.dimensions(2000) // Exceeds max of 1536
.build();
} catch (IllegalArgumentException e) {
// Use valid dimension
options = AzureOpenAiEmbeddingOptions.builder()
.deploymentName("text-embedding-3-small")
.dimensions(1536)
.build();
}
Deployment Name: Must match the name of an embedding deployment created in your Azure OpenAI resource (for example, text-embedding-ada-002, text-embedding-3-small, or text-embedding-3-large).
Dimensions: Only configurable for the text-embedding-3-* models; keep the value within the model maximum (1536 for text-embedding-3-small, 3072 for text-embedding-3-large).
User Identifier: Optional end-user identifier passed to Azure for abuse monitoring.
Input Type: Optional hint; use "query" for search queries and "document" for texts being indexed.
Input Texts: At most 2048 texts per request, each within the 8191-token input limit; split longer texts into chunks (a pre-flight check is sketched below).
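A minimal pre-flight validation sketch based on the limits documented in this reference, using the rough 4-characters-per-token estimate applied by splitIntoChunks later in this document; the helper name is hypothetical:
// Hypothetical helper: validate inputs against the documented request limits before calling the model.
static void validateEmbeddingInputs(List<String> texts) {
    if (texts == null || texts.isEmpty()) {
        throw new IllegalArgumentException("At least one input text is required");
    }
    if (texts.size() > 2048) {
        throw new IllegalArgumentException("A single request accepts at most 2048 texts");
    }
    for (String text : texts) {
        // Rough estimate: 1 token is roughly 4 characters, checked against the 8191-token input limit.
        if (text.length() / 4 > 8191) {
            throw new IllegalArgumentException("Text likely exceeds the 8191-token input limit");
        }
    }
}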
| Model | Default Dimensions | Configurable | Max Input | Use Case |
|---|---|---|---|---|
| text-embedding-ada-002 | 1536 | No | 8191 tokens | General purpose, cost-effective |
| text-embedding-3-small | 1536 | Yes (up to 1536) | 8191 tokens | Improved performance, configurable size |
| text-embedding-3-large | 3072 | Yes (up to 3072) | 8191 tokens | Best performance, larger dimensions |
Choosing a Model: Use text-embedding-ada-002 for cost-effective general-purpose embeddings, text-embedding-3-small for improved quality with configurable dimensions, and text-embedding-3-large when accuracy matters most.
Efficient:
// Batch multiple texts in one request
List<String> texts = List.of("text1", "text2", "text3", ...);
EmbeddingResponse response = embeddingModel.call(
new EmbeddingRequest(texts, null)
);
Inefficient:
// Don't make separate requests for each text
for (String text : texts) {
embeddingModel.call(new EmbeddingRequest(List.of(text), null));
}
// Full dimensions (higher quality)
AzureOpenAiEmbeddingOptions fullOptions = AzureOpenAiEmbeddingOptions.builder()
.deploymentName("text-embedding-3-small")
.dimensions(1536) // Full dimensions
.build();
// Reduced dimensions (lower storage, faster search)
AzureOpenAiEmbeddingOptions reducedOptions = AzureOpenAiEmbeddingOptions.builder()
.deploymentName("text-embedding-3-small")
.dimensions(512) // 67% reduction in storage
.build();
// Test to find optimal dimension for your use case
Recommended:
// Create once at application startup
@Bean
public AzureOpenAiEmbeddingModel embeddingModel() {
return new AzureOpenAiEmbeddingModel(client, MetadataMode.EMBED, options);
}
// Inject and reuse
@Autowired
private AzureOpenAiEmbeddingModel embeddingModel;
Avoid:
// Don't create new instance per request
for (String text : texts) {
AzureOpenAiEmbeddingModel model = new AzureOpenAiEmbeddingModel(...);
model.call(request); // Inefficient
}
ExecutorService executor = Executors.newFixedThreadPool(10);
List<CompletableFuture<EmbeddingResponse>> futures = new ArrayList<>();
// Split large dataset into batches
List<List<String>> batches = partition(allTexts, 100);
for (List<String> batch : batches) {
CompletableFuture<EmbeddingResponse> future = CompletableFuture.supplyAsync(
() -> embeddingModel.call(new EmbeddingRequest(batch, null)),
executor
);
futures.add(future);
}
// Wait for all batches
List<EmbeddingResponse> responses = futures.stream()
.map(CompletableFuture::join)
.collect(Collectors.toList());
Symptoms: Low similarity scores for semantically similar texts
Solutions: Embed the query and the documents with the same deployment and the same dimensions, and consider setting the inputType hint (query vs. document) described above.
Solution: Split long texts into chunks
public List<float[]> embedLongText(String longText, int maxTokens) {
List<String> chunks = splitIntoChunks(longText, maxTokens);
EmbeddingResponse response = embeddingModel.call(
new EmbeddingRequest(chunks, null)
);
return response.getResults().stream()
.map(Embedding::getOutput)
.collect(Collectors.toList());
}
private List<String> splitIntoChunks(String text, int maxTokens) {
// Rough approximation: 1 token ≈ 4 characters
int maxChars = maxTokens * 4;
List<String> chunks = new ArrayList<>();
for (int i = 0; i < text.length(); i += maxChars) {
chunks.add(text.substring(i, Math.min(i + maxChars, text.length())));
}
return chunks;
}
public static double cosineSimilarity(float[] vectorA, float[] vectorB) {
if (vectorA.length != vectorB.length) {
throw new IllegalArgumentException("Vectors must have same length");
}
double dotProduct = 0.0;
double normA = 0.0;
double normB = 0.0;
for (int i = 0; i < vectorA.length; i++) {
dotProduct += vectorA[i] * vectorB[i];
normA += vectorA[i] * vectorA[i];
normB += vectorB[i] * vectorB[i];
}
return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
}
public static float[] normalize(float[] vector) {
double norm = 0.0;
for (float v : vector) {
norm += v * v;
}
norm = Math.sqrt(norm);
float[] normalized = new float[vector.length];
for (int i = 0; i < vector.length; i++) {
normalized[i] = (float) (vector[i] / norm);
}
return normalized;
}
NONE: Exclude document metadata from embedding
EMBED: Include metadata in the text to be embedded
ALL: Include all metadata fields
Example:
// Without metadata
AzureOpenAiEmbeddingModel modelNoMetadata = new AzureOpenAiEmbeddingModel(
client,
MetadataMode.NONE
);
// With metadata
AzureOpenAiEmbeddingModel modelWithMetadata = new AzureOpenAiEmbeddingModel(
client,
MetadataMode.EMBED
);
tessl i tessl/maven-org-springframework-ai--spring-ai-azure-openai@1.1.1