Text Embeddings

The embeddings API converts text into vector representations for semantic search, clustering, similarity comparison, and retrieval-augmented generation (RAG) applications.

Imports

import org.springframework.ai.azure.openai.AzureOpenAiEmbeddingModel;
import org.springframework.ai.azure.openai.AzureOpenAiEmbeddingOptions;
import org.springframework.ai.embedding.EmbeddingRequest;
import org.springframework.ai.embedding.EmbeddingResponse;
import org.springframework.ai.embedding.Embedding;
import org.springframework.ai.document.Document;
import org.springframework.ai.document.MetadataMode;
import com.azure.ai.openai.OpenAIClient;
import com.azure.ai.openai.OpenAIClientBuilder;
import com.azure.core.credential.AzureKeyCredential;
import io.micrometer.observation.ObservationRegistry;

AzureOpenAiEmbeddingModel

The main class for generating text embeddings.

Thread Safety

Thread-Safe: AzureOpenAiEmbeddingModel is fully thread-safe and can be safely used across multiple threads concurrently. A single instance can handle multiple concurrent embedding requests.

Recommendation: Create one instance and reuse it across your application rather than creating new instances for each request.
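
For illustration only (assuming an embeddingModel constructed as in the examples below), two concurrent requests can share a single instance:

ExecutorService executor = Executors.newFixedThreadPool(4);

// Two concurrent requests reuse the same AzureOpenAiEmbeddingModel instance
CompletableFuture<EmbeddingResponse> first = CompletableFuture.supplyAsync(
    () -> embeddingModel.call(new EmbeddingRequest(List.of("first text"), null)),
    executor
);
CompletableFuture<EmbeddingResponse> second = CompletableFuture.supplyAsync(
    () -> embeddingModel.call(new EmbeddingRequest(List.of("second text"), null)),
    executor
);

CompletableFuture.allOf(first, second).join();
executor.shutdown();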

Construction

class AzureOpenAiEmbeddingModel extends AbstractEmbeddingModel {
    AzureOpenAiEmbeddingModel(OpenAIClient azureOpenAiClient);

    AzureOpenAiEmbeddingModel(
        OpenAIClient azureOpenAiClient,
        MetadataMode metadataMode
    );

    AzureOpenAiEmbeddingModel(
        OpenAIClient azureOpenAiClient,
        MetadataMode metadataMode,
        AzureOpenAiEmbeddingOptions options
    );

    AzureOpenAiEmbeddingModel(
        OpenAIClient azureOpenAiClient,
        MetadataMode metadataMode,
        AzureOpenAiEmbeddingOptions options,
        ObservationRegistry observationRegistry
    );
}

Parameters:

  • azureOpenAiClient: Azure OpenAI client instance (required, non-null, throws NullPointerException if null)
  • metadataMode: How to handle document metadata (optional, defaults to NONE if not specified)
  • options: Default embedding options (optional, uses model defaults if null)
  • observationRegistry: Micrometer observation registry for metrics (optional, disables observability if null)

Metadata Mode Values:

  • NONE: Exclude document metadata from embedding
  • EMBED: Include metadata in the text to be embedded
  • ALL: Include all metadata fields

Example:

OpenAIClient openAIClient = new OpenAIClientBuilder()
    .credential(new AzureKeyCredential(apiKey))
    .endpoint(endpoint)
    .buildClient();

AzureOpenAiEmbeddingOptions options = AzureOpenAiEmbeddingOptions.builder()
    .deploymentName("text-embedding-ada-002")
    .build();

AzureOpenAiEmbeddingModel embeddingModel = new AzureOpenAiEmbeddingModel(
    openAIClient,
    MetadataMode.EMBED,
    options
);

Core Methods

Generate Embeddings

EmbeddingResponse call(EmbeddingRequest request);

Generate embeddings for one or more text inputs.

Parameters:

  • request: The embedding request containing texts and optional options (non-null, throws NullPointerException if null)

Returns: EmbeddingResponse containing embeddings and metadata (never null)

Throws:

  • HttpResponseException: HTTP errors from Azure API (400, 401, 403, 429, 500)
  • ResourceNotFoundException: Deployment not found (404)
  • NonTransientAiException: Permanent failures (invalid parameters, auth errors)
  • TransientAiException: Temporary failures (rate limits, timeouts)
  • NullPointerException: If request is null
  • IllegalArgumentException: If request contains empty text list or more than 2048 texts

Constraints:

  • Input text list cannot be empty
  • Maximum 2048 texts per request
  • Each text maximum 8191 tokens
  • Total request size should not exceed API limits

Example - Single Text:

List<String> texts = List.of("Machine learning is fascinating");
EmbeddingRequest request = new EmbeddingRequest(texts, null);
EmbeddingResponse response = embeddingModel.call(request);

float[] embedding = response.getResults().get(0).getOutput();
System.out.println("Embedding dimension: " + embedding.length);

Example - Multiple Texts:

List<String> texts = List.of(
    "Natural language processing",
    "Computer vision",
    "Deep learning"
);

EmbeddingRequest request = new EmbeddingRequest(texts, null);
EmbeddingResponse response = embeddingModel.call(request);

for (int i = 0; i < response.getResults().size(); i++) {
    float[] embedding = response.getResults().get(i).getOutput();
    System.out.println("Text " + i + " embedding: " + embedding.length + " dimensions");
}

Example - With Options:

AzureOpenAiEmbeddingOptions options = AzureOpenAiEmbeddingOptions.builder()
    .deploymentName("text-embedding-3-small")
    .dimensions(512)
    .build();

List<String> texts = List.of("Semantic search example");
EmbeddingRequest request = new EmbeddingRequest(texts, options);
EmbeddingResponse response = embeddingModel.call(request);

Error Handling:

try {
    EmbeddingResponse response = embeddingModel.call(request);
} catch (HttpResponseException e) {
    if (e.getResponse().getStatusCode() == 429) {
        // Rate limit exceeded - implement retry with backoff
        throw new RateLimitException("Rate limit exceeded", e);
    } else if (e.getResponse().getStatusCode() == 400) {
        // Invalid request - check input text length and count
        throw new ValidationException("Invalid embedding request", e);
    } else {
        // Propagate other HTTP errors unchanged
        throw e;
    }
} catch (IllegalArgumentException e) {
    // Empty text list or too many texts
    throw new ValidationException("Invalid input texts: " + e.getMessage(), e);
}

Embed Document

float[] embed(Document document);

Generate embedding for a single document, returning a float array.

Parameters:

  • document: The document to embed (non-null, throws NullPointerException if null)

Returns: float[] containing the embedding vector (never null, length depends on model)

Throws:

  • Same exceptions as call() method
  • NullPointerException: If document is null

Behavior:

  • If MetadataMode is EMBED or ALL, document metadata is included in embedded text
  • If MetadataMode is NONE, only document content is embedded
  • Document content cannot be null or empty (throws IllegalArgumentException)

Example:

Document doc = new Document("This is a sample document for embedding");
float[] embedding = embeddingModel.embed(doc);
System.out.println("Embedding length: " + embedding.length);

Example - Document with Metadata:

Document doc = new Document(
    "Sample text",
    Map.of("source", "article", "date", "2024-01-01")
);
float[] embedding = embeddingModel.embed(doc);

Configuration Methods

AzureOpenAiEmbeddingOptions getDefaultOptions();
void setObservationConvention(EmbeddingModelObservationConvention observationConvention);

getDefaultOptions():

  • Returns the default options configured for this model instance
  • Returns null if no default options were provided
  • Changes to returned object do not affect the model

setObservationConvention():

  • Sets custom observation convention for metrics/tracing
  • Parameter can be null to use default convention
  • Thread-safe, can be called while model is in use

Example:

AzureOpenAiEmbeddingOptions currentOptions = embeddingModel.getDefaultOptions();
embeddingModel.setObservationConvention(customConvention);

AzureOpenAiEmbeddingOptions

Configuration class for embedding requests.

Construction

class AzureOpenAiEmbeddingOptions implements EmbeddingOptions {
    static Builder builder();
}

Builder

class Builder {
    Builder from(AzureOpenAiEmbeddingOptions fromOptions);
    Builder merge(EmbeddingOptions from);
    Builder from(com.azure.ai.openai.models.EmbeddingsOptions azureOptions);
    Builder user(String user);
    Builder deploymentName(String model);
    Builder inputType(String inputType);
    Builder dimensions(Integer dimensions);
    AzureOpenAiEmbeddingOptions build();
}

Builder Methods:

  • All builder methods return this for fluent chaining (never null)
  • All parameters are optional (can be null)
  • from(): Copy settings from another options instance (parameter non-null)
  • merge(): Merge settings from generic EmbeddingOptions (parameter non-null)
  • build(): Returns non-null AzureOpenAiEmbeddingOptions instance
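
A brief sketch of from() and merge() based on the methods listed above; since the precedence between copied and merged values is not documented here, treat this as illustrative only:

AzureOpenAiEmbeddingOptions defaults = AzureOpenAiEmbeddingOptions.builder()
    .deploymentName("text-embedding-3-small")
    .build();

// Any EmbeddingOptions implementation can be merged; here another
// AzureOpenAiEmbeddingOptions instance supplies the per-request overrides
AzureOpenAiEmbeddingOptions overrides = AzureOpenAiEmbeddingOptions.builder()
    .dimensions(512)
    .build();

AzureOpenAiEmbeddingOptions merged = AzureOpenAiEmbeddingOptions.builder()
    .from(defaults)
    .merge(overrides)
    .build();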

Properties

Deployment Name

String getDeploymentName();
void setDeploymentName(String deploymentName);
String getModel();
void setModel(String model);

Specifies which Azure OpenAI embedding deployment to use.

Common Deployments:

  • text-embedding-ada-002: OpenAI's ada-002 model (1536 dimensions)
  • text-embedding-3-small: Smaller, faster model (configurable dimensions up to 1536)
  • text-embedding-3-large: Larger, more capable model (configurable dimensions up to 3072)

Constraints:

  • Cannot be null or empty string (throws IllegalArgumentException)
  • Must match an existing deployment in your Azure OpenAI resource

Example:

AzureOpenAiEmbeddingOptions options = AzureOpenAiEmbeddingOptions.builder()
    .deploymentName("text-embedding-3-large")
    .build();

User Identifier

String getUser();
void setUser(String user);

Optional identifier for the end-user, used for abuse monitoring.

Constraints:

  • Max length: 256 characters
  • Optional (can be null)

Example:

AzureOpenAiEmbeddingOptions options = AzureOpenAiEmbeddingOptions.builder()
    .user("user-123")
    .build();

Input Type

String getInputType();
void setInputType(String inputType);

Hint about the type of input being embedded. Helps the model optimize the embedding for your use case.

Common Values:

  • query: Text is a search query (for semantic search queries)
  • document: Text is a document to be searched (for indexing documents)

Constraints:

  • Optional (can be null, model uses default behavior)
  • Only supported by text-embedding-3-small and text-embedding-3-large
  • Ignored by text-embedding-ada-002

Use Cases:

  • Use "query" when embedding search queries for retrieval
  • Use "document" when embedding documents for indexing
  • Different input types may produce slightly different embeddings optimized for their use case

Example:

// For query embeddings
AzureOpenAiEmbeddingOptions queryOptions = AzureOpenAiEmbeddingOptions.builder()
    .inputType("query")
    .build();

// For document embeddings
AzureOpenAiEmbeddingOptions docOptions = AzureOpenAiEmbeddingOptions.builder()
    .inputType("document")
    .build();

Dimensions

Integer getDimensions();
void setDimensions(Integer dimensions);

Number of dimensions for the output embeddings. Only supported by newer models (e.g., text-embedding-3-small, text-embedding-3-large).

Constraints:

  • Must be > 0 and <= model's maximum dimensions
  • text-embedding-3-small: max 1536
  • text-embedding-3-large: max 3072
  • text-embedding-ada-002: Not configurable (always 1536)
  • Throws IllegalArgumentException if out of range

Benefits of Reducing Dimensions:

  • Smaller vectors = less storage space
  • Faster similarity computations
  • Lower memory usage
  • Acceptable tradeoff in search quality for many use cases

Example:

AzureOpenAiEmbeddingOptions options = AzureOpenAiEmbeddingOptions.builder()
    .deploymentName("text-embedding-3-small")
    .dimensions(512)  // Reduce from default 1536
    .build();

Conversion Methods

com.azure.ai.openai.models.EmbeddingsOptions toAzureOptions(List<String> instructions);

Convert to Azure SDK's native options format.

Parameters:

  • instructions: List of text strings to embed (non-null, can be empty)

Returns: Azure SDK EmbeddingsOptions object (never null)

Usage: Internal method used by the model to convert Spring AI options to Azure SDK format.
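
Although this conversion is normally performed internally, a minimal sketch based on the signature above would be:

AzureOpenAiEmbeddingOptions options = AzureOpenAiEmbeddingOptions.builder()
    .deploymentName("text-embedding-3-small")
    .build();

// Convert Spring AI options plus the input texts into the Azure SDK request type
com.azure.ai.openai.models.EmbeddingsOptions azureOptions =
    options.toAzureOptions(List.of("text to embed"));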

Usage Examples

Basic Embedding Generation

OpenAIClient client = new OpenAIClientBuilder()
    .credential(new AzureKeyCredential(apiKey))
    .endpoint(endpoint)
    .buildClient();

AzureOpenAiEmbeddingOptions options = AzureOpenAiEmbeddingOptions.builder()
    .deploymentName("text-embedding-ada-002")
    .build();

AzureOpenAiEmbeddingModel embeddingModel = new AzureOpenAiEmbeddingModel(
    client,
    MetadataMode.EMBED,
    options
);

List<String> texts = List.of("Hello world");
EmbeddingRequest request = new EmbeddingRequest(texts, null);
EmbeddingResponse response = embeddingModel.call(request);

float[] embedding = response.getResults().get(0).getOutput();

Batch Embedding

List<String> documents = List.of(
    "First document about AI",
    "Second document about machine learning",
    "Third document about neural networks",
    "Fourth document about deep learning"
);

EmbeddingRequest request = new EmbeddingRequest(documents, null);
EmbeddingResponse response = embeddingModel.call(request);

List<Embedding> embeddings = response.getResults();
for (int i = 0; i < embeddings.size(); i++) {
    float[] vector = embeddings.get(i).getOutput();
    System.out.println("Document " + i + ": " + vector.length + " dimensions");
}

Semantic Search

// Embed documents
List<String> documents = List.of(
    "The quick brown fox jumps over the lazy dog",
    "Machine learning is a subset of artificial intelligence",
    "Python is a popular programming language"
);

EmbeddingRequest docRequest = new EmbeddingRequest(documents, null);
EmbeddingResponse docResponse = embeddingModel.call(docRequest);
List<float[]> docEmbeddings = docResponse.getResults().stream()
    .map(Embedding::getOutput)
    .toList();

// Embed query
String query = "AI and ML concepts";
EmbeddingRequest queryRequest = new EmbeddingRequest(List.of(query), null);
EmbeddingResponse queryResponse = embeddingModel.call(queryRequest);
float[] queryEmbedding = queryResponse.getResults().get(0).getOutput();

// Calculate cosine similarity
for (int i = 0; i < docEmbeddings.size(); i++) {
    double similarity = cosineSimilarity(queryEmbedding, docEmbeddings.get(i));
    System.out.println("Document " + i + " similarity: " + similarity);
}

Reduced Dimensions for Efficiency

AzureOpenAiEmbeddingOptions options = AzureOpenAiEmbeddingOptions.builder()
    .deploymentName("text-embedding-3-small")
    .dimensions(256)  // Reduce from default for faster processing
    .build();

AzureOpenAiEmbeddingModel model = new AzureOpenAiEmbeddingModel(
    client,
    MetadataMode.EMBED,
    options
);

List<String> texts = List.of("Sample text for embedding");
EmbeddingResponse response = model.call(new EmbeddingRequest(texts, null));
float[] embedding = response.getResults().get(0).getOutput();
System.out.println("Dimensions: " + embedding.length);  // 256

Document Embedding with Metadata

AzureOpenAiEmbeddingModel model = new AzureOpenAiEmbeddingModel(
    client,
    MetadataMode.EMBED  // Include metadata in embedding
);

Document doc = new Document(
    "Spring AI makes it easy to build AI applications",
    Map.of(
        "source", "documentation",
        "category", "tutorial",
        "date", "2024-01-15"
    )
);

float[] embedding = model.embed(doc);

Query vs Document Embeddings

// Configure for query embeddings
AzureOpenAiEmbeddingOptions queryOptions = AzureOpenAiEmbeddingOptions.builder()
    .deploymentName("text-embedding-3-large")
    .inputType("query")
    .build();

EmbeddingRequest queryRequest = new EmbeddingRequest(
    List.of("What is machine learning?"),
    queryOptions
);
EmbeddingResponse queryResponse = embeddingModel.call(queryRequest);

// Configure for document embeddings
AzureOpenAiEmbeddingOptions docOptions = AzureOpenAiEmbeddingOptions.builder()
    .deploymentName("text-embedding-3-large")
    .inputType("document")
    .build();

EmbeddingRequest docRequest = new EmbeddingRequest(
    List.of("Machine learning is a method of data analysis..."),
    docOptions
);
EmbeddingResponse docResponse = embeddingModel.call(docRequest);

With Observability

ObservationRegistry observationRegistry = ObservationRegistry.create();

AzureOpenAiEmbeddingModel model = new AzureOpenAiEmbeddingModel(
    client,
    MetadataMode.EMBED,
    options,
    observationRegistry
);

// Set custom observation convention
model.setObservationConvention(new CustomEmbeddingObservationConvention());

// Embeddings will now be observable
EmbeddingResponse response = model.call(request);

Merging Options

// Default options
AzureOpenAiEmbeddingOptions defaultOptions = AzureOpenAiEmbeddingOptions.builder()
    .deploymentName("text-embedding-ada-002")
    .user("default-user")
    .build();

// Override specific options
AzureOpenAiEmbeddingOptions overrideOptions = AzureOpenAiEmbeddingOptions.builder()
    .from(defaultOptions)
    .dimensions(512)
    .build();

// Use overridden options
EmbeddingRequest request = new EmbeddingRequest(texts, overrideOptions);

Common Use Cases

Retrieval-Augmented Generation (RAG)

// 1. Embed and store documents
List<String> knowledge = List.of(
    "Spring AI is a framework for AI applications",
    "It provides abstractions for AI models",
    "Supports OpenAI, Azure OpenAI, and more"
);

EmbeddingRequest request = new EmbeddingRequest(knowledge, null);
EmbeddingResponse response = embeddingModel.call(request);
// Store embeddings in vector database

// 2. Embed user query
String userQuery = "What is Spring AI?";
EmbeddingResponse queryResponse = embeddingModel.call(
    new EmbeddingRequest(List.of(userQuery), null)
);
float[] queryVector = queryResponse.getResults().get(0).getOutput();

// 3. Retrieve relevant documents using similarity search
// 4. Use retrieved context with chat model
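
Steps 3 and 4 depend on your vector store and chat model; as a store-agnostic sketch, an in-memory similarity search over the embeddings produced above (using the cosineSimilarity helper from the Utility Methods section) might look like this:

// In-memory stand-in for a vector database: the knowledge embeddings from step 1
List<float[]> knowledgeVectors = response.getResults().stream()
    .map(Embedding::getOutput)
    .toList();

// Rank knowledge entries by cosine similarity to the query vector from step 2
int bestIndex = -1;
double bestScore = -1.0;
for (int i = 0; i < knowledgeVectors.size(); i++) {
    double score = cosineSimilarity(queryVector, knowledgeVectors.get(i));
    if (score > bestScore) {
        bestScore = score;
        bestIndex = i;
    }
}

// The most similar snippet becomes context for the chat model prompt
String context = knowledge.get(bestIndex);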

Clustering Documents

List<String> documents = loadDocuments();

EmbeddingRequest request = new EmbeddingRequest(documents, null);
EmbeddingResponse response = embeddingModel.call(request);

List<float[]> embeddings = response.getResults().stream()
    .map(Embedding::getOutput)
    .toList();

// Apply clustering algorithm (k-means, hierarchical, etc.)
List<Cluster> clusters = clusterEmbeddings(embeddings);
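
Cluster and clusterEmbeddings above are placeholders for your clustering library of choice. As a self-contained illustration only (not part of Spring AI), a naive k-means over the embedding vectors, returning the document indices grouped per cluster, could look like this:

static List<List<Integer>> clusterEmbeddings(List<float[]> embeddings, int k, int iterations) {
    int dim = embeddings.get(0).length;

    // Naive seeding: use the first k embeddings as initial centroids (assumes embeddings.size() >= k)
    List<float[]> centroids = new ArrayList<>();
    for (int i = 0; i < k; i++) {
        centroids.add(embeddings.get(i).clone());
    }

    int[] assignment = new int[embeddings.size()];
    for (int iter = 0; iter < iterations; iter++) {
        // Assignment step: attach each vector to its nearest centroid (squared Euclidean distance)
        for (int i = 0; i < embeddings.size(); i++) {
            double best = Double.MAX_VALUE;
            for (int c = 0; c < k; c++) {
                double dist = 0.0;
                for (int d = 0; d < dim; d++) {
                    double diff = embeddings.get(i)[d] - centroids.get(c)[d];
                    dist += diff * diff;
                }
                if (dist < best) {
                    best = dist;
                    assignment[i] = c;
                }
            }
        }
        // Update step: recompute each centroid as the mean of its assigned vectors
        for (int c = 0; c < k; c++) {
            float[] mean = new float[dim];
            int count = 0;
            for (int i = 0; i < embeddings.size(); i++) {
                if (assignment[i] == c) {
                    count++;
                    for (int d = 0; d < dim; d++) {
                        mean[d] += embeddings.get(i)[d];
                    }
                }
            }
            if (count > 0) {
                for (int d = 0; d < dim; d++) {
                    mean[d] /= count;
                }
                centroids.set(c, mean);
            }
        }
    }

    // Group the original indices by their final cluster assignment
    List<List<Integer>> clusters = new ArrayList<>();
    for (int c = 0; c < k; c++) {
        clusters.add(new ArrayList<>());
    }
    for (int i = 0; i < embeddings.size(); i++) {
        clusters.get(assignment[i]).add(i);
    }
    return clusters;
}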

Duplicate Detection

List<String> texts = List.of(
    "This is the original text",
    "This is the original text",  // Exact duplicate
    "This is an original text",   // Near duplicate
    "Completely different content"
);

EmbeddingRequest request = new EmbeddingRequest(texts, null);
EmbeddingResponse response = embeddingModel.call(request);

// Compare embeddings to find duplicates
for (int i = 0; i < texts.size(); i++) {
    for (int j = i + 1; j < texts.size(); j++) {
        double similarity = cosineSimilarity(
            response.getResults().get(i).getOutput(),
            response.getResults().get(j).getOutput()
        );
        if (similarity > 0.95) {
            System.out.println("Potential duplicate: " + i + " and " + j);
        }
    }
}

Error Handling

Common Exceptions

// Azure SDK exceptions
com.azure.core.exception.HttpResponseException  // HTTP errors (400, 401, 403, 429, 500)
com.azure.core.exception.ResourceNotFoundException  // Deployment not found (404)

// Spring AI exceptions
org.springframework.ai.retry.NonTransientAiException  // Permanent failures
org.springframework.ai.retry.TransientAiException  // Temporary failures (retry-able)

// Java exceptions
java.lang.IllegalArgumentException  // Invalid parameters
java.lang.NullPointerException  // Null required parameters

Exception Scenarios

1. Text Too Long (400):

// Text exceeds 8191 tokens
String veryLongText = generateLongText(10000);
try {
    EmbeddingResponse response = embeddingModel.call(new EmbeddingRequest(List.of(veryLongText), null));
} catch (HttpResponseException e) {
    if (e.getResponse().getStatusCode() == 400 &&
        e.getMessage().contains("maximum context length")) {
        // Split the text into chunks and embed each chunk separately
        List<String> chunks = splitIntoChunks(veryLongText, 8000);
        for (String chunk : chunks) {
            embeddingModel.call(new EmbeddingRequest(List.of(chunk), null));
        }
    } else {
        throw e;
    }
}

2. Too Many Texts (400):

List<String> tooManyTexts = generateTexts(3000);  // Exceeds the 2048-text limit
try {
    EmbeddingResponse response = embeddingModel.call(new EmbeddingRequest(tooManyTexts, null));
} catch (IllegalArgumentException e) {
    // Batch into groups of at most 2048 texts
    for (int i = 0; i < tooManyTexts.size(); i += 2048) {
        List<String> batch = tooManyTexts.subList(
            i,
            Math.min(i + 2048, tooManyTexts.size())
        );
        embeddingModel.call(new EmbeddingRequest(batch, null));
    }
}

3. Rate Limiting (429):

public EmbeddingResponse callWithRetry(List<String> texts) {
    int maxRetries = 3;
    int baseDelayMs = 1000;

    for (int attempt = 0; attempt < maxRetries; attempt++) {
        try {
            return embeddingModel.call(new EmbeddingRequest(texts, null));
        } catch (HttpResponseException e) {
            if (e.getResponse().getStatusCode() == 429 && attempt < maxRetries - 1) {
                int delayMs = baseDelayMs * (1 << attempt);  // Exponential backoff
                try {
                    Thread.sleep(delayMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException("Interrupted while waiting to retry", ie);
                }
                continue;
            }
            throw e;
        }
    }
    throw new RuntimeException("Max retries exceeded");
}

4. Invalid Dimensions (400):

AzureOpenAiEmbeddingOptions options;
try {
    options = AzureOpenAiEmbeddingOptions.builder()
        .deploymentName("text-embedding-3-small")
        .dimensions(2000)  // Exceeds max of 1536
        .build();
} catch (IllegalArgumentException e) {
    // Fall back to a valid dimension
    options = AzureOpenAiEmbeddingOptions.builder()
        .deploymentName("text-embedding-3-small")
        .dimensions(1536)
        .build();
}

Validation Rules

Parameter Constraints Summary

Deployment Name:

  • Required: Yes (throws NullPointerException if null)
  • Format: Non-empty string
  • Must match existing Azure deployment

Dimensions:

  • Range: > 0 and <= model max
    • text-embedding-3-small: max 1536
    • text-embedding-3-large: max 3072
    • text-embedding-ada-002: fixed at 1536 (not configurable)
  • Optional (uses model default if not specified)
  • Type: Integer (nullable)

User Identifier:

  • Max length: 256 characters
  • Optional
  • Type: String (nullable)

Input Type:

  • Values: "query", "document", or null
  • Optional (uses model default if not specified)
  • Only supported by text-embedding-3-small and text-embedding-3-large
  • Type: String (nullable)

Input Texts:

  • Cannot be empty list (throws IllegalArgumentException)
  • Maximum 2048 texts per request (throws IllegalArgumentException if exceeded)
  • Each text maximum 8191 tokens (throws HttpResponseException 400 if exceeded)
  • Text cannot be null or empty
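
To fail fast before calling the API, a hedged client-side pre-check mirroring the constraints above might look like this (the authoritative limits are enforced server-side by Azure):

static void validateEmbeddingInput(List<String> texts, AzureOpenAiEmbeddingOptions options) {
    if (texts == null || texts.isEmpty()) {
        throw new IllegalArgumentException("Input text list must not be empty");
    }
    if (texts.size() > 2048) {
        throw new IllegalArgumentException("At most 2048 texts per request, got " + texts.size());
    }
    for (String text : texts) {
        if (text == null || text.isBlank()) {
            throw new IllegalArgumentException("Input texts must not be null or empty");
        }
    }
    if (options != null) {
        if (options.getUser() != null && options.getUser().length() > 256) {
            throw new IllegalArgumentException("User identifier exceeds 256 characters");
        }
        if (options.getDimensions() != null && options.getDimensions() <= 0) {
            throw new IllegalArgumentException("Dimensions must be a positive integer");
        }
    }
}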

Embedding Models Comparison

| Model | Default Dimensions | Configurable | Max Input | Use Case |
|-------|--------------------|--------------|-----------|----------|
| text-embedding-ada-002 | 1536 | No | 8191 tokens | General purpose, cost-effective |
| text-embedding-3-small | 1536 | Yes (up to 1536) | 8191 tokens | Improved performance, configurable size |
| text-embedding-3-large | 3072 | Yes (up to 3072) | 8191 tokens | Best performance, larger dimensions |

Choosing a Model:

  • ada-002: Mature, well-tested, good balance of quality and cost
  • 3-small: Better quality than ada-002, configurable dimensions for storage optimization
  • 3-large: Best quality, use for maximum accuracy in semantic search

Performance Considerations

Batch Processing

Efficient:

// Batch multiple texts in one request
List<String> texts = List.of("text1", "text2", "text3", ...);
EmbeddingResponse response = embeddingModel.call(
    new EmbeddingRequest(texts, null)
);

Inefficient:

// Don't make separate requests for each text
for (String text : texts) {
    embeddingModel.call(new EmbeddingRequest(List.of(text), null));
}

Optimal Batch Size

  • Azure recommends batches of 16-2048 texts
  • Smaller batches (16-100): Lower per-request latency, better suited to real-time use cases
  • Larger batches (100-2048): Better cost efficiency for bulk processing
  • Monitor rate limits and adjust batch size accordingly

Reducing Storage with Dimensions

// Full dimensions (higher quality)
AzureOpenAiEmbeddingOptions fullOptions = AzureOpenAiEmbeddingOptions.builder()
    .deploymentName("text-embedding-3-small")
    .dimensions(1536)  // Full dimensions
    .build();

// Reduced dimensions (lower storage, faster search)
AzureOpenAiEmbeddingOptions reducedOptions = AzureOpenAiEmbeddingOptions.builder()
    .deploymentName("text-embedding-3-small")
    .dimensions(512)  // 67% reduction in storage
    .build();

// Test to find optimal dimension for your use case

Model Instance Reuse

Recommended:

// Create once at application startup
@Bean
public AzureOpenAiEmbeddingModel embeddingModel() {
    return new AzureOpenAiEmbeddingModel(client, MetadataMode.EMBED, options);
}

// Inject and reuse
@Autowired
private AzureOpenAiEmbeddingModel embeddingModel;

Avoid:

// Don't create new instance per request
for (String text : texts) {
    AzureOpenAiEmbeddingModel model = new AzureOpenAiEmbeddingModel(...);
    model.call(request);  // Inefficient
}

Parallel Processing

ExecutorService executor = Executors.newFixedThreadPool(10);
List<CompletableFuture<EmbeddingResponse>> futures = new ArrayList<>();

// Split large dataset into batches
List<List<String>> batches = partition(allTexts, 100);

for (List<String> batch : batches) {
    CompletableFuture<EmbeddingResponse> future = CompletableFuture.supplyAsync(
        () -> embeddingModel.call(new EmbeddingRequest(batch, null)),
        executor
    );
    futures.add(future);
}

// Wait for all batches
List<EmbeddingResponse> responses = futures.stream()
    .map(CompletableFuture::join)
    .collect(Collectors.toList());
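
The partition helper used above is not part of Spring AI; a minimal subList-based implementation could be:

static List<List<String>> partition(List<String> items, int batchSize) {
    List<List<String>> batches = new ArrayList<>();
    for (int i = 0; i < items.size(); i += batchSize) {
        // subList returns a view; copy it if the source list may change later
        batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
    }
    return batches;
}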

Troubleshooting

Issue: Embeddings quality is poor

Symptoms: Low similarity scores for semantically similar texts

Solutions:

  1. Use text-embedding-3-large for best quality
  2. Ensure input type matches use case ("query" vs "document")
  3. Keep full dimensions (don't reduce)
  4. Include metadata with MetadataMode.EMBED for documents
  5. Normalize embeddings before computing similarity

Issue: High storage costs

Solutions:

  1. Use text-embedding-3-small with reduced dimensions
  2. Test different dimension sizes (512, 768, 1024) to find acceptable quality/cost tradeoff
  3. Consider using quantization (int8) for stored vectors
  4. Remove duplicate embeddings
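
The quantization suggestion in point 3 is handled by your vector store rather than this API; as a rough, hypothetical sketch of symmetric int8 scalar quantization for stored vectors:

// Hypothetical helper: symmetric int8 quantization of an embedding vector.
// Each component is scaled into [-127, 127] using the vector's maximum absolute value;
// the scale must be stored alongside the bytes to approximately reconstruct the floats.
static byte[] quantizeInt8(float[] vector, float[] scaleOut) {
    float maxAbs = 0f;
    for (float v : vector) {
        maxAbs = Math.max(maxAbs, Math.abs(v));
    }
    float scale = (maxAbs == 0f) ? 1f : maxAbs / 127f;
    scaleOut[0] = scale;  // Return the scale through a one-element array for simplicity

    byte[] quantized = new byte[vector.length];
    for (int i = 0; i < vector.length; i++) {
        quantized[i] = (byte) Math.round(vector[i] / scale);
    }
    return quantized;
}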

Issue: Slow embedding generation

Solutions:

  1. Use batch processing (100-2048 texts per request)
  2. Process batches in parallel
  3. Use text-embedding-3-small instead of 3-large
  4. Reduce dimensions if possible
  5. Deploy your application in or near the Azure region that hosts your OpenAI resource

Issue: "Maximum context length exceeded" error

Solution: Split long texts into chunks

public List<float[]> embedLongText(String longText, int maxTokens) {
    List<String> chunks = splitIntoChunks(longText, maxTokens);
    EmbeddingResponse response = embeddingModel.call(
        new EmbeddingRequest(chunks, null)
    );
    return response.getResults().stream()
        .map(Embedding::getOutput)
        .collect(Collectors.toList());
}

private List<String> splitIntoChunks(String text, int maxTokens) {
    // Rough approximation: 1 token ≈ 4 characters
    int maxChars = maxTokens * 4;
    List<String> chunks = new ArrayList<>();
    for (int i = 0; i < text.length(); i += maxChars) {
        chunks.add(text.substring(i, Math.min(i + maxChars, text.length())));
    }
    return chunks;
}

Utility Methods

Cosine Similarity

public static double cosineSimilarity(float[] vectorA, float[] vectorB) {
    if (vectorA.length != vectorB.length) {
        throw new IllegalArgumentException("Vectors must have same length");
    }
    
    double dotProduct = 0.0;
    double normA = 0.0;
    double normB = 0.0;
    
    for (int i = 0; i < vectorA.length; i++) {
        dotProduct += vectorA[i] * vectorB[i];
        normA += vectorA[i] * vectorA[i];
        normB += vectorB[i] * vectorB[i];
    }
    
    return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
}

Vector Normalization

public static float[] normalize(float[] vector) {
    double norm = 0.0;
    for (float v : vector) {
        norm += v * v;
    }
    norm = Math.sqrt(norm);
    if (norm == 0.0) {
        return vector.clone();  // Zero vector: return a copy to avoid division by zero
    }

    float[] normalized = new float[vector.length];
    for (int i = 0; i < vector.length; i++) {
        normalized[i] = (float) (vector[i] / norm);
    }
    return normalized;
}

MetadataMode Options

  • NONE: Exclude document metadata from embedding
  • EMBED: Include metadata in the text to be embedded
  • ALL: Include all metadata fields

Example:

// Without metadata
AzureOpenAiEmbeddingModel modelNoMetadata = new AzureOpenAiEmbeddingModel(
    client,
    MetadataMode.NONE
);

// With metadata
AzureOpenAiEmbeddingModel modelWithMetadata = new AzureOpenAiEmbeddingModel(
    client,
    MetadataMode.EMBED
);

Default Values

  • Deployment Name: Must be specified (no default)
  • Dimensions: Model's default (1536 for ada-002 and 3-small, 3072 for 3-large)
  • Input Type: Not specified (model uses default behavior)
  • User: null (not tracked)
  • Metadata Mode: NONE (metadata excluded)