tessl/maven-io-quarkiverse-langchain4j--quarkus-langchain4j-openai

The Quarkus LangChain4j OpenAI extension provides seamless integration between Quarkus and OpenAI's Large Language Models, enabling developers to incorporate LLMs into their applications with support for chat, streaming, embeddings, moderation, and image generation.


docs/embedding-models.md

Embedding Models

Comprehensive API reference for OpenAI embedding models in Quarkus. Embedding models convert text into high-dimensional vector representations that capture semantic meaning, enabling advanced AI capabilities like semantic search, similarity comparison, and Retrieval-Augmented Generation (RAG). The extension provides an enhanced builder pattern through Service Provider Interface (SPI) registration, adding Quarkus-specific capabilities to the standard LangChain4j builders.

Introduction to Embeddings

Embeddings are numerical vector representations of text that encode semantic meaning. Unlike traditional keyword matching, embeddings enable AI systems to understand context, synonyms, and conceptual relationships. Two texts with similar meanings produce similar embeddings, even if they use completely different words.

Use Cases

Semantic Search: Find documents based on meaning rather than exact keyword matches. A search for "automobile repair" can find documents about "car maintenance" without explicit keyword overlap.

Retrieval-Augmented Generation (RAG): Enhance language model responses by retrieving relevant context from a knowledge base. When a user asks a question, the system finds semantically similar documents using embeddings and provides them as context to the language model.

Similarity Detection: Calculate how similar two pieces of text are by comparing their embedding vectors. Useful for duplicate detection, content recommendation, and clustering.

Classification: Train classifiers on embeddings to categorize text into topics, sentiments, or intents without extensive labeled data.

Clustering: Group similar documents together for organization, topic discovery, or content analysis.
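
The "similar embeddings" idea behind these use cases is usually made precise with cosine similarity: the dot product of two vectors divided by the product of their lengths, yielding values near 1.0 for the same direction and 0.0 for unrelated (orthogonal) vectors. A minimal sketch with toy 3-dimensional vectors (real OpenAI embeddings have 1536 or 3072 dimensions, and LangChain4j ships a ready-made CosineSimilarity helper used in the examples below):

```java
public class CosineDemo {

    /** Cosine similarity: dot(a, b) / (|a| * |b|), for two equal-length vectors. */
    public static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        float[] v1 = {1, 0, 1};
        float[] v2 = {1, 0, 1};  // same direction as v1
        float[] v3 = {0, 1, 0};  // orthogonal to v1
        System.out.println(cosine(v1, v2)); // ~1.0 (identical meaning)
        System.out.println(cosine(v1, v3)); // 0.0 (unrelated)
    }
}
```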

Architecture Overview

The embedding models implementation uses an SPI-based pattern where Quarkus-enhanced builders are automatically used when creating OpenAI embedding models. The builders extend LangChain4j's base builders to add:

  • Named configurations for managing multiple embedding model instances
  • TLS configuration for custom certificates in enterprise environments
  • HTTP proxy support for corporate network environments
  • Configuration-driven development with automatic CDI integration
  • Batch processing for efficient embedding of multiple texts
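
Discovery works through the standard Java ServiceLoader mechanism: the extension ships a provider-configuration file on the classpath naming its factory, so `OpenAiEmbeddingModel.builder()` transparently returns the enhanced builder. A sketch of that registration (the file path matches the SPI interface's fully qualified name; the factory's package name is assumed here for illustration):

```
# File: META-INF/services/dev.langchain4j.model.openai.spi.OpenAiEmbeddingModelBuilderFactory
io.quarkiverse.langchain4j.openai.QuarkusOpenAiEmbeddingModelBuilderFactory
```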

Capabilities

QuarkusOpenAiEmbeddingModelBuilderFactory

Factory class implementing the Service Provider Interface for creating Quarkus-enhanced OpenAI embedding model builders.

/**
 * SPI factory for creating OpenAI embedding models with Quarkus extensions.
 *
 * Registered via: META-INF/services/dev.langchain4j.model.openai.spi.OpenAiEmbeddingModelBuilderFactory
 *
 * This factory is automatically discovered and used when calling
 * OpenAiEmbeddingModel.builder(), providing Quarkus-specific functionality
 * transparently.
 *
 * Usage:
 *     EmbeddingModel model = OpenAiEmbeddingModel.builder()
 *         .configName("semantic-search")  // Quarkus-specific
 *         .apiKey("sk-...")
 *         .build();
 */
public class QuarkusOpenAiEmbeddingModelBuilderFactory
    implements OpenAiEmbeddingModelBuilderFactory {

    /**
     * Creates a new Quarkus-enhanced builder instance.
     *
     * Returns:
     *     Builder instance with both Quarkus-specific and LangChain4j methods
     */
    @Override
    public OpenAiEmbeddingModel.OpenAiEmbeddingModelBuilder get();
}

QuarkusOpenAiEmbeddingModelBuilderFactory.Builder

Enhanced builder class extending LangChain4j's OpenAiEmbeddingModelBuilder with Quarkus-specific methods.

/**
 * Enhanced builder for OpenAI embedding models with Quarkus features.
 *
 * Extends: dev.langchain4j.model.openai.OpenAiEmbeddingModel.OpenAiEmbeddingModelBuilder
 *
 * Embedding models convert text into vector representations for semantic
 * operations. Different OpenAI embedding models offer tradeoffs between
 * dimensionality, cost, and performance.
 *
 * Usage:
 *     EmbeddingModel model = OpenAiEmbeddingModel.builder()
 *         .configName("rag-embeddings")       // Quarkus-specific
 *         .tlsConfigurationName("custom-tls") // Quarkus-specific
 *         .apiKey("sk-...")                   // LangChain4j inherited
 *         .modelName("text-embedding-3-small") // LangChain4j inherited
 *         .build();
 */
public static class Builder extends OpenAiEmbeddingModel.OpenAiEmbeddingModelBuilder {

    /**
     * Set the named configuration to use.
     *
     * Parameters:
     *     configName - Name of configuration defined in application.properties
     *
     * Returns:
     *     This builder for method chaining
     *
     * When specified, the builder loads settings from the named configuration
     * instead of the default configuration. For example, configName("semantic")
     * loads from quarkus.langchain4j.openai.semantic.* properties.
     *
     * This enables using different embedding models for different purposes
     * (e.g., one for semantic search, another for classification).
     *
     * Example:
     *     .configName("semantic")  // Uses quarkus.langchain4j.openai.semantic.*
     */
    public Builder configName(String configName);

    /**
     * Set the named TLS configuration for HTTPS connections.
     *
     * Parameters:
     *     tlsConfigurationName - Name of Quarkus TLS configuration
     *
     * Returns:
     *     This builder for method chaining
     *
     * References a Quarkus named TLS configuration defined via
     * quarkus.tls.{name}.* properties for custom certificates,
     * client authentication, or custom trust stores.
     *
     * Required when using custom certificate authorities or client
     * certificates for mutual TLS authentication.
     *
     * Example:
     *     .tlsConfigurationName("enterprise-certs")
     */
    public Builder tlsConfigurationName(String tlsConfigurationName);

    /**
     * Set HTTP proxy for API requests.
     *
     * Parameters:
     *     proxy - java.net.Proxy instance (HTTP or SOCKS)
     *
     * Returns:
     *     This builder for method chaining
     *
     * Configures HTTP proxy for routing OpenAI API requests through
     * corporate proxies or network gateways. Essential for enterprise
     * environments with restricted internet access.
     *
     * Example:
     *     Proxy proxy = new Proxy(Proxy.Type.HTTP,
     *         new InetSocketAddress("proxy.company.com", 8080));
     *     .proxy(proxy)
     */
    public Builder proxy(Proxy proxy);

    /**
     * Build the OpenAI embedding model instance.
     *
     * Returns:
     *     Configured OpenAiEmbeddingModel instance
     *
     * Creates the embedding model with all configured settings. If configName
     * was specified, applies settings from that named configuration.
     * Validates required settings (API key) and initializes the underlying
     * HTTP client.
     *
     * The returned model is thread-safe and can be reused across requests.
     * Consider creating a single instance and injecting it via CDI rather
     * than building new instances for each request.
     */
    @Override
    public OpenAiEmbeddingModel build();

    /**
     * Public fields (direct access, though builder methods are recommended).
     */
    public String configName;               // Named configuration reference
    public String tlsConfigurationName;     // Named TLS configuration
    public Proxy proxy;                     // HTTP proxy configuration
}

Inherited LangChain4j Builder Methods

All methods from OpenAiEmbeddingModel.OpenAiEmbeddingModelBuilder are available. Key methods include:

/**
 * Core configuration methods inherited from LangChain4j.
 *
 * These methods are part of the standard LangChain4j API and work
 * seamlessly with Quarkus enhancements.
 */

/**
 * Set the OpenAI API base URL.
 *
 * Parameters:
 *     baseUrl - API endpoint URL
 *
 * Default: "https://api.openai.com/v1/"
 *
 * Use for OpenAI-compatible embedding providers or custom deployments.
 * Some providers (e.g., Azure OpenAI) require custom base URLs.
 *
 * Example:
 *     .baseUrl("https://custom-openai.example.com/v1")
 */
public Builder baseUrl(String baseUrl);

/**
 * Set the OpenAI API key.
 *
 * Parameters:
 *     apiKey - Your OpenAI API key (format: sk-...)
 *
 * Required: Yes
 *
 * Obtain from: https://platform.openai.com/api-keys
 *
 * Keep API keys secure. Use environment variables or Quarkus configuration
 * rather than hardcoding keys in source code.
 *
 * Example:
 *     .apiKey("sk-proj-...")
 */
public Builder apiKey(String apiKey);

/**
 * Set the OpenAI organization ID.
 *
 * Parameters:
 *     organizationId - Organization identifier
 *
 * Required: No
 *
 * For users belonging to multiple organizations. Find at:
 * https://platform.openai.com/account/organization
 *
 * When specified, API usage is billed to the specified organization.
 *
 * Example:
 *     .organizationId("org-...")
 */
public Builder organizationId(String organizationId);

/**
 * Set the embedding model name.
 *
 * Parameters:
 *     modelName - OpenAI embedding model identifier
 *
 * Default: "text-embedding-ada-002"
 *
 * Available models:
 *     - "text-embedding-3-small" - 1536 dimensions, cost-effective, good performance
 *     - "text-embedding-3-large" - 3072 dimensions, highest quality, higher cost
 *     - "text-embedding-ada-002" - 1536 dimensions, legacy model, widely compatible
 *
 * Model selection considerations:
 *     - text-embedding-3-small: Best for most use cases, balances cost and quality
 *     - text-embedding-3-large: Use when maximum semantic understanding is critical
 *     - text-embedding-ada-002: Use for compatibility with existing embeddings
 *
 * Example:
 *     .modelName("text-embedding-3-small")
 */
public Builder modelName(String modelName);

/**
 * Set request timeout.
 *
 * Parameters:
 *     timeout - Maximum time to wait for response
 *
 * Default: 10 seconds
 *
 * Maximum duration to wait for OpenAI API responses. Embedding requests
 * are typically fast, but batch operations may require longer timeouts.
 *
 * For large batch operations (hundreds of texts), consider increasing
 * the timeout to 30-60 seconds.
 *
 * Example:
 *     .timeout(Duration.ofSeconds(30))
 */
public Builder timeout(Duration timeout);

/**
 * Set maximum retry attempts.
 *
 * Parameters:
 *     maxRetries - Maximum number of retries
 *
 * Default: 1 (no retries)
 * Deprecated: Use MicroProfile Fault Tolerance instead
 *
 * Number of retry attempts for failed requests. Built-in retry
 * is deprecated in favor of MicroProfile Fault Tolerance patterns,
 * which provide more sophisticated retry strategies with exponential
 * backoff and circuit breakers.
 *
 * Example:
 *     .maxRetries(3)
 */
@Deprecated
public Builder maxRetries(Integer maxRetries);

/**
 * Enable request logging.
 *
 * Parameters:
 *     logRequests - true to log requests
 *
 * Default: false
 *
 * When enabled, logs full request payloads sent to OpenAI API.
 * Useful for debugging but may expose sensitive data in logs.
 *
 * Request logs include the text being embedded, which may contain
 * sensitive or proprietary information.
 *
 * Example:
 *     .logRequests(true)
 */
public Builder logRequests(Boolean logRequests);

/**
 * Enable response logging.
 *
 * Parameters:
 *     logResponses - true to log responses
 *
 * Default: false
 *
 * When enabled, logs full response payloads from OpenAI API.
 * Useful for debugging and monitoring.
 *
 * Response logs include high-dimensional vectors, which can be
 * verbose. Consider using only in development environments.
 *
 * Example:
 *     .logResponses(true)
 */
public Builder logResponses(Boolean logResponses);

/**
 * Set end-user identifier for abuse monitoring.
 *
 * Parameters:
 *     user - Unique identifier for the end-user
 *
 * Required: No
 * Recommended: Yes, for production applications
 *
 * A unique identifier representing your end-user, which can help
 * OpenAI monitor and detect abuse. This should be a unique identifier
 * per user (e.g., user ID, email hash) rather than personally
 * identifiable information.
 *
 * OpenAI uses this to detect patterns of abuse and may use it to
 * take action on accounts that violate their usage policies.
 *
 * Example:
 *     .user("user-12345")
 *     .user(UUID.randomUUID().toString())
 */
public Builder user(String user);

EmbeddingModelConfig Interface

Configuration interface for embedding model settings, used for declarative configuration.

/**
 * Configuration interface for OpenAI embedding models.
 *
 * Configuration prefix:
 *     - Default: quarkus.langchain4j.openai.embedding-model
 *     - Named: quarkus.langchain4j.openai.{name}.embedding-model
 *
 * All properties can be set in application.properties or application.yaml.
 *
 * Embedding models are configured separately from chat models, allowing
 * different settings for different use cases (e.g., search vs RAG).
 */
@ConfigGroup
public interface EmbeddingModelConfig {

    /**
     * Model name to use.
     *
     * Property: model-name
     * Default: "text-embedding-ada-002"
     *
     * Returns:
     *     The OpenAI embedding model identifier
     *
     * Available models:
     *     - "text-embedding-3-small" - 1536 dimensions, cost-effective
     *     - "text-embedding-3-large" - 3072 dimensions, highest quality
     *     - "text-embedding-ada-002" - 1536 dimensions, legacy model
     */
    @WithDefault("text-embedding-ada-002")
    String modelName();

    /**
     * Enable request logging.
     *
     * Property: log-requests
     * Default: false
     *
     * Returns:
     *     Optional boolean for request logging
     *
     * When enabled, logs full request payloads sent to OpenAI API,
     * including text being embedded. Use cautiously with sensitive data.
     */
    Optional<Boolean> logRequests();

    /**
     * Enable response logging.
     *
     * Property: log-responses
     * Default: false
     *
     * Returns:
     *     Optional boolean for response logging
     *
     * When enabled, logs full response payloads including embedding vectors.
     * Vectors are high-dimensional and can produce verbose logs.
     */
    Optional<Boolean> logResponses();

    /**
     * End-user identifier for abuse monitoring.
     *
     * Property: user
     *
     * Returns:
     *     Optional user identifier string
     *
     * A unique identifier representing your end-user, which can help
     * OpenAI monitor and detect abuse. Should be unique per user but
     * not personally identifiable information.
     */
    Optional<String> user();
}
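
In application.properties, the interface's properties map to entries like the following (illustrative values; the API key is shown via an environment-variable placeholder):

```properties
# Default embedding-model configuration
quarkus.langchain4j.openai.api-key=${OPENAI_API_KEY}
quarkus.langchain4j.openai.embedding-model.model-name=text-embedding-3-small
quarkus.langchain4j.openai.embedding-model.log-requests=false
quarkus.langchain4j.openai.embedding-model.log-responses=false
quarkus.langchain4j.openai.embedding-model.user=my-app
```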

Available Embedding Models

OpenAI provides three primary embedding models, each optimized for different use cases:

text-embedding-3-small

Dimensions: 1536 (default; can be reduced via the dimensions parameter)
Cost: $0.00002 per 1K tokens (62,500 pages per dollar)
Performance: Excellent for most use cases

Best for:

  • General-purpose semantic search
  • Cost-sensitive applications
  • RAG systems with moderate complexity
  • Applications requiring fast processing

Use when: You need high-quality embeddings at minimal cost and 1536 dimensions provide sufficient semantic resolution.

text-embedding-3-large

Dimensions: 3072 (default; can be reduced via the dimensions parameter)
Cost: $0.00013 per 1K tokens (7,692 pages per dollar)
Performance: Highest quality semantic understanding

Best for:

  • High-precision semantic search
  • Complex domain-specific content
  • Advanced RAG systems requiring nuanced understanding
  • Applications where quality is more important than cost

Use when: Maximum semantic precision is critical and you can justify higher costs for better results.

text-embedding-ada-002

Dimensions: 1536 (fixed)
Cost: $0.00010 per 1K tokens (10,000 pages per dollar)
Performance: Solid baseline performance

Best for:

  • Compatibility with existing embedding databases
  • Legacy applications
  • Projects started before text-embedding-3 models

Use when: You have existing embeddings from ada-002 and need consistency, or require compatibility with older systems.
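
The per-1K-token prices above translate directly into budget arithmetic. A quick sketch (prices as listed above; consult OpenAI's pricing page for current rates):

```java
public class EmbeddingCost {

    /** Cost in dollars for embedding a given number of tokens at a per-1K-token price. */
    public static double cost(long tokens, double pricePer1kTokens) {
        return tokens / 1000.0 * pricePer1kTokens;
    }

    public static void main(String[] args) {
        // Embedding 10,000 documents of ~500 tokens each = 5M tokens total:
        long tokens = 10_000L * 500;
        System.out.printf("text-embedding-3-small: $%.2f%n", cost(tokens, 0.00002)); // $0.10
        System.out.printf("text-embedding-3-large: $%.2f%n", cost(tokens, 0.00013)); // $0.65
        System.out.printf("text-embedding-ada-002: $%.2f%n", cost(tokens, 0.00010)); // $0.50
    }
}
```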

Embedding Dimensions and Use Cases

Embedding dimensions represent the size of the vector space. Higher dimensions can capture more nuanced semantic relationships but require more storage and computation.

1536 dimensions (text-embedding-3-small, text-embedding-ada-002):

  • Storage: ~6KB per embedding (as float32)
  • Use case: General semantic search, basic RAG, similarity detection
  • Tradeoff: Good balance of semantic understanding and efficiency

3072 dimensions (text-embedding-3-large):

  • Storage: ~12KB per embedding (as float32)
  • Use case: Fine-grained semantic analysis, complex domain knowledge
  • Tradeoff: Higher storage and compute cost for better semantic precision

Dimension reduction: Both text-embedding-3 models support dimension reduction by truncating vectors from the end. This allows storage optimization while maintaining reasonable semantic quality.
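
A minimal sketch of client-side dimension reduction: truncate the vector, then re-normalize to unit length so cosine similarity remains meaningful (toy 4-dimensional vector for illustration; for text-embedding-3 models the OpenAI API can also do this server-side via the dimensions request parameter):

```java
public class DimensionReduction {

    /** Keep the first k dimensions of an embedding and re-normalize to unit length. */
    public static float[] truncate(float[] vector, int k) {
        float[] reduced = new float[k];
        System.arraycopy(vector, 0, reduced, 0, k);
        double norm = 0;
        for (float v : reduced) norm += v * v;
        norm = Math.sqrt(norm);
        for (int i = 0; i < k; i++) reduced[i] /= norm;
        return reduced;
    }

    public static void main(String[] args) {
        float[] full = {0.6f, 0.8f, 0.0f, 0.0f}; // toy "embedding"
        float[] reduced = truncate(full, 2);      // keep the first 2 dimensions
        System.out.println(reduced.length);       // 2
        // Storage math from above: 1536 dims * 4 bytes (float32) = 6144 bytes (~6KB)
        System.out.println(1536 * Float.BYTES);   // 6144
    }
}
```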

Usage Examples

Example 1: Basic Text Embedding with CDI

Simple embedding generation using dependency injection:

// application.properties
// quarkus.langchain4j.openai.api-key=sk-...
// quarkus.langchain4j.openai.embedding-model.model-name=text-embedding-3-small

import jakarta.inject.Inject;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.model.output.Response;
import java.util.List;

public class EmbeddingService {

    @Inject
    EmbeddingModel embeddingModel;

    public float[] embedText(String text) {
        // Generate embedding for a single text
        Response<Embedding> response = embeddingModel.embed(text);
        Embedding embedding = response.content();

        // Get vector as array
        float[] vector = embedding.vector();

        return vector;
    }

    public List<Float> embedTextAsList(String text) {
        // Get embedding as List<Float> for easier manipulation
        Response<Embedding> response = embeddingModel.embed(text);
        return response.content().vectorAsList();
    }

    public int getDimensions(String text) {
        // Check embedding dimensions
        Response<Embedding> response = embeddingModel.embed(text);
        return response.content().dimension();
    }
}

Example 2: Batch Embedding of Multiple Texts

Efficiently embed multiple texts in a single API call:

import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.model.output.Response;
import jakarta.inject.Inject;
import java.util.List;
import java.util.stream.Collectors;

public class BatchEmbeddingService {

    @Inject
    EmbeddingModel embeddingModel;

    public List<float[]> embedDocuments(List<String> documents) {
        // Convert strings to TextSegments
        List<TextSegment> segments = documents.stream()
            .map(TextSegment::from)
            .collect(Collectors.toList());

        // Batch embed all segments in a single API call
        Response<List<Embedding>> response = embeddingModel.embedAll(segments);

        // Extract vectors
        return response.content().stream()
            .map(Embedding::vector)
            .collect(Collectors.toList());
    }

    public void embedLargeDataset(List<String> documents) {
        // For very large datasets, process in batches to avoid timeouts
        int batchSize = 100;

        for (int i = 0; i < documents.size(); i += batchSize) {
            int end = Math.min(i + batchSize, documents.size());
            List<String> batch = documents.subList(i, end);

            List<float[]> embeddings = embedDocuments(batch);

            // Process or store embeddings
            System.out.println("Processed batch: " + (i / batchSize + 1));
        }
    }
}

Example 3: Semantic Similarity Comparison

Calculate similarity between texts using cosine similarity:

import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.store.embedding.CosineSimilarity;
import jakarta.inject.Inject;
import java.util.List;
import java.util.stream.Collectors;

public class SimilarityService {

    @Inject
    EmbeddingModel embeddingModel;

    /**
     * Calculate cosine similarity between two texts.
     * Returns a value between -1 and 1, where 1 means identical meaning.
     */
    public double calculateSimilarity(String text1, String text2) {
        Embedding embedding1 = embeddingModel.embed(text1).content();
        Embedding embedding2 = embeddingModel.embed(text2).content();

        // Use LangChain4j's built-in cosine similarity
        return CosineSimilarity.between(embedding1, embedding2);
    }

    /**
     * Find the most similar text from a list of candidates.
     */
    public String findMostSimilar(String query, List<String> candidates) {
        Embedding queryEmbedding = embeddingModel.embed(query).content();

        double maxSimilarity = -1.0;
        String mostSimilar = null;

        for (String candidate : candidates) {
            Embedding candidateEmbedding = embeddingModel.embed(candidate).content();
            double similarity = CosineSimilarity.between(queryEmbedding, candidateEmbedding);

            if (similarity > maxSimilarity) {
                maxSimilarity = similarity;
                mostSimilar = candidate;
            }
        }

        return mostSimilar;
    }

    /**
     * Detect duplicate or near-duplicate content.
     */
    public boolean isDuplicate(String text1, String text2, double threshold) {
        double similarity = calculateSimilarity(text1, text2);
        return similarity >= threshold; // e.g., threshold = 0.95 for near-duplicates
    }

    public record SimilarityResult(String text, double score) {}

    /**
     * Find top-k most similar texts from candidates.
     */
    public List<SimilarityResult> findTopSimilar(String query, List<String> candidates, int topK) {
        Embedding queryEmbedding = embeddingModel.embed(query).content();

        return candidates.stream()
            .map(candidate -> {
                Embedding candidateEmbedding = embeddingModel.embed(candidate).content();
                double similarity = CosineSimilarity.between(queryEmbedding, candidateEmbedding);
                return new SimilarityResult(candidate, similarity);
            })
            .sorted((a, b) -> Double.compare(b.score(), a.score())) // Descending order
            .limit(topK)
            .collect(Collectors.toList());
    }
}

Example 4: RAG (Retrieval-Augmented Generation) Pattern

Implement a complete RAG system with embedding-based retrieval:

import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;
import dev.langchain4j.store.embedding.EmbeddingSearchResult;
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.EmbeddingMatch;
import jakarta.inject.Inject;
import jakarta.enterprise.context.ApplicationScoped;
import java.util.List;
import java.util.stream.Collectors;

@ApplicationScoped
public class RAGService {

    @Inject
    EmbeddingModel embeddingModel;

    @Inject
    ChatModel chatModel;

    private final EmbeddingStore<TextSegment> embeddingStore;

    public RAGService() {
        // In production, use a persistent store like PgVector, Pinecone, etc.
        this.embeddingStore = new InMemoryEmbeddingStore<>();
    }

    /**
     * Index documents into the embedding store.
     */
    public void indexDocuments(List<String> documents) {
        for (String document : documents) {
            TextSegment segment = TextSegment.from(document);
            Embedding embedding = embeddingModel.embed(segment).content();
            embeddingStore.add(embedding, segment);
        }
    }

    /**
     * Retrieve relevant documents for a query.
     */
    public List<String> retrieveRelevantDocuments(String query, int maxResults) {
        // Embed the query
        Embedding queryEmbedding = embeddingModel.embed(query).content();

        // Search for similar embeddings
        EmbeddingSearchRequest searchRequest = EmbeddingSearchRequest.builder()
            .queryEmbedding(queryEmbedding)
            .maxResults(maxResults)
            .minScore(0.7) // Only return results with similarity >= 0.7
            .build();

        EmbeddingSearchResult<TextSegment> searchResult =
            embeddingStore.search(searchRequest);

        // Extract text from matches
        return searchResult.matches().stream()
            .map(match -> match.embedded().text())
            .collect(Collectors.toList());
    }

    /**
     * Answer a question using RAG pattern.
     */
    public String answerQuestion(String question) {
        // 1. Retrieve relevant context
        List<String> relevantDocs = retrieveRelevantDocuments(question, 5);

        // 2. Build prompt with context
        String context = String.join("\n\n", relevantDocs);
        String prompt = String.format("""
            Answer the following question based on the provided context.
            If the answer cannot be found in the context, say so.

            Context:
            %s

            Question: %s

            Answer:
            """, context, question);

        // 3. Generate answer with chat model
        return chatModel.chat(prompt);
    }

    /**
     * Answer question with source attribution.
     */
    public record AnswerWithSources(String answer, List<String> sources) {}

    public AnswerWithSources answerWithSources(String question) {
        // Retrieve with full match information
        Embedding queryEmbedding = embeddingModel.embed(question).content();

        EmbeddingSearchRequest searchRequest = EmbeddingSearchRequest.builder()
            .queryEmbedding(queryEmbedding)
            .maxResults(5)
            .minScore(0.7)
            .build();

        EmbeddingSearchResult<TextSegment> searchResult =
            embeddingStore.search(searchRequest);

        List<String> sources = searchResult.matches().stream()
            .map(match -> match.embedded().text())
            .collect(Collectors.toList());

        // Generate answer with context
        String context = String.join("\n\n", sources);
        String prompt = String.format("""
            Answer the question based on the context. Cite sources when possible.

            Context:
            %s

            Question: %s
            """, context, question);

        String answer = chatModel.chat(prompt);

        return new AnswerWithSources(answer, sources);
    }
}

Example 5: Named Configurations for Different Embedding Models

Using multiple embedding models for different purposes:

# application.properties

# Default configuration - for general embedding
quarkus.langchain4j.openai.api-key=sk-default-key
quarkus.langchain4j.openai.embedding-model.model-name=text-embedding-3-small

# High-quality configuration - for critical semantic search
quarkus.langchain4j.openai.high-quality.api-key=sk-premium-key
quarkus.langchain4j.openai.high-quality.embedding-model.model-name=text-embedding-3-large
quarkus.langchain4j.openai.high-quality.embedding-model.log-requests=true

# Legacy configuration - for compatibility with existing embeddings
quarkus.langchain4j.openai.legacy.api-key=sk-legacy-key
quarkus.langchain4j.openai.legacy.embedding-model.model-name=text-embedding-ada-002

import dev.langchain4j.model.openai.OpenAiEmbeddingModel;
import dev.langchain4j.model.embedding.EmbeddingModel;

public class MultiModelEmbeddingService {

    private final EmbeddingModel defaultModel;
    private final EmbeddingModel highQualityModel;
    private final EmbeddingModel legacyModel;

    public MultiModelEmbeddingService() {
        // Each model uses its named configuration
        this.defaultModel = OpenAiEmbeddingModel.builder()
            .configName("default")  // Uses default config
            .build();

        this.highQualityModel = OpenAiEmbeddingModel.builder()
            .configName("high-quality")  // Uses high-quality config
            .build();

        this.legacyModel = OpenAiEmbeddingModel.builder()
            .configName("legacy")  // Uses legacy config
            .build();
    }

    public float[] embedForSearch(String text) {
        // Use high-quality model for search indexing
        return highQualityModel.embed(text).content().vector();
    }

    public float[] embedForClassification(String text) {
        // Use default model for classification (cost-effective)
        return defaultModel.embed(text).content().vector();
    }

    public float[] embedForCompatibility(String text) {
        // Use legacy model for compatibility with existing embeddings
        return legacyModel.embed(text).content().vector();
    }
}

Example 6: Enterprise Configuration with TLS and Proxy

Configuring embedding models for enterprise environments:

# application.properties

# Enterprise configuration with proxy and custom TLS
quarkus.langchain4j.openai.enterprise.api-key=sk-enterprise-key
quarkus.langchain4j.openai.enterprise.base-url=https://api.openai.com/v1/
quarkus.langchain4j.openai.enterprise.tls-configuration-name=company-certs
quarkus.langchain4j.openai.enterprise.proxy-type=HTTP
quarkus.langchain4j.openai.enterprise.proxy-host=proxy.company.com
quarkus.langchain4j.openai.enterprise.proxy-port=8080
quarkus.langchain4j.openai.enterprise.timeout=30s
quarkus.langchain4j.openai.enterprise.embedding-model.model-name=text-embedding-3-small
quarkus.langchain4j.openai.enterprise.embedding-model.user=rag-system
quarkus.langchain4j.openai.enterprise.embedding-model.log-requests=false

# Custom TLS configuration
quarkus.tls.company-certs.trust-store.pem.certs=company-root-ca.pem
quarkus.tls.company-certs.key-store.p12.path=client-cert.p12
quarkus.tls.company-certs.key-store.p12.password=changeit

import dev.langchain4j.model.openai.OpenAiEmbeddingModel;
import dev.langchain4j.model.embedding.EmbeddingModel;
import java.net.InetSocketAddress;
import java.net.Proxy;

public class EnterpriseEmbeddingService {

    // Using configuration-based approach (recommended)
    private final EmbeddingModel configuredModel;

    // Using programmatic approach
    private final EmbeddingModel programmaticModel;

    public EnterpriseEmbeddingService() {
        // Configuration-based (recommended for enterprise)
        this.configuredModel = OpenAiEmbeddingModel.builder()
            .configName("enterprise")
            .build();

        // Programmatic configuration
        Proxy proxy = new Proxy(
            Proxy.Type.HTTP,
            new InetSocketAddress("proxy.company.com", 8080)
        );

        this.programmaticModel = OpenAiEmbeddingModel.builder()
            .tlsConfigurationName("company-certs") // Quarkus-specific methods first, as in the builder docs above
            .proxy(proxy)
            .apiKey("sk-enterprise-key")
            .modelName("text-embedding-3-small")
            .user("rag-system")
            .logRequests(false)
            .logResponses(false)
            .build();
    }

    public float[] embedSecurely(String text) {
        // Uses enterprise configuration with TLS and proxy
        return configuredModel.embed(text).content().vector();
    }
}

Example 7: Integration with Embedding Store

Complete example with persistent embedding storage:

import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.data.document.Metadata;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.EmbeddingSearchResult;
import dev.langchain4j.store.embedding.EmbeddingMatch;
import jakarta.inject.Inject;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.enterprise.event.Observes;
import io.quarkus.runtime.StartupEvent;
import java.util.List;
import java.util.stream.Collectors;

@ApplicationScoped
public class DocumentIndexingService {

    @Inject
    EmbeddingModel embeddingModel;

    @Inject
    EmbeddingStore<TextSegment> embeddingStore;

    /**
     * Index documents with metadata on startup.
     */
    public void indexOnStartup(@Observes StartupEvent event) {
        // Index sample documents with metadata
        indexDocument("Java is a programming language",
            Metadata.from("category", "programming").put("language", "java"));

        indexDocument("Python is great for data science",
            Metadata.from("category", "programming").put("language", "python"));

        indexDocument("Machine learning uses neural networks",
            Metadata.from("category", "ai").put("topic", "ml"));
    }

    /**
     * Index a single document with metadata.
     */
    public void indexDocument(String text, Metadata metadata) {
        TextSegment segment = TextSegment.from(text, metadata);
        Embedding embedding = embeddingModel.embed(segment).content();
        embeddingStore.add(embedding, segment);
    }

    /**
     * Search with metadata filtering.
     */
    public List<String> searchByCategory(String query, String category, int maxResults) {
        Embedding queryEmbedding = embeddingModel.embed(query).content();

        EmbeddingSearchRequest searchRequest = EmbeddingSearchRequest.builder()
            .queryEmbedding(queryEmbedding)
            .maxResults(maxResults)
            .minScore(0.6)
            .build();

        EmbeddingSearchResult<TextSegment> searchResult =
            embeddingStore.search(searchRequest);

        // Filter by metadata
        return searchResult.matches().stream()
            .map(EmbeddingMatch::embedded)
            .filter(segment -> category.equals(segment.metadata().getString("category")))
            .map(TextSegment::text)
            .collect(Collectors.toList());
    }

    /**
     * Get embedding statistics.
     */
    public record EmbeddingStats(int dimensions, String modelName) {}

    public EmbeddingStats getStats() {
        // Embed a sample text to determine the vector dimensionality.
        // Note: EmbeddingStore does not expose a document count; if you
        // need one, track it separately when indexing.
        Embedding sample = embeddingModel.embed("sample").content();

        return new EmbeddingStats(
            sample.dimension(),
            "text-embedding-3-small" // From config
        );
    }
}

Example 8: Batch Processing with Error Handling

Robust batch embedding with retry logic:

import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.model.output.Response;
import jakarta.inject.Inject;
import jakarta.enterprise.context.ApplicationScoped;
import org.eclipse.microprofile.faulttolerance.Retry;
import org.eclipse.microprofile.faulttolerance.Timeout;
import org.eclipse.microprofile.faulttolerance.CircuitBreaker;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.ArrayList;
import java.util.stream.Collectors;

@ApplicationScoped
public class RobustEmbeddingService {

    @Inject
    EmbeddingModel embeddingModel;

    /**
     * Embed with automatic retry on failure.
     */
    @Retry(maxRetries = 3, delay = 1, delayUnit = ChronoUnit.SECONDS)
    @Timeout(value = 30, unit = ChronoUnit.SECONDS)
    public Embedding embedWithRetry(String text) {
        return embeddingModel.embed(text).content();
    }

    /**
     * Batch embed with circuit breaker protection.
     */
    @CircuitBreaker(requestVolumeThreshold = 10, failureRatio = 0.5, delay = 5000)
    @Retry(maxRetries = 2)
    public List<Embedding> batchEmbedSafely(List<String> texts) {
        List<TextSegment> segments = texts.stream()
            .map(TextSegment::from)
            .collect(Collectors.toList());

        Response<List<Embedding>> response = embeddingModel.embedAll(segments);
        return response.content();
    }

    /**
     * Process large dataset with progress tracking.
     */
    public record ProcessingProgress(int processed, int total, int failed, List<String> errors) {}

    public ProcessingProgress processLargeDataset(
        List<String> documents,
        java.util.function.Consumer<ProcessingProgress> progressCallback
    ) {
        int batchSize = 100;
        int processed = 0;
        int failed = 0;
        List<String> errors = new ArrayList<>();

        for (int i = 0; i < documents.size(); i += batchSize) {
            int end = Math.min(i + batchSize, documents.size());
            List<String> batch = documents.subList(i, end);

            try {
                // Note: this is a self-invocation, so the @CircuitBreaker and
                // @Retry interceptors on batchEmbedSafely() are bypassed; route
                // the call through another bean if fault tolerance must apply.
                List<Embedding> embeddings = batchEmbedSafely(batch);
                processed += batch.size();

                // Report progress
                ProcessingProgress progress = new ProcessingProgress(
                    processed, documents.size(), failed, errors
                );
                progressCallback.accept(progress);

            } catch (Exception e) {
                failed += batch.size();
                errors.add("Batch " + (i / batchSize) + ": " + e.getMessage());
            }
        }

        return new ProcessingProgress(processed, documents.size(), failed, errors);
    }
}

Best Practices

Model Selection

Choose text-embedding-3-small when:

  • Building general-purpose semantic search
  • Cost is a significant consideration
  • Processing large volumes of text
  • 1536 dimensions provide sufficient semantic resolution

Choose text-embedding-3-large when:

  • Semantic precision is critical (e.g., legal, medical domains)
  • Working with complex, nuanced content
  • Building high-end RAG systems
  • Storage and compute costs are acceptable

Choose text-embedding-ada-002 when:

  • Maintaining compatibility with existing embeddings
  • Migrating from legacy systems
  • Working with older integration code
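These trade-offs map naturally onto named model configurations, so each use case selects its model by config name. A minimal sketch following the configuration pattern used above (the `bulk-index` and `precision-search` names are illustrative, not predefined):

```
# Illustrative: a cheap model for high-volume indexing...
quarkus.langchain4j.openai.bulk-index.embedding-model.model-name=text-embedding-3-small
# ...and a large model where semantic precision matters
quarkus.langchain4j.openai.precision-search.embedding-model.model-name=text-embedding-3-large
```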

Batch Processing

Always use batch embedding (embedAll()) instead of individual calls when processing multiple texts:

// Efficient - single API call
List<Embedding> embeddings = embeddingModel.embedAll(segments).content();

// Inefficient - multiple API calls
for (TextSegment segment : segments) {
    Embedding embedding = embeddingModel.embed(segment).content();
}

User Parameter for Abuse Monitoring

Always set the user parameter in production to help OpenAI detect abuse:

// Good practice
EmbeddingModel model = OpenAiEmbeddingModel.builder()
    .apiKey("sk-...")
    .user("user-12345")  // Unique user identifier
    .build();

Embedding Storage

Store embeddings efficiently:

  • Use float32 (not float64) to reduce storage by 50%
  • Consider dimension reduction for text-embedding-3 models
  • Index embeddings with vector databases (PgVector, Pinecone, Weaviate)
  • Normalize vectors for faster cosine similarity computation
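Normalization pays off because, once vectors have unit length, cosine similarity reduces to a plain dot product. A minimal sketch in plain Java, with no library assumptions:

```java
public class VectorMath {

    // Scale the vector to unit length (L2 norm = 1)
    public static float[] normalize(float[] v) {
        double norm = 0;
        for (float x : v) norm += x * x;
        norm = Math.sqrt(norm);
        float[] out = new float[v.length];
        for (int i = 0; i < v.length; i++) out[i] = (float) (v[i] / norm);
        return out;
    }

    // For unit vectors, the dot product IS the cosine similarity
    public static double dot(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += a[i] * b[i];
        return sum;
    }
}
```

Normalize once at indexing time so every subsequent similarity computation skips the two square roots.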

Error Handling

Use MicroProfile Fault Tolerance for robust embedding operations:

@Retry(maxRetries = 3)
@Timeout(value = 30, unit = ChronoUnit.SECONDS)
@CircuitBreaker(requestVolumeThreshold = 10)
public Embedding embedSafely(String text) {
    return embeddingModel.embed(text).content();
}

Configuration vs Programmatic

Use configuration-based approach when:

  • Deploying to multiple environments (dev, staging, prod)
  • Managing multiple embedding models
  • Requiring runtime configuration changes
  • Following enterprise configuration management practices

Use programmatic approach when:

  • Building dynamic embedding model factories
  • Implementing tenant-specific models
  • Testing with different configurations
  • Creating embedding model pools

Similarity Thresholds

Choose appropriate similarity thresholds based on use case:

  • 0.95-1.0: Near-duplicates, exact matches
  • 0.85-0.95: Highly related content, paraphrases
  • 0.75-0.85: Semantically similar, same topic
  • 0.60-0.75: Related topics, loose semantic connection
  • Below 0.60: Potentially unrelated content
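These bands can be encoded as a small helper for logging or filtering decisions (the band names below are illustrative labels, not part of any API):

```java
public class SimilarityBands {

    // Map a cosine similarity score to the interpretation bands above
    public static String interpret(double score) {
        if (score >= 0.95) return "near-duplicate";
        if (score >= 0.85) return "highly related";
        if (score >= 0.75) return "semantically similar";
        if (score >= 0.60) return "loosely related";
        return "likely unrelated";
    }
}
```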

RAG Best Practices

For effective RAG systems:

  1. Chunk size: Keep text segments between 100 and 500 words
  2. Overlap: Use 10-20% overlap between chunks for context continuity
  3. Metadata: Store source, timestamp, and category with embeddings
  4. Reranking: Consider reranking top results before LLM generation
  5. Hybrid search: Combine embedding search with keyword search for best results
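The chunk-size and overlap guidance can be sketched as a word-based splitter. This is a simplification for illustration; production pipelines typically split on tokens or sentence boundaries (e.g. via LangChain4j's document splitters):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Chunker {

    // Split text into chunks of chunkSize words, carrying overlapWords
    // from the end of each chunk into the next for context continuity
    public static List<String> chunk(String text, int chunkSize, int overlapWords) {
        String[] words = text.split("\\s+");
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlapWords;
        for (int start = 0; start < words.length; start += step) {
            int end = Math.min(start + chunkSize, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
            if (end == words.length) break;
        }
        return chunks;
    }
}
```

For the recommended 10-20% overlap, pass roughly `chunkSize / 8` as `overlapWords`.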

Token Limits

OpenAI embedding models have token limits:

  • Maximum input length: 8,191 tokens per input text
  • For longer texts, split into chunks and embed each chunk separately
  • Average English word ≈ 1.3 tokens
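The ≈1.3 tokens-per-word heuristic gives a quick pre-flight check before calling the API. This is an approximation only; use a real tokenizer (e.g. a tiktoken-compatible library) when exact counts matter:

```java
public class TokenEstimate {

    // Rough heuristic: ~1.3 tokens per English word (see note above)
    public static int estimateTokens(String text) {
        String trimmed = text.trim();
        int words = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        return (int) Math.ceil(words * 1.3);
    }

    // 8,191 is the per-input token limit for OpenAI embedding models
    public static boolean fitsInOneInput(String text) {
        return estimateTokens(text) <= 8191;
    }
}
```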

Security Considerations

Protect sensitive data:

  • Never log full request/response payloads in production
  • Use environment variables or vault for API keys
  • Hash user identifiers before passing to user parameter
  • Consider encrypting embeddings at rest for sensitive content
  • Rotate API keys regularly
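Hashing user identifiers needs only JDK APIs. A minimal sketch using SHA-256 and hex encoding (`HexFormat` requires Java 17+); the resulting hash is what you would pass to the builder's user parameter:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

public class UserIdHasher {

    // Hash the raw identifier so no PII reaches the API; the hash is
    // stable, so abuse monitoring still sees one value per user
    public static String hashUserId(String rawId) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(rawId.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(hash);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```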

Related APIs

  • Chat Models: See chat-models.md for text generation capabilities
  • Configuration: See configuration.md for complete configuration reference
  • Embedding Stores: For persistent storage of embeddings (PgVector, Pinecone, etc.)
  • Document Loaders: For ingesting documents into RAG systems
  • AI Services: For declarative AI service definitions with @RegisterAiService

See Also

  • OpenAI Embeddings Guide
  • OpenAI Embeddings API Reference
  • LangChain4j Documentation
  • Quarkus LangChain4j Guide
  • RAG Architecture Patterns

Install with Tessl CLI

npx tessl i tessl/maven-io-quarkiverse-langchain4j--quarkus-langchain4j-openai@1.7.0
