OpenAI compatible model factory for the Embabel Agent Framework
Comprehensive guide to configuring the OpenAI-compatible model factory.
For OpenAI and most cloud providers:
val factory = OpenAiCompatibleModelFactory(
baseUrl = null,
apiKey = "sk-...", // Your API key
completionsPath = null,
embeddingsPath = null,
observationRegistry = observationRegistry
)

For local servers that don't require authentication:
val factory = OpenAiCompatibleModelFactory(
baseUrl = "http://localhost:8000",
apiKey = null, // No authentication
completionsPath = null,
embeddingsPath = null,
observationRegistry = observationRegistry
)

Reading configuration from environment variables:
val factory = OpenAiCompatibleModelFactory(
baseUrl = System.getenv("OPENAI_BASE_URL"), // Can be null
apiKey = System.getenv("OPENAI_API_KEY"), // Read from environment
completionsPath = null,
embeddingsPath = null,
observationRegistry = observationRegistry
)

Spring configuration example:
@Configuration
class LlmConfiguration {
@Bean
fun openAiModelFactory(
@Value("\${openai.api.key}") apiKey: String,
@Value("\${openai.base.url:#{null}}") baseUrl: String?,
observationRegistry: ObservationRegistry
): OpenAiCompatibleModelFactory {
return OpenAiCompatibleModelFactory(
baseUrl = baseUrl,
apiKey = apiKey,
completionsPath = null,
embeddingsPath = null,
observationRegistry = observationRegistry
)
}
}

The baseUrl parameter sets the API base URL:
// OpenAI (default)
val openAi = OpenAiCompatibleModelFactory(
baseUrl = null, // Uses https://api.openai.com
...
)
// Azure OpenAI
val azure = OpenAiCompatibleModelFactory(
baseUrl = "https://your-resource.openai.azure.com",
...
)
// Local LLM server
val local = OpenAiCompatibleModelFactory(
baseUrl = "http://localhost:11434",
...
)
// Custom cloud provider
val custom = OpenAiCompatibleModelFactory(
baseUrl = "https://api.custom-provider.com",
...
)

Override the default chat completions endpoint path:
val factory = OpenAiCompatibleModelFactory(
baseUrl = "https://custom-llm-api.com",
apiKey = "your-api-key",
completionsPath = "/api/v1/chat/completions", // Custom path
embeddingsPath = null,
observationRegistry = observationRegistry
)

The default chat completions path is /v1/chat/completions.

Override the default embeddings endpoint path:
val factory = OpenAiCompatibleModelFactory(
baseUrl = "https://custom-llm-api.com",
apiKey = "your-api-key",
completionsPath = null,
embeddingsPath = "/api/v1/embeddings", // Custom path
observationRegistry = observationRegistry
)

The default embeddings path is /v1/embeddings.

Using custom paths for both endpoints:
val factory = OpenAiCompatibleModelFactory(
baseUrl = "https://custom-provider.com",
apiKey = "custom-api-key",
completionsPath = "/custom/chat/endpoint",
embeddingsPath = "/custom/embeddings/endpoint",
observationRegistry = observationRegistry
)

The factory uses these default timeouts:
Provide a custom ClientHttpRequestFactory to override timeouts:
import org.springframework.http.client.SimpleClientHttpRequestFactory
import org.springframework.beans.factory.ObjectProvider
val customRequestFactory = SimpleClientHttpRequestFactory().apply {
setConnectTimeout(10000) // 10 seconds connect timeout
setReadTimeout(120000) // 2 minutes read timeout
}
val factory = OpenAiCompatibleModelFactory(
baseUrl = null,
apiKey = "your-api-key",
completionsPath = null,
embeddingsPath = null,
observationRegistry = observationRegistry,
requestFactory = ObjectProvider.of(customRequestFactory)
)

Connect timeout:
Read timeout:
Example - Short timeout for fast responses:
val fastFactory = SimpleClientHttpRequestFactory().apply {
setConnectTimeout(5000) // 5 seconds
setReadTimeout(60000) // 1 minute - expect fast responses
}
val factory = OpenAiCompatibleModelFactory(
baseUrl = null,
apiKey = "your-api-key",
completionsPath = null,
embeddingsPath = null,
observationRegistry = observationRegistry,
requestFactory = ObjectProvider.of(fastFactory)
)

The factory integrates with Micrometer for observability.
import io.micrometer.observation.ObservationRegistry
// Create observation registry
val observationRegistry = ObservationRegistry.create()
// Pass to factory
val factory = OpenAiCompatibleModelFactory(
baseUrl = null,
apiKey = "your-api-key",
completionsPath = null,
embeddingsPath = null,
observationRegistry = observationRegistry // Enables observability
)

The factory automatically instruments:
import io.micrometer.core.instrument.MeterRegistry
import io.micrometer.observation.ObservationRegistry
val observationRegistry = ObservationRegistry.create().apply {
observationConfig()
.observationHandler(DefaultMeterObservationHandler(meterRegistry))
}
val factory = OpenAiCompatibleModelFactory(
baseUrl = null,
apiKey = "your-api-key",
completionsPath = null,
embeddingsPath = null,
observationRegistry = observationRegistry
)

Adding distributed tracing support:
import io.micrometer.observation.ObservationRegistry
import io.micrometer.tracing.Tracer
val observationRegistry = ObservationRegistry.create().apply {
observationConfig()
.observationHandler(DefaultTracingObservationHandler(tracer))
}
val factory = OpenAiCompatibleModelFactory(
baseUrl = null,
apiKey = "your-api-key",
completionsPath = null,
embeddingsPath = null,
observationRegistry = observationRegistry
)

In Spring Boot, the ObservationRegistry is auto-configured:
@Configuration
class LlmConfiguration(
private val observationRegistry: ObservationRegistry // Auto-injected
) {
@Bean
fun openAiModelFactory(
@Value("\${openai.api.key}") apiKey: String
): OpenAiCompatibleModelFactory {
return OpenAiCompatibleModelFactory(
baseUrl = null,
apiKey = apiKey,
completionsPath = null,
embeddingsPath = null,
observationRegistry = observationRegistry // Uses Spring Boot's registry
)
}
}

Configure retry behavior for handling transient failures.
By default, the factory uses Spring AI's default retry template with reasonable retry policies.
import org.springframework.retry.support.RetryTemplate
import org.springframework.retry.backoff.ExponentialBackOffPolicy
import org.springframework.retry.policy.SimpleRetryPolicy
val retryTemplate = RetryTemplate().apply {
setBackOffPolicy(ExponentialBackOffPolicy().apply {
initialInterval = 1000 // Start with 1 second delay
multiplier = 2.0 // Double the delay each retry
maxInterval = 10000 // Cap at 10 seconds
})
setRetryPolicy(SimpleRetryPolicy(3)) // Retry up to 3 times
}
val llmService = factory.openAiCompatibleLlm(
model = "gpt-4",
pricingModel = PricingModel.usdPer1MTokens(30.0, 60.0),
provider = "OpenAI",
knowledgeCutoffDate = LocalDate.of(2023, 4, 1),
retryTemplate = retryTemplate // Custom retry configuration
)

Retrying only selected exception types:
import org.springframework.retry.policy.ExceptionClassifierRetryPolicy
import org.springframework.retry.policy.NeverRetryPolicy
import org.springframework.retry.policy.SimpleRetryPolicy
import org.springframework.web.client.HttpServerErrorException
import org.springframework.web.client.ResourceAccessException
val retryPolicy = ExceptionClassifierRetryPolicy().apply {
setExceptionClassifier { throwable ->
when (throwable) {
is HttpServerErrorException -> SimpleRetryPolicy(3) // Retry 5xx errors
is ResourceAccessException -> SimpleRetryPolicy(3) // Retry connection errors
else -> NeverRetryPolicy() // Don't retry others
}
}
}
val retryTemplate = RetryTemplate().apply {
setRetryPolicy(retryPolicy)
setBackOffPolicy(ExponentialBackOffPolicy().apply {
initialInterval = 1000
multiplier = 2.0
maxInterval = 10000
})
}
val llmService = factory.openAiCompatibleLlm(
model = "gpt-4",
pricingModel = PricingModel.usdPer1MTokens(30.0, 60.0),
provider = "OpenAI",
knowledgeCutoffDate = LocalDate.of(2023, 4, 1),
retryTemplate = retryTemplate
)

Disabling retries entirely:
import org.springframework.retry.support.RetryTemplate
import org.springframework.retry.policy.NeverRetryPolicy
val noRetryTemplate = RetryTemplate().apply {
setRetryPolicy(NeverRetryPolicy())
}
val llmService = factory.openAiCompatibleLlm(
model = "gpt-4",
pricingModel = PricingModel.usdPer1MTokens(30.0, 60.0),
provider = "OpenAI",
knowledgeCutoffDate = LocalDate.of(2023, 4, 1),
retryTemplate = noRetryTemplate // Disable retries
)

A more aggressive retry configuration:
val aggressiveRetryTemplate = RetryTemplate().apply {
setBackOffPolicy(ExponentialBackOffPolicy().apply {
initialInterval = 500 // Start with 500ms
multiplier = 1.5 // Slower exponential growth
maxInterval = 30000 // Cap at 30 seconds
})
setRetryPolicy(SimpleRetryPolicy(5)) // Retry up to 5 times
}
val llmService = factory.openAiCompatibleLlm(
model = "gpt-4",
pricingModel = PricingModel.usdPer1MTokens(30.0, 60.0),
provider = "OpenAI",
knowledgeCutoffDate = LocalDate.of(2023, 4, 1),
retryTemplate = aggressiveRetryTemplate
)

Configure pricing for cost tracking.
For cloud providers that charge per token:
val pricingModel = PricingModel.usdPer1MTokens(
usdPer1mInputTokens = 30.0, // $30 per 1 million input tokens
usdPer1mOutputTokens = 60.0 // $60 per 1 million output tokens
)
val service = factory.openAiCompatibleLlm(
model = "gpt-4",
pricingModel = pricingModel,
provider = "OpenAI",
knowledgeCutoffDate = LocalDate.of(2023, 4, 1)
)

Common OpenAI prices (as of 2024):
PricingModel.usdPer1MTokens(0.5, 1.5)
PricingModel.usdPer1MTokens(30.0, 60.0)
PricingModel.usdPer1MTokens(10.0, 30.0)
PricingModel.usdPer1MTokens(10.0, 30.0)

For free models or fixed-cost scenarios:
val service = factory.openAiCompatibleLlm(
model = "llama-3-70b",
pricingModel = PricingModel.ALL_YOU_CAN_EAT, // No per-token tracking
provider = "Ollama",
knowledgeCutoffDate = null
)

Use ALL_YOU_CAN_EAT for:
Putting it all together:
import com.embabel.agent.openai.OpenAiCompatibleModelFactory
import com.embabel.agent.openai.StandardOpenAiOptionsConverter
import com.embabel.common.ai.model.PricingModel
import io.micrometer.observation.ObservationRegistry
import org.springframework.http.client.SimpleClientHttpRequestFactory
import org.springframework.beans.factory.ObjectProvider
import org.springframework.retry.support.RetryTemplate
import org.springframework.retry.backoff.ExponentialBackOffPolicy
import org.springframework.retry.policy.SimpleRetryPolicy
import java.time.LocalDate
// Custom HTTP client
val requestFactory = SimpleClientHttpRequestFactory().apply {
setConnectTimeout(10000) // 10 seconds
setReadTimeout(300000) // 5 minutes
}
// Custom retry policy
val retryTemplate = RetryTemplate().apply {
setBackOffPolicy(ExponentialBackOffPolicy().apply {
initialInterval = 1000
multiplier = 2.0
maxInterval = 10000
})
setRetryPolicy(SimpleRetryPolicy(3))
}
// Create factory with custom configuration
val factory = OpenAiCompatibleModelFactory(
baseUrl = "https://api.openai.com",
apiKey = System.getenv("OPENAI_API_KEY"),
completionsPath = "/v1/chat/completions",
embeddingsPath = "/v1/embeddings",
observationRegistry = ObservationRegistry.create(),
requestFactory = ObjectProvider.of(requestFactory)
)
// Create service with custom options
val service = factory.openAiCompatibleLlm(
model = "gpt-4-turbo",
pricingModel = PricingModel.usdPer1MTokens(10.0, 30.0),
provider = "OpenAI",
knowledgeCutoffDate = LocalDate.of(2023, 12, 1),
optionsConverter = StandardOpenAiOptionsConverter,
retryTemplate = retryTemplate
)

Install with Tessl CLI
npx tessl i tessl/maven-com-embabel-agent--embabel-agent-openai@0.3.0