Spring AI utility library providing retry mechanisms for AI API interactions with comprehensive error handling and exception classification
This document covers common pitfalls and edge cases when using Spring AI Retry.
Problem: Retrying operations that aren't safe to repeat (e.g., charging a credit card, creating resources).
Bad Example:
// BAD: May charge multiple times!
retryTemplate.execute(context -> {
    return paymentService.charge(amount);
});

Solution: Only retry idempotent operations, or implement idempotency keys.
// GOOD: Use an idempotency key, generated once so every attempt sends the same key
String idempotencyKey = generateKey(request);
retryTemplate.execute(context -> {
    return paymentService.charge(amount, idempotencyKey);
});
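An idempotency key only protects against double charges if it is identical on every attempt. A minimal sketch, assuming the key is derived deterministically from fields that identify one logical payment; `IdempotencyKeys`, `keyFor`, and its parameters are hypothetical names for illustration, not part of Spring AI or any payment API:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: not part of Spring AI.
final class IdempotencyKeys {
    private IdempotencyKeys() {}

    /** Derives a stable key from the fields that identify one logical payment. */
    static String keyFor(String customerId, String orderId, long amountInCents) {
        String canonical = customerId + "|" + orderId + "|" + amountInCents;
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(canonical.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    public static void main(String[] args) {
        // The same inputs always produce the same key, so retries reuse it
        System.out.println(keyFor("cust-1", "order-9", 1999L));
    }
}
```

A random key (for example a UUID) works as well, provided it is created once outside the retry callback and reused for every attempt.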
// GOOD: Check if already processed
retryTemplate.execute(context -> {
    if (paymentService.isAlreadyProcessed(transactionId)) {
        return paymentService.getResult(transactionId);
    }
    return paymentService.charge(amount, transactionId);
});

Problem: Catching all exceptions and retrying, including non-transient ones.
Bad Example:
// BAD: Retrying everything manually defeats the purpose
try {
    result = retryTemplate.execute(context -> aiClient.call());
} catch (Exception e) {
    // Retrying again manually - wrong!
    result = aiClient.call();
}

Solution: Let the retry template handle retries, and distinguish exception types.
// GOOD: Let retry template handle it
try {
    result = retryTemplate.execute(context -> aiClient.call());
} catch (NonTransientAiException e) {
    // Handle permanent failures appropriately
    logger.error("Invalid request: {}", e.getMessage());
    throw new BadRequestException("Invalid request", e);
} catch (TransientAiException e) {
    // All retries exhausted
    logger.error("Service unavailable after retries");
    throw new ServiceUnavailableException("Service down", e);
}

Problem: Using blocking retry in reactive applications.
Bad Example:
// BAD: Blocks reactive thread!
Mono<String> result = Mono.fromCallable(() -> {
    return RetryUtils.DEFAULT_RETRY_TEMPLATE.execute(context -> {
        return aiClient.generate(prompt); // Blocking call!
    });
});

Solution: Use reactive retry mechanisms.
// GOOD: Use reactive retry
// (the filter assumes upstream errors have already been mapped to TransientAiException)
Mono<String> result = webClient
    .post()
    .uri("/chat")
    .bodyValue(request)
    .retrieve()
    .bodyToMono(String.class)
    .retryWhen(Retry.backoff(10, Duration.ofSeconds(2))
        .maxBackoff(Duration.ofMinutes(3))
        .filter(e -> e instanceof TransientAiException));

Problem: Classifying errors incorrectly (e.g., treating auth errors as transient).
Bad Example:
// BAD: Auth error will never succeed on retry!
if (response.getStatusCode().value() == 401) {
    throw new TransientAiException("Auth failed");
}

Solution: Classify each error by whether a retry can actually succeed.
// GOOD: Correct classification
if (response.getStatusCode().value() == 401) {
    throw new NonTransientAiException(
        "Invalid API key - check configuration");
}
if (response.getStatusCode().value() == 503) {
    throw new TransientAiException(
        "Service temporarily unavailable");
}

Problem: With a large multiplier and no sensible cap, exponential backoff quickly grows to impractical wait times.
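To make the growth concrete, here is a small plain-Java sketch of the usual exponential backoff rule: the first wait equals the initial interval, each later wait is multiplied, and every wait is capped at the maximum interval. `BackoffSchedule` and `intervals` are hypothetical names for illustration, not Spring AI or Spring Retry APIs.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
final class BackoffSchedule {
    private BackoffSchedule() {}

    /** Returns the wait before each retry: initial * multiplier^n, capped at maxInterval. */
    static List<Duration> intervals(Duration initial, double multiplier,
                                    Duration maxInterval, int retries) {
        List<Duration> waits = new ArrayList<>();
        long capMillis = maxInterval.toMillis();
        double nextMillis = initial.toMillis();
        for (int i = 0; i < retries; i++) {
            // Never wait longer than the cap
            long waitMillis = (long) Math.min(nextMillis, (double) capMillis);
            waits.add(Duration.ofMillis(waitMillis));
            nextMillis = waitMillis * multiplier;
        }
        return waits;
    }

    public static void main(String[] args) {
        // Same initial interval and multiplier, two different caps
        System.out.println(intervals(Duration.ofSeconds(1), 10, Duration.ofMinutes(3), 5));
        System.out.println(intervals(Duration.ofSeconds(1), 10, Duration.ofSeconds(60), 5));
    }
}
```

With an initial interval of 1s and a multiplier of 10, the third retry would already wait 100 seconds. A 60s cap turns the schedule into 1s, 10s, 60s, 60s, 60s.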
Bad Configuration:
# BAD: Aggressive multiplier and no explicit cap
spring.ai.retry.backoff:
  initial-interval: 1s
  multiplier: 10
  # max-interval not set - falls back to the 3-minute default, which may be longer
  # than callers can wait; a misconfigured larger value lets waits grow much further

Solution: Always set a reasonable max-interval.
# GOOD: Reasonable cap
spring.ai.retry.backoff:
  initial-interval: 1s
  multiplier: 10
  max-interval: 60s # Cap at 1 minute

Problem: Rate-limit responses may carry a Retry-After header that the client should respect.
Solution: Parse and use Retry-After header.
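The header value itself can be either a number of seconds or an HTTP date. As a standalone sketch of the parsing step in plain Java (`RetryAfterParser` is a hypothetical helper, not a Spring AI class, and it assumes the raw header value is available to the caller):

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

// Hypothetical helper for illustration only.
final class RetryAfterParser {
    private RetryAfterParser() {}

    /** Converts a Retry-After value (delay in seconds or HTTP date) into a wait duration. */
    static Optional<Duration> parse(String headerValue, Instant now) {
        if (headerValue == null || headerValue.trim().isEmpty()) {
            return Optional.empty();
        }
        String value = headerValue.trim();
        if (value.chars().allMatch(Character::isDigit)) {
            try {
                return Optional.of(Duration.ofSeconds(Long.parseLong(value)));
            } catch (NumberFormatException e) {
                return Optional.empty(); // digits only, but too large for a long
            }
        }
        try {
            Instant retryAt = ZonedDateTime
                .parse(value, DateTimeFormatter.RFC_1123_DATE_TIME)
                .toInstant();
            Duration wait = Duration.between(now, retryAt);
            // A date in the past means the caller may retry immediately
            return Optional.of(wait.isNegative() ? Duration.ZERO : wait);
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("30", Instant.now()));
    }
}
```

The listener that follows takes a simpler route and extracts a seconds value from the exception message.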
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.springframework.ai.retry.TransientAiException;
import org.springframework.retry.RetryCallback;
import org.springframework.retry.RetryContext;
import org.springframework.retry.RetryListener;

public class RateLimitAwareRetryListener implements RetryListener {
    // Matches "Retry-After: 30" inside the exception message
    private static final Pattern RETRY_AFTER = Pattern.compile("Retry-After:\\s*(\\d+)");

    @Override
    public <T, E extends Throwable> void onError(
            RetryContext context,
            RetryCallback<T, E> callback,
            Throwable throwable) {
        if (throwable instanceof TransientAiException) {
            String message = throwable.getMessage();
            // Parse Retry-After from the error message (the message may be null)
            if (message != null && message.contains("Retry-After:")) {
                int retryAfterSeconds = parseRetryAfter(message);
                if (retryAfterSeconds > 0) {
                    try {
                        // This wait is added on top of the template's own backoff
                        Thread.sleep(retryAfterSeconds * 1000L);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            }
        }
    }

    private int parseRetryAfter(String message) {
        Matcher matcher = RETRY_AFTER.matcher(message);
        if (matcher.find()) {
            return Integer.parseInt(matcher.group(1));
        }
        return 0;
    }
}

Problem: When the circuit breaker is open, requests should not be attempted at all.
Solution: Check circuit breaker state before attempting.
public String generate(String prompt) {
    return retryTemplate.execute(context -> {
        // Check circuit breaker state
        if (circuitBreaker.getState() == CircuitBreaker.State.OPEN) {
            // Fail immediately: a non-transient exception is not retried
            // (assuming the template retries only transient exceptions)
            throw new NonTransientAiException(
                "Circuit breaker open - service recovering");
        }
        // Only attempt if circuit is closed or half-open
        return aiClient.generate(prompt);
    });
}

Problem: Rate-limit and quota-exceeded responses may both return 429, but a quota error is permanent while a rate limit is temporary.
Solution: Parse response body to distinguish.
public String generate(String prompt) {
    return retryTemplate.execute(context -> {
        try {
            return aiClient.generate(prompt);
        } catch (HttpStatusCodeException e) {
            if (e.getStatusCode().value() == 429) {
                String body = e.getResponseBodyAsString();
                // Check if it's quota exceeded (non-transient)
                if (body.contains("quota_exceeded") ||
                        body.contains("monthly_limit")) {
                    throw new NonTransientAiException(
                        "API quota exceeded - upgrade plan", e);
                }
                // Otherwise it's a temporary rate limit (transient)
                throw new TransientAiException(
                    "Rate limit exceeded - will retry", e);
            }
            throw e;
        }
    });
}

Problem: The service returns 503 during scheduled maintenance, which may last hours.
Solution: Detect maintenance and fail fast.
public String generate(String prompt) {
    return retryTemplate.execute(context -> {
        try {
            return aiClient.generate(prompt);
        } catch (HttpStatusCodeException e) {
            if (e.getStatusCode().value() == 503) {
                String body = e.getResponseBodyAsString();
                // Check if it's scheduled maintenance
                if (body.contains("maintenance") ||
                        body.contains("scheduled_downtime")) {
                    throw new NonTransientAiException(
                        "Service in maintenance - check status page", e);
                }
                // Otherwise it's a temporary outage
                throw new TransientAiException(
                    "Service temporarily unavailable", e);
            }
            throw e;
        }
    });
}

Problem: The response body stream can only be read once, so reading it a second time in an error handler fails.
Solution: Use buffering or don't read body multiple times.
public class BufferedResponseErrorHandler implements ResponseErrorHandler {
    @Override
    public boolean hasError(ClientHttpResponse response) throws IOException {
        // Required by the interface: treat any 4xx or 5xx status as an error
        return response.getStatusCode().isError();
    }

    @Override
    public void handleError(ClientHttpResponse response) throws IOException {
        // Read the stream once into memory and reuse bodyString from here on
        byte[] body = response.getBody().readAllBytes();
        String bodyString = new String(body, StandardCharsets.UTF_8);
        int statusCode = response.getStatusCode().value();
        if (statusCode >= 400 && statusCode < 500) {
            throw new NonTransientAiException(
                "Client error (" + statusCode + "): " + bodyString);
        } else {
            throw new TransientAiException(
                "Server error (" + statusCode + "): " + bodyString);
        }
    }
}

Problem: Timeouts and refused connections are both network errors, but they may call for different retry strategies.
Solution: Classify network errors more specifically.
public String generate(String prompt) {
    return retryTemplate.execute(context -> {
        try {
            return aiClient.generate(prompt);
        } catch (SocketTimeoutException e) {
            // Timeout - service is slow but reachable
            throw new TransientAiException("Request timeout", e);
        } catch (ConnectException e) {
            // Connection refused - service is down
            // getRetryCount() is the number of attempts that have already failed
            if (context.getRetryCount() > 5) {
                // After more than 5 failed attempts, likely not coming back soon
                throw new NonTransientAiException(
                    "Service unreachable - check status", e);
            }
            throw new TransientAiException("Connection refused", e);
        }
    });
}

Problem: The service enforces a concurrent-request limit rather than a rate limit.
Solution: Use semaphore or queue before retry.
public class ThrottledAiService {
    private final Semaphore concurrencyLimiter;
    private final RetryTemplate retryTemplate;

    public ThrottledAiService(int maxConcurrentRequests) {
        this.concurrencyLimiter = new Semaphore(maxConcurrentRequests);
        this.retryTemplate = RetryUtils.DEFAULT_RETRY_TEMPLATE;
    }

    public String generate(String prompt) {
        try {
            // Acquire permit before attempting
            concurrencyLimiter.acquire();
            try {
                return retryTemplate.execute(context ->
                    aiClient.generate(prompt)
                );
            } finally {
                concurrencyLimiter.release();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("Interrupted", e);
        }
    }
}

Problem: A batch operation partially succeeds, and the items that succeeded should not be retried.
Solution: Track successful items and only retry failures.
public List<String> generateBatch(List<String> prompts) {
    Map<String, String> results = new ConcurrentHashMap<>();
    Set<String> remaining = ConcurrentHashMap.newKeySet();
    remaining.addAll(prompts);
    int attempt = 0;
    while (!remaining.isEmpty() && attempt < 5) {
        attempt++;
        Set<String> currentBatch = new HashSet<>(remaining);
        for (String prompt : currentBatch) {
            try {
                String result = aiClient.generate(prompt);
                results.put(prompt, result);
                remaining.remove(prompt);
            } catch (TransientAiException e) {
                logger.warn("Attempt {} failed for prompt: {}", attempt, prompt);
                // Will retry in next iteration
            } catch (NonTransientAiException e) {
                logger.error("Prompt failed validation: {}", prompt);
                remaining.remove(prompt); // Don't retry
            }
        }
        if (!remaining.isEmpty()) {
            try {
                Thread.sleep(2000L * attempt); // Backoff grows with each pass
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
    }
    // Prompts that never succeeded map to null in the returned list
    return prompts.stream()
        .map(results::get)
        .collect(Collectors.toList());
}

Problem: OAuth token expires while request is in flight.
Solution: Refresh token and retry.
public String generate(String prompt) {
    return retryTemplate.execute(context -> {
        try {
            return aiClient.generate(prompt);
        } catch (HttpStatusCodeException e) {
            if (e.getStatusCode().value() == 401) {
                String body = e.getResponseBodyAsString();
                // Check if it's token expiration (transient)
                if (body.contains("token_expired") ||
                        body.contains("expired_token")) {
                    // Refresh token
                    logger.info("Token expired, refreshing");
                    tokenService.refresh();
                    // Retry with new token
                    throw new TransientAiException(
                        "Token expired - refreshed", e);
                }
                // Otherwise it's invalid credentials (non-transient)
                throw new NonTransientAiException(
                    "Invalid credentials", e);
            }
            throw e;
        }
    });
}

Problem: Different error types call for different retry strategies.
Solution: Use multiple retry templates or custom retry policy.
public class MultiStrategyService {
    private final RetryTemplate fastRetry;
    private final RetryTemplate slowRetry;

    public MultiStrategyService() {
        // Fast retry for rate limits (fixed delay)
        this.fastRetry = RetryTemplate.builder()
            .maxAttempts(10)
            .fixedBackoff(10000) // 10 seconds
            .retryOn(RateLimitException.class)
            .build();
        // Slow retry for service outages (exponential)
        this.slowRetry = RetryTemplate.builder()
            .maxAttempts(5)
            .exponentialBackoff(
                Duration.ofSeconds(5),
                3.0,
                Duration.ofMinutes(5)
            )
            .retryOn(ServiceUnavailableException.class)
            .build();
    }

    public String generate(String prompt) {
        try {
            return slowRetry.execute(context -> {
                try {
                    return fastRetry.execute(innerContext -> {
                        return aiClient.generate(prompt);
                    });
                } catch (RateLimitException e) {
                    // Fast retry exhausted
                    throw new TransientAiException("Rate limited", e);
                }
            });
        } catch (ServiceUnavailableException e) {
            throw new TransientAiException("Service down", e);
        }
    }
}

@Test
void testNonTransientNotRetried() {
    AtomicInteger attempts = new AtomicInteger(0);
    assertThrows(NonTransientAiException.class, () -> {
        RetryUtils.SHORT_RETRY_TEMPLATE.execute(context -> {
            attempts.incrementAndGet();
            throw new NonTransientAiException("Invalid API key");
        });
    });
    // Should only attempt once
    assertEquals(1, attempts.get());
}

@Test
void testRetryExhaustion() {
    AtomicInteger attempts = new AtomicInteger(0);
    assertThrows(TransientAiException.class, () -> {
        RetryUtils.SHORT_RETRY_TEMPLATE.execute(context -> {
            attempts.incrementAndGet();
            throw new TransientAiException("Always fails");
        });
    });
    // Should attempt all 10 times
    assertEquals(10, attempts.get());
}

@Test
void testSuccessAfterRetries() {
    AtomicInteger attempts = new AtomicInteger(0);
    String result = RetryUtils.SHORT_RETRY_TEMPLATE.execute(context -> {
        int attempt = attempts.incrementAndGet();
        if (attempt < 3) {
            throw new TransientAiException("Temporary failure");
        }
        return "Success";
    });
    assertEquals("Success", result);
    assertEquals(3, attempts.get());
}

Install with Tessl CLI
npx tessl i tessl/maven-org-springframework-ai--spring-ai-retry@1.1.0