The Spring AI Chat Client provides a fluent API for building LLM-powered applications, with support for advisors, streaming, structured outputs, and conversation memory.
The Spring AI Chat Client provides flexible response handling for both synchronous (blocking) and streaming (reactive) execution patterns. Responses can be accessed as raw text, converted to typed entities, or accessed as the full ChatResponse object.
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.ChatClientResponse;
import org.springframework.ai.chat.client.ResponseEntity;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.converter.StructuredOutputConverter;
import org.springframework.core.ParameterizedTypeReference;
import org.springframework.lang.Nullable;
import reactor.core.publisher.Flux;

Interface for processing synchronous (blocking) responses:

interface CallResponseSpec {
    @Nullable
    String content();

    @Nullable
    ChatResponse chatResponse();

    ChatClientResponse chatClientResponse();

    @Nullable
    <T> T entity(Class<T> type);

    @Nullable
    <T> T entity(ParameterizedTypeReference<T> type);

    @Nullable
    <T> T entity(StructuredOutputConverter<T> structuredOutputConverter);

    <T> ResponseEntity<ChatResponse, T> responseEntity(Class<T> type);

    <T> ResponseEntity<ChatResponse, T> responseEntity(
        ParameterizedTypeReference<T> type
    );

    <T> ResponseEntity<ChatResponse, T> responseEntity(
        StructuredOutputConverter<T> structuredOutputConverter
    );
}

Interface for processing streaming (reactive) responses:
interface StreamResponseSpec {
    Flux<String> content();
    Flux<ChatResponse> chatResponse();
    Flux<ChatClientResponse> chatClientResponse();
}

The simplest way to get the response is as a String:

String content();

Example:
String answer = chatClient
    .prompt("What is Spring Framework?")
    .call()
    .content();
System.out.println(answer);

Access the full ChatResponse object containing metadata, model information, and usage statistics:

ChatResponse chatResponse();

Example:
import org.springframework.ai.chat.metadata.ChatGenerationMetadata;
import org.springframework.ai.chat.metadata.Usage;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.model.Generation;

ChatResponse response = chatClient
    .prompt("Explain Java")
    .call()
    .chatResponse();

// Access the generated output and its per-generation metadata
Generation generation = response.getResult();
String content = generation.getOutput().getText();
ChatGenerationMetadata metadata = generation.getMetadata();
String finishReason = metadata.getFinishReason();

// Access model information and usage statistics
String model = response.getMetadata().getModel();
Usage usage = response.getMetadata().getUsage();
Integer promptTokens = usage.getPromptTokens();
Integer completionTokens = usage.getCompletionTokens();
Integer totalTokens = usage.getTotalTokens();

Convert the response to a typed Java object. The AI model's output is parsed as JSON and mapped to the specified class.
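To make the "mapped to the specified class" step concrete, here is a minimal, stdlib-only sketch of how already-parsed JSON fields can populate a Java record. It assumes the JSON text has already been parsed into a Map (Spring AI's own BeanOutputConverter uses Jackson for both parsing and mapping), and the toRecord helper is hypothetical, not part of Spring AI.

```java
import java.lang.reflect.RecordComponent;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class RecordMappingSketch {

    // Same shape as the Summary record in the entity(Class) example.
    record Summary(String title, String content, List<String> tags) {}

    // Hypothetical helper: calls the record's canonical constructor, taking
    // each argument from the map entry whose key matches the component name.
    static <R extends Record> R toRecord(Map<String, Object> fields, Class<R> type) {
        RecordComponent[] components = type.getRecordComponents();
        Class<?>[] paramTypes = Arrays.stream(components)
                .map(RecordComponent::getType)
                .toArray(Class<?>[]::new);
        Object[] args = Arrays.stream(components)
                .map(c -> fields.get(c.getName())) // missing keys become null
                .toArray();
        try {
            return type.getDeclaredConstructor(paramTypes).newInstance(args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot map fields to " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the model's JSON output after parsing
        Map<String, Object> parsed = Map.of(
                "title", "Spring AI",
                "content", "A fluent client for LLMs",
                "tags", List.of("java", "ai"));

        Summary summary = toRecord(parsed, Summary.class);
        System.out.println(summary.title() + " " + summary.tags()); // Spring AI [java, ai]
    }
}
```

Matching by component name mirrors how JSON object keys line up with record components. A real JSON mapper additionally handles type coercion, nested objects, and validation, which this sketch deliberately leaves out.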
With Class:
<T> T entity(Class<T> entityClass);

Example:

record Summary(String title, String content, List<String> tags) {}

Summary summary = chatClient
    .prompt("Summarize this article: " + article)
    .call()
    .entity(Summary.class);
System.out.println("Title: " + summary.title());
System.out.println("Tags: " + summary.tags());

With Generic Types:
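<T> T entity(ParameterizedTypeReference<T> entityTypeRef);

Use ParameterizedTypeReference for generic types such as List<T> or Map<K, V>. Because of type erasure, a Class object such as List.class cannot carry the element type, so entity(Class) cannot describe these targets.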
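The trailing {} in new ParameterizedTypeReference<...>() {} creates an anonymous subclass. This is the "super type token" technique: a subclass records its generic superclass in the class file, so the full type argument survives erasure and can be read back through reflection. A minimal stdlib-only sketch of the idea, using a hypothetical TypeToken class in place of Spring's ParameterizedTypeReference:

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;
import java.util.Map;

public class TypeTokenSketch {

    // Hypothetical minimal type token. Subclasses (usually anonymous) fix T,
    // and the constructor reads T back from the generic superclass.
    abstract static class TypeToken<T> {
        private final Type type;

        protected TypeToken() {
            Type superType = getClass().getGenericSuperclass();
            if (!(superType instanceof ParameterizedType parameterized)) {
                throw new IllegalStateException("TypeToken must be created with a type argument");
            }
            this.type = parameterized.getActualTypeArguments()[0];
        }

        Type getType() {
            return type;
        }
    }

    public static void main(String[] args) {
        // List.class only describes the raw type
        System.out.println(List.class); // interface java.util.List

        // The anonymous subclass keeps the full generic type
        Type listOfStrings = new TypeToken<List<String>>() {}.getType();
        System.out.println(listOfStrings); // java.util.List<java.lang.String>

        Type nested = new TypeToken<Map<String, List<Integer>>>() {}.getType();
        System.out.println(nested);
    }
}
```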
Example:
import java.util.List;
import org.springframework.core.ParameterizedTypeReference;

List<String> items = chatClient
    .prompt("List 5 programming languages")
    .call()
    .entity(new ParameterizedTypeReference<List<String>>() {});
items.forEach(System.out::println);

Complex Example:
record Task(String name, String priority) {}

Map<String, List<Task>> tasksByProject = chatClient
    .prompt("Organize these tasks by project: " + tasks)
    .call()
    .entity(new ParameterizedTypeReference<Map<String, List<Task>>>() {});

Get both the raw ChatResponse and the converted entity together:
<T> ResponseEntity<ChatResponse, T> responseEntity(Class<T> type);

<T> ResponseEntity<ChatResponse, T> responseEntity(
    ParameterizedTypeReference<T> type
);

<T> ResponseEntity<ChatResponse, T> responseEntity(
    StructuredOutputConverter<T> structuredOutputConverter
);

Example:
import org.springframework.ai.chat.client.ResponseEntity;

record Answer(String text) {}

ResponseEntity<ChatResponse, Answer> responseEntity = chatClient
    .prompt("What is 2+2?")
    .call()
    .responseEntity(Answer.class);

// Access both response and entity
ChatResponse chatResponse = responseEntity.getResponse();
Answer answer = responseEntity.getEntity();

// Get metadata from response
var usage = chatResponse.getMetadata().getUsage();
System.out.println("Tokens used: " + usage.getTotalTokens());

// Use the entity
System.out.println("Answer: " + answer.text());

Streaming responses use Project Reactor's Flux for reactive processing. Content is delivered incrementally as it's generated by the model.
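A plain-Java sketch of two properties worth keeping in mind when consuming a stream: chunks are handed over one at a time, and a cold source (HTTP-backed Fluxes such as the one returned by stream() typically behave this way) starts its work again for every new subscriber. SimulatedStream is a hypothetical stand-in, with a counter in place of real model calls; it is not Reactor's implementation.

```java
import java.util.List;
import java.util.function.Consumer;

public class ColdStreamSketch {

    // Hypothetical cold stream: nothing runs until subscribe(), and every
    // subscribe() starts a new simulated generation.
    static final class SimulatedStream {
        private final List<String> chunks;
        private int generations = 0;

        SimulatedStream(List<String> chunks) {
            this.chunks = chunks;
        }

        void subscribe(Consumer<String> onChunk) {
            generations++;           // one simulated model call per subscription
            chunks.forEach(onChunk); // chunks are delivered one at a time
        }

        int generations() {
            return generations;
        }
    }

    public static void main(String[] args) {
        SimulatedStream stream = new SimulatedStream(List.of("Once ", "upon ", "a time."));

        // Print and accumulate inside a single subscription
        StringBuilder full = new StringBuilder();
        stream.subscribe(chunk -> {
            System.out.print(chunk);
            full.append(chunk);
        });
        System.out.println();
        System.out.println("generations = " + stream.generations()); // generations = 1

        // A second subscriber re-runs the source
        stream.subscribe(chunk -> { });
        System.out.println("generations = " + stream.generations()); // generations = 2
    }
}
```

With Reactor, the usual ways to avoid a second request are to do everything in one pipeline (for example doOnNext(System.out::print) followed by collectList()) or to share a single upstream subscription with operators such as cache() or share().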
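Flux<String> content();

Example:

Flux<String> stream = chatClient
    .prompt("Tell me a long story")
    .stream()
    .content();

// Option 1: print chunks as they arrive
stream.subscribe(chunk -> System.out.print(chunk));

// Option 2: collect all chunks into one string (blocks until the stream completes).
// Each subscription typically triggers a separate model request, so choose one
// option rather than subscribing twice to the same Flux.
String complete = stream.collectList()
    .map(chunks -> String.join("", chunks))
    .block();

With Buffering: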
import java.time.Duration;

Flux<String> stream = chatClient
    .prompt("Write an essay")
    .stream()
    .content();

stream
    .buffer(Duration.ofMillis(100)) // Emit the chunks received in each 100 ms window as a List
    .subscribe(chunks -> {
        String buffered = String.join("", chunks);
        System.out.print(buffered);
    });

Stream ChatResponse objects as they arrive:
Flux<ChatResponse> chatResponse();

Example:

Flux<ChatResponse> stream = chatClient
    .prompt("Generate text")
    .stream()
    .chatResponse();

stream.subscribe(response -> {
    // Some chunks may carry no text, so guard against nulls
    if (response.getResult() != null) {
        String content = response.getResult().getOutput().getText();
        if (content != null) {
            System.out.print(content);
        }
    }
});

Stream ChatClientResponse objects, which contain both the ChatResponse and the context:
Flux<ChatClientResponse> chatClientResponse();

Example:

import java.util.Map;
import org.springframework.ai.chat.client.ChatClientResponse;

Flux<ChatClientResponse> stream = chatClient
    .prompt("Generate a report")
    .stream()
    .chatClientResponse();

stream.subscribe(clientResponse -> {
    ChatResponse response = clientResponse.chatResponse();
    if (response != null && response.getResult() != null) {
        String content = response.getResult().getOutput().getText();
        if (content != null) {
            System.out.print(content);
        }
    }
    // Access context for advisor-shared data
    Map<String, Object> context = clientResponse.context();
});

Note: For structured output from streaming, you need to aggregate the stream first and then parse the complete response. The streaming API does not support incremental entity parsing.
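A stdlib-only sketch of why aggregation has to come first: each streamed chunk is only a fragment of the JSON document, so conversion can run only on the joined text. extractTitle is a deliberately tiny, hypothetical stand-in for a real structured output converter; with Spring AI you would instead collect the streamed content (for example with collectList()) and pass the complete string to a converter.

```java
import java.util.List;
import java.util.function.Function;

public class AggregateThenConvert {

    // Hypothetical stand-in for a structured output converter. Like a real
    // JSON converter, it refuses input that is not a complete JSON object.
    static String extractTitle(String json) {
        String text = json.trim();
        if (!text.startsWith("{") || !text.endsWith("}")) {
            throw new IllegalArgumentException("Incomplete JSON: " + json);
        }
        String key = "\"title\":";
        int keyAt = text.indexOf(key);
        int open = keyAt < 0 ? -1 : text.indexOf('"', keyAt + key.length());
        int close = open < 0 ? -1 : text.indexOf('"', open + 1);
        if (close < 0) {
            throw new IllegalArgumentException("No title field: " + json);
        }
        return text.substring(open + 1, close);
    }

    // Join every chunk first, then run the converter exactly once.
    static <T> T aggregateThenConvert(List<String> chunks, Function<String, T> converter) {
        StringBuilder full = new StringBuilder();
        chunks.forEach(full::append);
        return converter.apply(full.toString());
    }

    public static void main(String[] args) {
        List<String> chunks = List.of("{\"title\": \"Spr", "ing AI\", ", "\"tags\": []}");

        try {
            extractTitle(chunks.get(0)); // a single chunk is only a fragment
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }

        String title = aggregateThenConvert(chunks, AggregateThenConvert::extractTitle);
        System.out.println(title); // Spring AI
    }
}
```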
The ChatClientMessageAggregator utility helps aggregate streaming responses into complete messages.
class ChatClientMessageAggregator {
    Flux<ChatClientResponse> aggregateChatClientResponse(
        Flux<ChatClientResponse> chatClientResponses,
        Consumer<ChatClientResponse> aggregationHandler
    );
}

Example:
import org.springframework.ai.chat.client.ChatClientMessageAggregator;
import org.springframework.ai.chat.client.ChatClientResponse;

// The public streaming API already exposes a Flux<ChatClientResponse>
Flux<ChatClientResponse> stream = chatClient
    .prompt("Generate a report")
    .stream()
    .chatClientResponse();

ChatClientMessageAggregator aggregator = new ChatClientMessageAggregator();
Flux<ChatClientResponse> aggregated = aggregator
    .aggregateChatClientResponse(
        stream,
        completeResponse -> {
            // Called when streaming completes
            System.out.println("Complete response: " + completeResponse);
        }
    );

aggregated.subscribe(response -> {
    // Process each chunk
});

Error Handling (Synchronous):

try {
    String response = chatClient
        .prompt("What is AI?")
        .call()
        .content();
} catch (Exception e) {
    System.err.println("Error: " + e.getMessage());
    // Handle error
}

Error Handling (Streaming):

Flux<String> stream = chatClient
    .prompt("Generate text")
    .stream()
    .content();

stream
    .doOnError(error -> {
        System.err.println("Stream error: " + error.getMessage());
    })
    .onErrorResume(error -> {
        // Provide fallback
        return Flux.just("Fallback response");
    })
    .subscribe(chunk -> System.out.print(chunk));

The ResponseEntity record wraps both the raw response and the converted entity:
record ResponseEntity<R, E>(
    @Nullable R response,
    @Nullable E entity
) {
    @Nullable
    R getResponse() { return this.response; }

    @Nullable
    E getEntity() { return this.entity; }
}

Type Parameters:

R - response type (typically ChatResponse)
E - entity type (your custom class)

Example:
record Data(String value) {}

ResponseEntity<ChatResponse, Data> entity = chatClient
    .prompt("Get data")
    .call()
    .responseEntity(Data.class);

// Access response metadata
ChatResponse response = entity.getResponse();
if (response != null) {
    var usage = response.getMetadata().getUsage();
    System.out.println("Tokens: " + usage.getTotalTokens());
}

// Access converted entity
Data data = entity.getEntity();
System.out.println("Value: " + data.value());

Structured Output Example:

record WeatherInfo(
    String location,
    double temperature,
    String conditions,
    List<String> forecast
) {}

WeatherInfo weather = chatClient
    .prompt("What's the weather in Paris?")
    .call()
    .entity(WeatherInfo.class);

System.out.println(weather.location() + ": " + weather.temperature() + "°C");
weather.forecast().forEach(day -> System.out.println("  - " + day));

Streaming Example:

Flux<String> story = chatClient
    .prompt("Write a short story about a robot")
    .stream()
    .content();

// Print in real time
story.subscribe(
    chunk -> System.out.print(chunk),
    error -> System.err.println("Error: " + error),
    () -> System.out.println("\n[Story complete]")
);

Response Entity Example:

record Analysis(String sentiment, double confidence, List<String> keywords) {}
ResponseEntity<ChatResponse, Analysis> result = chatClient
    .prompt("Analyze: " + text)
    .call()
    .responseEntity(Analysis.class);

// Log metrics
ChatResponse response = result.getResponse();
System.out.println("Tokens used: " +
    response.getMetadata().getUsage().getTotalTokens());

// Use analysis
Analysis analysis = result.getEntity();
System.out.println("Sentiment: " + analysis.sentiment() +
    " (confidence: " + analysis.confidence() + ")");

Collecting a Stream into a String:

import reactor.core.publisher.Mono;
Flux<String> stream = chatClient
    .prompt("Generate a report")
    .stream()
    .content();

// Collect all chunks into a single string
Mono<String> complete = stream
    .collect(StringBuilder::new, StringBuilder::append)
    .map(StringBuilder::toString);

String fullReport = complete.block();
System.out.println(fullReport);