tessl/maven-org-springframework-ai--spring-ai-model

Core model interfaces and abstractions for the Spring AI framework, providing a portable API for chat, embeddings, images, audio, and tool calling across multiple AI providers.

Text-to-Speech

Name: tessl/maven-org-springframework-ai--spring-ai-model
Author: tessl

Text-to-speech capabilities for generating audio from text, supporting voice selection, speed control, audio format options, and both synchronous and streaming output.

Capabilities

TextToSpeechModel Interface

Main interface for converting text to speech audio.

public interface TextToSpeechModel extends Model<TextToSpeechPrompt, TextToSpeechResponse>, StreamingTextToSpeechModel {
    /**
     * Generate speech audio from text with default options.
     *
     * @param text the text to convert to speech
     * @return the audio bytes
     */
    byte[] call(String text);

    /**
     * Generate speech audio from a prompt.
     *
     * @param prompt the text-to-speech prompt
     * @return the text-to-speech response
     */
    TextToSpeechResponse call(TextToSpeechPrompt prompt);

    /**
     * Get the default options for this TTS model.
     *
     * @return the default text-to-speech options
     */
    TextToSpeechOptions getDefaultOptions();
}

StreamingTextToSpeechModel Interface

Interface for models that support streaming audio output.

public interface StreamingTextToSpeechModel extends StreamingModel<TextToSpeechPrompt, TextToSpeechResponse> {
    /**
     * Stream audio generation for the given prompt.
     *
     * @param prompt the text-to-speech prompt
     * @return Flux of streaming audio responses
     */
    Flux<TextToSpeechResponse> stream(TextToSpeechPrompt prompt);
}

TextToSpeechPrompt

Request for generating speech from text.

public class TextToSpeechPrompt implements ModelRequest<TextToSpeechMessage> {
    /**
     * Construct a TextToSpeechPrompt from simple text.
     *
     * @param text the text to convert to speech
     */
    public TextToSpeechPrompt(String text);

    /**
     * Construct a TextToSpeechPrompt from a message.
     *
     * @param message the text-to-speech message
     */
    public TextToSpeechPrompt(TextToSpeechMessage message);

    /**
     * Construct a TextToSpeechPrompt with message and options.
     *
     * @param message the text-to-speech message
     * @param options the text-to-speech options
     */
    public TextToSpeechPrompt(TextToSpeechMessage message, TextToSpeechOptions options);

    /**
     * Get the text-to-speech message.
     *
     * @return the message
     */
    TextToSpeechMessage getInstructions();

    /**
     * Get the text-to-speech options.
     *
     * @return the options
     */
    TextToSpeechOptions getOptions();
}

TextToSpeechResponse

Response containing generated speech audio and metadata.

public class TextToSpeechResponse implements ModelResponse<Speech> {
    /**
     * Construct a TextToSpeechResponse with a single speech result.
     *
     * @param speech the speech result
     */
    public TextToSpeechResponse(Speech speech);

    /**
     * Construct a TextToSpeechResponse with multiple speech results.
     *
     * @param speeches the list of speech results
     */
    public TextToSpeechResponse(List<Speech> speeches);

    /**
     * Construct a TextToSpeechResponse with speeches and metadata.
     *
     * @param speeches the list of speech results
     * @param metadata the response metadata
     */
    public TextToSpeechResponse(
        List<Speech> speeches,
        TextToSpeechResponseMetadata metadata
    );

    /**
     * Get the first speech result.
     *
     * @return the first speech
     */
    Speech getResult();

    /**
     * Get all speech results.
     *
     * @return list of speeches
     */
    List<Speech> getResults();

    /**
     * Get response metadata.
     *
     * @return the text-to-speech response metadata
     */
    TextToSpeechResponseMetadata getMetadata();
}

Speech

Single speech result containing audio bytes.

public class Speech implements ModelResult<byte[]> {
    /**
     * Construct a Speech with audio bytes.
     *
     * @param audio the audio data
     */
    public Speech(byte[] audio);

    /**
     * Construct a Speech with audio bytes and metadata.
     *
     * @param audio the audio data
     * @param metadata the result metadata
     */
    public Speech(byte[] audio, ResultMetadata metadata);

    /**
     * Get the audio bytes.
     *
     * @return the audio data as byte array
     */
    byte[] getOutput();

    /**
     * Get result metadata.
     *
     * @return the result metadata
     */
    ResultMetadata getMetadata();
}

TextToSpeechMessage

Message containing text to be converted to speech.

public class TextToSpeechMessage {
    /**
     * Construct a TextToSpeechMessage with text.
     *
     * @param text the text to convert to speech
     */
    public TextToSpeechMessage(String text);

    /**
     * Get the text content.
     *
     * @return the text
     */
    String getText();
}

TextToSpeechOptions Interface

Options for configuring text-to-speech generation.

public interface TextToSpeechOptions extends ModelOptions {
    /**
     * Get the model name to use.
     *
     * @return the model name
     */
    String getModel();

    /**
     * Get the voice identifier to use.
     * Voice options are provider-specific.
     *
     * @return the voice ID
     */
    String getVoice();

    /**
     * Get the speech speed multiplier.
     * 1.0 is normal speed; the valid range is provider-specific
     * (for example, 0.25 to 4.0 for OpenAI).
     *
     * @return the speed multiplier
     */
    Double getSpeed();

    /**
     * Get the audio output format.
     * Common formats: mp3, opus, aac, flac, wav, pcm
     *
     * @return the output format
     */
    String getFormat();

    /**
     * Create a new options builder.
     *
     * @return a new builder
     */
    static Builder builder();

    /**
     * Builder interface for constructing TextToSpeechOptions.
     */
    interface Builder {
        /**
         * Set the model name.
         *
         * @param model the model name
         * @return this builder
         */
        Builder model(String model);

        /**
         * Set the voice identifier.
         *
         * @param voice the voice identifier
         * @return this builder
         */
        Builder voice(String voice);

        /**
         * Set the output format.
         *
         * @param format the output format (e.g., "mp3", "wav")
         * @return this builder
         */
        Builder format(String format);

        /**
         * Set the speech speed.
         *
         * @param speed the speech speed
         * @return this builder
         */
        Builder speed(Double speed);

        /**
         * Build the TextToSpeechOptions.
         *
         * @return the constructed TextToSpeechOptions
         */
        TextToSpeechOptions build();
    }
}
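
The Javadoc above quotes a speed range of 0.25 to 4.0, with 1.0 as normal speed. It does not say whether a provider rejects or silently adjusts values outside that range, so callers may want to check the value before building options. The following is a minimal sketch under that assumption. `SpeedRange` is a hypothetical helper, not part of Spring AI, and the bounds are taken from the Javadoc above and may differ between providers.

```java
/**
 * Hypothetical helper (not part of Spring AI): checks a speed multiplier
 * against the 0.25 to 4.0 range quoted in the TextToSpeechOptions Javadoc.
 */
final class SpeedRange {

    // Assumed bounds; real limits are provider-specific.
    static final double MIN = 0.25;
    static final double MAX = 4.0;

    private SpeedRange() {
    }

    /** Returns the value unchanged if it is in range, otherwise throws. */
    static double require(double speed) {
        if (Double.isNaN(speed) || speed < MIN || speed > MAX) {
            throw new IllegalArgumentException(
                "speed must be between " + MIN + " and " + MAX + ", got " + speed);
        }
        return speed;
    }

    /** Pulls an out-of-range value back to the nearest bound. */
    static double clamp(double speed) {
        if (Double.isNaN(speed)) {
            return 1.0; // fall back to normal speed
        }
        return Math.max(MIN, Math.min(MAX, speed));
    }
}
```

A caller could then write `TextToSpeechOptions.builder().speed(SpeedRange.clamp(userSpeed))`, where `userSpeed` is whatever value the application received. Whether to clamp or reject is a design choice: clamping suits user-facing sliders, while rejecting surfaces configuration mistakes early.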

DefaultTextToSpeechOptions

Default implementation of TextToSpeechOptions.

public class DefaultTextToSpeechOptions implements TextToSpeechOptions {
    // Default implementation with standard TTS options
}

TextToSpeechResponseMetadata

Metadata for text-to-speech responses.

public class TextToSpeechResponseMetadata extends MutableResponseMetadata {
    // Response-level metadata for text-to-speech
}

Usage Examples

Simple Text-to-Speech

import org.springframework.ai.audio.tts.TextToSpeechModel;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import java.io.FileOutputStream;

@Service
public class TextToSpeechService {
    @Autowired
    private TextToSpeechModel ttsModel;

    public void generateSpeech(String text, String outputPath) throws Exception {
        // Generate audio with defaults
        byte[] audio = ttsModel.call(text);

        // Save to file
        try (FileOutputStream fos = new FileOutputStream(outputPath)) {
            fos.write(audio);
        }
    }
}

Text-to-Speech with Options

import org.springframework.ai.audio.tts.*;
import java.nio.file.Files;
import java.nio.file.Paths;

// Configure TTS options (voice and model names are provider-specific)
TextToSpeechOptions options = TextToSpeechOptions.builder()
    .voice("alloy")
    .speed(1.0)
    .format("mp3")
    .model("tts-1")
    .build();

// Create prompt from a message plus options
TextToSpeechPrompt prompt = new TextToSpeechPrompt(
    new TextToSpeechMessage("Hello, world!"), options);

// Generate audio
TextToSpeechResponse response = ttsModel.call(prompt);
byte[] audio = response.getResult().getOutput();

// Save audio
Files.write(Paths.get("output.mp3"), audio);

Using Different Voices

// Try different voices (these names are OpenAI voices; other providers differ)
List<String> voices = List.of("alloy", "echo", "fable", "onyx", "nova", "shimmer");

for (String voice : voices) {
    TextToSpeechOptions options = TextToSpeechOptions.builder()
        .voice(voice)
        .build();

    TextToSpeechPrompt prompt = new TextToSpeechPrompt(
        new TextToSpeechMessage("Hello!"), options);
    TextToSpeechResponse response = ttsModel.call(prompt);

    byte[] audio = response.getResult().getOutput();
    Files.write(Paths.get("hello_" + voice + ".mp3"), audio);
}

Speed Control

// Slower speech (0.5x speed)
TextToSpeechOptions slowOptions = TextToSpeechOptions.builder()
    .speed(0.5)
    .build();

// Normal speech (1.0x speed)
TextToSpeechOptions normalOptions = TextToSpeechOptions.builder()
    .speed(1.0)
    .build();

// Faster speech (2.0x speed)
TextToSpeechOptions fastOptions = TextToSpeechOptions.builder()
    .speed(2.0)
    .build();

TextToSpeechMessage message = new TextToSpeechMessage("This is a speed test.");
byte[] slowAudio = ttsModel.call(new TextToSpeechPrompt(message, slowOptions))
    .getResult().getOutput();
byte[] normalAudio = ttsModel.call(new TextToSpeechPrompt(message, normalOptions))
    .getResult().getOutput();
byte[] fastAudio = ttsModel.call(new TextToSpeechPrompt(message, fastOptions))
    .getResult().getOutput();

Different Audio Formats

// MP3 format (lossy compression, smaller file)
TextToSpeechOptions mp3Options = TextToSpeechOptions.builder()
    .format("mp3")
    .build();

// WAV format (uncompressed, larger file)
TextToSpeechOptions wavOptions = TextToSpeechOptions.builder()
    .format("wav")
    .build();

// OPUS format (efficient for streaming)
TextToSpeechOptions opusOptions = TextToSpeechOptions.builder()
    .format("opus")
    .build();

// FLAC format (lossless compression)
TextToSpeechOptions flacOptions = TextToSpeechOptions.builder()
    .format("flac")
    .build();

TextToSpeechMessage message = new TextToSpeechMessage("Audio format test");
byte[] mp3 = ttsModel.call(new TextToSpeechPrompt(message, mp3Options))
    .getResult().getOutput();
byte[] wav = ttsModel.call(new TextToSpeechPrompt(message, wavOptions))
    .getResult().getOutput();
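
The examples above select a format by name, but the file extension and any HTTP `Content-Type` still have to be kept in sync by the caller. One way to do that is a small lookup, sketched below. `AudioFormats` is a hypothetical helper, not part of Spring AI, and the media types are conventional choices listed as assumptions (in particular, `audio/ogg` assumes the provider returns Opus in an Ogg container).

```java
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical helper (not part of Spring AI): maps a format name such as
 * "mp3" to a media type and a file name.
 */
final class AudioFormats {

    // Assumed mappings; verify against what your provider actually returns.
    private static final Map<String, String> MEDIA_TYPES = Map.of(
        "mp3", "audio/mpeg",
        "wav", "audio/wav",
        "opus", "audio/ogg",
        "aac", "audio/aac",
        "flac", "audio/flac");

    private AudioFormats() {
    }

    /** Returns the media type, or a generic binary type for unknown formats. */
    static String mediaType(String format) {
        return MEDIA_TYPES.getOrDefault(normalize(format), "application/octet-stream");
    }

    /** Builds a file name whose extension matches the format. */
    static String fileName(String baseName, String format) {
        return baseName + "." + normalize(format);
    }

    private static String normalize(String format) {
        return format.trim().toLowerCase(Locale.ROOT);
    }
}
```

With this in place, `Files.write(Paths.get(AudioFormats.fileName("sample", "wav")), wav)` writes to `sample.wav`, and raw formats such as `pcm` fall through to `application/octet-stream`.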

Streaming Text-to-Speech

import reactor.core.publisher.Flux;
import java.io.ByteArrayOutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

// Stream audio generation
TextToSpeechPrompt prompt = new TextToSpeechPrompt("A long text to be streamed...");

Flux<TextToSpeechResponse> stream = ttsModel.stream(prompt);

// Collect audio chunks
ByteArrayOutputStream audioStream = new ByteArrayOutputStream();
stream.subscribe(
    chunk -> {
        byte[] audio = chunk.getResult().getOutput();
        audioStream.write(audio, 0, audio.length);
    },
    error -> System.err.println("Error: " + error.getMessage()),
    () -> {
        // Save complete audio
        try {
            Files.write(Paths.get("streamed.mp3"), audioStream.toByteArray());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
);

Generating Multiple Speeches

List<String> texts = List.of(
    "First sentence.",
    "Second sentence.",
    "Third sentence."
);

List<byte[]> audioFiles = new ArrayList<>();

for (String text : texts) {
    TextToSpeechPrompt prompt = new TextToSpeechPrompt(text);
    TextToSpeechResponse response = ttsModel.call(prompt);
    audioFiles.add(response.getResult().getOutput());
}

// Save each audio file
for (int i = 0; i < audioFiles.size(); i++) {
    Files.write(Paths.get("audio_" + i + ".mp3"), audioFiles.get(i));
}

Error Handling

public byte[] safeGenerateSpeech(String text) {
    try {
        TextToSpeechOptions options = TextToSpeechOptions.builder()
            .voice("alloy")
            .speed(1.0)
            .format("mp3")
            .build();

        TextToSpeechPrompt prompt = new TextToSpeechPrompt(
            new TextToSpeechMessage(text), options);
        TextToSpeechResponse response = ttsModel.call(prompt);

        return response.getResult().getOutput();
    } catch (Exception e) {
        System.err.println("TTS generation failed: " + e.getMessage());
        return null;
    }
}

REST API Example

@RestController
@RequestMapping("/api/tts")
public class TextToSpeechController {
    private final TextToSpeechModel ttsModel;

    public TextToSpeechController(TextToSpeechModel ttsModel) {
        this.ttsModel = ttsModel;
    }

    @PostMapping(value = "/generate", produces = "audio/mpeg")
    public ResponseEntity<byte[]> generateSpeech(@RequestBody TtsRequest request) {
        TextToSpeechOptions options = TextToSpeechOptions.builder()
            .voice(request.voice())
            .speed(request.speed())
            .format("mp3")
            .build();

        TextToSpeechPrompt prompt = new TextToSpeechPrompt(
            new TextToSpeechMessage(request.text()),
            options
        );

        TextToSpeechResponse response = ttsModel.call(prompt);
        byte[] audio = response.getResult().getOutput();

        return ResponseEntity.ok()
            .header("Content-Type", "audio/mpeg")
            .body(audio);
    }

    // speed is a Double to match Builder.speed(Double)
    record TtsRequest(String text, String voice, Double speed) {}
}

Long Text Chunking

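@Service
public class LongTextTtsService {
    private final TextToSpeechModel ttsModel;

    // Constructor injection; the final field must be initialized
    public LongTextTtsService(TextToSpeechModel ttsModel) {
        this.ttsModel = ttsModel;
    }

    public byte[] generateForLongText(String longText) throws Exception {
        // Split text into chunks (providers often have character limits)
        List<String> chunks = splitIntoChunks(longText, 4000);

        ByteArrayOutputStream combinedAudio = new ByteArrayOutputStream();

        for (String chunk : chunks) {
            byte[] audio = ttsModel.call(chunk);
            // Naive concatenation: it typically plays for MP3, but formats
            // with a file header (such as WAV) need proper merging
            combinedAudio.write(audio);
        }

        return combinedAudio.toByteArray();
    }

    private List<String> splitIntoChunks(String text, int chunkSize) {
        List<String> chunks = new ArrayList<>();
        // Split after each period, keeping the period with its sentence
        String[] sentences = text.split("(?<=\\.)\\s+");

        StringBuilder currentChunk = new StringBuilder();
        for (String sentence : sentences) {
            // Flush only a non-empty chunk, so an oversized first sentence
            // does not produce an empty chunk
            if (currentChunk.length() > 0
                    && currentChunk.length() + sentence.length() + 1 > chunkSize) {
                chunks.add(currentChunk.toString());
                currentChunk = new StringBuilder();
            }
            if (currentChunk.length() > 0) {
                currentChunk.append(' ');
            }
            currentChunk.append(sentence);
        }

        if (currentChunk.length() > 0) {
            chunks.add(currentChunk.toString());
        }

        return chunks;
    }
}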
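
The chunker above never splits inside a sentence, so a single sentence longer than the chunk size still produces an oversized chunk. If the provider enforces a hard character limit, such sentences need a fallback. The sketch below shows one possible fallback that breaks at the last space within the limit. `OversizedSentenceSplitter` and `hardWrap` are hypothetical names, not part of Spring AI, and the limit itself is whatever your provider documents.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Spring AI): breaks one long sentence
 * into pieces of at most {@code limit} characters.
 */
final class OversizedSentenceSplitter {

    private OversizedSentenceSplitter() {
    }

    static List<String> hardWrap(String sentence, int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        List<String> pieces = new ArrayList<>();
        String rest = sentence.trim();
        while (rest.length() > limit) {
            // Prefer to break at the last space at or before the limit
            int cut = rest.lastIndexOf(' ', limit);
            if (cut <= 0) {
                // No space available: cut mid-word. This counts UTF-16 units,
                // so it may split a surrogate pair in rare cases.
                cut = limit;
            }
            pieces.add(rest.substring(0, cut).trim());
            rest = rest.substring(cut).trim();
        }
        if (!rest.isEmpty()) {
            pieces.add(rest);
        }
        return pieces;
    }
}
```

A chunker could call `hardWrap(sentence, chunkSize)` for any sentence whose length exceeds `chunkSize` and treat each returned piece as its own sentence. Breaking mid-sentence can produce an audible seam, so this is best kept as a last resort.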

Accessing Metadata

TextToSpeechPrompt prompt = new TextToSpeechPrompt("Hello world");
TextToSpeechResponse response = ttsModel.call(prompt);

// Access speech
Speech speech = response.getResult();
byte[] audio = speech.getOutput();

// Access metadata
ResultMetadata metadata = speech.getMetadata();
TextToSpeechResponseMetadata responseMetadata = response.getMetadata();

Using Default Options

// Get model's default options
TextToSpeechOptions defaults = ttsModel.getDefaultOptions();

// Use defaults: no options are set on the prompt
TextToSpeechPrompt prompt = new TextToSpeechPrompt("Using defaults");
byte[] audio = ttsModel.call(prompt).getResult().getOutput();

Spring Configuration

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class TtsConfig {

    @Bean
    public TextToSpeechOptions defaultTtsOptions() {
        return TextToSpeechOptions.builder()
            .model("tts-1")
            .voice("alloy")
            .speed(1.0)
            .format("mp3")
            .build();
    }
}

Multilingual Speech Generation

Map<String, String> greetings = Map.of(
    "English", "Hello, how are you?",
    "Spanish", "Hola, ¿cómo estás?",
    "French", "Bonjour, comment allez-vous?",
    "German", "Hallo, wie geht es dir?",
    "Japanese", "こんにちは、お元気ですか？"
);

for (Map.Entry<String, String> entry : greetings.entrySet()) {
    String language = entry.getKey();
    String text = entry.getValue();

    byte[] audio = ttsModel.call(text);
    Files.write(
        Paths.get("greeting_" + language.toLowerCase() + ".mp3"),
        audio
    );
}

Install with Tessl CLI

npx tessl i tessl/maven-org-springframework-ai--spring-ai-model