CtrlK
BlogDocsLog inGet started
Tessl Logo

tessl/pypi-google-cloud-texttospeech

Google Cloud Texttospeech API client library for converting text to speech with multiple voices and audio formats

Pending
Quality

Pending

Does it follow best practices?

Impact

Pending

No eval scenarios have been run

SecuritybySnyk

Pending

The risk profile of this skill

Overview
Eval results
Files

Google Cloud Text-to-Speech API

Overview

The Google Cloud Text-to-Speech API provides advanced text-to-speech capabilities that convert text into natural-sounding speech. The API supports over 380 voices across more than 50 languages and variants, offering both standard and WaveNet neural voices for high-quality audio synthesis.

Key Features:

  • High-quality neural voices (WaveNet) and standard voices
  • Real-time and streaming synthesis
  • Long-form audio synthesis for extended content
  • SSML (Speech Synthesis Markup Language) support
  • Custom voice models and pronunciation
  • Multiple audio formats and quality settings
  • Async/await support for all operations

Package Information

# Installation
pip install google-cloud-texttospeech

# Package: google-cloud-texttospeech
# Version: 2.29.0
# Main Module: google.cloud.texttospeech

Core Imports

Basic Import

from google.cloud import texttospeech

# Main client classes
client = texttospeech.TextToSpeechClient()
async_client = texttospeech.TextToSpeechAsyncClient()

Version-Specific Imports

# Stable API (v1)
from google.cloud import texttospeech_v1

# Beta API (v1beta1) - includes timepoint features
from google.cloud import texttospeech_v1beta1

Complete Type Imports

from google.cloud.texttospeech import (
    TextToSpeechClient,
    AudioConfig,
    AudioEncoding,
    SynthesisInput,
    VoiceSelectionParams,
    SsmlVoiceGender,
    SynthesizeSpeechRequest,
    SynthesizeSpeechResponse
)

Basic Usage

Simple Text-to-Speech Synthesis

from google.cloud import texttospeech

# Initialize the client
client = texttospeech.TextToSpeechClient()

# Configure the synthesis input
synthesis_input = texttospeech.SynthesisInput(text="Hello, World!")

# Select voice parameters
voice = texttospeech.VoiceSelectionParams(
    language_code="en-US",
    ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL
)

# Configure audio output
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3
)

# Create synthesis request
request = texttospeech.SynthesizeSpeechRequest(
    input=synthesis_input,
    voice=voice,
    audio_config=audio_config
)

# Perform the text-to-speech synthesis
response = client.synthesize_speech(request=request)

# Save the synthesized audio to a file
with open("output.mp3", "wb") as out:
    out.write(response.audio_content)
    print("Audio content written to file 'output.mp3'")

Architecture

Client Classes

The API provides four main client classes for different use cases:

  1. TextToSpeechClient - Synchronous client for standard operations
  2. TextToSpeechAsyncClient - Asynchronous client for async/await patterns
  3. TextToSpeechLongAudioSynthesizeClient - Synchronous client for long-form audio
  4. TextToSpeechLongAudioSynthesizeAsyncClient - Async client for long-form audio

Core Components

  • Request Types: Structured request objects for different operations
  • Response Types: Structured response objects containing results
  • Configuration Classes: Objects for configuring voice, audio, and synthesis parameters
  • Enums: Constants for audio encodings, voice genders, and other options

Capabilities

Speech Synthesis Operations

Basic text-to-speech synthesis with support for plain text and SSML input.

# Quick synthesis example
response = client.synthesize_speech(
    input=texttospeech.SynthesisInput(text="Convert this text to speech"),
    voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
    audio_config=texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.LINEAR16
    )
)

See: Speech Synthesis for complete synthesis operations documentation.

Voice Management

List and select from available voices with filtering by language and characteristics.

# List all available voices
voices_response = client.list_voices()
for voice in voices_response.voices:
    print(f"Voice: {voice.name}, Language: {voice.language_codes}")

# List voices for specific language
request = texttospeech.ListVoicesRequest(language_code="en-US")
response = client.list_voices(request=request)

See: Voice Management for voice discovery and selection.

Streaming Synthesis

Real-time bidirectional streaming for interactive applications.

# Streaming synthesis configuration
config = texttospeech.StreamingSynthesizeConfig(
    voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
    audio_config=texttospeech.StreamingAudioConfig(
        audio_encoding=texttospeech.AudioEncoding.LINEAR16,
        sample_rate_hertz=22050
    )
)

See: Streaming Synthesis for real-time streaming operations.

Long Audio Synthesis

Generate extended audio content using long-running operations.

from google.cloud.texttospeech_v1.services import text_to_speech_long_audio_synthesize

# Long audio client
long_client = text_to_speech_long_audio_synthesize.TextToSpeechLongAudioSynthesizeClient()

# Create long audio request
request = texttospeech.SynthesizeLongAudioRequest(
    parent="projects/your-project-id/locations/us-central1",
    input=texttospeech.SynthesisInput(text="Very long text content..."),
    audio_config=texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.LINEAR16
    ),
    voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
    output_gcs_uri="gs://your-bucket/output.wav"
)

See: Long Audio Synthesis for extended audio operations.

Configuration and Types

Comprehensive configuration options for voice selection, audio output, and advanced features.

# Advanced voice configuration
advanced_voice = texttospeech.AdvancedVoiceOptions(
    low_latency_journey_synthesis=True
)

# Custom pronunciations
custom_pronunciations = texttospeech.CustomPronunciations(
    pronunciations=[
        texttospeech.CustomPronunciationParams(
            phrase="example",
            ipa="ɪɡˈzæmpəl",
            phonetic_encoding=texttospeech.CustomPronunciationParams.PhoneticEncoding.IPA
        )
    ]
)

See: Configuration Types for all configuration classes and options.

Async Operations

Full async/await support for all Text-to-Speech operations.

import asyncio
from google.cloud import texttospeech

async def synthesize_async():
    async_client = texttospeech.TextToSpeechAsyncClient()
    
    request = texttospeech.SynthesizeSpeechRequest(
        input=texttospeech.SynthesisInput(text="Async synthesis"),
        voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3
        )
    )
    
    response = await async_client.synthesize_speech(request=request)
    return response.audio_content

# Run async operation
audio_data = asyncio.run(synthesize_async())

See: Async Clients for asynchronous operation patterns.

Audio Formats and Encodings

Supported Audio Encodings

# Available audio encoding formats
from google.cloud.texttospeech import AudioEncoding

LINEAR16 = AudioEncoding.LINEAR16        # 16-bit PCM with WAV header
MP3 = AudioEncoding.MP3                  # MP3 at 32kbps
OGG_OPUS = AudioEncoding.OGG_OPUS        # Opus in Ogg container
MULAW = AudioEncoding.MULAW              # 8-bit G.711 PCMU/mu-law
ALAW = AudioEncoding.ALAW                # 8-bit G.711 PCMU/A-law
PCM = AudioEncoding.PCM                  # 16-bit PCM without header
M4A = AudioEncoding.M4A                  # M4A format

Error Handling

Common Exception Patterns

from google.api_core import exceptions
from google.cloud import texttospeech

try:
    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(request=request)
except exceptions.InvalidArgument as e:
    print(f"Invalid request parameters: {e}")
except exceptions.PermissionDenied as e:
    print(f"Permission denied: {e}")
except exceptions.ResourceExhausted as e:
    print(f"Quota exceeded: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")

API Versions

Stable API (v1)

  • Core synthesis operations
  • Standard voice and audio configuration
  • Streaming synthesis
  • Long audio synthesis

Beta API (v1beta1)

  • All v1 features
  • Timepoint information for SSML marks
  • Enhanced response metadata
  • Advanced voice features
# Using beta API for timepoint information
from google.cloud import texttospeech_v1beta1

client = texttospeech_v1beta1.TextToSpeechClient()

request = texttospeech_v1beta1.SynthesizeSpeechRequest(
    input=texttospeech_v1beta1.SynthesisInput(
        ssml='<speak>Hello <mark name="greeting"/> world!</speak>'
    ),
    voice=texttospeech_v1beta1.VoiceSelectionParams(language_code="en-US"),
    audio_config=texttospeech_v1beta1.AudioConfig(
        audio_encoding=texttospeech_v1beta1.AudioEncoding.LINEAR16
    ),
    enable_time_pointing=[
        texttospeech_v1beta1.SynthesizeSpeechRequest.TimepointType.SSML_MARK
    ]
)

response = client.synthesize_speech(request=request)
# Response includes timepoints field with timestamp information
Workspace
tessl
Visibility
Public
Created
Last updated
Describes
pypipkg:pypi/google-cloud-texttospeech@2.29.x
Publish Source
CLI
Badge
tessl/pypi-google-cloud-texttospeech badge