# Google Cloud Text-to-Speech API

## Overview

The Google Cloud Text-to-Speech API converts text into natural-sounding speech. It offers over 380 voices across more than 50 languages and variants, including both standard voices and higher-quality WaveNet neural voices.

**Key Features:**
- High-quality neural voices (WaveNet) and standard voices
- Real-time and streaming synthesis
- Long-form audio synthesis for extended content
- SSML (Speech Synthesis Markup Language) support
- Custom voice models and pronunciations
- Multiple audio formats and quality settings
- Async/await support for all operations

## Package Information

```api { .api }
# Installation
pip install google-cloud-texttospeech

# Package: google-cloud-texttospeech
# Version: 2.29.0
# Main Module: google.cloud.texttospeech
```

## Core Imports

### Basic Import
```api { .api }
from google.cloud import texttospeech

# Main client classes
client = texttospeech.TextToSpeechClient()
async_client = texttospeech.TextToSpeechAsyncClient()
```

### Version-Specific Imports
```api { .api }
# Stable API (v1)
from google.cloud import texttospeech_v1

# Beta API (v1beta1) - includes timepoint features
from google.cloud import texttospeech_v1beta1
```

### Complete Type Imports
```api { .api }
from google.cloud.texttospeech import (
    TextToSpeechClient,
    AudioConfig,
    AudioEncoding,
    SynthesisInput,
    VoiceSelectionParams,
    SsmlVoiceGender,
    SynthesizeSpeechRequest,
    SynthesizeSpeechResponse
)
```

## Basic Usage

### Simple Text-to-Speech Synthesis
```api { .api }
from google.cloud import texttospeech

# Initialize the client
client = texttospeech.TextToSpeechClient()

# Configure the synthesis input
synthesis_input = texttospeech.SynthesisInput(text="Hello, World!")

# Select voice parameters
voice = texttospeech.VoiceSelectionParams(
    language_code="en-US",
    ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL
)

# Configure audio output
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3
)

# Create synthesis request
request = texttospeech.SynthesizeSpeechRequest(
    input=synthesis_input,
    voice=voice,
    audio_config=audio_config
)

# Perform the text-to-speech synthesis
response = client.synthesize_speech(request=request)

# Save the synthesized audio to a file
with open("output.mp3", "wb") as out:
    out.write(response.audio_content)
print("Audio content written to file 'output.mp3'")
```

## Architecture

### Client Classes
The API provides four main client classes for different use cases:

1. **TextToSpeechClient** - Synchronous client for standard operations
2. **TextToSpeechAsyncClient** - Asynchronous client for async/await patterns
3. **TextToSpeechLongAudioSynthesizeClient** - Synchronous client for long-form audio
4. **TextToSpeechLongAudioSynthesizeAsyncClient** - Async client for long-form audio

### Core Components
- **Request Types**: Structured request objects for different operations
- **Response Types**: Structured response objects containing results
- **Configuration Classes**: Objects for configuring voice, audio, and synthesis parameters
- **Enums**: Constants for audio encodings, voice genders, and other options

## Capabilities

### Speech Synthesis Operations
Basic text-to-speech synthesis with support for plain text and SSML input.

```api { .api }
# Quick synthesis example
response = client.synthesize_speech(
    input=texttospeech.SynthesisInput(text="Convert this text to speech"),
    voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
    audio_config=texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.LINEAR16
    )
)
```

**See:** [Speech Synthesis](./speech-synthesis.md) for complete synthesis operations documentation.
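
The synthesizer also accepts SSML instead of plain text. Because SSML is XML, user-supplied text should be escaped before interpolation; a minimal sketch (the `build_ssml` helper and the 300 ms break are illustrative, not part of the library):

```python
from xml.sax.saxutils import escape

def build_ssml(text: str, pause_ms: int = 300) -> str:
    """Wrap raw text in a minimal SSML document, escaping XML special characters."""
    return f'<speak>{escape(text)}<break time="{pause_ms}ms"/></speak>'

ssml = build_ssml("Profit & loss <2024>")
print(ssml)
# <speak>Profit &amp; loss &lt;2024&gt;<break time="300ms"/></speak>
```

The resulting string would be passed as `texttospeech.SynthesisInput(ssml=ssml)` rather than through the `text` field.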

### Voice Management
List and select from available voices with filtering by language and characteristics.

```api { .api }
# List all available voices
voices_response = client.list_voices()
for voice in voices_response.voices:
    print(f"Voice: {voice.name}, Language: {voice.language_codes}")

# List voices for specific language
request = texttospeech.ListVoicesRequest(language_code="en-US")
response = client.list_voices(request=request)
```

**See:** [Voice Management](./voice-management.md) for voice discovery and selection.
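
Narrowing the listing down to one voice is plain client-side Python. A sketch (the `pick_voice` helper and the sample data are hypothetical; real entries come from `list_voices().voices`):

```python
def pick_voice(voices, language_code, name_substring=""):
    """Return the first voice name that supports the language and matches the substring."""
    for name, language_codes in voices:
        if language_code in language_codes and name_substring in name:
            return name
    return None

# Hypothetical (name, language_codes) pairs shaped like list_voices() entries
sample = [
    ("en-GB-Standard-A", ["en-GB"]),
    ("en-US-Standard-C", ["en-US"]),
    ("en-US-Wavenet-D", ["en-US"]),
]
print(pick_voice(sample, "en-US", "Wavenet"))
# en-US-Wavenet-D
```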

### Streaming Synthesis
Real-time bidirectional streaming for interactive applications.

```api { .api }
# Streaming synthesis configuration
config = texttospeech.StreamingSynthesizeConfig(
    voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
    audio_config=texttospeech.StreamingAudioConfig(
        audio_encoding=texttospeech.AudioEncoding.LINEAR16,
        sample_rate_hertz=22050
    )
)
```

**See:** [Streaming Synthesis](./streaming-synthesis.md) for real-time streaming operations.
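
Streaming requests typically carry short text chunks rather than whole documents. A sentence-boundary chunker is ordinary Python; this sketch (the helper name and the 80-character default are illustrative, not part of the API) shows one way to prepare input:

```python
import re

def chunk_text(text: str, max_len: int = 80) -> list[str]:
    """Split text on sentence boundaries into chunks of at most max_len characters."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Start a new chunk when appending this sentence would exceed the limit
        if current and len(current) + 1 + len(sentence) > max_len:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

print(chunk_text("First sentence. Second one! A third?", max_len=20))
# ['First sentence.', 'Second one! A third?']
```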

### Long Audio Synthesis
Generate extended audio content using long-running operations.

```api { .api }
from google.cloud.texttospeech_v1.services import text_to_speech_long_audio_synthesize

# Long audio client
long_client = text_to_speech_long_audio_synthesize.TextToSpeechLongAudioSynthesizeClient()

# Create long audio request
request = texttospeech.SynthesizeLongAudioRequest(
    parent="projects/your-project-id/locations/us-central1",
    input=texttospeech.SynthesisInput(text="Very long text content..."),
    audio_config=texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.LINEAR16
    ),
    voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
    output_gcs_uri="gs://your-bucket/output.wav"
)
```

**See:** [Long Audio Synthesis](./long-audio-synthesis.md) for extended audio operations.
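
The `parent` and `output_gcs_uri` fields are plain strings with a fixed shape, so they can be built and sanity-checked up front. A small sketch (both helper names are illustrative):

```python
def long_audio_parent(project_id: str, location: str = "us-central1") -> str:
    """Build the resource parent expected by SynthesizeLongAudioRequest."""
    return f"projects/{project_id}/locations/{location}"

def is_gcs_uri(uri: str) -> bool:
    """Cheap client-side check that the output target is a gs:// URI."""
    return uri.startswith("gs://") and len(uri) > len("gs://")

print(long_audio_parent("my-project"))
# projects/my-project/locations/us-central1
print(is_gcs_uri("gs://my-bucket/output.wav"))
# True
```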

### Configuration and Types
Comprehensive configuration options for voice selection, audio output, and advanced features.

```api { .api }
# Advanced voice configuration
advanced_voice = texttospeech.AdvancedVoiceOptions(
    low_latency_journey_synthesis=True
)

# Custom pronunciations
custom_pronunciations = texttospeech.CustomPronunciations(
    pronunciations=[
        texttospeech.CustomPronunciationParams(
            phrase="example",
            pronunciation="ɪɡˈzæmpəl",
            phonetic_encoding=texttospeech.CustomPronunciationParams.PhoneticEncoding.PHONETIC_ENCODING_IPA
        )
    ]
)
```

**See:** [Configuration Types](./configuration-types.md) for all configuration classes and options.

### Async Operations
Full async/await support for all Text-to-Speech operations.

```api { .api }
import asyncio
from google.cloud import texttospeech

async def synthesize_async():
    async_client = texttospeech.TextToSpeechAsyncClient()

    request = texttospeech.SynthesizeSpeechRequest(
        input=texttospeech.SynthesisInput(text="Async synthesis"),
        voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3
        )
    )

    response = await async_client.synthesize_speech(request=request)
    return response.audio_content

# Run async operation
audio_data = asyncio.run(synthesize_async())
```

**See:** [Async Clients](./async-clients.md) for asynchronous operation patterns.
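
The main benefit of the async client is overlapping several syntheses with `asyncio.gather`. The pattern, sketched with a stub coroutine standing in for `synthesize_speech` (the stub is hypothetical; a real call would hit the API):

```python
import asyncio

async def fake_synthesize(text: str) -> bytes:
    """Stand-in for an async synthesize_speech call; returns dummy audio bytes."""
    await asyncio.sleep(0)  # yield to the event loop, as a real RPC would
    return text.encode()

async def synthesize_many(texts):
    # gather schedules all requests concurrently instead of awaiting them one by one
    return await asyncio.gather(*(fake_synthesize(t) for t in texts))

results = asyncio.run(synthesize_many(["first", "second"]))
print(results)
# [b'first', b'second']
```

`gather` preserves input order in its result list, so each audio payload lines up with the text that produced it.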

## Audio Formats and Encodings

### Supported Audio Encodings
```api { .api }
# Available audio encoding formats
from google.cloud.texttospeech import AudioEncoding

LINEAR16 = AudioEncoding.LINEAR16  # 16-bit PCM with WAV header
MP3 = AudioEncoding.MP3            # MP3 at 32kbps
OGG_OPUS = AudioEncoding.OGG_OPUS  # Opus in Ogg container
MULAW = AudioEncoding.MULAW        # 8-bit G.711 PCMU/mu-law
ALAW = AudioEncoding.ALAW          # 8-bit G.711 PCMA/A-law
PCM = AudioEncoding.PCM            # 16-bit PCM without header
M4A = AudioEncoding.M4A            # M4A format
```
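
The practical difference between `LINEAR16` and `PCM` is the 44-byte WAV header. Headerless PCM can be wrapped for playback with the standard library; a sketch (the sample rate and width below are assumptions and must match your `AudioConfig`):

```python
import io
import wave

def wrap_pcm_in_wav(pcm: bytes, sample_rate: int = 24000,
                    channels: int = 1, sample_width: int = 2) -> bytes:
    """Prepend a WAV header to headerless 16-bit PCM audio."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()

wav_bytes = wrap_pcm_in_wav(b"\x00\x00" * 100)  # 100 silent 16-bit frames
print(wav_bytes[:4], len(wav_bytes))
# b'RIFF' 244
```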

## Error Handling

### Common Exception Patterns
```api { .api }
from google.api_core import exceptions
from google.cloud import texttospeech

try:
    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(request=request)
except exceptions.InvalidArgument as e:
    print(f"Invalid request parameters: {e}")
except exceptions.PermissionDenied as e:
    print(f"Permission denied: {e}")
except exceptions.ResourceExhausted as e:
    print(f"Quota exceeded: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
```
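
`ResourceExhausted` is usually transient, so quota errors are often worth retrying with backoff. `google.api_core` ships configurable `Retry` objects for this; the minimal hand-rolled sketch below just illustrates the pattern (all names are illustrative):

```python
import time

def with_retries(fn, attempts=3, base_delay=0.01):
    """Call fn, retrying failed attempts with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the error
            time.sleep(base_delay * 2 ** attempt)

calls = {"count": 0}

def flaky():
    """Fails twice, then succeeds - a stand-in for a rate-limited RPC."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient quota error")
    return "ok"

print(with_retries(flaky), calls["count"])
# ok 3
```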

## API Versions

### Stable API (v1)
- Core synthesis operations
- Standard voice and audio configuration
- Streaming synthesis
- Long audio synthesis

### Beta API (v1beta1)
- All v1 features
- Timepoint information for SSML marks
- Enhanced response metadata
- Advanced voice features

```api { .api }
# Using beta API for timepoint information
from google.cloud import texttospeech_v1beta1

client = texttospeech_v1beta1.TextToSpeechClient()

request = texttospeech_v1beta1.SynthesizeSpeechRequest(
    input=texttospeech_v1beta1.SynthesisInput(
        ssml='<speak>Hello <mark name="greeting"/> world!</speak>'
    ),
    voice=texttospeech_v1beta1.VoiceSelectionParams(language_code="en-US"),
    audio_config=texttospeech_v1beta1.AudioConfig(
        audio_encoding=texttospeech_v1beta1.AudioEncoding.LINEAR16
    ),
    enable_time_pointing=[
        texttospeech_v1beta1.SynthesizeSpeechRequest.TimepointType.SSML_MARK
    ]
)

response = client.synthesize_speech(request=request)
# Response includes timepoints field with timestamp information
```
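
Downstream code typically indexes the returned timepoints by mark name. A sketch over hypothetical values shaped like the `mark_name` / `time_seconds` fields of entries in `response.timepoints`:

```python
# Hypothetical values shaped like (timepoint.mark_name, timepoint.time_seconds)
timepoints = [("greeting", 0.38), ("end", 1.72)]

def mark_offsets(points):
    """Map each SSML mark name to its offset in seconds from the start of the audio."""
    return dict(points)

offsets = mark_offsets(timepoints)
print(offsets["greeting"])
# 0.38
```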