# Google Cloud Text-to-Speech API

## Overview

The Google Cloud Text-to-Speech API converts text into natural-sounding speech. It offers over 380 voices across more than 50 languages and variants, including both standard voices and higher-quality WaveNet neural voices.

**Key Features:**
- High-quality neural voices (WaveNet) and standard voices
- Real-time and streaming synthesis
- Long-form audio synthesis for extended content
- SSML (Speech Synthesis Markup Language) support
- Custom voice models and pronunciations
- Multiple audio formats and quality settings
- Async/await support for all operations

## Package Information

```api { .api }
# Installation
pip install google-cloud-texttospeech

# Package: google-cloud-texttospeech
# Version: 2.29.0
# Main Module: google.cloud.texttospeech
```

## Core Imports

### Basic Import
```api { .api }
from google.cloud import texttospeech

# Main client classes
client = texttospeech.TextToSpeechClient()
async_client = texttospeech.TextToSpeechAsyncClient()
```

### Version-Specific Imports
```api { .api }
# Stable API (v1)
from google.cloud import texttospeech_v1

# Beta API (v1beta1) - includes timepoint features
from google.cloud import texttospeech_v1beta1
```

### Complete Type Imports
```api { .api }
from google.cloud.texttospeech import (
    TextToSpeechClient,
    AudioConfig,
    AudioEncoding,
    SynthesisInput,
    VoiceSelectionParams,
    SsmlVoiceGender,
    SynthesizeSpeechRequest,
    SynthesizeSpeechResponse
)
```

## Basic Usage

### Simple Text-to-Speech Synthesis
```api { .api }
from google.cloud import texttospeech

# Initialize the client
client = texttospeech.TextToSpeechClient()

# Configure the synthesis input
synthesis_input = texttospeech.SynthesisInput(text="Hello, World!")

# Select voice parameters
voice = texttospeech.VoiceSelectionParams(
    language_code="en-US",
    ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL
)

# Configure audio output
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3
)

# Create synthesis request
request = texttospeech.SynthesizeSpeechRequest(
    input=synthesis_input,
    voice=voice,
    audio_config=audio_config
)

# Perform the text-to-speech synthesis
response = client.synthesize_speech(request=request)

# Save the synthesized audio to a file
with open("output.mp3", "wb") as out:
    out.write(response.audio_content)
print("Audio content written to file 'output.mp3'")
```

## Architecture

### Client Classes
The API provides four main client classes for different use cases:

1. **TextToSpeechClient** - Synchronous client for standard operations
2. **TextToSpeechAsyncClient** - Asynchronous client for async/await patterns
3. **TextToSpeechLongAudioSynthesizeClient** - Synchronous client for long-form audio
4. **TextToSpeechLongAudioSynthesizeAsyncClient** - Async client for long-form audio

### Core Components
- **Request Types**: Structured request objects for different operations
- **Response Types**: Structured response objects containing results
- **Configuration Classes**: Objects for configuring voice, audio, and synthesis parameters
- **Enums**: Constants for audio encodings, voice genders, and other options

## Capabilities

### Speech Synthesis Operations
Basic text-to-speech synthesis with support for plain text and SSML input.

```api { .api }
# Quick synthesis example
response = client.synthesize_speech(
    input=texttospeech.SynthesisInput(text="Convert this text to speech"),
    voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
    audio_config=texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.LINEAR16
    )
)
```

**See:** [Speech Synthesis](./speech-synthesis.md) for complete synthesis operations documentation.
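
The synthesizer also accepts SSML instead of plain text. Because SSML is XML, user-supplied text should be escaped before interpolation; a minimal sketch (the `build_ssml` helper and the 300 ms break are illustrative, not part of the library):

```python
from xml.sax.saxutils import escape

def build_ssml(text: str, pause_ms: int = 300) -> str:
    """Wrap raw text in a minimal SSML document, escaping XML special characters."""
    return f'<speak>{escape(text)}<break time="{pause_ms}ms"/></speak>'

ssml = build_ssml("Profit & loss <2024>")
print(ssml)
# <speak>Profit &amp; loss &lt;2024&gt;<break time="300ms"/></speak>
```

The resulting string would be passed as `texttospeech.SynthesisInput(ssml=ssml)` rather than through the `text` field.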

### Voice Management
List and select from available voices with filtering by language and characteristics.

```api { .api }
# List all available voices
voices_response = client.list_voices()
for voice in voices_response.voices:
    print(f"Voice: {voice.name}, Language: {voice.language_codes}")

# List voices for specific language
request = texttospeech.ListVoicesRequest(language_code="en-US")
response = client.list_voices(request=request)
```

**See:** [Voice Management](./voice-management.md) for voice discovery and selection.
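
Narrowing the listing down to one voice is plain client-side Python. A sketch (the `pick_voice` helper and the sample data are hypothetical; real entries come from `list_voices().voices`):

```python
def pick_voice(voices, language_code, name_substring=""):
    """Return the first voice name that supports the language and matches the substring."""
    for name, language_codes in voices:
        if language_code in language_codes and name_substring in name:
            return name
    return None

# Hypothetical (name, language_codes) pairs shaped like list_voices() entries
sample = [
    ("en-GB-Standard-A", ["en-GB"]),
    ("en-US-Standard-C", ["en-US"]),
    ("en-US-Wavenet-D", ["en-US"]),
]
print(pick_voice(sample, "en-US", "Wavenet"))
# en-US-Wavenet-D
```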

### Streaming Synthesis
Real-time bidirectional streaming for interactive applications.

```api { .api }
# Streaming synthesis configuration
config = texttospeech.StreamingSynthesizeConfig(
    voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
    audio_config=texttospeech.StreamingAudioConfig(
        audio_encoding=texttospeech.AudioEncoding.LINEAR16,
        sample_rate_hertz=22050
    )
)
```

**See:** [Streaming Synthesis](./streaming-synthesis.md) for real-time streaming operations.
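
Streaming requests typically carry short text chunks rather than whole documents. A sentence-boundary chunker is ordinary Python; this sketch (the helper name and the 80-character default are illustrative, not part of the API) shows one way to prepare input:

```python
import re

def chunk_text(text: str, max_len: int = 80) -> list[str]:
    """Split text on sentence boundaries into chunks of at most max_len characters."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Start a new chunk when appending this sentence would exceed the limit
        if current and len(current) + 1 + len(sentence) > max_len:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

print(chunk_text("First sentence. Second one! A third?", max_len=20))
# ['First sentence.', 'Second one! A third?']
```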

### Long Audio Synthesis
Generate extended audio content using long-running operations.

```api { .api }
from google.cloud.texttospeech_v1.services import text_to_speech_long_audio_synthesize

# Long audio client
long_client = text_to_speech_long_audio_synthesize.TextToSpeechLongAudioSynthesizeClient()

# Create long audio request
request = texttospeech.SynthesizeLongAudioRequest(
    parent="projects/your-project-id/locations/us-central1",
    input=texttospeech.SynthesisInput(text="Very long text content..."),
    audio_config=texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.LINEAR16
    ),
    voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
    output_gcs_uri="gs://your-bucket/output.wav"
)
```

**See:** [Long Audio Synthesis](./long-audio-synthesis.md) for extended audio operations.
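
The `parent` and `output_gcs_uri` fields are plain strings with a fixed shape, so they can be built and sanity-checked up front. A small sketch (both helper names are illustrative):

```python
def long_audio_parent(project_id: str, location: str = "us-central1") -> str:
    """Build the resource parent expected by SynthesizeLongAudioRequest."""
    return f"projects/{project_id}/locations/{location}"

def is_gcs_uri(uri: str) -> bool:
    """Cheap client-side check that the output target is a gs:// URI."""
    return uri.startswith("gs://") and len(uri) > len("gs://")

print(long_audio_parent("my-project"))
# projects/my-project/locations/us-central1
print(is_gcs_uri("gs://my-bucket/output.wav"))
# True
```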

### Configuration and Types
Comprehensive configuration options for voice selection, audio output, and advanced features.

```api { .api }
# Advanced voice configuration
advanced_voice = texttospeech.AdvancedVoiceOptions(
    low_latency_journey_synthesis=True
)

# Custom pronunciations
custom_pronunciations = texttospeech.CustomPronunciations(
    pronunciations=[
        texttospeech.CustomPronunciationParams(
            phrase="example",
            pronunciation="ɪɡˈzæmpəl",
            phonetic_encoding=texttospeech.CustomPronunciationParams.PhoneticEncoding.PHONETIC_ENCODING_IPA
        )
    ]
)
```

**See:** [Configuration Types](./configuration-types.md) for all configuration classes and options.

### Async Operations
Full async/await support for all Text-to-Speech operations.

```api { .api }
import asyncio
from google.cloud import texttospeech

async def synthesize_async():
    async_client = texttospeech.TextToSpeechAsyncClient()

    request = texttospeech.SynthesizeSpeechRequest(
        input=texttospeech.SynthesisInput(text="Async synthesis"),
        voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3
        )
    )

    response = await async_client.synthesize_speech(request=request)
    return response.audio_content

# Run async operation
audio_data = asyncio.run(synthesize_async())
```

**See:** [Async Clients](./async-clients.md) for asynchronous operation patterns.
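
The main benefit of the async client is overlapping several syntheses with `asyncio.gather`. The pattern, sketched with a stub coroutine standing in for `synthesize_speech` (the stub is hypothetical; a real call would hit the API):

```python
import asyncio

async def fake_synthesize(text: str) -> bytes:
    """Stand-in for an async synthesize_speech call; returns dummy audio bytes."""
    await asyncio.sleep(0)  # yield to the event loop, as a real RPC would
    return text.encode()

async def synthesize_many(texts):
    # gather schedules all requests concurrently instead of awaiting them one by one
    return await asyncio.gather(*(fake_synthesize(t) for t in texts))

results = asyncio.run(synthesize_many(["first", "second"]))
print(results)
# [b'first', b'second']
```

`gather` preserves input order in its result list, so each audio payload lines up with the text that produced it.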

## Audio Formats and Encodings

### Supported Audio Encodings
```api { .api }
# Available audio encoding formats
from google.cloud.texttospeech import AudioEncoding

LINEAR16 = AudioEncoding.LINEAR16  # 16-bit PCM with WAV header
MP3 = AudioEncoding.MP3            # MP3 at 32kbps
OGG_OPUS = AudioEncoding.OGG_OPUS  # Opus in Ogg container
MULAW = AudioEncoding.MULAW        # 8-bit G.711 PCMU/mu-law
ALAW = AudioEncoding.ALAW          # 8-bit G.711 PCMA/A-law
PCM = AudioEncoding.PCM            # 16-bit PCM without header
M4A = AudioEncoding.M4A            # M4A format
```
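
The practical difference between `LINEAR16` and `PCM` is the 44-byte WAV header. Headerless PCM can be wrapped for playback with the standard library; a sketch (the sample rate and width below are assumptions and must match your `AudioConfig`):

```python
import io
import wave

def wrap_pcm_in_wav(pcm: bytes, sample_rate: int = 24000,
                    channels: int = 1, sample_width: int = 2) -> bytes:
    """Prepend a WAV header to headerless 16-bit PCM audio."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()

wav_bytes = wrap_pcm_in_wav(b"\x00\x00" * 100)  # 100 silent 16-bit frames
print(wav_bytes[:4], len(wav_bytes))
# b'RIFF' 244
```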

## Error Handling

### Common Exception Patterns
```api { .api }
from google.api_core import exceptions
from google.cloud import texttospeech

try:
    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(request=request)
except exceptions.InvalidArgument as e:
    print(f"Invalid request parameters: {e}")
except exceptions.PermissionDenied as e:
    print(f"Permission denied: {e}")
except exceptions.ResourceExhausted as e:
    print(f"Quota exceeded: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
```
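
`ResourceExhausted` is usually transient, so quota errors are often worth retrying with backoff. `google.api_core` ships configurable `Retry` objects for this; the minimal hand-rolled sketch below just illustrates the pattern (all names are illustrative):

```python
import time

def with_retries(fn, attempts=3, base_delay=0.01):
    """Call fn, retrying failed attempts with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the error
            time.sleep(base_delay * 2 ** attempt)

calls = {"count": 0}

def flaky():
    """Fails twice, then succeeds - a stand-in for a rate-limited RPC."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient quota error")
    return "ok"

print(with_retries(flaky), calls["count"])
# ok 3
```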

## API Versions

### Stable API (v1)
- Core synthesis operations
- Standard voice and audio configuration
- Streaming synthesis
- Long audio synthesis

### Beta API (v1beta1)
- All v1 features
- Timepoint information for SSML marks
- Enhanced response metadata
- Advanced voice features

```api { .api }
# Using beta API for timepoint information
from google.cloud import texttospeech_v1beta1

client = texttospeech_v1beta1.TextToSpeechClient()

request = texttospeech_v1beta1.SynthesizeSpeechRequest(
    input=texttospeech_v1beta1.SynthesisInput(
        ssml='<speak>Hello <mark name="greeting"/> world!</speak>'
    ),
    voice=texttospeech_v1beta1.VoiceSelectionParams(language_code="en-US"),
    audio_config=texttospeech_v1beta1.AudioConfig(
        audio_encoding=texttospeech_v1beta1.AudioEncoding.LINEAR16
    ),
    enable_time_pointing=[
        texttospeech_v1beta1.SynthesizeSpeechRequest.TimepointType.SSML_MARK
    ]
)

response = client.synthesize_speech(request=request)
# Response includes timepoints field with timestamp information
```
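
Downstream code typically indexes the returned timepoints by mark name. A sketch over hypothetical values shaped like the `mark_name` / `time_seconds` fields of entries in `response.timepoints`:

```python
# Hypothetical values shaped like (timepoint.mark_name, timepoint.time_seconds)
timepoints = [("greeting", 0.38), ("end", 1.72)]

def mark_offsets(points):
    """Map each SSML mark name to its offset in seconds from the start of the audio."""
    return dict(points)

offsets = mark_offsets(timepoints)
print(offsets["greeting"])
# 0.38
```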