# Google Cloud Text-to-Speech API

## Overview

The Google Cloud Text-to-Speech API provides advanced text-to-speech capabilities that convert text into natural-sounding speech. The API supports over 380 voices across more than 50 languages and variants, offering both standard and WaveNet neural voices for high-quality audio synthesis.

**Key Features:**
- High-quality neural voices (WaveNet) and standard voices
- Real-time and streaming synthesis
- Long-form audio synthesis for extended content
- SSML (Speech Synthesis Markup Language) support
- Custom voice models and pronunciation
- Multiple audio formats and quality settings
- Async/await support for all operations

## Package Information

```api { .api }
# Installation
pip install google-cloud-texttospeech

# Package: google-cloud-texttospeech
# Version: 2.29.0
# Main Module: google.cloud.texttospeech
```
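
To confirm which release is installed, a quick check (assuming the package exposes `__version__`, as recent releases do):

```api { .api }
from google.cloud import texttospeech

print(texttospeech.__version__)  # e.g. "2.29.0"
```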

## Core Imports

### Basic Import
```api { .api }
from google.cloud import texttospeech

# Main client classes
client = texttospeech.TextToSpeechClient()
async_client = texttospeech.TextToSpeechAsyncClient()
```

### Version-Specific Imports
```api { .api }
# Stable API (v1)
from google.cloud import texttospeech_v1

# Beta API (v1beta1) - includes timepoint features
from google.cloud import texttospeech_v1beta1
```

### Complete Type Imports
```api { .api }
from google.cloud.texttospeech import (
    TextToSpeechClient,
    AudioConfig,
    AudioEncoding,
    SynthesisInput,
    VoiceSelectionParams,
    SsmlVoiceGender,
    SynthesizeSpeechRequest,
    SynthesizeSpeechResponse
)
```

## Basic Usage

### Simple Text-to-Speech Synthesis
```api { .api }
from google.cloud import texttospeech

# Initialize the client
client = texttospeech.TextToSpeechClient()

# Configure the synthesis input
synthesis_input = texttospeech.SynthesisInput(text="Hello, World!")

# Select voice parameters
voice = texttospeech.VoiceSelectionParams(
    language_code="en-US",
    ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL
)

# Configure audio output
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3
)

# Create synthesis request
request = texttospeech.SynthesizeSpeechRequest(
    input=synthesis_input,
    voice=voice,
    audio_config=audio_config
)

# Perform the text-to-speech synthesis
response = client.synthesize_speech(request=request)

# Save the synthesized audio to a file
with open("output.mp3", "wb") as out:
    out.write(response.audio_content)
    print("Audio content written to file 'output.mp3'")
```
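
SSML input follows the same flow; only the `SynthesisInput` changes. A minimal sketch reusing the client, voice, and audio configuration defined above:

```api { .api }
# SSML input instead of plain text; markup must be wrapped in <speak>
ssml_input = texttospeech.SynthesisInput(
    ssml='<speak>Hello, <break time="300ms"/> World!</speak>'
)

ssml_response = client.synthesize_speech(
    request=texttospeech.SynthesizeSpeechRequest(
        input=ssml_input,
        voice=voice,
        audio_config=audio_config
    )
)
```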

## Architecture

### Client Classes
The API provides four main client classes for different use cases (a short instantiation sketch follows the list):

1. **TextToSpeechClient** - Synchronous client for standard operations
2. **TextToSpeechAsyncClient** - Asynchronous client for async/await patterns
3. **TextToSpeechLongAudioSynthesizeClient** - Synchronous client for long-form audio
4. **TextToSpeechLongAudioSynthesizeAsyncClient** - Async client for long-form audio
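
A quick instantiation sketch; it assumes all four classes are exported from the top-level `google.cloud.texttospeech` module (the long-audio clients can otherwise be imported from `google.cloud.texttospeech_v1.services`, as shown in the Long Audio Synthesis section below):

```api { .api }
from google.cloud import texttospeech

# Standard synthesis clients
sync_client = texttospeech.TextToSpeechClient()
async_client = texttospeech.TextToSpeechAsyncClient()

# Long-form audio clients (assumed top-level exports)
long_sync_client = texttospeech.TextToSpeechLongAudioSynthesizeClient()
long_async_client = texttospeech.TextToSpeechLongAudioSynthesizeAsyncClient()
```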

### Core Components
- **Request Types**: Structured request objects for different operations
- **Response Types**: Structured response objects containing results
- **Configuration Classes**: Objects for configuring voice, audio, and synthesis parameters
- **Enums**: Constants for audio encodings, voice genders, and other options

## Capabilities

### Speech Synthesis Operations
Basic text-to-speech synthesis with support for plain text and SSML input.

```api { .api }
# Quick synthesis example
response = client.synthesize_speech(
    input=texttospeech.SynthesisInput(text="Convert this text to speech"),
    voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
    audio_config=texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.LINEAR16
    )
)
```

**See:** [Speech Synthesis](./speech-synthesis.md) for complete synthesis operations documentation.

### Voice Management
List and select from available voices with filtering by language and characteristics.

```api { .api }
# List all available voices
voices_response = client.list_voices()
for voice in voices_response.voices:
    print(f"Voice: {voice.name}, Language: {voice.language_codes}")

# List voices for a specific language
request = texttospeech.ListVoicesRequest(language_code="en-US")
response = client.list_voices(request=request)
```
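
To use a specific voice from the listing, pass its `name` back through `VoiceSelectionParams`; a small sketch with an illustrative WaveNet voice name:

```api { .api }
# Select a specific voice by name (the name shown is illustrative)
wavenet_voice = texttospeech.VoiceSelectionParams(
    language_code="en-US",
    name="en-US-Wavenet-D"
)
```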

**See:** [Voice Management](./voice-management.md) for voice discovery and selection.

### Streaming Synthesis
Real-time bidirectional streaming for interactive applications.

```api { .api }
# Streaming synthesis configuration
config = texttospeech.StreamingSynthesizeConfig(
    voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
    audio_config=texttospeech.StreamingAudioConfig(
        audio_encoding=texttospeech.AudioEncoding.LINEAR16,
        sample_rate_hertz=22050
    )
)
```
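
The config above only describes the stream. A rough sketch of driving the bidirectional call, assuming the `streaming_synthesize` method with `StreamingSynthesizeRequest`/`StreamingSynthesisInput` and that the first request carries only the config (`handle_audio` is a placeholder):

```api { .api }
def request_stream():
    # First request: configuration only
    yield texttospeech.StreamingSynthesizeRequest(streaming_config=config)
    # Subsequent requests: text chunks to synthesize
    for chunk in ["Hello ", "streaming ", "world."]:
        yield texttospeech.StreamingSynthesizeRequest(
            input=texttospeech.StreamingSynthesisInput(text=chunk)
        )

for response in client.streaming_synthesize(requests=request_stream()):
    handle_audio(response.audio_content)  # raw audio bytes per response
```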

**See:** [Streaming Synthesis](./streaming-synthesis.md) for real-time streaming operations.

### Long Audio Synthesis
Generate extended audio content using long-running operations.

```api { .api }
from google.cloud.texttospeech_v1.services import text_to_speech_long_audio_synthesize

# Long audio client
long_client = text_to_speech_long_audio_synthesize.TextToSpeechLongAudioSynthesizeClient()

# Create long audio request
request = texttospeech.SynthesizeLongAudioRequest(
    parent="projects/your-project-id/locations/us-central1",
    input=texttospeech.SynthesisInput(text="Very long text content..."),
    audio_config=texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.LINEAR16
    ),
    voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
    output_gcs_uri="gs://your-bucket/output.wav"
)
```
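
The request only describes the job. A minimal sketch of starting the long-running operation and waiting for the audio to land in the GCS bucket (the timeout value is illustrative):

```api { .api }
# Start the long-running operation and block until it completes
operation = long_client.synthesize_long_audio(request=request)
operation.result(timeout=600)  # audio is written to output_gcs_uri
print("Long audio synthesis complete")
```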

**See:** [Long Audio Synthesis](./long-audio-synthesis.md) for extended audio operations.

### Configuration and Types
Comprehensive configuration options for voice selection, audio output, and advanced features.

```api { .api }
# Advanced voice configuration
advanced_voice = texttospeech.AdvancedVoiceOptions(
    low_latency_journey_synthesis=True
)

# Custom pronunciations
custom_pronunciations = texttospeech.CustomPronunciations(
    pronunciations=[
        texttospeech.CustomPronunciationParams(
            phrase="example",
            ipa="ɪɡˈzæmpəl",
            phonetic_encoding=texttospeech.CustomPronunciationParams.PhoneticEncoding.IPA
        )
    ]
)
```
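
Where these objects attach differs by API version; a hedged sketch, assuming `SynthesisInput` accepts `custom_pronunciations` and `SynthesizeSpeechRequest` accepts `advanced_voice_options` (verify against your installed version):

```api { .api }
# Assumed field placement -- check the version you are using
request = texttospeech.SynthesizeSpeechRequest(
    input=texttospeech.SynthesisInput(
        text="This is an example sentence.",
        custom_pronunciations=custom_pronunciations
    ),
    voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
    audio_config=texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    ),
    advanced_voice_options=advanced_voice
)
```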

**See:** [Configuration Types](./configuration-types.md) for all configuration classes and options.

### Async Operations
Full async/await support for all Text-to-Speech operations.

```api { .api }
import asyncio
from google.cloud import texttospeech

async def synthesize_async():
    async_client = texttospeech.TextToSpeechAsyncClient()

    request = texttospeech.SynthesizeSpeechRequest(
        input=texttospeech.SynthesisInput(text="Async synthesis"),
        voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3
        )
    )

    response = await async_client.synthesize_speech(request=request)
    return response.audio_content

# Run async operation
audio_data = asyncio.run(synthesize_async())
```
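
The async client pays off when several requests run concurrently. A small sketch that batches requests with `asyncio.gather` (the input texts are illustrative):

```api { .api }
async def synthesize_many(texts):
    async_client = texttospeech.TextToSpeechAsyncClient()
    voice = texttospeech.VoiceSelectionParams(language_code="en-US")
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )

    async def one(text):
        response = await async_client.synthesize_speech(
            request=texttospeech.SynthesizeSpeechRequest(
                input=texttospeech.SynthesisInput(text=text),
                voice=voice,
                audio_config=audio_config
            )
        )
        return response.audio_content

    # Run all requests concurrently and collect the audio payloads
    return await asyncio.gather(*(one(t) for t in texts))

clips = asyncio.run(synthesize_many(["First clip.", "Second clip."]))
```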

**See:** [Async Clients](./async-clients.md) for asynchronous operation patterns.

## Audio Formats and Encodings

### Supported Audio Encodings
```api { .api }
# Available audio encoding formats
from google.cloud.texttospeech import AudioEncoding

LINEAR16 = AudioEncoding.LINEAR16  # 16-bit PCM with WAV header
MP3 = AudioEncoding.MP3  # MP3 at 32kbps
OGG_OPUS = AudioEncoding.OGG_OPUS  # Opus in Ogg container
MULAW = AudioEncoding.MULAW  # 8-bit G.711 PCMU/mu-law
ALAW = AudioEncoding.ALAW  # 8-bit G.711 PCMA/A-law
PCM = AudioEncoding.PCM  # 16-bit PCM without header
M4A = AudioEncoding.M4A  # M4A format
```
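
The encoding is selected through `AudioConfig`, optionally together with a sample rate. A small sketch for telephony-style output, assuming the downstream system expects 8 kHz mu-law:

```api { .api }
# 8 kHz mu-law output, a common telephony format
telephony_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MULAW,
    sample_rate_hertz=8000
)
```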

## Error Handling

### Common Exception Patterns
```api { .api }
from google.api_core import exceptions
from google.cloud import texttospeech

try:
    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(request=request)
except exceptions.InvalidArgument as e:
    print(f"Invalid request parameters: {e}")
except exceptions.PermissionDenied as e:
    print(f"Permission denied: {e}")
except exceptions.ResourceExhausted as e:
    print(f"Quota exceeded: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
```
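
Transient errors such as `ResourceExhausted` can also be retried declaratively. A sketch using `google.api_core` retry support, with illustrative backoff values:

```api { .api }
from google.api_core import retry

# Retry quota and availability errors with exponential backoff
retry_policy = retry.Retry(
    predicate=retry.if_exception_type(
        exceptions.ResourceExhausted,
        exceptions.ServiceUnavailable,
    ),
    initial=1.0,    # first delay, in seconds
    maximum=30.0,   # cap on the delay
    multiplier=2.0,
    timeout=120.0,  # give up after two minutes
)

response = client.synthesize_speech(request=request, retry=retry_policy)
```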

## API Versions

### Stable API (v1)
- Core synthesis operations
- Standard voice and audio configuration
- Streaming synthesis
- Long audio synthesis

### Beta API (v1beta1)
- All v1 features
- Timepoint information for SSML marks
- Enhanced response metadata
- Advanced voice features

```api { .api }
# Using beta API for timepoint information
from google.cloud import texttospeech_v1beta1

client = texttospeech_v1beta1.TextToSpeechClient()

request = texttospeech_v1beta1.SynthesizeSpeechRequest(
    input=texttospeech_v1beta1.SynthesisInput(
        ssml='<speak>Hello <mark name="greeting"/> world!</speak>'
    ),
    voice=texttospeech_v1beta1.VoiceSelectionParams(language_code="en-US"),
    audio_config=texttospeech_v1beta1.AudioConfig(
        audio_encoding=texttospeech_v1beta1.AudioEncoding.LINEAR16
    ),
    enable_time_pointing=[
        texttospeech_v1beta1.SynthesizeSpeechRequest.TimepointType.SSML_MARK
    ]
)

response = client.synthesize_speech(request=request)
# Response includes timepoints field with timestamp information
```
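
Each returned timepoint carries the SSML mark name and its offset into the audio; a brief sketch of reading them (field names per the v1beta1 `Timepoint` message):

```api { .api }
for timepoint in response.timepoints:
    print(f"mark={timepoint.mark_name} at {timepoint.time_seconds:.3f}s")
```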