# Google Cloud Speech

Google Cloud Speech API client library providing advanced speech-to-text conversion capabilities. This package offers real-time streaming recognition, batch processing, and custom speech adaptation, serving as Python's interface to Google's industry-leading speech recognition technology.

## Package Information

- **Package Name**: google-cloud-speech
- **Language**: Python
- **Installation**: `pip install google-cloud-speech`
- **Minimum Python Version**: 3.7+

## Core Imports

Default import (uses v1 API):

```python
from google.cloud import speech
```

Version-specific imports:

```python
from google.cloud import speech_v1  # Stable API
from google.cloud import speech_v1p1beta1  # Beta features
from google.cloud import speech_v2  # Next-generation API
```

Common client initialization:

```python
from google.cloud import speech

# Initialize the speech client
client = speech.SpeechClient()
```

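By default the client authenticates with Application Default Credentials. As a minimal sketch of an alternative (the key file path is a placeholder), the generated clients also expose a `from_service_account_file` constructor:

```python
from google.cloud import speech

# Build a client from an explicit service account key file
# instead of Application Default Credentials.
client = speech.SpeechClient.from_service_account_file("service-account.json")
```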
## Basic Usage

```python
from google.cloud import speech
import io

# Initialize the client
client = speech.SpeechClient()

# Load audio file
with io.open("audio_file.wav", "rb") as audio_file:
    content = audio_file.read()

# Configure recognition
audio = speech.RecognitionAudio(content=content)
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
)

# Perform speech recognition
response = client.recognize(config=config, audio=audio)

# Process results
for result in response.results:
    print(f"Transcript: {result.alternatives[0].transcript}")
    print(f"Confidence: {result.alternatives[0].confidence}")
```

## Architecture

The Google Cloud Speech API provides three main API versions:

- **v1 (Stable)**: Core speech recognition functionality with synchronous, asynchronous, and streaming recognition
- **v1p1beta1 (Beta)**: Extended v1 features with experimental capabilities
- **v2 (Next-Generation)**: Advanced features including recognizer management, batch processing, and enhanced output formats

### Client Structure

- **SpeechClient**: Primary client for speech recognition operations
- **AdaptationClient**: Manages custom speech adaptation resources (phrase sets, custom classes)
- **SpeechHelpers**: Simplified interfaces for complex operations like streaming (mixed into SpeechClient)
- **AsyncClients**: Asynchronous versions of all clients for non-blocking operations

### Recognition Modes

- **Synchronous**: Real-time recognition for short audio (< 1 minute)
- **Asynchronous**: Long-running recognition for longer audio files (see the sketch after this list)
- **Streaming**: Real-time bidirectional streaming for live audio

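For the asynchronous mode, longer audio is usually referenced by a Cloud Storage URI rather than sent inline. A minimal sketch, assuming a LINEAR16 WAV file already uploaded to a bucket (the `gs://` URI is a placeholder):

```python
from google.cloud import speech

client = speech.SpeechClient()

# Reference longer audio by Cloud Storage URI instead of inline bytes
audio = speech.RecognitionAudio(uri="gs://my-bucket/long_audio.wav")
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
)

# Asynchronous recognition returns a long-running operation
operation = client.long_running_recognize(config=config, audio=audio)
response = operation.result(timeout=600)

for result in response.results:
    print(result.alternatives[0].transcript)
```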
## Capabilities

### Speech Recognition

Core speech-to-text functionality supporting synchronous, asynchronous, and streaming recognition modes with extensive configuration options.

```python { .api }
class SpeechClient:
    def recognize(
        self,
        config: RecognitionConfig,
        audio: RecognitionAudio,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> RecognizeResponse: ...

    def long_running_recognize(
        self,
        config: RecognitionConfig,
        audio: RecognitionAudio,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> Operation: ...
```

[Speech Recognition](./speech-recognition.md)

### Streaming Recognition

Real-time bidirectional streaming speech recognition for live audio processing with immediate results.

```python { .api }
class SpeechClient:
    def streaming_recognize(
        self,
        requests: Iterator[StreamingRecognizeRequest],
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> Iterator[StreamingRecognizeResponse]: ...
```

[Streaming Recognition](./streaming-recognition.md)

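A minimal usage sketch of the streaming helper mixed into `SpeechClient`, assuming raw LINEAR16 audio chunks come from a hypothetical `audio_chunks` iterable (for example, a microphone capture loop):

```python
from google.cloud import speech

client = speech.SpeechClient()

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
)
streaming_config = speech.StreamingRecognitionConfig(config=config, interim_results=True)

# audio_chunks: iterable of raw audio byte strings (hypothetical source)
requests = (
    speech.StreamingRecognizeRequest(audio_content=chunk) for chunk in audio_chunks
)

# The helper sends the streaming config first, then the audio requests
responses = client.streaming_recognize(config=streaming_config, requests=requests)

for response in responses:
    for result in response.results:
        print(result.alternatives[0].transcript)
```

With `interim_results=True`, partial transcripts arrive while audio is still being streamed.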
### Speech Adaptation

Custom speech model adaptation using phrase sets and custom word classes to improve recognition accuracy for domain-specific vocabulary.

```python { .api }
class AdaptationClient:
    def create_phrase_set(
        self,
        request: CreatePhraseSetRequest,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> PhraseSet: ...

    def create_custom_class(
        self,
        request: CreateCustomClassRequest,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> CustomClass: ...
```

[Speech Adaptation](./speech-adaptation.md)

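A minimal sketch of creating a phrase set with `AdaptationClient`, assuming the adaptation types are available from the default `google.cloud.speech` import; the project ID, phrase set ID, and boosted term are placeholders:

```python
from google.cloud import speech

adaptation_client = speech.AdaptationClient()

# Adaptation resources live under a project/location parent
parent = "projects/my-project/locations/global"

phrase_set = adaptation_client.create_phrase_set(
    request=speech.CreatePhraseSetRequest(
        parent=parent,
        phrase_set_id="my-phrase-set",
        phrase_set=speech.PhraseSet(
            phrases=[speech.PhraseSet.Phrase(value="Tessl", boost=10.0)]
        ),
    )
)

# The returned resource name can be referenced from recognition requests
print(phrase_set.name)
```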
### Advanced Features (v2)

Next-generation API features including batch recognition, recognizer management, and enhanced output formatting.

```python { .api }
class SpeechClient:  # v2
    def batch_recognize(
        self,
        request: BatchRecognizeRequest,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> Operation: ...

    def create_recognizer(
        self,
        request: CreateRecognizerRequest,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> Operation: ...
```

[Advanced Features](./advanced-features.md)

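A minimal sketch of v2 batch recognition, assuming the input audio already lives in Cloud Storage; the project ID, bucket, default `_` recognizer, and `long` model are illustrative choices:

```python
from google.cloud import speech_v2

client = speech_v2.SpeechClient()

request = speech_v2.BatchRecognizeRequest(
    recognizer="projects/my-project/locations/global/recognizers/_",
    config=speech_v2.RecognitionConfig(
        auto_decoding_config=speech_v2.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="long",
    ),
    files=[speech_v2.BatchRecognizeFileMetadata(uri="gs://my-bucket/audio.wav")],
    recognition_output_config=speech_v2.RecognitionOutputConfig(
        inline_response_config=speech_v2.InlineOutputConfig(),
    ),
)

# Batch recognition is a long-running operation
operation = client.batch_recognize(request=request)
response = operation.result(timeout=600)

# Results are keyed by the input file URI
for uri, file_result in response.results.items():
    print(uri, file_result)
```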
### Async Clients

Asynchronous client interfaces for all API versions, enabling non-blocking speech recognition operations in async Python applications.

```python { .api }
class SpeechAsyncClient:
    async def recognize(
        self,
        config: RecognitionConfig,
        audio: RecognitionAudio,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> RecognizeResponse: ...

    async def long_running_recognize(
        self,
        config: RecognitionConfig,
        audio: RecognitionAudio,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> Operation: ...

class AdaptationAsyncClient:
    async def create_phrase_set(
        self,
        request: CreatePhraseSetRequest,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> PhraseSet: ...
```

### Types and Configuration

Core data types, configuration objects, and enums for speech recognition setup and result processing.

```python { .api }
class RecognitionConfig:
    encoding: AudioEncoding
    sample_rate_hertz: int
    language_code: str
    enable_automatic_punctuation: bool
    enable_speaker_diarization: bool
    diarization_config: SpeakerDiarizationConfig
    speech_contexts: Sequence[SpeechContext]

class RecognitionAudio:
    content: bytes
    uri: str
```

[Types and Configuration](./types-and-configuration.md)

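A sketch of a more fully specified configuration combining these types; the encoding, sample rate, speaker counts, and boosted phrases are illustrative values only:

```python
from google.cloud import speech

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
    sample_rate_hertz=44100,
    language_code="en-US",
    enable_automatic_punctuation=True,
    diarization_config=speech.SpeakerDiarizationConfig(
        enable_speaker_diarization=True,
        min_speaker_count=2,
        max_speaker_count=4,
    ),
    speech_contexts=[speech.SpeechContext(phrases=["Tessl"], boost=15.0)],
)

# Audio can be supplied inline (content) or by Cloud Storage URI
audio = speech.RecognitionAudio(uri="gs://my-bucket/interview.flac")
```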
## Common Patterns

### Error Handling

```python
from google.api_core import exceptions
from google.cloud import speech

client = speech.SpeechClient()

try:
    response = client.recognize(config=config, audio=audio)
except exceptions.InvalidArgument as e:
    print(f"Invalid request: {e}")
except exceptions.DeadlineExceeded as e:
    print(f"Request timeout: {e}")
```

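The `retry` and `timeout` keyword arguments shown in the API signatures can also be tuned per call. A minimal sketch, assuming `config` and `audio` are defined as in Basic Usage; the retry parameters are illustrative:

```python
from google.api_core import retry as retries
from google.cloud import speech

client = speech.SpeechClient()

# Override the default retry/backoff behavior for a single request
custom_retry = retries.Retry(initial=1.0, maximum=10.0, multiplier=2.0)

response = client.recognize(
    config=config,
    audio=audio,
    retry=custom_retry,
    timeout=60.0,
)
```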
### Async Operations

```python
from google.cloud import speech

client = speech.SpeechClient()

# Start long-running operation
operation = client.long_running_recognize(config=config, audio=audio)

# Wait for completion
response = operation.result(timeout=300)
```

### Async Client Usage

```python
import asyncio
from google.cloud import speech

async def async_speech_recognition():
    # Initialize async client
    client = speech.SpeechAsyncClient()

    # Configure recognition
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    )
    # audio_content: raw audio bytes loaded elsewhere (e.g., from a file)
    audio = speech.RecognitionAudio(content=audio_content)

    # Perform async recognition
    response = await client.recognize(config=config, audio=audio)

    # Process results
    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    # Close the client
    await client.transport.close()

# Run async function
asyncio.run(async_speech_recognition())
```