# Google Cloud Speech

Google Cloud Speech API client library providing speech-to-text conversion. The package offers real-time streaming recognition, batch processing, and custom speech adaptation, and serves as the Python interface to Google's Speech-to-Text service.

## Package Information

- **Package Name**: google-cloud-speech
- **Language**: Python
- **Installation**: `pip install google-cloud-speech`
- **Minimum Python Version**: 3.7

## Core Imports

Default import (uses v1 API):

```python
from google.cloud import speech
```

Version-specific imports:

```python
from google.cloud import speech_v1         # Stable API
from google.cloud import speech_v1p1beta1  # Beta features
from google.cloud import speech_v2         # Next-generation API
```

Common client initialization:

```python
from google.cloud import speech

# Initialize the speech client
client = speech.SpeechClient()
```
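
The client can also be constructed with explicit credentials instead of Application Default Credentials. A minimal sketch using the standard `from_service_account_file` constructor that the generated clients provide (the key path is a placeholder):

```python
from google.cloud import speech

# Default: Application Default Credentials (e.g. GOOGLE_APPLICATION_CREDENTIALS)
client = speech.SpeechClient()

# Alternative: load credentials from a service-account key file (placeholder path)
client = speech.SpeechClient.from_service_account_file("service-account.json")
```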
## Basic Usage

```python
from google.cloud import speech

# Initialize the client
client = speech.SpeechClient()

# Load audio file
with open("audio_file.wav", "rb") as audio_file:
    content = audio_file.read()

# Configure recognition
audio = speech.RecognitionAudio(content=content)
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
)

# Perform speech recognition
response = client.recognize(config=config, audio=audio)

# Process results
for result in response.results:
    print(f"Transcript: {result.alternatives[0].transcript}")
    print(f"Confidence: {result.alternatives[0].confidence}")
```
## Architecture

The Google Cloud Speech API provides three main API versions:

- **v1 (Stable)**: Core speech recognition functionality with synchronous, asynchronous, and streaming recognition
- **v1p1beta1 (Beta)**: Extended v1 features with experimental capabilities
- **v2 (Next-Generation)**: Advanced features including recognizer management, batch processing, and enhanced output formats

### Client Structure

- **SpeechClient**: Primary client for speech recognition operations
- **AdaptationClient**: Manages custom speech adaptation resources (phrase sets, custom classes)
- **SpeechHelpers**: Simplified interfaces for complex operations like streaming (mixed into SpeechClient)
- **AsyncClients**: Asynchronous versions of all clients for non-blocking operations

### Recognition Modes

- **Synchronous**: Blocking recognition for short audio (up to about 1 minute)
- **Asynchronous**: Long-running recognition for longer audio files
- **Streaming**: Real-time bidirectional streaming for live audio (see the sketch below)

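
A compact sketch of how the three modes map onto client calls, assuming a short local WAV file; the streaming call uses the helper form that SpeechHelpers mixes into `SpeechClient`, and the file name and chunk size are illustrative:

```python
from google.cloud import speech

client = speech.SpeechClient()

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
)

with open("audio_file.wav", "rb") as f:  # placeholder audio file
    content = f.read()
audio = speech.RecognitionAudio(content=content)

# Synchronous: blocks until the short clip is transcribed
sync_response = client.recognize(config=config, audio=audio)

# Asynchronous: returns a long-running operation, suitable for longer audio
operation = client.long_running_recognize(config=config, audio=audio)
async_response = operation.result(timeout=300)

# Streaming: send audio chunks as they become available, read results as they arrive
streaming_config = speech.StreamingRecognitionConfig(config=config)
requests = (
    speech.StreamingRecognizeRequest(audio_content=content[i : i + 4096])
    for i in range(0, len(content), 4096)
)
for response in client.streaming_recognize(config=streaming_config, requests=requests):
    for result in response.results:
        print(result.alternatives[0].transcript)
```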
## Capabilities

### Speech Recognition

Core speech-to-text functionality supporting synchronous, asynchronous, and streaming recognition modes with extensive configuration options.

```python { .api }
class SpeechClient:
    def recognize(
        self,
        config: RecognitionConfig,
        audio: RecognitionAudio,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> RecognizeResponse: ...

    def long_running_recognize(
        self,
        config: RecognitionConfig,
        audio: RecognitionAudio,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> Operation: ...
```

[Speech Recognition](./speech-recognition.md)

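
For audio longer than the synchronous limit, `long_running_recognize` can reference a Cloud Storage object instead of inline bytes. A minimal sketch (the bucket, object name, and encoding are placeholders):

```python
from google.cloud import speech

client = speech.SpeechClient()

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
    language_code="en-US",
)
# Long audio is referenced by URI rather than embedded in the request
audio = speech.RecognitionAudio(uri="gs://my-bucket/long_audio.flac")

operation = client.long_running_recognize(config=config, audio=audio)
response = operation.result(timeout=600)

for result in response.results:
    print(result.alternatives[0].transcript)
```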
### Streaming Recognition

Real-time bidirectional streaming speech recognition for live audio processing with immediate results.

```python { .api }
class SpeechClient:
    def streaming_recognize(
        self,
        requests: Iterator[StreamingRecognizeRequest],
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> Iterator[StreamingRecognizeResponse]: ...
```

[Streaming Recognition](./streaming-recognition.md)

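
A minimal streaming sketch using the helper form mixed into `SpeechClient`, which takes a `StreamingRecognitionConfig` plus an iterator of audio-chunk requests. Here the chunks are read from a local file; in a live application they would come from a microphone or network stream (the file name and chunk size are illustrative):

```python
from google.cloud import speech

client = speech.SpeechClient()

streaming_config = speech.StreamingRecognitionConfig(
    config=speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    ),
    interim_results=True,  # emit partial hypotheses while audio is still arriving
)

def audio_requests(path, chunk_size=4096):
    """Yield StreamingRecognizeRequest messages carrying raw audio chunks."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            yield speech.StreamingRecognizeRequest(audio_content=chunk)

responses = client.streaming_recognize(
    config=streaming_config,
    requests=audio_requests("audio_file.wav"),
)

for response in responses:
    for result in response.results:
        label = "final" if result.is_final else "interim"
        print(f"{label}: {result.alternatives[0].transcript}")
```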
### Speech Adaptation

Custom speech model adaptation using phrase sets and custom word classes to improve recognition accuracy for domain-specific vocabulary.

```python { .api }
class AdaptationClient:
    def create_phrase_set(
        self,
        request: CreatePhraseSetRequest,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> PhraseSet: ...

    def create_custom_class(
        self,
        request: CreateCustomClassRequest,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> CustomClass: ...
```

[Speech Adaptation](./speech-adaptation.md)

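
A minimal sketch of creating a phrase set and referencing it from a recognition request, using the v1p1beta1 surface; the project ID, phrase set ID, and phrases are placeholders:

```python
from google.cloud import speech_v1p1beta1 as speech

adaptation_client = speech.AdaptationClient()

# Adaptation resources live under a project/location parent (placeholder project)
parent = "projects/my-project/locations/global"

phrase_set = adaptation_client.create_phrase_set(
    request=speech.CreatePhraseSetRequest(
        parent=parent,
        phrase_set_id="product-names",  # placeholder resource ID
        phrase_set=speech.PhraseSet(
            phrases=[speech.PhraseSet.Phrase(value="Cloud Speech-to-Text", boost=10.0)],
        ),
    )
)

# Reference the phrase set when configuring recognition
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
    adaptation=speech.SpeechAdaptation(phrase_set_references=[phrase_set.name]),
)
```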
### Advanced Features (v2)

Next-generation API features including batch recognition, recognizer management, and enhanced output formatting.

```python { .api }
class SpeechClient:  # v2
    def batch_recognize(
        self,
        request: BatchRecognizeRequest,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> Operation: ...

    def create_recognizer(
        self,
        request: CreateRecognizerRequest,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> Operation: ...
```

[Advanced Features](./advanced-features.md)

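
A sketch of creating a v2 recognizer and transcribing against it, assuming the `default_recognition_config` field available in recent v2 releases; the project ID, recognizer ID, and file name are placeholders:

```python
from google.cloud import speech_v2

client = speech_v2.SpeechClient()

# Recognizers are created under a project/location parent (placeholder project)
operation = client.create_recognizer(
    request=speech_v2.CreateRecognizerRequest(
        parent="projects/my-project/locations/global",
        recognizer_id="my-recognizer",  # placeholder resource ID
        recognizer=speech_v2.Recognizer(
            default_recognition_config=speech_v2.RecognitionConfig(
                auto_decoding_config=speech_v2.AutoDecodingConfig(),
                language_codes=["en-US"],
                model="long",
            ),
        ),
    )
)
recognizer = operation.result(timeout=300)

# Recognize using the recognizer's default configuration
with open("audio_file.wav", "rb") as f:
    content = f.read()

response = client.recognize(
    request=speech_v2.RecognizeRequest(recognizer=recognizer.name, content=content)
)
for result in response.results:
    print(result.alternatives[0].transcript)
```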
### Async Clients

Asynchronous client interfaces for all API versions, enabling non-blocking speech recognition operations in async Python applications.

```python { .api }
class SpeechAsyncClient:
    async def recognize(
        self,
        config: RecognitionConfig,
        audio: RecognitionAudio,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> RecognizeResponse: ...

    async def long_running_recognize(
        self,
        config: RecognitionConfig,
        audio: RecognitionAudio,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> Operation: ...

class AdaptationAsyncClient:
    async def create_phrase_set(
        self,
        request: CreatePhraseSetRequest,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> PhraseSet: ...
```
### Types and Configuration

Core data types, configuration objects, and enums for speech recognition setup and result processing.

```python { .api }
class RecognitionConfig:
    encoding: AudioEncoding
    sample_rate_hertz: int
    language_code: str
    enable_automatic_punctuation: bool
    enable_speaker_diarization: bool
    diarization_config: SpeakerDiarizationConfig
    speech_contexts: Sequence[SpeechContext]

class RecognitionAudio:
    content: bytes
    uri: str
```

[Types and Configuration](./types-and-configuration.md)

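
A sketch combining several of these options; speaker diarization is configured through `SpeakerDiarizationConfig` rather than the older boolean flag, and the speaker counts, phrases, and URI are illustrative:

```python
from google.cloud import speech

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
    enable_automatic_punctuation=True,
    diarization_config=speech.SpeakerDiarizationConfig(
        enable_speaker_diarization=True,
        min_speaker_count=2,
        max_speaker_count=4,
    ),
    speech_contexts=[speech.SpeechContext(phrases=["Cloud Speech", "phrase set"])],
)

# Audio carries either inline bytes or a Cloud Storage URI (exactly one of the two)
audio = speech.RecognitionAudio(uri="gs://my-bucket/meeting.wav")
```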
## Common Patterns

### Error Handling

```python
from google.api_core import exceptions
from google.cloud import speech

client = speech.SpeechClient()

try:
    response = client.recognize(config=config, audio=audio)
except exceptions.InvalidArgument as e:
    print(f"Invalid request: {e}")
except exceptions.DeadlineExceeded as e:
    print(f"Request timeout: {e}")
```
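
Transient failures can also be handled declaratively through the `retry` and `timeout` arguments that the recognition methods accept; a sketch using `google.api_core.retry` (the backoff values are illustrative, and the parameter names follow recent google-api-core releases):

```python
from google.api_core import exceptions, retry
from google.cloud import speech

client = speech.SpeechClient()

# Retry transient errors with exponential backoff, giving up after two minutes
transient_retry = retry.Retry(
    predicate=retry.if_exception_type(
        exceptions.ServiceUnavailable,
        exceptions.DeadlineExceeded,
    ),
    initial=1.0,
    maximum=10.0,
    timeout=120.0,
)

# config and audio prepared as in Basic Usage
response = client.recognize(config=config, audio=audio, retry=transient_retry, timeout=120.0)
```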
### Async Operations

```python
from google.cloud import speech

client = speech.SpeechClient()

# Start long-running operation
operation = client.long_running_recognize(config=config, audio=audio)

# Wait for completion
response = operation.result(timeout=300)
```
### Async Client Usage

```python
import asyncio

from google.cloud import speech


async def async_speech_recognition():
    # Initialize async client
    client = speech.SpeechAsyncClient()

    # Load audio file
    with open("audio_file.wav", "rb") as audio_file:
        audio_content = audio_file.read()

    # Configure recognition
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    )
    audio = speech.RecognitionAudio(content=audio_content)

    # Perform async recognition
    response = await client.recognize(config=config, audio=audio)

    # Process results
    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    # Close the client
    await client.transport.close()


# Run the async function
asyncio.run(async_speech_recognition())
```