# Google Cloud Speech

Google Cloud Speech API client library providing speech-to-text conversion. The package offers real-time streaming recognition, batch processing, and custom speech adaptation, and serves as the Python interface to Google's Speech-to-Text service.

## Package Information

- **Package Name**: google-cloud-speech
- **Language**: Python
- **Installation**: `pip install google-cloud-speech`
- **Minimum Python Version**: 3.7

## Core Imports

Default import (uses v1 API):

```python
from google.cloud import speech
```

Version-specific imports:

```python
from google.cloud import speech_v1         # Stable API
from google.cloud import speech_v1p1beta1  # Beta features
from google.cloud import speech_v2         # Next-generation API
```

Common client initialization:

```python
from google.cloud import speech

# Initialize the speech client
client = speech.SpeechClient()
```
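
The client can also be constructed with explicit credentials instead of Application Default Credentials. A minimal sketch using the standard `from_service_account_file` constructor that the generated clients provide (the key path is a placeholder):

```python
from google.cloud import speech

# Default: Application Default Credentials (e.g. GOOGLE_APPLICATION_CREDENTIALS)
client = speech.SpeechClient()

# Alternative: load credentials from a service-account key file (placeholder path)
client = speech.SpeechClient.from_service_account_file("service-account.json")
```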
## Basic Usage

```python
from google.cloud import speech

# Initialize the client
client = speech.SpeechClient()

# Load audio file
with open("audio_file.wav", "rb") as audio_file:
    content = audio_file.read()

# Configure recognition
audio = speech.RecognitionAudio(content=content)
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
)

# Perform speech recognition
response = client.recognize(config=config, audio=audio)

# Process results
for result in response.results:
    print(f"Transcript: {result.alternatives[0].transcript}")
    print(f"Confidence: {result.alternatives[0].confidence}")
```
## Architecture

The Google Cloud Speech API provides three main API versions:

- **v1 (Stable)**: Core speech recognition functionality with synchronous, asynchronous, and streaming recognition
- **v1p1beta1 (Beta)**: Extended v1 features with experimental capabilities
- **v2 (Next-Generation)**: Advanced features including recognizer management, batch processing, and enhanced output formats

### Client Structure

- **SpeechClient**: Primary client for speech recognition operations
- **AdaptationClient**: Manages custom speech adaptation resources (phrase sets, custom classes)
- **SpeechHelpers**: Simplified interfaces for complex operations like streaming (mixed into SpeechClient)
- **AsyncClients**: Asynchronous versions of all clients for non-blocking operations

### Recognition Modes

- **Synchronous**: Blocking recognition for short audio (up to about 1 minute)
- **Asynchronous**: Long-running recognition for longer audio files
- **Streaming**: Real-time bidirectional streaming for live audio (see the sketch below)

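
A compact sketch of how the three modes map onto client calls, assuming a short local WAV file; the streaming call uses the helper form that SpeechHelpers mixes into `SpeechClient`, and the file name and chunk size are illustrative:

```python
from google.cloud import speech

client = speech.SpeechClient()

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
)

with open("audio_file.wav", "rb") as f:  # placeholder audio file
    content = f.read()
audio = speech.RecognitionAudio(content=content)

# Synchronous: blocks until the short clip is transcribed
sync_response = client.recognize(config=config, audio=audio)

# Asynchronous: returns a long-running operation, suitable for longer audio
operation = client.long_running_recognize(config=config, audio=audio)
async_response = operation.result(timeout=300)

# Streaming: send audio chunks as they become available, read results as they arrive
streaming_config = speech.StreamingRecognitionConfig(config=config)
requests = (
    speech.StreamingRecognizeRequest(audio_content=content[i : i + 4096])
    for i in range(0, len(content), 4096)
)
for response in client.streaming_recognize(config=streaming_config, requests=requests):
    for result in response.results:
        print(result.alternatives[0].transcript)
```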
## Capabilities

### Speech Recognition

Core speech-to-text functionality supporting synchronous, asynchronous, and streaming recognition modes with extensive configuration options.

```python { .api }
class SpeechClient:
    def recognize(
        self,
        config: RecognitionConfig,
        audio: RecognitionAudio,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> RecognizeResponse: ...

    def long_running_recognize(
        self,
        config: RecognitionConfig,
        audio: RecognitionAudio,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> Operation: ...
```

[Speech Recognition](./speech-recognition.md)

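
For audio longer than the synchronous limit, `long_running_recognize` can reference a Cloud Storage object instead of inline bytes. A minimal sketch (the bucket, object name, and encoding are placeholders):

```python
from google.cloud import speech

client = speech.SpeechClient()

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
    language_code="en-US",
)
# Long audio is referenced by URI rather than embedded in the request
audio = speech.RecognitionAudio(uri="gs://my-bucket/long_audio.flac")

operation = client.long_running_recognize(config=config, audio=audio)
response = operation.result(timeout=600)

for result in response.results:
    print(result.alternatives[0].transcript)
```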
### Streaming Recognition

Real-time bidirectional streaming speech recognition for live audio processing with immediate results.

```python { .api }
class SpeechClient:
    def streaming_recognize(
        self,
        requests: Iterator[StreamingRecognizeRequest],
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> Iterator[StreamingRecognizeResponse]: ...
```

[Streaming Recognition](./streaming-recognition.md)

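
A minimal streaming sketch using the helper form mixed into `SpeechClient`, which takes a `StreamingRecognitionConfig` plus an iterator of audio-chunk requests. Here the chunks are read from a local file; in a live application they would come from a microphone or network stream (the file name and chunk size are illustrative):

```python
from google.cloud import speech

client = speech.SpeechClient()

streaming_config = speech.StreamingRecognitionConfig(
    config=speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    ),
    interim_results=True,  # emit partial hypotheses while audio is still arriving
)

def audio_requests(path, chunk_size=4096):
    """Yield StreamingRecognizeRequest messages carrying raw audio chunks."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            yield speech.StreamingRecognizeRequest(audio_content=chunk)

responses = client.streaming_recognize(
    config=streaming_config,
    requests=audio_requests("audio_file.wav"),
)

for response in responses:
    for result in response.results:
        label = "final" if result.is_final else "interim"
        print(f"{label}: {result.alternatives[0].transcript}")
```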
### Speech Adaptation

Custom speech model adaptation using phrase sets and custom word classes to improve recognition accuracy for domain-specific vocabulary.

```python { .api }
class AdaptationClient:
    def create_phrase_set(
        self,
        request: CreatePhraseSetRequest,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> PhraseSet: ...

    def create_custom_class(
        self,
        request: CreateCustomClassRequest,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> CustomClass: ...
```

[Speech Adaptation](./speech-adaptation.md)

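
A minimal sketch of creating a phrase set and referencing it from a recognition request, using the v1p1beta1 surface; the project ID, phrase set ID, and phrases are placeholders:

```python
from google.cloud import speech_v1p1beta1 as speech

adaptation_client = speech.AdaptationClient()

# Adaptation resources live under a project/location parent (placeholder project)
parent = "projects/my-project/locations/global"

phrase_set = adaptation_client.create_phrase_set(
    request=speech.CreatePhraseSetRequest(
        parent=parent,
        phrase_set_id="product-names",  # placeholder resource ID
        phrase_set=speech.PhraseSet(
            phrases=[speech.PhraseSet.Phrase(value="Cloud Speech-to-Text", boost=10.0)],
        ),
    )
)

# Reference the phrase set when configuring recognition
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
    adaptation=speech.SpeechAdaptation(phrase_set_references=[phrase_set.name]),
)
```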
### Advanced Features (v2)

Next-generation API features including batch recognition, recognizer management, and enhanced output formatting.

```python { .api }
class SpeechClient:  # v2
    def batch_recognize(
        self,
        request: BatchRecognizeRequest,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> Operation: ...

    def create_recognizer(
        self,
        request: CreateRecognizerRequest,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> Operation: ...
```

[Advanced Features](./advanced-features.md)

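
A sketch of creating a v2 recognizer and transcribing against it, assuming the `default_recognition_config` field available in recent v2 releases; the project ID, recognizer ID, and file name are placeholders:

```python
from google.cloud import speech_v2

client = speech_v2.SpeechClient()

# Recognizers are created under a project/location parent (placeholder project)
operation = client.create_recognizer(
    request=speech_v2.CreateRecognizerRequest(
        parent="projects/my-project/locations/global",
        recognizer_id="my-recognizer",  # placeholder resource ID
        recognizer=speech_v2.Recognizer(
            default_recognition_config=speech_v2.RecognitionConfig(
                auto_decoding_config=speech_v2.AutoDecodingConfig(),
                language_codes=["en-US"],
                model="long",
            ),
        ),
    )
)
recognizer = operation.result(timeout=300)

# Recognize using the recognizer's default configuration
with open("audio_file.wav", "rb") as f:
    content = f.read()

response = client.recognize(
    request=speech_v2.RecognizeRequest(recognizer=recognizer.name, content=content)
)
for result in response.results:
    print(result.alternatives[0].transcript)
```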
### Async Clients

Asynchronous client interfaces for all API versions, enabling non-blocking speech recognition operations in async Python applications.

```python { .api }
class SpeechAsyncClient:
    async def recognize(
        self,
        config: RecognitionConfig,
        audio: RecognitionAudio,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> RecognizeResponse: ...

    async def long_running_recognize(
        self,
        config: RecognitionConfig,
        audio: RecognitionAudio,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> Operation: ...

class AdaptationAsyncClient:
    async def create_phrase_set(
        self,
        request: CreatePhraseSetRequest,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> PhraseSet: ...
```
### Types and Configuration

Core data types, configuration objects, and enums for speech recognition setup and result processing.

```python { .api }
class RecognitionConfig:
    encoding: AudioEncoding
    sample_rate_hertz: int
    language_code: str
    enable_automatic_punctuation: bool
    enable_speaker_diarization: bool
    diarization_config: SpeakerDiarizationConfig
    speech_contexts: Sequence[SpeechContext]

class RecognitionAudio:
    content: bytes
    uri: str
```

[Types and Configuration](./types-and-configuration.md)

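
A sketch combining several of these options; speaker diarization is configured through `SpeakerDiarizationConfig` rather than the older boolean flag, and the speaker counts, phrases, and URI are illustrative:

```python
from google.cloud import speech

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
    enable_automatic_punctuation=True,
    diarization_config=speech.SpeakerDiarizationConfig(
        enable_speaker_diarization=True,
        min_speaker_count=2,
        max_speaker_count=4,
    ),
    speech_contexts=[speech.SpeechContext(phrases=["Cloud Speech", "phrase set"])],
)

# Audio carries either inline bytes or a Cloud Storage URI (exactly one of the two)
audio = speech.RecognitionAudio(uri="gs://my-bucket/meeting.wav")
```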
## Common Patterns

### Error Handling

```python
from google.api_core import exceptions
from google.cloud import speech

client = speech.SpeechClient()

try:
    response = client.recognize(config=config, audio=audio)
except exceptions.InvalidArgument as e:
    print(f"Invalid request: {e}")
except exceptions.DeadlineExceeded as e:
    print(f"Request timeout: {e}")
```
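
Transient failures can also be handled declaratively through the `retry` and `timeout` arguments that the recognition methods accept; a sketch using `google.api_core.retry` (the backoff values are illustrative, and the parameter names follow recent google-api-core releases):

```python
from google.api_core import exceptions, retry
from google.cloud import speech

client = speech.SpeechClient()

# Retry transient errors with exponential backoff, giving up after two minutes
transient_retry = retry.Retry(
    predicate=retry.if_exception_type(
        exceptions.ServiceUnavailable,
        exceptions.DeadlineExceeded,
    ),
    initial=1.0,
    maximum=10.0,
    timeout=120.0,
)

# config and audio prepared as in Basic Usage
response = client.recognize(config=config, audio=audio, retry=transient_retry, timeout=120.0)
```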
### Async Operations

```python
from google.cloud import speech

client = speech.SpeechClient()

# Start long-running operation
operation = client.long_running_recognize(config=config, audio=audio)

# Wait for completion
response = operation.result(timeout=300)
```
### Async Client Usage

```python
import asyncio

from google.cloud import speech


async def async_speech_recognition():
    # Initialize async client
    client = speech.SpeechAsyncClient()

    # Load audio file
    with open("audio_file.wav", "rb") as audio_file:
        audio_content = audio_file.read()

    # Configure recognition
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    )
    audio = speech.RecognitionAudio(content=audio_content)

    # Perform async recognition
    response = await client.recognize(config=config, audio=audio)

    # Process results
    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    # Close the client
    await client.transport.close()


# Run the async function
asyncio.run(async_speech_recognition())
```