# Types and Configuration

Core data types, configuration objects, and enums for speech recognition setup and result processing across all API versions.

## Core Configuration Types

### RecognitionConfig

Main configuration object for speech recognition requests.

```python { .api }
class RecognitionConfig:
    """Configuration for speech recognition."""
    encoding: AudioEncoding
    sample_rate_hertz: int
    audio_channel_count: int
    enable_separate_recognition_per_channel: bool
    language_code: str
    alternative_language_codes: Sequence[str]
    max_alternatives: int
    profanity_filter: bool
    speech_contexts: Sequence[SpeechContext]
    enable_word_time_offsets: bool
    enable_word_confidence: bool
    enable_automatic_punctuation: bool
    enable_spoken_punctuation: bool
    enable_spoken_emojis: bool
    enable_speaker_diarization: bool
    diarization_config: SpeakerDiarizationConfig
    metadata: RecognitionMetadata
    model: str
    use_enhanced: bool
    adaptation: SpeechAdaptation
    transcript_normalization: TranscriptNormalization
    enable_voice_activity_events: bool
```

### RecognitionAudio

Specifies the audio input for recognition. Set exactly one of `content` or `uri`.

```python { .api }
class RecognitionAudio:
    """Audio input specification."""
    content: bytes  # Raw audio bytes
    uri: str        # Cloud Storage URI (gs://bucket/file)
```
47
48
### SpeakerDiarizationConfig
49
50
Configuration for speaker diarization (identifying different speakers).
51
52
```python { .api }
53
class SpeakerDiarizationConfig:
54
"""Configuration for speaker diarization."""
55
enable_speaker_diarization: bool
56
min_speaker_count: int
57
max_speaker_count: int
58
speaker_tag: int
59
```
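
Diarization results arrive as a flat word stream in which each `WordInfo` carries a `speaker_tag`. A small helper can fold that stream into per-speaker segments. The sketch below uses `SimpleNamespace` stand-ins in place of real `WordInfo` messages, and `group_by_speaker` is an illustrative name, not part of the library:

```python
from itertools import groupby
from types import SimpleNamespace

def group_by_speaker(words):
    """Fold a flat word sequence into (speaker_tag, text) segments.

    Works on any objects exposing `.word` and `.speaker_tag`, such as
    the WordInfo messages from the final result of a diarization request.
    """
    return [
        (tag, " ".join(w.word for w in run))
        for tag, run in groupby(words, key=lambda w: w.speaker_tag)
    ]

# Stand-in words mimicking WordInfo, for demonstration only
words = [
    SimpleNamespace(word="hello", speaker_tag=1),
    SimpleNamespace(word="there", speaker_tag=1),
    SimpleNamespace(word="hi", speaker_tag=2),
]
print(group_by_speaker(words))  # [(1, 'hello there'), (2, 'hi')]
```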

### SpeechContext

Provides hints to improve recognition accuracy.

```python { .api }
class SpeechContext:
    """Context hints for speech recognition."""
    phrases: Sequence[str]
    boost: float
```

### RecognitionMetadata

Metadata about the recognition request for analytics and optimization.

```python { .api }
class RecognitionMetadata:
    """Metadata for recognition requests."""
    interaction_type: InteractionType
    industry_naics_code_of_audio: int
    microphone_distance: MicrophoneDistance
    original_media_type: OriginalMediaType
    recording_device_type: RecordingDeviceType
    recording_device_name: str
    original_mime_type: str
    audio_topic: str
```

## Result Types

### SpeechRecognitionResult

Container for recognition results.

```python { .api }
class SpeechRecognitionResult:
    """Container for speech recognition results."""
    alternatives: Sequence[SpeechRecognitionAlternative]
    channel_tag: int
    result_end_time: Duration
    language_code: str
```

### SpeechRecognitionAlternative

Individual recognition hypothesis with confidence score.

```python { .api }
class SpeechRecognitionAlternative:
    """Individual recognition alternative."""
    transcript: str
    confidence: float
    words: Sequence[WordInfo]
```
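
Because alternatives are returned ordered best-first, a common pattern is to take the first one that clears a confidence threshold. The helper below is a sketch; `best_transcript` and the `SimpleNamespace` stand-in are illustrative, not library API:

```python
from types import SimpleNamespace

def best_transcript(result, min_confidence=0.0):
    """Return the first transcript meeting the confidence threshold.

    Alternatives are ordered best-first, so this yields the top
    hypothesis that clears `min_confidence`, or None if none does.
    Works on any object exposing `.alternatives` whose items have
    `.transcript` and `.confidence`, such as SpeechRecognitionResult.
    """
    for alt in result.alternatives:
        if alt.confidence >= min_confidence:
            return alt.transcript
    return None

# Stand-in result mimicking SpeechRecognitionResult
result = SimpleNamespace(alternatives=[
    SimpleNamespace(transcript="hello world", confidence=0.92),
    SimpleNamespace(transcript="hollow world", confidence=0.41),
])
print(best_transcript(result, min_confidence=0.5))  # hello world
```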

### WordInfo

Word-level information including timing and confidence.

```python { .api }
class WordInfo:
    """Word-level recognition information."""
    start_time: Duration
    end_time: Duration
    word: str
    confidence: float
    speaker_tag: int
    speaker_label: str
```

### SpeechAdaptationInfo

Information about applied speech adaptations.

```python { .api }
class SpeechAdaptationInfo:
    """Information about applied speech adaptations."""
    adaptation_timeout: bool
    timeout_message: str
```

## Enumeration Types

### AudioEncoding

Supported audio encoding formats.

```python { .api }
class AudioEncoding:
    """Audio encoding formats."""
    ENCODING_UNSPECIFIED = 0
    LINEAR16 = 1                # 16-bit linear PCM
    FLAC = 2                    # FLAC lossless
    MULAW = 3                   # 8-bit mu-law
    AMR = 4                     # AMR narrowband
    AMR_WB = 5                  # AMR wideband
    OGG_OPUS = 6                # Ogg Opus
    SPEEX_WITH_HEADER_BYTE = 7  # Speex with header
    MP3 = 8                     # MP3
    WEBM_OPUS = 9               # WebM Opus
```
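
When the encoding is not known up front, a file's extension can suggest a starting point. The mapping below is a rough, illustrative convention, not an official table; extensions are not authoritative (a `.wav` container, for instance, can wrap several codecs), so verify the actual format of your audio:

```python
import os

# Illustrative extension-to-encoding-name mapping; verify against your
# actual audio, since extensions do not guarantee the codec inside.
EXTENSION_TO_ENCODING = {
    ".flac": "FLAC",
    ".wav": "LINEAR16",   # assumes 16-bit PCM WAV
    ".mp3": "MP3",
    ".opus": "OGG_OPUS",
    ".webm": "WEBM_OPUS",
    ".amr": "AMR",
    ".awb": "AMR_WB",
}

def guess_encoding(filename):
    """Guess an AudioEncoding enum *name* from a filename extension."""
    ext = os.path.splitext(filename)[1].lower()
    return EXTENSION_TO_ENCODING.get(ext, "ENCODING_UNSPECIFIED")

print(guess_encoding("meeting.flac"))  # FLAC
```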

### InteractionType

Types of user interactions for recognition optimization.

```python { .api }
class InteractionType:
    """Interaction types for recognition optimization."""
    INTERACTION_TYPE_UNSPECIFIED = 0
    DISCUSSION = 1                # Multi-participant discussion
    PRESENTATION = 2              # Single speaker presentation
    PHONE_CALL = 3                # Phone conversation
    VOICEMAIL = 4                 # Voicemail message
    PROFESSIONALLY_PRODUCED = 5   # Professional audio content
    VOICE_SEARCH = 6              # Voice search queries
    VOICE_COMMAND = 7             # Voice commands
    DICTATION = 8                 # Dictation use case
```

### MicrophoneDistance

Microphone distance from the audio source.

```python { .api }
class MicrophoneDistance:
    """Microphone distance categories."""
    MICROPHONE_DISTANCE_UNSPECIFIED = 0
    NEARFIELD = 1  # 0-1 meter from source
    MIDFIELD = 2   # 1-3 meters from source
    FARFIELD = 3   # 3+ meters from source
```

### OriginalMediaType

Original media type of the audio.

```python { .api }
class OriginalMediaType:
    """Original media type categories."""
    ORIGINAL_MEDIA_TYPE_UNSPECIFIED = 0
    AUDIO = 1  # Audio-only content
    VIDEO = 2  # Video content with audio track
```

### RecordingDeviceType

Type of device used for recording.

```python { .api }
class RecordingDeviceType:
    """Recording device types."""
    RECORDING_DEVICE_TYPE_UNSPECIFIED = 0
    SMARTPHONE = 1            # Mobile phone
    PC = 2                    # Personal computer
    PHONE_LINE = 3            # Traditional phone line
    VEHICLE = 4               # In-vehicle system
    OTHER_OUTDOOR_DEVICE = 5  # Other outdoor recording
    OTHER_INDOOR_DEVICE = 6   # Other indoor recording
```

## Usage Examples

### Basic Configuration

```python
from google.cloud import speech

# Simple configuration for high-quality audio
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
    sample_rate_hertz=44100,
    language_code="en-US",
    enable_automatic_punctuation=True,
    enable_word_time_offsets=True,
)

# Audio from file content
with open("audio.flac", "rb") as f:
    audio_content = f.read()

audio = speech.RecognitionAudio(content=audio_content)
```

### Advanced Configuration

```python
from google.cloud import speech

# Comprehensive configuration with all features
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    audio_channel_count=2,
    enable_separate_recognition_per_channel=True,
    language_code="en-US",
    alternative_language_codes=["en-GB", "en-AU"],
    max_alternatives=3,
    profanity_filter=True,
    enable_word_time_offsets=True,
    enable_word_confidence=True,
    enable_automatic_punctuation=True,
    enable_speaker_diarization=True,
    diarization_config=speech.SpeakerDiarizationConfig(
        enable_speaker_diarization=True,
        min_speaker_count=2,
        max_speaker_count=6,
    ),
    metadata=speech.RecognitionMetadata(
        interaction_type=speech.RecognitionMetadata.InteractionType.DISCUSSION,
        microphone_distance=speech.RecognitionMetadata.MicrophoneDistance.NEARFIELD,
        original_media_type=speech.RecognitionMetadata.OriginalMediaType.AUDIO,
        recording_device_type=speech.RecognitionMetadata.RecordingDeviceType.SMARTPHONE,
    ),
    speech_contexts=[
        speech.SpeechContext(
            phrases=["technical", "terminology", "API", "cloud computing"],
            boost=10.0,
        )
    ],
    use_enhanced=True,  # Use enhanced model
)

# Cloud Storage audio
audio = speech.RecognitionAudio(
    uri="gs://your-bucket/meeting-recording.wav"
)
```

### Processing Results

```python
from google.cloud import speech

client = speech.SpeechClient()

# Process comprehensive results
response = client.recognize(config=config, audio=audio)

for i, result in enumerate(response.results):
    print(f"Result {i + 1}:")

    # Process alternatives
    for j, alternative in enumerate(result.alternatives):
        print(f"  Alternative {j + 1} (confidence: {alternative.confidence:.2f}):")
        print(f"    Transcript: {alternative.transcript}")

        # Process word-level information
        if alternative.words:
            print("    Word details:")
            for word in alternative.words[:5]:  # Show first 5 words
                print(f"      '{word.word}': "
                      f"{word.start_time.total_seconds():.1f}s-"
                      f"{word.end_time.total_seconds():.1f}s "
                      f"(confidence: {word.confidence:.2f})")
                if word.speaker_tag:
                    print(f"        Speaker: {word.speaker_tag}")

# Access metadata
if response.speech_adaptation_info:
    if response.speech_adaptation_info.adaptation_timeout:
        print("Warning: Speech adaptation timed out")
```

## Configuration Best Practices

### Audio Quality Settings

```python
from google.cloud import speech

# Optimal settings for different audio sources
phone_config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.MULAW,
    sample_rate_hertz=8000,
    language_code="en-US",
    metadata=speech.RecognitionMetadata(
        interaction_type=speech.RecognitionMetadata.InteractionType.PHONE_CALL,
        microphone_distance=speech.RecognitionMetadata.MicrophoneDistance.NEARFIELD,
        recording_device_type=speech.RecognitionMetadata.RecordingDeviceType.PHONE_LINE,
    ),
)

# High-quality studio recording
studio_config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
    sample_rate_hertz=48000,
    language_code="en-US",
    use_enhanced=True,
    metadata=speech.RecognitionMetadata(
        interaction_type=speech.RecognitionMetadata.InteractionType.PROFESSIONALLY_PRODUCED,
        microphone_distance=speech.RecognitionMetadata.MicrophoneDistance.NEARFIELD,
        original_media_type=speech.RecognitionMetadata.OriginalMediaType.AUDIO,
    ),
)

# Mobile app recording
mobile_config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
    enable_automatic_punctuation=True,
    metadata=speech.RecognitionMetadata(
        interaction_type=speech.RecognitionMetadata.InteractionType.VOICE_COMMAND,
        microphone_distance=speech.RecognitionMetadata.MicrophoneDistance.NEARFIELD,
        recording_device_type=speech.RecognitionMetadata.RecordingDeviceType.SMARTPHONE,
    ),
)
```

### Language Configuration

```python
from google.cloud import speech

# Multi-language support
multilingual_config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",  # Primary language
    alternative_language_codes=[
        "es-ES",  # Spanish
        "fr-FR",  # French
        "de-DE",  # German
    ],
    max_alternatives=2,  # Get alternatives for uncertain regions
)
```
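
When alternative language codes are supplied, each result's `language_code` field reports the language the service actually chose. A quick way to see the spread across a transcript, sketched here with `SimpleNamespace` stand-ins rather than real result messages:

```python
from collections import Counter
from types import SimpleNamespace

def language_breakdown(results):
    """Count how many results were recognized in each detected language.

    Works on any iterable of objects exposing `.language_code`, such as
    the results of a request that set alternative_language_codes.
    """
    return Counter(r.language_code for r in results)

# Stand-in results mimicking SpeechRecognitionResult
results = [
    SimpleNamespace(language_code="en-us"),
    SimpleNamespace(language_code="es-es"),
    SimpleNamespace(language_code="en-us"),
]
print(language_breakdown(results))  # Counter({'en-us': 2, 'es-es': 1})
```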

### Performance Optimization

```python
from google.cloud import speech

# Optimized for speed vs accuracy trade-offs
fast_config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
    max_alternatives=1,              # Single alternative
    enable_word_time_offsets=False,  # Skip word timing
    enable_word_confidence=False,    # Skip word confidence
    # Keep automatic punctuation for readability
    enable_automatic_punctuation=True,
)

# Optimized for maximum accuracy
accurate_config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
    sample_rate_hertz=48000,
    language_code="en-US",
    use_enhanced=True,              # Enhanced model
    max_alternatives=3,             # Multiple alternatives
    enable_word_time_offsets=True,  # Word-level timing
    enable_word_confidence=True,    # Word-level confidence
    enable_automatic_punctuation=True,
    enable_speaker_diarization=True,
    diarization_config=speech.SpeakerDiarizationConfig(
        enable_speaker_diarization=True,
        min_speaker_count=1,
        max_speaker_count=10,
    ),
)
```

## Common Data Patterns

### Duration and Timestamp Handling

```python
# Recent google-cloud-speech releases surface word timings as
# datetime.timedelta objects, so total_seconds() is available directly.
for word in alternative.words:
    # Convert to seconds
    start_seconds = word.start_time.total_seconds()
    end_seconds = word.end_time.total_seconds()
    duration = end_seconds - start_seconds

    print(f"Word '{word.word}': {start_seconds:.2f}s - {end_seconds:.2f}s ({duration:.2f}s)")
```
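
Because word timings behave like `datetime.timedelta` values, they convert cleanly to subtitle timestamps. Below is a minimal SRT formatter; an illustrative sketch, not part of the library:

```python
from datetime import timedelta

def to_srt_timestamp(td):
    """Format a timedelta (e.g. a word's start_time or end_time) as an
    SRT subtitle timestamp: HH:MM:SS,mmm."""
    total_ms = int(td.total_seconds() * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"

print(to_srt_timestamp(timedelta(seconds=61.25)))  # 00:01:01,250
```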

### Error Handling with Type Information

```python
from google.api_core import exceptions
from google.cloud import speech

try:
    response = client.recognize(config=config, audio=audio)

    # Check for empty results
    if not response.results:
        print("No speech detected in audio")

    # Validate result structure
    for result in response.results:
        if not result.alternatives:
            print("No alternatives found for this result")
            continue

        best_alternative = result.alternatives[0]
        if best_alternative.confidence < 0.5:
            print(f"Low confidence result: {best_alternative.confidence}")

except exceptions.InvalidArgument as e:
    print(f"Invalid configuration: {e}")
except exceptions.OutOfRange as e:
    print(f"Audio too long or other limit exceeded: {e}")
except exceptions.DeadlineExceeded as e:
    print(f"Request timed out: {e}")
```