tessl/pypi-google-cloud-speech

Google Cloud Speech API client library for speech-to-text conversion with support for real-time streaming, batch processing, and advanced speech recognition models

Describes: pkg:pypi/google-cloud-speech@2.33.x

To install, run

npx @tessl/cli install tessl/pypi-google-cloud-speech@2.33.0


Google Cloud Speech

Google Cloud Speech API client library providing advanced speech-to-text conversion capabilities. This package offers real-time streaming recognition, batch processing, and custom speech adaptation, serving as Python's interface to Google's industry-leading speech recognition technology.

Package Information

  • Package Name: google-cloud-speech
  • Language: Python
  • Installation: pip install google-cloud-speech
  • Minimum Python Version: 3.7

Core Imports

Default import (uses v1 API):

from google.cloud import speech

Version-specific imports:

from google.cloud import speech_v1      # Stable API
from google.cloud import speech_v1p1beta1  # Beta features
from google.cloud import speech_v2      # Next-generation API

Common client initialization:

from google.cloud import speech

# Initialize the speech client
client = speech.SpeechClient()

Basic Usage

from google.cloud import speech

# Initialize the client
client = speech.SpeechClient()

# Load audio file
with open("audio_file.wav", "rb") as audio_file:
    content = audio_file.read()

# Configure recognition
audio = speech.RecognitionAudio(content=content)
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
)

# Perform speech recognition
response = client.recognize(config=config, audio=audio)

# Process results
for result in response.results:
    print(f"Transcript: {result.alternatives[0].transcript}")
    print(f"Confidence: {result.alternatives[0].confidence}")
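The encoding and sample_rate_hertz in RecognitionConfig must match the actual audio, or recognition fails or returns empty results. For WAV files, those parameters can be read with the standard-library wave module instead of hard-coding them (a stdlib-only sketch; the in-memory WAV built here is just a stand-in for a real file):

```python
import io
import wave

def wav_params(data: bytes):
    """Read channel count and sample rate from in-memory WAV bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnchannels(), wav.getframerate()

# Build a tiny 16 kHz mono WAV in memory for demonstration.
buf = io.BytesIO()
with wave.open(buf, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)                  # 16-bit samples, i.e. LINEAR16
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 1600)  # 0.1 s of silence

channels, rate = wav_params(buf.getvalue())
print(channels, rate)  # these values would feed RecognitionConfig
```

The rate returned here is what you would pass as `sample_rate_hertz`; multi-channel audio additionally needs `audio_channel_count` set in the config.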

Architecture

The Google Cloud Speech API provides three main API versions:

  • v1 (Stable): Core speech recognition functionality with synchronous, asynchronous, and streaming recognition
  • v1p1beta1 (Beta): Extended v1 features with experimental capabilities
  • v2 (Next-Generation): Advanced features including recognizer management, batch processing, and enhanced output formats

Client Structure

  • SpeechClient: Primary client for speech recognition operations
  • AdaptationClient: Manages custom speech adaptation resources (phrase sets, custom classes)
  • SpeechHelpers: Simplified interfaces for complex operations such as streaming (mixed into SpeechClient)
  • AsyncClients: Asynchronous versions of all clients for non-blocking operations

Recognition Modes

  • Synchronous: Real-time recognition for short audio (< 1 minute)
  • Asynchronous: Long-running recognition for longer audio files
  • Streaming: Real-time bidirectional streaming for live audio

Capabilities

Speech Recognition

Core speech-to-text functionality supporting synchronous, asynchronous, and streaming recognition modes with extensive configuration options.

class SpeechClient:
    def recognize(
        self,
        config: RecognitionConfig,
        audio: RecognitionAudio,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> RecognizeResponse: ...
    
    def long_running_recognize(
        self,
        config: RecognitionConfig,
        audio: RecognitionAudio,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> Operation: ...


Streaming Recognition

Real-time bidirectional streaming speech recognition for live audio processing with immediate results.

class SpeechClient:
    def streaming_recognize(
        self,
        requests: Iterator[StreamingRecognizeRequest],
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> Iterator[StreamingRecognizeResponse]: ...
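The requests iterator is typically built by slicing raw audio into small chunks, each wrapped in a StreamingRecognizeRequest(audio_content=chunk). The chunking itself is plain Python (a sketch; the 4096-byte chunk size is illustrative, and the wrapping step is shown only as a comment since it needs a live client):

```python
from typing import Iterator

def audio_chunks(data: bytes, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield fixed-size slices of an audio buffer for streaming requests."""
    for offset in range(0, len(data), chunk_size):
        yield data[offset:offset + chunk_size]

# Each chunk would then be wrapped like:
#   speech.StreamingRecognizeRequest(audio_content=chunk)
# and the resulting iterator passed to client.streaming_recognize(...).

chunks = list(audio_chunks(b"\x00" * 10000))
print(len(chunks), len(chunks[-1]))  # 3 chunks; the last one is partial
```

Small chunks keep latency low in live-audio streaming; the generator form avoids holding more than one slice in flight at a time.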


Speech Adaptation

Custom speech model adaptation using phrase sets and custom word classes to improve recognition accuracy for domain-specific vocabulary.

class AdaptationClient:
    def create_phrase_set(
        self,
        request: CreatePhraseSetRequest,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> PhraseSet: ...
    
    def create_custom_class(
        self,
        request: CreateCustomClassRequest,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> CustomClass: ...
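A phrase set is essentially a list of phrase/boost pairs, where higher boosts bias recognition more strongly toward a phrase. A sketch of that payload shape using plain dicts, since building the real PhraseSet message needs the library and a project; the field names mirror the v1 adaptation API, and the boost values are illustrative:

```python
def build_phrase_set(phrases):
    """Shape (value, boost) pairs like the PhraseSet message's `phrases` field."""
    return {"phrases": [{"value": v, "boost": float(b)} for v, b in phrases]}

# Domain-specific vocabulary with per-phrase boosts.
phrase_set = build_phrase_set([("sell-side", 10.0), ("basis point", 15.0)])
print(len(phrase_set["phrases"]), phrase_set["phrases"][0]["value"])
```

In real use this data would populate a CreatePhraseSetRequest, and the created resource's name would be referenced from the recognition config's adaptation settings.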


Advanced Features (v2)

Next-generation API features including batch recognition, recognizer management, and enhanced output formatting.

class SpeechClient:  # v2
    def batch_recognize(
        self,
        request: BatchRecognizeRequest,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> Operation: ...
    
    def create_recognizer(
        self,
        request: CreateRecognizerRequest,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> Operation: ...


Async Clients

Asynchronous client interfaces for all API versions, enabling non-blocking speech recognition operations in async Python applications.

class SpeechAsyncClient:
    async def recognize(
        self,
        config: RecognitionConfig,
        audio: RecognitionAudio,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> RecognizeResponse: ...
    
    async def long_running_recognize(
        self,
        config: RecognitionConfig,
        audio: RecognitionAudio,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> Operation: ...

class AdaptationAsyncClient:
    async def create_phrase_set(
        self,
        request: CreatePhraseSetRequest,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> PhraseSet: ...

Types and Configuration

Core data types, configuration objects, and enums for speech recognition setup and result processing.

class RecognitionConfig:
    encoding: AudioEncoding
    sample_rate_hertz: int
    language_code: str
    enable_automatic_punctuation: bool
    enable_speaker_diarization: bool
    diarization_config: SpeakerDiarizationConfig
    speech_contexts: Sequence[SpeechContext]
    
class RecognitionAudio:
    content: bytes
    uri: str


Common Patterns

Error Handling

from google.api_core import exceptions
from google.cloud import speech

client = speech.SpeechClient()

try:
    response = client.recognize(config=config, audio=audio)
except exceptions.InvalidArgument as e:
    print(f"Invalid request: {e}")
except exceptions.DeadlineExceeded as e:
    print(f"Request timeout: {e}")
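Transient failures such as DeadlineExceeded or ServiceUnavailable are usually worth retrying with exponential backoff; google.api_core ships a Retry helper for this, but the pattern itself is plain Python. A stdlib-only sketch, with a stand-in flaky function in place of client.recognize:

```python
import time

def retry_with_backoff(call, retries=3, base_delay=0.01):
    """Retry `call` on exception, doubling the delay before each attempt."""
    for attempt in range(retries):
        try:
            return call()
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * (2 ** attempt))

# Stand-in for client.recognize(...): fails twice, then succeeds.
attempts = {"n": 0}
def flaky_recognize():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("transient")
    return "transcript"

print(retry_with_backoff(flaky_recognize))  # succeeds on the third attempt
```

In production, restrict the `except` clause to the retryable exception types rather than catching everything, so that InvalidArgument and other permanent errors fail fast.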

Long-Running Operations

from google.cloud import speech

client = speech.SpeechClient()

# Start long-running operation
operation = client.long_running_recognize(config=config, audio=audio)

# Wait for completion
response = operation.result(timeout=300)
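operation.result() blocks until completion. When you would rather poll, the contract is the done()/result() pair, which google.api_core's Operation exposes. A sketch of that polling loop against a stand-in operation object (the FakeOperation class and interval values are illustrative):

```python
import time

class FakeOperation:
    """Stand-in with the done()/result() contract of a long-running operation."""
    def __init__(self, ticks_until_done: int):
        self._ticks = ticks_until_done

    def done(self) -> bool:
        self._ticks -= 1
        return self._ticks <= 0

    def result(self):
        return "final response"

def poll(operation, interval_s: float = 0.01, max_polls: int = 100):
    """Poll an operation until done, or raise after max_polls attempts."""
    for _ in range(max_polls):
        if operation.done():
            return operation.result()
        time.sleep(interval_s)
    raise TimeoutError("operation did not finish in time")

print(poll(FakeOperation(ticks_until_done=3)))
```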

Async Client Usage

import asyncio
from google.cloud import speech

async def async_speech_recognition():
    # Initialize async client
    client = speech.SpeechAsyncClient()
    
    # Load audio file
    with open("audio_file.wav", "rb") as audio_file:
        audio_content = audio_file.read()
    
    # Configure recognition
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    )
    audio = speech.RecognitionAudio(content=audio_content)
    
    # Perform async recognition
    response = await client.recognize(config=config, audio=audio)
    
    # Process results
    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")
    
    # Close the client
    await client.transport.close()

# Run async function
asyncio.run(async_speech_recognition())
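The async client pays off when transcribing several files concurrently: the fan-out itself is just asyncio.gather. A stdlib-only sketch with a stand-in coroutine in place of `await client.recognize(...)`:

```python
import asyncio

async def fake_recognize(name: str) -> str:
    """Stand-in for `await client.recognize(...)` on one audio file."""
    await asyncio.sleep(0.01)  # simulates network latency
    return f"transcript of {name}"

async def transcribe_all(names):
    # Fan out one recognition coroutine per file and await them together;
    # gather preserves input order in its results.
    return await asyncio.gather(*(fake_recognize(n) for n in names))

results = asyncio.run(transcribe_all(["a.wav", "b.wav", "c.wav"]))
print(results)
```

With the real client, the coroutines would all share one SpeechAsyncClient instance, and a semaphore can cap in-flight requests to stay under quota.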