tessl/pypi-google-cloud-speech

Google Cloud Speech API client library for speech-to-text conversion with support for real-time streaming, batch processing, and advanced speech recognition models

Describes: pkg:pypi/google-cloud-speech@2.33.x

To install, run

npx @tessl/cli install tessl/pypi-google-cloud-speech@2.33.0


Google Cloud Speech

Google Cloud Speech API client library providing advanced speech-to-text conversion capabilities. This package offers real-time streaming recognition, batch processing, and custom speech adaptation, serving as Python's interface to Google's industry-leading speech recognition technology.

Package Information

  • Package Name: google-cloud-speech
  • Language: Python
  • Installation: pip install google-cloud-speech
  • Minimum Python Version: 3.7

Core Imports

Default import (uses v1 API):

from google.cloud import speech

Version-specific imports:

from google.cloud import speech_v1      # Stable API
from google.cloud import speech_v1p1beta1  # Beta features
from google.cloud import speech_v2      # Next-generation API

Common client initialization:

from google.cloud import speech

# Initialize the speech client
client = speech.SpeechClient()

Basic Usage

from google.cloud import speech

# Initialize the client
client = speech.SpeechClient()

# Load audio file
with open("audio_file.wav", "rb") as audio_file:
    content = audio_file.read()

# Configure recognition
audio = speech.RecognitionAudio(content=content)
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
)

# Perform speech recognition
response = client.recognize(config=config, audio=audio)

# Process results
for result in response.results:
    print(f"Transcript: {result.alternatives[0].transcript}")
    print(f"Confidence: {result.alternatives[0].confidence}")
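The encoding and sample_rate_hertz in RecognitionConfig must match the actual audio, or recognition fails or returns empty results. For WAV files, those parameters can be read with the standard-library wave module instead of hard-coding them (a stdlib-only sketch; the in-memory WAV built here is just a stand-in for a real file):

```python
import io
import wave

def wav_params(data: bytes):
    """Read channel count and sample rate from in-memory WAV bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnchannels(), wav.getframerate()

# Build a tiny 16 kHz mono WAV in memory for demonstration.
buf = io.BytesIO()
with wave.open(buf, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)                  # 16-bit samples, i.e. LINEAR16
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 1600)  # 0.1 s of silence

channels, rate = wav_params(buf.getvalue())
print(channels, rate)  # these values would feed RecognitionConfig
```

The rate returned here is what you would pass as `sample_rate_hertz`; multi-channel audio additionally needs `audio_channel_count` set in the config.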

Architecture

The Google Cloud Speech API provides three main API versions:

  • v1 (Stable): Core speech recognition functionality with synchronous, asynchronous, and streaming recognition
  • v1p1beta1 (Beta): Extended v1 features with experimental capabilities
  • v2 (Next-Generation): Advanced features including recognizer management, batch processing, and enhanced output formats

Client Structure

  • SpeechClient: Primary client for speech recognition operations
  • AdaptationClient: Manages custom speech adaptation resources (phrase sets, custom classes)
  • SpeechHelpers: Simplified interfaces for complex operations such as streaming (mixed into SpeechClient)
  • AsyncClients: Asynchronous versions of all clients for non-blocking operations

Recognition Modes

  • Synchronous: Real-time recognition for short audio (< 1 minute)
  • Asynchronous: Long-running recognition for longer audio files
  • Streaming: Real-time bidirectional streaming for live audio

Capabilities

Speech Recognition

Core speech-to-text functionality supporting synchronous, asynchronous, and streaming recognition modes with extensive configuration options.

class SpeechClient:
    def recognize(
        self,
        config: RecognitionConfig,
        audio: RecognitionAudio,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> RecognizeResponse: ...
    
    def long_running_recognize(
        self,
        config: RecognitionConfig,
        audio: RecognitionAudio,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> Operation: ...


Streaming Recognition

Real-time bidirectional streaming speech recognition for live audio processing with immediate results.

class SpeechClient:
    def streaming_recognize(
        self,
        requests: Iterator[StreamingRecognizeRequest],
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> Iterator[StreamingRecognizeResponse]: ...
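The requests iterator is typically built by slicing raw audio into small chunks, each wrapped in a StreamingRecognizeRequest(audio_content=chunk). The chunking itself is plain Python (a sketch; the 4096-byte chunk size is illustrative, and the wrapping step is shown only as a comment since it needs a live client):

```python
from typing import Iterator

def audio_chunks(data: bytes, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield fixed-size slices of an audio buffer for streaming requests."""
    for offset in range(0, len(data), chunk_size):
        yield data[offset:offset + chunk_size]

# Each chunk would then be wrapped like:
#   speech.StreamingRecognizeRequest(audio_content=chunk)
# and the resulting iterator passed to client.streaming_recognize(...).

chunks = list(audio_chunks(b"\x00" * 10000))
print(len(chunks), len(chunks[-1]))  # 3 chunks; the last one is partial
```

Small chunks keep latency low in live-audio streaming; the generator form avoids holding more than one slice in flight at a time.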


Speech Adaptation

Custom speech model adaptation using phrase sets and custom word classes to improve recognition accuracy for domain-specific vocabulary.

class AdaptationClient:
    def create_phrase_set(
        self,
        request: CreatePhraseSetRequest,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> PhraseSet: ...
    
    def create_custom_class(
        self,
        request: CreateCustomClassRequest,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> CustomClass: ...
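A phrase set is essentially a list of phrase/boost pairs, where higher boosts bias recognition more strongly toward a phrase. A sketch of that payload shape using plain dicts, since building the real PhraseSet message needs the library and a project; the field names mirror the v1 adaptation API, and the boost values are illustrative:

```python
def build_phrase_set(phrases):
    """Shape (value, boost) pairs like the PhraseSet message's `phrases` field."""
    return {"phrases": [{"value": v, "boost": float(b)} for v, b in phrases]}

# Domain-specific vocabulary with per-phrase boosts.
phrase_set = build_phrase_set([("sell-side", 10.0), ("basis point", 15.0)])
print(len(phrase_set["phrases"]), phrase_set["phrases"][0]["value"])
```

In real use this data would populate a CreatePhraseSetRequest, and the created resource's name would be referenced from the recognition config's adaptation settings.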


Advanced Features (v2)

Next-generation API features including batch recognition, recognizer management, and enhanced output formatting.

class SpeechClient:  # v2
    def batch_recognize(
        self,
        request: BatchRecognizeRequest,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> Operation: ...
    
    def create_recognizer(
        self,
        request: CreateRecognizerRequest,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> Operation: ...


Async Clients

Asynchronous client interfaces for all API versions, enabling non-blocking speech recognition operations in async Python applications.

class SpeechAsyncClient:
    async def recognize(
        self,
        config: RecognitionConfig,
        audio: RecognitionAudio,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> RecognizeResponse: ...
    
    async def long_running_recognize(
        self,
        config: RecognitionConfig,
        audio: RecognitionAudio,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> Operation: ...

class AdaptationAsyncClient:
    async def create_phrase_set(
        self,
        request: CreatePhraseSetRequest,
        *,
        retry: OptionalRetry = None,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = ()
    ) -> PhraseSet: ...

Types and Configuration

Core data types, configuration objects, and enums for speech recognition setup and result processing.

class RecognitionConfig:
    encoding: AudioEncoding
    sample_rate_hertz: int
    language_code: str
    enable_automatic_punctuation: bool
    enable_speaker_diarization: bool
    diarization_config: SpeakerDiarizationConfig
    speech_contexts: Sequence[SpeechContext]
    
class RecognitionAudio:
    content: bytes
    uri: str


Common Patterns

Error Handling

from google.api_core import exceptions
from google.cloud import speech

client = speech.SpeechClient()

try:
    response = client.recognize(config=config, audio=audio)
except exceptions.InvalidArgument as e:
    print(f"Invalid request: {e}")
except exceptions.DeadlineExceeded as e:
    print(f"Request timeout: {e}")
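Transient failures such as DeadlineExceeded or ServiceUnavailable are usually worth retrying with exponential backoff; google.api_core ships a Retry helper for this, but the pattern itself is plain Python. A stdlib-only sketch, with a stand-in flaky function in place of client.recognize:

```python
import time

def retry_with_backoff(call, retries=3, base_delay=0.01):
    """Retry `call` on exception, doubling the delay before each attempt."""
    for attempt in range(retries):
        try:
            return call()
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * (2 ** attempt))

# Stand-in for client.recognize(...): fails twice, then succeeds.
attempts = {"n": 0}
def flaky_recognize():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("transient")
    return "transcript"

print(retry_with_backoff(flaky_recognize))  # succeeds on the third attempt
```

In production, restrict the `except` clause to the retryable exception types rather than catching everything, so that InvalidArgument and other permanent errors fail fast.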

Long-Running Operations

from google.cloud import speech

client = speech.SpeechClient()

# Start long-running operation
operation = client.long_running_recognize(config=config, audio=audio)

# Wait for completion
response = operation.result(timeout=300)
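operation.result() blocks until completion. When you would rather poll, the contract is the done()/result() pair, which google.api_core's Operation exposes. A sketch of that polling loop against a stand-in operation object (the FakeOperation class and interval values are illustrative):

```python
import time

class FakeOperation:
    """Stand-in with the done()/result() contract of a long-running operation."""
    def __init__(self, ticks_until_done: int):
        self._ticks = ticks_until_done

    def done(self) -> bool:
        self._ticks -= 1
        return self._ticks <= 0

    def result(self):
        return "final response"

def poll(operation, interval_s: float = 0.01, max_polls: int = 100):
    """Poll an operation until done, or raise after max_polls attempts."""
    for _ in range(max_polls):
        if operation.done():
            return operation.result()
        time.sleep(interval_s)
    raise TimeoutError("operation did not finish in time")

print(poll(FakeOperation(ticks_until_done=3)))
```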

Async Client Usage

import asyncio
from google.cloud import speech

async def async_speech_recognition():
    # Initialize async client
    client = speech.SpeechAsyncClient()
    
    # Load audio file
    with open("audio_file.wav", "rb") as audio_file:
        audio_content = audio_file.read()
    
    # Configure recognition
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    )
    audio = speech.RecognitionAudio(content=audio_content)
    
    # Perform async recognition
    response = await client.recognize(config=config, audio=audio)
    
    # Process results
    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")
    
    # Close the client
    await client.transport.close()

# Run async function
asyncio.run(async_speech_recognition())
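The async client pays off when transcribing several files concurrently: the fan-out itself is just asyncio.gather. A stdlib-only sketch with a stand-in coroutine in place of `await client.recognize(...)`:

```python
import asyncio

async def fake_recognize(name: str) -> str:
    """Stand-in for `await client.recognize(...)` on one audio file."""
    await asyncio.sleep(0.01)  # simulates network latency
    return f"transcript of {name}"

async def transcribe_all(names):
    # Fan out one recognition coroutine per file and await them together;
    # gather preserves input order in its results.
    return await asyncio.gather(*(fake_recognize(n) for n in names))

results = asyncio.run(transcribe_all(["a.wav", "b.wav", "c.wav"]))
print(results)
```

With the real client, the coroutines would all share one SpeechAsyncClient instance, and a semaphore can cap in-flight requests to stay under quota.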