Tessl Tile for pypi/faster-whisper@1.2.0

tessl/pypi-faster-whisper

Faster Whisper transcription with CTranslate2 for high-performance speech recognition

Workspace: tessl
Visibility: Public
Created: 3 months ago
Last updated: 3 months ago
Describes: pkg:pypi/faster-whisper@1.2.x

To install, run

npx @tessl/cli install tessl/pypi-faster-whisper@1.2.0

# Faster Whisper

A high-performance reimplementation of OpenAI's Whisper automatic speech recognition model that uses CTranslate2 for fast inference. Faster Whisper transcribes up to 4x faster than the original openai/whisper implementation at the same accuracy while using less memory. It also supports reduced-precision compute types such as FP16 (GPU) and INT8 (CPU and GPU).

## Package Information

- **Package Name**: faster-whisper
- **Language**: Python
- **Installation**: `pip install faster-whisper`
- **Requirements**: Python 3.9+

## Core Imports

```python
from faster_whisper import WhisperModel
```

Common additional imports:

```python
from faster_whisper import (
    WhisperModel,
    BatchedInferencePipeline,
    decode_audio,
    available_models,
    download_model,
    format_timestamp
)
```

## Basic Usage

```python
from faster_whisper import WhisperModel

# Initialize model (INT8 quantization on CPU)
model = WhisperModel("base", device="cpu", compute_type="int8")

# Transcribe audio file; `segments` is a lazy generator, so decoding
# only runs as you iterate over it below
segments, info = model.transcribe("audio.mp3", beam_size=5)

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

# Process transcription segments
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```

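Because `segments` is consumed lazily, code that needs the whole transcript usually materializes it first. The sketch below relies only on the `transcribe` call and `Segment.text` field documented on this page; `transcribe_file` and `join_segment_text` are hypothetical helper names, and `join_segment_text` works on any objects with a `text` attribute, so it runs without the package installed.

```python
from types import SimpleNamespace  # used only by the demo at the bottom


def join_segment_text(segments):
    """Concatenate segment texts into one transcript string."""
    # Segment text often carries a leading space, so strip each piece first
    # and skip pieces that are empty after stripping.
    return " ".join(seg.text.strip() for seg in segments if seg.text.strip())


def transcribe_file(path, model_size="base"):
    """Transcribe `path` and return (language, transcript)."""
    # Imported lazily so join_segment_text works without faster-whisper.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, info = model.transcribe(path, beam_size=5)
    # list() drains the generator, i.e. runs the full transcription now.
    segments = list(segments)
    return info.language, join_segment_text(segments)


if __name__ == "__main__":
    demo = [SimpleNamespace(text=" Hello world."), SimpleNamespace(text=" Second segment.")]
    print(join_segment_text(demo))  # Hello world. Second segment.
```
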
## Architecture

The library is built around several key components:

- **WhisperModel**: Main interface for speech recognition, providing transcription and language detection
- **BatchedInferencePipeline**: Wraps a `WhisperModel` and decodes chunks of an audio input in batches for higher throughput
- **Audio Processing**: PyAV-based audio decoding with automatic format conversion and resampling
- **VAD Integration**: Silero VAD for automatic voice activity detection and silence filtering
- **CTranslate2 Backend**: Optimized inference engine with support for multiple compute types and devices

Together, these components let you tune speed, memory use, and accuracy per deployment, for example by choosing the device and compute type and by enabling VAD filtering or batching.

## Capabilities

### Core Speech Recognition

Primary speech recognition functionality: transcription, language detection, and model management (listing and downloading models). These are the main operations for converting audio to text. `transcribe` returns a `(segments, info)` pair, where `segments` is a lazy generator of `Segment` objects and `info` is a `TranscriptionInfo`.

```python { .api }
class WhisperModel:
    def __init__(self, model_size_or_path, device="auto", compute_type="default", **kwargs): ...
    def transcribe(self, audio, language=None, task="transcribe", **kwargs): ...
    def detect_language(self, audio=None, features=None, **kwargs): ...

def available_models(): ...
def download_model(size_or_id, output_dir=None, **kwargs): ...
```

[Core Speech Recognition](./core-speech-recognition.md)

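A sketch of language identification. It assumes that `WhisperModel.detect_language(audio=...)` accepts a 16 kHz float32 waveform (as produced by `decode_audio`) and returns `(language, probability, all_language_probs)`, which is our understanding of the 1.2.x signature. `detect_file_language` and `top_languages` are hypothetical helper names; only `top_languages` runs without the package installed.

```python
def top_languages(all_language_probs, k=3, min_prob=0.0):
    """Return up to k (code, prob) pairs, most likely first, with prob >= min_prob."""
    ranked = sorted(all_language_probs, key=lambda pair: pair[1], reverse=True)
    return [(code, prob) for code, prob in ranked if prob >= min_prob][:k]


def detect_file_language(path, model_size="base"):
    """Detect the spoken language of `path` (assumed 1.2.x API)."""
    # Imported lazily so top_languages works without faster-whisper.
    from faster_whisper import WhisperModel, decode_audio

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    audio = decode_audio(path, sampling_rate=16000)
    language, probability, all_probs = model.detect_language(audio=audio)
    return language, probability, top_languages(all_probs)
```
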
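### Batched Processing

High-throughput batched decoding. `BatchedInferencePipeline` wraps a `WhisperModel`, by default splits an audio input into VAD-detected speech chunks, and decodes those chunks in batches. Its `transcribe` method is a drop-in replacement for `WhisperModel.transcribe` that additionally accepts `batch_size`.

```python { .api }
class BatchedInferencePipeline:
    def __init__(self, model): ...
    def transcribe(self, audio, **kwargs): ...  # WhisperModel.transcribe options plus batch_size
    def forward(self, features, tokenizer, chunks_metadata, options): ...
```

[Batched Processing](./batched-processing.md)
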
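A sketch of the batched workflow, assuming `BatchedInferencePipeline(model=...)` and its `transcribe(..., batch_size=...)` method as described in the faster-whisper README. `make_batches` is a hypothetical helper that only illustrates the idea of grouping chunks into fixed-size batches; it is not the pipeline's internal code. `batched_transcribe` is likewise a hypothetical wrapper.

```python
def make_batches(chunks, batch_size):
    """Group a sequence of chunks into lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [chunks[i:i + batch_size] for i in range(0, len(chunks), batch_size)]


def batched_transcribe(path, batch_size=16, model_size="base"):
    """Transcribe one file with batched decoding; returns (segments, info)."""
    # Lazy import keeps make_batches usable without faster-whisper installed.
    from faster_whisper import BatchedInferencePipeline, WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    pipeline = BatchedInferencePipeline(model=model)
    # Larger batches typically raise throughput at the cost of more memory.
    segments, info = pipeline.transcribe(path, batch_size=batch_size)
    return list(segments), info
```
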
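### Audio Processing

Audio decoding, format conversion, and preprocessing utilities for preparing audio data for transcription. `decode_audio` returns a mono float32 NumPy array resampled to `sampling_rate`, or a pair of left/right channel arrays when `split_stereo=True`. `pad_or_trim`, available from `faster_whisper.audio`, zero-pads or truncates an array to `length` entries along `axis`; the default of 3000 matches Whisper's 30-second window of mel frames.

```python { .api }
def decode_audio(input_file, sampling_rate=16000, split_stereo=False): ...
def pad_or_trim(array, length=3000, *, axis=-1): ...
```

[Audio Processing](./audio-processing.md)
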
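To make the `pad_or_trim` contract concrete, here is a NumPy sketch of the same behavior: zero-pad at the end of `axis`, or keep only the first `length` entries. It is an illustration, not the library's source; `pad_or_trim_sketch` is a hypothetical name, and the 160-sample hop length used to relate 3000 frames to 30 seconds is an assumption about Whisper's feature extractor.

```python
import numpy as np

SAMPLING_RATE = 16000  # Hz, the rate decode_audio resamples to by default
HOP_LENGTH = 160       # samples between mel frames (assumed Whisper setting)
FRAMES_PER_SECOND = SAMPLING_RATE // HOP_LENGTH  # 100 frames per second


def pad_or_trim_sketch(array, length=3000, *, axis=-1):
    """Zero-pad or truncate `array` to `length` entries along `axis`."""
    size = array.shape[axis]
    if size > length:
        # Keep only the first `length` entries along the chosen axis.
        return np.take(array, np.arange(length), axis=axis)
    if size < length:
        widths = [(0, 0)] * array.ndim
        widths[axis] = (0, length - size)  # pad only at the end of `axis`
        return np.pad(array, widths)  # constant mode: pads with zeros
    return array


# 3000 frames at 100 frames per second span Whisper's 30-second window.
print(3000 / FRAMES_PER_SECOND)  # 30.0
```
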
### Voice Activity Detection

Voice activity detection using Silero VAD for automatic silence detection and audio segmentation. `VadOptions` and `get_speech_timestamps` live in `faster_whisper.vad`; `get_speech_timestamps` returns a list of `{"start": ..., "end": ...}` dicts giving speech chunk boundaries as sample offsets. To filter silence during transcription, pass `vad_filter=True` (optionally with `vad_parameters`) to `transcribe`.

```python { .api }
@dataclass
class VadOptions:
    threshold: float = 0.5
    min_speech_duration_ms: int = 0
    max_speech_duration_s: float = float("inf")
    min_silence_duration_ms: int = 2000
    speech_pad_ms: int = 400

def get_speech_timestamps(audio, vad_options=None, sampling_rate=16000, **kwargs): ...
```

[Voice Activity Detection](./voice-activity-detection.md)

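The sketch below converts `get_speech_timestamps` output from sample offsets to seconds and shows the two usual ways to apply VAD: standalone, and through `transcribe(vad_filter=True, vad_parameters=...)`. It assumes the `{"start", "end"}` sample-offset output format described above; `chunks_to_seconds`, `speech_seconds`, and `vad_demo` are hypothetical helpers, and only the first two run without faster-whisper installed.

```python
def chunks_to_seconds(chunks, sampling_rate=16000):
    """Convert [{'start': s, 'end': e}, ...] sample offsets to (start_s, end_s)."""
    return [(c["start"] / sampling_rate, c["end"] / sampling_rate) for c in chunks]


def speech_seconds(chunks, sampling_rate=16000):
    """Total duration of detected speech, in seconds."""
    return sum(c["end"] - c["start"] for c in chunks) / sampling_rate


def vad_demo(path):
    # Lazy imports: the helpers above do not need faster-whisper.
    from faster_whisper import WhisperModel, decode_audio
    from faster_whisper.vad import VadOptions, get_speech_timestamps

    # Standalone VAD on a 16 kHz waveform.
    audio = decode_audio(path, sampling_rate=16000)
    chunks = get_speech_timestamps(audio, VadOptions(min_silence_duration_ms=500))

    # The same kind of filtering applied inside transcription.
    model = WhisperModel("base", device="cpu", compute_type="int8")
    segments, info = model.transcribe(
        path, vad_filter=True, vad_parameters=dict(min_silence_duration_ms=500)
    )
    return chunks_to_seconds(chunks), list(segments), info.duration_after_vad
```
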
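### Utilities

Helper functions for timestamp formatting, logging, and locating bundled assets, defined in `faster_whisper.utils` (`format_timestamp` is also exported from the top-level package). `format_timestamp` renders seconds as `[HH:]MM:SS.mmm`, `get_logger` returns the library's logger, and `get_assets_path` returns the directory of bundled assets such as the Silero VAD model.

```python { .api }
def format_timestamp(seconds, always_include_hours=False, decimal_marker="."): ...
def get_logger(): ...
def get_assets_path(): ...
```

[Utilities](./utilities.md)
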
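Calling `format_timestamp` with `always_include_hours=True` and `decimal_marker=","` yields SRT-style timestamps. To keep the sketch runnable without the package, `format_timestamp_local` re-implements the documented signature as we understand its behavior; in real code you would import `format_timestamp` from `faster_whisper` instead. `segments_to_srt` is a hypothetical helper that accepts any objects with `start`, `end`, and `text` attributes, such as `Segment`.

```python
from types import SimpleNamespace  # used only by the demo at the bottom


def format_timestamp_local(seconds, always_include_hours=False, decimal_marker="."):
    """Format seconds as [HH:]MM:SS<marker>mmm (stand-in for format_timestamp)."""
    if seconds < 0:
        raise ValueError("non-negative timestamp expected")
    milliseconds = round(seconds * 1000.0)
    hours, milliseconds = divmod(milliseconds, 3_600_000)
    minutes, milliseconds = divmod(milliseconds, 60_000)
    secs, milliseconds = divmod(milliseconds, 1_000)
    # Hours are shown only when requested or when non-zero.
    hours_marker = f"{hours:02d}:" if always_include_hours or hours > 0 else ""
    return f"{hours_marker}{minutes:02d}:{secs:02d}{decimal_marker}{milliseconds:03d}"


def segments_to_srt(segments):
    """Render segments as an SRT subtitle string (1-based cue numbers)."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp_local(seg.start, always_include_hours=True, decimal_marker=",")
        end = format_timestamp_local(seg.end, always_include_hours=True, decimal_marker=",")
        blocks.append(f"{index}\n{start} --> {end}\n{seg.text.strip()}\n")
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(segments_to_srt([SimpleNamespace(start=0.0, end=1.25, text=" Hello")]))
```
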
## Core Types

```python { .api }
@dataclass
class Word:
    start: float
    end: float
    word: str
    probability: float

@dataclass
class Segment:
    id: int
    seek: int
    start: float
    end: float
    text: str
    tokens: list[int]
    avg_logprob: float
    compression_ratio: float
    no_speech_prob: float
    words: list[Word] | None
    temperature: float | None

@dataclass
class TranscriptionInfo:
    language: str
    language_probability: float
    duration: float
    duration_after_vad: float
    all_language_probs: list[tuple[str, float]] | None
    transcription_options: TranscriptionOptions
    vad_options: VadOptions

@dataclass
class TranscriptionOptions:
    beam_size: int
    best_of: int
    patience: float
    length_penalty: float
    repetition_penalty: float
    no_repeat_ngram_size: int
    log_prob_threshold: float | None
    no_speech_threshold: float | None
    compression_ratio_threshold: float | None
    condition_on_previous_text: bool
    prompt_reset_on_temperature: float
    temperatures: list[float]
    initial_prompt: str | list[int] | None
    prefix: str | None
    suppress_blank: bool
    suppress_tokens: list[int] | None
    without_timestamps: bool
    max_initial_timestamp: float
    word_timestamps: bool
    prepend_punctuations: str
    append_punctuations: str
    multilingual: bool
    max_new_tokens: int | None
    clip_timestamps: str | list[float]
    hallucination_silence_threshold: float | None
    hotwords: str | None
```
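
The `avg_logprob` and `no_speech_prob` fields of `Segment` support post-hoc filtering. The sketch below re-applies the no-speech rule that, to our understanding, Whisper-style decoding uses with its default thresholds (`no_speech_threshold=0.6`, `log_prob_threshold=-1.0`): a segment counts as silence when its no-speech probability is high and its average log-probability is not above the threshold. Re-applying it lets you tighten the rule after transcription. `SegmentLite`, `looks_like_silence`, and `drop_silent_segments` are hypothetical names; real `Segment` objects expose the same attribute names.

```python
from dataclasses import dataclass


@dataclass
class SegmentLite:
    """Hypothetical stand-in holding a subset of Segment's fields."""
    start: float
    end: float
    text: str
    avg_logprob: float
    no_speech_prob: float


def looks_like_silence(seg, no_speech_threshold=0.6, log_prob_threshold=-1.0):
    # Silence if the no-speech probability is high AND the decoder was not
    # confident in its text (avg_logprob at or below the threshold).
    return seg.no_speech_prob > no_speech_threshold and seg.avg_logprob <= log_prob_threshold


def drop_silent_segments(segments, **thresholds):
    """Keep only segments that do not look like silence under `thresholds`."""
    return [seg for seg in segments if not looks_like_silence(seg, **thresholds)]
```
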