tessl/pypi-faster-whisper

Faster Whisper transcription with CTranslate2 for high-performance speech recognition

- **Workspace**: tessl
- **Visibility**: Public
- **Describes**: pypipkg:pypi/faster-whisper@1.2.x

To install, run

npx @tessl/cli install tessl/pypi-faster-whisper@1.2.0

# Faster Whisper

Faster Whisper is a high-performance reimplementation of OpenAI's Whisper automatic speech recognition model that uses CTranslate2 for fast inference. It transcribes up to 4x faster than the original openai/whisper implementation at the same accuracy while using less memory, and it supports several precision levels (FP16, INT8) for CPU and GPU execution.

## Package Information

- **Package Name**: faster-whisper
- **Language**: Python
- **Installation**: `pip install faster-whisper`
- **Requirements**: Python 3.9+

## Core Imports

```python
from faster_whisper import WhisperModel
```

Common additional imports:

```python
from faster_whisper import (
    WhisperModel,
    BatchedInferencePipeline,
    decode_audio,
    available_models,
    download_model,
    format_timestamp,
)
```

## Basic Usage

```python
from faster_whisper import WhisperModel

# Initialize model
model = WhisperModel("base", device="cpu", compute_type="int8")

# Transcribe audio file
segments, info = model.transcribe("audio.mp3", beam_size=5)

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

# Process transcription segments
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```

## Architecture

The library is built around several key components:

- **WhisperModel**: Main interface for speech recognition providing transcription and language detection
- **BatchedInferencePipeline**: Batched processing for improved throughput on multiple audio files
- **Audio Processing**: PyAV-based audio decoding with automatic format conversion and resampling
- **VAD Integration**: Silero VAD for automatic voice activity detection and silence filtering
- **CTranslate2 Backend**: Optimized inference engine with support for multiple compute types and devices

This design enables efficient speech-to-text processing with extensive customization options for different deployment scenarios.
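
As one illustration of these deployment trade-offs, the sketch below picks a `compute_type` from the target device. The helper name `pick_compute_type` and its heuristic are assumptions made for this example, not part of the library; the strings it returns are CTranslate2 compute types that can be passed to `WhisperModel`.

```python
def pick_compute_type(device: str, quantize: bool = True) -> str:
    """Hypothetical helper: map a device name to a CTranslate2 compute type."""
    if device == "cuda":
        # On GPU, FP16 is the usual baseline; INT8 weights with FP16 compute save memory.
        return "int8_float16" if quantize else "float16"
    # On CPU, INT8 quantization is the common low-memory choice.
    return "int8" if quantize else "float32"


if __name__ == "__main__":
    device = "cpu"
    print(device, pick_compute_type(device))
    # The result would then be passed on as, for example:
    # WhisperModel("base", device=device, compute_type=pick_compute_type(device))
```

Keeping the choice in one small function makes it easy to override per deployment without touching the transcription code.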

## Capabilities

### Core Speech Recognition

Primary speech recognition functionality including transcription, language detection, and model management. These are the main operations for converting audio to text.

```python { .api }
class WhisperModel:
    def __init__(self, model_size_or_path, device="auto", compute_type="default", **kwargs): ...
    def transcribe(self, audio, language=None, task="transcribe", **kwargs): ...
    def detect_language(self, audio=None, features=None, **kwargs): ...

def available_models(): ...
def download_model(size_or_id, output_dir=None, **kwargs): ...
```

[Core Speech Recognition](./core-speech-recognition.md)
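
In the upstream faster-whisper README, `transcribe` returns `segments` as a generator, so decoding only runs as the segments are iterated. A minimal sketch of consuming it once and keeping the text follows. `collect_text` is a hypothetical helper, and the stub segments stand in for real model output so that the snippet runs without downloading a model.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_text(segments: Iterable) -> str:
    """Hypothetical helper: exhaust the segment iterable and join the text."""
    return " ".join(segment.text.strip() for segment in segments)


def fake_segments():
    # Stand-in for the generator returned by WhisperModel.transcribe.
    yield SimpleNamespace(start=0.0, end=1.2, text=" Hello there.")
    yield SimpleNamespace(start=1.2, end=2.5, text=" General remarks follow.")


if __name__ == "__main__":
    print(collect_text(fake_segments()))
```

With a real model, the same helper would be called as `collect_text(segments)` after `segments, info = model.transcribe("audio.mp3")`. Because a generator can be consumed only once, wrap it in `list(...)` first if the segments are needed again.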

### Batched Processing

High-throughput batched inference for processing multiple audio files or chunks efficiently.

```python { .api }
class BatchedInferencePipeline:
    def __init__(self, model): ...
    def forward(self, features, tokenizer, chunks_metadata, options): ...
```

[Batched Processing](./batched-processing.md)
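
A hedged sketch of running several files through one pipeline. It assumes the pipeline exposes `transcribe(audio, batch_size=...)`, as used in the upstream README; the signature list in this section only names `forward`. `transcribe_many` and `StubPipeline` are hypothetical and exist only so the example runs without a model.

```python
from types import SimpleNamespace


def transcribe_many(pipeline, paths, batch_size=16):
    """Hypothetical helper: transcribe each path and return {path: (language, text)}."""
    results = {}
    for path in paths:
        segments, info = pipeline.transcribe(path, batch_size=batch_size)
        text = " ".join(segment.text.strip() for segment in segments)
        results[path] = (info.language, text)
    return results


class StubPipeline:
    """Stand-in for BatchedInferencePipeline so the sketch runs without a model."""

    def transcribe(self, audio, batch_size=16):
        segments = iter([SimpleNamespace(text=f" transcript of {audio}")])
        info = SimpleNamespace(language="en")
        return segments, info


if __name__ == "__main__":
    print(transcribe_many(StubPipeline(), ["a.wav", "b.wav"]))
```

With the real library, the stub would be replaced by `BatchedInferencePipeline(model=WhisperModel("base"))`. The batch size of 16 is only a starting point and would normally be tuned to the available memory.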

### Audio Processing

Audio decoding, format conversion, and preprocessing utilities for preparing audio data for transcription.

```python { .api }
def decode_audio(input_file, sampling_rate=16000, split_stereo=False): ...
def pad_or_trim(array, length=3000, *, axis=-1): ...
```

[Audio Processing](./audio-processing.md)

### Voice Activity Detection

Voice activity detection functionality using Silero VAD for automatic silence detection and audio segmentation.

```python { .api }
@dataclass
class VadOptions:
    threshold: float = 0.5
    min_speech_duration_ms: int = 0
    max_speech_duration_s: float = float("inf")
    min_silence_duration_ms: int = 2000
    speech_pad_ms: int = 400

def get_speech_timestamps(audio, vad_options=None, sampling_rate=16000, **kwargs): ...
```

[Voice Activity Detection](./voice-activity-detection.md)
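
To make the output of `get_speech_timestamps` concrete, the sketch below totals the detected speech. It assumes each entry is a dict with `start` and `end` offsets measured in samples, which is how the upstream implementation reports them. `speech_seconds` is a hypothetical helper and the sample data is made up.

```python
def speech_seconds(timestamps, sampling_rate=16000):
    """Hypothetical helper: total speech duration in seconds from sample offsets."""
    total_samples = sum(chunk["end"] - chunk["start"] for chunk in timestamps)
    return total_samples / sampling_rate


if __name__ == "__main__":
    # Made-up VAD output: one second of speech, a gap, then half a second.
    example = [
        {"start": 0, "end": 16000},
        {"start": 32000, "end": 40000},
    ]
    print(speech_seconds(example))
```

When VAD is applied during transcription rather than called directly, the upstream README passes the options through `transcribe`, for example `model.transcribe("audio.mp3", vad_filter=True, vad_parameters=dict(min_silence_duration_ms=500))`.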

### Utilities

Helper functions for timestamp formatting, model information, and other utility operations.

```python { .api }
def format_timestamp(seconds, always_include_hours=False, decimal_marker="."): ...
def get_logger(): ...
def get_assets_path(): ...
```

[Utilities](./utilities.md)
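
Timestamp formatting is typically needed when writing subtitles. The sketch below builds SubRip (SRT) text from segment-like objects. To stay runnable without the package installed it uses a local stand-in, `srt_timestamp`, rather than the library's `format_timestamp`; both helper names here are hypothetical.

```python
from types import SimpleNamespace


def srt_timestamp(seconds: float) -> str:
    """Local stand-in formatter producing HH:MM:SS,mmm as used by SRT files."""
    milliseconds = round(seconds * 1000)
    hours, milliseconds = divmod(milliseconds, 3_600_000)
    minutes, milliseconds = divmod(milliseconds, 60_000)
    secs, milliseconds = divmod(milliseconds, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{milliseconds:03d}"


def to_srt(segments) -> str:
    """Hypothetical helper: render objects with start, end and text as SRT blocks."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        time_range = f"{srt_timestamp(segment.start)} --> {srt_timestamp(segment.end)}"
        blocks.append(f"{index}\n{time_range}\n{segment.text.strip()}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [SimpleNamespace(start=0.0, end=1.25, text=" Hello")]
    print(to_srt(demo))
```

With the package installed, `format_timestamp(seconds, always_include_hours=True, decimal_marker=",")` is the call that would be expected to replace the stand-in.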

## Core Types

```python { .api }
@dataclass
class Word:
    start: float
    end: float
    word: str
    probability: float

@dataclass
class Segment:
    id: int
    seek: int
    start: float
    end: float
    text: str
    tokens: list[int]
    avg_logprob: float
    compression_ratio: float
    no_speech_prob: float
    words: list[Word] | None
    temperature: float | None

@dataclass
class TranscriptionInfo:
    language: str
    language_probability: float
    duration: float
    duration_after_vad: float
    all_language_probs: list[tuple[str, float]] | None
    transcription_options: TranscriptionOptions
    vad_options: VadOptions

@dataclass
class TranscriptionOptions:
    beam_size: int
    best_of: int
    patience: float
    length_penalty: float
    repetition_penalty: float
    no_repeat_ngram_size: int
    log_prob_threshold: float | None
    no_speech_threshold: float | None
    compression_ratio_threshold: float | None
    condition_on_previous_text: bool
    prompt_reset_on_temperature: float
    temperatures: list[float]
    initial_prompt: str | list[int] | None
    prefix: str | None
    suppress_blank: bool
    suppress_tokens: list[int] | None
    without_timestamps: bool
    max_initial_timestamp: float
    word_timestamps: bool
    prepend_punctuations: str
    append_punctuations: str
    multilingual: bool
    max_new_tokens: int | None
    clip_timestamps: str | list[float]
    hallucination_silence_threshold: float | None
    hotwords: str | None
```
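
The per-segment scores on `Segment` can be used for simple post-filtering. The sketch below keeps segments whose `avg_logprob` and `no_speech_prob` pass two cut-offs. `keep_confident` is a hypothetical helper, the threshold values are assumptions chosen for illustration, and the stub objects only mimic the two fields the filter reads.

```python
from types import SimpleNamespace


def keep_confident(segments, min_avg_logprob=-1.0, max_no_speech_prob=0.6):
    """Hypothetical filter: drop segments that look like silence or low-confidence text."""
    return [
        segment
        for segment in segments
        if segment.avg_logprob >= min_avg_logprob
        and segment.no_speech_prob <= max_no_speech_prob
    ]


if __name__ == "__main__":
    demo = [
        SimpleNamespace(text=" clear speech", avg_logprob=-0.2, no_speech_prob=0.05),
        SimpleNamespace(text=" likely noise", avg_logprob=-1.6, no_speech_prob=0.90),
    ]
    for segment in keep_confident(demo):
        print(segment.text.strip())
```

Filtering after transcription leaves the decoder settings untouched, so the same output can be re-filtered with different cut-offs without running the model again.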