# Batched Processing

High-throughput batch processing for transcribing multiple audio files or audio chunks efficiently. The BatchedInferencePipeline improves performance when processing large amounts of audio data.

## Capabilities

### BatchedInferencePipeline Initialization

Create a batched inference pipeline that wraps a WhisperModel for improved throughput processing.

```python { .api }
class BatchedInferencePipeline:
    def __init__(self, model):
        """
        Initialize batched inference pipeline.

        Args:
            model: WhisperModel instance to use for batched processing
        """
```
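
Construction only takes the model to wrap, so a pipeline can be created alongside an existing WhisperModel. A minimal sketch (model size and device are illustrative):

```python
from faster_whisper import WhisperModel, BatchedInferencePipeline

# The pipeline wraps the existing WhisperModel and reuses its loaded weights,
# so regular and batched transcription can share one model.
model = WhisperModel("base", device="cuda")
batched_model = BatchedInferencePipeline(model=model)
```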

### Batched Forward Processing

Process multiple audio features in a single batch operation for improved throughput.

```python { .api }
def forward(
    self,
    features: np.ndarray,
    tokenizer,
    chunks_metadata: list[dict],
    options: TranscriptionOptions
) -> list[list[dict]]:
    """
    Process batched features through the model.

    Args:
        features: Batched audio features array
        tokenizer: Tokenizer instance for text processing
        chunks_metadata: List of metadata dictionaries for each chunk
        options: TranscriptionOptions for processing configuration

    Returns:
        List of segmented outputs for each input chunk
    """
```

### Batched Segment Generation

Generate transcription segments from batched audio features with improved efficiency.

```python { .api }
def generate_segment_batched(
    self,
    features: np.ndarray,
    tokenizer,
    options: TranscriptionOptions
) -> tuple[np.ndarray, list[dict]]:
    """
    Generate segments from batched features.

    Args:
        features: Batched audio features array
        tokenizer: Tokenizer instance for processing
        options: TranscriptionOptions configuration

    Returns:
        Tuple of (encoder_output, segment_outputs)
    """
```

## Usage Examples

### Basic Batched Processing

```python
from faster_whisper import WhisperModel, BatchedInferencePipeline

# Initialize model and batched pipeline
model = WhisperModel("base", device="cuda")
batched_model = BatchedInferencePipeline(model=model)

# Process single audio file with batched pipeline
segments, info = batched_model.transcribe("audio.mp3", vad_filter=False)

print(f"Language: {info.language}")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```
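
As with WhisperModel.transcribe(), the segments above are produced lazily: the transcription work happens while the generator is consumed. If the results are needed more than once, materialize them first. A small sketch, assuming the usual generator behavior of faster-whisper's transcribe methods:

```python
from faster_whisper import WhisperModel, BatchedInferencePipeline

model = WhisperModel("base", device="cuda")
batched_model = BatchedInferencePipeline(model=model)

segments, info = batched_model.transcribe("audio.mp3")

# Consuming the generator runs the transcription; keep a list around if the
# segments are needed for more than one pass.
segments = list(segments)
print(f"Transcribed {len(segments)} segments, detected language: {info.language}")
```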

### Processing Multiple Audio Files

```python
from faster_whisper import WhisperModel, BatchedInferencePipeline

model = WhisperModel("medium", device="cuda", compute_type="float16")
batched_model = BatchedInferencePipeline(model=model)

audio_files = ["audio1.mp3", "audio2.wav", "audio3.mp4"]

# Process each file with the batched pipeline
for audio_file in audio_files:
    print(f"Processing {audio_file}...")
    segments, info = batched_model.transcribe(
        audio_file,
        word_timestamps=True,
        vad_filter=True
    )

    print(f"  Language: {info.language} (confidence: {info.language_probability:.2f})")
    print(f"  Duration: {info.duration:.2f}s")

    for segment in segments:
        print(f"  [{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```

### Custom Batched Processing with Features

```python
from faster_whisper import WhisperModel, BatchedInferencePipeline, decode_audio
from faster_whisper.transcribe import TranscriptionOptions
import numpy as np

model = WhisperModel("base")
batched_model = BatchedInferencePipeline(model=model)

# Prepare audio data
audio_files = ["file1.wav", "file2.wav"]
audio_arrays = []
chunks_metadata = []

for i, file_path in enumerate(audio_files):
    audio = decode_audio(file_path)
    audio_arrays.append(audio)
    chunks_metadata.append({
        "file_id": i,
        "offset": 0.0,
        "duration": len(audio) / 16000.0  # assuming a 16 kHz sample rate
    })

# Convert to batched features (simplified example)
# In practice, you would use the model's feature extractor
features = np.stack([model.feature_extractor(audio) for audio in audio_arrays])

# Configure transcription options
options = TranscriptionOptions(
    beam_size=5,
    word_timestamps=True,
    without_timestamps=False,
    temperatures=[0.0]
)

# Process batch
tokenizer = model.tokenizer
results = batched_model.forward(features, tokenizer, chunks_metadata, options)

# Process results
for file_path, result in zip(audio_files, results):
    print(f"Results for {file_path}:")
    for segment_data in result:
        print(f"  [{segment_data['start']:.2f}s -> {segment_data['end']:.2f}s] {segment_data['text']}")
```

## Performance Considerations

- **GPU Memory**: Batched processing requires more GPU memory for larger batch sizes
- **Batch Size**: Optimal batch size depends on available memory and model size
- **Audio Length**: Longer audio segments may require chunking before batching
- **VAD Integration**: Voice activity detection can be combined with batching for better efficiency (see the sketch below)
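
The sketch below ties these considerations together. It assumes the batched `transcribe` method accepts a `batch_size` keyword (as in recent faster-whisper releases); treat the parameter name and values as illustrative rather than as part of the API documented above:

```python
from faster_whisper import WhisperModel, BatchedInferencePipeline

model = WhisperModel("large-v3", device="cuda", compute_type="float16")
batched_model = BatchedInferencePipeline(model=model)

# Assumed batch_size keyword: larger batches raise throughput but also GPU
# memory use, so reduce the value if you hit out-of-memory errors.
segments, info = batched_model.transcribe(
    "long_recording.mp3",
    batch_size=16,    # tune to the available GPU memory and model size
    vad_filter=True,  # VAD removes silence so batches contain mostly speech
)

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```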

## Note on API Compatibility

The BatchedInferencePipeline provides a `transcribe` method that maintains compatibility with the WhisperModel API while offering improved throughput in batch processing scenarios. The method signature and return format mirror `WhisperModel.transcribe()`, so the batched pipeline can typically be used as a drop-in replacement.
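
For example, an existing call against a WhisperModel can usually be pointed at the batched pipeline without other changes. The snippet below sketches that swap; it is illustrative rather than an exhaustive compatibility statement:

```python
from faster_whisper import WhisperModel, BatchedInferencePipeline

model = WhisperModel("base", device="cuda")
batched_model = BatchedInferencePipeline(model=model)

# Existing call against the plain model...
segments, info = model.transcribe("meeting.wav", beam_size=5, vad_filter=True)

# ...and the same call routed through the batched pipeline.
segments, info = batched_model.transcribe("meeting.wav", beam_size=5, vad_filter=True)
```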