# Batched Processing
High-throughput batch processing for handling multiple audio files or audio chunks efficiently. The BatchedInferencePipeline wraps a WhisperModel and improves performance when processing large amounts of audio data.
## Capabilities
### BatchedInferencePipeline Initialization
Create a batched inference pipeline that wraps a WhisperModel for improved throughput processing.
```python { .api }
class BatchedInferencePipeline:
    def __init__(self, model):
        """
        Initialize batched inference pipeline.

        Args:
            model: WhisperModel instance to use for batched processing
        """
```
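As a minimal usage sketch (the model size, device, and compute type below are illustrative choices, not requirements):

```python
from faster_whisper import WhisperModel, BatchedInferencePipeline

# Wrap an existing WhisperModel; the pipeline reuses its weights and settings.
model = WhisperModel("tiny", device="cpu", compute_type="int8")
batched_model = BatchedInferencePipeline(model=model)
```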
### Batched Forward Processing
Process multiple audio features in a single batch operation for improved throughput.
```python { .api }
def forward(
    self,
    features: np.ndarray,
    tokenizer,
    chunks_metadata: list[dict],
    options: TranscriptionOptions
) -> list[list[dict]]:
    """
    Process batched features through the model.

    Args:
        features: Batched audio features array
        tokenizer: Tokenizer instance for text processing
        chunks_metadata: List of metadata dictionaries for each chunk
        options: TranscriptionOptions for processing configuration

    Returns:
        List of segmented outputs for each input chunk
    """
```
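For orientation, a rough sketch of how the arguments fit together; the array shape and metadata keys below are illustrative (they mirror the custom example later in this document), not a fixed schema:

```python
import numpy as np

# One log-mel spectrogram per chunk, stacked along the batch axis
# (batch x n_mels x n_frames; exact dimensions depend on the model).
features = np.zeros((2, 80, 3000), dtype=np.float32)

# One metadata dict per chunk, in the same order as the batch.
chunks_metadata = [
    {"offset": 0.0, "duration": 30.0},
    {"offset": 30.0, "duration": 30.0},
]

# With a tokenizer and TranscriptionOptions prepared (see the custom example
# below), forward() returns one list of segment dicts per input chunk:
# results = batched_model.forward(features, tokenizer, chunks_metadata, options)
```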
### Batched Segment Generation
Generate transcription segments from batched audio features with improved efficiency.
```python { .api }
def generate_segment_batched(
    self,
    features: np.ndarray,
    tokenizer,
    options: TranscriptionOptions
) -> tuple[np.ndarray, list[dict]]:
    """
    Generate segments from batched features.

    Args:
        features: Batched audio features array
        tokenizer: Tokenizer instance for processing
        options: TranscriptionOptions configuration

    Returns:
        Tuple of (encoder_output, segment_outputs)
    """
```
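A hedged sketch of calling this method directly, assuming `features`, `tokenizer`, and `options` have been prepared as in the custom example later in this document; the `"text"` key read below is an assumption about the output dict layout:

```python
# Returns the encoder output together with one output dict per batch item.
encoder_output, segment_outputs = batched_model.generate_segment_batched(
    features, tokenizer, options
)

for outputs in segment_outputs:
    # "text" is an assumed key; inspect the returned dicts for the exact layout.
    print(outputs.get("text"))
```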
## Usage Examples
### Basic Batched Processing
```python
from faster_whisper import WhisperModel, BatchedInferencePipeline

# Initialize model and batched pipeline
model = WhisperModel("base", device="cuda")
batched_model = BatchedInferencePipeline(model=model)

# Process single audio file with batched pipeline
segments, info = batched_model.transcribe("audio.mp3", vad_filter=False)

print(f"Language: {info.language}")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```
### Processing Multiple Audio Files
```python
from faster_whisper import WhisperModel, BatchedInferencePipeline

model = WhisperModel("medium", device="cuda", compute_type="float16")
batched_model = BatchedInferencePipeline(model=model)

audio_files = ["audio1.mp3", "audio2.wav", "audio3.mp4"]

# Process each file with the batched pipeline
for audio_file in audio_files:
    print(f"Processing {audio_file}...")
    segments, info = batched_model.transcribe(
        audio_file,
        word_timestamps=True,
        vad_filter=True
    )

    print(f"  Language: {info.language} (confidence: {info.language_probability:.2f})")
    print(f"  Duration: {info.duration:.2f}s")

    for segment in segments:
        print(f"  [{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```
### Custom Batched Processing with Features
```python
from faster_whisper import WhisperModel, BatchedInferencePipeline, decode_audio
from faster_whisper.tokenizer import Tokenizer
from faster_whisper.transcribe import TranscriptionOptions
import numpy as np

model = WhisperModel("base")
batched_model = BatchedInferencePipeline(model=model)

# Prepare audio data
audio_files = ["file1.wav", "file2.wav"]
audio_arrays = []
chunks_metadata = []

for i, file_path in enumerate(audio_files):
    audio = decode_audio(file_path)
    audio_arrays.append(audio)
    chunks_metadata.append({
        "file_id": i,
        "offset": 0.0,
        "duration": len(audio) / 16000.0  # assuming a 16 kHz sample rate
    })

# Convert to batched features (simplified example).
# In practice the inputs must be padded or trimmed to the same length
# before stacking, as the transcribe() pipeline does internally.
features = np.stack([model.feature_extractor(audio) for audio in audio_arrays])

# Configure transcription options.
# NOTE: the full TranscriptionOptions constructor takes many more fields;
# only a representative subset is shown here.
options = TranscriptionOptions(
    beam_size=5,
    word_timestamps=True,
    without_timestamps=False,
    temperatures=[0.0]
)

# Build a tokenizer from the model's Hugging Face tokenizer
# (the language is hard-coded here for illustration).
tokenizer = Tokenizer(
    model.hf_tokenizer,
    model.model.is_multilingual,
    task="transcribe",
    language="en",
)

# Process batch
results = batched_model.forward(features, tokenizer, chunks_metadata, options)

# Process results
for file_path, result in zip(audio_files, results):
    print(f"Results for {file_path}:")
    for segment_data in result:
        print(f"  [{segment_data['start']:.2f}s -> {segment_data['end']:.2f}s] {segment_data['text']}")
```
## Performance Considerations
- **GPU Memory**: Batched processing requires more GPU memory for larger batch sizes
- **Batch Size**: Optimal batch size depends on available memory and model size (see the sketch after this list)
- **Audio Length**: Longer audio segments may require chunking before batching
- **VAD Integration**: Voice activity detection can be combined with batching for better efficiency
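As a sketch of tuning these trade-offs, the call below combines VAD filtering with an explicit batch size; `batch_size` is assumed to be accepted by the batched `transcribe` in your faster-whisper version, and 16 is an illustrative value:

```python
from faster_whisper import WhisperModel, BatchedInferencePipeline

model = WhisperModel("base", device="cuda", compute_type="float16")
batched_model = BatchedInferencePipeline(model=model)

segments, info = batched_model.transcribe(
    "audio.mp3",
    batch_size=16,    # illustrative; raise for throughput, lower if GPU memory is tight
    vad_filter=True,  # skip silence so batches contain mostly speech
)

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```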
## Note on API Compatibility
The BatchedInferencePipeline exposes a `transcribe` method that maintains compatibility with the WhisperModel API while offering improved throughput for batch processing scenarios. The method signature and return format match those of WhisperModel.transcribe().
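A small sketch of that drop-in behavior, assuming the `model` and `batched_model` objects from the examples above:

```python
# Both calls take the same arguments and return (segments, info).
segments, info = model.transcribe("audio.mp3", beam_size=5)
segments, info = batched_model.transcribe("audio.mp3", beam_size=5)
```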