# Batched Processing

High-throughput batch processing for transcribing multiple audio files or audio chunks efficiently. The BatchedInferencePipeline improves performance when processing large amounts of audio data.

## Capabilities

### BatchedInferencePipeline Initialization

Create a batched inference pipeline that wraps a WhisperModel for improved throughput processing.

```python { .api }
class BatchedInferencePipeline:
    def __init__(self, model):
        """
        Initialize batched inference pipeline.

        Args:
            model: WhisperModel instance to use for batched processing
        """
```
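
Construction only takes the model to wrap, so a pipeline can be created alongside an existing WhisperModel. A minimal sketch (model size and device are illustrative):

```python
from faster_whisper import WhisperModel, BatchedInferencePipeline

# The pipeline wraps the existing WhisperModel and reuses its loaded weights,
# so regular and batched transcription can share one model.
model = WhisperModel("base", device="cuda")
batched_model = BatchedInferencePipeline(model=model)
```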

### Batched Forward Processing

Process multiple audio features in a single batch operation for improved throughput.

```python { .api }
def forward(
    self,
    features: np.ndarray,
    tokenizer,
    chunks_metadata: list[dict],
    options: TranscriptionOptions
) -> list[list[dict]]:
    """
    Process batched features through the model.

    Args:
        features: Batched audio features array
        tokenizer: Tokenizer instance for text processing
        chunks_metadata: List of metadata dictionaries for each chunk
        options: TranscriptionOptions for processing configuration

    Returns:
        List of segmented outputs for each input chunk
    """
```

### Batched Segment Generation

Generate transcription segments from batched audio features with improved efficiency.

```python { .api }
def generate_segment_batched(
    self,
    features: np.ndarray,
    tokenizer,
    options: TranscriptionOptions
) -> tuple[np.ndarray, list[dict]]:
    """
    Generate segments from batched features.

    Args:
        features: Batched audio features array
        tokenizer: Tokenizer instance for processing
        options: TranscriptionOptions configuration

    Returns:
        Tuple of (encoder_output, segment_outputs)
    """
```

## Usage Examples

### Basic Batched Processing

```python
from faster_whisper import WhisperModel, BatchedInferencePipeline

# Initialize model and batched pipeline
model = WhisperModel("base", device="cuda")
batched_model = BatchedInferencePipeline(model=model)

# Process single audio file with batched pipeline
segments, info = batched_model.transcribe("audio.mp3", vad_filter=False)

print(f"Language: {info.language}")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```
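
As with WhisperModel.transcribe(), the segments above are produced lazily: the transcription work happens while the generator is consumed. If the results are needed more than once, materialize them first. A small sketch, assuming the usual generator behavior of faster-whisper's transcribe methods:

```python
from faster_whisper import WhisperModel, BatchedInferencePipeline

model = WhisperModel("base", device="cuda")
batched_model = BatchedInferencePipeline(model=model)

segments, info = batched_model.transcribe("audio.mp3")

# Consuming the generator runs the transcription; keep a list around if the
# segments are needed for more than one pass.
segments = list(segments)
print(f"Transcribed {len(segments)} segments, detected language: {info.language}")
```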

### Processing Multiple Audio Files

```python
from faster_whisper import WhisperModel, BatchedInferencePipeline

model = WhisperModel("medium", device="cuda", compute_type="float16")
batched_model = BatchedInferencePipeline(model=model)

audio_files = ["audio1.mp3", "audio2.wav", "audio3.mp4"]

# Process each file with the batched pipeline
for audio_file in audio_files:
    print(f"Processing {audio_file}...")
    segments, info = batched_model.transcribe(
        audio_file,
        word_timestamps=True,
        vad_filter=True
    )

    print(f"  Language: {info.language} (confidence: {info.language_probability:.2f})")
    print(f"  Duration: {info.duration:.2f}s")

    for segment in segments:
        print(f"  [{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```

### Custom Batched Processing with Features

```python
from faster_whisper import WhisperModel, BatchedInferencePipeline, decode_audio
from faster_whisper.transcribe import TranscriptionOptions
import numpy as np

model = WhisperModel("base")
batched_model = BatchedInferencePipeline(model=model)

# Prepare audio data
audio_files = ["file1.wav", "file2.wav"]
audio_arrays = []
chunks_metadata = []

for i, file_path in enumerate(audio_files):
    audio = decode_audio(file_path)
    audio_arrays.append(audio)
    chunks_metadata.append({
        "file_id": i,
        "offset": 0.0,
        "duration": len(audio) / 16000.0  # assuming a 16 kHz sample rate
    })

# Convert to batched features (simplified example)
# In practice, you would use the model's feature extractor
features = np.stack([model.feature_extractor(audio) for audio in audio_arrays])

# Configure transcription options
options = TranscriptionOptions(
    beam_size=5,
    word_timestamps=True,
    without_timestamps=False,
    temperatures=[0.0]
)

# Process batch
tokenizer = model.tokenizer
results = batched_model.forward(features, tokenizer, chunks_metadata, options)

# Process results
for file_path, result in zip(audio_files, results):
    print(f"Results for {file_path}:")
    for segment_data in result:
        print(f"  [{segment_data['start']:.2f}s -> {segment_data['end']:.2f}s] {segment_data['text']}")
```

## Performance Considerations

- **GPU Memory**: Batched processing requires more GPU memory for larger batch sizes
- **Batch Size**: Optimal batch size depends on available memory and model size
- **Audio Length**: Longer audio segments may require chunking before batching
- **VAD Integration**: Voice activity detection can be combined with batching for better efficiency (see the sketch below)
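
The sketch below ties these considerations together. It assumes the batched `transcribe` method accepts a `batch_size` keyword (as in recent faster-whisper releases); treat the parameter name and values as illustrative rather than as part of the API documented above:

```python
from faster_whisper import WhisperModel, BatchedInferencePipeline

model = WhisperModel("large-v3", device="cuda", compute_type="float16")
batched_model = BatchedInferencePipeline(model=model)

# Assumed batch_size keyword: larger batches raise throughput but also GPU
# memory use, so reduce the value if you hit out-of-memory errors.
segments, info = batched_model.transcribe(
    "long_recording.mp3",
    batch_size=16,    # tune to the available GPU memory and model size
    vad_filter=True,  # VAD removes silence so batches contain mostly speech
)

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```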

## Note on API Compatibility

The BatchedInferencePipeline provides a `transcribe` method that maintains compatibility with the WhisperModel API while offering improved throughput in batch processing scenarios. The method signature and return format mirror `WhisperModel.transcribe()`, so the batched pipeline can typically be used as a drop-in replacement.
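
For example, an existing call against a WhisperModel can usually be pointed at the batched pipeline without other changes. The snippet below sketches that swap; it is illustrative rather than an exhaustive compatibility statement:

```python
from faster_whisper import WhisperModel, BatchedInferencePipeline

model = WhisperModel("base", device="cuda")
batched_model = BatchedInferencePipeline(model=model)

# Existing call against the plain model...
segments, info = model.transcribe("meeting.wav", beam_size=5, vad_filter=True)

# ...and the same call routed through the batched pipeline.
segments, info = batched_model.transcribe("meeting.wav", beam_size=5, vad_filter=True)
```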