# Audio Models

Audio processing models for speech recognition, audio-to-text conversion, and audio understanding tasks. Keras Hub provides implementations of state-of-the-art audio models, including Whisper and Moonshine.
## Capabilities

### Whisper (OpenAI Speech Recognition Model)

Whisper is a robust speech recognition model that can transcribe audio in multiple languages and handle various audio conditions.
```python { .api }
class WhisperBackbone(Backbone):
    """Whisper transformer backbone for speech recognition."""
    def __init__(
        self,
        vocabulary_size: int,
        num_layers: int,
        num_heads: int,
        hidden_dim: int,
        intermediate_dim: int,
        num_mels: int = 80,
        dropout: float = 0.0,
        max_encoder_sequence_length: int = 3000,
        max_decoder_sequence_length: int = 448,
        **kwargs
    ): ...

class WhisperTokenizer:
    """Whisper tokenizer for text processing."""
    def __init__(
        self,
        vocabulary: dict = None,
        **kwargs
    ): ...
```
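The `max_encoder_sequence_length` default above is not arbitrary: Whisper computes log-mel features over fixed 30-second windows of 16 kHz audio with a 10 ms hop (160 samples), so each window yields exactly 3000 mel frames. A quick sanity check of that arithmetic:

```python
# Where Whisper's max_encoder_sequence_length default of 3000 comes from.
sample_rate = 16_000   # Hz, Whisper's expected input rate
chunk_seconds = 30     # Whisper's fixed audio window length
hop_length = 160       # samples between successive mel frames (10 ms)

num_frames = (sample_rate * chunk_seconds) // hop_length
print(num_frames)  # 3000, matching the encoder sequence length
```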
### Moonshine (Efficient Speech Recognition)

Moonshine is an efficient speech recognition model optimized for fast inference and low resource usage.
```python { .api }
class MoonshineBackbone(Backbone):
    """Moonshine backbone for audio-to-text conversion."""
    def __init__(
        self,
        vocabulary_size: int,
        num_layers: int,
        hidden_dim: int,
        num_heads: int,
        **kwargs
    ): ...

class MoonshineAudioToText:
    """Moonshine model for audio-to-text conversion."""
    def __init__(
        self,
        backbone: MoonshineBackbone,
        preprocessor: Preprocessor = None,
        **kwargs
    ): ...

class MoonshineAudioToTextPreprocessor:
    """Preprocessor for Moonshine audio-to-text."""
    def __init__(
        self,
        audio_converter: AudioConverter,
        tokenizer: MoonshineTokenizer,
        **kwargs
    ): ...

class MoonshineTokenizer:
    """Moonshine tokenizer for text processing."""
    def __init__(
        self,
        vocabulary: dict = None,
        **kwargs
    ): ...

class MoonshineAudioConverter:
    """Audio converter for Moonshine models."""
    def __init__(
        self,
        sample_rate: int = 16000,
        num_mels: int = 80,
        hop_length: int = 160,
        win_length: int = 400,
        **kwargs
    ): ...
```
### Audio Converter Base Class

Base class for audio preprocessing and conversion.
```python { .api }
class AudioConverter:
    """Base class for audio data conversion."""
    def __init__(
        self,
        sample_rate: int = 16000,
        **kwargs
    ): ...

    def __call__(self, audio_data): ...
```
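To see how `sample_rate`, `hop_length`, and `win_length` determine the number of feature frames a converter produces, here is a minimal NumPy sketch of the framing step a mel frontend performs before windowing and the FFT. The `frame_signal` helper is illustrative, not part of the Keras Hub API:

```python
import numpy as np

def frame_signal(audio, win_length=400, hop_length=160):
    """Slice a 1-D signal into overlapping frames (no padding), as a mel
    frontend would before applying a window function and FFT."""
    num_frames = 1 + (len(audio) - win_length) // hop_length
    return np.stack(
        [audio[i * hop_length : i * hop_length + win_length] for i in range(num_frames)]
    )

audio = np.zeros(16000, dtype=np.float32)  # 1 second at 16 kHz
frames = frame_signal(audio)
print(frames.shape)  # (98, 400): roughly 100 frames per second at a 10 ms hop
```

With padding enabled (as most frontends apply), the frame count comes out closer to `len(audio) / hop_length`, i.e. about 100 frames per second here.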
## Usage Examples

### Audio-to-Text with Moonshine
```python
import keras_hub
import numpy as np

# Load a pretrained Moonshine model
model = keras_hub.models.MoonshineAudioToText.from_preset("moonshine_base")

# Prepare audio data (synthetic here; in practice, load real audio files)
audio_data = np.random.random((16000,)).astype("float32")  # 1 second at 16 kHz
audio_batch = np.expand_dims(audio_data, axis=0)  # Add a batch dimension

# Transcribe the audio (generate decodes token IDs back to text)
transcription = model.generate(audio_batch)
print("Transcription:", transcription)
```
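The example above uses synthetic audio. To feed real speech, you need a mono float array at the model's sample rate. A small sketch using only the standard-library `wave` module for 16-bit PCM WAV files (the file name is hypothetical; resample separately if your audio is not 16 kHz):

```python
import wave
import numpy as np

def load_wav(path):
    """Read a 16-bit PCM WAV file into a mono float32 array in [-1, 1]."""
    with wave.open(path, "rb") as f:
        assert f.getsampwidth() == 2, "expects 16-bit PCM"
        pcm = np.frombuffer(f.readframes(f.getnframes()), dtype=np.int16)
        if f.getnchannels() > 1:  # downmix multi-channel audio to mono
            pcm = pcm.reshape(-1, f.getnchannels()).mean(axis=1).astype(np.int16)
        return pcm.astype(np.float32) / 32768.0

# audio_data = load_wav("speech.wav")  # hypothetical file path
```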
### Using Audio Converter
```python
import keras_hub
import numpy as np

# Create an audio converter
audio_converter = keras_hub.layers.MoonshineAudioConverter(
    sample_rate=16000,
    num_mels=80,
)

# Convert raw audio to mel spectrogram features (batch of one 1-second clip)
audio_data = np.random.random((1, 16000)).astype("float32")
audio_features = audio_converter(audio_data)
print(f"Audio features shape: {audio_features.shape}")
```
### Custom Audio Processing Pipeline
```python
import keras_hub

# Load the backbone
backbone = keras_hub.models.MoonshineBackbone.from_preset("moonshine_base")

# Create a preprocessor
preprocessor = keras_hub.models.MoonshineAudioToTextPreprocessor(
    audio_converter=keras_hub.layers.MoonshineAudioConverter(),
    tokenizer=keras_hub.tokenizers.MoonshineTokenizer.from_preset("moonshine_base"),
)

# Assemble the custom model
model = keras_hub.models.MoonshineAudioToText(
    backbone=backbone,
    preprocessor=preprocessor,
)

# Compile the model for training
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```