# Audio Models

Audio processing models for speech recognition, audio-to-text conversion, and audio understanding tasks. Keras Hub provides implementations of state-of-the-art audio models, including Whisper and Moonshine.
## Capabilities

### Whisper (OpenAI Speech Recognition Model)

Whisper is a robust speech recognition model that can transcribe audio in multiple languages and handle various audio conditions.
```python { .api }
class WhisperBackbone(Backbone):
    """Whisper transformer backbone for speech recognition."""
    def __init__(
        self,
        vocabulary_size: int,
        num_layers: int,
        num_heads: int,
        hidden_dim: int,
        intermediate_dim: int,
        num_mels: int = 80,
        dropout: float = 0.0,
        max_encoder_sequence_length: int = 3000,
        max_decoder_sequence_length: int = 448,
        **kwargs
    ): ...

class WhisperTokenizer:
    """Whisper tokenizer for text processing."""
    def __init__(
        self,
        vocabulary: dict = None,
        **kwargs
    ): ...
```
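The `max_encoder_sequence_length` default above is not arbitrary: Whisper computes log-mel features over fixed 30-second windows of 16 kHz audio with a 10 ms hop (160 samples), so each window yields exactly 3000 mel frames. A quick sanity check of that arithmetic:

```python
# Where Whisper's max_encoder_sequence_length default of 3000 comes from.
sample_rate = 16_000   # Hz, Whisper's expected input rate
chunk_seconds = 30     # Whisper's fixed audio window length
hop_length = 160       # samples between successive mel frames (10 ms)

num_frames = (sample_rate * chunk_seconds) // hop_length
print(num_frames)  # 3000, matching the encoder sequence length
```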
### Moonshine (Efficient Speech Recognition)

Moonshine is an efficient speech recognition model optimized for fast inference and low resource usage.
```python { .api }
class MoonshineBackbone(Backbone):
    """Moonshine backbone for audio-to-text conversion."""
    def __init__(
        self,
        vocabulary_size: int,
        num_layers: int,
        hidden_dim: int,
        num_heads: int,
        **kwargs
    ): ...

class MoonshineAudioToText:
    """Moonshine model for audio-to-text conversion."""
    def __init__(
        self,
        backbone: MoonshineBackbone,
        preprocessor: Preprocessor = None,
        **kwargs
    ): ...

class MoonshineAudioToTextPreprocessor:
    """Preprocessor for Moonshine audio-to-text."""
    def __init__(
        self,
        audio_converter: AudioConverter,
        tokenizer: MoonshineTokenizer,
        **kwargs
    ): ...

class MoonshineTokenizer:
    """Moonshine tokenizer for text processing."""
    def __init__(
        self,
        vocabulary: dict = None,
        **kwargs
    ): ...

class MoonshineAudioConverter:
    """Audio converter for Moonshine models."""
    def __init__(
        self,
        sample_rate: int = 16000,
        num_mels: int = 80,
        hop_length: int = 160,
        win_length: int = 400,
        **kwargs
    ): ...
```
### Audio Converter Base Class

Base class for audio preprocessing and conversion.
```python { .api }
class AudioConverter:
    """Base class for audio data conversion."""
    def __init__(
        self,
        sample_rate: int = 16000,
        **kwargs
    ): ...

    def __call__(self, audio_data): ...
```
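To see how `sample_rate`, `hop_length`, and `win_length` determine the number of feature frames a converter produces, here is a minimal NumPy sketch of the framing step a mel frontend performs before windowing and the FFT. The `frame_signal` helper is illustrative, not part of the Keras Hub API:

```python
import numpy as np

def frame_signal(audio, win_length=400, hop_length=160):
    """Slice a 1-D signal into overlapping frames (no padding), as a mel
    frontend would before applying a window function and FFT."""
    num_frames = 1 + (len(audio) - win_length) // hop_length
    return np.stack(
        [audio[i * hop_length : i * hop_length + win_length] for i in range(num_frames)]
    )

audio = np.zeros(16000, dtype=np.float32)  # 1 second at 16 kHz
frames = frame_signal(audio)
print(frames.shape)  # (98, 400): roughly 100 frames per second at a 10 ms hop
```

With padding enabled (as most frontends apply), the frame count comes out closer to `len(audio) / hop_length`, i.e. about 100 frames per second here.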
## Usage Examples

### Audio-to-Text with Moonshine
```python
import keras_hub
import numpy as np

# Load a pretrained Moonshine model
model = keras_hub.models.MoonshineAudioToText.from_preset("moonshine_base")

# Prepare audio data (synthetic here; in practice, load real audio files)
audio_data = np.random.random((16000,)).astype("float32")  # 1 second at 16 kHz
audio_batch = np.expand_dims(audio_data, axis=0)  # Add a batch dimension

# Transcribe the audio (generate decodes token IDs back to text)
transcription = model.generate(audio_batch)
print("Transcription:", transcription)
```
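The example above uses synthetic audio. To feed real speech, you need a mono float array at the model's sample rate. A small sketch using only the standard-library `wave` module for 16-bit PCM WAV files (the file name is hypothetical; resample separately if your audio is not 16 kHz):

```python
import wave
import numpy as np

def load_wav(path):
    """Read a 16-bit PCM WAV file into a mono float32 array in [-1, 1]."""
    with wave.open(path, "rb") as f:
        assert f.getsampwidth() == 2, "expects 16-bit PCM"
        pcm = np.frombuffer(f.readframes(f.getnframes()), dtype=np.int16)
        if f.getnchannels() > 1:  # downmix multi-channel audio to mono
            pcm = pcm.reshape(-1, f.getnchannels()).mean(axis=1).astype(np.int16)
        return pcm.astype(np.float32) / 32768.0

# audio_data = load_wav("speech.wav")  # hypothetical file path
```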
### Using Audio Converter
```python
import keras_hub
import numpy as np

# Create an audio converter
audio_converter = keras_hub.layers.MoonshineAudioConverter(
    sample_rate=16000,
    num_mels=80,
)

# Convert raw audio to mel spectrogram features (batch of one 1-second clip)
audio_data = np.random.random((1, 16000)).astype("float32")
audio_features = audio_converter(audio_data)
print(f"Audio features shape: {audio_features.shape}")
```
### Custom Audio Processing Pipeline
```python
import keras_hub

# Load the backbone
backbone = keras_hub.models.MoonshineBackbone.from_preset("moonshine_base")

# Create a preprocessor
preprocessor = keras_hub.models.MoonshineAudioToTextPreprocessor(
    audio_converter=keras_hub.layers.MoonshineAudioConverter(),
    tokenizer=keras_hub.tokenizers.MoonshineTokenizer.from_preset("moonshine_base"),
)

# Assemble the custom model
model = keras_hub.models.MoonshineAudioToText(
    backbone=backbone,
    preprocessor=preprocessor,
)

# Compile the model for training
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```