
# Audio Models

Audio processing models for speech recognition, audio-to-text conversion, and audio understanding tasks. Keras Hub provides implementations of state-of-the-art audio models including Whisper and Moonshine.

## Capabilities

### Whisper (OpenAI Speech Recognition Model)

Whisper is a robust speech recognition model that can transcribe audio in multiple languages and handle varied audio conditions.

```python { .api }
class WhisperBackbone(Backbone):
    """Whisper transformer backbone for speech recognition."""
    def __init__(
        self,
        vocabulary_size: int,
        num_layers: int,
        num_heads: int,
        hidden_dim: int,
        intermediate_dim: int,
        num_mels: int = 80,
        dropout: float = 0.0,
        max_encoder_sequence_length: int = 3000,
        max_decoder_sequence_length: int = 448,
        **kwargs
    ): ...

class WhisperTokenizer:
    """Whisper tokenizer for text processing."""
    def __init__(
        self,
        vocabulary: dict = None,
        **kwargs
    ): ...
```
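To make the defaults above concrete: the encoder consumes log-mel spectrogram frames (`num_mels=80` features per frame, up to `max_encoder_sequence_length=3000` frames), while the decoder emits up to `max_decoder_sequence_length=448` tokens. A quick NumPy sketch of the encoder input shape these defaults imply (illustrative only; it does not require keras_hub):

```python
import numpy as np

# Defaults taken from the WhisperBackbone signature above
num_mels = 80
max_encoder_sequence_length = 3000

# A batch of one dummy log-mel spectrogram of the shape the
# encoder accepts: (batch, frames, mel bins)
encoder_features = np.zeros(
    (1, max_encoder_sequence_length, num_mels), dtype="float32"
)
print(encoder_features.shape)  # (1, 3000, 80)
```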

### Moonshine (Efficient Speech Recognition)

Moonshine is an efficient speech recognition model optimized for fast inference and low resource usage.

```python { .api }
class MoonshineBackbone(Backbone):
    """Moonshine backbone for audio-to-text conversion."""
    def __init__(
        self,
        vocabulary_size: int,
        num_layers: int,
        hidden_dim: int,
        num_heads: int,
        **kwargs
    ): ...

class MoonshineAudioToText:
    """Moonshine model for audio-to-text conversion."""
    def __init__(
        self,
        backbone: MoonshineBackbone,
        preprocessor: Preprocessor = None,
        **kwargs
    ): ...

class MoonshineAudioToTextPreprocessor:
    """Preprocessor for Moonshine audio-to-text."""
    def __init__(
        self,
        audio_converter: AudioConverter,
        tokenizer: MoonshineTokenizer,
        **kwargs
    ): ...

class MoonshineTokenizer:
    """Moonshine tokenizer for text processing."""
    def __init__(
        self,
        vocabulary: dict = None,
        **kwargs
    ): ...

class MoonshineAudioConverter:
    """Audio converter for Moonshine models."""
    def __init__(
        self,
        sample_rate: int = 16000,
        num_mels: int = 80,
        hop_length: int = 160,
        win_length: int = 400,
        **kwargs
    ): ...
```
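The `MoonshineAudioConverter` defaults describe a standard STFT framing: at `sample_rate=16000`, a `hop_length` of 160 samples is a 10 ms hop and a `win_length` of 400 samples is a 25 ms analysis window. A small sketch of the resulting frame count for one second of audio (simple non-padded framing; the library's exact padding behavior may differ):

```python
# Defaults taken from the MoonshineAudioConverter signature above
sample_rate = 16000
hop_length = 160   # 10 ms hop at 16 kHz
win_length = 400   # 25 ms analysis window at 16 kHz

num_samples = sample_rate  # one second of audio

# Frame count under simple non-padded framing
num_frames = 1 + (num_samples - win_length) // hop_length
hop_ms = 1000 * hop_length / sample_rate
print(num_frames, hop_ms)  # 98 frames, 10.0 ms per hop
```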

### Audio Converter Base Class

Base class for audio preprocessing and conversion.

```python { .api }
class AudioConverter:
    """Base class for audio data conversion."""
    def __init__(
        self,
        sample_rate: int = 16000,
        **kwargs
    ): ...

    def __call__(self, audio_data): ...
```
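To illustrate the converter interface, here is a standalone toy converter that peak-normalizes a waveform. This is a hypothetical stand-in, not the keras_hub implementation (the real `AudioConverter` is a Keras layer); it only mirrors the `__init__`/`__call__` shape shown above:

```python
import numpy as np

class PeakNormalizeConverter:
    """Toy stand-in for AudioConverter: scales audio to peak amplitude 1.

    Illustrative only -- the real keras_hub AudioConverter subclasses
    produce model-specific features such as mel spectrograms.
    """
    def __init__(self, sample_rate: int = 16000):
        self.sample_rate = sample_rate

    def __call__(self, audio_data):
        audio = np.asarray(audio_data, dtype="float32")
        peak = np.max(np.abs(audio))
        # Avoid dividing by zero on silent input
        return audio / peak if peak > 0 else audio

converter = PeakNormalizeConverter()
out = converter(np.array([0.1, -0.5, 0.25]))
print(out)  # peak amplitude is now 1.0
```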

## Usage Examples

### Audio-to-Text with Moonshine

```python
import keras_hub
import numpy as np

# Load pretrained Moonshine model
model = keras_hub.models.MoonshineAudioToText.from_preset("moonshine_base")

# Prepare audio data (example with synthetic data)
# In practice, you would load actual audio files
audio_data = np.random.random((16000,))  # 1 second of audio at 16 kHz
audio_batch = np.expand_dims(audio_data, axis=0)  # Add batch dimension

# Transcribe audio
transcription = model.generate(audio_batch)
print("Transcription:", transcription)
```
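The example above feeds synthetic noise; a real workflow loads an actual recording at the model's expected 16 kHz sample rate. A minimal sketch using only the standard-library `wave` module (the filename `example.wav` is a placeholder; the sketch writes a silent WAV first so it is self-contained, and libraries like `soundfile` or `librosa` also work):

```python
import wave
import numpy as np

# Write a 1-second, 16 kHz, mono, 16-bit WAV of silence -- a stand-in
# for a real recording you would feed the model.
with wave.open("example.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)      # 16-bit PCM
    f.setframerate(16000)
    f.writeframes(np.zeros(16000, dtype=np.int16).tobytes())

# Read it back as int16 PCM samples
with wave.open("example.wav", "rb") as f:
    pcm = np.frombuffer(f.readframes(f.getnframes()), dtype=np.int16)

# Scale to float32 in [-1, 1], the range audio models typically expect
audio_data = pcm.astype("float32") / 32768.0
print(audio_data.shape)  # (16000,)
```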

### Using Audio Converter

```python
import keras_hub
import numpy as np

# Create audio converter
audio_converter = keras_hub.layers.MoonshineAudioConverter(
    sample_rate=16000,
    num_mels=80
)

# Convert audio to mel spectrogram features
audio_data = np.random.random((1, 16000))  # batch of 1 second at 16 kHz
audio_features = audio_converter(audio_data)
print(f"Audio features shape: {audio_features.shape}")
```

### Custom Audio Processing Pipeline

```python
import keras_hub

# Load backbone and create custom model
backbone = keras_hub.models.MoonshineBackbone.from_preset("moonshine_base")

# Create preprocessor
preprocessor = keras_hub.models.MoonshineAudioToTextPreprocessor(
    audio_converter=keras_hub.layers.MoonshineAudioConverter(),
    tokenizer=keras_hub.tokenizers.MoonshineTokenizer.from_preset("moonshine_base")
)

# Create custom model
model = keras_hub.models.MoonshineAudioToText(
    backbone=backbone,
    preprocessor=preprocessor
)

# Compile and use model
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```