# Audio I/O Operations

Core functionality for loading, saving, and managing audio files with support for multiple backends and formats. TorchAudio provides a unified interface that works across different audio backends (FFmpeg, SoX, SoundFile) while maintaining consistent behavior and PyTorch tensor integration.

## Capabilities

### Audio Loading

Load audio files into PyTorch tensors with control over format, channel layout, and data windowing.

```python { .api }
def load(filepath: str, frame_offset: int = 0, num_frames: int = -1,
         normalize: bool = True, channels_first: bool = True,
         format: Optional[str] = None) -> Tuple[torch.Tensor, int]:
    """
    Load audio file into tensor.

    Args:
        filepath: Path to audio file
        frame_offset: Number of frames to skip at beginning
        num_frames: Number of frames to load (-1 for all)
        normalize: Whether to convert samples to float32 in the [-1, 1] range
            (a sample-type conversion, not loudness normalization)
        channels_first: Whether to return shape (channels, time) or (time, channels)
        format: Audio format override (auto-detected if None)

    Returns:
        Tuple of (waveform tensor, sample_rate)
        - waveform: Audio data as tensor with shape (channels, samples) if channels_first=True
        - sample_rate: Sample rate in Hz
    """
```

Usage example:

```python
import torchaudio

# Load entire audio file
waveform, sample_rate = torchaudio.load("speech.wav")
print(f"Shape: {waveform.shape}, Sample rate: {sample_rate}")

# Load specific segment (1 second starting at 2 seconds).
# Offsets are counted in frames, so derive them from the file's own sample rate
# instead of assuming 16 kHz.
segment, sr = torchaudio.load(
    "speech.wav", frame_offset=2 * sample_rate, num_frames=sample_rate
)

# Load with different channel ordering
waveform_tcf, sr = torchaudio.load("speech.wav", channels_first=False)  # (time, channels)
```
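
Because `frame_offset` and `num_frames` are expressed in frames rather than seconds, a small conversion helper can make time-based slicing less error-prone. The sketch below is only an illustration: `segment_kwargs` is a hypothetical helper, not part of TorchAudio, and it assumes the sample rate is already known (for example from a previous load).

```python
def segment_kwargs(start_seconds: float, duration_seconds: float, sample_rate: int) -> dict:
    """Translate a time window in seconds into frame-based keyword arguments."""
    if start_seconds < 0 or duration_seconds <= 0:
        raise ValueError("start must be >= 0 and duration must be > 0")
    return {
        "frame_offset": int(round(start_seconds * sample_rate)),
        "num_frames": int(round(duration_seconds * sample_rate)),
    }


# One second starting at the two-second mark of a 44.1 kHz file
kwargs = segment_kwargs(2.0, 1.0, 44100)
print(kwargs)
```

The resulting dictionary could then be unpacked into a call such as `torchaudio.load(path, **kwargs)`.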

### Audio Saving

Save PyTorch tensors as audio files with format control and compression options.

```python { .api }
def save(filepath: str, src: torch.Tensor, sample_rate: int,
         channels_first: bool = True, compression: Optional[float] = None) -> None:
    """
    Save tensor as audio file.

    Args:
        filepath: Output path (format determined by extension)
        src: Audio tensor to save
        sample_rate: Sample rate in Hz
        channels_first: Whether input tensor has shape (channels, time) or (time, channels)
        compression: Compression level (format-dependent, None for default)
    """
```

Usage example:

```python
import torch
import torchaudio

# Create synthetic audio
sample_rate = 16000
duration = 3  # 3 seconds
# Sample n occurs at time n / sample_rate (linspace would include the endpoint
# and slightly stretch the sample spacing)
t = torch.arange(int(sample_rate * duration)) / sample_rate
waveform = torch.sin(2 * torch.pi * 440 * t).unsqueeze(0)  # 440 Hz sine wave, shape (1, time)

# Save in different formats
torchaudio.save("output.wav", waveform, sample_rate)
torchaudio.save("output.mp3", waveform, sample_rate, compression=128)  # 128 kbps
torchaudio.save("output.flac", waveform, sample_rate)
```
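
The relationship between sample index, sample rate, and time is easy to get subtly wrong when synthesizing audio. As a minimal sketch in plain Python (no tensors), the hypothetical helper `sine_samples` below places sample `n` at time `n / sample_rate`, so a 3-second clip at 16 kHz contains exactly 48000 samples.

```python
import math


def sine_samples(frequency_hz: float, sample_rate: int, duration_seconds: float) -> list:
    """Return a mono sine wave as a list of floats in [-1, 1]."""
    num_frames = int(sample_rate * duration_seconds)
    return [
        math.sin(2 * math.pi * frequency_hz * n / sample_rate)  # time of sample n is n / sample_rate
        for n in range(num_frames)
    ]


samples = sine_samples(440.0, 16000, 3)
print(len(samples))
```

Wrapping such a list in a tensor of shape `(1, num_frames)` would give the `(channels, time)` layout that `save` expects when `channels_first=True`.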

### Audio Metadata

Extract metadata from audio files without loading the full audio data.

```python { .api }
def info(filepath: str, format: Optional[str] = None) -> AudioMetaData:
    """
    Get audio file metadata.

    Args:
        filepath: Path to audio file
        format: Audio format override (auto-detected if None)

    Returns:
        AudioMetaData object with file information
    """

class AudioMetaData:
    """Audio file metadata container."""
    sample_rate: int      # Sample rate in Hz
    num_frames: int       # Total number of audio frames
    num_channels: int     # Number of audio channels
    bits_per_sample: int  # Bits per sample (bit depth)
    encoding: str         # Audio encoding format
```

Usage example:

```python
import torchaudio

# Get file info without loading audio
metadata = torchaudio.info("audio.wav")
print(f"Duration: {metadata.num_frames / metadata.sample_rate:.2f} seconds")
print(f"Channels: {metadata.num_channels}")
print(f"Sample rate: {metadata.sample_rate} Hz")
print(f"Encoding: {metadata.encoding}")
print(f"Bit depth: {metadata.bits_per_sample}")
```
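
The metadata fields are enough to derive quantities such as duration and uncompressed size without decoding any audio. The sketch below uses `MetaSketch`, a hypothetical stand-in dataclass with the same field names as `AudioMetaData`, so that it runs without an audio file; the helper names are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass
class MetaSketch:
    """Hypothetical stand-in mirroring the AudioMetaData fields listed above."""
    sample_rate: int
    num_frames: int
    num_channels: int
    bits_per_sample: int
    encoding: str


def duration_seconds(meta: MetaSketch) -> float:
    """Duration is the frame count divided by frames per second."""
    return meta.num_frames / meta.sample_rate


def pcm_size_bytes(meta: MetaSketch) -> int:
    """Approximate size of the raw PCM payload (headers excluded)."""
    return meta.num_frames * meta.num_channels * meta.bits_per_sample // 8


meta = MetaSketch(sample_rate=16000, num_frames=48000, num_channels=2,
                  bits_per_sample=16, encoding="PCM_S")
print(duration_seconds(meta), pcm_size_bytes(meta))
```

For compressed encodings the bit depth may be reported as 0, in which case the size estimate is not meaningful.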

### TorchCodec Integration

Advanced loading and saving using the TorchCodec backend, for additional format support and streaming capabilities.

```python { .api }
def load_with_torchcodec(filepath: str, **kwargs) -> Tuple[torch.Tensor, int]:
    """
    Load audio using TorchCodec backend.

    Args:
        filepath: Path to audio file
        **kwargs: Additional TorchCodec-specific options

    Returns:
        Tuple of (waveform tensor, sample_rate)
    """

def save_with_torchcodec(filepath: str, src: torch.Tensor, sample_rate: int, **kwargs) -> None:
    """
    Save audio using TorchCodec backend.

    Args:
        filepath: Output path
        src: Audio tensor to save
        sample_rate: Sample rate in Hz
        **kwargs: Additional TorchCodec-specific options
    """
```

### Backend Management

Control which audio backend is used for I/O operations across TorchAudio.

```python { .api }
def list_audio_backends() -> List[str]:
    """
    List available audio backends.

    Returns:
        List of the backend names available in this installation,
        a subset of ["ffmpeg", "sox", "soundfile"]
    """

def get_audio_backend() -> Optional[str]:
    """
    Get currently active audio backend.

    Returns:
        Backend name or None if using dispatcher mode
    """

def set_audio_backend(backend: Optional[str]) -> None:
    """
    Set global audio backend.

    Args:
        backend: Backend name ("sox_io", "soundfile") or None to unset

    Note:
        This function is deprecated with dispatcher mode enabled.
        Modern TorchAudio automatically selects the best backend.
    """
```

Usage example:

```python
import torchaudio

# Check available backends
backends = torchaudio.list_audio_backends()
print(f"Available backends: {backends}")

# Check current backend (returns None in dispatcher mode)
current = torchaudio.get_audio_backend()
print(f"Current backend: {current}")
```
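
Since the list of available backends depends on how TorchAudio was installed, code that cares about the backend may want to choose from whatever is present. The following is a minimal sketch of that selection logic; `pick_backend` is a hypothetical helper and the default preference order is an assumption, not a TorchAudio recommendation.

```python
from typing import Optional, Sequence


def pick_backend(
    available: Sequence[str],
    preferred: Sequence[str] = ("ffmpeg", "sox", "soundfile"),  # assumed order
) -> Optional[str]:
    """Return the first preferred backend that is available, else None."""
    for name in preferred:
        if name in available:
            return name
    return None


# Stand-in for the list returned by torchaudio.list_audio_backends()
print(pick_backend(["soundfile", "sox"]))
```

Returning `None` when nothing matches leaves the decision to the caller, which mirrors how dispatcher mode reports no fixed backend.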

## Supported Audio Formats

TorchAudio supports a wide variety of audio formats through its multiple backends:

### Common Formats

- **WAV**: Uncompressed PCM audio (16-bit, 24-bit, 32-bit, float)
- **MP3**: MPEG Layer-3 compressed audio
- **FLAC**: Free Lossless Audio Codec
- **OGG/Vorbis**: Open-source compressed format
- **M4A/AAC**: Advanced Audio Coding
- **OPUS**: Modern low-latency codec

### Professional Formats

- **AIFF**: Audio Interchange File Format
- **SPHERE**: NIST SPHERE format (speech processing)
- **AU**: Sun/NeXT audio format
- **AMR**: Adaptive Multi-Rate (mobile audio)

### Backend-Specific Support

- **FFmpeg backend**: Widest format support including video containers
- **SoX backend**: Professional audio processing formats
- **SoundFile backend**: High-quality uncompressed formats
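
When the format is inferred from a file name, it can help to check the extension before attempting I/O. The sketch below maps common extensions for the formats listed above to short labels. Both the `EXTENSION_TO_FORMAT` table and the `guess_format` helper are hypothetical, and the labels are illustrative rather than a guaranteed list of values accepted by the `format` argument.

```python
from pathlib import Path
from typing import Optional

# Assumed extension-to-label table covering the formats named in this section
EXTENSION_TO_FORMAT = {
    ".wav": "wav",
    ".mp3": "mp3",
    ".flac": "flac",
    ".ogg": "ogg",
    ".m4a": "m4a",
    ".opus": "opus",
    ".aiff": "aiff",
    ".aif": "aiff",
    ".sph": "sphere",
    ".au": "au",
    ".amr": "amr",
}


def guess_format(path: str) -> Optional[str]:
    """Return a format label based on the file extension, or None if unknown."""
    return EXTENSION_TO_FORMAT.get(Path(path).suffix.lower())


print(guess_format("recordings/interview.FLAC"))
```

Whether a recognized format can actually be decoded still depends on which backend is installed.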

## Error Handling

Common exceptions when working with audio I/O:

```python
import torch
import torchaudio

# Note: the exact exception type raised can depend on the active backend.
try:
    waveform, sr = torchaudio.load("nonexistent.wav")
except FileNotFoundError:
    print("Audio file not found")

try:
    waveform, sr = torchaudio.load("corrupted.wav")
except RuntimeError as e:
    print(f"Failed to load audio: {e}")

# Use a known-good tensor so this step does not depend on the loads above
waveform, sr = torch.zeros(1, 16000), 16000  # 1 second of silence
try:
    torchaudio.save("readonly/output.wav", waveform, sr)
except PermissionError:
    print("Cannot write to readonly directory")
```
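
In batch jobs it is often more convenient to wrap loading in one place and skip files that fail. The sketch below shows one way to do that. `try_load` is a hypothetical wrapper that takes the loader as an argument, and `fake_loader` is a stand-in used here so the example runs without audio files or TorchAudio installed.

```python
from typing import Any, Callable, Optional


def try_load(loader: Callable[[str], Any], path: str) -> Optional[Any]:
    """Call loader(path); report common I/O failures and return None instead of raising."""
    try:
        return loader(path)
    except FileNotFoundError:
        print(f"Audio file not found: {path}")
    except (RuntimeError, OSError) as exc:
        print(f"Failed to load {path}: {exc}")
    return None


def fake_loader(path: str):
    """Stand-in loader that simulates the failure modes shown above."""
    if path == "missing.wav":
        raise FileNotFoundError(path)
    if path == "corrupted.wav":
        raise RuntimeError("decode error")
    return ([0.0, 0.1], 16000)


results = [try_load(fake_loader, p) for p in ("missing.wav", "corrupted.wav", "ok.wav")]
print(results)
```

In real use, `torchaudio.load` would be passed as the `loader` argument in place of `fake_loader`.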