# Audio I/O Operations

Core functionality for loading, saving, and managing audio files with support for multiple backends and formats. TorchAudio provides a unified interface that works across different audio backends (FFmpeg, SoX, SoundFile) while maintaining consistent behavior and PyTorch tensor integration.

## Capabilities

### Audio Loading

Load audio files into PyTorch tensors with control over format, channel layout, and data windowing.

```python { .api }
def load(filepath: str, frame_offset: int = 0, num_frames: int = -1,
         normalize: bool = True, channels_first: bool = True,
         format: Optional[str] = None) -> Tuple[torch.Tensor, int]:
    """
    Load audio file into tensor.

    Args:
        filepath: Path to audio file
        frame_offset: Number of frames to skip at beginning
        num_frames: Number of frames to load (-1 for all)
        normalize: Whether to convert integer samples to float32 in the [-1, 1] range
            (a dtype conversion, not loudness normalization)
        channels_first: Whether to return shape (channels, time) or (time, channels)
        format: Audio format override (auto-detected if None)

    Returns:
        Tuple of (waveform tensor, sample_rate)
        - waveform: Audio data as tensor with shape (channels, samples) if channels_first=True,
          otherwise (samples, channels)
        - sample_rate: Sample rate in Hz
    """
```

Usage example:

```python
import torchaudio

# Load entire audio file
waveform, sample_rate = torchaudio.load("speech.wav")
print(f"Shape: {waveform.shape}, Sample rate: {sample_rate}")

# Load specific segment (1 second starting at 2 seconds),
# using the file's actual sample rate instead of a hard-coded 16000
segment, sr = torchaudio.load("speech.wav", frame_offset=2 * sample_rate, num_frames=sample_rate)

# Load with different channel ordering
waveform_tcf, sr = torchaudio.load("speech.wav", channels_first=False)  # (time, channels)
```
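
The segment arithmetic above (seconds multiplied by the sample rate) can be wrapped in a small helper. This is a minimal sketch in plain Python; `segment_frames` is a hypothetical name used for illustration and is not part of TorchAudio.

```python
from typing import Tuple


def segment_frames(start_s: float, duration_s: float, sample_rate: int) -> Tuple[int, int]:
    """Convert a time window in seconds to (frame_offset, num_frames)."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    if start_s < 0 or duration_s <= 0:
        raise ValueError("start_s must be >= 0 and duration_s must be > 0")
    # Round to the nearest whole frame; frames are integer sample indices.
    frame_offset = round(start_s * sample_rate)
    num_frames = round(duration_s * sample_rate)
    return frame_offset, num_frames


# One second of audio starting at the two-second mark, for a 16 kHz file
offset, count = segment_frames(2.0, 1.0, 16000)
print(offset, count)  # 32000 16000
```

The two values can then be passed as `frame_offset` and `num_frames` to `torchaudio.load`, with `sample_rate` taken from a prior `load` or `info` call on the same file.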

### Audio Saving

Save PyTorch tensors as audio files with format control and compression options.

```python { .api }
def save(filepath: str, src: torch.Tensor, sample_rate: int,
         channels_first: bool = True, compression: Optional[float] = None) -> None:
    """
    Save tensor as audio file.

    Args:
        filepath: Output path (format determined by extension)
        src: Audio tensor to save
        sample_rate: Sample rate in Hz
        channels_first: Whether input tensor has shape (channels, time) or (time, channels)
        compression: Compression level (format-dependent, None for default)
    """
```

Usage example:

```python
import torch
import torchaudio

# Create synthetic audio
sample_rate = 16000
duration = 3  # 3 seconds
# Sample times spaced exactly 1 / sample_rate apart
t = torch.arange(sample_rate * duration) / sample_rate
waveform = torch.sin(2 * torch.pi * 440 * t).unsqueeze(0)  # 440 Hz sine wave, shape (1, time)

# Save in different formats
torchaudio.save("output.wav", waveform, sample_rate)
# The meaning of `compression` depends on the format and the backend in use
torchaudio.save("output.mp3", waveform, sample_rate, compression=128)  # 128 kbps
torchaudio.save("output.flac", waveform, sample_rate)
```
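
Since `save` interprets the tensor according to `channels_first`, a swapped layout is an easy mistake to make. The following sketch checks a waveform shape before saving. `split_shape` and `looks_transposed` are hypothetical helper names, and the "more channels than frames" test is only a heuristic, not something TorchAudio itself performs.

```python
from typing import Sequence, Tuple


def split_shape(shape: Sequence[int], channels_first: bool = True) -> Tuple[int, int]:
    """Return (num_channels, num_frames) for a 2-D waveform shape."""
    if len(shape) != 2:
        raise ValueError(f"expected a 2-D waveform, got shape {tuple(shape)}")
    if channels_first:
        return shape[0], shape[1]
    return shape[1], shape[0]


def looks_transposed(shape: Sequence[int], channels_first: bool = True) -> bool:
    """Heuristic: more channels than frames usually means the axes are swapped."""
    channels, frames = split_shape(shape, channels_first)
    return channels > frames


# A (time, channels) shape interpreted as channels-first looks suspicious
print(looks_transposed((48000, 2)))                        # True
print(looks_transposed((48000, 2), channels_first=False))  # False
```

Both helpers accept any sequence of two integers, so `waveform.shape` can be passed directly.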

### Audio Metadata

Extract metadata from audio files without loading the full audio data.

```python { .api }
def info(filepath: str, format: Optional[str] = None) -> AudioMetaData:
    """
    Get audio file metadata.

    Args:
        filepath: Path to audio file
        format: Audio format override (auto-detected if None)

    Returns:
        AudioMetaData object with file information
    """

class AudioMetaData:
    """Audio file metadata container."""
    sample_rate: int      # Sample rate in Hz
    num_frames: int       # Total number of audio frames
    num_channels: int     # Number of audio channels
    bits_per_sample: int  # Bits per sample (bit depth)
    encoding: str         # Audio encoding format
```

Usage example:

```python
import torchaudio

# Get file info without loading audio
metadata = torchaudio.info("audio.wav")
print(f"Duration: {metadata.num_frames / metadata.sample_rate:.2f} seconds")
print(f"Channels: {metadata.num_channels}")
print(f"Sample rate: {metadata.sample_rate} Hz")
print(f"Encoding: {metadata.encoding}")
print(f"Bit depth: {metadata.bits_per_sample}")
```
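
The metadata fields are enough to estimate how much memory the decoded audio would need before loading it. The sketch below uses a `SimpleNamespace` as a stand-in for an `AudioMetaData` object so that it runs without an audio file. `duration_seconds` and `pcm_size_bytes` are hypothetical helper names, and the size is only an estimate for uncompressed PCM.

```python
from types import SimpleNamespace


def duration_seconds(meta) -> float:
    """Duration in seconds, from the frame count and sample rate."""
    return meta.num_frames / meta.sample_rate


def pcm_size_bytes(meta) -> int:
    """Approximate size of the audio as uncompressed PCM, in bytes."""
    if meta.bits_per_sample <= 0:
        # Without a bit depth the estimate is not meaningful.
        raise ValueError("bits_per_sample must be positive")
    return meta.num_frames * meta.num_channels * meta.bits_per_sample // 8


# Stand-in with the same attribute names as AudioMetaData (values are made up)
meta = SimpleNamespace(sample_rate=16000, num_frames=48000,
                       num_channels=2, bits_per_sample=16, encoding="PCM_S")
print(duration_seconds(meta))  # 3.0
print(pcm_size_bytes(meta))    # 192000
```

In real use, `meta` would be the object returned by `torchaudio.info`.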

### TorchCodec Integration

Advanced loading and saving using the TorchCodec backend for additional format support and streaming capabilities.

```python { .api }
def load_with_torchcodec(filepath: str, **kwargs) -> Tuple[torch.Tensor, int]:
    """
    Load audio using TorchCodec backend.

    Args:
        filepath: Path to audio file
        **kwargs: Additional TorchCodec-specific options

    Returns:
        Tuple of (waveform tensor, sample_rate)
    """

def save_with_torchcodec(filepath: str, src: torch.Tensor, sample_rate: int, **kwargs) -> None:
    """
    Save audio using TorchCodec backend.

    Args:
        filepath: Output path
        src: Audio tensor to save
        sample_rate: Sample rate in Hz
        **kwargs: Additional TorchCodec-specific options
    """
```
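
The TorchCodec entry points may not be present in every installed TorchAudio release (an assumption worth checking against your version), so calling code can feature-detect them. The sketch below shows one way to do that. `pick_loader` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for the `torchaudio` module so the example runs without it.

```python
from types import SimpleNamespace
from typing import Callable


def pick_loader(module) -> Callable:
    """Prefer `load_with_torchcodec` when the module exposes it, else use `load`."""
    loader = getattr(module, "load_with_torchcodec", None)
    if callable(loader):
        return loader
    return module.load


# Stand-ins for a module without and with the TorchCodec entry point
without_tc = SimpleNamespace(load=lambda path: ("load", path))
with_tc = SimpleNamespace(
    load=lambda path: ("load", path),
    load_with_torchcodec=lambda path, **kwargs: ("load_with_torchcodec", path),
)

print(pick_loader(without_tc)("speech.wav")[0])  # load
print(pick_loader(with_tc)("speech.wav")[0])     # load_with_torchcodec
```

With the real library, `pick_loader(torchaudio)` would return whichever function is available, and both return a `(waveform, sample_rate)` tuple per the signatures above.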

### Backend Management

Control which audio backend is used for I/O operations across TorchAudio.

```python { .api }
def list_audio_backends() -> List[str]:
    """
    List available audio backends.

    Returns:
        List of the backend names available in the current environment,
        a subset of ["ffmpeg", "sox", "soundfile"]
    """

def get_audio_backend() -> Optional[str]:
    """
    Get currently active audio backend.

    Returns:
        Backend name or None if using dispatcher mode
    """

def set_audio_backend(backend: Optional[str]) -> None:
    """
    Set global audio backend.

    Args:
        backend: Legacy backend name ("sox_io", "soundfile") or None to unset

    Note:
        This function is deprecated and has no effect when dispatcher mode is enabled.
        Modern TorchAudio automatically selects the best backend.
    """
```

Usage example:

```python
import torchaudio

# Check available backends
backends = torchaudio.list_audio_backends()
print(f"Available backends: {backends}")

# Check current backend (returns None in dispatcher mode)
current = torchaudio.get_audio_backend()
print(f"Current backend: {current}")
```
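
Because `list_audio_backends()` reports only what is installed, code that cares about a particular backend can pick from the list by preference. The sketch below is plain Python. `choose_backend` is a hypothetical helper, and the default preference order is an arbitrary assumption rather than TorchAudio's own selection logic.

```python
from typing import Optional, Sequence


def choose_backend(
    available: Sequence[str],
    preferred: Sequence[str] = ("ffmpeg", "sox", "soundfile"),
) -> Optional[str]:
    """Return the first preferred backend that is available, or None."""
    for name in preferred:
        if name in available:
            return name
    return None


print(choose_backend(["soundfile", "sox"]))  # sox
print(choose_backend([]))                    # None
```

In practice the first argument would be `torchaudio.list_audio_backends()`. Some releases accept a `backend` keyword on `torchaudio.load`, which is not shown in the signature above, so verify it against your installed version before relying on it.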

## Supported Audio Formats

TorchAudio supports a wide range of audio formats. Which ones are available depends on the installed backends:

### Common Formats
- **WAV**: Uncompressed PCM audio (16-bit, 24-bit, 32-bit, float)
- **MP3**: MPEG Layer-3 compressed audio
- **FLAC**: Free Lossless Audio Codec
- **OGG/Vorbis**: Open-source compressed format
- **M4A/AAC**: Advanced Audio Coding
- **OPUS**: Modern low-latency codec

### Professional Formats
- **AIFF**: Audio Interchange File Format
- **SPHERE**: NIST SPHERE format (speech processing)
- **AU**: Sun/NeXT audio format
- **AMR**: Adaptive Multi-Rate (mobile audio)

### Backend-Specific Support
- **FFmpeg backend**: Widest format support, including audio streams in video containers
- **SoX backend**: Formats supported by libsox, geared toward audio processing workflows
- **SoundFile backend**: Formats supported by libsndfile, chiefly uncompressed and lossless formats such as WAV and FLAC

## Error Handling

Common exceptions when working with audio I/O:

```python
import torch
import torchaudio

try:
    waveform, sr = torchaudio.load("nonexistent.wav")
except FileNotFoundError:
    # Some backends may report a missing file as RuntimeError instead
    print("Audio file not found")

try:
    waveform, sr = torchaudio.load("corrupted.wav")
except RuntimeError as e:
    print(f"Failed to load audio: {e}")

# Use a known-good tensor so this step does not depend on the loads above
waveform, sr = torch.zeros(1, 16000), 16000
try:
    torchaudio.save("readonly/output.wav", waveform, sr)
except PermissionError:
    print("Cannot write to readonly directory")
```
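
The try/except pattern above can be collected into one wrapper that returns an error message instead of raising. This is a minimal sketch. `try_load` is a hypothetical helper, the loader is passed in as an argument so the example runs without TorchAudio, and the set of exceptions caught mirrors the ones listed above rather than an exhaustive list for any backend.

```python
from typing import Any, Callable, Optional, Tuple


def try_load(loader: Callable[[str], Any], path: str) -> Tuple[Optional[Any], Optional[str]]:
    """Call loader(path); return (result, None) on success or (None, message) on failure."""
    try:
        return loader(path), None
    except FileNotFoundError:
        return None, f"file not found: {path}"
    except RuntimeError as exc:
        return None, f"failed to load {path}: {exc}"


def fake_corrupted_loader(path: str):
    """Stand-in loader that simulates a decoding failure."""
    raise RuntimeError("invalid header")


result, error = try_load(fake_corrupted_loader, "corrupted.wav")
print(result, error)  # None failed to load corrupted.wav: invalid header
```

With the real library the call would be `try_load(torchaudio.load, "speech.wav")`, where a successful result is the `(waveform, sample_rate)` tuple.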