Tessl Tile for pypi/torchaudio@2.8.0

# Audio I/O Operations

Core functionality for loading, saving, and inspecting audio files. TorchAudio exposes one interface (`load`, `save`, `info`) that dispatches to an installed backend (FFmpeg, SoX, or SoundFile) and returns audio as PyTorch tensors, so calling code behaves largely the same whichever backend is in use.

## Capabilities

### Audio Loading

Load audio files into PyTorch tensors with control over format, channel layout, and which range of frames is read.

```python { .api }
def load(filepath: str, frame_offset: int = 0, num_frames: int = -1,
         normalize: bool = True, channels_first: bool = True,
         format: Optional[str] = None) -> Tuple[torch.Tensor, int]:
    """
    Load an audio file into a tensor.

    Args:
        filepath: Path to the audio file
        frame_offset: Number of frames to skip at the beginning
        num_frames: Maximum number of frames to load (-1 loads all remaining frames)
        normalize: If True, convert samples to float32 in the range [-1.0, 1.0].
            If False and the file is integer-PCM WAV, keep the native integer dtype;
            for other formats this flag has no effect
        channels_first: If True, return shape (channels, time); otherwise (time, channels)
        format: Audio format override (auto-detected if None)

    Returns:
        Tuple of (waveform, sample_rate)
        - waveform: Audio data with shape (channels, time) if channels_first=True,
          else (time, channels)
        - sample_rate: Sample rate in Hz
    """
```

Usage example:

```python
import torchaudio

# Load the entire audio file
waveform, sample_rate = torchaudio.load("speech.wav")
print(f"Shape: {waveform.shape}, Sample rate: {sample_rate}")

# Load a 1-second segment starting at 2 seconds.
# Offsets are counted in frames, so scale seconds by the file's sample rate.
segment, sr = torchaudio.load("speech.wav", frame_offset=2 * sample_rate, num_frames=sample_rate)

# Load with (time, channels) layout instead of the default (channels, time)
waveform_tc, sr = torchaudio.load("speech.wav", channels_first=False)
```

### Audio Saving

Save PyTorch tensors as audio files with format control and compression options.

```python { .api }
def save(filepath: str, src: torch.Tensor, sample_rate: int,
         channels_first: bool = True, compression: Optional[float] = None) -> None:
    """
    Save a tensor as an audio file.

    Args:
        filepath: Output path (format inferred from the file extension)
        src: Audio tensor to save
        sample_rate: Sample rate in Hz
        channels_first: Whether src has shape (channels, time) or (time, channels)
        compression: Compression setting, or None for the format's default.
            Its interpretation depends on both the format and the backend
    """
```

Usage example:

```python
import torch
import torchaudio

# Create 3 seconds of a 440 Hz sine wave at half amplitude
sample_rate = 16000
duration = 3  # seconds
t = torch.arange(sample_rate * duration) / sample_rate  # sample times spaced exactly 1/sample_rate apart
waveform = 0.5 * torch.sin(2 * torch.pi * 440 * t).unsqueeze(0)  # shape (1, time)

# Save in different formats (format inferred from the file extension)
torchaudio.save("output.wav", waveform, sample_rate)
torchaudio.save("output.flac", waveform, sample_rate)
# compression=128 means 128 kbps with the SoX backend; the FFmpeg backend
# expects a torchaudio.io.CodecConfig object instead of a number
torchaudio.save("output.mp3", waveform, sample_rate, compression=128)
```

### Audio Metadata

Extract metadata from audio files without loading the full audio data.

```python { .api }
def info(filepath: str, format: Optional[str] = None) -> AudioMetaData:
    """
    Get audio file metadata.

    Args:
        filepath: Path to the audio file
        format: Audio format override (auto-detected if None)

    Returns:
        AudioMetaData object with file information
    """

class AudioMetaData:
    """Audio file metadata container."""
    sample_rate: int      # Sample rate in Hz
    num_frames: int       # Total number of audio frames
    num_channels: int     # Number of audio channels
    bits_per_sample: int  # Bit depth (0 when the format has no fixed bit depth, e.g. MP3)
    encoding: str         # Audio encoding name
```

Usage example:

```python
import torchaudio

# Get file info without decoding the audio
metadata = torchaudio.info("audio.wav")
print(f"Duration: {metadata.num_frames / metadata.sample_rate:.2f} seconds")
print(f"Channels: {metadata.num_channels}")
print(f"Sample rate: {metadata.sample_rate} Hz")
print(f"Encoding: {metadata.encoding}")
print(f"Bit depth: {metadata.bits_per_sample}")
```

### TorchCodec Integration

Load and save audio through the TorchCodec library rather than TorchAudio's built-in backends. These functions require the separate `torchcodec` package to be installed.

```python { .api }
def load_with_torchcodec(filepath: str, **kwargs) -> Tuple[torch.Tensor, int]:
    """
    Load audio using the TorchCodec backend.

    Args:
        filepath: Path to the audio file
        **kwargs: Additional TorchCodec-specific options

    Returns:
        Tuple of (waveform tensor, sample_rate)
    """

def save_with_torchcodec(filepath: str, src: torch.Tensor, sample_rate: int, **kwargs) -> None:
    """
    Save audio using the TorchCodec backend.

    Args:
        filepath: Output path
        src: Audio tensor to save
        sample_rate: Sample rate in Hz
        **kwargs: Additional TorchCodec-specific options
    """
```

### Backend Management

Query which audio backends are available, and inspect or set the legacy global backend setting.

```python { .api }
def list_audio_backends() -> List[str]:
    """
    List the audio backends available in the current environment.

    Returns:
        Names of the usable backends: a subset of ["ffmpeg", "sox", "soundfile"],
        depending on which libraries are installed
    """

def get_audio_backend() -> Optional[str]:
    """
    Get the currently active global audio backend.

    Returns:
        Backend name, or None in dispatcher mode
    """

def set_audio_backend(backend: Optional[str]) -> None:
    """
    Set the global audio backend.

    Args:
        backend: Backend name ("sox_io", "soundfile") or None to unset

    Note:
        Deprecated. In dispatcher mode (the default), this call has no effect
        and TorchAudio selects a backend automatically for each call.
    """
```

Usage example:

```python
import torchaudio

# Check available backends
backends = torchaudio.list_audio_backends()
print(f"Available backends: {backends}")

# Check current backend (returns None in dispatcher mode)
current = torchaudio.get_audio_backend()
print(f"Current backend: {current}")
```

## Supported Audio Formats

Which formats can be read or written depends on the backends installed; the lists below describe typical support, not a guarantee for every environment.

### Common Formats
- **WAV**: Uncompressed PCM audio (16-bit, 24-bit, and 32-bit integer, or floating point)
- **MP3**: MPEG Audio Layer III, lossy compression
- **FLAC**: Free Lossless Audio Codec
- **OGG/Vorbis**: Open, royalty-free lossy format
- **M4A/AAC**: Advanced Audio Coding, lossy compression
- **OPUS**: Low-latency lossy codec

### Other Formats
- **AIFF**: Audio Interchange File Format
- **SPHERE**: NIST SPHERE format (common in speech corpora)
- **AU**: Sun/NeXT audio format
- **AMR**: Adaptive Multi-Rate (mobile speech codec)

### Backend-Specific Support
- **FFmpeg backend**: Widest format coverage, including audio tracks in video containers
- **SoX backend**: Formats supported by libsox, including speech-oriented ones such as SPHERE
- **SoundFile backend**: Formats supported by libsndfile, such as WAV, FLAC, and OGG/Vorbis

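## Error Handling

Common exceptions when working with audio I/O. The exception raised for a given failure can differ between backends (a missing file may surface as `RuntimeError` rather than `FileNotFoundError`), so the examples below catch both where relevant:

```python
import torch
import torchaudio

# Missing file: some backends raise FileNotFoundError, others wrap the failure in RuntimeError
try:
    waveform, sr = torchaudio.load("nonexistent.wav")
except (FileNotFoundError, RuntimeError) as e:
    print(f"Audio file not found or unreadable: {e}")

# Corrupted or unsupported data typically raises RuntimeError
try:
    waveform, sr = torchaudio.load("corrupted.wav")
except RuntimeError as e:
    print(f"Failed to load audio: {e}")

# Define the tensor to save here so this step does not depend on the loads above succeeding
waveform, sr = torch.zeros(1, 16000), 16000
try:
    torchaudio.save("readonly/output.wav", waveform, sr)
except (PermissionError, RuntimeError) as e:
    print(f"Cannot write output file: {e}")
```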