Tessl Tile for pypi/torchaudio@2.8.0

# Signal Processing Functions

A collection of stateless audio processing functions from `torchaudio.functional`, covering spectral analysis, filtering, resampling, pitch manipulation, and beamforming. The functions operate directly on tensors and are compatible with PyTorch's autograd system for gradient-based optimization.

## Capabilities

### Spectral Analysis

Core functions for converting between the time and frequency domains.

```python { .api }
def spectrogram(waveform: torch.Tensor, pad: int, window: torch.Tensor,
                n_fft: int, hop_length: int, win_length: int,
                power: Optional[float], normalized: Union[bool, str],
                center: bool = True, pad_mode: str = "reflect",
                onesided: bool = True, return_complex: Optional[bool] = None) -> torch.Tensor:
    """
    Create spectrogram from waveform.

    Args:
        waveform: Tensor of audio of dimension (..., time)
        pad: Two sided padding of signal
        window: Window tensor that is applied/multiplied to each frame/window
        n_fft: Size of FFT
        hop_length: Length of hop between STFT windows
        win_length: Window size
        power: Exponent for the magnitude spectrogram (must be > 0) e.g., 1 for magnitude, 2 for power, etc. If None, then the complex spectrum is returned instead.
        normalized: Whether to normalize by magnitude after stft. If input is str, choices are "window" and "frame_length", if specific normalization type is desirable. True maps to "window".
        center: whether to pad waveform on both sides so that the t-th frame is centered at time t × hop_length
        pad_mode: controls the padding method used when center is True
        onesided: controls whether to return half of results to avoid redundancy
        return_complex: Deprecated, use power=None instead

    Returns:
        Tensor: Spectrogram with shape (..., freq, time)
    """

def inverse_spectrogram(spectrogram: torch.Tensor, length: Optional[int], pad: int,
                        window: torch.Tensor, n_fft: int, hop_length: int,
                        win_length: int, normalized: Union[bool, str],
                        center: bool = True, pad_mode: str = "reflect",
                        onesided: bool = True) -> torch.Tensor:
    """
    Reconstruct waveform from a complex spectrogram using inverse STFT.

    Args:
        spectrogram: Complex-valued input spectrogram (..., freq, time)
        length: Expected length of output (None to infer it from the spectrogram)
        (other parameters same as spectrogram)

    Returns:
        Tensor: Reconstructed waveform (..., time)
    """

def griffinlim(specgram: torch.Tensor, window: torch.Tensor, n_fft: int,
               hop_length: int, win_length: int, power: float,
               n_iter: int, momentum: float,
               length: Optional[int], rand_init: bool) -> torch.Tensor:
    """
    Reconstruct waveform from magnitude spectrogram using Griffin-Lim algorithm.

    Args:
        specgram: Magnitude or power spectrogram (..., freq, time)
        window: Window function
        n_fft: Size of FFT
        hop_length: Length of hop between STFT windows
        win_length: Window size
        power: Exponent that was used to compute the input spectrogram (1 for magnitude, 2 for power)
        n_iter: Number of Griffin-Lim iterations
        momentum: Momentum parameter for fast Griffin-Lim
        length: Expected output length
        rand_init: Whether to initialize with random phase

    Returns:
        Tensor: Reconstructed waveform (..., time)
    """
```

### Mel-Scale Processing

Functions for mel-scale analysis commonly used in speech and music processing.

```python { .api }
def melscale_fbanks(n_freqs: int, f_min: float, f_max: float, n_mels: int,
                    sample_rate: int, norm: Optional[str] = None,
                    mel_scale: str = "htk") -> torch.Tensor:
    """
    Create mel-scale filter banks.

    Args:
        n_freqs: Number of frequency bins (typically n_fft // 2 + 1)
        f_min: Minimum frequency (Hz)
        f_max: Maximum frequency (Hz)
        n_mels: Number of mel filter banks
        sample_rate: Sample rate of audio
        norm: Normalization method ("slaney" or None)
        mel_scale: Scale to use ("htk" or "slaney")

    Returns:
        Tensor: Mel filter bank matrix (n_freqs, n_mels)
    """

def linear_fbanks(n_freqs: int, f_min: float, f_max: float, n_filter: int,
                  sample_rate: int) -> torch.Tensor:
    """
    Create linear-spaced filter banks.

    Args:
        n_freqs: Number of frequency bins
        f_min: Minimum frequency (Hz)
        f_max: Maximum frequency (Hz)
        n_filter: Number of linear filter banks
        sample_rate: Sample rate of audio

    Returns:
        Tensor: Linear filter bank matrix (n_freqs, n_filter)
    """
```

### Amplitude and Decibel Conversion

Functions for converting between linear amplitude and logarithmic decibel scales.

```python { .api }
def amplitude_to_DB(x: torch.Tensor, multiplier: float, amin: float,
                    db_multiplier: float, top_db: Optional[float] = None) -> torch.Tensor:
    """
    Convert a power or amplitude spectrogram to decibel scale.

    Args:
        x: Input tensor (amplitude or power spectrogram)
        multiplier: Multiplier for log10 (10.0 for power, 20.0 for amplitude)
        amin: Minimum value to clamp x
        db_multiplier: log10(max(reference value, amin)); subtracted (times multiplier) from the result
        top_db: Minimum negative cut-off in decibels (a reasonable value is 80)

    Returns:
        Tensor: Spectrogram in decibel scale
    """

def DB_to_amplitude(x: torch.Tensor, ref: float, power: float) -> torch.Tensor:
    """
    Convert a tensor from the decibel scale back to the power or amplitude scale.

    Args:
        x: Input tensor in decibel scale
        ref: Reference value by which the output is scaled
        power: 1.0 converts dB to power, 0.5 converts dB to amplitude

    Returns:
        Tensor: Output on the power or amplitude scale
    """
```

### Resampling

Audio resampling for sample rate conversion.

```python { .api }
def resample(waveform: torch.Tensor, orig_freq: int, new_freq: int,
             lowpass_filter_width: int = 6, rolloff: float = 0.99,
             resampling_method: str = "sinc_interp_hann",
             beta: Optional[float] = None) -> torch.Tensor:
    """
    Resample waveform to a different sample rate using bandlimited interpolation.

    Args:
        waveform: Input waveform tensor (..., time)
        orig_freq: Original sample rate
        new_freq: Target sample rate
        lowpass_filter_width: Width of lowpass filter; larger values give a sharper filter at higher cost
        rolloff: Roll-off frequency of lowpass filter, as a fraction of the Nyquist frequency
        resampling_method: Resampling algorithm ("sinc_interp_hann" or "sinc_interp_kaiser")
        beta: Shape parameter for Kaiser window

    Returns:
        Tensor: Resampled waveform (..., time)
    """
```

### Audio Effects and Filtering

Comprehensive collection of audio filters and effects.

```python { .api }
def biquad(waveform: torch.Tensor, b0: float, b1: float, b2: float,
           a0: float, a1: float, a2: float) -> torch.Tensor:
    """
    Apply biquad IIR filter.

    Args:
        waveform: Input audio (..., time)
        b0, b1, b2: Numerator coefficients
        a0, a1, a2: Denominator coefficients

    Returns:
        Tensor: Filtered audio
    """

def allpass_biquad(waveform: torch.Tensor, sample_rate: int, central_freq: float, Q: float = 0.707) -> torch.Tensor:
    """
    Design two-pole all-pass filter. Similar to SoX implementation.

    Args:
        waveform: Audio waveform of dimension (..., time)
        sample_rate: Sampling rate of the waveform, e.g. 44100 (Hz)
        central_freq: Central frequency (in Hz)
        Q: Q factor (Default: 0.707)

    Returns:
        Tensor: Waveform of dimension (..., time)
    """

def band_biquad(waveform: torch.Tensor, sample_rate: int, central_freq: float,
                Q: float = 0.707, noise: bool = False) -> torch.Tensor:
    """
    Design two-pole band filter. Similar to SoX implementation.

    Args:
        waveform: Audio waveform of dimension (..., time)
        sample_rate: Sampling rate of the waveform, e.g. 44100 (Hz)
        central_freq: Central frequency (in Hz)
        Q: Q factor (Default: 0.707)
        noise: If True, use the alternate mode for un-pitched audio (e.g. percussion)

    Returns:
        Tensor: Waveform of dimension (..., time)
    """

def bandpass_biquad(waveform: torch.Tensor, sample_rate: int, central_freq: float,
                    Q: float = 0.707, const_skirt_gain: bool = False) -> torch.Tensor:
    """
    Design two-pole band-pass filter. Similar to SoX implementation.

    Args:
        waveform: Audio waveform of dimension (..., time)
        sample_rate: Sampling rate of the waveform, e.g. 44100 (Hz)
        central_freq: Central frequency (in Hz)
        Q: Q factor (Default: 0.707)
        const_skirt_gain: If True, use a constant skirt gain (peak gain = Q); if False, use a constant 0 dB peak gain

    Returns:
        Tensor: Waveform of dimension (..., time)
    """

def bandreject_biquad(waveform: torch.Tensor, sample_rate: int, central_freq: float, Q: float = 0.707) -> torch.Tensor:
    """
    Design two-pole band-reject filter. Similar to SoX implementation.

    Args:
        waveform: Audio waveform of dimension (..., time)
        sample_rate: Sampling rate of the waveform, e.g. 44100 (Hz)
        central_freq: Central frequency (in Hz)
        Q: Q factor (Default: 0.707)

    Returns:
        Tensor: Waveform of dimension (..., time)
    """

def bass_biquad(waveform: torch.Tensor, sample_rate: int, gain: float,
                central_freq: float = 100, Q: float = 0.707) -> torch.Tensor:
    """
    Design a bass tone-control effect. Similar to SoX implementation.

    Args:
        waveform: Audio waveform of dimension (..., time)
        sample_rate: Sampling rate of the waveform, e.g. 44100 (Hz)
        gain: Gain in dB
        central_freq: Central frequency (in Hz, default: 100)
        Q: Q factor (Default: 0.707)

    Returns:
        Tensor: Waveform of dimension (..., time)
    """

def contrast(waveform: torch.Tensor, enhancement_amount: float = 75.0) -> torch.Tensor:
    """
    Apply contrast effect. Similar to SoX implementation.

    Args:
        waveform: Audio waveform of dimension (..., time)
        enhancement_amount: Enhancement amount (default: 75.0)

    Returns:
        Tensor: Waveform of dimension (..., time)
    """

def dcshift(waveform: torch.Tensor, shift: float, limiter_gain: Optional[float] = None) -> torch.Tensor:
    """
    Apply a DC shift to the audio. Similar to SoX implementation.

    Args:
        waveform: Audio waveform of dimension (..., time)
        shift: DC shift amount
        limiter_gain: Optional limiter gain

    Returns:
        Tensor: Waveform of dimension (..., time)
    """

def deemph_biquad(waveform: torch.Tensor, sample_rate: int) -> torch.Tensor:
    """
    Apply ISO 908 CD de-emphasis (shelving) IIR filter. Similar to SoX implementation.

    Args:
        waveform: Audio waveform of dimension (..., time)
        sample_rate: Sampling rate of the waveform (44100 or 48000 Hz)

    Returns:
        Tensor: Waveform of dimension (..., time)
    """

def dither(waveform: torch.Tensor, density_function: str = "TPDF", noise_shaping: bool = False) -> torch.Tensor:
    """
    Apply dither. Dither increases the perceived dynamic range of audio stored at a particular bit-depth.

    Args:
        waveform: Audio waveform of dimension (..., time)
        density_function: Density function ("TPDF", "RPDF", "GPDF")
        noise_shaping: Apply noise shaping

    Returns:
        Tensor: Dithered waveform
    """

def equalizer_biquad(waveform: torch.Tensor, sample_rate: int, center_freq: float,
                     gain: float, Q: float = 0.707) -> torch.Tensor:
    """
    Design biquad peaking equalizer filter and perform filtering. Similar to SoX implementation.

    Args:
        waveform: Audio waveform of dimension (..., time)
        sample_rate: Sampling rate of the waveform
        center_freq: Center frequency (in Hz)
        gain: Gain in dB
        Q: Q factor (Default: 0.707)

    Returns:
        Tensor: Waveform of dimension (..., time)
    """

def filtfilt(waveform: torch.Tensor, a_coeffs: torch.Tensor, b_coeffs: torch.Tensor,
             clamp: bool = True) -> torch.Tensor:
    """
    Apply an IIR filter forward and backward to a waveform. Inspired by scipy.signal.filtfilt.

    Args:
        waveform: Input waveform (..., time)
        a_coeffs: Denominator coefficients of the filter
        b_coeffs: Numerator coefficients of the filter (same size as a_coeffs)
        clamp: If True, clamp the output signal to the range [-1, 1]

    Returns:
        Tensor: Zero-phase filtered waveform
    """

def flanger(waveform: torch.Tensor, sample_rate: int, delay: float = 0.0,
            depth: float = 2.0, regen: float = 0.0, width: float = 71.0,
            speed: float = 0.5, phase: float = 25.0, modulation: str = "sinusoidal",
            interpolation: str = "linear") -> torch.Tensor:
    """
    Apply a flanger effect to the audio. Similar to SoX implementation.

    Args:
        waveform: Audio waveform of dimension (..., channel, time)
        sample_rate: Sampling rate of the waveform
        delay: Base delay in milliseconds
        depth: Delay depth in milliseconds
        regen: Regeneration (feedback) in percent
        width: Delay line width in percent
        speed: Modulation speed in Hz
        phase: Phase in percent
        modulation: Modulation type ("sinusoidal" or "triangular")
        interpolation: Interpolation type ("linear" or "quadratic")

    Returns:
        Tensor: Waveform of dimension (..., channel, time)
    """

def gain(waveform: torch.Tensor, gain_db: float = 1.0) -> torch.Tensor:
    """
    Apply amplification or attenuation to the whole waveform.

    Args:
        waveform: Audio waveform of dimension (..., time)
        gain_db: Gain in decibels

    Returns:
        Tensor: Amplified waveform
    """

def highpass_biquad(waveform: torch.Tensor, sample_rate: int, cutoff_freq: float, Q: float = 0.707) -> torch.Tensor:
    """
    Design biquad highpass filter and perform filtering. Similar to SoX implementation.

    Args:
        waveform: Audio waveform of dimension (..., time)
        sample_rate: Sampling rate of the waveform
        cutoff_freq: Cutoff frequency (in Hz)
        Q: Q factor (Default: 0.707)

    Returns:
        Tensor: Waveform of dimension (..., time)
    """

def lfilter(waveform: torch.Tensor, a_coeffs: torch.Tensor, b_coeffs: torch.Tensor,
            clamp: bool = True, batching: bool = True) -> torch.Tensor:
    """
    Perform an IIR filter by evaluating difference equation.

    Args:
        waveform: Input waveform (..., time)
        a_coeffs: Denominator coefficients of the filter
        b_coeffs: Numerator coefficients of the filter (same size as a_coeffs)
        clamp: If True, clamp the output signal to the range [-1, 1]
        batching: Effective only when the coefficients are 2D; if True, each filter is applied to the channel at the same index

    Returns:
        Tensor: Filtered waveform
    """

def lowpass_biquad(waveform: torch.Tensor, sample_rate: int, cutoff_freq: float, Q: float = 0.707) -> torch.Tensor:
    """
    Design biquad lowpass filter and perform filtering. Similar to SoX implementation.

    Args:
        waveform: Audio waveform of dimension (..., time)
        sample_rate: Sampling rate of the waveform
        cutoff_freq: Cutoff frequency (in Hz)
        Q: Q factor (Default: 0.707)

    Returns:
        Tensor: Waveform of dimension (..., time)
    """

def overdrive(waveform: torch.Tensor, gain: float = 20, colour: float = 20) -> torch.Tensor:
    """
    Apply an overdrive effect to the audio. Similar to SoX implementation.

    Args:
        waveform: Audio waveform of dimension (..., time)
        gain: Gain amount
        colour: Colour amount

    Returns:
        Tensor: Waveform of dimension (..., time)
    """

def phaser(waveform: torch.Tensor, sample_rate: int, gain_in: float = 0.4,
           gain_out: float = 0.74, delay_ms: float = 3.0, decay: float = 0.4,
           mod_speed: float = 0.5, sinusoidal: bool = True) -> torch.Tensor:
    """
    Apply a phasing effect to the audio. Similar to SoX implementation.

    Args:
        waveform: Audio waveform of dimension (..., time)
        sample_rate: Sampling rate of the waveform
        gain_in: Input gain
        gain_out: Output gain
        delay_ms: Delay in milliseconds
        decay: Decay amount
        mod_speed: Modulation speed
        sinusoidal: Use sinusoidal modulation
    Returns:
        Tensor: Waveform of dimension (..., time)
    """

def riaa_biquad(waveform: torch.Tensor, sample_rate: int) -> torch.Tensor:
    """
    Apply RIAA vinyl playback equalization. Similar to SoX implementation.

    Args:
        waveform: Audio waveform of dimension (..., time)
        sample_rate: Sampling rate of the waveform (44100, 48000, 88200, or 96000 Hz)

    Returns:
        Tensor: Waveform of dimension (..., time)
    """

def treble_biquad(waveform: torch.Tensor, sample_rate: int, gain: float,
                  central_freq: float = 3000, Q: float = 0.707) -> torch.Tensor:
    """
    Design a treble tone-control effect. Similar to SoX implementation.

    Args:
        waveform: Audio waveform of dimension (..., time)
        sample_rate: Sampling rate of the waveform
        gain: Gain in dB
        central_freq: Central frequency (in Hz, default: 3000)
        Q: Q factor (Default: 0.707)

    Returns:
        Tensor: Waveform of dimension (..., time)
    """

def vad(waveform: torch.Tensor, sample_rate: int, trigger_level: float = 7.0,
        trigger_time: float = 0.25, search_time: float = 1.0,
        allowed_gap: float = 0.25, pre_trigger_time: float = 0.0,
        boot_time: float = 0.35, noise_up_time: float = 0.1,
        noise_down_time: float = 0.01, noise_reduction_amount: float = 1.35,
        measure_freq: float = 20.0, measure_duration: Optional[float] = None,
        measure_smooth_time: float = 0.4, hp_filter_freq: float = 50.0,
        lp_filter_freq: float = 6000.0, hp_lifter_freq: float = 150.0,
        lp_lifter_freq: float = 2000.0) -> torch.Tensor:
    """
    Voice Activity Detector. Similar to SoX implementation.

    Args:
        waveform: Tensor of audio of dimension (..., time)
        sample_rate: Sample rate of audio
        trigger_level: Trigger level (default: 7.0)
        trigger_time: Trigger time (default: 0.25)
        search_time: Search time (default: 1.0)
        allowed_gap: Allowed gap (default: 0.25)
        pre_trigger_time: Pre-trigger time (default: 0.0)
        boot_time: Boot time (default: 0.35)
        noise_up_time: Noise up time (default: 0.1)
        noise_down_time: Noise down time (default: 0.01)
        noise_reduction_amount: Noise reduction amount (default: 1.35)
        measure_freq: Measure frequency (default: 20.0)
        measure_duration: Measure duration (optional)
        measure_smooth_time: Measure smooth time (default: 0.4)
        hp_filter_freq: High-pass filter frequency (default: 50.0)
        lp_filter_freq: Low-pass filter frequency (default: 6000.0)
        hp_lifter_freq: High-pass lifter frequency (default: 150.0)
        lp_lifter_freq: Low-pass lifter frequency (default: 2000.0)

    Returns:
        Tensor: Audio with leading silence trimmed
    """
```

### Beamforming and Array Processing

Advanced beamforming algorithms for multi-channel audio processing and spatial filtering.

```python { .api }
def apply_beamforming(beamform_weights: torch.Tensor, specgram: torch.Tensor) -> torch.Tensor:
    """
    Apply beamforming weights to a multi-channel complex spectrogram.

    Args:
        beamform_weights: Complex-valued beamforming weights (..., freq, channel)
        specgram: Multi-channel complex-valued spectrogram (..., channel, freq, time)

    Returns:
        Tensor: Beamformed single-channel spectrogram (..., freq, time)
    """

def mvdr_weights_souden(psd_s: torch.Tensor, psd_n: torch.Tensor,
                        reference_channel: Union[int, torch.Tensor],
                        diagonal_loading: bool = True, diag_eps: float = 1e-7,
                        eps: float = 1e-8) -> torch.Tensor:
    """
    Compute MVDR (Minimum Variance Distortionless Response) beamforming weights using Souden's method.

    Args:
        psd_s: Power spectral density matrix of target speech (..., freq, channel, channel)
        psd_n: Power spectral density matrix of noise (..., freq, channel, channel)
        reference_channel: Reference channel index, or a one-hot reference vector (..., channel)
        diagonal_loading: Whether to apply diagonal loading to psd_n
        diag_eps: Diagonal loading factor
        eps: Small value added to the denominator for numerical stability

    Returns:
        Tensor: MVDR beamforming weights (..., freq, channel)
    """

def mvdr_weights_rtf(rtf: torch.Tensor, psd_n: torch.Tensor,
                     reference_channel: Optional[Union[int, torch.Tensor]] = None,
                     diagonal_loading: bool = True, diag_eps: float = 1e-7,
                     eps: float = 1e-8) -> torch.Tensor:
    """
    Compute MVDR beamforming weights using Relative Transfer Function (RTF).

    Args:
        rtf: Relative transfer function of the target speech (..., freq, channel)
        psd_n: Power spectral density matrix of noise (..., freq, channel, channel)
        reference_channel: Optional reference channel index, or a one-hot reference vector (..., channel)
        diagonal_loading: Whether to apply diagonal loading to psd_n
        diag_eps: Diagonal loading factor
        eps: Small value added to the denominator for numerical stability

    Returns:
        Tensor: MVDR beamforming weights (..., freq, channel)
    """

def rtf_evd(psd_s: torch.Tensor) -> torch.Tensor:
    """
    Estimate relative transfer function (RTF) using eigenvalue decomposition.

    Args:
        psd_s: Power spectral density matrix of target speech (..., freq, channel, channel)

    Returns:
        Tensor: RTF matrix (..., freq, channel)
    """

def rtf_power(psd_s: torch.Tensor, psd_n: torch.Tensor,
              reference_channel: Union[int, torch.Tensor], n_iter: int = 3,
              diagonal_loading: bool = True, diag_eps: float = 1e-7) -> torch.Tensor:
    """
    Estimate relative transfer function (RTF) using power method.

    Args:
        psd_s: Power spectral density matrix of target speech (..., freq, channel, channel)
        psd_n: Power spectral density matrix of noise (..., freq, channel, channel)
        reference_channel: Reference channel index, or a one-hot reference vector (..., channel)
        n_iter: Number of power-method iterations
        diagonal_loading: Whether to apply diagonal loading to psd_n
        diag_eps: Diagonal loading factor

    Returns:
        Tensor: RTF matrix (..., freq, channel)
    """

def psd(specgram: torch.Tensor, mask: Optional[torch.Tensor] = None,
        normalize: bool = True, eps: float = 1e-10) -> torch.Tensor:
    """
    Compute power spectral density (PSD) matrix.

    Args:
        specgram: Multi-channel complex-valued spectrogram (..., channel, freq, time)
        mask: Optional real-valued time-frequency mask for PSD estimation (..., freq, time)
        normalize: Whether to normalize the mask along the time dimension
        eps: Small value for numerical stability

    Returns:
        Tensor: PSD matrix (..., freq, channel, channel)
    """
```

### Pitch and Speed Manipulation

Functions for pitch shifting and time-scale modification.

```python { .api }
def pitch_shift(waveform: torch.Tensor, sample_rate: int, n_steps: int,
                bins_per_octave: int = 12, n_fft: int = 512,
                win_length: Optional[int] = None, hop_length: Optional[int] = None,
                window: Optional[torch.Tensor] = None) -> torch.Tensor:
    """
    Shift the pitch of waveform by n_steps steps.

    Args:
        waveform: Input waveform (..., time)
        sample_rate: Sample rate of waveform
        n_steps: Number of pitch steps to shift
        bins_per_octave: Number of steps per octave
        n_fft: Size of FFT
        win_length: Window size
        hop_length: Length of hop between STFT windows
        window: Window function

    Returns:
        Tensor: Pitch-shifted waveform (..., time)
    """

def speed(waveform: torch.Tensor, orig_freq: int, factor: float,
          lengths: Optional[torch.Tensor] = None) -> Tuple[torch.Tensor, Optional[torch.Tensor]]:
    """
    Adjust waveform speed by a given factor.

    Args:
        waveform: Input waveform (..., time)
        orig_freq: Original sample rate
        factor: Speed factor (>1.0 makes faster, <1.0 makes slower)
        lengths: Original lengths of waveforms

    Returns:
        Tuple: (speed-adjusted waveform, adjusted lengths)
    """

def detect_pitch_frequency(waveform: torch.Tensor, sample_rate: int,
                           frame_time: float = 10 ** (-2), win_length: int = 30,
                           freq_low: int = 85, freq_high: int = 3400) -> torch.Tensor:
    """
    Detect pitch frequency using autocorrelation method.

    Args:
        waveform: Input waveform (..., time)
        sample_rate: Sample rate of the waveform
        frame_time: Duration of a frame in seconds
        win_length: Window length for median smoothing, in number of frames
        freq_low: Lowest detectable frequency (Hz)
        freq_high: Highest detectable frequency (Hz)

    Returns:
        Tensor: Detected pitch frequencies (..., frame)
    """
```

### Codec and Format Processing

Functions for codec simulation and audio format processing.

```python { .api }
def apply_codec(waveform: torch.Tensor, sample_rate: int, format: str,
                encoder: Optional[str] = None, encoder_config: Optional[dict] = None,
                decoder: Optional[str] = None, decoder_config: Optional[dict] = None) -> torch.Tensor:
    """
    Apply codec compression and decompression to waveform.

    Args:
        waveform: Input waveform (..., time)
        sample_rate: Sample rate
        format: Audio format ("wav", "mp3", "ogg", etc.)
        encoder: Encoder name
        encoder_config: Encoder configuration
        decoder: Decoder name
        decoder_config: Decoder configuration

    Returns:
        Tensor: Codec-processed waveform
    """

def mu_law_encoding(x: torch.Tensor, quantization_channels: int) -> torch.Tensor:
    """
    Encode signal based on mu-law companding.

    Args:
        x: Input tensor (..., time), expected to be scaled to the range [-1, 1]
        quantization_channels: Number of quantization channels

    Returns:
        Tensor: Mu-law encoded tensor with values in [0, quantization_channels - 1]
    """

def mu_law_decoding(x_mu: torch.Tensor, quantization_channels: int) -> torch.Tensor:
    """
    Decode mu-law encoded signal.

    Args:
        x_mu: Mu-law encoded input (..., time) with values in [0, quantization_channels - 1]
        quantization_channels: Number of quantization channels

    Returns:
        Tensor: Decoded tensor scaled to the range [-1, 1]
    """
```

### Advanced Signal Processing

Additional signal processing utilities and analysis functions.

```python { .api }
def preemphasis(waveform: torch.Tensor, coeff: float = 0.97) -> torch.Tensor:
    """
    Apply pre-emphasis filter to waveform.

    Args:
        waveform: Input waveform (..., time)
        coeff: Pre-emphasis coefficient

    Returns:
        Tensor: Pre-emphasized waveform
    """

def deemphasis(waveform: torch.Tensor, coeff: float = 0.97) -> torch.Tensor:
    """
    Apply de-emphasis filter to waveform.

    Args:
        waveform: Input waveform (..., time)
        coeff: De-emphasis coefficient

    Returns:
        Tensor: De-emphasized waveform
    """

def phase_vocoder(complex_specgrams: torch.Tensor, rate: float, phase_advance: torch.Tensor) -> torch.Tensor:
    """
    Given a STFT tensor, speed up in time without modifying pitch by applying phase vocoder.

    Args:
        complex_specgrams: Complex-valued spectrogram (..., freq, time)
        rate: Speed-up factor
        phase_advance: Expected phase advance in each bin

    Returns:
        Tensor: Time-stretched complex spectrogram
    """

def mask_along_axis(specgram: torch.Tensor, mask_param: int, mask_value: float,
                    axis: int, p: float = 1.0) -> torch.Tensor:
    """
    Apply masking along the given axis.

    Args:
        specgram: Real-valued spectrogram (..., freq, time)
        mask_param: Upper bound on the number of columns to be masked
        mask_value: Value to assign to masked columns
        axis: Axis to apply masking on; must be one of the last two dimensions (freq or time)
        p: Maximum proportion of columns that can be masked

    Returns:
        Tensor: Masked spectrogram
    """

def mask_along_axis_iid(specgrams: torch.Tensor, mask_param: int, mask_value: float,
                        axis: int, p: float = 1.0) -> torch.Tensor:
    """
    Apply masking along the given axis with independent masks for each example.

    Args:
        specgrams: Real-valued spectrograms (..., freq, time)
        mask_param: Upper bound on the number of columns to be masked
        mask_value: Value to assign to masked columns
        axis: Axis to apply masking on; must be one of the last two dimensions (freq or time)
        p: Maximum proportion of columns that can be masked

    Returns:
        Tensor: Masked spectrograms
    """

def compute_deltas(specgram: torch.Tensor, win_length: int = 5, mode: str = "replicate") -> torch.Tensor:
    """
    Compute delta coefficients of a tensor.

    Args:
        specgram: Input tensor (..., freq, time)
        win_length: The window length used for computing delta
        mode: Mode for padding ("replicate", "constant", etc.)

    Returns:
        Tensor: Delta coefficients
    """

def create_dct(n_mfcc: int, n_mels: int, norm: Optional[str]) -> torch.Tensor:
    """
    Create DCT transformation matrix.

    Args:
        n_mfcc: Number of MFCC coefficients to retain
        n_mels: Number of mel filter banks
        norm: Normalization mode ("ortho" or None)

    Returns:
        Tensor: DCT transformation matrix (n_mels, n_mfcc), to be right-multiplied to row-wise data
    """

def sliding_window_cmn(specgram: torch.Tensor, cmn_window: int = 600,
                       min_cmn_window: int = 100, center: bool = False,
                       norm_vars: bool = False) -> torch.Tensor:
    """
    Apply sliding-window cepstral mean (and optionally variance) normalization per utterance.

    Args:
        specgram: Input tensor (..., time, freq)
        cmn_window: Window length in frames for normalization
        min_cmn_window: Minimum window length in frames used at the start of decoding (only applies when center is False)
        center: Whether to center the window on the current frame
        norm_vars: Whether to normalize variance to one

    Returns:
        Tensor: Normalized tensor with the same shape as the input
    """

def spectral_centroid(waveform: torch.Tensor, sample_rate: int, pad: int,
                      window: torch.Tensor, n_fft: int,
                      hop_length: int, win_length: int) -> torch.Tensor:
    """
    Compute the spectral centroid for each frame.

    Args:
        waveform: Input tensor (..., time)
        sample_rate: Sample rate of waveform
        pad: Two sided padding of signal
        window: Window tensor applied to each frame
        n_fft: Size of FFT
        hop_length: Length of hop between STFT windows
        win_length: Window size

    Returns:
        Tensor: Spectral centroid (..., time)
    """

def add_noise(waveform: torch.Tensor, noise: torch.Tensor, snr: torch.Tensor,
              lengths: Optional[torch.Tensor] = None) -> torch.Tensor:
    """
    Scale noise and add it to waveform so that the result has the given signal-to-noise ratio (SNR).

    Args:
        waveform: Input waveform (..., time)
        noise: Noise tensor with the same shape as waveform
        snr: Signal-to-noise ratio in dB, shape (...,)
        lengths: Valid lengths of the signals in waveform and noise, shape (...,); if None, all samples are treated as valid

    Returns:
        Tensor: Noisy waveform with the same shape as the input
    """

def convolve(x: torch.Tensor, y: torch.Tensor, mode: str = "full") -> torch.Tensor:
    """
    Convolve inputs along their last dimension using the direct method.

    Args:
        x: First input tensor (..., N)
        y: Second input tensor (..., M), with leading dimensions matching those of x
        mode: Convolution mode ("full", "valid", "same")

    Returns:
        Tensor: Convolved signal
    """

def fftconvolve(x: torch.Tensor, y: torch.Tensor, mode: str = "full") -> torch.Tensor:
    """
    Convolve inputs along their last dimension using FFT.

    Args:
        x: First input tensor (..., N)
        y: Second input tensor (..., M), with leading dimensions matching those of x
        mode: Convolution mode ("full", "valid", "same")

    Returns:
        Tensor: Convolved signal
    """

def loudness(waveform: torch.Tensor, sample_rate: int) -> torch.Tensor:
    """
    Compute loudness according to ITU-R BS.1770-4.

    Args:
        waveform: Input audio (..., channels, time)
        sample_rate: Sample rate

    Returns:
        Tensor: Loudness estimates in LKFS
    """

def edit_distance(seq1: Sequence, seq2: Sequence) -> int:
    """
    Calculate the edit (Levenshtein) distance between two sequences.

    Args:
        seq1: First sequence (for example a string or a list of words)
        seq2: Second sequence

    Returns:
        int: Edit distance
    """

def rnnt_loss(logits: torch.Tensor, targets: torch.Tensor, logit_lengths: torch.Tensor,
              target_lengths: torch.Tensor, blank: int = -1, clamp: float = -1,
              reduction: str = "mean", fused_log_softmax: bool = True) -> torch.Tensor:
    """
    Compute RNN-Transducer loss.

    Args:
        logits: Output of the joiner (batch, max_time, max_target_length + 1, n_class)
        targets: Zero-padded target sequences (batch, max_target_length)
        logit_lengths: Length of each encoder output sequence (batch,)
        target_lengths: Length of each target sequence (batch,)
        blank: Blank label index
        clamp: Clamp value for gradients
        reduction: Reduction applied to the per-sample losses ("none", "mean", or "sum")
        fused_log_softmax: Set to False if log_softmax is applied outside of the loss

    Returns:
        Tensor: RNN-T loss, reduced according to reduction
    """

def frechet_distance(mu_x: torch.Tensor, sigma_x: torch.Tensor,
                     mu_y: torch.Tensor, sigma_y: torch.Tensor) -> torch.Tensor:
    """
    Compute Fréchet distance between two multivariate Gaussians.

    Args:
        mu_x: Mean of first distribution
        sigma_x: Covariance of first distribution
        mu_y: Mean of second distribution
        sigma_y: Covariance of second distribution

    Returns:
        Tensor: Fréchet distance
    """
```

```python { .api }
def lfilter(waveform: torch.Tensor, a_coeffs: torch.Tensor, b_coeffs: torch.Tensor,
            clamp: bool = True, batching: bool = True) -> torch.Tensor:
    """
    Apply an IIR filter by evaluating the difference equation.

    Args:
        waveform: Input signal (..., time), expected to be normalized to [-1, 1]
        a_coeffs: Denominator coefficients (autoregressive), lowest delay first, shape (num_order + 1) or (num_filters, num_order + 1)
        b_coeffs: Numerator coefficients (moving average), lowest delay first, same shape as a_coeffs
        clamp: Whether to clamp the output to [-1, 1]
        batching: Only used with 2D coefficients; if True, filter i is applied to waveform[..., i, :]

    Returns:
        Tensor: Filtered signal
    """

def filtfilt(waveform: torch.Tensor, a_coeffs: torch.Tensor, b_coeffs: torch.Tensor,
             clamp: bool = True) -> torch.Tensor:
    """
    Apply zero-phase filtering by running an IIR filter forward and then backward.

    Args:
        waveform: Input signal (..., time)
        a_coeffs: Denominator coefficients, lowest delay first
        b_coeffs: Numerator coefficients, lowest delay first
        clamp: Whether to clamp the output to [-1, 1]

    Returns:
        Tensor: Zero-phase filtered signal
    """
```

### Pitch and Time Manipulation

Functions for manipulating pitch and temporal characteristics of audio.

```python { .api }
def pitch_shift(waveform: torch.Tensor, sample_rate: int, n_steps: int,
                bins_per_octave: int = 12, n_fft: int = 512,
                win_length: Optional[int] = None, hop_length: Optional[int] = None,
                window: Optional[torch.Tensor] = None) -> torch.Tensor:
    """
    Shift the pitch of a waveform by n_steps steps.

    Args:
        waveform: Input audio (..., time)
        sample_rate: Sample rate
        n_steps: Number of steps to shift (positive = higher, negative = lower); one step is a semitone when bins_per_octave is 12
        bins_per_octave: Number of steps per octave
        n_fft: FFT size for STFT
        win_length: Window length (n_fft is used when None)
        hop_length: Hop length (win_length // 4 is used when None)
        window: Window tensor applied to each frame (a Hann window is used when None)

    Returns:
        Tensor: Pitch-shifted audio (..., time)
    """

def speed(waveform: torch.Tensor, orig_freq: int, factor: float,
          lengths: Optional[torch.Tensor] = None) -> Tuple[torch.Tensor, Optional[torch.Tensor]]:
    """
    Adjust playback speed by resampling. Pitch changes along with speed.

    Args:
        waveform: Input audio (..., time)
        orig_freq: Original sample rate
        factor: Speed factor (>1.0 = faster, <1.0 = slower)
        lengths: Valid length of each sequence in the batch

    Returns:
        Tuple[Tensor, Optional[Tensor]]: Speed-adjusted audio (..., new_time) and, if lengths was given, the valid lengths after the speed change
    """

def phase_vocoder(complex_specgrams: torch.Tensor, rate: float,
                  phase_advance: torch.Tensor) -> torch.Tensor:
    """
    Apply a phase vocoder to stretch or compress a spectrogram in time without changing pitch.

    Args:
        complex_specgrams: Complex STFT (..., freq, time)
        rate: Rate factor (>1.0 = faster, <1.0 = slower)
        phase_advance: Expected phase advance per hop in each bin, shape (freq, 1)

    Returns:
        Tensor: Time-stretched complex spectrogram with ceil(time / rate) frames
    """
```

### Audio Analysis

Functions for analyzing audio characteristics and extracting features.

```python { .api }
def spectral_centroid(waveform: torch.Tensor, sample_rate: int, pad: int,
                      window: torch.Tensor, n_fft: int,
                      hop_length: int, win_length: int) -> torch.Tensor:
    """
    Compute spectral centroid (center of mass of the magnitude spectrum) for each frame.

    Args:
        waveform: Input audio (..., time)
        sample_rate: Sample rate
        pad: Two sided padding of signal
        window: Window tensor applied to each frame
        n_fft: Size of FFT
        hop_length: Length of hop between STFT windows
        win_length: Window size

    Returns:
        Tensor: Spectral centroid over time (..., time)
    """

def detect_pitch_frequency(waveform: torch.Tensor, sample_rate: int, frame_time: float = 10**(-2),
                           win_length: int = 30, freq_low: int = 85, freq_high: int = 3400) -> torch.Tensor:
    """
    Detect pitch frequency using the normalized cross-correlation function and median smoothing.

    Args:
        waveform: Input audio (..., time)
        sample_rate: Sample rate
        frame_time: Duration of a frame in seconds
        win_length: Length of the median smoothing window, in frames
        freq_low: Lowest frequency that can be detected (Hz)
        freq_high: Highest frequency that can be detected (Hz)

    Returns:
        Tensor: Detected pitch frequency over time (..., frame)
    """

def loudness(waveform: torch.Tensor, sample_rate: int) -> torch.Tensor:
    """
    Compute loudness using the ITU-R BS.1770-4 standard.

    Args:
        waveform: Input audio (..., channels, time)
        sample_rate: Sample rate

    Returns:
        Tensor: Loudness in LKFS, which is equivalent to LUFS (Loudness Units relative to Full Scale)
    """
```

### Convolution Operations

Convolution-based processing for impulse response application and acoustic modeling.

```python { .api }
def convolve(x: torch.Tensor, y: torch.Tensor, mode: str = "full") -> torch.Tensor:
    """
    Convolve two tensors along their last dimension using the direct method.

    Args:
        x: First input tensor (..., N)
        y: Second input tensor (..., M), with leading dimensions matching those of x
        mode: Convolution mode ("full", "valid", "same")

    Returns:
        Tensor: Convolved signal
    """

def fftconvolve(x: torch.Tensor, y: torch.Tensor, mode: str = "full") -> torch.Tensor:
    """
    Convolve two tensors along their last dimension using FFT, which is generally faster for long signals.

    Args:
        x: First input tensor (..., N)
        y: Second input tensor (..., M), with leading dimensions matching those of x
        mode: Convolution mode ("full", "valid", "same")

    Returns:
        Tensor: Convolved signal
    """
```

### Mu-Law Encoding/Decoding

Logarithmic quantization commonly used in telecommunications.

```python { .api }
def mu_law_encoding(x: torch.Tensor, quantization_channels: int) -> torch.Tensor:
    """
    Encode waveform using mu-law companding.

    Args:
        x: Input waveform (..., time), expected to be floating point in [-1, 1]
        quantization_channels: Number of quantization levels

    Returns:
        Tensor: Mu-law encoded signal (integer values in [0, quantization_channels - 1])
    """

def mu_law_decoding(x_mu: torch.Tensor, quantization_channels: int) -> torch.Tensor:
    """
    Decode mu-law encoded waveform.

    Args:
        x_mu: Mu-law encoded signal (..., time), with values in [0, quantization_channels - 1]
        quantization_channels: Number of quantization levels

    Returns:
        Tensor: Decoded waveform with values in [-1, 1]
    """
```

### Feature Processing

Functions for processing extracted audio features.

```python { .api }
def compute_deltas(specgram: torch.Tensor, win_length: int = 5, mode: str = "replicate") -> torch.Tensor:
    """
    Compute delta features (first derivatives along time) of a spectrogram.

    Args:
        specgram: Input spectrogram (..., freq, time)
        win_length: Window length for delta computation
        mode: Padding mode applied along time ("replicate", "constant", etc.)

    Returns:
        Tensor: Delta features with same shape as input
    """

def create_dct(n_mfcc: int, n_mels: int, norm: Optional[str]) -> torch.Tensor:
    """
    Create Discrete Cosine Transform matrix for MFCC computation.

    Args:
        n_mfcc: Number of MFCC coefficients to retain
        n_mels: Number of mel filter banks
        norm: Normalization method ("ortho" or None)

    Returns:
        Tensor: DCT matrix (n_mels, n_mfcc), to be right-multiplied to row-wise data
    """

def sliding_window_cmn(specgram: torch.Tensor, cmn_window: int = 600, min_cmn_window: int = 100,
                       center: bool = False, norm_vars: bool = False) -> torch.Tensor:
    """
    Apply sliding-window cepstral mean (and optionally variance) normalization.

    Args:
        specgram: Input spectrogram (..., time, freq)
        cmn_window: Window size in frames for normalization
        min_cmn_window: Minimum window size in frames (only applies when center is False)
        center: Whether to center the window on the current frame
        norm_vars: Whether to normalize variance to one

    Returns:
        Tensor: Normalized spectrogram with the same shape as the input
    """
```

This covers the extensive functional API of TorchAudio, providing stateless functions for all major audio processing operations from basic spectral analysis to advanced effects and feature extraction.
