# Signal Processing Functions

Extensive collection of stateless audio processing functions for spectral analysis, filtering, resampling, pitch manipulation, and advanced signal processing algorithms. These functions operate directly on tensors and are compatible with PyTorch's autograd system for gradient-based optimization.

## Capabilities

### Spectral Analysis

Core spectral analysis functions for converting between time and frequency domains.

```python { .api }
def spectrogram(waveform: torch.Tensor, pad: int, window: torch.Tensor,
                n_fft: int, hop_length: int, win_length: int,
                power: Optional[float], normalized: Union[bool, str],
                center: bool = True, pad_mode: str = "reflect",
                onesided: bool = True, return_complex: Optional[bool] = None) -> torch.Tensor:
    """
    Create spectrogram from waveform.

    Args:
        waveform: Tensor of audio of dimension (..., time)
        pad: Two-sided padding of signal
        window: Window tensor that is applied/multiplied to each frame/window
        n_fft: Size of FFT
        hop_length: Length of hop between STFT windows
        win_length: Window size
        power: Exponent for the magnitude spectrogram (must be > 0), e.g. 1 for magnitude, 2 for power. If None, the complex spectrum is returned instead.
        normalized: Whether to normalize by magnitude after STFT. If a str, choices are "window" and "frame_length". True maps to "window".
        center: Whether to pad waveform on both sides so that the t-th frame is centered at time t × hop_length
        pad_mode: Controls the padding method used when center is True
        onesided: Controls whether to return half of results to avoid redundancy
        return_complex: Deprecated, use power=None instead

    Returns:
        Tensor: Spectrogram with shape (..., freq, time)
    """

def inverse_spectrogram(spectrogram: torch.Tensor, length: Optional[int],
                        pad: int, window: torch.Tensor,
                        n_fft: int, hop_length: int,
                        win_length: int, normalized: Union[bool, str],
                        center: bool = True, pad_mode: str = "reflect",
                        onesided: bool = True) -> torch.Tensor:
    """
    Reconstruct waveform from a complex spectrogram using inverse STFT.

    Args:
        spectrogram: Complex-valued input spectrogram (..., freq, time)
        length: Expected length of output, or None
        (other parameters same as spectrogram)

    Returns:
        Tensor: Reconstructed waveform (..., time)
    """

def griffinlim(specgram: torch.Tensor, window: torch.Tensor,
               n_fft: int, hop_length: int,
               win_length: int, power: float,
               n_iter: int, momentum: float,
               length: Optional[int], rand_init: bool) -> torch.Tensor:
    """
    Reconstruct waveform from magnitude spectrogram using Griffin-Lim algorithm.

    Args:
        specgram: Magnitude spectrogram (..., freq, time)
        window: Window function
        n_fft: Size of FFT
        hop_length: Length of hop between STFT windows
        win_length: Window size
        power: Exponent that was applied to the spectrogram (1 for magnitude, 2 for power)
        n_iter: Number of Griffin-Lim iterations
        momentum: Momentum parameter for fast Griffin-Lim
        length: Expected output length, or None
        rand_init: Whether to initialize with random phase

    Returns:
        Tensor: Reconstructed waveform (..., time)
    """
```
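The `(..., freq, time)` output shape follows from the framing parameters. The sketch below is a dependency-free estimate of that shape. The helper name `spectrogram_shape` is hypothetical, and the frame-count rule is the one documented for `torch.stft`; treating it as the rule `spectrogram` follows is an assumption, not a statement about its implementation.

```python
def spectrogram_shape(time, n_fft, hop_length, pad=0, center=True, onesided=True):
    """Estimate (freq, frames) for a waveform with `time` samples (hypothetical helper)."""
    padded = time + 2 * pad  # `pad` is applied to both sides of the signal
    freq = n_fft // 2 + 1 if onesided else n_fft
    if center:
        # Centered frames: the signal is padded by n_fft // 2 on each side first.
        frames = 1 + padded // hop_length
    else:
        frames = 1 + (padded - n_fft) // hop_length
    return freq, frames


print(spectrogram_shape(16000, n_fft=400, hop_length=200))  # (201, 81)
```

One second of 16 kHz audio with these settings therefore yields 201 frequency bins and 81 frames.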

### Mel-Scale Processing

Functions for mel-scale analysis commonly used in speech and music processing.

```python { .api }
def melscale_fbanks(n_freqs: int, f_min: float, f_max: float, n_mels: int,
                    sample_rate: int, norm: Optional[str] = None,
                    mel_scale: str = "htk") -> torch.Tensor:
    """
    Create mel-scale filter banks.

    Args:
        n_freqs: Number of frequency bins (typically n_fft // 2 + 1)
        f_min: Minimum frequency (Hz)
        f_max: Maximum frequency (Hz)
        n_mels: Number of mel filter banks
        sample_rate: Sample rate of audio
        norm: Normalization method ("slaney" or None)
        mel_scale: Scale to use ("htk" or "slaney")

    Returns:
        Tensor: Mel filter bank matrix (n_freqs, n_mels)
    """

def linear_fbanks(n_freqs: int, f_min: float, f_max: float, n_filter: int,
                  sample_rate: int) -> torch.Tensor:
    """
    Create linear-spaced filter banks.

    Args:
        n_freqs: Number of frequency bins
        f_min: Minimum frequency (Hz)
        f_max: Maximum frequency (Hz)
        n_filter: Number of linear filter banks
        sample_rate: Sample rate of audio

    Returns:
        Tensor: Linear filter bank matrix (n_freqs, n_filter)
    """
```
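To make the `"htk"` option concrete, here is a minimal pure-Python sketch of the standard HTK mel formula and of how band edges for `n_mels` triangular filters can be spaced. The helper names are hypothetical, and the sketch only illustrates the scale; it is not the library's filter-bank construction.

```python
import math


def hz_to_mel_htk(freq_hz):
    """HTK mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz_htk(mel):
    """Inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_min, f_max, n_mels):
    """Return n_mels + 2 frequencies (Hz), equally spaced on the mel scale."""
    m_min, m_max = hz_to_mel_htk(f_min), hz_to_mel_htk(f_max)
    step = (m_max - m_min) / (n_mels + 1)
    return [mel_to_hz_htk(m_min + i * step) for i in range(n_mels + 2)]


edges = mel_band_edges(0.0, 8000.0, n_mels=4)
print([round(f) for f in edges])
```

In this picture, filter `i` is a triangle rising from `edges[i]` to a peak at `edges[i + 1]` and falling to zero at `edges[i + 2]`, so the bands widen as frequency increases.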

### Amplitude and Decibel Conversion

Functions for converting between linear amplitude and logarithmic decibel scales.

```python { .api }
def amplitude_to_DB(x: torch.Tensor, multiplier: float, amin: float,
                    db_multiplier: float, top_db: Optional[float] = None) -> torch.Tensor:
    """
    Convert a power or amplitude spectrogram to decibel scale.

    Args:
        x: Input tensor (amplitude or power spectrogram)
        multiplier: Multiplier for log10 (10.0 for power, 20.0 for amplitude)
        amin: Minimum value to clamp x to before taking the logarithm
        db_multiplier: log10(max(reference value, amin)); multiplier * db_multiplier is subtracted from the result
        top_db: Minimum negative cut-off in decibels

    Returns:
        Tensor: Spectrogram in decibel scale
    """

def DB_to_amplitude(x: torch.Tensor, ref: float, power: float) -> torch.Tensor:
    """
    Convert decibel scale back to power or amplitude.

    Args:
        x: Input tensor in decibel scale
        ref: Reference value that the output is scaled by
        power: Exponent applied after the conversion (1.0 returns power, 0.5 returns amplitude)

    Returns:
        Tensor: Power or amplitude spectrogram
    """
```
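The arithmetic behind these two conversions can be sketched on plain floats. The helpers below are hypothetical scalar stand-ins written from my understanding of the formulas, so treat them as an illustration under that assumption rather than the library code.

```python
import math


def to_db(x, multiplier=10.0, amin=1e-10, db_multiplier=0.0):
    """Scalar form of the conversion: multiplier * log10(max(x, amin)), minus a reference term."""
    return multiplier * math.log10(max(x, amin)) - multiplier * db_multiplier


def apply_top_db(values_db, top_db):
    """Clamp values so nothing lies more than top_db below the maximum."""
    floor = max(values_db) - top_db
    return [max(v, floor) for v in values_db]


def from_db(x_db, ref=1.0, power=1.0):
    """Inverse mapping: ref * (10 ** (0.1 * x_db)) ** power."""
    return ref * (10.0 ** (0.1 * x_db)) ** power


power_db = [to_db(p) for p in (1.0, 100.0, 1e-12)]
print(power_db)                      # values below amin are clamped before the log
print(apply_top_db(power_db, 80.0))  # dynamic range limited to 80 dB below the peak
print(from_db(20.0, power=1.0))      # back to a power value
print(from_db(20.0, power=0.5))      # back to an amplitude value
```

The last two lines show why the `power` argument matters: the same 20 dB maps to roughly 100 as a power and roughly 10 as an amplitude.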

### Resampling

Audio resampling for sample rate conversion.

```python { .api }
def resample(waveform: torch.Tensor, orig_freq: int, new_freq: int,
             lowpass_filter_width: int = 6, rolloff: float = 0.99,
             resampling_method: str = "sinc_interp_hann",
             beta: Optional[float] = None) -> torch.Tensor:
    """
    Resample waveform to a different sample rate.

    Args:
        waveform: Input waveform tensor (..., time)
        orig_freq: Original sample rate
        new_freq: Target sample rate
        lowpass_filter_width: Width of lowpass filter; larger values give a sharper but slower filter
        rolloff: Roll-off frequency of lowpass filter, as a fraction of the Nyquist frequency
        resampling_method: Resampling algorithm ("sinc_interp_hann" or "sinc_interp_kaiser")
        beta: Shape parameter for Kaiser window

    Returns:
        Tensor: Resampled waveform
    """
```
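A common question when resampling is how long the output will be. The sketch below estimates it, assuming the output length is the ceiling of `new_freq * length / orig_freq` after both rates are reduced by their greatest common divisor. That rule is my understanding of the behavior, not something stated in the signature above, and `resampled_length` is a hypothetical helper.

```python
import math


def resampled_length(length, orig_freq, new_freq):
    """Estimate the number of output samples after resampling (hypothetical helper)."""
    g = math.gcd(orig_freq, new_freq)
    orig, new = orig_freq // g, new_freq // g
    # Integer ceiling of new * length / orig, without floating point.
    return -(-new * length // orig)


print(resampled_length(44100, 44100, 16000))  # 16000
print(resampled_length(1000, 44100, 16000))   # 363
```

Reducing by the GCD first keeps the ratio small, e.g. 44100 to 16000 becomes 441 to 160.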

### Audio Effects and Filtering

Comprehensive collection of audio filters and effects.

```python { .api }
def biquad(waveform: torch.Tensor, b0: float, b1: float, b2: float,
           a0: float, a1: float, a2: float) -> torch.Tensor:
    """
    Apply biquad IIR filter.

    Args:
        waveform: Input audio (..., time)
        b0, b1, b2: Numerator coefficients
        a0, a1, a2: Denominator coefficients

    Returns:
        Tensor: Filtered audio
    """

def allpass_biquad(waveform: torch.Tensor, sample_rate: int, central_freq: float, Q: float = 0.707) -> torch.Tensor:
    """
    Design two-pole all-pass filter. Similar to SoX implementation.

    Args:
        waveform: Audio waveform of dimension (..., time)
        sample_rate: Sampling rate of the waveform, e.g. 44100 (Hz)
        central_freq: Central frequency (in Hz)
        Q: Q factor (Default: 0.707)

    Returns:
        Tensor: Waveform of dimension (..., time)
    """

def band_biquad(waveform: torch.Tensor, sample_rate: int, central_freq: float,
                Q: float = 0.707, noise: bool = False) -> torch.Tensor:
    """
    Design two-pole band filter. Similar to SoX implementation.

    Args:
        waveform: Audio waveform of dimension (..., time)
        sample_rate: Sampling rate of the waveform, e.g. 44100 (Hz)
        central_freq: Central frequency (in Hz)
        Q: Q factor (Default: 0.707)
        noise: If True, use the alternate mode for un-pitched audio (e.g. percussion)

    Returns:
        Tensor: Waveform of dimension (..., time)
    """

def bandpass_biquad(waveform: torch.Tensor, sample_rate: int, central_freq: float,
                    Q: float = 0.707, const_skirt_gain: bool = False) -> torch.Tensor:
    """
    Design two-pole band-pass filter. Similar to SoX implementation.

    Args:
        waveform: Audio waveform of dimension (..., time)
        sample_rate: Sampling rate of the waveform, e.g. 44100 (Hz)
        central_freq: Central frequency (in Hz)
        Q: Q factor (Default: 0.707)
        const_skirt_gain: If True, use a constant skirt gain (peak gain = Q); if False, use a constant 0 dB peak gain

    Returns:
        Tensor: Waveform of dimension (..., time)
    """

def bandreject_biquad(waveform: torch.Tensor, sample_rate: int, central_freq: float, Q: float = 0.707) -> torch.Tensor:
    """
    Design two-pole band-reject filter. Similar to SoX implementation.

    Args:
        waveform: Audio waveform of dimension (..., time)
        sample_rate: Sampling rate of the waveform, e.g. 44100 (Hz)
        central_freq: Central frequency (in Hz)
        Q: Q factor (Default: 0.707)

    Returns:
        Tensor: Waveform of dimension (..., time)
    """

def bass_biquad(waveform: torch.Tensor, sample_rate: int, gain: float,
                central_freq: float = 100, Q: float = 0.707) -> torch.Tensor:
    """
    Design a bass tone-control effect. Similar to SoX implementation.

    Args:
        waveform: Audio waveform of dimension (..., time)
        sample_rate: Sampling rate of the waveform, e.g. 44100 (Hz)
        gain: Gain in dB
        central_freq: Central frequency (in Hz, default: 100)
        Q: Q factor (Default: 0.707)

    Returns:
        Tensor: Waveform of dimension (..., time)
    """

def contrast(waveform: torch.Tensor, enhancement_amount: float = 75.0) -> torch.Tensor:
    """
    Apply contrast effect. Similar to SoX implementation.

    Args:
        waveform: Audio waveform of dimension (..., time)
        enhancement_amount: Enhancement amount (default: 75.0)

    Returns:
        Tensor: Waveform of dimension (..., time)
    """

def dcshift(waveform: torch.Tensor, shift: float, limiter_gain: Optional[float] = None) -> torch.Tensor:
    """
    Apply a DC shift to the audio. Similar to SoX implementation.

    Args:
        waveform: Audio waveform of dimension (..., time)
        shift: DC shift amount
        limiter_gain: Optional limiter gain, used to prevent clipping

    Returns:
        Tensor: Waveform of dimension (..., time)
    """

def deemph_biquad(waveform: torch.Tensor, sample_rate: int) -> torch.Tensor:
    """
    Apply ISO 908 CD de-emphasis (shelving) IIR filter. Similar to SoX implementation.

    Args:
        waveform: Audio waveform of dimension (..., time)
        sample_rate: Sampling rate of the waveform (44100 or 48000 Hz)

    Returns:
        Tensor: Waveform of dimension (..., time)
    """

def dither(waveform: torch.Tensor, density_function: str = "TPDF", noise_shaping: bool = False) -> torch.Tensor:
    """
    Apply dither. Dither increases the perceived dynamic range of audio stored at a particular bit-depth.

    Args:
        waveform: Audio waveform of dimension (..., time)
        density_function: Density function ("TPDF", "RPDF", "GPDF")
        noise_shaping: Apply noise shaping

    Returns:
        Tensor: Dithered waveform
    """

def equalizer_biquad(waveform: torch.Tensor, sample_rate: int, center_freq: float,
                     gain: float, Q: float = 0.707) -> torch.Tensor:
    """
    Design biquad peaking equalizer filter and perform filtering. Similar to SoX implementation.

    Args:
        waveform: Audio waveform of dimension (..., time)
        sample_rate: Sampling rate of the waveform
        center_freq: Center frequency (in Hz)
        gain: Gain in dB
        Q: Q factor (Default: 0.707)

    Returns:
        Tensor: Waveform of dimension (..., time)
    """

def filtfilt(waveform: torch.Tensor, a_coeffs: torch.Tensor, b_coeffs: torch.Tensor,
             clamp: bool = True) -> torch.Tensor:
    """
    Apply an IIR filter forward and backward to a waveform. Inspired by scipy.signal.filtfilt.

    Args:
        waveform: Input waveform (..., time)
        a_coeffs: Denominator coefficients of the filter
        b_coeffs: Numerator coefficients of the filter
        clamp: If True, clamp the output signal to the range [-1, 1]

    Returns:
        Tensor: Zero-phase filtered waveform
    """

def flanger(waveform: torch.Tensor, sample_rate: int, delay: float = 0.0,
            depth: float = 2.0, regen: float = 0.0, width: float = 71.0,
            speed: float = 0.5, phase: float = 25.0, modulation: str = "sinusoidal",
            interpolation: str = "linear") -> torch.Tensor:
    """
    Apply a flanger effect to the audio. Similar to SoX implementation.

    Args:
        waveform: Audio waveform of dimension (..., channel, time)
        sample_rate: Sampling rate of the waveform
        delay: Base delay in milliseconds
        depth: Delay depth in milliseconds
        regen: Regeneration (feedback) in percent
        width: Delay line width in percent
        speed: Modulation speed in Hz
        phase: Phase in percent
        modulation: Modulation type ("sinusoidal" or "triangular")
        interpolation: Interpolation type ("linear" or "quadratic")

    Returns:
        Tensor: Waveform of dimension (..., channel, time)
    """

def gain(waveform: torch.Tensor, gain_db: float = 1.0) -> torch.Tensor:
    """
    Apply amplification or attenuation to the whole waveform.

    Args:
        waveform: Audio waveform of dimension (..., time)
        gain_db: Gain in decibels

    Returns:
        Tensor: Amplified waveform
    """

def highpass_biquad(waveform: torch.Tensor, sample_rate: int, cutoff_freq: float, Q: float = 0.707) -> torch.Tensor:
    """
    Design biquad highpass filter and perform filtering. Similar to SoX implementation.

    Args:
        waveform: Audio waveform of dimension (..., time)
        sample_rate: Sampling rate of the waveform
        cutoff_freq: Cutoff frequency (in Hz)
        Q: Q factor (Default: 0.707)

    Returns:
        Tensor: Waveform of dimension (..., time)
    """

def lfilter(waveform: torch.Tensor, a_coeffs: torch.Tensor, b_coeffs: torch.Tensor,
            clamp: bool = True, batching: bool = True) -> torch.Tensor:
    """
    Perform an IIR filter by evaluating difference equation.

    Args:
        waveform: Input waveform (..., time)
        a_coeffs: Denominator coefficients of the filter
        b_coeffs: Numerator coefficients of the filter
        clamp: If True, clamp the output signal to the range [-1, 1]
        batching: If True and the coefficients form a 2D filter bank, apply the i-th filter to the i-th channel of waveform instead of applying every filter to every channel

    Returns:
        Tensor: Filtered waveform
    """

def lowpass_biquad(waveform: torch.Tensor, sample_rate: int, cutoff_freq: float, Q: float = 0.707) -> torch.Tensor:
    """
    Design biquad lowpass filter and perform filtering. Similar to SoX implementation.

    Args:
        waveform: Audio waveform of dimension (..., time)
        sample_rate: Sampling rate of the waveform
        cutoff_freq: Cutoff frequency (in Hz)
        Q: Q factor (Default: 0.707)

    Returns:
        Tensor: Waveform of dimension (..., time)
    """

def overdrive(waveform: torch.Tensor, gain: float = 20, colour: float = 20) -> torch.Tensor:
    """
    Apply an overdrive effect to the audio. Similar to SoX implementation.

    Args:
        waveform: Audio waveform of dimension (..., time)
        gain: Gain amount
        colour: Colour amount

    Returns:
        Tensor: Waveform of dimension (..., time)
    """

def phaser(waveform: torch.Tensor, sample_rate: int, gain_in: float = 0.4,
           gain_out: float = 0.74, delay_ms: float = 3.0, decay: float = 0.4,
           mod_speed: float = 0.5, sinusoidal: bool = True) -> torch.Tensor:
    """
    Apply a phasing effect to the audio. Similar to SoX implementation.

    Args:
        waveform: Audio waveform of dimension (..., time)
        sample_rate: Sampling rate of the waveform
        gain_in: Input gain
        gain_out: Output gain
        delay_ms: Delay in milliseconds
        decay: Decay amount
        mod_speed: Modulation speed in Hz
        sinusoidal: If True, use sinusoidal modulation; if False, use triangular modulation

    Returns:
        Tensor: Waveform of dimension (..., time)
    """

def riaa_biquad(waveform: torch.Tensor, sample_rate: int) -> torch.Tensor:
    """
    Apply RIAA vinyl playback equalization. Similar to SoX implementation.

    Args:
        waveform: Audio waveform of dimension (..., time)
        sample_rate: Sampling rate of the waveform

    Returns:
        Tensor: Waveform of dimension (..., time)
    """

def treble_biquad(waveform: torch.Tensor, sample_rate: int, gain: float,
                  central_freq: float = 3000, Q: float = 0.707) -> torch.Tensor:
    """
    Design a treble tone-control effect. Similar to SoX implementation.

    Args:
        waveform: Audio waveform of dimension (..., time)
        sample_rate: Sampling rate of the waveform
        gain: Gain in dB
        central_freq: Central frequency (in Hz, default: 3000)
        Q: Q factor (Default: 0.707)

    Returns:
        Tensor: Waveform of dimension (..., time)
    """

def vad(waveform: torch.Tensor, sample_rate: int, trigger_level: float = 7.0,
        trigger_time: float = 0.25, search_time: float = 1.0,
        allowed_gap: float = 0.25, pre_trigger_time: float = 0.0,
        boot_time: float = 0.35, noise_up_time: float = 0.1,
        noise_down_time: float = 0.01, noise_reduction_amount: float = 1.35,
        measure_freq: float = 20.0, measure_duration: Optional[float] = None,
        measure_smooth_time: float = 0.4, hp_filter_freq: float = 50.0,
        lp_filter_freq: float = 6000.0, hp_lifter_freq: float = 150.0,
        lp_lifter_freq: float = 2000.0) -> torch.Tensor:
    """
    Voice Activity Detector. Similar to SoX implementation.

    Args:
        waveform: Tensor of audio of dimension (..., time)
        sample_rate: Sample rate of audio
        trigger_level: Trigger level (default: 7.0)
        trigger_time: Trigger time (default: 0.25)
        search_time: Search time (default: 1.0)
        allowed_gap: Allowed gap (default: 0.25)
        pre_trigger_time: Pre-trigger time (default: 0.0)
        boot_time: Boot time (default: 0.35)
        noise_up_time: Noise up time (default: 0.1)
        noise_down_time: Noise down time (default: 0.01)
        noise_reduction_amount: Noise reduction amount (default: 1.35)
        measure_freq: Measure frequency (default: 20.0)
        measure_duration: Measure duration (optional)
        measure_smooth_time: Measure smooth time (default: 0.4)
        hp_filter_freq: High-pass filter frequency (default: 50.0)
        lp_filter_freq: Low-pass filter frequency (default: 6000.0)
        hp_lifter_freq: High-pass lifter frequency (default: 150.0)
        lp_lifter_freq: Low-pass lifter frequency (default: 2000.0)

    Returns:
        Tensor: Audio with leading silence trimmed, of dimension (..., time)
    """
```
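Most of the filters in this section reduce to a biquad, so it may help to see the difference equation that `biquad` and `lfilter` evaluate. The following is a minimal pure-Python sketch on a list of floats. `biquad_reference` is a hypothetical name; the sketch starts from a zero initial state and, unlike `lfilter` with `clamp=True`, does not clamp its output.

```python
def biquad_reference(samples, b0, b1, b2, a0, a1, a2):
    """Direct-form I biquad on a list of floats (illustrative only)."""
    out = []
    x1 = x2 = y1 = y2 = 0.0  # previous inputs and outputs, zero initial state
    for x0 in samples:
        # y[n] = (b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2]) / a0
        y0 = (b0 * x0 + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


impulse = [1.0, 0.0, 0.0, 0.0]
# One-pole smoother y[n] = x[n] + 0.5 * y[n-1], written as a biquad.
print(biquad_reference(impulse, 1.0, 0.0, 0.0, 1.0, -0.5, 0.0))  # [1.0, 0.5, 0.25, 0.125]
```

The `*_biquad` helpers differ only in how they derive the six coefficients from `sample_rate`, the frequency argument, `Q`, and `gain`.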

### Beamforming and Array Processing

Advanced beamforming algorithms for multi-channel audio processing and spatial filtering.

```python { .api }
def apply_beamforming(beamform_weights: torch.Tensor, specgram: torch.Tensor) -> torch.Tensor:
    """
    Apply beamforming weights to a multi-channel spectrogram.

    Args:
        beamform_weights: Complex-valued beamforming weights (..., freq, channel)
        specgram: Complex-valued multi-channel spectrogram (..., channel, freq, time)

    Returns:
        Tensor: Enhanced single-channel spectrogram (..., freq, time)
    """

def mvdr_weights_souden(psd_s: torch.Tensor, psd_n: torch.Tensor,
                        reference_channel: Union[int, torch.Tensor],
                        diagonal_loading: bool = True, diag_eps: float = 1e-7,
                        eps: float = 1e-8) -> torch.Tensor:
    """
    Compute MVDR (Minimum Variance Distortionless Response) beamforming weights using Souden's method.

    Args:
        psd_s: Power spectral density matrix of target speech (..., freq, channel, channel)
        psd_n: Power spectral density matrix of noise (..., freq, channel, channel)
        reference_channel: Reference channel, given as an index or as a one-hot vector (..., channel)
        diagonal_loading: Whether to apply diagonal loading to psd_n
        diag_eps: Diagonal loading factor
        eps: Small value added to the denominator for numerical stability

    Returns:
        Tensor: MVDR beamforming weights (..., freq, channel)
    """

def mvdr_weights_rtf(rtf: torch.Tensor, psd_n: torch.Tensor,
                     reference_channel: Optional[Union[int, torch.Tensor]] = None,
                     diagonal_loading: bool = True, diag_eps: float = 1e-7,
                     eps: float = 1e-8) -> torch.Tensor:
    """
    Compute MVDR beamforming weights using Relative Transfer Function (RTF).

    Args:
        rtf: Relative transfer function of the target speech (..., freq, channel)
        psd_n: Power spectral density matrix of noise (..., freq, channel, channel)
        reference_channel: Reference channel, given as an index or as a one-hot vector (..., channel)
        diagonal_loading: Whether to apply diagonal loading to psd_n
        diag_eps: Diagonal loading factor
        eps: Small value added to the denominator for numerical stability

    Returns:
        Tensor: MVDR beamforming weights (..., freq, channel)
    """

def rtf_evd(psd_s: torch.Tensor) -> torch.Tensor:
    """
    Estimate relative transfer function (RTF) using eigenvalue decomposition.

    Args:
        psd_s: Power spectral density matrix of target speech (..., freq, channel, channel)

    Returns:
        Tensor: RTF matrix (..., freq, channel)
    """

def rtf_power(psd_s: torch.Tensor, psd_n: torch.Tensor,
              reference_channel: Union[int, torch.Tensor], n_iter: int = 3,
              diagonal_loading: bool = True, diag_eps: float = 1e-7) -> torch.Tensor:
    """
    Estimate relative transfer function (RTF) using power method.

    Args:
        psd_s: Power spectral density matrix of target speech (..., freq, channel, channel)
        psd_n: Power spectral density matrix of noise (..., freq, channel, channel)
        reference_channel: Reference channel, given as an index or as a one-hot vector (..., channel)
        n_iter: Number of power iterations
        diagonal_loading: Whether to apply diagonal loading to psd_n
        diag_eps: Diagonal loading factor

    Returns:
        Tensor: RTF matrix (..., freq, channel)
    """

def psd(specgram: torch.Tensor, mask: Optional[torch.Tensor] = None,
        normalize: bool = True, eps: float = 1e-10) -> torch.Tensor:
    """
    Compute power spectral density (PSD) matrix.

    Args:
        specgram: Complex-valued multi-channel spectrogram (..., channel, freq, time)
        mask: Optional time-frequency mask for PSD estimation (..., freq, time)
        normalize: Whether to normalize the mask along the time dimension
        eps: Small value for numerical stability

    Returns:
        Tensor: PSD matrix (..., freq, channel, channel)
    """
```
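Applying beamforming weights amounts to a weighted sum over channels in each time-frequency bin. The sketch below illustrates that with nested Python lists and built-in complex numbers. It assumes weights indexed as `[freq][channel]`, a spectrogram indexed as `[channel][freq][time]`, and that the weights are conjugated before the sum (the usual convention for filter-and-sum beamformers). `apply_weights` is a hypothetical helper, not the library function.

```python
def apply_weights(weights, specgram):
    """weights[f][c], specgram[c][f][t] -> enhanced[f][t] (illustrative only)."""
    n_channels = len(specgram)
    n_freq = len(specgram[0])
    n_time = len(specgram[0][0])
    return [
        [
            sum(weights[f][c].conjugate() * specgram[c][f][t] for c in range(n_channels))
            for t in range(n_time)
        ]
        for f in range(n_freq)
    ]


# Two channels, one frequency bin, two frames.
specgram = [[[1 + 1j, 2 + 0j]], [[3 - 1j, 0 + 2j]]]
weights = [[0.5 + 0j, 0.5 + 0j]]  # equal weights average the channels
print(apply_weights(weights, specgram))
```

With equal real weights the result is simply the channel average per bin; the MVDR functions above produce weights that also suppress noise according to `psd_n`.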

### Pitch and Speed Manipulation

Functions for pitch shifting, speed adjustment, and pitch detection.

```python { .api }
def pitch_shift(waveform: torch.Tensor, sample_rate: int, n_steps: float,
                bins_per_octave: int = 12, n_fft: int = 512,
                win_length: Optional[int] = None, hop_length: Optional[int] = None,
                window: Optional[torch.Tensor] = None) -> torch.Tensor:
    """
    Shift the pitch of waveform by n_steps steps.

    Args:
        waveform: Input waveform (..., time)
        sample_rate: Sample rate of waveform
        n_steps: Number of pitch steps to shift
        bins_per_octave: Number of steps per octave
        n_fft: Size of FFT
        win_length: Window size
        hop_length: Length of hop between STFT windows
        window: Window function

    Returns:
        Tensor: Pitch-shifted waveform
    """

def speed(waveform: torch.Tensor, orig_freq: int, factor: float,
          lengths: Optional[torch.Tensor] = None) -> Tuple[torch.Tensor, Optional[torch.Tensor]]:
    """
    Adjust waveform speed by a given factor.

    Args:
        waveform: Input waveform (..., time)
        orig_freq: Original sample rate
        factor: Speed factor (>1.0 makes faster, <1.0 makes slower)
        lengths: Original lengths of waveforms

    Returns:
        Tuple: (speed-adjusted waveform, adjusted lengths)
    """

def detect_pitch_frequency(waveform: torch.Tensor, sample_rate: int,
                           frame_time: float = 10 ** (-2), win_length: int = 30,
                           freq_low: int = 85, freq_high: int = 3400) -> torch.Tensor:
    """
    Detect pitch frequency using autocorrelation method.

    Args:
        waveform: Input waveform (..., time)
        sample_rate: Sample rate of the waveform
        frame_time: Duration of a frame in seconds
        win_length: Length of the window in frames
        freq_low: Lowest detectable frequency
        freq_high: Highest detectable frequency

    Returns:
        Tensor: Detected pitch frequencies (..., frame)
    """
```
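The `n_steps`, `bins_per_octave`, and `factor` arguments map to simple ratios. The sketch below works them out with hypothetical helpers. The length estimate assumes the output of a speed change has roughly `length / factor` samples, which is an approximation rather than the exact rounding the library applies.

```python
import math


def pitch_ratio(n_steps, bins_per_octave=12):
    """Frequency ratio produced by shifting n_steps (one octave doubles frequency)."""
    return 2.0 ** (n_steps / bins_per_octave)


def length_after_speed(length, factor):
    """Approximate sample count after a speed change by `factor`."""
    return math.ceil(length / factor)


print(pitch_ratio(12))                 # 2.0, one octave up
print(pitch_ratio(-12))                # 0.5, one octave down
print(length_after_speed(16000, 2.0))  # 8000
```

With the default `bins_per_octave=12`, each step is one semitone, so `n_steps=7` raises the pitch by a perfect fifth (a ratio of about 1.5).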

### Codec and Format Processing

Functions for codec simulation and audio format processing.

```python { .api }
def apply_codec(waveform: torch.Tensor, sample_rate: int, format: str,
                encoder: Optional[str] = None, encoder_config: Optional[dict] = None,
                decoder: Optional[str] = None, decoder_config: Optional[dict] = None) -> torch.Tensor:
    """
    Apply codec compression and decompression to waveform.

    Args:
        waveform: Input waveform (..., time)
        sample_rate: Sample rate
        format: Audio format ("wav", "mp3", "ogg", etc.)
        encoder: Encoder name
        encoder_config: Encoder configuration
        decoder: Decoder name
        decoder_config: Decoder configuration

    Returns:
        Tensor: Codec-processed waveform
    """

def mu_law_encoding(x: torch.Tensor, quantization_channels: int) -> torch.Tensor:
    """
    Encode signal based on mu-law companding.

    Args:
        x: Input tensor (..., time), expected to be scaled between -1 and 1
        quantization_channels: Number of quantization channels (e.g. 256)

    Returns:
        Tensor: Mu-law encoded tensor with values between 0 and quantization_channels - 1
    """

def mu_law_decoding(x_mu: torch.Tensor, quantization_channels: int) -> torch.Tensor:
    """
    Decode mu-law encoded signal.

    Args:
        x_mu: Mu-law encoded input (..., time)
        quantization_channels: Number of quantization channels (e.g. 256)

    Returns:
        Tensor: Decoded tensor with values between -1 and 1
    """
```
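Mu-law companding is easiest to follow on a single sample. The sketch below implements the standard mu-law formula for scalars in pure Python. The function names and the default of 256 channels are assumptions made for illustration, and the rounding step (add 0.5, then truncate) is my understanding of the usual quantization rather than something stated above.

```python
import math


def mu_law_encode(x, quantization_channels=256):
    """Map x in [-1, 1] to an integer in [0, quantization_channels - 1]."""
    mu = quantization_channels - 1.0
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int((compressed + 1.0) / 2.0 * mu + 0.5)


def mu_law_decode(x_mu, quantization_channels=256):
    """Map an integer code back to an approximate value in [-1, 1]."""
    mu = quantization_channels - 1.0
    y = x_mu / mu * 2.0 - 1.0
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


for sample in (-1.0, 0.0, 0.01, 0.5, 1.0):
    code = mu_law_encode(sample)
    print(sample, code, round(mu_law_decode(code), 4))
```

The round trip is lossy, but the logarithmic spacing keeps the error small for quiet samples, which is the point of companding.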

### Advanced Signal Processing

Additional signal processing utilities and analysis functions.

```python { .api }
def preemphasis(waveform: torch.Tensor, coeff: float = 0.97) -> torch.Tensor:
    """
    Apply pre-emphasis filter to waveform.

    Args:
        waveform: Input waveform (..., time)
        coeff: Pre-emphasis coefficient

    Returns:
        Tensor: Pre-emphasized waveform
    """

def deemphasis(waveform: torch.Tensor, coeff: float = 0.97) -> torch.Tensor:
    """
    Apply de-emphasis filter to waveform.

    Args:
        waveform: Input waveform (..., time)
        coeff: De-emphasis coefficient

    Returns:
        Tensor: De-emphasized waveform
    """

def phase_vocoder(complex_specgrams: torch.Tensor, rate: float, phase_advance: torch.Tensor) -> torch.Tensor:
    """
    Given a STFT tensor, speed up in time without modifying pitch by applying phase vocoder.

    Args:
        complex_specgrams: Complex-valued spectrogram (..., freq, time)
        rate: Speed-up factor
        phase_advance: Expected phase advance in each bin

    Returns:
        Tensor: Time-stretched complex spectrogram
    """

def mask_along_axis(specgrams: torch.Tensor, mask_param: int, mask_value: float,
                    axis: int) -> torch.Tensor:
    """
    Apply masking along the given axis.

    Args:
        specgrams: Tensor spectrogram (..., freq, time)
        mask_param: Number of columns to be masked
        mask_value: Value to assign to masked columns
        axis: Axis to apply masking on (1 for freq, 2 for time)

    Returns:
        Tensor: Masked spectrogram
    """

def mask_along_axis_iid(specgrams: torch.Tensor, mask_param: int, mask_value: float,
                        axis: int) -> torch.Tensor:
    """
    Apply masking along the given axis with independent masks for each example.

    Args:
        specgrams: Tensor spectrogram (..., freq, time)
        mask_param: Number of columns to be masked
        mask_value: Value to assign to masked columns
        axis: Axis to apply masking on (1 for freq, 2 for time)

    Returns:
        Tensor: Masked spectrogram
    """

def compute_deltas(specgram: torch.Tensor, win_length: int = 5, mode: str = "replicate") -> torch.Tensor:
    """
    Compute delta coefficients of a tensor.

    Args:
        specgram: Input tensor (..., freq, time)
        win_length: The window length used for computing delta
        mode: Mode for padding ("replicate", "constant", etc.)

    Returns:
        Tensor: Delta coefficients
    """

def create_dct(n_mfcc: int, n_mels: int, norm: Optional[str]) -> torch.Tensor:
    """
    Create DCT transformation matrix.

    Args:
        n_mfcc: Number of MFCC coefficients
        n_mels: Number of mel filter banks
        norm: Normalization mode ("ortho" or None)

    Returns:
        Tensor: DCT transformation matrix (n_mels, n_mfcc), to be right-multiplied to row-wise data
    """

def sliding_window_cmn(specgram: torch.Tensor, cmn_window: int = 600,
                       min_cmn_window: int = 100, center: bool = False,
                       norm_vars: bool = False) -> torch.Tensor:
    """
    Apply sliding-window cepstral mean (and optionally variance) normalization per utterance.

    Args:
        specgram: Input tensor (..., time, freq)
        cmn_window: Window length in frames for normalization
        min_cmn_window: Minimum window length used at the start of decoding (only applies when center is False)
        center: Whether to center the window on the current frame
        norm_vars: Whether to normalize variance to one

    Returns:
        Tensor: Normalized tensor (..., time, freq)
    """

def spectral_centroid(waveform: torch.Tensor, sample_rate: int, pad: int,
                      window: torch.Tensor, n_fft: int,
                      hop_length: int, win_length: int) -> torch.Tensor:
    """
    Compute the spectral centroid for each frame.

    Args:
        waveform: Input tensor (..., time)
        sample_rate: Sample rate of waveform
        pad: Two-sided padding of signal
        window: Window tensor
        n_fft: Size of FFT
        hop_length: Length of hop between STFT windows
        win_length: Window size

    Returns:
        Tensor: Spectral centroid (..., time)
    """

def add_noise(waveform: torch.Tensor, noise: torch.Tensor, snr: torch.Tensor,
              lengths: Optional[torch.Tensor] = None) -> torch.Tensor:
    """
    Add noise to waveform with given Signal-to-Noise Ratio (SNR).

    Args:
        waveform: Input waveform (..., time)
        noise: Noise tensor (..., time)
        snr: Signal-to-noise ratio in dB
        lengths: Valid lengths of the waveforms, used when they are padded

    Returns:
        Tensor: Noisy waveform
    """

883

884

def convolve(x: torch.Tensor, y: torch.Tensor, mode: str = "full") -> torch.Tensor:
    """
    Convolve two tensors along their last dimension using the direct method.

    Args:
        x: First input, typically the waveform (..., time)
        y: Second input, typically the kernel (..., time)
        mode: Convolution mode ("full", "valid", "same")

    Returns:
        Tensor: Convolved waveform
    """

def fftconvolve(x: torch.Tensor, y: torch.Tensor, mode: str = "full") -> torch.Tensor:
    """
    Convolve two tensors along their last dimension using FFT.

    Args:
        x: First input, typically the waveform (..., time)
        y: Second input, typically the kernel (..., time)
        mode: Convolution mode ("full", "valid", "same")

    Returns:
        Tensor: Convolved waveform
    """

def loudness(waveform: torch.Tensor, sample_rate: int) -> torch.Tensor:
    """
    Compute loudness according to ITU-R BS.1770-4.

    Args:
        waveform: Input audio waveform (..., channels, time)
        sample_rate: Sample rate

    Returns:
        Tensor: Loudness values
    """

def edit_distance(seq1: List[int], seq2: List[int]) -> int:
    """
    Calculate the edit (Levenshtein) distance between two sequences.

    Args:
        seq1: First sequence
        seq2: Second sequence

    Returns:
        int: Edit distance
    """

def rnnt_loss(logits: torch.Tensor, targets: torch.Tensor, logit_lengths: torch.Tensor,
              target_lengths: torch.Tensor, blank: int = -1, clamp: float = -1) -> torch.Tensor:
    """
    Compute RNN-Transducer loss.

    Args:
        logits: Joiner output (batch, max sequence length, max target length + 1, n_class)
        targets: Zero-padded target sequences (batch, max target length)
        logit_lengths: Length of each sequence from the encoder (batch,)
        target_lengths: Length of each target sequence (batch,)
        blank: Blank label index
        clamp: Clamp value for gradients

    Returns:
        Tensor: RNN-T loss
    """

def frechet_distance(mu_x: torch.Tensor, sigma_x: torch.Tensor,
                     mu_y: torch.Tensor, sigma_y: torch.Tensor) -> torch.Tensor:
    """
    Compute Fréchet distance between two multivariate Gaussians.

    Args:
        mu_x: Mean of first distribution
        sigma_x: Covariance of first distribution
        mu_y: Mean of second distribution
        sigma_y: Covariance of second distribution

    Returns:
        Tensor: Fréchet distance
    """
```
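The edit distance listed above is the Levenshtein distance. As a rough illustration of what that quantity measures, here is a minimal pure-Python sketch of the classic dynamic-programming recurrence. The name `levenshtein_distance` is hypothetical and this is not TorchAudio's implementation; it only shows the technique under the assumption of unit costs for insertion, deletion, and substitution.

```python
from typing import Sequence


def levenshtein_distance(seq1: Sequence, seq2: Sequence) -> int:
    """Edit distance with unit costs (illustrative sketch, hypothetical helper)."""
    # previous[j] is the distance between the processed prefix of seq1 and seq2[:j].
    previous = list(range(len(seq2) + 1))
    for i, item1 in enumerate(seq1, start=1):
        current = [i]  # turning seq1[:i] into an empty sequence takes i deletions
        for j, item2 in enumerate(seq2, start=1):
            substitution = previous[j - 1] + (item1 != item2)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


# Word-level comparison, e.g. the numerator of a word error rate.
reference = "the cat sat on the mat".split()
hypothesis = "the cat sat on mat".split()
errors = levenshtein_distance(reference, hypothesis)
```

Because the function only compares items for equality, it works equally well on lists of token IDs, words, or characters.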

```python { .api }
def lfilter(waveform: torch.Tensor, a_coeffs: torch.Tensor, b_coeffs: torch.Tensor,
            clamp: bool = True, batching: bool = True) -> torch.Tensor:
    """
    Apply an IIR filter by evaluating its difference equation.

    Args:
        waveform: Input signal (..., time)
        a_coeffs: Denominator (autoregressive) coefficients, ordered from lowest to highest delay
        b_coeffs: Numerator (moving average) coefficients, same size as a_coeffs
        clamp: Whether to clamp the output to the range [-1, 1]
        batching: Only relevant for 2D coefficients (a filter bank); if True, filter i is applied to channel i of the waveform

    Returns:
        Tensor: Filtered signal
    """

def filtfilt(waveform: torch.Tensor, a_coeffs: torch.Tensor, b_coeffs: torch.Tensor,
             clamp: bool = True) -> torch.Tensor:
    """
    Apply zero-phase filtering by running the filter forward and then backward.

    Args:
        waveform: Input signal (..., time)
        a_coeffs: Denominator coefficients
        b_coeffs: Numerator coefficients
        clamp: Whether to clamp the output to the range [-1, 1]

    Returns:
        Tensor: Zero-phase filtered signal
    """
```
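To make the difference equation behind IIR filtering concrete, the following pure-Python sketch evaluates `a[0]*y[n] = sum(b[k]*x[n-k]) - sum(a[k]*y[n-k] for k >= 1)` sample by sample, and then chains a forward and a backward pass to illustrate the zero-phase idea. The helper names are hypothetical, the filter state is assumed to start at zero, and no clamping or edge padding is applied, so this is an illustration of the technique rather than the library's code.

```python
from typing import List, Sequence


def iir_filter_sketch(x: Sequence[float], a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Evaluate the IIR difference equation with zero initial state (hypothetical helper)."""
    y: List[float] = []
    for n in range(len(x)):
        acc = 0.0
        # Feed-forward (moving average) part: weighted current and past inputs.
        for k, b_k in enumerate(b):
            if n - k >= 0:
                acc += b_k * x[n - k]
        # Feedback (autoregressive) part: weighted past outputs.
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc / a[0])
    return y


def zero_phase_sketch(x: Sequence[float], a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Filter forward, then filter the reversed result and reverse it back."""
    forward = iir_filter_sketch(x, a, b)
    backward = iir_filter_sketch(forward[::-1], a, b)
    return backward[::-1]


# Impulse response of the one-pole filter y[n] = x[n] + 0.5 * y[n-1].
impulse = [1.0, 0.0, 0.0, 0.0]
response = iir_filter_sketch(impulse, a=[1.0, -0.5], b=[1.0, 0.0])
smoothed = zero_phase_sketch([0.0, 1.0, 0.0, 0.0], a=[1.0, -0.5], b=[1.0, 0.0])
```

The impulse response decays by half at each step, which is the expected behavior of a pole at 0.5. The forward-backward pass keeps the output the same length as the input.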

### Pitch and Time Manipulation

Functions for manipulating pitch and temporal characteristics of audio.

```python { .api }
def pitch_shift(waveform: torch.Tensor, sample_rate: int, n_steps: float,
                bins_per_octave: int = 12, n_fft: int = 512,
                win_length: Optional[int] = None, hop_length: Optional[int] = None,
                window: Optional[torch.Tensor] = None) -> torch.Tensor:
    """
    Shift pitch of waveform by n_steps steps (semitones when bins_per_octave is 12).

    Args:
        waveform: Input audio (..., time)
        sample_rate: Sample rate
        n_steps: Number of steps to shift (positive = higher, negative = lower)
        bins_per_octave: Number of steps per octave
        n_fft: FFT size for STFT
        win_length: Window length
        hop_length: Hop length
        window: Window function

    Returns:
        Tensor: Pitch-shifted audio
    """

def speed(waveform: torch.Tensor, orig_freq: int, factor: float,
          lengths: Optional[torch.Tensor] = None) -> Tuple[torch.Tensor, Optional[torch.Tensor]]:
    """
    Adjust playback speed by resampling.

    Args:
        waveform: Input audio (..., time)
        orig_freq: Original sample rate
        factor: Speed factor (>1.0 = faster, <1.0 = slower)
        lengths: Length of each sequence in batch

    Returns:
        Tensor: Speed-adjusted audio
        Optional[Tensor]: Updated lengths if lengths was provided, otherwise None
    """

def phase_vocoder(complex_specgrams: torch.Tensor, rate: float,
                  phase_advance: torch.Tensor) -> torch.Tensor:
    """
    Apply phase vocoder for time stretching/compression without changing pitch.

    Args:
        complex_specgrams: Complex STFT (..., freq, time)
        rate: Rate factor (>1.0 = faster, <1.0 = slower)
        phase_advance: Expected phase advance per hop for each frequency bin (freq, 1)

    Returns:
        Tensor: Time-stretched complex spectrogram
    """
```
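Two small quantities sit behind these functions: the frequency ratio implied by a pitch shift, and the expected phase advance per hop that a phase vocoder needs. The sketch below computes both in plain Python. The helper names are hypothetical, and the phase-advance layout (evenly spaced from 0 to `pi * hop_length` across the one-sided frequency bins) is the commonly used convention, stated here as an assumption rather than as the library's internals.

```python
import math
from typing import List


def pitch_ratio(n_steps: float, bins_per_octave: int = 12) -> float:
    """Frequency ratio for a shift of n_steps equal-tempered steps (hypothetical helper)."""
    return 2.0 ** (n_steps / bins_per_octave)


def expected_phase_advance(n_freq: int, hop_length: int) -> List[float]:
    """Phase advance per hop for n_freq bins spanning 0 to Nyquist (pi radians per sample)."""
    if n_freq < 2:
        raise ValueError("n_freq must be at least 2")
    return [math.pi * hop_length * k / (n_freq - 1) for k in range(n_freq)]


octave_up = pitch_ratio(12)        # one octave up doubles every frequency
octave_down = pitch_ratio(-12)     # one octave down halves every frequency
advance = expected_phase_advance(n_freq=257, hop_length=128)  # n_fft = 512 gives 257 bins
```

With a tensor library, the same list would typically be built with a `linspace` call and given a trailing singleton dimension so that it broadcasts over time frames.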

### Audio Analysis

Functions for analyzing audio characteristics and extracting features.

```python { .api }
def spectral_centroid(waveform: torch.Tensor, sample_rate: int, pad: int,
                      window: torch.Tensor, n_fft: int,
                      hop_length: int, win_length: int) -> torch.Tensor:
    """
    Compute spectral centroid (center of mass of the magnitude spectrum) for each frame.

    Args:
        waveform: Input audio (..., time)
        sample_rate: Sample rate
        (other parameters same as spectrogram)

    Returns:
        Tensor: Spectral centroid over time (..., time)
    """

def detect_pitch_frequency(waveform: torch.Tensor, sample_rate: int, frame_time: float = 10**(-2),
                           win_length: int = 30, freq_low: int = 85, freq_high: int = 3400) -> torch.Tensor:
    """
    Detect pitch frequency using normalized cross-correlation and median smoothing.

    Args:
        waveform: Input audio (..., time)
        sample_rate: Sample rate
        frame_time: Length of frame in seconds
        win_length: Length of window for median smoothing, in frames
        freq_low: Lowest frequency that can be detected (Hz)
        freq_high: Highest frequency that can be detected (Hz)

    Returns:
        Tensor: Detected pitch frequency over time
    """

def loudness(waveform: torch.Tensor, sample_rate: int) -> torch.Tensor:
    """
    Compute loudness using ITU-R BS.1770-4 standard.

    Args:
        waveform: Input audio (..., channels, time)
        sample_rate: Sample rate

    Returns:
        Tensor: Loudness in LUFS (Loudness Units relative to Full Scale)
    """
```
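The "center of mass" description of the spectral centroid can be made concrete with a short sketch. Assuming a single frame of a one-sided magnitude spectrum whose bins are evenly spaced from 0 Hz to the Nyquist frequency, the centroid is the magnitude-weighted mean of the bin frequencies. The helper name `centroid_of_frame` is hypothetical, and the code works on a plain list rather than on a tensor.

```python
from typing import Sequence


def centroid_of_frame(magnitudes: Sequence[float], sample_rate: int) -> float:
    """Magnitude-weighted mean frequency of one spectrum frame (hypothetical helper)."""
    n_bins = len(magnitudes)
    if n_bins < 2:
        raise ValueError("need at least two frequency bins")
    # Bin k of a one-sided spectrum sits at k / (n_bins - 1) of the Nyquist frequency.
    nyquist = sample_rate / 2
    freqs = [nyquist * k / (n_bins - 1) for k in range(n_bins)]
    total = sum(magnitudes)
    if total == 0:
        raise ValueError("frame has no spectral energy")
    return sum(f * m for f, m in zip(freqs, magnitudes)) / total


# All energy in the middle bin of a 3-bin spectrum at 400 Hz sampling.
single_peak = centroid_of_frame([0.0, 1.0, 0.0], sample_rate=400)
# Energy concentrated in the top bin pulls the centroid up to Nyquist.
top_heavy = centroid_of_frame([0.0, 0.0, 1.0], sample_rate=400)
```

A brighter sound has more energy in the upper bins and therefore a higher centroid, which is why the feature is often used as a rough measure of brightness.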

### Convolution Operations

Convolution-based processing for impulse response application and acoustic modeling.

```python { .api }
def convolve(x: torch.Tensor, y: torch.Tensor, mode: str = "full") -> torch.Tensor:
    """
    Convolve two tensors along their last dimension using the direct method.

    Args:
        x: First input tensor (..., time)
        y: Second input tensor (..., time)
        mode: Convolution mode: "full" returns the complete result, "valid" only the
            fully overlapping part, "same" a result the same size as x

    Returns:
        Tensor: Convolved signal
    """

def fftconvolve(x: torch.Tensor, y: torch.Tensor, mode: str = "full") -> torch.Tensor:
    """
    Convolve using FFT, which is generally faster for long signals.

    Args:
        x: First input tensor (..., time)
        y: Second input tensor (..., time)
        mode: Convolution mode ("full", "valid", "same")

    Returns:
        Tensor: Convolved signal
    """
```
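The three `mode` values differ only in how much of the full convolution result is kept. The pure-Python sketch below computes the full result directly and then crops it. The helper name `convolve_sketch` is hypothetical, and the centered cropping rule is an assumption chosen so that "same" matches the length of the first input.

```python
from typing import List, Sequence


def convolve_sketch(x: Sequence[float], y: Sequence[float], mode: str = "full") -> List[float]:
    """Direct 1D convolution followed by mode-dependent cropping (hypothetical helper)."""
    full = [0.0] * (len(x) + len(y) - 1)
    for i, x_i in enumerate(x):
        for j, y_j in enumerate(y):
            full[i + j] += x_i * y_j
    if mode == "full":
        return full
    if mode == "same":
        target = len(x)  # keep as many samples as the first input
    elif mode == "valid":
        # Keep only positions where the shorter input fully overlaps the longer one.
        target = max(len(x), len(y)) - min(len(x), len(y)) + 1
    else:
        raise ValueError(f"unknown mode: {mode}")
    start = (len(full) - target) // 2
    return full[start:start + target]


signal = [1.0, 2.0, 3.0]
kernel = [0.0, 1.0, 0.5]
full_result = convolve_sketch(signal, kernel, "full")
same_result = convolve_sketch(signal, kernel, "same")
valid_result = convolve_sketch(signal, kernel, "valid")
```

The direct method costs on the order of `len(x) * len(y)` multiplications, which is why an FFT-based variant tends to pay off once both inputs are long.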

### Mu-Law Encoding/Decoding

Logarithmic quantization commonly used in telecommunications.

```python { .api }
def mu_law_encoding(x: torch.Tensor, quantization_channels: int = 256) -> torch.Tensor:
    """
    Encode waveform using mu-law companding. Input values are expected in [-1, 1].

    Args:
        x: Input waveform (..., time)
        quantization_channels: Number of quantization levels

    Returns:
        Tensor: Mu-law encoded signal (integer values in [0, quantization_channels - 1])
    """

def mu_law_decoding(x_mu: torch.Tensor, quantization_channels: int = 256) -> torch.Tensor:
    """
    Decode mu-law encoded waveform.

    Args:
        x_mu: Mu-law encoded signal (..., time)
        quantization_channels: Number of quantization levels

    Returns:
        Tensor: Decoded waveform with values in [-1, 1]
    """
```
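Mu-law companding compresses a sample logarithmically before quantizing it, so that quiet samples get finer resolution than loud ones. The scalar sketch below follows the standard formula with `mu = quantization_channels - 1`. The helper names are hypothetical and the code operates on single floats rather than tensors, so treat it as an illustration of the math only.

```python
import math


def mu_law_encode_sketch(x: float, quantization_channels: int = 256) -> int:
    """Map a sample in [-1, 1] to an integer level (hypothetical helper)."""
    mu = quantization_channels - 1
    # Logarithmic compression, preserving the sign of the sample.
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Shift from [-1, 1] to [0, mu] and round to the nearest level.
    return int((compressed + 1) / 2 * mu + 0.5)


def mu_law_decode_sketch(x_mu: int, quantization_channels: int = 256) -> float:
    """Map an integer level back to a sample in [-1, 1] (hypothetical helper)."""
    mu = quantization_channels - 1
    compressed = x_mu / mu * 2 - 1
    # Invert the logarithmic compression.
    return math.copysign(math.expm1(abs(compressed) * math.log1p(mu)) / mu, compressed)


silence_level = mu_law_encode_sketch(0.0)
round_trip = mu_law_decode_sketch(mu_law_encode_sketch(0.5))
```

The round trip is lossy: decoding returns the center of the quantization level, not the original sample, and the error grows with the magnitude of the input.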

### Feature Processing

Functions for processing extracted audio features.

```python { .api }
def compute_deltas(specgram: torch.Tensor, win_length: int = 5) -> torch.Tensor:
    """
    Compute delta features (first derivatives along time) of spectrogram.

    Args:
        specgram: Input spectrogram (..., freq, time)
        win_length: Window length for delta computation

    Returns:
        Tensor: Delta features with same shape as input
    """

def create_dct(n_mfcc: int, n_mels: int, norm: Optional[str] = None) -> torch.Tensor:
    """
    Create Discrete Cosine Transform matrix for MFCC computation.

    Args:
        n_mfcc: Number of MFCC coefficients
        n_mels: Number of mel filter banks
        norm: Normalization method ("ortho" or None)

    Returns:
        Tensor: DCT matrix (n_mels, n_mfcc), to be right-multiplied to row-wise mel data
    """

def sliding_window_cmn(specgram: torch.Tensor, cmn_window: int = 600, min_cmn_window: int = 100,
                       center: bool = False, norm_vars: bool = False) -> torch.Tensor:
    """
    Apply sliding window cepstral mean normalization.

    Args:
        specgram: Input spectrogram (..., freq, time)
        cmn_window: Window size for normalization, in frames
        min_cmn_window: Minimum window size (only used when center is False)
        center: Whether to center the window on the current frame
        norm_vars: Whether to normalize variance to one

    Returns:
        Tensor: Normalized spectrogram
    """
```
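Delta features are usually defined by the regression formula `d[t] = sum(n * (c[t+n] - c[t-n]) for n in 1..N) / (2 * sum(n**2 for n in 1..N))` with `N = (win_length - 1) // 2`. The sketch below applies that formula to a single feature track in plain Python. The helper name `deltas_sketch` is hypothetical, and replicating the edge frames as padding is an assumption about boundary handling.

```python
from typing import List, Sequence


def deltas_sketch(features: Sequence[float], win_length: int = 5) -> List[float]:
    """Delta coefficients of one feature track over time (hypothetical helper)."""
    if win_length < 3:
        raise ValueError("win_length must be at least 3")
    n = (win_length - 1) // 2
    denom = 2 * sum(k * k for k in range(1, n + 1))
    last = len(features) - 1

    def at(t: int) -> float:
        # Replicate padding: clamp out-of-range indices to the nearest valid frame.
        return features[min(max(t, 0), last)]

    return [
        sum(k * (at(t + k) - at(t - k)) for k in range(1, n + 1)) / denom
        for t in range(len(features))
    ]


ramp = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ramp_deltas = deltas_sketch(ramp)
flat_deltas = deltas_sketch([3.0, 3.0, 3.0])
```

For a linear ramp the interior deltas equal the slope, while the first and last frames are biased toward zero by the replicated padding. Applying the same function to the deltas yields delta-delta (acceleration) features.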

Together, these stateless functions make up TorchAudio's functional API, covering the major audio processing operations from basic spectral analysis to filtering, effects, and feature extraction.