# Audio Helpers

The OpenAI SDK provides Node.js-specific helper functions for playing and recording audio. These utilities use `ffmpeg` and `ffplay` to handle audio streams, making it easy to work with audio from the OpenAI API.

**Platform Support:** Node.js only - these helpers are not available in browser environments.

## Package Information

- **Package Name**: openai
- **Version**: 6.9.1
- **Language**: TypeScript
- **Import Path**: `openai/helpers/audio`
- **Platform**: Node.js only (requires `ffmpeg` and `ffplay` to be installed)

## Core Imports

```typescript
import { playAudio, recordAudio } from 'openai/helpers/audio';
```

## Prerequisites

To use these helpers, you need `ffmpeg` and `ffplay` installed on your system:

**macOS (via Homebrew):**

```bash
brew install ffmpeg
```

**Ubuntu/Debian:**

```bash
sudo apt-get install ffmpeg
```

**Windows:**

Download builds from https://ffmpeg.org/download.html and make sure `ffmpeg` and `ffplay` are on your `PATH`.

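The helpers assume both binaries are on your `PATH`; the SDK itself does not ship a preflight check. The following is a minimal sketch of one (the `assertInstalled` helper is hypothetical, not part of the SDK) so scripts fail fast with a clear message:

```typescript
import { spawnSync } from 'node:child_process';

// Hypothetical preflight helper: throws if a binary is missing from PATH.
function assertInstalled(binary: string): void {
  const result = spawnSync(binary, ['-version'], { stdio: 'ignore' });
  if (result.error || result.status !== 0) {
    throw new Error(`${binary} is required but was not found on PATH`);
  }
}

assertInstalled('ffmpeg'); // needed by recordAudio
assertInstalled('ffplay'); // needed by playAudio
```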

## Capabilities

### playAudio

Plays audio from a stream, Response object, or File using `ffplay`. This is useful for immediately playing audio generated by the text-to-speech API.

```typescript { .api }
/**
 * Plays audio from a stream, Response, or File using ffplay
 * @param input - Audio source (ReadableStream, fetch Response, or File)
 * @returns Promise that resolves when playback completes
 * @throws Error if not running in Node.js or if ffplay fails
 */
function playAudio(
  input: NodeJS.ReadableStream | Response | File
): Promise<void>;
```

**Usage with Text-to-Speech:**

```typescript
import OpenAI from 'openai';
import { playAudio } from 'openai/helpers/audio';

const client = new OpenAI();

// Generate speech and play it immediately
const response = await client.audio.speech.create({
  model: 'tts-1',
  voice: 'alloy',
  input: 'Hello! This is a test of the text-to-speech API.',
});

// Play the audio
await playAudio(response);
console.log('Playback complete');
```

**Usage with Streaming:**

```typescript
import { playAudio } from 'openai/helpers/audio';
import fs from 'fs';

// Play from a file stream
const audioStream = fs.createReadStream('./audio.mp3');
await playAudio(audioStream);
```

**Usage with File Object:**

```typescript
import { playAudio } from 'openai/helpers/audio';
import fs from 'fs';

// Build a File object from audio bytes on disk, then play it
// (File is a global in Node.js 20+)
const audioBuffer = fs.readFileSync('./speech.mp3');
const audioFile = new File([audioBuffer], 'speech.mp3', { type: 'audio/mpeg' });
await playAudio(audioFile);
```

**Error Handling:**

```typescript
try {
  await playAudio(audioResponse);
} catch (error) {
  console.error('Playback error:', error);
  // ffplay may not be installed or audio format is unsupported
}
```

### recordAudio

Records audio from the system's default audio input device using `ffmpeg`. Returns a WAV file that can be used with the transcription or translation APIs.

```typescript { .api }
/**
 * Records audio from the system's default input device
 * @param options - Recording options
 * @param options.signal - AbortSignal to cancel recording early
 * @param options.device - Device index (default: 0)
 * @param options.timeout - Maximum recording duration in milliseconds
 * @returns Promise resolving to a File with recorded audio
 * @throws Error if not running in Node.js or if ffmpeg fails
 */
function recordAudio(options?: {
  signal?: AbortSignal;
  device?: number;
  timeout?: number;
}): Promise<File>;
```

**Basic Recording with Timeout:**

```typescript
import OpenAI from 'openai';
import { recordAudio } from 'openai/helpers/audio';

const client = new OpenAI();

// Record for 5 seconds
const audioFile = await recordAudio({ timeout: 5000 });

// Transcribe the recording
const transcription = await client.audio.transcriptions.create({
  file: audioFile,
  model: 'whisper-1',
});

console.log('Transcription:', transcription.text);
```

**Recording with Manual Abort:**

```typescript
import { recordAudio } from 'openai/helpers/audio';

// Create an abort controller
const controller = new AbortController();

// Start recording
const recordingPromise = recordAudio({ signal: controller.signal });

// Stop recording after 10 seconds
setTimeout(() => {
  controller.abort();
  console.log('Recording stopped');
}, 10000);

// Resolves with the audio captured before the abort
const audioFile = await recordingPromise;
```

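In an interactive CLI you will usually want to stop on user input rather than a fixed delay. Here is a minimal sketch of that wiring using Node's `readline` (not part of the SDK), assuming, as in the example above, that aborting resolves the promise with the audio captured so far:

```typescript
import readline from 'node:readline';
import { recordAudio } from 'openai/helpers/audio';

const controller = new AbortController();
const recordingPromise = recordAudio({ signal: controller.signal });

// Stop the recording on the first Enter keypress
const rl = readline.createInterface({ input: process.stdin });
console.log('Recording... press Enter to stop.');
rl.once('line', () => {
  controller.abort();
  rl.close();
});

const audioFile = await recordingPromise;
console.log(`Captured ${audioFile.size} bytes`);
```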

**Recording from Specific Device:**

```typescript
import { recordAudio } from 'openai/helpers/audio';

// Record from device index 1 instead of the default (0)
const audioFile = await recordAudio({
  device: 1,
  timeout: 5000,
});
```

**Complete Example - Record and Transcribe:**

```typescript
import OpenAI from 'openai';
import { recordAudio } from 'openai/helpers/audio';

const client = new OpenAI();

async function recordAndTranscribe() {
  console.log('Recording... Speak now!');

  // Record for 10 seconds
  const audioFile = await recordAudio({ timeout: 10000 });

  console.log('Recording complete. Transcribing...');

  // Transcribe the audio
  const transcription = await client.audio.transcriptions.create({
    file: audioFile,
    model: 'whisper-1',
    language: 'en', // optional
  });

  console.log('You said:', transcription.text);
  return transcription.text;
}

recordAndTranscribe();
```

## Recording Configuration

### Audio Format

Recordings are captured in WAV format with the following specifications:

- **Format**: WAV (PCM)
- **Sample Rate**: 24,000 Hz
- **Channels**: 1 (mono)
- **Bit Depth**: 16-bit (default for WAV)

These settings are optimized for OpenAI's Whisper API. At 24,000 samples per second and 2 bytes per sample, raw mono audio accumulates at roughly 48 KB per second, so a 10-second recording is about 480 KB plus the WAV header.

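If you want to confirm these settings on your machine, you can inspect the header of the returned File. This sketch is not part of the SDK and assumes a canonical RIFF/WAVE layout with the `fmt` chunk at the usual offsets:

```typescript
import { recordAudio } from 'openai/helpers/audio';

const file = await recordAudio({ timeout: 2000 });
const header = Buffer.from(await file.arrayBuffer()).subarray(0, 44);

// Standard RIFF/WAVE header fields (little-endian)
console.log('container :', header.toString('ascii', 0, 4));  // "RIFF"
console.log('format    :', header.toString('ascii', 8, 12)); // "WAVE"
console.log('channels  :', header.readUInt16LE(22));         // expected 1
console.log('sampleRate:', header.readUInt32LE(24));         // expected 24000
console.log('bitDepth  :', header.readUInt16LE(34));         // expected 16
```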

### Platform-Specific Providers

The `recordAudio` function uses a different audio provider depending on the operating system (a device-listing sketch follows this list):

- **macOS**: `avfoundation`
- **Windows**: `dshow` (DirectShow)
- **Linux**: `alsa` (Advanced Linux Sound Architecture)
- **Other Unix**: `alsa`

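To find the index to pass as `device`, you can ask each backend to enumerate its capture devices. The helper below is hypothetical (not part of the SDK); it shells out to standard `ffmpeg` listing flags, and to `arecord` from `alsa-utils` on Linux, where that is the simpler query:

```typescript
import { spawnSync } from 'node:child_process';

// Print the available audio input devices for the current platform.
// Note: avfoundation and dshow write their device lists to stderr.
function listInputDevices(): void {
  if (process.platform === 'darwin') {
    spawnSync('ffmpeg', ['-f', 'avfoundation', '-list_devices', 'true', '-i', ''], {
      stdio: 'inherit',
    });
  } else if (process.platform === 'win32') {
    spawnSync('ffmpeg', ['-list_devices', 'true', '-f', 'dshow', '-i', 'dummy'], {
      stdio: 'inherit',
    });
  } else {
    // On Linux, `arecord -l` lists ALSA capture devices
    spawnSync('arecord', ['-l'], { stdio: 'inherit' });
  }
}

listInputDevices();
```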

## Options

### RecordAudioOptions

```typescript { .api }
interface RecordAudioOptions {
  /**
   * AbortSignal to stop recording before timeout
   * Call controller.abort() to stop recording early
   */
  signal?: AbortSignal;

  /**
   * Audio input device index
   * @default 0 (system default device)
   */
  device?: number;

  /**
   * Maximum recording duration in milliseconds
   * Recording stops automatically after this duration
   * If not specified, recording continues until manually aborted
   */
  timeout?: number;
}
```

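The options can be combined. A short sketch (behavior assumed to match the per-option examples above): cap the recording at 30 seconds while still allowing an early manual stop:

```typescript
import { recordAudio } from 'openai/helpers/audio';

const controller = new AbortController();

// Stops after 30 seconds at most; controller.abort() can end it sooner
const audioFile = await recordAudio({
  device: 0,
  timeout: 30_000,
  signal: controller.signal,
});
```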

## Error Handling

### Common Errors

**Missing ffmpeg/ffplay:**

```typescript
try {
  await playAudio(audioResponse);
} catch (error) {
  console.error('Error:', error.message);
  // "ffplay process exited with code 1"
  // Ensure ffmpeg is installed: brew install ffmpeg
}
```

**Browser Environment:**

```typescript
import { playAudio } from 'openai/helpers/audio';

try {
  await playAudio(audioResponse);
} catch (error) {
  console.error(error.message);
  // "Play audio is not supported in the browser yet.
  // Check out https://npm.im/wavtools as an alternative."
}
```

**Recording Errors:**

```typescript
try {
  const audio = await recordAudio({ device: 99 });
} catch (error) {
  console.error('Recording error:', error);
  // May indicate an invalid device index or permission issues
}
```

## Best Practices

### Recording

1. **Always set a timeout or use an AbortSignal** to prevent an unbounded recording
2. **Check microphone permissions** before recording
3. **Verify ffmpeg is installed** with `ffmpeg -version` (see the preflight sketch under Prerequisites)
4. **Test the device index** - device 0 is usually the default microphone

### Playback

1. **Handle playback completion** with async/await or promise chaining
2. **Consider the audio format** - ffplay supports most formats but may have issues with exotic codecs
3. **Volume control** - the helpers expose no volume API; playback uses the system volume, so consider warning users before playing audio

### Platform Compatibility

1. **Node.js only** - these helpers will throw errors in browser environments
2. **Server-side use** - useful for CLI tools, demos, and testing
3. **Browser alternative** - use [wavtools](https://npm.im/wavtools) for browser-based audio handling

## Complete Example: Voice Conversation

```typescript
import OpenAI from 'openai';
import { recordAudio, playAudio } from 'openai/helpers/audio';

const client = new OpenAI();

async function voiceConversation() {
  // 1. Record user input
  console.log('Listening... (5 seconds)');
  const userAudio = await recordAudio({ timeout: 5000 });

  // 2. Transcribe to text
  console.log('Transcribing...');
  const transcription = await client.audio.transcriptions.create({
    file: userAudio,
    model: 'whisper-1',
  });

  console.log('You said:', transcription.text);

  // 3. Generate response with chat
  const completion = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      { role: 'user', content: transcription.text },
    ],
  });

  // content is typed as string | null, so fall back to an empty string
  const responseText = completion.choices[0].message.content ?? '';
  console.log('AI response:', responseText);

  // 4. Convert response to speech
  console.log('Generating speech...');
  const speech = await client.audio.speech.create({
    model: 'tts-1',
    voice: 'alloy',
    input: responseText,
  });

  // 5. Play the response
  console.log('Playing response...');
  await playAudio(speech);

  console.log('Conversation complete!');
}

voiceConversation();
```

## See Also

- [Audio API](./audio.md) - Text-to-speech and speech-to-text APIs
- [Realtime API](./realtime.md) - WebSocket-based real-time voice conversations
- [wavtools](https://npm.im/wavtools) - Browser-compatible audio utilities