# Audio Helpers

The OpenAI SDK provides Node.js-specific helper functions for playing and recording audio. These utilities use `ffmpeg` and `ffplay` to handle audio streams, making it easy to work with audio from the OpenAI API.

**Platform Support:** Node.js only - these helpers are not available in browser environments.

## Package Information

- **Package Name**: openai
- **Version**: 6.9.1
- **Language**: TypeScript
- **Import Path**: `openai/helpers/audio`
- **Platform**: Node.js only (requires `ffmpeg` and `ffplay` to be installed)

## Core Imports

```typescript
import { playAudio, recordAudio } from 'openai/helpers/audio';
```

## Prerequisites

To use these helpers, you need `ffmpeg` and `ffplay` installed on your system:

**macOS (via Homebrew):**
```bash
brew install ffmpeg
```

**Ubuntu/Debian:**
```bash
sudo apt-get install ffmpeg
```

**Windows:**
Download from https://ffmpeg.org/download.html
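
Since a missing binary only surfaces as a runtime error, it can help to fail fast at startup. A minimal preflight check (a sketch, not part of the SDK - `hasBinary` is illustrative) using Node's `child_process`:

```typescript
import { spawnSync } from 'node:child_process';

// Returns true if `name -version` runs and exits 0, which is how
// ffmpeg and ffplay respond when installed and on PATH.
function hasBinary(name: string): boolean {
  return spawnSync(name, ['-version'], { stdio: 'ignore' }).status === 0;
}

if (!hasBinary('ffmpeg') || !hasBinary('ffplay')) {
  console.warn('ffmpeg/ffplay not found - audio helpers will fail');
}
```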

## Capabilities

### playAudio

Plays audio from a stream, Response object, or File using `ffplay`. This is useful for immediately playing audio generated by the text-to-speech API.

```typescript { .api }
/**
 * Plays audio from a stream, Response, or File using ffplay
 * @param input - Audio source (ReadableStream, fetch Response, or File)
 * @returns Promise that resolves when playback completes
 * @throws Error if not running in Node.js or if ffplay fails
 */
function playAudio(
  input: NodeJS.ReadableStream | Response | File
): Promise<void>;
```

**Usage with Text-to-Speech:**

```typescript
import OpenAI from 'openai';
import { playAudio } from 'openai/helpers/audio';

const client = new OpenAI();

// Generate speech and play it immediately
const response = await client.audio.speech.create({
  model: 'tts-1',
  voice: 'alloy',
  input: 'Hello! This is a test of the text-to-speech API.',
});

// Play the audio
await playAudio(response);
console.log('Playback complete');
```

**Usage with Streaming:**

```typescript
import { playAudio } from 'openai/helpers/audio';
import fs from 'fs';

// Play from a file stream
const audioStream = fs.createReadStream('./audio.mp3');
await playAudio(audioStream);
```

**Usage with File Object:**

```typescript
import { playAudio } from 'openai/helpers/audio';
import fs from 'fs';

// Build a File object from raw audio bytes and play it
const audioBuffer = fs.readFileSync('./speech.mp3');
const audioFile = new File([audioBuffer], 'speech.mp3', { type: 'audio/mpeg' });
await playAudio(audioFile);
```

**Error Handling:**

```typescript
try {
  await playAudio(audioResponse);
} catch (error) {
  console.error('Playback error:', error);
  // ffplay may not be installed, or the audio format is unsupported
}
```

### recordAudio

Records audio from the system's default audio input device using `ffmpeg`. Returns a WAV file that can be passed to the transcription or translation APIs.

```typescript { .api }
/**
 * Records audio from the system's default input device
 * @param options - Recording options
 * @param options.signal - AbortSignal to cancel recording early
 * @param options.device - Device index (default: 0)
 * @param options.timeout - Maximum recording duration in milliseconds
 * @returns Promise resolving to a File with recorded audio
 * @throws Error if not running in Node.js or if ffmpeg fails
 */
function recordAudio(options?: {
  signal?: AbortSignal;
  device?: number;
  timeout?: number;
}): Promise<File>;
```

**Basic Recording with Timeout:**

```typescript
import OpenAI from 'openai';
import { recordAudio } from 'openai/helpers/audio';

const client = new OpenAI();

// Record for 5 seconds
const audioFile = await recordAudio({ timeout: 5000 });

// Transcribe the recording
const transcription = await client.audio.transcriptions.create({
  file: audioFile,
  model: 'whisper-1',
});

console.log('Transcription:', transcription.text);
```

**Recording with Manual Abort:**

```typescript
import { recordAudio } from 'openai/helpers/audio';

// Create an abort controller
const controller = new AbortController();

// Start recording
const recordingPromise = recordAudio({ signal: controller.signal });

// Stop recording after 10 seconds (e.g. in response to user input)
setTimeout(() => {
  controller.abort();
  console.log('Recording stopped');
}, 10000);

const audioFile = await recordingPromise;
```

**Recording from a Specific Device:**

```typescript
// Record from device index 1 instead of the default (0)
const audioFile = await recordAudio({
  device: 1,
  timeout: 5000,
});
```

**Complete Example - Record and Transcribe:**

```typescript
import OpenAI from 'openai';
import { recordAudio } from 'openai/helpers/audio';

const client = new OpenAI();

async function recordAndTranscribe() {
  console.log('Recording... Speak now!');

  // Record for 10 seconds
  const audioFile = await recordAudio({ timeout: 10000 });

  console.log('Recording complete. Transcribing...');

  // Transcribe the audio
  const transcription = await client.audio.transcriptions.create({
    file: audioFile,
    model: 'whisper-1',
    language: 'en', // optional
  });

  console.log('You said:', transcription.text);
  return transcription.text;
}

recordAndTranscribe();
```

## Recording Configuration

### Audio Format

Recordings are captured in WAV format with the following specifications:

- **Format**: WAV (PCM)
- **Sample Rate**: 24,000 Hz
- **Channels**: 1 (mono)
- **Bit Depth**: 16-bit (default for WAV)

These settings are optimized for OpenAI's Whisper API.
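
The specifications above also let you estimate recording sizes up front. A quick sketch:

```typescript
// Byte rate for 24 kHz, mono, 16-bit PCM (the recording format above).
const sampleRate = 24_000;
const channels = 1;
const bytesPerSample = 2; // 16-bit

const byteRate = sampleRate * channels * bytesPerSample; // bytes per second

// Approximate size of a 10-second recording, including the
// 44-byte canonical WAV header.
const tenSecondWav = byteRate * 10 + 44;

console.log(byteRate, tenSecondWav); // 48000 480044
```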

### Platform-Specific Providers

The `recordAudio` function uses a different ffmpeg input provider depending on the operating system:

- **macOS**: `avfoundation`
- **Windows**: `dshow` (DirectShow)
- **Linux**: `alsa` (Advanced Linux Sound Architecture)
- **Other Unix**: `alsa`
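
The mapping above can be pictured as a simple switch over `process.platform`; this is a sketch of the idea, not the SDK's actual internals (`inputFormatFor` is illustrative):

```typescript
// Map a Node.js platform identifier to the ffmpeg input format
// used for audio capture on that OS.
function inputFormatFor(platform: NodeJS.Platform): string {
  switch (platform) {
    case 'darwin':
      return 'avfoundation'; // macOS
    case 'win32':
      return 'dshow'; // Windows DirectShow
    default:
      return 'alsa'; // Linux and other Unix
  }
}

console.log(inputFormatFor(process.platform));
```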

## Options

### RecordAudioOptions

```typescript { .api }
interface RecordAudioOptions {
  /**
   * AbortSignal to stop recording before the timeout elapses
   * Call controller.abort() to stop recording early
   */
  signal?: AbortSignal;

  /**
   * Audio input device index
   * @default 0 (system default device)
   */
  device?: number;

  /**
   * Maximum recording duration in milliseconds
   * Recording stops automatically after this duration
   * If not specified, recording continues until manually aborted
   */
  timeout?: number;
}
```
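
If you want a single cancellation mechanism, a manual stop and a hard time cap can be expressed as one combined signal. This is a sketch assuming `recordAudio` honors whatever `AbortSignal` it is given; `AbortSignal.timeout` and `AbortSignal.any` require recent Node.js (17.3+ and 20+ respectively):

```typescript
// A controller the user can trigger (e.g. from a key press)...
const stopButton = new AbortController();

// ...combined with an absolute cap of 30 seconds.
const signal = AbortSignal.any([
  stopButton.signal,
  AbortSignal.timeout(30_000),
]);

// Pass `signal` to recordAudio({ signal }); aborting either source
// aborts the combined signal. For example, a manual stop:
stopButton.abort();
```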

## Error Handling

### Common Errors

**Missing ffmpeg/ffplay:**

```typescript
try {
  await playAudio(audioResponse);
} catch (error) {
  console.error('Error:', (error as Error).message);
  // "ffplay process exited with code 1"
  // Ensure ffmpeg is installed: brew install ffmpeg
}
```

**Browser Environment:**

```typescript
import { playAudio } from 'openai/helpers/audio';

try {
  await playAudio(audioResponse);
} catch (error) {
  console.error((error as Error).message);
  // "Play audio is not supported in the browser yet.
  // Check out https://npm.im/wavtools as an alternative."
}
```

**Recording Errors:**

```typescript
try {
  const audio = await recordAudio({ device: 99 });
} catch (error) {
  console.error('Recording error:', error);
  // May indicate an invalid device index or missing microphone permissions
}
```

## Best Practices

### Recording

1. **Always set a timeout or use an AbortSignal** to prevent unbounded recording
2. **Check microphone permissions** before recording
3. **Verify ffmpeg is installed** with `ffmpeg -version`
4. **Test the device index** - device 0 is usually the default microphone

### Playback

1. **Handle playback completion** with async/await or promise chaining
2. **Consider the audio format** - ffplay supports most formats but may struggle with uncommon codecs
3. **Volume control** - the helpers expose no volume API; playback uses the system volume, so warn users accordingly

### Platform Compatibility

1. **Node.js only** - these helpers throw errors in browser environments
2. **Server-side use** - useful for CLI tools, demos, and testing
3. **Browser alternative** - use [wavtools](https://npm.im/wavtools) for browser-based audio handling

## Complete Example: Voice Conversation

```typescript
import OpenAI from 'openai';
import { recordAudio, playAudio } from 'openai/helpers/audio';

const client = new OpenAI();

async function voiceConversation() {
  // 1. Record user input
  console.log('Listening... (5 seconds)');
  const userAudio = await recordAudio({ timeout: 5000 });

  // 2. Transcribe to text
  console.log('Transcribing...');
  const transcription = await client.audio.transcriptions.create({
    file: userAudio,
    model: 'whisper-1',
  });

  console.log('You said:', transcription.text);

  // 3. Generate a response with chat
  const completion = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      { role: 'user', content: transcription.text },
    ],
  });

  // content can be null, so fall back to an empty string
  const responseText = completion.choices[0].message.content ?? '';
  console.log('AI response:', responseText);

  // 4. Convert the response to speech
  console.log('Generating speech...');
  const speech = await client.audio.speech.create({
    model: 'tts-1',
    voice: 'alloy',
    input: responseText,
  });

  // 5. Play the response
  console.log('Playing response...');
  await playAudio(speech);

  console.log('Conversation complete!');
}

voiceConversation();
```

## See Also

- [Audio API](./audio.md) - Text-to-speech and speech-to-text APIs
- [Realtime API](./realtime.md) - WebSocket-based real-time voice conversations
- [wavtools](https://npm.im/wavtools) - Browser-compatible audio utilities