# Audio Helpers

The OpenAI SDK provides Node.js-specific helper functions for playing and recording audio. These utilities use `ffmpeg` and `ffplay` to handle audio streams, making it easy to work with audio from the OpenAI API.

**Platform Support:** Node.js only - these helpers are not available in browser environments.

## Package Information

- **Package Name**: openai
- **Version**: 6.9.1
- **Language**: TypeScript
- **Import Path**: `openai/helpers/audio`
- **Platform**: Node.js only (requires `ffmpeg` and `ffplay` to be installed)

## Core Imports

```typescript
import { playAudio, recordAudio } from 'openai/helpers/audio';
```

## Prerequisites

To use these helpers, you need `ffmpeg` and `ffplay` installed on your system:

**macOS (via Homebrew):**
```bash
brew install ffmpeg
```

**Ubuntu/Debian:**
```bash
sudo apt-get install ffmpeg
```

**Windows:**
Download from https://ffmpeg.org/download.html
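
Since a missing binary only surfaces as a runtime error, it can help to fail fast at startup. A minimal preflight check (a sketch, not part of the SDK - `hasBinary` is illustrative) using Node's `child_process`:

```typescript
import { spawnSync } from 'node:child_process';

// Returns true if `name -version` runs and exits 0, which is how
// ffmpeg and ffplay respond when installed and on PATH.
function hasBinary(name: string): boolean {
  return spawnSync(name, ['-version'], { stdio: 'ignore' }).status === 0;
}

if (!hasBinary('ffmpeg') || !hasBinary('ffplay')) {
  console.warn('ffmpeg/ffplay not found - audio helpers will fail');
}
```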

## Capabilities

### playAudio

Plays audio from a stream, Response object, or File using `ffplay`. This is useful for immediately playing audio generated by the text-to-speech API.

```typescript { .api }
/**
 * Plays audio from a stream, Response, or File using ffplay
 * @param input - Audio source (ReadableStream, fetch Response, or File)
 * @returns Promise that resolves when playback completes
 * @throws Error if not running in Node.js or if ffplay fails
 */
function playAudio(
  input: NodeJS.ReadableStream | Response | File
): Promise<void>;
```

**Usage with Text-to-Speech:**

```typescript
import OpenAI from 'openai';
import { playAudio } from 'openai/helpers/audio';

const client = new OpenAI();

// Generate speech and play it immediately
const response = await client.audio.speech.create({
  model: 'tts-1',
  voice: 'alloy',
  input: 'Hello! This is a test of the text-to-speech API.',
});

// Play the audio
await playAudio(response);
console.log('Playback complete');
```

**Usage with Streaming:**

```typescript
import { playAudio } from 'openai/helpers/audio';
import fs from 'fs';

// Play from a file stream
const audioStream = fs.createReadStream('./audio.mp3');
await playAudio(audioStream);
```

**Usage with File Object:**

```typescript
import { playAudio } from 'openai/helpers/audio';
import fs from 'fs';

// Build a File object from raw audio bytes and play it
const audioBuffer = fs.readFileSync('./speech.mp3');
const audioFile = new File([audioBuffer], 'speech.mp3', { type: 'audio/mpeg' });
await playAudio(audioFile);
```

**Error Handling:**

```typescript
try {
  await playAudio(audioResponse);
} catch (error) {
  console.error('Playback error:', error);
  // ffplay may not be installed, or the audio format is unsupported
}
```

### recordAudio

Records audio from the system's default audio input device using `ffmpeg`. Returns a WAV file that can be passed to the transcription or translation APIs.

```typescript { .api }
/**
 * Records audio from the system's default input device
 * @param options - Recording options
 * @param options.signal - AbortSignal to cancel recording early
 * @param options.device - Device index (default: 0)
 * @param options.timeout - Maximum recording duration in milliseconds
 * @returns Promise resolving to a File with recorded audio
 * @throws Error if not running in Node.js or if ffmpeg fails
 */
function recordAudio(options?: {
  signal?: AbortSignal;
  device?: number;
  timeout?: number;
}): Promise<File>;
```

**Basic Recording with Timeout:**

```typescript
import OpenAI from 'openai';
import { recordAudio } from 'openai/helpers/audio';

const client = new OpenAI();

// Record for 5 seconds
const audioFile = await recordAudio({ timeout: 5000 });

// Transcribe the recording
const transcription = await client.audio.transcriptions.create({
  file: audioFile,
  model: 'whisper-1',
});

console.log('Transcription:', transcription.text);
```

**Recording with Manual Abort:**

```typescript
import { recordAudio } from 'openai/helpers/audio';

// Create an abort controller
const controller = new AbortController();

// Start recording
const recordingPromise = recordAudio({ signal: controller.signal });

// Stop recording after 10 seconds (e.g. in response to user input)
setTimeout(() => {
  controller.abort();
  console.log('Recording stopped');
}, 10000);

const audioFile = await recordingPromise;
```

**Recording from a Specific Device:**

```typescript
// Record from device index 1 instead of the default (0)
const audioFile = await recordAudio({
  device: 1,
  timeout: 5000,
});
```

**Complete Example - Record and Transcribe:**

```typescript
import OpenAI from 'openai';
import { recordAudio } from 'openai/helpers/audio';

const client = new OpenAI();

async function recordAndTranscribe() {
  console.log('Recording... Speak now!');

  // Record for 10 seconds
  const audioFile = await recordAudio({ timeout: 10000 });

  console.log('Recording complete. Transcribing...');

  // Transcribe the audio
  const transcription = await client.audio.transcriptions.create({
    file: audioFile,
    model: 'whisper-1',
    language: 'en', // optional
  });

  console.log('You said:', transcription.text);
  return transcription.text;
}

recordAndTranscribe();
```

## Recording Configuration

### Audio Format

Recordings are captured in WAV format with the following specifications:

- **Format**: WAV (PCM)
- **Sample Rate**: 24,000 Hz
- **Channels**: 1 (mono)
- **Bit Depth**: 16-bit (default for WAV)

These settings are optimized for OpenAI's Whisper API.
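
The specifications above also let you estimate recording sizes up front. A quick sketch:

```typescript
// Byte rate for 24 kHz, mono, 16-bit PCM (the recording format above).
const sampleRate = 24_000;
const channels = 1;
const bytesPerSample = 2; // 16-bit

const byteRate = sampleRate * channels * bytesPerSample; // bytes per second

// Approximate size of a 10-second recording, including the
// 44-byte canonical WAV header.
const tenSecondWav = byteRate * 10 + 44;

console.log(byteRate, tenSecondWav); // 48000 480044
```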

### Platform-Specific Providers

The `recordAudio` function uses a different ffmpeg input provider depending on the operating system:

- **macOS**: `avfoundation`
- **Windows**: `dshow` (DirectShow)
- **Linux**: `alsa` (Advanced Linux Sound Architecture)
- **Other Unix**: `alsa`
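
The mapping above can be pictured as a simple switch over `process.platform`; this is a sketch of the idea, not the SDK's actual internals (`inputFormatFor` is illustrative):

```typescript
// Map a Node.js platform identifier to the ffmpeg input format
// used for audio capture on that OS.
function inputFormatFor(platform: NodeJS.Platform): string {
  switch (platform) {
    case 'darwin':
      return 'avfoundation'; // macOS
    case 'win32':
      return 'dshow'; // Windows DirectShow
    default:
      return 'alsa'; // Linux and other Unix
  }
}

console.log(inputFormatFor(process.platform));
```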

## Options

### RecordAudioOptions

```typescript { .api }
interface RecordAudioOptions {
  /**
   * AbortSignal to stop recording before the timeout elapses
   * Call controller.abort() to stop recording early
   */
  signal?: AbortSignal;

  /**
   * Audio input device index
   * @default 0 (system default device)
   */
  device?: number;

  /**
   * Maximum recording duration in milliseconds
   * Recording stops automatically after this duration
   * If not specified, recording continues until manually aborted
   */
  timeout?: number;
}
```
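
If you want a single cancellation mechanism, a manual stop and a hard time cap can be expressed as one combined signal. This is a sketch assuming `recordAudio` honors whatever `AbortSignal` it is given; `AbortSignal.timeout` and `AbortSignal.any` require recent Node.js (17.3+ and 20+ respectively):

```typescript
// A controller the user can trigger (e.g. from a key press)...
const stopButton = new AbortController();

// ...combined with an absolute cap of 30 seconds.
const signal = AbortSignal.any([
  stopButton.signal,
  AbortSignal.timeout(30_000),
]);

// Pass `signal` to recordAudio({ signal }); aborting either source
// aborts the combined signal. For example, a manual stop:
stopButton.abort();
```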

## Error Handling

### Common Errors

**Missing ffmpeg/ffplay:**

```typescript
try {
  await playAudio(audioResponse);
} catch (error) {
  console.error('Error:', (error as Error).message);
  // "ffplay process exited with code 1"
  // Ensure ffmpeg is installed: brew install ffmpeg
}
```

**Browser Environment:**

```typescript
import { playAudio } from 'openai/helpers/audio';

try {
  await playAudio(audioResponse);
} catch (error) {
  console.error((error as Error).message);
  // "Play audio is not supported in the browser yet.
  // Check out https://npm.im/wavtools as an alternative."
}
```

**Recording Errors:**

```typescript
try {
  const audio = await recordAudio({ device: 99 });
} catch (error) {
  console.error('Recording error:', error);
  // May indicate an invalid device index or missing microphone permissions
}
```

## Best Practices

### Recording

1. **Always set a timeout or use an AbortSignal** to prevent unbounded recording
2. **Check microphone permissions** before recording
3. **Verify ffmpeg is installed** with `ffmpeg -version`
4. **Test the device index** - device 0 is usually the default microphone

### Playback

1. **Handle playback completion** with async/await or promise chaining
2. **Consider the audio format** - ffplay supports most formats but may struggle with uncommon codecs
3. **Volume control** - the helpers expose no volume API; playback uses the system volume, so warn users accordingly

### Platform Compatibility

1. **Node.js only** - these helpers throw errors in browser environments
2. **Server-side use** - useful for CLI tools, demos, and testing
3. **Browser alternative** - use [wavtools](https://npm.im/wavtools) for browser-based audio handling

## Complete Example: Voice Conversation

```typescript
import OpenAI from 'openai';
import { recordAudio, playAudio } from 'openai/helpers/audio';

const client = new OpenAI();

async function voiceConversation() {
  // 1. Record user input
  console.log('Listening... (5 seconds)');
  const userAudio = await recordAudio({ timeout: 5000 });

  // 2. Transcribe to text
  console.log('Transcribing...');
  const transcription = await client.audio.transcriptions.create({
    file: userAudio,
    model: 'whisper-1',
  });

  console.log('You said:', transcription.text);

  // 3. Generate a response with chat
  const completion = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      { role: 'user', content: transcription.text },
    ],
  });

  // content can be null, so fall back to an empty string
  const responseText = completion.choices[0].message.content ?? '';
  console.log('AI response:', responseText);

  // 4. Convert the response to speech
  console.log('Generating speech...');
  const speech = await client.audio.speech.create({
    model: 'tts-1',
    voice: 'alloy',
    input: responseText,
  });

  // 5. Play the response
  console.log('Playing response...');
  await playAudio(speech);

  console.log('Conversation complete!');
}

voiceConversation();
```

## See Also

- [Audio API](./audio.md) - Text-to-speech and speech-to-text APIs
- [Realtime API](./realtime.md) - WebSocket-based real-time voice conversations
- [wavtools](https://npm.im/wavtools) - Browser-compatible audio utilities