tessl/npm-openai

The official TypeScript library for the OpenAI API

docs/helpers-audio.md

Audio Helpers

The OpenAI SDK provides Node.js-specific helper functions for playing and recording audio. These utilities use ffmpeg and ffplay to handle audio streams, making it easy to work with audio from the OpenAI API.

Platform Support: Node.js only - these helpers are not available in browser environments.

Package Information

  • Package Name: openai
  • Version: 6.9.1
  • Language: TypeScript
  • Import Path: openai/helpers/audio
  • Platform: Node.js only (requires ffmpeg and ffplay to be installed)

Core Imports

import { playAudio, recordAudio } from 'openai/helpers/audio';

Prerequisites

To use these helpers, you need to have ffmpeg and ffplay installed on your system:

macOS (via Homebrew):

brew install ffmpeg

Ubuntu/Debian:

sudo apt-get install ffmpeg

Windows: Download from https://ffmpeg.org/download.html
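Before using the helpers, it can be worth verifying at startup that both binaries are actually on the PATH, so a missing install fails fast with a clear message. A minimal sketch using only Node's built-in child_process; `isToolAvailable` is a hypothetical helper, not part of the SDK:

```typescript
import { spawnSync } from 'node:child_process';

// Returns true if the given command is on PATH and exits cleanly.
function isToolAvailable(command: string): boolean {
  const result = spawnSync(command, ['-version'], { stdio: 'ignore' });
  // result.error is set (e.g. ENOENT) when the binary is missing entirely.
  return result.error === undefined && result.status === 0;
}

for (const tool of ['ffmpeg', 'ffplay']) {
  if (!isToolAvailable(tool)) {
    console.warn(`${tool} not found - install it before using openai/helpers/audio`);
  }
}
```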

Capabilities

playAudio

Plays audio from a stream, Response object, or File using ffplay. This is useful for immediately playing audio generated by the text-to-speech API.

/**
 * Plays audio from a stream, Response, or File using ffplay
 * @param input - Audio source (ReadableStream, fetch Response, or File)
 * @returns Promise that resolves when playback completes
 * @throws Error if not running in Node.js or if ffplay fails
 */
function playAudio(
  input: NodeJS.ReadableStream | Response | File
): Promise<void>;

Usage with Text-to-Speech:

import OpenAI from 'openai';
import { playAudio } from 'openai/helpers/audio';

const client = new OpenAI();

// Generate speech and play it immediately
const response = await client.audio.speech.create({
  model: 'tts-1',
  voice: 'alloy',
  input: 'Hello! This is a test of the text-to-speech API.',
});

// Play the audio
await playAudio(response);
console.log('Playback complete');

Usage with Streaming:

import { playAudio } from 'openai/helpers/audio';
import fs from 'fs';

// Play from a file stream
const audioStream = fs.createReadStream('./audio.mp3');
await playAudio(audioStream);

Usage with File Object:

import { playAudio } from 'openai/helpers/audio';

// Play from a File object
const audioFile = new File([audioBuffer], 'speech.mp3', { type: 'audio/mpeg' });
await playAudio(audioFile);

Error Handling:

try {
  await playAudio(audioResponse);
} catch (error) {
  console.error('Playback error:', error);
  // ffplay may not be installed or audio format is unsupported
}

recordAudio

Records audio from the system's default audio input device using ffmpeg. Returns a WAV file that can be used with the transcription or translation APIs.

/**
 * Records audio from the system's default input device
 * @param options - Recording options
 * @param options.signal - AbortSignal to cancel recording early
 * @param options.device - Device index (default: 0)
 * @param options.timeout - Maximum recording duration in milliseconds
 * @returns Promise resolving to a File with recorded audio
 * @throws Error if not running in Node.js or if ffmpeg fails
 */
function recordAudio(options?: {
  signal?: AbortSignal;
  device?: number;
  timeout?: number;
}): Promise<File>;

Basic Recording with Timeout:

import OpenAI from 'openai';
import { recordAudio } from 'openai/helpers/audio';

const client = new OpenAI();

// Record for 5 seconds
const audioFile = await recordAudio({ timeout: 5000 });

// Transcribe the recording
const transcription = await client.audio.transcriptions.create({
  file: audioFile,
  model: 'whisper-1',
});

console.log('Transcription:', transcription.text);

Recording with Manual Abort:

import { recordAudio } from 'openai/helpers/audio';

// Create an abort controller
const controller = new AbortController();

// Start recording
const recordingPromise = recordAudio({ signal: controller.signal });

// Stop recording after user input
setTimeout(() => {
  controller.abort();
  console.log('Recording stopped');
}, 10000);

const audioFile = await recordingPromise;

Recording from Specific Device:

// Record from device index 1 instead of default (0)
const audioFile = await recordAudio({
  device: 1,
  timeout: 5000,
});

Complete Example - Record and Transcribe:

import OpenAI from 'openai';
import { recordAudio } from 'openai/helpers/audio';

const client = new OpenAI();

async function recordAndTranscribe() {
  console.log('Recording... Speak now!');

  // Record for 10 seconds
  const audioFile = await recordAudio({ timeout: 10000 });

  console.log('Recording complete. Transcribing...');

  // Transcribe the audio
  const transcription = await client.audio.transcriptions.create({
    file: audioFile,
    model: 'whisper-1',
    language: 'en', // optional
  });

  console.log('You said:', transcription.text);
  return transcription.text;
}

recordAndTranscribe();

Recording Configuration

Audio Format

Recordings are captured in WAV format with the following specifications:

  • Format: WAV (PCM)
  • Sample Rate: 24,000 Hz
  • Channels: 1 (mono)
  • Bit Depth: 16-bit (default for WAV)

These settings are optimized for OpenAI's Whisper API.
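Given these parameters, you can estimate how large a recording will be: 24,000 samples/second x 1 channel x 2 bytes/sample = 48,000 bytes of PCM data per second, plus the WAV container header. A small sketch of that arithmetic (the constants mirror the format listed above; the header size is not included):

```typescript
// Recording format from the spec above: 24 kHz, mono, 16-bit PCM.
const SAMPLE_RATE = 24_000;  // Hz
const CHANNELS = 1;          // mono
const BYTES_PER_SAMPLE = 2;  // 16-bit

// Approximate size of the raw PCM payload for a given duration.
function estimatePcmBytes(durationMs: number): number {
  return Math.round((durationMs / 1000) * SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE);
}

console.log(estimatePcmBytes(5000)); // 240000 -> a 5 s recording is ~240 KB of PCM
```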

Platform-Specific Providers

The recordAudio function uses different audio providers depending on the operating system:

  • macOS: avfoundation
  • Windows: dshow (DirectShow)
  • Linux: alsa (Advanced Linux Sound Architecture)
  • Other Unix: alsa
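The mapping above can be sketched as a switch on `process.platform`; this is an illustration of the selection logic as documented, not the SDK's internal code:

```typescript
// Pick the ffmpeg input provider for the current OS, per the table above.
function audioProviderFor(platform: string): string {
  switch (platform) {
    case 'darwin':
      return 'avfoundation'; // macOS
    case 'win32':
      return 'dshow';        // Windows DirectShow
    default:
      return 'alsa';         // Linux and other Unix
  }
}

console.log(audioProviderFor(process.platform));
```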

Options

RecordAudioOptions

interface RecordAudioOptions {
  /**
   * AbortSignal to stop recording before timeout
   * Call controller.abort() to stop recording early
   */
  signal?: AbortSignal;

  /**
   * Audio input device index
   * @default 0 (system default device)
   */
  device?: number;

  /**
   * Maximum recording duration in milliseconds
   * Recording stops automatically after this duration
   * If not specified, recording continues until manually aborted
   */
  timeout?: number;
}

Error Handling

Common Errors

Missing ffmpeg/ffplay:

try {
  await playAudio(audioResponse);
} catch (error) {
  console.error('Error:', error.message);
  // "ffplay process exited with code 1"
  // Ensure ffmpeg is installed: brew install ffmpeg
}

Browser Environment:

import { playAudio } from 'openai/helpers/audio';

try {
  await playAudio(audioResponse);
} catch (error) {
  console.error(error.message);
  // "Play audio is not supported in the browser yet.
  //  Check out https://npm.im/wavtools as an alternative."
}

Recording Errors:

try {
  const audio = await recordAudio({ device: 99 });
} catch (error) {
  console.error('Recording error:', error);
  // May indicate invalid device index or permission issues
}

Best Practices

Recording

  1. Always set a timeout or use an AbortSignal to prevent infinite recording
  2. Check microphone permissions before recording
  3. Verify ffmpeg is installed with ffmpeg -version
  4. Test device index - device 0 is usually the default microphone
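Practices 1 and an interactive stop button are not mutually exclusive: a hard timeout and a user-driven abort can be merged into one signal. A sketch, assuming Node 20+ for `AbortSignal.any`:

```typescript
// Merge a hard 30 s cap with a user-triggered stop into a single signal.
// Whichever fires first would end the recording.
const userStop = new AbortController();
const signal = AbortSignal.any([
  userStop.signal,              // e.g. wired to a "stop" keypress
  AbortSignal.timeout(30_000),  // hard upper bound
]);

// const audio = await recordAudio({ signal });
userStop.abort();               // simulate the user stopping early
console.log(signal.aborted);    // true
```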

Playback

  1. Handle playback completion with async/await or promise chaining
  2. Consider audio format - ffplay supports most common formats but may fail on uncommon codecs
  3. Volume control - the helpers do not expose a volume setting; playback uses the current system volume

Platform Compatibility

  1. Node.js only - these helpers will throw errors in browser environments
  2. Server-side use - useful for CLI tools, demos, and testing
  3. Browser alternative - use wavtools for browser-based audio handling

Complete Example: Voice Conversation

import OpenAI from 'openai';
import { recordAudio, playAudio } from 'openai/helpers/audio';

const client = new OpenAI();

async function voiceConversation() {
  // 1. Record user input
  console.log('Listening... (5 seconds)');
  const userAudio = await recordAudio({ timeout: 5000 });

  // 2. Transcribe to text
  console.log('Transcribing...');
  const transcription = await client.audio.transcriptions.create({
    file: userAudio,
    model: 'whisper-1',
  });

  console.log('You said:', transcription.text);

  // 3. Generate response with chat
  const completion = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      { role: 'user', content: transcription.text },
    ],
  });

  const responseText = completion.choices[0].message.content ?? '';
  console.log('AI response:', responseText);

  // 4. Convert response to speech
  console.log('Generating speech...');
  const speech = await client.audio.speech.create({
    model: 'tts-1',
    voice: 'alloy',
    input: responseText,
  });

  // 5. Play the response
  console.log('Playing response...');
  await playAudio(speech);

  console.log('Conversation complete!');
}

voiceConversation();

See Also

  • Audio API - Text-to-speech and speech-to-text APIs
  • Realtime API - WebSocket-based real-time voice conversations
  • wavtools - Browser-compatible audio utilities
