tessl/npm-openai

The official TypeScript library for the OpenAI API

docs/helpers-audio.md

Audio Helpers

The OpenAI SDK provides Node.js-specific helper functions for playing and recording audio. These utilities use ffmpeg and ffplay to handle audio streams, making it easy to work with audio from the OpenAI API.

Platform Support: Node.js only - these helpers are not available in browser environments.

Package Information

  • Package Name: openai
  • Version: 6.9.1
  • Language: TypeScript
  • Import Path: openai/helpers/audio
  • Platform: Node.js only (requires ffmpeg and ffplay to be installed)

Core Imports

import { playAudio, recordAudio } from 'openai/helpers/audio';

Prerequisites

To use these helpers, you need to have ffmpeg and ffplay installed on your system:

macOS (via Homebrew):

brew install ffmpeg

Ubuntu/Debian:

sudo apt-get install ffmpeg

Windows: Download from https://ffmpeg.org/download.html
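Before using the helpers, it can be worth verifying at startup that both binaries are actually on the PATH, so a missing install fails fast with a clear message. A minimal sketch using only Node's built-in child_process; `isToolAvailable` is a hypothetical helper, not part of the SDK:

```typescript
import { spawnSync } from 'node:child_process';

// Returns true if the given command is on PATH and exits cleanly.
function isToolAvailable(command: string): boolean {
  const result = spawnSync(command, ['-version'], { stdio: 'ignore' });
  // result.error is set (e.g. ENOENT) when the binary is missing entirely.
  return result.error === undefined && result.status === 0;
}

for (const tool of ['ffmpeg', 'ffplay']) {
  if (!isToolAvailable(tool)) {
    console.warn(`${tool} not found - install it before using openai/helpers/audio`);
  }
}
```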

Capabilities

playAudio

Plays audio from a stream, Response object, or File using ffplay. This is useful for immediately playing audio generated by the text-to-speech API.

/**
 * Plays audio from a stream, Response, or File using ffplay
 * @param input - Audio source (ReadableStream, fetch Response, or File)
 * @returns Promise that resolves when playback completes
 * @throws Error if not running in Node.js or if ffplay fails
 */
function playAudio(
  input: NodeJS.ReadableStream | Response | File
): Promise<void>;

Usage with Text-to-Speech:

import OpenAI from 'openai';
import { playAudio } from 'openai/helpers/audio';

const client = new OpenAI();

// Generate speech and play it immediately
const response = await client.audio.speech.create({
  model: 'tts-1',
  voice: 'alloy',
  input: 'Hello! This is a test of the text-to-speech API.',
});

// Play the audio
await playAudio(response);
console.log('Playback complete');

Usage with Streaming:

import { playAudio } from 'openai/helpers/audio';
import fs from 'fs';

// Play from a file stream
const audioStream = fs.createReadStream('./audio.mp3');
await playAudio(audioStream);

Usage with File Object:

import { playAudio } from 'openai/helpers/audio';

// Play from a File object
const audioFile = new File([audioBuffer], 'speech.mp3', { type: 'audio/mpeg' });
await playAudio(audioFile);

Error Handling:

try {
  await playAudio(audioResponse);
} catch (error) {
  console.error('Playback error:', error);
  // ffplay may not be installed or audio format is unsupported
}

recordAudio

Records audio from the system's default audio input device using ffmpeg. Returns a WAV file that can be used with the transcription or translation APIs.

/**
 * Records audio from the system's default input device
 * @param options - Recording options
 * @param options.signal - AbortSignal to cancel recording early
 * @param options.device - Device index (default: 0)
 * @param options.timeout - Maximum recording duration in milliseconds
 * @returns Promise resolving to a File with recorded audio
 * @throws Error if not running in Node.js or if ffmpeg fails
 */
function recordAudio(options?: {
  signal?: AbortSignal;
  device?: number;
  timeout?: number;
}): Promise<File>;

Basic Recording with Timeout:

import OpenAI from 'openai';
import { recordAudio } from 'openai/helpers/audio';

const client = new OpenAI();

// Record for 5 seconds
const audioFile = await recordAudio({ timeout: 5000 });

// Transcribe the recording
const transcription = await client.audio.transcriptions.create({
  file: audioFile,
  model: 'whisper-1',
});

console.log('Transcription:', transcription.text);

Recording with Manual Abort:

import { recordAudio } from 'openai/helpers/audio';

// Create an abort controller
const controller = new AbortController();

// Start recording
const recordingPromise = recordAudio({ signal: controller.signal });

// Stop recording after user input
setTimeout(() => {
  controller.abort();
  console.log('Recording stopped');
}, 10000);

const audioFile = await recordingPromise;

Recording from Specific Device:

// Record from device index 1 instead of default (0)
const audioFile = await recordAudio({
  device: 1,
  timeout: 5000,
});

Complete Example - Record and Transcribe:

import OpenAI from 'openai';
import { recordAudio } from 'openai/helpers/audio';

const client = new OpenAI();

async function recordAndTranscribe() {
  console.log('Recording... Speak now!');

  // Record for 10 seconds
  const audioFile = await recordAudio({ timeout: 10000 });

  console.log('Recording complete. Transcribing...');

  // Transcribe the audio
  const transcription = await client.audio.transcriptions.create({
    file: audioFile,
    model: 'whisper-1',
    language: 'en', // optional
  });

  console.log('You said:', transcription.text);
  return transcription.text;
}

recordAndTranscribe();

Recording Configuration

Audio Format

Recordings are captured in WAV format with the following specifications:

  • Format: WAV (PCM)
  • Sample Rate: 24,000 Hz
  • Channels: 1 (mono)
  • Bit Depth: 16-bit (default for WAV)

These settings are optimized for OpenAI's Whisper API.
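Given these parameters, you can estimate how large a recording will be: 24,000 samples/second x 1 channel x 2 bytes/sample = 48,000 bytes of PCM data per second, plus the WAV container header. A small sketch of that arithmetic (the constants mirror the format listed above; the header size is not included):

```typescript
// Recording format from the spec above: 24 kHz, mono, 16-bit PCM.
const SAMPLE_RATE = 24_000;  // Hz
const CHANNELS = 1;          // mono
const BYTES_PER_SAMPLE = 2;  // 16-bit

// Approximate size of the raw PCM payload for a given duration.
function estimatePcmBytes(durationMs: number): number {
  return Math.round((durationMs / 1000) * SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE);
}

console.log(estimatePcmBytes(5000)); // 240000 -> a 5 s recording is ~240 KB of PCM
```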

Platform-Specific Providers

The recordAudio function uses different audio providers depending on the operating system:

  • macOS: avfoundation
  • Windows: dshow (DirectShow)
  • Linux: alsa (Advanced Linux Sound Architecture)
  • Other Unix: alsa
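The mapping above can be sketched as a switch on `process.platform`; this is an illustration of the selection logic as documented, not the SDK's internal code:

```typescript
// Pick the ffmpeg input provider for the current OS, per the table above.
function audioProviderFor(platform: string): string {
  switch (platform) {
    case 'darwin':
      return 'avfoundation'; // macOS
    case 'win32':
      return 'dshow';        // Windows DirectShow
    default:
      return 'alsa';         // Linux and other Unix
  }
}

console.log(audioProviderFor(process.platform));
```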

Options

RecordAudioOptions

interface RecordAudioOptions {
  /**
   * AbortSignal to stop recording before timeout
   * Call controller.abort() to stop recording early
   */
  signal?: AbortSignal;

  /**
   * Audio input device index
   * @default 0 (system default device)
   */
  device?: number;

  /**
   * Maximum recording duration in milliseconds
   * Recording stops automatically after this duration
   * If not specified, recording continues until manually aborted
   */
  timeout?: number;
}

Error Handling

Common Errors

Missing ffmpeg/ffplay:

try {
  await playAudio(audioResponse);
} catch (error) {
  console.error('Error:', error.message);
  // "ffplay process exited with code 1"
  // Ensure ffmpeg is installed: brew install ffmpeg
}

Browser Environment:

import { playAudio } from 'openai/helpers/audio';

try {
  await playAudio(audioResponse);
} catch (error) {
  console.error(error.message);
  // "Play audio is not supported in the browser yet.
  //  Check out https://npm.im/wavtools as an alternative."
}

Recording Errors:

try {
  const audio = await recordAudio({ device: 99 });
} catch (error) {
  console.error('Recording error:', error);
  // May indicate invalid device index or permission issues
}

Best Practices

Recording

  1. Always set a timeout or use an AbortSignal to prevent infinite recording
  2. Check microphone permissions before recording
  3. Verify ffmpeg is installed with ffmpeg -version
  4. Test device index - device 0 is usually the default microphone
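Practices 1 and an interactive stop button are not mutually exclusive: a hard timeout and a user-driven abort can be merged into one signal. A sketch, assuming Node 20+ for `AbortSignal.any`:

```typescript
// Merge a hard 30 s cap with a user-triggered stop into a single signal.
// Whichever fires first would end the recording.
const userStop = new AbortController();
const signal = AbortSignal.any([
  userStop.signal,              // e.g. wired to a "stop" keypress
  AbortSignal.timeout(30_000),  // hard upper bound
]);

// const audio = await recordAudio({ signal });
userStop.abort();               // simulate the user stopping early
console.log(signal.aborted);    // true
```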

Playback

  1. Handle playback completion with async/await or promise chaining
  2. Consider audio format - ffplay supports most common formats but may fail on uncommon codecs
  3. Volume control - the helpers do not expose a volume setting; playback uses the current system volume

Platform Compatibility

  1. Node.js only - these helpers will throw errors in browser environments
  2. Server-side use - useful for CLI tools, demos, and testing
  3. Browser alternative - use wavtools for browser-based audio handling

Complete Example: Voice Conversation

import OpenAI from 'openai';
import { recordAudio, playAudio } from 'openai/helpers/audio';

const client = new OpenAI();

async function voiceConversation() {
  // 1. Record user input
  console.log('Listening... (5 seconds)');
  const userAudio = await recordAudio({ timeout: 5000 });

  // 2. Transcribe to text
  console.log('Transcribing...');
  const transcription = await client.audio.transcriptions.create({
    file: userAudio,
    model: 'whisper-1',
  });

  console.log('You said:', transcription.text);

  // 3. Generate response with chat
  const completion = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      { role: 'user', content: transcription.text },
    ],
  });

  const responseText = completion.choices[0].message.content ?? '';
  console.log('AI response:', responseText);

  // 4. Convert response to speech
  console.log('Generating speech...');
  const speech = await client.audio.speech.create({
    model: 'tts-1',
    voice: 'alloy',
    input: responseText,
  });

  // 5. Play the response
  console.log('Playing response...');
  await playAudio(speech);

  console.log('Conversation complete!');
}

voiceConversation();

See Also

  • Audio API - Text-to-speech and speech-to-text APIs
  • Realtime API - WebSocket-based real-time voice conversations
  • wavtools - Browser-compatible audio utilities
