tessl/npm-openai

The official TypeScript library for the OpenAI API

docs/realtime.md

Realtime API

The Realtime API provides WebSocket-based real-time voice conversations with OpenAI models. It supports bidirectional audio streaming, server-side voice activity detection (VAD), function calling, and full conversation management. The API is designed for live voice applications including phone calls, voice assistants, and interactive conversational experiences.

Package Information

  • Package Name: openai
  • Package Type: npm
  • Language: TypeScript
  • Installation: npm install openai

API Status

The Realtime API is now generally available (GA) at client.realtime.*.

Deprecation Notice: The legacy beta Realtime API at client.beta.realtime.* is deprecated. If you are still using it, migrate to the GA API documented here. The deprecated beta surface includes:

  • client.beta.realtime.sessions.create() (deprecated - use client.realtime.clientSecrets.create() instead)
  • client.beta.realtime.transcriptionSessions.create() (deprecated)

All new projects should use the GA Realtime API (client.realtime.*) documented on this page.

Core Imports

import OpenAI from "openai";
import { OpenAIRealtimeWebSocket } from "openai/realtime/websocket"; // Browser
import { OpenAIRealtimeWS } from "openai/realtime/ws"; // Node.js (requires 'ws' package)

WebSocket Clients

The Realtime API provides two WebSocket client implementations for different runtime environments:

OpenAIRealtimeWebSocket (Browser)

For browser environments, use OpenAIRealtimeWebSocket which uses the native browser WebSocket API.

import { OpenAIRealtimeWebSocket } from "openai/realtime/websocket";
import OpenAI from "openai";

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const ws = new OpenAIRealtimeWebSocket(
  {
    model: "gpt-realtime",
    dangerouslyAllowBrowser: true, // Required for browser use
  },
  client
);

// Event handling
ws.on("session.created", (event) => {
  console.log("Session started:", event.session.id);
});

ws.on("response.audio.delta", (event) => {
  // event.delta is base64-encoded audio; decode it before playback.
  // playAudio is a placeholder for your audio-output pipeline.
  const audioData = atob(event.delta);
  playAudio(audioData);
});

ws.on("error", (error) => {
  console.error("WebSocket error:", error);
});

// Send audio to the server (for very large buffers, encode in chunks
// to avoid exceeding the engine's argument limit)
function sendAudio(audioData: ArrayBuffer) {
  const base64Audio = btoa(String.fromCharCode(...new Uint8Array(audioData)));
  ws.send({
    type: "input_audio_buffer.append",
    audio: base64Audio,
  });
}

// Commit audio buffer to trigger processing
ws.send({
  type: "input_audio_buffer.commit",
});

// Close connection
ws.close();

Key features:

  • Uses native browser WebSocket API
  • Requires dangerouslyAllowBrowser: true in configuration
  • Audio must be base64 encoded
  • Automatic reconnection handling
  • Built-in event emitter for all realtime events
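The sendAudio helper above spreads the whole buffer into String.fromCharCode, which can exceed the engine's argument limit on long recordings. A chunk-safe variant (a sketch; base64FromArrayBuffer is not part of the SDK):

```typescript
// Chunk-safe base64 encoding for arbitrarily large audio buffers.
// Encoding in fixed-size slices avoids spreading one huge array
// into a single String.fromCharCode call.
function base64FromArrayBuffer(buffer: ArrayBuffer, chunkSize = 0x8000): string {
  const bytes = new Uint8Array(buffer);
  let binary = "";
  for (let i = 0; i < bytes.length; i += chunkSize) {
    binary += String.fromCharCode(...bytes.subarray(i, i + chunkSize));
  }
  return btoa(binary);
}
```

The result can be sent directly as the audio field of an input_audio_buffer.append event.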

OpenAIRealtimeWS (Node.js)

For Node.js environments, use OpenAIRealtimeWS which uses the ws package for WebSocket support.

import { OpenAIRealtimeWS } from "openai/realtime/ws";
import OpenAI from "openai";
import fs from "fs";

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const ws = new OpenAIRealtimeWS(
  {
    model: "gpt-realtime",
  },
  client
);

// Event handling (same interface as browser version)
ws.on("session.created", (event) => {
  console.log("Session started:", event.session.id);
});

ws.on("response.audio.delta", (event) => {
  // Handle audio deltas
  const audioBuffer = Buffer.from(event.delta, "base64");
  // Write to file or stream to audio output
  fs.appendFileSync("output.pcm", audioBuffer);
});

ws.on("response.done", (event) => {
  console.log("Response complete:", event.response.id);
});

// Send audio from file or buffer
function sendAudioFromFile(filePath: string) {
  const audioBuffer = fs.readFileSync(filePath);
  const base64Audio = audioBuffer.toString("base64");

  ws.send({
    type: "input_audio_buffer.append",
    audio: base64Audio,
  });
}

// Trigger response generation
ws.send({
  type: "input_audio_buffer.commit",
});

// Close connection
ws.close();

Key features:

  • Uses ws package for WebSocket support (add to dependencies: npm install ws @types/ws)
  • Same event interface as browser version for consistency
  • Better Node.js stream integration
  • Automatic reconnection handling
  • Suitable for server-side applications
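When streaming a file it is common to send audio in fixed-duration frames rather than one large append. A sketch of the frame math for 24 kHz PCM16 mono (chunkPcm16 is an illustrative helper):

```typescript
// Split a PCM16 mono buffer into fixed-duration frames for successive
// input_audio_buffer.append events. At 24 kHz, 16-bit mono audio is
// 24000 samples/s * 2 bytes/sample = 48 bytes per millisecond.
function chunkPcm16(buffer: Buffer, frameMs = 100, sampleRate = 24000): Buffer[] {
  const bytesPerFrame = (sampleRate / 1000) * 2 * frameMs;
  const frames: Buffer[] = [];
  for (let i = 0; i < buffer.length; i += bytesPerFrame) {
    frames.push(buffer.subarray(i, i + bytesPerFrame));
  }
  return frames;
}
```

Each frame would then be base64-encoded and sent with an input_audio_buffer.append event.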

Common Event Patterns

Both WebSocket clients support the same event handling interface:

// Connection events
ws.on("session.created", (event) => { /* Session initialization */ });
ws.on("session.updated", (event) => { /* Session configuration changed */ });

// Conversation events
ws.on("conversation.created", (event) => { /* New conversation */ });
ws.on("conversation.item.created", (event) => { /* New item added */ });
ws.on("conversation.item.deleted", (event) => { /* Item removed */ });

// Audio events (streaming)
ws.on("response.audio.delta", (event) => { /* Audio chunk received */ });
ws.on("response.audio.done", (event) => { /* Audio complete */ });
ws.on("response.audio_transcript.delta", (event) => { /* Transcript chunk */ });
ws.on("response.audio_transcript.done", (event) => { /* Transcript complete */ });

// Response events
ws.on("response.created", (event) => { /* Response started */ });
ws.on("response.done", (event) => { /* Response complete */ });
ws.on("response.cancelled", (event) => { /* Response cancelled */ });
ws.on("response.failed", (event) => { /* Response failed */ });

// Function calling events
ws.on("response.function_call_arguments.delta", (event) => { /* Function args streaming */ });
ws.on("response.function_call_arguments.done", (event) => { /* Function args complete */ });

// Error events
ws.on("error", (error) => { /* WebSocket or API error */ });
ws.on("close", (event) => { /* Connection closed */ });
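Delta events arrive as fragments, so a common pattern is to accumulate them per item until the matching *.done event fires. A minimal accumulator (the event shape shown is a simplified subset of the interfaces later on this page):

```typescript
// Accumulate streaming transcript (or text) deltas per conversation item
// until the corresponding *.done event arrives.
class TranscriptAccumulator {
  private parts = new Map<string, string>();

  // Call on response.audio_transcript.delta / response.text.delta
  append(event: { item_id: string; delta: string }): void {
    this.parts.set(event.item_id, (this.parts.get(event.item_id) ?? "") + event.delta);
  }

  // Call on the matching *.done event; returns the full text and frees state
  finish(itemId: string): string {
    const text = this.parts.get(itemId) ?? "";
    this.parts.delete(itemId);
    return text;
  }
}
```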

Sending Commands

Both clients use the same .send() method for sending commands:

// Append audio to input buffer
ws.send({
  type: "input_audio_buffer.append",
  audio: base64AudioString,
});

// Commit audio buffer (triggers VAD or manual processing)
ws.send({
  type: "input_audio_buffer.commit",
});

// Clear audio buffer
ws.send({
  type: "input_audio_buffer.clear",
});

// Update session configuration
ws.send({
  type: "session.update",
  session: {
    instructions: "You are a helpful assistant.",
    turn_detection: { type: "server_vad" },
  },
});

// Create conversation item (text message)
ws.send({
  type: "conversation.item.create",
  item: {
    type: "message",
    role: "user",
    content: [{ type: "input_text", text: "Hello!" }],
  },
});

// Trigger response generation
ws.send({
  type: "response.create",
  response: {
    modalities: ["text", "audio"],
    instructions: "Respond briefly.",
  },
});

// Cancel in-progress response
ws.send({
  type: "response.cancel",
});
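After response.function_call_arguments.done, the usual flow is: run the tool, return its result as a function_call_output item, then request a new response. A sketch of the two events involved (buildToolResultEvents is illustrative; the event shapes follow the commands above):

```typescript
// Build the two events needed to hand a tool result back to the model:
// a function_call_output conversation item, followed by response.create
// so the model can continue speaking with the result in context.
function buildToolResultEvents(callId: string, result: unknown) {
  return [
    {
      type: "conversation.item.create",
      item: {
        type: "function_call_output",
        call_id: callId,
        output: JSON.stringify(result),
      },
    },
    { type: "response.create" },
  ] as const;
}
```

Each returned event would be passed to ws.send() in order.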

Connection Lifecycle

Both clients handle connection lifecycle automatically:

const ws = new OpenAIRealtimeWS({ model: "gpt-realtime" }, client);

// Connection opens automatically
ws.on("session.created", (event) => {
  console.log("Connected and ready");
});

// Handle disconnections
ws.on("close", (event) => {
  console.log("Connection closed:", event.code, event.reason);
});

// Handle errors
ws.on("error", (error) => {
  console.error("Connection error:", error);
});

// Manually close connection
ws.close();

Basic Usage

Creating a Session Token

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

// Create an ephemeral session token for client-side use
const response = await client.realtime.clientSecrets.create({
  session: {
    type: "realtime",
    model: "gpt-realtime",
    audio: {
      input: {
        format: { type: "audio/pcm", rate: 24000 },
        turn_detection: {
          type: "server_vad",
          threshold: 0.5,
          silence_duration_ms: 500,
        },
      },
      output: {
        format: { type: "audio/pcm", rate: 24000 },
        voice: "marin",
      },
    },
  },
});

const sessionToken = response.value;
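expires_at on the response is in epoch seconds, and these tokens are short-lived, so a server typically checks remaining lifetime before handing one to a client (secondsUntilExpiry is an illustrative helper):

```typescript
// Remaining lifetime of an ephemeral client secret, in whole seconds.
// expires_at is epoch seconds; Date.now() returns epoch milliseconds.
function secondsUntilExpiry(expiresAt: number, nowMs = Date.now()): number {
  return Math.max(0, Math.floor(expiresAt - nowMs / 1000));
}
```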

Connecting via WebSocket

import { OpenAIRealtimeWebSocket } from "openai/realtime/websocket";

const ws = new OpenAIRealtimeWebSocket(
  {
    model: "gpt-realtime",
    dangerouslyAllowBrowser: false, // set to true only when running in a browser
  },
  client
);

// Listen for events
ws.on("session.created", (event) => {
  console.log("Session created:", event);
});

ws.on("conversation.item.created", (event) => {
  console.log("Item created:", event.item);
});

ws.on("response.audio.delta", (event) => {
  // Handle audio delta
  const audioData = Buffer.from(event.delta, "base64");
  playAudio(audioData);
});

// Send audio
ws.send({
  type: "input_audio_buffer.append",
  audio: audioBase64String,
});

// Commit audio buffer
ws.send({
  type: "input_audio_buffer.commit",
});

Architecture

The Realtime API operates through a WebSocket connection with an event-driven architecture:

  • Session Management: Create ephemeral tokens server-side, connect from client
  • Audio Streaming: Bidirectional PCM16 at 24 kHz, or G.711 μ-law/A-law at 8 kHz
  • Event System: 50+ client-to-server and server-to-client events
  • VAD Integration: Server-side voice activity detection with configurable parameters
  • Conversation Context: Automatic conversation history management
  • Function Calling: Real-time tool execution during conversations
  • Phone Integration: SIP/WebRTC support for phone calls

Capabilities

Session Token Creation

Generate ephemeral session tokens for secure client-side WebSocket connections.

/**
 * Create a Realtime client secret with an associated session configuration.
 * Returns an ephemeral token with a 10-minute default TTL (configurable
 * from 10 seconds up to 2 hours via expires_after).
 */
function create(
  params: ClientSecretCreateParams
): Promise<ClientSecretCreateResponse>;

interface ClientSecretCreateParams {
  /** Configuration for the client secret expiration */
  expires_after?: {
    /** Anchor point for expiration (only 'created_at' is supported) */
    anchor?: "created_at";
    /** Seconds from anchor to expiration (10-7200, defaults to 600) */
    seconds?: number;
  };
  /** Session configuration (realtime or transcription session) */
  session?:
    | RealtimeSessionCreateRequest
    | RealtimeTranscriptionSessionCreateRequest;
}

interface ClientSecretCreateResponse {
  /** Expiration timestamp in seconds since epoch */
  expires_at: number;
  /** The session configuration */
  session: RealtimeSessionCreateResponse | RealtimeTranscriptionSessionCreateResponse;
  /** The generated client secret value */
  value: string;
}

interface RealtimeSessionCreateResponse {
  /** Ephemeral key for client environments */
  client_secret: {
    expires_at: number;
    value: string;
  };
  /** Session type: always 'realtime' */
  type: "realtime";
  /** Audio configuration */
  audio?: {
    input?: {
      format?: RealtimeAudioFormats;
      noise_reduction?: { type?: NoiseReductionType };
      transcription?: AudioTranscription;
      turn_detection?: ServerVad | SemanticVad | null;
    };
    output?: {
      format?: RealtimeAudioFormats;
      speed?: number;
      voice?: string;
    };
  };
  /** Fields to include in server outputs */
  include?: Array<"item.input_audio_transcription.logprobs">;
  /** System instructions for the model */
  instructions?: string;
  /** Max output tokens (1-4096 or 'inf') */
  max_output_tokens?: number | "inf";
  /** Realtime model to use */
  model?: string;
  /** Output modalities ('text' | 'audio') */
  output_modalities?: Array<"text" | "audio">;
  /** Prompt template reference */
  prompt?: ResponsePrompt | null;
  /** Tool choice configuration */
  tool_choice?: ToolChoiceOptions | ToolChoiceFunction | ToolChoiceMcp;
  /** Available tools */
  tools?: Array<RealtimeFunctionTool | McpTool>;
  /** Tracing configuration */
  tracing?: "auto" | TracingConfiguration | null;
  /** Truncation behavior */
  truncation?: RealtimeTruncation;
}

SIP Call Management

Manage incoming and outgoing SIP/WebRTC calls with the Realtime API.

/**
 * Accept an incoming SIP call and configure the realtime session that will handle it
 */
function accept(
  callID: string,
  params: CallAcceptParams,
  options?: RequestOptions
): Promise<void>;

/**
 * End an active Realtime API call, whether it was initiated over SIP or WebRTC
 */
function hangup(
  callID: string,
  options?: RequestOptions
): Promise<void>;

/**
 * Transfer an active SIP call to a new destination using the SIP REFER verb
 */
function refer(
  callID: string,
  params: CallReferParams,
  options?: RequestOptions
): Promise<void>;

/**
 * Decline an incoming SIP call by returning a SIP status code to the caller
 */
function reject(
  callID: string,
  params?: CallRejectParams,
  options?: RequestOptions
): Promise<void>;

interface CallAcceptParams {
  /** The type of session to create. Always 'realtime' for the Realtime API */
  type: "realtime";
  /** Configuration for input and output audio */
  audio?: RealtimeAudioConfig;
  /** Additional fields to include in server outputs */
  include?: Array<"item.input_audio_transcription.logprobs">;
  /** The default system instructions prepended to model calls */
  instructions?: string;
  /** Maximum number of output tokens for a single assistant response (1-4096 or 'inf') */
  max_output_tokens?: number | "inf";
  /** The Realtime model used for this session */
  model?: string;
  /** The set of modalities the model can respond with */
  output_modalities?: Array<"text" | "audio">;
  /** Reference to a prompt template and its variables */
  prompt?: ResponsePrompt | null;
  /** How the model chooses tools */
  tool_choice?: RealtimeToolChoiceConfig;
  /** Tools available to the model */
  tools?: RealtimeToolsConfig;
  /** Tracing configuration for the session */
  tracing?: RealtimeTracingConfig | null;
  /** Truncation behavior when conversation exceeds token limits */
  truncation?: RealtimeTruncation;
}

interface CallReferParams {
  /** URI that should appear in the SIP Refer-To header (e.g., 'tel:+14155550123' or 'sip:agent@example.com') */
  target_uri: string;
}

interface CallRejectParams {
  /** SIP response code to send back to the caller. Defaults to 603 (Decline) when omitted */
  status_code?: number;
}

Available at: client.realtime.calls

Usage Example:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

// Accept incoming call
await client.realtime.calls.accept("call-123", {
  type: "realtime",
  model: "gpt-realtime",
  audio: {
    input: { format: { type: "audio/pcm", rate: 24000 } },
    output: { format: { type: "audio/pcm", rate: 24000 }, voice: "marin" },
  },
  instructions: "You are a helpful phone assistant.",
});

// Hang up call
await client.realtime.calls.hangup("call-123");

// Reject incoming call
await client.realtime.calls.reject("call-123", {
  status_code: 603, // Decline
});

// Transfer call
await client.realtime.calls.refer("call-123", {
  target_uri: "tel:+14155550123",
});

WebSocket Connection

Connect to the Realtime API using WebSocket with the OpenAIRealtimeWebSocket class.

/**
 * WebSocket client for the Realtime API. Handles connection lifecycle,
 * event streaming, and message sending.
 */
class OpenAIRealtimeWebSocket extends OpenAIRealtimeEmitter {
  url: URL;
  socket: WebSocket;

  constructor(
    props: {
      model: string;
      dangerouslyAllowBrowser?: boolean;
      onURL?: (url: URL) => void;
      __resolvedApiKey?: boolean;
    },
    client?: Pick<OpenAI, "apiKey" | "baseURL">
  );

  /**
   * Factory method that resolves API key before connecting
   */
  static create(
    client: Pick<OpenAI, "apiKey" | "baseURL" | "_callApiKey">,
    props: { model: string; dangerouslyAllowBrowser?: boolean }
  ): Promise<OpenAIRealtimeWebSocket>;

  /**
   * Factory method for Azure OpenAI connections
   */
  static azure(
    client: Pick<
      AzureOpenAI,
      "_callApiKey" | "apiVersion" | "apiKey" | "baseURL" | "deploymentName"
    >,
    options?: {
      deploymentName?: string;
      dangerouslyAllowBrowser?: boolean;
    }
  ): Promise<OpenAIRealtimeWebSocket>;

  /**
   * Send a client event to the server
   */
  send(event: RealtimeClientEvent): void;

  /**
   * Close the WebSocket connection
   */
  close(props?: { code: number; reason: string }): void;

  /**
   * Register event listener
   */
  on(event: string, listener: (event: any) => void): void;
}

Usage:

// Standard connection
const ws = await OpenAIRealtimeWebSocket.create(client, {
  model: "gpt-realtime",
});

// Azure connection
const wsAzure = await OpenAIRealtimeWebSocket.azure(azureClient, {
  deploymentName: "my-realtime-deployment",
});

Phone Call Methods

Accept, reject, transfer, and hang up phone calls via SIP integration.

/**
 * Accept an incoming SIP call and configure the realtime session
 */
function accept(callID: string, params: CallAcceptParams): Promise<void>;

/**
 * End an active Realtime API call (SIP or WebRTC)
 */
function hangup(callID: string): Promise<void>;

/**
 * Transfer an active SIP call to a new destination using SIP REFER
 */
function refer(callID: string, params: CallReferParams): Promise<void>;

/**
 * Decline an incoming SIP call with a SIP status code
 */
function reject(
  callID: string,
  params?: CallRejectParams
): Promise<void>;

interface CallAcceptParams {
  type: "realtime";
  audio?: RealtimeAudioConfig;
  include?: Array<"item.input_audio_transcription.logprobs">;
  instructions?: string;
  max_output_tokens?: number | "inf";
  model?: string;
  output_modalities?: Array<"text" | "audio">;
  prompt?: ResponsePrompt | null;
  tool_choice?: RealtimeToolChoiceConfig;
  tools?: RealtimeToolsConfig;
  tracing?: RealtimeTracingConfig | null;
  truncation?: RealtimeTruncation;
}

interface CallReferParams {
  /** URI in SIP Refer-To header (e.g., 'tel:+14155550123') */
  target_uri: string;
}

interface CallRejectParams {
  /** SIP response code (defaults to 603 Decline) */
  status_code?: number;
}

Usage:

// Accept incoming call
await client.realtime.calls.accept("call_abc123", {
  type: "realtime",
  model: "gpt-realtime",
  instructions: "You are a helpful assistant on a phone call.",
  audio: {
    output: { voice: "marin" },
  },
});

// Transfer call
await client.realtime.calls.refer("call_abc123", {
  target_uri: "tel:+14155550199",
});

// Reject call
await client.realtime.calls.reject("call_abc123", {
  status_code: 486, // Busy Here
});

// Hang up
await client.realtime.calls.hangup("call_abc123");

Session Configuration

Configure session parameters including audio formats, VAD, and model settings.

interface RealtimeSession {
  id?: string;
  expires_at?: number;
  /** Fields to include in server outputs */
  include?: Array<"item.input_audio_transcription.logprobs"> | null;
  /** Input audio format: 'pcm16', 'g711_ulaw', or 'g711_alaw' */
  input_audio_format?: "pcm16" | "g711_ulaw" | "g711_alaw";
  /** Noise reduction configuration */
  input_audio_noise_reduction?: {
    type?: NoiseReductionType;
  };
  /** Transcription configuration */
  input_audio_transcription?: AudioTranscription | null;
  /** System instructions */
  instructions?: string;
  /** Max output tokens per response */
  max_response_output_tokens?: number | "inf";
  /** Response modalities */
  modalities?: Array<"text" | "audio">;
  /** Model identifier */
  model?: string;
  object?: "realtime.session";
  /** Output audio format */
  output_audio_format?: "pcm16" | "g711_ulaw" | "g711_alaw";
  /** Prompt template reference */
  prompt?: ResponsePrompt | null;
  /** Audio playback speed (0.25-1.5) */
  speed?: number;
  /** Sampling temperature (0.6-1.2) */
  temperature?: number;
  /** Tool choice mode */
  tool_choice?: string;
  /** Available tools */
  tools?: Array<RealtimeFunctionTool>;
  /** Tracing configuration */
  tracing?: "auto" | TracingConfiguration | null;
  /** Turn detection configuration */
  turn_detection?: RealtimeAudioInputTurnDetection | null;
  /** Truncation behavior */
  truncation?: RealtimeTruncation;
  /** Output voice */
  voice?: string;
}

interface AudioTranscription {
  /** Language code (ISO-639-1, e.g., 'en') */
  language?: string;
  /** Transcription model */
  model?:
    | "whisper-1"
    | "gpt-4o-mini-transcribe"
    | "gpt-4o-transcribe"
    | "gpt-4o-transcribe-diarize";
  /** Transcription guidance prompt */
  prompt?: string;
}

type NoiseReductionType = "near_field" | "far_field";

type RealtimeTruncation =
  | "auto"
  | "disabled"
  | {
      type: "retention_ratio";
      /** Fraction of max context to retain (0.0-1.0) */
      retention_ratio: number;
    };

Turn Detection (VAD)

Configure voice activity detection for automatic turn taking.

/**
 * Server VAD: Simple volume-based voice activity detection
 */
interface ServerVad {
  type: "server_vad";
  /** Auto-generate response on VAD stop */
  create_response?: boolean;
  /** Timeout for prompting user to continue (ms) */
  idle_timeout_ms?: number | null;
  /** Auto-interrupt on VAD start */
  interrupt_response?: boolean;
  /** Audio prefix padding (ms, default: 300) */
  prefix_padding_ms?: number;
  /** Silence duration to detect stop (ms, default: 500) */
  silence_duration_ms?: number;
  /** VAD activation threshold (0.0-1.0, default: 0.5) */
  threshold?: number;
}

/**
 * Semantic VAD: Model-based turn detection with dynamic timeouts
 */
interface SemanticVad {
  type: "semantic_vad";
  /** Auto-generate response on VAD stop */
  create_response?: boolean;
  /** Eagerness: 'low' (8s), 'medium' (4s), 'high' (2s), 'auto' */
  eagerness?: "low" | "medium" | "high" | "auto";
  /** Auto-interrupt on VAD start */
  interrupt_response?: boolean;
}

type RealtimeAudioInputTurnDetection = ServerVad | SemanticVad;

Usage:

// Server VAD with custom settings
{
  type: "server_vad",
  threshold: 0.6,
  silence_duration_ms: 700,
  prefix_padding_ms: 300,
  interrupt_response: true,
  create_response: true,
  idle_timeout_ms: 30000
}

// Semantic VAD for natural conversations
{
  type: "semantic_vad",
  eagerness: "medium",
  interrupt_response: true,
  create_response: true
}

// Manual turn detection (no VAD)
{
  turn_detection: null
}
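The three configurations above can be wrapped behind a coarse preference switch; a sketch (turnDetectionFor and the slimmed-down types are illustrative, not SDK API):

```typescript
// Pick a turn_detection value from a coarse latency/naturalness preference.
type TurnDetection =
  | { type: "server_vad"; threshold: number; silence_duration_ms: number }
  | { type: "semantic_vad"; eagerness: "low" | "medium" | "high" | "auto" }
  | null;

function turnDetectionFor(mode: "manual" | "fast" | "natural"): TurnDetection {
  switch (mode) {
    case "manual":
      return null; // caller commits the input buffer explicitly
    case "fast":
      return { type: "server_vad", threshold: 0.5, silence_duration_ms: 300 };
    case "natural":
      return { type: "semantic_vad", eagerness: "auto" };
  }
}
```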

Audio Formats

Configure input and output audio formats for the session.

/**
 * PCM 16-bit audio at 24kHz sample rate
 */
interface AudioPCM {
  type?: "audio/pcm";
  rate?: 24000;
}

/**
 * G.711 μ-law format (commonly used in telephony)
 */
interface AudioPCMU {
  type?: "audio/pcmu";
}

/**
 * G.711 A-law format (commonly used in telephony)
 */
interface AudioPCMA {
  type?: "audio/pcma";
}

type RealtimeAudioFormats = AudioPCM | AudioPCMU | AudioPCMA;

interface RealtimeAudioConfig {
  input?: {
    format?: RealtimeAudioFormats;
    noise_reduction?: { type?: NoiseReductionType };
    transcription?: AudioTranscription;
    turn_detection?: RealtimeAudioInputTurnDetection | null;
  };
  output?: {
    format?: RealtimeAudioFormats;
    /** Playback speed multiplier (0.25-1.5) */
    speed?: number;
    /** Voice selection */
    voice?:
      | string
      | "alloy"
      | "ash"
      | "ballad"
      | "coral"
      | "echo"
      | "sage"
      | "shimmer"
      | "verse"
      | "marin"
      | "cedar";
  };
}
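Format choice drives bandwidth: audio/pcm here is 16-bit mono at 24 kHz, while the G.711 variants are 8-bit at 8 kHz. A quick comparison (bytesPerSecond is an illustrative helper):

```typescript
// Approximate wire rate for each supported audio format:
// audio/pcm         -> 24000 samples/s * 2 bytes = 48000 B/s
// G.711 u-law/A-law ->  8000 samples/s * 1 byte  =  8000 B/s
function bytesPerSecond(format: "audio/pcm" | "audio/pcmu" | "audio/pcma"): number {
  return format === "audio/pcm" ? 24000 * 2 : 8000;
}
```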

Client-to-Server Events

Events sent from client to server to control the conversation.

/**
 * Union of all client events
 */
type RealtimeClientEvent =
  | ConversationItemCreateEvent
  | ConversationItemDeleteEvent
  | ConversationItemRetrieveEvent
  | ConversationItemTruncateEvent
  | InputAudioBufferAppendEvent
  | InputAudioBufferClearEvent
  | OutputAudioBufferClearEvent
  | InputAudioBufferCommitEvent
  | ResponseCancelEvent
  | ResponseCreateEvent
  | SessionUpdateEvent;

/**
 * Add conversation item (message, function call, or output)
 */
interface ConversationItemCreateEvent {
  type: "conversation.item.create";
  item: ConversationItem;
  event_id?: string;
  /** Insert after this item ID ('root' for beginning) */
  previous_item_id?: string;
}

/**
 * Delete conversation item by ID
 */
interface ConversationItemDeleteEvent {
  type: "conversation.item.delete";
  item_id: string;
  event_id?: string;
}

/**
 * Retrieve full item including audio data
 */
interface ConversationItemRetrieveEvent {
  type: "conversation.item.retrieve";
  item_id: string;
  event_id?: string;
}

/**
 * Truncate assistant audio message
 */
interface ConversationItemTruncateEvent {
  type: "conversation.item.truncate";
  item_id: string;
  content_index: number;
  /** Duration to keep in milliseconds */
  audio_end_ms: number;
  event_id?: string;
}
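conversation.item.truncate is typically used on barge-in: when the user interrupts, truncate the assistant's audio item to what was actually played. Computing audio_end_ms from played PCM16 bytes (playedMs is an illustrative helper):

```typescript
// Convert played PCM16 bytes into milliseconds for audio_end_ms.
// 2 bytes per sample; sampleRate/1000 samples per millisecond.
function playedMs(bytesPlayed: number, sampleRate = 24000): number {
  return Math.floor(bytesPlayed / 2 / (sampleRate / 1000));
}
```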

/**
 * Append audio to input buffer
 */
interface InputAudioBufferAppendEvent {
  type: "input_audio_buffer.append";
  /** Base64-encoded audio bytes */
  audio: string;
  event_id?: string;
}

/**
 * Clear input audio buffer
 */
interface InputAudioBufferClearEvent {
  type: "input_audio_buffer.clear";
  event_id?: string;
}

/**
 * Commit input audio buffer to conversation
 */
interface InputAudioBufferCommitEvent {
  type: "input_audio_buffer.commit";
  event_id?: string;
}

/**
 * WebRTC only: Clear output audio buffer
 */
interface OutputAudioBufferClearEvent {
  type: "output_audio_buffer.clear";
  event_id?: string;
}

/**
 * Cancel in-progress response
 */
interface ResponseCancelEvent {
  type: "response.cancel";
  event_id?: string;
}

/**
 * Request model response
 */
interface ResponseCreateEvent {
  type: "response.create";
  response?: {
    modalities?: Array<"text" | "audio">;
    instructions?: string;
    voice?: string;
    output_audio_format?: string;
    tools?: Array<RealtimeFunctionTool>;
    tool_choice?: string;
    temperature?: number;
    max_output_tokens?: number | "inf";
    conversation?: "auto" | "none";
    metadata?: Record<string, string>;
    input?: Array<ConversationItemWithReference>;
  };
  event_id?: string;
}

/**
 * Update session configuration
 */
interface SessionUpdateEvent {
  type: "session.update";
  session: Partial<RealtimeSession>;
  event_id?: string;
}

Server-to-Client Events

Events sent from server to client during the conversation.

/**
 * Union of all server events (50+ event types)
 */
type RealtimeServerEvent =
  | ConversationCreatedEvent
  | ConversationItemCreatedEvent
  | ConversationItemDeletedEvent
  | ConversationItemAdded
  | ConversationItemDone
  | ConversationItemRetrieved
  | ConversationItemTruncatedEvent
  | ConversationItemInputAudioTranscriptionCompletedEvent
  | ConversationItemInputAudioTranscriptionDeltaEvent
  | ConversationItemInputAudioTranscriptionFailedEvent
  | ConversationItemInputAudioTranscriptionSegment
  | InputAudioBufferClearedEvent
  | InputAudioBufferCommittedEvent
  | InputAudioBufferSpeechStartedEvent
  | InputAudioBufferSpeechStoppedEvent
  | InputAudioBufferTimeoutTriggered
  | OutputAudioBufferStarted
  | OutputAudioBufferStopped
  | OutputAudioBufferCleared
  | ResponseCreatedEvent
  | ResponseDoneEvent
  | ResponseOutputItemAddedEvent
  | ResponseOutputItemDoneEvent
  | ResponseContentPartAddedEvent
  | ResponseContentPartDoneEvent
  | ResponseAudioDeltaEvent
  | ResponseAudioDoneEvent
  | ResponseAudioTranscriptDeltaEvent
  | ResponseAudioTranscriptDoneEvent
  | ResponseTextDeltaEvent
  | ResponseTextDoneEvent
  | ResponseFunctionCallArgumentsDeltaEvent
  | ResponseFunctionCallArgumentsDoneEvent
  | ResponseMcpCallArgumentsDelta
  | ResponseMcpCallArgumentsDone
  | ResponseMcpCallInProgress
  | ResponseMcpCallCompleted
  | ResponseMcpCallFailed
  | McpListToolsInProgress
  | McpListToolsCompleted
  | McpListToolsFailed
  | SessionCreatedEvent
  | SessionUpdatedEvent
  | RateLimitsUpdatedEvent
  | RealtimeErrorEvent;

/**
 * Session created (first event after connection)
 */
interface SessionCreatedEvent {
  type: "session.created";
  event_id: string;
  session: RealtimeSession;
}

/**
 * Session updated after client session.update
 */
interface SessionUpdatedEvent {
  type: "session.updated";
  event_id: string;
  session: RealtimeSession;
}

/**
 * Conversation created
 */
interface ConversationCreatedEvent {
  type: "conversation.created";
  event_id: string;
  conversation: {
    id?: string;
    object?: "realtime.conversation";
  };
}

/**
 * Item created in conversation
 */
interface ConversationItemCreatedEvent {
  type: "conversation.item.created";
  event_id: string;
  item: ConversationItem;
  previous_item_id?: string | null;
}

/**
 * Item added to conversation (may have partial content)
 */
interface ConversationItemAdded {
  type: "conversation.item.added";
  event_id: string;
  item: ConversationItem;
  previous_item_id?: string | null;
}

/**
 * Item finalized with complete content
 */
interface ConversationItemDone {
  type: "conversation.item.done";
  event_id: string;
  item: ConversationItem;
  previous_item_id?: string | null;
}

/**
 * Input audio buffer committed
 */
interface InputAudioBufferCommittedEvent {
  type: "input_audio_buffer.committed";
  event_id: string;
  item_id: string;
  previous_item_id?: string | null;
}

/**
 * Speech detected in input buffer (VAD start)
 */
interface InputAudioBufferSpeechStartedEvent {
  type: "input_audio_buffer.speech_started";
  event_id: string;
  item_id: string;
  /** Milliseconds from session start */
  audio_start_ms: number;
}

/**
 * Speech ended in input buffer (VAD stop)
 */
interface InputAudioBufferSpeechStoppedEvent {
  type: "input_audio_buffer.speech_stopped";
  event_id: string;
  item_id: string;
  /** Milliseconds from session start */
  audio_end_ms: number;
}

/**
 * Response started
 */
interface ResponseCreatedEvent {
  type: "response.created";
  event_id: string;
  response: RealtimeResponse;
}

/**
 * Response completed
 */
interface ResponseDoneEvent {
  type: "response.done";
  event_id: string;
  response: RealtimeResponse;
}

/**
 * Audio delta (streaming audio chunk)
 */
interface ResponseAudioDeltaEvent {
  type: "response.audio.delta";
  event_id: string;
  response_id: string;
  item_id: string;
  output_index: number;
  content_index: number;
  /** Base64-encoded audio bytes */
  delta: string;
}

/**
 * Audio generation completed
 */
interface ResponseAudioDoneEvent {
  type: "response.audio.done";
  event_id: string;
  response_id: string;
  item_id: string;
  output_index: number;
  content_index: number;
}

/**
 * Text delta (streaming text chunk)
 */
interface ResponseTextDeltaEvent {
  type: "response.text.delta";
  event_id: string;
  response_id: string;
  item_id: string;
  output_index: number;
  content_index: number;
  /** Text chunk */
  delta: string;
}

/**
 * Text generation completed
 */
interface ResponseTextDoneEvent {
  type: "response.text.done";
  event_id: string;
  response_id: string;
  item_id: string;
  output_index: number;
  content_index: number;
  /** Complete text */
  text: string;
}

/**
 * Function call arguments delta
 */
interface ResponseFunctionCallArgumentsDeltaEvent {
  type: "response.function_call_arguments.delta";
  event_id: string;
  response_id: string;
  item_id: string;
  output_index: number;
  call_id: string;
  /** JSON arguments chunk */
  delta: string;
}

/**
 * Function call arguments completed
 */
interface ResponseFunctionCallArgumentsDoneEvent {
  type: "response.function_call_arguments.done";
  event_id: string;
  response_id: string;
  item_id: string;
  output_index: number;
  call_id: string;
  /** Complete JSON arguments */
  arguments: string;
}

/**
 * Error occurred
 */
interface RealtimeErrorEvent {
  type: "error";
  event_id: string;
  error: {
    type: string;
    code?: string | null;
    message: string;
    param?: string | null;
    event_id?: string | null;
  };
}


Conversation Items

Items that make up the conversation history.

/**
 * Union of all conversation item types
 */
type ConversationItem =
  | RealtimeConversationItemSystemMessage
  | RealtimeConversationItemUserMessage
  | RealtimeConversationItemAssistantMessage
  | RealtimeConversationItemFunctionCall
  | RealtimeConversationItemFunctionCallOutput
  | RealtimeMcpApprovalResponse
  | RealtimeMcpListTools
  | RealtimeMcpToolCall
  | RealtimeMcpApprovalRequest;

/**
 * System message item
 */
interface RealtimeConversationItemSystemMessage {
  type: "message";
  role: "system";
  content: Array<{
    type?: "input_text";
    text?: string;
  }>;
  id?: string;
  object?: "realtime.item";
  status?: "completed" | "incomplete" | "in_progress";
}

/**
 * User message item (text, audio, or image)
 */
interface RealtimeConversationItemUserMessage {
  type: "message";
  role: "user";
  content: Array<{
    type?: "input_text" | "input_audio" | "input_image";
    text?: string;
    audio?: string; // Base64-encoded
    transcript?: string;
    image_url?: string; // Data URI
    detail?: "auto" | "low" | "high";
  }>;
  id?: string;
  object?: "realtime.item";
  status?: "completed" | "incomplete" | "in_progress";
}

/**
 * Assistant message item (text or audio)
 */
interface RealtimeConversationItemAssistantMessage {
  type: "message";
  role: "assistant";
  content: Array<{
    type?: "output_text" | "output_audio";
    text?: string;
    audio?: string; // Base64-encoded
    transcript?: string;
  }>;
  id?: string;
  object?: "realtime.item";
  status?: "completed" | "incomplete" | "in_progress";
}

/**
 * Function call item
 */
interface RealtimeConversationItemFunctionCall {
  type: "function_call";
  name: string;
  /** JSON-encoded arguments */
  arguments: string;
  call_id?: string;
  id?: string;
  object?: "realtime.item";
  status?: "completed" | "incomplete" | "in_progress";
}

/**
 * Function call output item
 */
interface RealtimeConversationItemFunctionCallOutput {
  type: "function_call_output";
  call_id: string;
  /** Function output (free text) */
  output: string;
  id?: string;
  object?: "realtime.item";
  status?: "completed" | "incomplete" | "in_progress";
}

/**
 * MCP tool call item
 */
interface RealtimeMcpToolCall {
  type: "mcp_call";
  id: string;
  server_label: string;
  name: string;
  arguments: string;
  output?: string | null;
  error?:
    | { type: "protocol_error"; code: number; message: string }
    | { type: "tool_execution_error"; message: string }
    | { type: "http_error"; code: number; message: string }
    | null;
  approval_request_id?: string | null;
}

/**
 * MCP approval request item
 */
interface RealtimeMcpApprovalRequest {
  type: "mcp_approval_request";
  id: string;
  server_label: string;
  name: string;
  arguments: string;
}

/**
 * MCP approval response item
 */
interface RealtimeMcpApprovalResponse {
  type: "mcp_approval_response";
  id: string;
  approval_request_id: string;
  approve: boolean;
  reason?: string | null;
}
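
When an `mcp_approval_request` item arrives, the client answers by creating an `mcp_approval_response` item referencing the request's `id`. A hypothetical helper for building that client event (the item's own `id` is assumed to be assigned server-side and is therefore omitted here):

```typescript
// Build a conversation.item.create event that answers an MCP approval
// request. `buildApprovalResponse` is an illustrative helper, not SDK API.
function buildApprovalResponse(
  request: { id: string },
  approve: boolean,
  reason?: string
) {
  return {
    type: "conversation.item.create" as const,
    item: {
      type: "mcp_approval_response" as const,
      approval_request_id: request.id,
      approve,
      reason: reason ?? null,
    },
  };
}
```

Send the result with `ws.send(...)`; if approved, the server proceeds with the corresponding `mcp_call`.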

Function Calling

Define and use tools during real-time conversations.

/**
 * Function tool definition for realtime conversations
 */
interface RealtimeFunctionTool {
  type?: "function";
  /** Function name */
  name?: string;
  /** Description and usage guidance */
  description?: string;
  /** JSON Schema for function parameters */
  parameters?: unknown;
}

/**
 * MCP (Model Context Protocol) tool configuration
 */
interface McpTool {
  type: "mcp";
  /** Label identifying the MCP server */
  server_label: string;
  /** MCP server URL or connector ID */
  server_url?: string;
  connector_id?:
    | "connector_dropbox"
    | "connector_gmail"
    | "connector_googlecalendar"
    | "connector_googledrive"
    | "connector_microsoftteams"
    | "connector_outlookcalendar"
    | "connector_outlookemail"
    | "connector_sharepoint";
  /** Server description */
  server_description?: string;
  /** Allowed tools filter */
  allowed_tools?:
    | Array<string>
    | {
        tool_names?: Array<string>;
        read_only?: boolean;
      }
    | null;
  /** Approval requirements */
  require_approval?:
    | "always"
    | "never"
    | {
        always?: { tool_names?: Array<string>; read_only?: boolean };
        never?: { tool_names?: Array<string>; read_only?: boolean };
      }
    | null;
  /** OAuth access token */
  authorization?: string;
  /** HTTP headers */
  headers?: Record<string, string> | null;
}

type RealtimeToolsConfig = Array<RealtimeFunctionTool | McpTool>;

type RealtimeToolChoiceConfig =
  | "auto"
  | "none"
  | "required"
  | { type: "function"; function: { name: string } }
  | { type: "mcp"; mcp: { server_label: string; name: string } };

Usage:

// Define tools
const tools: RealtimeToolsConfig = [
  {
    type: "function",
    name: "get_weather",
    description: "Get current weather for a location",
    parameters: {
      type: "object",
      properties: {
        location: { type: "string" },
        unit: { type: "string", enum: ["celsius", "fahrenheit"] },
      },
      required: ["location"],
    },
  },
  {
    type: "mcp",
    server_label: "calendar",
    connector_id: "connector_googlecalendar",
    allowed_tools: {
      tool_names: ["list_events", "create_event"],
    },
  },
];

// Update session with tools
ws.send({
  type: "session.update",
  session: {
    tools,
    tool_choice: "auto",
  },
});

// Handle function call
ws.on("response.function_call_arguments.done", async (event) => {
  const result = await executeFunction(event.call_id, event.arguments);

  // Send function output
  ws.send({
    type: "conversation.item.create",
    item: {
      type: "function_call_output",
      call_id: event.call_id,
      output: JSON.stringify(result),
    },
  });

  // Trigger new response
  ws.send({
    type: "response.create",
  });
});

Response Configuration

Configure individual response parameters.

/**
 * Response resource
 */
interface RealtimeResponse {
  id?: string;
  object?: "realtime.response";
  /** Conversation ID */
  conversation_id?: string;
  /** Status: 'in_progress', 'completed', 'cancelled', 'failed', 'incomplete' */
  status?: RealtimeResponseStatus;
  /** Usage statistics */
  usage?: RealtimeResponseUsage;
  /** Max output tokens */
  max_output_tokens?: number | "inf";
  /** Response modalities */
  modalities?: Array<"text" | "audio">;
  /** Instructions for this response */
  instructions?: string;
  /** Voice selection */
  voice?: string;
  /** Audio output configuration */
  audio?: {
    format?: RealtimeAudioFormats;
    speed?: number;
    voice?: string;
  };
  /** Response metadata */
  metadata?: Record<string, string> | null;
  /** Tool choice */
  tool_choice?: RealtimeToolChoiceConfig;
  /** Tools for this response */
  tools?: RealtimeToolsConfig;
  /** Temperature */
  temperature?: number;
  /** Output items */
  output?: Array<ConversationItem>;
  /** Status details */
  status_details?: {
    type?: "incomplete" | "failed" | "cancelled";
    reason?: string;
    error?: RealtimeError | null;
  } | null;
}

interface RealtimeResponseStatus {
  type: "in_progress" | "completed" | "cancelled" | "failed" | "incomplete";
  /** Additional status information */
  reason?: string;
}

interface RealtimeResponseUsage {
  /** Total tokens (input + output) */
  total_tokens?: number;
  /** Input tokens */
  input_tokens?: number;
  /** Output tokens */
  output_tokens?: number;
  /** Input token breakdown */
  input_token_details?: {
    text_tokens?: number;
    audio_tokens?: number;
    image_tokens?: number;
    cached_tokens?: number;
    cached_tokens_details?: {
      text_tokens?: number;
      audio_tokens?: number;
      image_tokens?: number;
    };
  };
  /** Output token breakdown */
  output_token_details?: {
    text_tokens?: number;
    audio_tokens?: number;
  };
}
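
Because every field in the usage breakdown is optional, a small helper that defaults missing counts to zero keeps per-modality accounting simple (an illustrative sketch, not part of the SDK):

```typescript
// Sum text and audio token counts across input and output details of a
// RealtimeResponseUsage-shaped object, treating absent fields as zero.
function tokenBreakdown(usage: {
  input_token_details?: { text_tokens?: number; audio_tokens?: number };
  output_token_details?: { text_tokens?: number; audio_tokens?: number };
}) {
  const i = usage.input_token_details ?? {};
  const o = usage.output_token_details ?? {};
  return {
    text: (i.text_tokens ?? 0) + (o.text_tokens ?? 0),
    audio: (i.audio_tokens ?? 0) + (o.audio_tokens ?? 0),
  };
}
```

Audio tokens are typically priced differently from text tokens, so tracking them separately is useful for cost monitoring.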

Transcription

Configure and receive audio transcription during conversations.

/**
 * Transcription configuration
 */
interface AudioTranscription {
  /** Language code (ISO-639-1) */
  language?: string;
  /** Transcription model */
  model?:
    | "whisper-1"
    | "gpt-4o-mini-transcribe"
    | "gpt-4o-transcribe"
    | "gpt-4o-transcribe-diarize";
  /** Guidance prompt */
  prompt?: string;
}

/**
 * Transcription completed event
 */
interface ConversationItemInputAudioTranscriptionCompletedEvent {
  type: "conversation.item.input_audio_transcription.completed";
  event_id: string;
  item_id: string;
  content_index: number;
  /** Transcribed text */
  transcript: string;
  /** Usage statistics */
  usage:
    | {
        type: "tokens";
        input_tokens: number;
        output_tokens: number;
        total_tokens: number;
        input_token_details?: {
          text_tokens?: number;
          audio_tokens?: number;
        };
      }
    | {
        type: "duration";
        /** Duration in seconds */
        seconds: number;
      };
  /** Log probabilities (if enabled) */
  logprobs?: Array<{
    token: string;
    logprob: number;
    bytes: Array<number>;
  }> | null;
}

/**
 * Transcription delta event (streaming)
 */
interface ConversationItemInputAudioTranscriptionDeltaEvent {
  type: "conversation.item.input_audio_transcription.delta";
  event_id: string;
  item_id: string;
  content_index?: number;
  /** Transcript chunk */
  delta?: string;
  /** Log probabilities (if enabled) */
  logprobs?: Array<{
    token: string;
    logprob: number;
    bytes: Array<number>;
  }> | null;
}

/**
 * Transcription segment (for diarization)
 */
interface ConversationItemInputAudioTranscriptionSegment {
  type: "conversation.item.input_audio_transcription.segment";
  event_id: string;
  item_id: string;
  content_index: number;
  id: string;
  /** Segment text */
  text: string;
  /** Speaker label */
  speaker: string;
  /** Start time in seconds */
  start: number;
  /** End time in seconds */
  end: number;
}

/**
 * Transcription failed event
 */
interface ConversationItemInputAudioTranscriptionFailedEvent {
  type: "conversation.item.input_audio_transcription.failed";
  event_id: string;
  item_id: string;
  content_index: number;
  error: {
    type?: string;
    code?: string;
    message?: string;
    param?: string;
  };
}

Usage:

// Enable transcription with log probabilities
ws.send({
  type: "session.update",
  session: {
    input_audio_transcription: {
      model: "gpt-4o-transcribe",
      language: "en",
    },
    include: ["item.input_audio_transcription.logprobs"],
  },
});

// Listen for transcription
ws.on("conversation.item.input_audio_transcription.delta", (event) => {
  console.log("Transcript delta:", event.delta);
});

ws.on(
  "conversation.item.input_audio_transcription.completed",
  (event) => {
    console.log("Full transcript:", event.transcript);
    console.log("Usage:", event.usage);
  }
);

// Diarization support
ws.send({
  type: "session.update",
  session: {
    input_audio_transcription: {
      model: "gpt-4o-transcribe-diarize",
    },
  },
});

ws.on(
  "conversation.item.input_audio_transcription.segment",
  (event) => {
    console.log(
      `[${event.speaker}] ${event.text} (${event.start}s - ${event.end}s)`
    );
  }
);

Error Handling

Handle errors and edge cases in real-time conversations.

/**
 * Error event from server
 */
interface RealtimeErrorEvent {
  type: "error";
  event_id: string;
  error: RealtimeError;
}

interface RealtimeError {
  /** Error type */
  type: string;
  /** Error code (optional) */
  code?: string | null;
  /** Human-readable message */
  message: string;
  /** Related parameter (optional) */
  param?: string | null;
  /** Client event ID that caused error (optional) */
  event_id?: string | null;
}

/**
 * OpenAI Realtime error class
 */
class OpenAIRealtimeError extends Error {
  constructor(message: string);
}

Common Error Types:

// Invalid request errors
{
  type: "invalid_request_error",
  code: "invalid_value",
  message: "Invalid value for 'audio_format'",
  param: "audio_format"
}

// Server errors
{
  type: "server_error",
  message: "Internal server error"
}

// Rate limit errors
{
  type: "rate_limit_error",
  message: "Rate limit exceeded"
}

Usage:

ws.on("error", (event: RealtimeErrorEvent) => {
  console.error("Realtime error:", event.error);

  if (event.error.type === "rate_limit_error") {
    // Handle rate limiting
  } else if (event.error.type === "invalid_request_error") {
    // Handle validation errors
    console.error("Invalid:", event.error.param, event.error.message);
  }
});

// WebSocket errors
ws.socket.addEventListener("error", (error) => {
  console.error("WebSocket error:", error);
});

Rate Limits

Monitor rate limits during conversations.

/**
 * Rate limits updated event
 */
interface RateLimitsUpdatedEvent {
  type: "rate_limits.updated";
  event_id: string;
  rate_limits: Array<{
    /** Rate limit name: 'requests' or 'tokens' */
    name?: "requests" | "tokens";
    /** Maximum allowed value */
    limit?: number;
    /** Remaining before limit reached */
    remaining?: number;
    /** Seconds until reset */
    reset_seconds?: number;
  }>;
}

Usage:

ws.on("rate_limits.updated", (event: RateLimitsUpdatedEvent) => {
  event.rate_limits.forEach((limit) => {
    console.log(`${limit.name}: ${limit.remaining}/${limit.limit}`);
    console.log(`Resets in ${limit.reset_seconds}s`);
  });
});

Tracing

Configure distributed tracing for debugging and monitoring.

/**
 * Tracing configuration
 */
type RealtimeTracingConfig =
  | "auto"
  | {
      /** Workflow name in Traces Dashboard */
      workflow_name?: string;
      /** Group ID for filtering */
      group_id?: string;
      /** Arbitrary metadata */
      metadata?: unknown;
    }
  | null;

Usage:

// Auto tracing with defaults
ws.send({
  type: "session.update",
  session: { tracing: "auto" },
});

// Custom tracing configuration
ws.send({
  type: "session.update",
  session: {
    tracing: {
      workflow_name: "customer-support-bot",
      group_id: "prod-us-west",
      metadata: {
        customer_id: "cust_123",
        agent_version: "2.1.0",
      },
    },
  },
});

// Disable tracing
ws.send({
  type: "session.update",
  session: { tracing: null },
});

Complete Example: Voice Assistant

import OpenAI from "openai";
import { OpenAIRealtimeWebSocket } from "openai/realtime/websocket";

const client = new OpenAI();

// Create session token
const secret = await client.realtime.clientSecrets.create({
  session: {
    type: "realtime",
    model: "gpt-realtime",
    audio: {
      input: {
        format: { type: "audio/pcm", rate: 24000 },
        turn_detection: {
          type: "server_vad",
          threshold: 0.5,
          silence_duration_ms: 500,
          interrupt_response: true,
        },
        transcription: {
          model: "gpt-4o-transcribe",
        },
      },
      output: {
        format: { type: "audio/pcm", rate: 24000 },
        voice: "marin",
      },
    },
    instructions:
      "You are a helpful voice assistant. Speak naturally and concisely.",
    tools: [
      {
        type: "function",
        name: "get_weather",
        description: "Get weather for a location",
        parameters: {
          type: "object",
          properties: {
            location: { type: "string" },
          },
          required: ["location"],
        },
      },
    ],
  },
});

// Connect WebSocket
const ws = await OpenAIRealtimeWebSocket.create(client, {
  model: "gpt-realtime",
});

// Handle session
ws.on("session.created", (event) => {
  console.log("Session created:", event.session.id);
});

// Handle conversation
ws.on("conversation.item.created", (event) => {
  console.log("Item created:", event.item.type);
});

// Handle audio output
ws.on("response.audio.delta", (event) => {
  const audioData = Buffer.from(event.delta, "base64");
  playAudio(audioData); // Play to speaker
});

// Handle transcripts
ws.on("conversation.item.input_audio_transcription.completed", (event) => {
  console.log("User said:", event.transcript);
});

ws.on("response.audio_transcript.delta", (event) => {
  process.stdout.write(event.delta);
});

// Handle VAD
ws.on("input_audio_buffer.speech_started", () => {
  console.log("User started speaking");
  stopAudioPlayback(); // Interrupt assistant
});

ws.on("input_audio_buffer.speech_stopped", () => {
  console.log("User stopped speaking");
});

// Handle function calls
ws.on("response.function_call_arguments.done", async (event) => {
  console.log("Function call:", event.call_id);

  const args = JSON.parse(event.arguments);
  const result = await getWeather(args.location);

  // Send result
  ws.send({
    type: "conversation.item.create",
    item: {
      type: "function_call_output",
      call_id: event.call_id,
      output: JSON.stringify(result),
    },
  });

  // Continue conversation
  ws.send({
    type: "response.create",
  });
});

// Handle errors
ws.on("error", (event) => {
  console.error("Error:", event.error.message);
});

// Capture and send microphone audio
const audioStream = captureMicrophone();
audioStream.on("data", (chunk) => {
  const base64 = chunk.toString("base64");
  ws.send({
    type: "input_audio_buffer.append",
    audio: base64,
  });
});

// Cleanup
process.on("SIGINT", () => {
  ws.close();
  process.exit(0);
});

Complete Example: Phone Call Handler

import OpenAI from "openai";
import express from "express";

const client = new OpenAI();
const app = express();

app.use(express.json());

// Webhook for incoming calls
app.post("/realtime/webhook/incoming_call", async (req, res) => {
  const event = req.body;

  if (event.type === "realtime.call.incoming") {
    const callId = event.data.id;

    // Accept the call
    await client.realtime.calls.accept(callId, {
      type: "realtime",
      model: "gpt-realtime",
      instructions:
        "You are a customer service agent. Be professional and helpful.",
      audio: {
        input: {
          format: { type: "audio/pcmu" }, // G.711 for telephony
          turn_detection: {
            type: "server_vad",
            silence_duration_ms: 700,
          },
        },
        output: {
          format: { type: "audio/pcmu" },
          voice: "marin",
        },
      },
      tools: [
        {
          type: "function",
          name: "transfer_to_agent",
          description: "Transfer to human agent",
          parameters: {
            type: "object",
            properties: {
              reason: { type: "string" },
            },
          },
        },
      ],
    });

    console.log(`Accepted call: ${callId}`);
  }

  res.sendStatus(200);
});

// Webhook for call events
app.post("/realtime/webhook/call_events", async (req, res) => {
  const event = req.body;

  if (event.type === "realtime.response.function_call_output.done") {
    const { call_id, function_name, arguments: args } = event.data;

    if (function_name === "transfer_to_agent") {
      // Transfer call
      await client.realtime.calls.refer(call_id, {
        target_uri: "sip:support@example.com",
      });
    }
  }

  res.sendStatus(200);
});

app.listen(3000, () => {
  console.log("Webhook server running on port 3000");
});

Type Reference

Core Types

type RealtimeClientEvent =
  | ConversationItemCreateEvent
  | ConversationItemDeleteEvent
  | ConversationItemRetrieveEvent
  | ConversationItemTruncateEvent
  | InputAudioBufferAppendEvent
  | InputAudioBufferClearEvent
  | OutputAudioBufferClearEvent
  | InputAudioBufferCommitEvent
  | ResponseCancelEvent
  | ResponseCreateEvent
  | SessionUpdateEvent;

type RealtimeServerEvent =
  | ConversationCreatedEvent
  | ConversationItemCreatedEvent
  | ConversationItemDeletedEvent
  | ConversationItemAdded
  | ConversationItemDone
  | ConversationItemRetrieved
  | ConversationItemTruncatedEvent
  | ConversationItemInputAudioTranscriptionCompletedEvent
  | ConversationItemInputAudioTranscriptionDeltaEvent
  | ConversationItemInputAudioTranscriptionFailedEvent
  | ConversationItemInputAudioTranscriptionSegment
  | InputAudioBufferClearedEvent
  | InputAudioBufferCommittedEvent
  | InputAudioBufferSpeechStartedEvent
  | InputAudioBufferSpeechStoppedEvent
  | InputAudioBufferTimeoutTriggered
  | OutputAudioBufferStarted
  | OutputAudioBufferStopped
  | OutputAudioBufferCleared
  | ResponseCreatedEvent
  | ResponseDoneEvent
  | ResponseOutputItemAddedEvent
  | ResponseOutputItemDoneEvent
  | ResponseContentPartAddedEvent
  | ResponseContentPartDoneEvent
  | ResponseAudioDeltaEvent
  | ResponseAudioDoneEvent
  | ResponseAudioTranscriptDeltaEvent
  | ResponseAudioTranscriptDoneEvent
  | ResponseTextDeltaEvent
  | ResponseTextDoneEvent
  | ResponseFunctionCallArgumentsDeltaEvent
  | ResponseFunctionCallArgumentsDoneEvent
  | ResponseMcpCallArgumentsDelta
  | ResponseMcpCallArgumentsDone
  | ResponseMcpCallInProgress
  | ResponseMcpCallCompleted
  | ResponseMcpCallFailed
  | McpListToolsInProgress
  | McpListToolsCompleted
  | McpListToolsFailed
  | SessionCreatedEvent
  | SessionUpdatedEvent
  | RateLimitsUpdatedEvent
  | RealtimeErrorEvent;

type ConversationItem =
  | RealtimeConversationItemSystemMessage
  | RealtimeConversationItemUserMessage
  | RealtimeConversationItemAssistantMessage
  | RealtimeConversationItemFunctionCall
  | RealtimeConversationItemFunctionCallOutput
  | RealtimeMcpApprovalResponse
  | RealtimeMcpListTools
  | RealtimeMcpToolCall
  | RealtimeMcpApprovalRequest;

interface RealtimeSession {
  id?: string;
  object?: "realtime.session";
  model?: string;
  expires_at?: number;
  modalities?: Array<"text" | "audio">;
  instructions?: string;
  voice?: string;
  input_audio_format?: "pcm16" | "g711_ulaw" | "g711_alaw";
  output_audio_format?: "pcm16" | "g711_ulaw" | "g711_alaw";
  input_audio_transcription?: AudioTranscription | null;
  turn_detection?: RealtimeAudioInputTurnDetection | null;
  tools?: Array<RealtimeFunctionTool>;
  tool_choice?: string;
  temperature?: number;
  max_response_output_tokens?: number | "inf";
  speed?: number;
  input_audio_noise_reduction?: {
    type?: NoiseReductionType;
  };
  include?: Array<"item.input_audio_transcription.logprobs"> | null;
  prompt?: ResponsePrompt | null;
  tracing?: RealtimeTracingConfig | null;
  truncation?: RealtimeTruncation;
}

interface RealtimeResponse {
  id?: string;
  object?: "realtime.response";
  status?: RealtimeResponseStatus;
  conversation_id?: string;
  output?: Array<ConversationItem>;
  usage?: RealtimeResponseUsage;
  status_details?: {
    type?: "incomplete" | "failed" | "cancelled";
    reason?: string;
    error?: RealtimeError | null;
  } | null;
  max_output_tokens?: number | "inf";
  modalities?: Array<"text" | "audio">;
  instructions?: string;
  voice?: string;
  audio?: {
    format?: RealtimeAudioFormats;
    speed?: number;
    voice?: string;
  };
  metadata?: Record<string, string> | null;
  tool_choice?: RealtimeToolChoiceConfig;
  tools?: RealtimeToolsConfig;
  temperature?: number;
}

interface AudioTranscription {
  language?: string;
  model?:
    | "whisper-1"
    | "gpt-4o-mini-transcribe"
    | "gpt-4o-transcribe"
    | "gpt-4o-transcribe-diarize";
  prompt?: string;
}

type RealtimeAudioFormats =
  | { type?: "audio/pcm"; rate?: 24000 }
  | { type?: "audio/pcmu" }
  | { type?: "audio/pcma" };

type NoiseReductionType = "near_field" | "far_field";

type RealtimeAudioInputTurnDetection =
  | {
      type: "server_vad";
      threshold?: number;
      prefix_padding_ms?: number;
      silence_duration_ms?: number;
      create_response?: boolean;
      interrupt_response?: boolean;
      idle_timeout_ms?: number | null;
    }
  | {
      type: "semantic_vad";
      eagerness?: "low" | "medium" | "high" | "auto";
      create_response?: boolean;
      interrupt_response?: boolean;
    };

type RealtimeTruncation =
  | "auto"
  | "disabled"
  | { type: "retention_ratio"; retention_ratio: number };

type RealtimeToolsConfig = Array<RealtimeFunctionTool | McpTool>;

type RealtimeToolChoiceConfig =
  | "auto"
  | "none"
  | "required"
  | { type: "function"; function: { name: string } }
  | { type: "mcp"; mcp: { server_label: string; name: string } };

type RealtimeTracingConfig =
  | "auto"
  | {
      workflow_name?: string;
      group_id?: string;
      metadata?: unknown;
    }
  | null;

interface RealtimeError {
  type: string;
  code?: string | null;
  message: string;
  param?: string | null;
  event_id?: string | null;
}

interface RealtimeResponseUsage {
  total_tokens?: number;
  input_tokens?: number;
  output_tokens?: number;
  input_token_details?: {
    text_tokens?: number;
    audio_tokens?: number;
    image_tokens?: number;
    cached_tokens?: number;
    cached_tokens_details?: {
      text_tokens?: number;
      audio_tokens?: number;
      image_tokens?: number;
    };
  };
  output_token_details?: {
    text_tokens?: number;
    audio_tokens?: number;
  };
}

interface RealtimeResponseStatus {
  type: "in_progress" | "completed" | "cancelled" | "failed" | "incomplete";
  reason?: string;
}

Models

Available Realtime API models:

  • gpt-realtime (latest)
  • gpt-realtime-2025-08-28
  • gpt-4o-realtime-preview
  • gpt-4o-realtime-preview-2024-10-01
  • gpt-4o-realtime-preview-2024-12-17
  • gpt-4o-realtime-preview-2025-06-03
  • gpt-4o-mini-realtime-preview
  • gpt-4o-mini-realtime-preview-2024-12-17
  • gpt-realtime-mini
  • gpt-realtime-mini-2025-10-06
  • gpt-audio-mini
  • gpt-audio-mini-2025-10-06

Best Practices

Security

  • Never expose API keys in browser: Always use ephemeral session tokens
  • Token expiration: Default 10 minutes, max 2 hours
  • Server-side validation: Validate all tool calls server-side
  • Rate limiting: Monitor rate limit events and handle gracefully

Performance

  • Audio chunking: Send audio in chunks (1-5 seconds recommended)
  • VAD tuning: Adjust threshold and silence duration for your environment
  • Voice selection: Use marin or cedar for best quality
  • Caching: Enable context caching for repeated conversations
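
The VAD-tuning bullet can be made concrete. The configuration below raises the detection threshold and lengthens the silence window for a noisy environment; the values are illustrative starting points to experiment with, not official recommendations:

```typescript
// Illustrative server VAD tuning for a noisy room.
const turnDetection = {
  type: "server_vad" as const,
  threshold: 0.7,           // 0-1; higher means less sensitive to sound
  prefix_padding_ms: 300,   // audio retained before detected speech start
  silence_duration_ms: 800, // silence required before the turn ends
};
```

Pass this as the `turn_detection` value in your session configuration (under `audio.input` in the GA session shape shown in the examples above).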

Audio Quality

  • Noise reduction: Enable for far-field or noisy environments
  • Sample rate: Always use 24kHz for PCM audio
  • Format selection: Use G.711 (pcmu/pcma) for telephony, PCM for quality
  • Interrupt handling: Clear audio buffers on interruption

Conversation Management

  • Context length: Monitor token usage, configure truncation
  • Function calling: Keep tool outputs concise
  • System messages: Use for mid-conversation context updates
  • Item ordering: Use previous_item_id for precise insertion
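
The item-ordering bullet can be illustrated with a hypothetical insertion event; `"item_123"` is a placeholder for an existing item's id:

```typescript
// Hypothetical: insert a system message immediately after a known item by
// setting previous_item_id on conversation.item.create.
const insertAfterItem = {
  type: "conversation.item.create" as const,
  previous_item_id: "item_123", // placeholder id of the preceding item
  item: {
    type: "message" as const,
    role: "system" as const,
    content: [
      {
        type: "input_text" as const,
        text: "The user has upgraded to the premium plan.",
      },
    ],
  },
};
```

Without `previous_item_id`, created items are appended to the end of the conversation.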

Error Handling

  • Graceful degradation: Handle WebSocket disconnections
  • Retry logic: Implement exponential backoff for transient errors
  • Error logging: Log all error events for debugging
  • User feedback: Provide clear feedback on connection/processing status
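
A capped exponential backoff for reconnect attempts can be as small as one function; this is a sketch to wire into your own reconnect loop, not SDK-provided behavior:

```typescript
// Delay before reconnect attempt N (0-based): base * 2^N, capped at maxMs.
// With the defaults, delays run 500, 1000, 2000, ... up to 30 seconds.
function backoffDelay(attempt: number, baseMs = 500, maxMs = 30_000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}
```

Adding a small random jitter to each delay avoids synchronized reconnect storms when many clients drop at once.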

Common Patterns

Voice-to-Voice Assistant

const ws = await OpenAIRealtimeWebSocket.create(client, {
  model: "gpt-realtime",
});

// Microphone → Input Buffer
micStream.on("data", (chunk) => {
  ws.send({
    type: "input_audio_buffer.append",
    audio: chunk.toString("base64"),
  });
});

// Output Audio → Speaker
ws.on("response.audio.delta", (event) => {
  playAudio(Buffer.from(event.delta, "base64"));
});

// VAD-based interruption
ws.on("input_audio_buffer.speech_started", () => {
  stopPlayback();
});

Text-to-Voice Assistant

// Send text message
ws.send({
  type: "conversation.item.create",
  item: {
    type: "message",
    role: "user",
    content: [{ type: "input_text", text: "Hello!" }],
  },
});

// Request audio response
ws.send({
  type: "response.create",
  response: {
    modalities: ["audio"],
  },
});

Streaming Transcripts

ws.on("response.audio_transcript.delta", (event) => {
  updateSubtitles(event.delta);
});

ws.on("conversation.item.input_audio_transcription.delta", (event) => {
  updateUserTranscript(event.delta);
});

Multi-Tool Assistant

const tools = [
  {
    type: "function",
    name: "search_database",
    description: "Search customer database",
    parameters: {
      /* ... */
    },
  },
  {
    type: "mcp",
    server_label: "calendar",
    connector_id: "connector_googlecalendar",
  },
];

ws.send({
  type: "session.update",
  session: { tools, tool_choice: "auto" },
});
