tessl/npm-openai

The official TypeScript library for the OpenAI API

docs/realtime.md

Realtime API

The Realtime API provides WebSocket-based real-time voice conversations with OpenAI models. It supports bidirectional audio streaming, server-side voice activity detection (VAD), function calling, and full conversation management. The API is designed for live voice applications including phone calls, voice assistants, and interactive conversational experiences.

Package Information

  • Package Name: openai
  • Package Type: npm
  • Language: TypeScript
  • Installation: npm install openai

API Status

The Realtime API is now generally available (GA) at client.realtime.*.

Deprecation Notice: The legacy beta Realtime API at client.beta.realtime.* is deprecated. If you are still using it, migrate to the GA API documented here. The deprecated beta surface includes:

  • client.beta.realtime.sessions.create() (deprecated - use client.realtime.clientSecrets.create() instead)
  • client.beta.realtime.transcriptionSessions.create() (deprecated)

All new projects should use the GA Realtime API (client.realtime.*) documented on this page.

Core Imports

import OpenAI from "openai";
import { OpenAIRealtimeWebSocket } from "openai/realtime/websocket"; // Browser
import { OpenAIRealtimeWS } from "openai/realtime/ws"; // Node.js (requires 'ws' package)

WebSocket Clients

The Realtime API provides two WebSocket client implementations for different runtime environments:

OpenAIRealtimeWebSocket (Browser)

For browser environments, use OpenAIRealtimeWebSocket which uses the native browser WebSocket API.

import { OpenAIRealtimeWebSocket } from "openai/realtime/websocket";
import OpenAI from "openai";

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const ws = new OpenAIRealtimeWebSocket(
  {
    model: "gpt-realtime",
    dangerouslyAllowBrowser: true, // Required for browser use
  },
  client
);

// Event handling
ws.on("session.created", (event) => {
  console.log("Session started:", event.session.id);
});

ws.on("response.audio.delta", (event) => {
  // event.delta is base64-encoded audio; decode it before playback.
  // playAudio is a placeholder for your audio-output pipeline.
  const audioData = atob(event.delta);
  playAudio(audioData);
});

ws.on("error", (error) => {
  console.error("WebSocket error:", error);
});

// Send audio to the server (for very large buffers, encode in chunks
// to avoid exceeding the engine's argument limit)
function sendAudio(audioData: ArrayBuffer) {
  const base64Audio = btoa(String.fromCharCode(...new Uint8Array(audioData)));
  ws.send({
    type: "input_audio_buffer.append",
    audio: base64Audio,
  });
}

// Commit audio buffer to trigger processing
ws.send({
  type: "input_audio_buffer.commit",
});

// Close connection
ws.close();

Key features:

  • Uses native browser WebSocket API
  • Requires dangerouslyAllowBrowser: true in configuration
  • Audio must be base64 encoded
  • Automatic reconnection handling
  • Built-in event emitter for all realtime events
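The sendAudio helper above spreads the whole buffer into String.fromCharCode, which can exceed the engine's argument limit on long recordings. A chunk-safe variant (a sketch; base64FromArrayBuffer is not part of the SDK):

```typescript
// Chunk-safe base64 encoding for arbitrarily large audio buffers.
// Encoding in fixed-size slices avoids spreading one huge array
// into a single String.fromCharCode call.
function base64FromArrayBuffer(buffer: ArrayBuffer, chunkSize = 0x8000): string {
  const bytes = new Uint8Array(buffer);
  let binary = "";
  for (let i = 0; i < bytes.length; i += chunkSize) {
    binary += String.fromCharCode(...bytes.subarray(i, i + chunkSize));
  }
  return btoa(binary);
}
```

The result can be sent directly as the audio field of an input_audio_buffer.append event.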

OpenAIRealtimeWS (Node.js)

For Node.js environments, use OpenAIRealtimeWS which uses the ws package for WebSocket support.

import { OpenAIRealtimeWS } from "openai/realtime/ws";
import OpenAI from "openai";
import fs from "fs";

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const ws = new OpenAIRealtimeWS(
  {
    model: "gpt-realtime",
  },
  client
);

// Event handling (same interface as browser version)
ws.on("session.created", (event) => {
  console.log("Session started:", event.session.id);
});

ws.on("response.audio.delta", (event) => {
  // Handle audio deltas
  const audioBuffer = Buffer.from(event.delta, "base64");
  // Write to file or stream to audio output
  fs.appendFileSync("output.pcm", audioBuffer);
});

ws.on("response.done", (event) => {
  console.log("Response complete:", event.response.id);
});

// Send audio from file or buffer
function sendAudioFromFile(filePath: string) {
  const audioBuffer = fs.readFileSync(filePath);
  const base64Audio = audioBuffer.toString("base64");

  ws.send({
    type: "input_audio_buffer.append",
    audio: base64Audio,
  });
}

// Trigger response generation
ws.send({
  type: "input_audio_buffer.commit",
});

// Close connection
ws.close();

Key features:

  • Uses ws package for WebSocket support (add to dependencies: npm install ws @types/ws)
  • Same event interface as browser version for consistency
  • Better Node.js stream integration
  • Automatic reconnection handling
  • Suitable for server-side applications
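When streaming a file it is common to send audio in fixed-duration frames rather than one large append. A sketch of the frame math for 24 kHz PCM16 mono (chunkPcm16 is an illustrative helper):

```typescript
// Split a PCM16 mono buffer into fixed-duration frames for successive
// input_audio_buffer.append events. At 24 kHz, 16-bit mono audio is
// 24000 samples/s * 2 bytes/sample = 48 bytes per millisecond.
function chunkPcm16(buffer: Buffer, frameMs = 100, sampleRate = 24000): Buffer[] {
  const bytesPerFrame = (sampleRate / 1000) * 2 * frameMs;
  const frames: Buffer[] = [];
  for (let i = 0; i < buffer.length; i += bytesPerFrame) {
    frames.push(buffer.subarray(i, i + bytesPerFrame));
  }
  return frames;
}
```

Each frame would then be base64-encoded and sent with an input_audio_buffer.append event.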

Common Event Patterns

Both WebSocket clients support the same event handling interface:

// Connection events
ws.on("session.created", (event) => { /* Session initialization */ });
ws.on("session.updated", (event) => { /* Session configuration changed */ });

// Conversation events
ws.on("conversation.created", (event) => { /* New conversation */ });
ws.on("conversation.item.created", (event) => { /* New item added */ });
ws.on("conversation.item.deleted", (event) => { /* Item removed */ });

// Audio events (streaming)
ws.on("response.audio.delta", (event) => { /* Audio chunk received */ });
ws.on("response.audio.done", (event) => { /* Audio complete */ });
ws.on("response.audio_transcript.delta", (event) => { /* Transcript chunk */ });
ws.on("response.audio_transcript.done", (event) => { /* Transcript complete */ });

// Response events
ws.on("response.created", (event) => { /* Response started */ });
ws.on("response.done", (event) => { /* Response complete */ });
ws.on("response.cancelled", (event) => { /* Response cancelled */ });
ws.on("response.failed", (event) => { /* Response failed */ });

// Function calling events
ws.on("response.function_call_arguments.delta", (event) => { /* Function args streaming */ });
ws.on("response.function_call_arguments.done", (event) => { /* Function args complete */ });

// Error events
ws.on("error", (error) => { /* WebSocket or API error */ });
ws.on("close", (event) => { /* Connection closed */ });
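Delta events arrive as fragments, so a common pattern is to accumulate them per item until the matching *.done event fires. A minimal accumulator (the event shape shown is a simplified subset of the interfaces later on this page):

```typescript
// Accumulate streaming transcript (or text) deltas per conversation item
// until the corresponding *.done event arrives.
class TranscriptAccumulator {
  private parts = new Map<string, string>();

  // Call on response.audio_transcript.delta / response.text.delta
  append(event: { item_id: string; delta: string }): void {
    this.parts.set(event.item_id, (this.parts.get(event.item_id) ?? "") + event.delta);
  }

  // Call on the matching *.done event; returns the full text and frees state
  finish(itemId: string): string {
    const text = this.parts.get(itemId) ?? "";
    this.parts.delete(itemId);
    return text;
  }
}
```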

Sending Commands

Both clients use the same .send() method for sending commands:

// Append audio to input buffer
ws.send({
  type: "input_audio_buffer.append",
  audio: base64AudioString,
});

// Commit audio buffer (triggers VAD or manual processing)
ws.send({
  type: "input_audio_buffer.commit",
});

// Clear audio buffer
ws.send({
  type: "input_audio_buffer.clear",
});

// Update session configuration
ws.send({
  type: "session.update",
  session: {
    instructions: "You are a helpful assistant.",
    turn_detection: { type: "server_vad" },
  },
});

// Create conversation item (text message)
ws.send({
  type: "conversation.item.create",
  item: {
    type: "message",
    role: "user",
    content: [{ type: "input_text", text: "Hello!" }],
  },
});

// Trigger response generation
ws.send({
  type: "response.create",
  response: {
    modalities: ["text", "audio"],
    instructions: "Respond briefly.",
  },
});

// Cancel in-progress response
ws.send({
  type: "response.cancel",
});
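After response.function_call_arguments.done, the usual flow is: run the tool, return its result as a function_call_output item, then request a new response. A sketch of the two events involved (buildToolResultEvents is illustrative; the event shapes follow the commands above):

```typescript
// Build the two events needed to hand a tool result back to the model:
// a function_call_output conversation item, followed by response.create
// so the model can continue speaking with the result in context.
function buildToolResultEvents(callId: string, result: unknown) {
  return [
    {
      type: "conversation.item.create",
      item: {
        type: "function_call_output",
        call_id: callId,
        output: JSON.stringify(result),
      },
    },
    { type: "response.create" },
  ] as const;
}
```

Each returned event would be passed to ws.send() in order.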

Connection Lifecycle

Both clients handle connection lifecycle automatically:

const ws = new OpenAIRealtimeWS({ model: "gpt-realtime" }, client);

// Connection opens automatically
ws.on("session.created", (event) => {
  console.log("Connected and ready");
});

// Handle disconnections
ws.on("close", (event) => {
  console.log("Connection closed:", event.code, event.reason);
});

// Handle errors
ws.on("error", (error) => {
  console.error("Connection error:", error);
});

// Manually close connection
ws.close();

Basic Usage

Creating a Session Token

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

// Create an ephemeral session token for client-side use
const response = await client.realtime.clientSecrets.create({
  session: {
    type: "realtime",
    model: "gpt-realtime",
    audio: {
      input: {
        format: { type: "audio/pcm", rate: 24000 },
        turn_detection: {
          type: "server_vad",
          threshold: 0.5,
          silence_duration_ms: 500,
        },
      },
      output: {
        format: { type: "audio/pcm", rate: 24000 },
        voice: "marin",
      },
    },
  },
});

const sessionToken = response.value;
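expires_at on the response is in epoch seconds, and these tokens are short-lived, so a server typically checks remaining lifetime before handing one to a client (secondsUntilExpiry is an illustrative helper):

```typescript
// Remaining lifetime of an ephemeral client secret, in whole seconds.
// expires_at is epoch seconds; Date.now() returns epoch milliseconds.
function secondsUntilExpiry(expiresAt: number, nowMs = Date.now()): number {
  return Math.max(0, Math.floor(expiresAt - nowMs / 1000));
}
```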

Connecting via WebSocket

import { OpenAIRealtimeWebSocket } from "openai/realtime/websocket";

const ws = new OpenAIRealtimeWebSocket(
  {
    model: "gpt-realtime",
    dangerouslyAllowBrowser: false, // set to true only when running in a browser
  },
  client
);

// Listen for events
ws.on("session.created", (event) => {
  console.log("Session created:", event);
});

ws.on("conversation.item.created", (event) => {
  console.log("Item created:", event.item);
});

ws.on("response.audio.delta", (event) => {
  // Handle audio delta
  const audioData = Buffer.from(event.delta, "base64");
  playAudio(audioData);
});

// Send audio
ws.send({
  type: "input_audio_buffer.append",
  audio: audioBase64String,
});

// Commit audio buffer
ws.send({
  type: "input_audio_buffer.commit",
});

Architecture

The Realtime API operates through a WebSocket connection with an event-driven architecture:

  • Session Management: Create ephemeral tokens server-side, connect from client
  • Audio Streaming: Bidirectional PCM16 at 24 kHz, or G.711 μ-law/A-law at 8 kHz
  • Event System: 50+ client-to-server and server-to-client events
  • VAD Integration: Server-side voice activity detection with configurable parameters
  • Conversation Context: Automatic conversation history management
  • Function Calling: Real-time tool execution during conversations
  • Phone Integration: SIP/WebRTC support for phone calls

Capabilities

Session Token Creation

Generate ephemeral session tokens for secure client-side WebSocket connections.

/**
 * Create a Realtime client secret with an associated session configuration.
 * Returns an ephemeral token with a 10-minute default TTL (configurable
 * from 10 seconds up to 2 hours via expires_after).
 */
function create(
  params: ClientSecretCreateParams
): Promise<ClientSecretCreateResponse>;

interface ClientSecretCreateParams {
  /** Configuration for the client secret expiration */
  expires_after?: {
    /** Anchor point for expiration (only 'created_at' is supported) */
    anchor?: "created_at";
    /** Seconds from anchor to expiration (10-7200, defaults to 600) */
    seconds?: number;
  };
  /** Session configuration (realtime or transcription session) */
  session?:
    | RealtimeSessionCreateRequest
    | RealtimeTranscriptionSessionCreateRequest;
}

interface ClientSecretCreateResponse {
  /** Expiration timestamp in seconds since epoch */
  expires_at: number;
  /** The session configuration */
  session: RealtimeSessionCreateResponse | RealtimeTranscriptionSessionCreateResponse;
  /** The generated client secret value */
  value: string;
}

interface RealtimeSessionCreateResponse {
  /** Ephemeral key for client environments */
  client_secret: {
    expires_at: number;
    value: string;
  };
  /** Session type: always 'realtime' */
  type: "realtime";
  /** Audio configuration */
  audio?: {
    input?: {
      format?: RealtimeAudioFormats;
      noise_reduction?: { type?: NoiseReductionType };
      transcription?: AudioTranscription;
      turn_detection?: ServerVad | SemanticVad | null;
    };
    output?: {
      format?: RealtimeAudioFormats;
      speed?: number;
      voice?: string;
    };
  };
  /** Fields to include in server outputs */
  include?: Array<"item.input_audio_transcription.logprobs">;
  /** System instructions for the model */
  instructions?: string;
  /** Max output tokens (1-4096 or 'inf') */
  max_output_tokens?: number | "inf";
  /** Realtime model to use */
  model?: string;
  /** Output modalities ('text' | 'audio') */
  output_modalities?: Array<"text" | "audio">;
  /** Prompt template reference */
  prompt?: ResponsePrompt | null;
  /** Tool choice configuration */
  tool_choice?: ToolChoiceOptions | ToolChoiceFunction | ToolChoiceMcp;
  /** Available tools */
  tools?: Array<RealtimeFunctionTool | McpTool>;
  /** Tracing configuration */
  tracing?: "auto" | TracingConfiguration | null;
  /** Truncation behavior */
  truncation?: RealtimeTruncation;
}

SIP Call Management

Manage incoming and outgoing SIP/WebRTC calls with the Realtime API.

/**
 * Accept an incoming SIP call and configure the realtime session that will handle it
 */
function accept(
  callID: string,
  params: CallAcceptParams,
  options?: RequestOptions
): Promise<void>;

/**
 * End an active Realtime API call, whether it was initiated over SIP or WebRTC
 */
function hangup(
  callID: string,
  options?: RequestOptions
): Promise<void>;

/**
 * Transfer an active SIP call to a new destination using the SIP REFER verb
 */
function refer(
  callID: string,
  params: CallReferParams,
  options?: RequestOptions
): Promise<void>;

/**
 * Decline an incoming SIP call by returning a SIP status code to the caller
 */
function reject(
  callID: string,
  params?: CallRejectParams,
  options?: RequestOptions
): Promise<void>;

interface CallAcceptParams {
  /** The type of session to create. Always 'realtime' for the Realtime API */
  type: "realtime";
  /** Configuration for input and output audio */
  audio?: RealtimeAudioConfig;
  /** Additional fields to include in server outputs */
  include?: Array<"item.input_audio_transcription.logprobs">;
  /** The default system instructions prepended to model calls */
  instructions?: string;
  /** Maximum number of output tokens for a single assistant response (1-4096 or 'inf') */
  max_output_tokens?: number | "inf";
  /** The Realtime model used for this session */
  model?: string;
  /** The set of modalities the model can respond with */
  output_modalities?: Array<"text" | "audio">;
  /** Reference to a prompt template and its variables */
  prompt?: ResponsePrompt | null;
  /** How the model chooses tools */
  tool_choice?: RealtimeToolChoiceConfig;
  /** Tools available to the model */
  tools?: RealtimeToolsConfig;
  /** Tracing configuration for the session */
  tracing?: RealtimeTracingConfig | null;
  /** Truncation behavior when conversation exceeds token limits */
  truncation?: RealtimeTruncation;
}

interface CallReferParams {
  /** URI that should appear in the SIP Refer-To header (e.g., 'tel:+14155550123' or 'sip:agent@example.com') */
  target_uri: string;
}

interface CallRejectParams {
  /** SIP response code to send back to the caller. Defaults to 603 (Decline) when omitted */
  status_code?: number;
}

Available at: client.realtime.calls

Usage Example:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

// Accept incoming call
await client.realtime.calls.accept("call-123", {
  type: "realtime",
  model: "gpt-realtime",
  audio: {
    input: { format: { type: "audio/pcm", rate: 24000 } },
    output: { format: { type: "audio/pcm", rate: 24000 }, voice: "marin" },
  },
  instructions: "You are a helpful phone assistant.",
});

// Hang up call
await client.realtime.calls.hangup("call-123");

// Reject incoming call
await client.realtime.calls.reject("call-123", {
  status_code: 603, // Decline
});

// Transfer call
await client.realtime.calls.refer("call-123", {
  target_uri: "tel:+14155550123",
});

WebSocket Connection

Connect to the Realtime API using WebSocket with the OpenAIRealtimeWebSocket class.

/**
 * WebSocket client for the Realtime API. Handles connection lifecycle,
 * event streaming, and message sending.
 */
class OpenAIRealtimeWebSocket extends OpenAIRealtimeEmitter {
  url: URL;
  socket: WebSocket;

  constructor(
    props: {
      model: string;
      dangerouslyAllowBrowser?: boolean;
      onURL?: (url: URL) => void;
      __resolvedApiKey?: boolean;
    },
    client?: Pick<OpenAI, "apiKey" | "baseURL">
  );

  /**
   * Factory method that resolves API key before connecting
   */
  static create(
    client: Pick<OpenAI, "apiKey" | "baseURL" | "_callApiKey">,
    props: { model: string; dangerouslyAllowBrowser?: boolean }
  ): Promise<OpenAIRealtimeWebSocket>;

  /**
   * Factory method for Azure OpenAI connections
   */
  static azure(
    client: Pick<
      AzureOpenAI,
      "_callApiKey" | "apiVersion" | "apiKey" | "baseURL" | "deploymentName"
    >,
    options?: {
      deploymentName?: string;
      dangerouslyAllowBrowser?: boolean;
    }
  ): Promise<OpenAIRealtimeWebSocket>;

  /**
   * Send a client event to the server
   */
  send(event: RealtimeClientEvent): void;

  /**
   * Close the WebSocket connection
   */
  close(props?: { code: number; reason: string }): void;

  /**
   * Register event listener
   */
  on(event: string, listener: (event: any) => void): void;
}

Usage:

// Standard connection
const ws = await OpenAIRealtimeWebSocket.create(client, {
  model: "gpt-realtime",
});

// Azure connection
const wsAzure = await OpenAIRealtimeWebSocket.azure(azureClient, {
  deploymentName: "my-realtime-deployment",
});

Phone Call Methods

Accept, reject, transfer, and hang up phone calls via SIP integration.

/**
 * Accept an incoming SIP call and configure the realtime session
 */
function accept(callID: string, params: CallAcceptParams): Promise<void>;

/**
 * End an active Realtime API call (SIP or WebRTC)
 */
function hangup(callID: string): Promise<void>;

/**
 * Transfer an active SIP call to a new destination using SIP REFER
 */
function refer(callID: string, params: CallReferParams): Promise<void>;

/**
 * Decline an incoming SIP call with a SIP status code
 */
function reject(
  callID: string,
  params?: CallRejectParams
): Promise<void>;

interface CallAcceptParams {
  type: "realtime";
  audio?: RealtimeAudioConfig;
  include?: Array<"item.input_audio_transcription.logprobs">;
  instructions?: string;
  max_output_tokens?: number | "inf";
  model?: string;
  output_modalities?: Array<"text" | "audio">;
  prompt?: ResponsePrompt | null;
  tool_choice?: RealtimeToolChoiceConfig;
  tools?: RealtimeToolsConfig;
  tracing?: RealtimeTracingConfig | null;
  truncation?: RealtimeTruncation;
}

interface CallReferParams {
  /** URI in SIP Refer-To header (e.g., 'tel:+14155550123') */
  target_uri: string;
}

interface CallRejectParams {
  /** SIP response code (defaults to 603 Decline) */
  status_code?: number;
}

Usage:

// Accept incoming call
await client.realtime.calls.accept("call_abc123", {
  type: "realtime",
  model: "gpt-realtime",
  instructions: "You are a helpful assistant on a phone call.",
  audio: {
    output: { voice: "marin" },
  },
});

// Transfer call
await client.realtime.calls.refer("call_abc123", {
  target_uri: "tel:+14155550199",
});

// Reject call
await client.realtime.calls.reject("call_abc123", {
  status_code: 486, // Busy Here
});

// Hang up
await client.realtime.calls.hangup("call_abc123");

Session Configuration

Configure session parameters including audio formats, VAD, and model settings.

interface RealtimeSession {
  id?: string;
  expires_at?: number;
  /** Fields to include in server outputs */
  include?: Array<"item.input_audio_transcription.logprobs"> | null;
  /** Input audio format: 'pcm16', 'g711_ulaw', or 'g711_alaw' */
  input_audio_format?: "pcm16" | "g711_ulaw" | "g711_alaw";
  /** Noise reduction configuration */
  input_audio_noise_reduction?: {
    type?: NoiseReductionType;
  };
  /** Transcription configuration */
  input_audio_transcription?: AudioTranscription | null;
  /** System instructions */
  instructions?: string;
  /** Max output tokens per response */
  max_response_output_tokens?: number | "inf";
  /** Response modalities */
  modalities?: Array<"text" | "audio">;
  /** Model identifier */
  model?: string;
  object?: "realtime.session";
  /** Output audio format */
  output_audio_format?: "pcm16" | "g711_ulaw" | "g711_alaw";
  /** Prompt template reference */
  prompt?: ResponsePrompt | null;
  /** Audio playback speed (0.25-1.5) */
  speed?: number;
  /** Sampling temperature (0.6-1.2) */
  temperature?: number;
  /** Tool choice mode */
  tool_choice?: string;
  /** Available tools */
  tools?: Array<RealtimeFunctionTool>;
  /** Tracing configuration */
  tracing?: "auto" | TracingConfiguration | null;
  /** Turn detection configuration */
  turn_detection?: RealtimeAudioInputTurnDetection | null;
  /** Truncation behavior */
  truncation?: RealtimeTruncation;
  /** Output voice */
  voice?: string;
}

interface AudioTranscription {
  /** Language code (ISO-639-1, e.g., 'en') */
  language?: string;
  /** Transcription model */
  model?:
    | "whisper-1"
    | "gpt-4o-mini-transcribe"
    | "gpt-4o-transcribe"
    | "gpt-4o-transcribe-diarize";
  /** Transcription guidance prompt */
  prompt?: string;
}

type NoiseReductionType = "near_field" | "far_field";

type RealtimeTruncation =
  | "auto"
  | "disabled"
  | {
      type: "retention_ratio";
      /** Fraction of max context to retain (0.0-1.0) */
      retention_ratio: number;
    };

Turn Detection (VAD)

Configure voice activity detection for automatic turn taking.

/**
 * Server VAD: Simple volume-based voice activity detection
 */
interface ServerVad {
  type: "server_vad";
  /** Auto-generate response on VAD stop */
  create_response?: boolean;
  /** Timeout for prompting user to continue (ms) */
  idle_timeout_ms?: number | null;
  /** Auto-interrupt on VAD start */
  interrupt_response?: boolean;
  /** Audio prefix padding (ms, default: 300) */
  prefix_padding_ms?: number;
  /** Silence duration to detect stop (ms, default: 500) */
  silence_duration_ms?: number;
  /** VAD activation threshold (0.0-1.0, default: 0.5) */
  threshold?: number;
}

/**
 * Semantic VAD: Model-based turn detection with dynamic timeouts
 */
interface SemanticVad {
  type: "semantic_vad";
  /** Auto-generate response on VAD stop */
  create_response?: boolean;
  /** Eagerness: 'low' (8s), 'medium' (4s), 'high' (2s), 'auto' */
  eagerness?: "low" | "medium" | "high" | "auto";
  /** Auto-interrupt on VAD start */
  interrupt_response?: boolean;
}

type RealtimeAudioInputTurnDetection = ServerVad | SemanticVad;

Usage:

// Server VAD with custom settings
{
  type: "server_vad",
  threshold: 0.6,
  silence_duration_ms: 700,
  prefix_padding_ms: 300,
  interrupt_response: true,
  create_response: true,
  idle_timeout_ms: 30000
}

// Semantic VAD for natural conversations
{
  type: "semantic_vad",
  eagerness: "medium",
  interrupt_response: true,
  create_response: true
}

// Manual turn detection (no VAD)
{
  turn_detection: null
}
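The three configurations above can be wrapped behind a coarse preference switch; a sketch (turnDetectionFor and the slimmed-down types are illustrative, not SDK API):

```typescript
// Pick a turn_detection value from a coarse latency/naturalness preference.
type TurnDetection =
  | { type: "server_vad"; threshold: number; silence_duration_ms: number }
  | { type: "semantic_vad"; eagerness: "low" | "medium" | "high" | "auto" }
  | null;

function turnDetectionFor(mode: "manual" | "fast" | "natural"): TurnDetection {
  switch (mode) {
    case "manual":
      return null; // caller commits the input buffer explicitly
    case "fast":
      return { type: "server_vad", threshold: 0.5, silence_duration_ms: 300 };
    case "natural":
      return { type: "semantic_vad", eagerness: "auto" };
  }
}
```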

Audio Formats

Configure input and output audio formats for the session.

/**
 * PCM 16-bit audio at 24kHz sample rate
 */
interface AudioPCM {
  type?: "audio/pcm";
  rate?: 24000;
}

/**
 * G.711 μ-law format (commonly used in telephony)
 */
interface AudioPCMU {
  type?: "audio/pcmu";
}

/**
 * G.711 A-law format (commonly used in telephony)
 */
interface AudioPCMA {
  type?: "audio/pcma";
}

type RealtimeAudioFormats = AudioPCM | AudioPCMU | AudioPCMA;

interface RealtimeAudioConfig {
  input?: {
    format?: RealtimeAudioFormats;
    noise_reduction?: { type?: NoiseReductionType };
    transcription?: AudioTranscription;
    turn_detection?: RealtimeAudioInputTurnDetection | null;
  };
  output?: {
    format?: RealtimeAudioFormats;
    /** Playback speed multiplier (0.25-1.5) */
    speed?: number;
    /** Voice selection */
    voice?:
      | string
      | "alloy"
      | "ash"
      | "ballad"
      | "coral"
      | "echo"
      | "sage"
      | "shimmer"
      | "verse"
      | "marin"
      | "cedar";
  };
}
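Format choice drives bandwidth: audio/pcm here is 16-bit mono at 24 kHz, while the G.711 variants are 8-bit at 8 kHz. A quick comparison (bytesPerSecond is an illustrative helper):

```typescript
// Approximate wire rate for each supported audio format:
// audio/pcm         -> 24000 samples/s * 2 bytes = 48000 B/s
// G.711 u-law/A-law ->  8000 samples/s * 1 byte  =  8000 B/s
function bytesPerSecond(format: "audio/pcm" | "audio/pcmu" | "audio/pcma"): number {
  return format === "audio/pcm" ? 24000 * 2 : 8000;
}
```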

Client-to-Server Events

Events sent from client to server to control the conversation.

/**
 * Union of all client events
 */
type RealtimeClientEvent =
  | ConversationItemCreateEvent
  | ConversationItemDeleteEvent
  | ConversationItemRetrieveEvent
  | ConversationItemTruncateEvent
  | InputAudioBufferAppendEvent
  | InputAudioBufferClearEvent
  | OutputAudioBufferClearEvent
  | InputAudioBufferCommitEvent
  | ResponseCancelEvent
  | ResponseCreateEvent
  | SessionUpdateEvent;

/**
 * Add conversation item (message, function call, or output)
 */
interface ConversationItemCreateEvent {
  type: "conversation.item.create";
  item: ConversationItem;
  event_id?: string;
  /** Insert after this item ID ('root' for beginning) */
  previous_item_id?: string;
}

/**
 * Delete conversation item by ID
 */
interface ConversationItemDeleteEvent {
  type: "conversation.item.delete";
  item_id: string;
  event_id?: string;
}

/**
 * Retrieve full item including audio data
 */
interface ConversationItemRetrieveEvent {
  type: "conversation.item.retrieve";
  item_id: string;
  event_id?: string;
}

/**
 * Truncate assistant audio message
 */
interface ConversationItemTruncateEvent {
  type: "conversation.item.truncate";
  item_id: string;
  content_index: number;
  /** Duration to keep in milliseconds */
  audio_end_ms: number;
  event_id?: string;
}
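conversation.item.truncate is typically used on barge-in: when the user interrupts, truncate the assistant's audio item to what was actually played. Computing audio_end_ms from played PCM16 bytes (playedMs is an illustrative helper):

```typescript
// Convert played PCM16 bytes into milliseconds for audio_end_ms.
// 2 bytes per sample; sampleRate/1000 samples per millisecond.
function playedMs(bytesPlayed: number, sampleRate = 24000): number {
  return Math.floor(bytesPlayed / 2 / (sampleRate / 1000));
}
```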

/**
 * Append audio to input buffer
 */
interface InputAudioBufferAppendEvent {
  type: "input_audio_buffer.append";
  /** Base64-encoded audio bytes */
  audio: string;
  event_id?: string;
}

/**
 * Clear input audio buffer
 */
interface InputAudioBufferClearEvent {
  type: "input_audio_buffer.clear";
  event_id?: string;
}

/**
 * Commit input audio buffer to conversation
 */
interface InputAudioBufferCommitEvent {
  type: "input_audio_buffer.commit";
  event_id?: string;
}

/**
 * WebRTC only: Clear output audio buffer
 */
interface OutputAudioBufferClearEvent {
  type: "output_audio_buffer.clear";
  event_id?: string;
}

/**
 * Cancel in-progress response
 */
interface ResponseCancelEvent {
  type: "response.cancel";
  event_id?: string;
}

/**
 * Request model response
 */
interface ResponseCreateEvent {
  type: "response.create";
  response?: {
    modalities?: Array<"text" | "audio">;
    instructions?: string;
    voice?: string;
    output_audio_format?: string;
    tools?: Array<RealtimeFunctionTool>;
    tool_choice?: string;
    temperature?: number;
    max_output_tokens?: number | "inf";
    conversation?: "auto" | "none";
    metadata?: Record<string, string>;
    input?: Array<ConversationItemWithReference>;
  };
  event_id?: string;
}

/**
 * Update session configuration
 */
interface SessionUpdateEvent {
  type: "session.update";
  session: Partial<RealtimeSession>;
  event_id?: string;
}

Server-to-Client Events

Events sent from server to client during the conversation.

/**
 * Union of all server events (50+ event types)
 */
type RealtimeServerEvent =
  | ConversationCreatedEvent
  | ConversationItemCreatedEvent
  | ConversationItemDeletedEvent
  | ConversationItemAdded
  | ConversationItemDone
  | ConversationItemRetrieved
  | ConversationItemTruncatedEvent
  | ConversationItemInputAudioTranscriptionCompletedEvent
  | ConversationItemInputAudioTranscriptionDeltaEvent
  | ConversationItemInputAudioTranscriptionFailedEvent
  | ConversationItemInputAudioTranscriptionSegment
  | InputAudioBufferClearedEvent
  | InputAudioBufferCommittedEvent
  | InputAudioBufferSpeechStartedEvent
  | InputAudioBufferSpeechStoppedEvent
  | InputAudioBufferTimeoutTriggered
  | OutputAudioBufferStarted
  | OutputAudioBufferStopped
  | OutputAudioBufferCleared
  | ResponseCreatedEvent
  | ResponseDoneEvent
  | ResponseOutputItemAddedEvent
  | ResponseOutputItemDoneEvent
  | ResponseContentPartAddedEvent
  | ResponseContentPartDoneEvent
  | ResponseAudioDeltaEvent
  | ResponseAudioDoneEvent
  | ResponseAudioTranscriptDeltaEvent
  | ResponseAudioTranscriptDoneEvent
  | ResponseTextDeltaEvent
  | ResponseTextDoneEvent
  | ResponseFunctionCallArgumentsDeltaEvent
  | ResponseFunctionCallArgumentsDoneEvent
  | ResponseMcpCallArgumentsDelta
  | ResponseMcpCallArgumentsDone
  | ResponseMcpCallInProgress
  | ResponseMcpCallCompleted
  | ResponseMcpCallFailed
  | McpListToolsInProgress
  | McpListToolsCompleted
  | McpListToolsFailed
  | SessionCreatedEvent
  | SessionUpdatedEvent
  | RateLimitsUpdatedEvent
  | RealtimeErrorEvent;

/**
 * Session created (first event after connection)
 */
interface SessionCreatedEvent {
  type: "session.created";
  event_id: string;
  session: RealtimeSession;
}

/**
 * Session updated after client session.update
 */
interface SessionUpdatedEvent {
  type: "session.updated";
  event_id: string;
  session: RealtimeSession;
}

/**
 * Conversation created
 */
interface ConversationCreatedEvent {
  type: "conversation.created";
  event_id: string;
  conversation: {
    id?: string;
    object?: "realtime.conversation";
  };
}

/**
 * Item created in conversation
 */
interface ConversationItemCreatedEvent {
  type: "conversation.item.created";
  event_id: string;
  item: ConversationItem;
  previous_item_id?: string | null;
}

/**
 * Item added to conversation (may have partial content)
 */
interface ConversationItemAdded {
  type: "conversation.item.added";
  event_id: string;
  item: ConversationItem;
  previous_item_id?: string | null;
}

/**
 * Item finalized with complete content
 */
interface ConversationItemDone {
  type: "conversation.item.done";
  event_id: string;
  item: ConversationItem;
  previous_item_id?: string | null;
}

/**
 * Input audio buffer committed
 */
interface InputAudioBufferCommittedEvent {
  type: "input_audio_buffer.committed";
  event_id: string;
  item_id: string;
  previous_item_id?: string | null;
}

/**
 * Speech detected in input buffer (VAD start)
 */
interface InputAudioBufferSpeechStartedEvent {
  type: "input_audio_buffer.speech_started";
  event_id: string;
  item_id: string;
  /** Milliseconds from session start */
  audio_start_ms: number;
}

/**
 * Speech ended in input buffer (VAD stop)
 */
interface InputAudioBufferSpeechStoppedEvent {
  type: "input_audio_buffer.speech_stopped";
  event_id: string;
  item_id: string;
  /** Milliseconds from session start */
  audio_end_ms: number;
}

/**
 * Response started
 */
interface ResponseCreatedEvent {
  type: "response.created";
  event_id: string;
  response: RealtimeResponse;
}

/**
 * Response completed
 */
interface ResponseDoneEvent {
  type: "response.done";
  event_id: string;
  response: RealtimeResponse;
}

/**
 * Audio delta (streaming audio chunk)
 */
interface ResponseAudioDeltaEvent {
  type: "response.audio.delta";
  event_id: string;
  response_id: string;
  item_id: string;
  output_index: number;
  content_index: number;
  /** Base64-encoded audio bytes */
  delta: string;
}

/**
 * Audio generation completed
 */
interface ResponseAudioDoneEvent {
  type: "response.audio.done";
  event_id: string;
  response_id: string;
  item_id: string;
  output_index: number;
  content_index: number;
}

/**
 * Text delta (streaming text chunk)
 */
interface ResponseTextDeltaEvent {
  type: "response.text.delta";
  event_id: string;
  response_id: string;
  item_id: string;
  output_index: number;
  content_index: number;
  /** Text chunk */
  delta: string;
}

/**
 * Text generation completed
 */
interface ResponseTextDoneEvent {
  type: "response.text.done";
  event_id: string;
  response_id: string;
  item_id: string;
  output_index: number;
  content_index: number;
  /** Complete text */
  text: string;
}

/**
 * Function call arguments delta
 */
interface ResponseFunctionCallArgumentsDeltaEvent {
  type: "response.function_call_arguments.delta";
  event_id: string;
  response_id: string;
  item_id: string;
  output_index: number;
  call_id: string;
  /** JSON arguments chunk */
  delta: string;
}

/**
 * Function call arguments completed
 */
interface ResponseFunctionCallArgumentsDoneEvent {
  type: "response.function_call_arguments.done";
  event_id: string;
  response_id: string;
  item_id: string;
  output_index: number;
  call_id: string;
  /** Complete JSON arguments */
  arguments: string;
}

/**
 * Error occurred
 */
interface RealtimeErrorEvent {
  type: "error";
  event_id: string;
  error: {
    type: string;
    code?: string | null;
    message: string;
    param?: string | null;
    event_id?: string | null;
  };
}


Conversation Items

Items that make up the conversation history.

/**
 * Union of all conversation item types
 */
type ConversationItem =
  | RealtimeConversationItemSystemMessage
  | RealtimeConversationItemUserMessage
  | RealtimeConversationItemAssistantMessage
  | RealtimeConversationItemFunctionCall
  | RealtimeConversationItemFunctionCallOutput
  | RealtimeMcpApprovalResponse
  | RealtimeMcpListTools
  | RealtimeMcpToolCall
  | RealtimeMcpApprovalRequest;

/**
 * System message item
 */
interface RealtimeConversationItemSystemMessage {
  type: "message";
  role: "system";
  content: Array<{
    type?: "input_text";
    text?: string;
  }>;
  id?: string;
  object?: "realtime.item";
  status?: "completed" | "incomplete" | "in_progress";
}

/**
 * User message item (text, audio, or image)
 */
interface RealtimeConversationItemUserMessage {
  type: "message";
  role: "user";
  content: Array<{
    type?: "input_text" | "input_audio" | "input_image";
    text?: string;
    audio?: string; // Base64-encoded
    transcript?: string;
    image_url?: string; // Data URI
    detail?: "auto" | "low" | "high";
  }>;
  id?: string;
  object?: "realtime.item";
  status?: "completed" | "incomplete" | "in_progress";
}

/**
 * Assistant message item (text or audio)
 */
interface RealtimeConversationItemAssistantMessage {
  type: "message";
  role: "assistant";
  content: Array<{
    type?: "output_text" | "output_audio";
    text?: string;
    audio?: string; // Base64-encoded
    transcript?: string;
  }>;
  id?: string;
  object?: "realtime.item";
  status?: "completed" | "incomplete" | "in_progress";
}

/**
 * Function call item
 */
interface RealtimeConversationItemFunctionCall {
  type: "function_call";
  name: string;
  /** JSON-encoded arguments */
  arguments: string;
  call_id?: string;
  id?: string;
  object?: "realtime.item";
  status?: "completed" | "incomplete" | "in_progress";
}

/**
 * Function call output item
 */
interface RealtimeConversationItemFunctionCallOutput {
  type: "function_call_output";
  call_id: string;
  /** Function output (free text) */
  output: string;
  id?: string;
  object?: "realtime.item";
  status?: "completed" | "incomplete" | "in_progress";
}

/**
 * MCP tool call item
 */
interface RealtimeMcpToolCall {
  type: "mcp_call";
  id: string;
  server_label: string;
  name: string;
  arguments: string;
  output?: string | null;
  error?:
    | { type: "protocol_error"; code: number; message: string }
    | { type: "tool_execution_error"; message: string }
    | { type: "http_error"; code: number; message: string }
    | null;
  approval_request_id?: string | null;
}

/**
 * MCP approval request item
 */
interface RealtimeMcpApprovalRequest {
  type: "mcp_approval_request";
  id: string;
  server_label: string;
  name: string;
  arguments: string;
}

/**
 * MCP approval response item
 */
interface RealtimeMcpApprovalResponse {
  type: "mcp_approval_response";
  id: string;
  approval_request_id: string;
  approve: boolean;
  reason?: string | null;
}
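
When an `mcp_approval_request` item arrives, the client answers by creating an `mcp_approval_response` item referencing the request's `id`. A hypothetical helper for building that client event (the item's own `id` is assumed to be assigned server-side and is therefore omitted here):

```typescript
// Build a conversation.item.create event that answers an MCP approval
// request. `buildApprovalResponse` is an illustrative helper, not SDK API.
function buildApprovalResponse(
  request: { id: string },
  approve: boolean,
  reason?: string
) {
  return {
    type: "conversation.item.create" as const,
    item: {
      type: "mcp_approval_response" as const,
      approval_request_id: request.id,
      approve,
      reason: reason ?? null,
    },
  };
}
```

Send the result with `ws.send(...)`; if approved, the server proceeds with the corresponding `mcp_call`.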

Function Calling

Define and use tools during real-time conversations.

/**
 * Function tool definition for realtime conversations
 */
interface RealtimeFunctionTool {
  type?: "function";
  /** Function name */
  name?: string;
  /** Description and usage guidance */
  description?: string;
  /** JSON Schema for function parameters */
  parameters?: unknown;
}

/**
 * MCP (Model Context Protocol) tool configuration
 */
interface McpTool {
  type: "mcp";
  /** Label identifying the MCP server */
  server_label: string;
  /** MCP server URL or connector ID */
  server_url?: string;
  connector_id?:
    | "connector_dropbox"
    | "connector_gmail"
    | "connector_googlecalendar"
    | "connector_googledrive"
    | "connector_microsoftteams"
    | "connector_outlookcalendar"
    | "connector_outlookemail"
    | "connector_sharepoint";
  /** Server description */
  server_description?: string;
  /** Allowed tools filter */
  allowed_tools?:
    | Array<string>
    | {
        tool_names?: Array<string>;
        read_only?: boolean;
      }
    | null;
  /** Approval requirements */
  require_approval?:
    | "always"
    | "never"
    | {
        always?: { tool_names?: Array<string>; read_only?: boolean };
        never?: { tool_names?: Array<string>; read_only?: boolean };
      }
    | null;
  /** OAuth access token */
  authorization?: string;
  /** HTTP headers */
  headers?: Record<string, string> | null;
}

type RealtimeToolsConfig = Array<RealtimeFunctionTool | McpTool>;

type RealtimeToolChoiceConfig =
  | "auto"
  | "none"
  | "required"
  | { type: "function"; function: { name: string } }
  | { type: "mcp"; mcp: { server_label: string; name: string } };

Usage:

// Define tools
const tools: RealtimeToolsConfig = [
  {
    type: "function",
    name: "get_weather",
    description: "Get current weather for a location",
    parameters: {
      type: "object",
      properties: {
        location: { type: "string" },
        unit: { type: "string", enum: ["celsius", "fahrenheit"] },
      },
      required: ["location"],
    },
  },
  {
    type: "mcp",
    server_label: "calendar",
    connector_id: "connector_googlecalendar",
    allowed_tools: {
      tool_names: ["list_events", "create_event"],
    },
  },
];

// Update session with tools
ws.send({
  type: "session.update",
  session: {
    tools,
    tool_choice: "auto",
  },
});

// Handle function call
ws.on("response.function_call_arguments.done", async (event) => {
  const result = await executeFunction(event.call_id, event.arguments);

  // Send function output
  ws.send({
    type: "conversation.item.create",
    item: {
      type: "function_call_output",
      call_id: event.call_id,
      output: JSON.stringify(result),
    },
  });

  // Trigger new response
  ws.send({
    type: "response.create",
  });
});

Response Configuration

Configure individual response parameters.

/**
 * Response resource
 */
interface RealtimeResponse {
  id?: string;
  object?: "realtime.response";
  /** Conversation ID */
  conversation_id?: string;
  /** Status: 'in_progress', 'completed', 'cancelled', 'failed', 'incomplete' */
  status?: RealtimeResponseStatus;
  /** Usage statistics */
  usage?: RealtimeResponseUsage;
  /** Max output tokens */
  max_output_tokens?: number | "inf";
  /** Response modalities */
  modalities?: Array<"text" | "audio">;
  /** Instructions for this response */
  instructions?: string;
  /** Voice selection */
  voice?: string;
  /** Audio output configuration */
  audio?: {
    format?: RealtimeAudioFormats;
    speed?: number;
    voice?: string;
  };
  /** Response metadata */
  metadata?: Record<string, string> | null;
  /** Tool choice */
  tool_choice?: RealtimeToolChoiceConfig;
  /** Tools for this response */
  tools?: RealtimeToolsConfig;
  /** Temperature */
  temperature?: number;
  /** Output items */
  output?: Array<ConversationItem>;
  /** Status details */
  status_details?: {
    type?: "incomplete" | "failed" | "cancelled";
    reason?: string;
    error?: RealtimeError | null;
  } | null;
}

interface RealtimeResponseStatus {
  type: "in_progress" | "completed" | "cancelled" | "failed" | "incomplete";
  /** Additional status information */
  reason?: string;
}

interface RealtimeResponseUsage {
  /** Total tokens (input + output) */
  total_tokens?: number;
  /** Input tokens */
  input_tokens?: number;
  /** Output tokens */
  output_tokens?: number;
  /** Input token breakdown */
  input_token_details?: {
    text_tokens?: number;
    audio_tokens?: number;
    image_tokens?: number;
    cached_tokens?: number;
    cached_tokens_details?: {
      text_tokens?: number;
      audio_tokens?: number;
      image_tokens?: number;
    };
  };
  /** Output token breakdown */
  output_token_details?: {
    text_tokens?: number;
    audio_tokens?: number;
  };
}
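
Because every field in the usage breakdown is optional, a small helper that defaults missing counts to zero keeps per-modality accounting simple (an illustrative sketch, not part of the SDK):

```typescript
// Sum text and audio token counts across input and output details of a
// RealtimeResponseUsage-shaped object, treating absent fields as zero.
function tokenBreakdown(usage: {
  input_token_details?: { text_tokens?: number; audio_tokens?: number };
  output_token_details?: { text_tokens?: number; audio_tokens?: number };
}) {
  const i = usage.input_token_details ?? {};
  const o = usage.output_token_details ?? {};
  return {
    text: (i.text_tokens ?? 0) + (o.text_tokens ?? 0),
    audio: (i.audio_tokens ?? 0) + (o.audio_tokens ?? 0),
  };
}
```

Audio tokens are typically priced differently from text tokens, so tracking them separately is useful for cost monitoring.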

Transcription

Configure and receive audio transcription during conversations.

/**
 * Transcription configuration
 */
interface AudioTranscription {
  /** Language code (ISO-639-1) */
  language?: string;
  /** Transcription model */
  model?:
    | "whisper-1"
    | "gpt-4o-mini-transcribe"
    | "gpt-4o-transcribe"
    | "gpt-4o-transcribe-diarize";
  /** Guidance prompt */
  prompt?: string;
}

/**
 * Transcription completed event
 */
interface ConversationItemInputAudioTranscriptionCompletedEvent {
  type: "conversation.item.input_audio_transcription.completed";
  event_id: string;
  item_id: string;
  content_index: number;
  /** Transcribed text */
  transcript: string;
  /** Usage statistics */
  usage:
    | {
        type: "tokens";
        input_tokens: number;
        output_tokens: number;
        total_tokens: number;
        input_token_details?: {
          text_tokens?: number;
          audio_tokens?: number;
        };
      }
    | {
        type: "duration";
        /** Duration in seconds */
        seconds: number;
      };
  /** Log probabilities (if enabled) */
  logprobs?: Array<{
    token: string;
    logprob: number;
    bytes: Array<number>;
  }> | null;
}

/**
 * Transcription delta event (streaming)
 */
interface ConversationItemInputAudioTranscriptionDeltaEvent {
  type: "conversation.item.input_audio_transcription.delta";
  event_id: string;
  item_id: string;
  content_index?: number;
  /** Transcript chunk */
  delta?: string;
  /** Log probabilities (if enabled) */
  logprobs?: Array<{
    token: string;
    logprob: number;
    bytes: Array<number>;
  }> | null;
}

/**
 * Transcription segment (for diarization)
 */
interface ConversationItemInputAudioTranscriptionSegment {
  type: "conversation.item.input_audio_transcription.segment";
  event_id: string;
  item_id: string;
  content_index: number;
  id: string;
  /** Segment text */
  text: string;
  /** Speaker label */
  speaker: string;
  /** Start time in seconds */
  start: number;
  /** End time in seconds */
  end: number;
}

/**
 * Transcription failed event
 */
interface ConversationItemInputAudioTranscriptionFailedEvent {
  type: "conversation.item.input_audio_transcription.failed";
  event_id: string;
  item_id: string;
  content_index: number;
  error: {
    type?: string;
    code?: string;
    message?: string;
    param?: string;
  };
}

Usage:

// Enable transcription with log probabilities
ws.send({
  type: "session.update",
  session: {
    input_audio_transcription: {
      model: "gpt-4o-transcribe",
      language: "en",
    },
    include: ["item.input_audio_transcription.logprobs"],
  },
});

// Listen for transcription
ws.on("conversation.item.input_audio_transcription.delta", (event) => {
  console.log("Transcript delta:", event.delta);
});

ws.on(
  "conversation.item.input_audio_transcription.completed",
  (event) => {
    console.log("Full transcript:", event.transcript);
    console.log("Usage:", event.usage);
  }
);

// Diarization support
ws.send({
  type: "session.update",
  session: {
    input_audio_transcription: {
      model: "gpt-4o-transcribe-diarize",
    },
  },
});

ws.on(
  "conversation.item.input_audio_transcription.segment",
  (event) => {
    console.log(
      `[${event.speaker}] ${event.text} (${event.start}s - ${event.end}s)`
    );
  }
);

Error Handling

Handle errors and edge cases in real-time conversations.

/**
 * Error event from server
 */
interface RealtimeErrorEvent {
  type: "error";
  event_id: string;
  error: RealtimeError;
}

interface RealtimeError {
  /** Error type */
  type: string;
  /** Error code (optional) */
  code?: string | null;
  /** Human-readable message */
  message: string;
  /** Related parameter (optional) */
  param?: string | null;
  /** Client event ID that caused error (optional) */
  event_id?: string | null;
}

/**
 * OpenAI Realtime error class
 */
class OpenAIRealtimeError extends Error {
  constructor(message: string);
}

Common Error Types:

// Invalid request errors
{
  type: "invalid_request_error",
  code: "invalid_value",
  message: "Invalid value for 'audio_format'",
  param: "audio_format"
}

// Server errors
{
  type: "server_error",
  message: "Internal server error"
}

// Rate limit errors
{
  type: "rate_limit_error",
  message: "Rate limit exceeded"
}

Usage:

ws.on("error", (event: RealtimeErrorEvent) => {
  console.error("Realtime error:", event.error);

  if (event.error.type === "rate_limit_error") {
    // Handle rate limiting
  } else if (event.error.type === "invalid_request_error") {
    // Handle validation errors
    console.error("Invalid:", event.error.param, event.error.message);
  }
});

// WebSocket errors
ws.socket.addEventListener("error", (error) => {
  console.error("WebSocket error:", error);
});

Rate Limits

Monitor rate limits during conversations.

/**
 * Rate limits updated event
 */
interface RateLimitsUpdatedEvent {
  type: "rate_limits.updated";
  event_id: string;
  rate_limits: Array<{
    /** Rate limit name: 'requests' or 'tokens' */
    name?: "requests" | "tokens";
    /** Maximum allowed value */
    limit?: number;
    /** Remaining before limit reached */
    remaining?: number;
    /** Seconds until reset */
    reset_seconds?: number;
  }>;
}

Usage:

ws.on("rate_limits.updated", (event: RateLimitsUpdatedEvent) => {
  event.rate_limits.forEach((limit) => {
    console.log(`${limit.name}: ${limit.remaining}/${limit.limit}`);
    console.log(`Resets in ${limit.reset_seconds}s`);
  });
});

Tracing

Configure distributed tracing for debugging and monitoring.

/**
 * Tracing configuration
 */
type RealtimeTracingConfig =
  | "auto"
  | {
      /** Workflow name in Traces Dashboard */
      workflow_name?: string;
      /** Group ID for filtering */
      group_id?: string;
      /** Arbitrary metadata */
      metadata?: unknown;
    }
  | null;

Usage:

// Auto tracing with defaults
ws.send({
  type: "session.update",
  session: { tracing: "auto" },
});

// Custom tracing configuration
ws.send({
  type: "session.update",
  session: {
    tracing: {
      workflow_name: "customer-support-bot",
      group_id: "prod-us-west",
      metadata: {
        customer_id: "cust_123",
        agent_version: "2.1.0",
      },
    },
  },
});

// Disable tracing
ws.send({
  type: "session.update",
  session: { tracing: null },
});

Complete Example: Voice Assistant

import OpenAI from "openai";
import { OpenAIRealtimeWebSocket } from "openai/realtime/websocket";

const client = new OpenAI();

// Create session token
const secret = await client.realtime.clientSecrets.create({
  session: {
    type: "realtime",
    model: "gpt-realtime",
    audio: {
      input: {
        format: { type: "audio/pcm", rate: 24000 },
        turn_detection: {
          type: "server_vad",
          threshold: 0.5,
          silence_duration_ms: 500,
          interrupt_response: true,
        },
        transcription: {
          model: "gpt-4o-transcribe",
        },
      },
      output: {
        format: { type: "audio/pcm", rate: 24000 },
        voice: "marin",
      },
    },
    instructions:
      "You are a helpful voice assistant. Speak naturally and concisely.",
    tools: [
      {
        type: "function",
        name: "get_weather",
        description: "Get weather for a location",
        parameters: {
          type: "object",
          properties: {
            location: { type: "string" },
          },
          required: ["location"],
        },
      },
    ],
  },
});

// Connect WebSocket
const ws = await OpenAIRealtimeWebSocket.create(client, {
  model: "gpt-realtime",
});

// Handle session
ws.on("session.created", (event) => {
  console.log("Session created:", event.session.id);
});

// Handle conversation
ws.on("conversation.item.created", (event) => {
  console.log("Item created:", event.item.type);
});

// Handle audio output
ws.on("response.audio.delta", (event) => {
  const audioData = Buffer.from(event.delta, "base64");
  playAudio(audioData); // Play to speaker
});

// Handle transcripts
ws.on("conversation.item.input_audio_transcription.completed", (event) => {
  console.log("User said:", event.transcript);
});

ws.on("response.audio_transcript.delta", (event) => {
  process.stdout.write(event.delta);
});

// Handle VAD
ws.on("input_audio_buffer.speech_started", () => {
  console.log("User started speaking");
  stopAudioPlayback(); // Interrupt assistant
});

ws.on("input_audio_buffer.speech_stopped", () => {
  console.log("User stopped speaking");
});

// Handle function calls
ws.on("response.function_call_arguments.done", async (event) => {
  console.log("Function call:", event.call_id);

  const args = JSON.parse(event.arguments);
  const result = await getWeather(args.location);

  // Send result
  ws.send({
    type: "conversation.item.create",
    item: {
      type: "function_call_output",
      call_id: event.call_id,
      output: JSON.stringify(result),
    },
  });

  // Continue conversation
  ws.send({
    type: "response.create",
  });
});

// Handle errors
ws.on("error", (event) => {
  console.error("Error:", event.error.message);
});

// Capture and send microphone audio
const audioStream = captureMicrophone();
audioStream.on("data", (chunk) => {
  const base64 = chunk.toString("base64");
  ws.send({
    type: "input_audio_buffer.append",
    audio: base64,
  });
});

// Cleanup
process.on("SIGINT", () => {
  ws.close();
  process.exit(0);
});

Complete Example: Phone Call Handler

import OpenAI from "openai";
import express from "express";

const client = new OpenAI();
const app = express();

app.use(express.json());

// Webhook for incoming calls
app.post("/realtime/webhook/incoming_call", async (req, res) => {
  const event = req.body;

  if (event.type === "realtime.call.incoming") {
    const callId = event.data.id;

    // Accept the call
    await client.realtime.calls.accept(callId, {
      type: "realtime",
      model: "gpt-realtime",
      instructions:
        "You are a customer service agent. Be professional and helpful.",
      audio: {
        input: {
          format: { type: "audio/pcmu" }, // G.711 for telephony
          turn_detection: {
            type: "server_vad",
            silence_duration_ms: 700,
          },
        },
        output: {
          format: { type: "audio/pcmu" },
          voice: "marin",
        },
      },
      tools: [
        {
          type: "function",
          name: "transfer_to_agent",
          description: "Transfer to human agent",
          parameters: {
            type: "object",
            properties: {
              reason: { type: "string" },
            },
          },
        },
      ],
    });

    console.log(`Accepted call: ${callId}`);
  }

  res.sendStatus(200);
});

// Webhook for call events
app.post("/realtime/webhook/call_events", async (req, res) => {
  const event = req.body;

  if (event.type === "realtime.response.function_call_output.done") {
    const { call_id, function_name, arguments: args } = event.data;

    if (function_name === "transfer_to_agent") {
      // Transfer call
      await client.realtime.calls.refer(call_id, {
        target_uri: "sip:support@example.com",
      });
    }
  }

  res.sendStatus(200);
});

app.listen(3000, () => {
  console.log("Webhook server running on port 3000");
});

Type Reference

Core Types

type RealtimeClientEvent =
  | ConversationItemCreateEvent
  | ConversationItemDeleteEvent
  | ConversationItemRetrieveEvent
  | ConversationItemTruncateEvent
  | InputAudioBufferAppendEvent
  | InputAudioBufferClearEvent
  | OutputAudioBufferClearEvent
  | InputAudioBufferCommitEvent
  | ResponseCancelEvent
  | ResponseCreateEvent
  | SessionUpdateEvent;

type RealtimeServerEvent =
  | ConversationCreatedEvent
  | ConversationItemCreatedEvent
  | ConversationItemDeletedEvent
  | ConversationItemAdded
  | ConversationItemDone
  | ConversationItemRetrieved
  | ConversationItemTruncatedEvent
  | ConversationItemInputAudioTranscriptionCompletedEvent
  | ConversationItemInputAudioTranscriptionDeltaEvent
  | ConversationItemInputAudioTranscriptionFailedEvent
  | ConversationItemInputAudioTranscriptionSegment
  | InputAudioBufferClearedEvent
  | InputAudioBufferCommittedEvent
  | InputAudioBufferSpeechStartedEvent
  | InputAudioBufferSpeechStoppedEvent
  | InputAudioBufferTimeoutTriggered
  | OutputAudioBufferStarted
  | OutputAudioBufferStopped
  | OutputAudioBufferCleared
  | ResponseCreatedEvent
  | ResponseDoneEvent
  | ResponseOutputItemAddedEvent
  | ResponseOutputItemDoneEvent
  | ResponseContentPartAddedEvent
  | ResponseContentPartDoneEvent
  | ResponseAudioDeltaEvent
  | ResponseAudioDoneEvent
  | ResponseAudioTranscriptDeltaEvent
  | ResponseAudioTranscriptDoneEvent
  | ResponseTextDeltaEvent
  | ResponseTextDoneEvent
  | ResponseFunctionCallArgumentsDeltaEvent
  | ResponseFunctionCallArgumentsDoneEvent
  | ResponseMcpCallArgumentsDelta
  | ResponseMcpCallArgumentsDone
  | ResponseMcpCallInProgress
  | ResponseMcpCallCompleted
  | ResponseMcpCallFailed
  | McpListToolsInProgress
  | McpListToolsCompleted
  | McpListToolsFailed
  | SessionCreatedEvent
  | SessionUpdatedEvent
  | RateLimitsUpdatedEvent
  | RealtimeErrorEvent;

type ConversationItem =
  | RealtimeConversationItemSystemMessage
  | RealtimeConversationItemUserMessage
  | RealtimeConversationItemAssistantMessage
  | RealtimeConversationItemFunctionCall
  | RealtimeConversationItemFunctionCallOutput
  | RealtimeMcpApprovalResponse
  | RealtimeMcpListTools
  | RealtimeMcpToolCall
  | RealtimeMcpApprovalRequest;

interface RealtimeSession {
  id?: string;
  object?: "realtime.session";
  model?: string;
  expires_at?: number;
  modalities?: Array<"text" | "audio">;
  instructions?: string;
  voice?: string;
  input_audio_format?: "pcm16" | "g711_ulaw" | "g711_alaw";
  output_audio_format?: "pcm16" | "g711_ulaw" | "g711_alaw";
  input_audio_transcription?: AudioTranscription | null;
  turn_detection?: RealtimeAudioInputTurnDetection | null;
  tools?: Array<RealtimeFunctionTool>;
  tool_choice?: string;
  temperature?: number;
  max_response_output_tokens?: number | "inf";
  speed?: number;
  input_audio_noise_reduction?: {
    type?: NoiseReductionType;
  };
  include?: Array<"item.input_audio_transcription.logprobs"> | null;
  prompt?: ResponsePrompt | null;
  tracing?: RealtimeTracingConfig | null;
  truncation?: RealtimeTruncation;
}

interface RealtimeResponse {
  id?: string;
  object?: "realtime.response";
  status?: RealtimeResponseStatus;
  conversation_id?: string;
  output?: Array<ConversationItem>;
  usage?: RealtimeResponseUsage;
  status_details?: {
    type?: "incomplete" | "failed" | "cancelled";
    reason?: string;
    error?: RealtimeError | null;
  } | null;
  max_output_tokens?: number | "inf";
  modalities?: Array<"text" | "audio">;
  instructions?: string;
  voice?: string;
  audio?: {
    format?: RealtimeAudioFormats;
    speed?: number;
    voice?: string;
  };
  metadata?: Record<string, string> | null;
  tool_choice?: RealtimeToolChoiceConfig;
  tools?: RealtimeToolsConfig;
  temperature?: number;
}

interface AudioTranscription {
  language?: string;
  model?:
    | "whisper-1"
    | "gpt-4o-mini-transcribe"
    | "gpt-4o-transcribe"
    | "gpt-4o-transcribe-diarize";
  prompt?: string;
}

type RealtimeAudioFormats =
  | { type?: "audio/pcm"; rate?: 24000 }
  | { type?: "audio/pcmu" }
  | { type?: "audio/pcma" };

type NoiseReductionType = "near_field" | "far_field";

type RealtimeAudioInputTurnDetection =
  | {
      type: "server_vad";
      threshold?: number;
      prefix_padding_ms?: number;
      silence_duration_ms?: number;
      create_response?: boolean;
      interrupt_response?: boolean;
      idle_timeout_ms?: number | null;
    }
  | {
      type: "semantic_vad";
      eagerness?: "low" | "medium" | "high" | "auto";
      create_response?: boolean;
      interrupt_response?: boolean;
    };

type RealtimeTruncation =
  | "auto"
  | "disabled"
  | { type: "retention_ratio"; retention_ratio: number };

type RealtimeToolsConfig = Array<RealtimeFunctionTool | McpTool>;

type RealtimeToolChoiceConfig =
  | "auto"
  | "none"
  | "required"
  | { type: "function"; function: { name: string } }
  | { type: "mcp"; mcp: { server_label: string; name: string } };

type RealtimeTracingConfig =
  | "auto"
  | {
      workflow_name?: string;
      group_id?: string;
      metadata?: unknown;
    }
  | null;

interface RealtimeError {
  type: string;
  code?: string | null;
  message: string;
  param?: string | null;
  event_id?: string | null;
}

interface RealtimeResponseUsage {
  total_tokens?: number;
  input_tokens?: number;
  output_tokens?: number;
  input_token_details?: {
    text_tokens?: number;
    audio_tokens?: number;
    image_tokens?: number;
    cached_tokens?: number;
    cached_tokens_details?: {
      text_tokens?: number;
      audio_tokens?: number;
      image_tokens?: number;
    };
  };
  output_token_details?: {
    text_tokens?: number;
    audio_tokens?: number;
  };
}

interface RealtimeResponseStatus {
  type: "in_progress" | "completed" | "cancelled" | "failed" | "incomplete";
  reason?: string;
}

Models

Available Realtime API models:

  • gpt-realtime (latest)
  • gpt-realtime-2025-08-28
  • gpt-4o-realtime-preview
  • gpt-4o-realtime-preview-2024-10-01
  • gpt-4o-realtime-preview-2024-12-17
  • gpt-4o-realtime-preview-2025-06-03
  • gpt-4o-mini-realtime-preview
  • gpt-4o-mini-realtime-preview-2024-12-17
  • gpt-realtime-mini
  • gpt-realtime-mini-2025-10-06
  • gpt-audio-mini
  • gpt-audio-mini-2025-10-06

Best Practices

Security

  • Never expose API keys in browser: Always use ephemeral session tokens
  • Token expiration: Default 10 minutes, max 2 hours
  • Server-side validation: Validate all tool calls server-side
  • Rate limiting: Monitor rate limit events and handle gracefully

Performance

  • Audio chunking: Send audio in chunks (1-5 seconds recommended)
  • VAD tuning: Adjust threshold and silence duration for your environment
  • Voice selection: Use marin or cedar for best quality
  • Caching: Enable context caching for repeated conversations
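
The VAD-tuning bullet can be made concrete. The configuration below raises the detection threshold and lengthens the silence window for a noisy environment; the values are illustrative starting points to experiment with, not official recommendations:

```typescript
// Illustrative server VAD tuning for a noisy room.
const turnDetection = {
  type: "server_vad" as const,
  threshold: 0.7,           // 0-1; higher means less sensitive to sound
  prefix_padding_ms: 300,   // audio retained before detected speech start
  silence_duration_ms: 800, // silence required before the turn ends
};
```

Pass this as the `turn_detection` value in your session configuration (under `audio.input` in the GA session shape shown in the examples above).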

Audio Quality

  • Noise reduction: Enable for far-field or noisy environments
  • Sample rate: Always use 24kHz for PCM audio
  • Format selection: Use G.711 (pcmu/pcma) for telephony, PCM for quality
  • Interrupt handling: Clear audio buffers on interruption

Conversation Management

  • Context length: Monitor token usage, configure truncation
  • Function calling: Keep tool outputs concise
  • System messages: Use for mid-conversation context updates
  • Item ordering: Use previous_item_id for precise insertion
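
The item-ordering bullet can be illustrated with a hypothetical insertion event; `"item_123"` is a placeholder for an existing item's id:

```typescript
// Hypothetical: insert a system message immediately after a known item by
// setting previous_item_id on conversation.item.create.
const insertAfterItem = {
  type: "conversation.item.create" as const,
  previous_item_id: "item_123", // placeholder id of the preceding item
  item: {
    type: "message" as const,
    role: "system" as const,
    content: [
      {
        type: "input_text" as const,
        text: "The user has upgraded to the premium plan.",
      },
    ],
  },
};
```

Without `previous_item_id`, created items are appended to the end of the conversation.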

Error Handling

  • Graceful degradation: Handle WebSocket disconnections
  • Retry logic: Implement exponential backoff for transient errors
  • Error logging: Log all error events for debugging
  • User feedback: Provide clear feedback on connection/processing status
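
A capped exponential backoff for reconnect attempts can be as small as one function; this is a sketch to wire into your own reconnect loop, not SDK-provided behavior:

```typescript
// Delay before reconnect attempt N (0-based): base * 2^N, capped at maxMs.
// With the defaults, delays run 500, 1000, 2000, ... up to 30 seconds.
function backoffDelay(attempt: number, baseMs = 500, maxMs = 30_000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}
```

Adding a small random jitter to each delay avoids synchronized reconnect storms when many clients drop at once.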

Common Patterns

Voice-to-Voice Assistant

const ws = await OpenAIRealtimeWebSocket.create(client, {
  model: "gpt-realtime",
});

// Microphone → Input Buffer
micStream.on("data", (chunk) => {
  ws.send({
    type: "input_audio_buffer.append",
    audio: chunk.toString("base64"),
  });
});

// Output Audio → Speaker
ws.on("response.audio.delta", (event) => {
  playAudio(Buffer.from(event.delta, "base64"));
});

// VAD-based interruption
ws.on("input_audio_buffer.speech_started", () => {
  stopPlayback();
});

Text-to-Voice Assistant

// Send text message
ws.send({
  type: "conversation.item.create",
  item: {
    type: "message",
    role: "user",
    content: [{ type: "input_text", text: "Hello!" }],
  },
});

// Request audio response
ws.send({
  type: "response.create",
  response: {
    modalities: ["audio"],
  },
});

Streaming Transcripts

ws.on("response.audio_transcript.delta", (event) => {
  updateSubtitles(event.delta);
});

ws.on("conversation.item.input_audio_transcription.delta", (event) => {
  updateUserTranscript(event.delta);
});

Multi-Tool Assistant

const tools = [
  {
    type: "function",
    name: "search_database",
    description: "Search customer database",
    parameters: {
      /* ... */
    },
  },
  {
    type: "mcp",
    server_label: "calendar",
    connector_id: "connector_googlecalendar",
  },
];

ws.send({
  type: "session.update",
  session: { tools, tool_choice: "auto" },
});
