The official TypeScript library for the OpenAI API
The Realtime API provides WebSocket-based real-time voice conversations with OpenAI models. It supports bidirectional audio streaming, server-side voice activity detection (VAD), function calling, and full conversation management. The API is designed for live voice applications including phone calls, voice assistants, and interactive conversational experiences.
npm install openai

The Realtime API is now generally available (GA) at client.realtime.*.
Deprecation Notice: The legacy beta Realtime API at client.beta.realtime.* is deprecated. If you are using the beta API, migrate to the GA API documented here. The beta API includes:

client.beta.realtime.sessions.create() (deprecated - use client.realtime.clientSecrets.create() instead)
client.beta.realtime.transcriptionSessions.create() (deprecated)

All new projects should use the GA Realtime API (client.realtime.*) documented on this page.
import OpenAI from "openai";
import { OpenAIRealtimeWebSocket } from "openai/realtime/websocket"; // Browser
import { OpenAIRealtimeWS } from "openai/realtime/ws"; // Node.js (requires 'ws' package)

The Realtime API provides two WebSocket client implementations for different runtime environments:
For browser environments, use OpenAIRealtimeWebSocket which uses the native browser WebSocket API.
import { OpenAIRealtimeWebSocket } from "openai/realtime/websocket";
import OpenAI from "openai";
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const ws = new OpenAIRealtimeWebSocket(
{
model: "gpt-realtime",
dangerouslyAllowBrowser: true, // Required for browser use
},
client
);
// Event handling
ws.on("session.created", (event) => {
console.log("Session started:", event.session.id);
});
ws.on("response.audio.delta", (event) => {
// Handle audio deltas - event.delta is base64 encoded audio
const audioData = atob(event.delta);
playAudio(audioData);
});
ws.on("error", (error) => {
console.error("WebSocket error:", error);
});
// Send audio to the server
function sendAudio(audioData: ArrayBuffer) {
  // Encode in chunks: spreading a large buffer into String.fromCharCode
  // can exceed the engine's argument limit and throw a RangeError
  const bytes = new Uint8Array(audioData);
  let binary = "";
  for (let i = 0; i < bytes.length; i += 0x8000) {
    binary += String.fromCharCode(...bytes.subarray(i, i + 0x8000));
  }
  ws.send({
    type: "input_audio_buffer.append",
    audio: btoa(binary),
  });
}
// Commit audio buffer to trigger processing
ws.send({
type: "input_audio_buffer.commit",
});
// Close connection
ws.close();

Key features:

Requires dangerouslyAllowBrowser: true in the configuration.

For Node.js environments, use OpenAIRealtimeWS which uses the ws package for WebSocket support.
import { OpenAIRealtimeWS } from "openai/realtime/ws";
import OpenAI from "openai";
import fs from "fs";
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const ws = new OpenAIRealtimeWS(
{
model: "gpt-realtime",
},
client
);
// Event handling (same interface as browser version)
ws.on("session.created", (event) => {
console.log("Session started:", event.session.id);
});
ws.on("response.audio.delta", (event) => {
// Handle audio deltas
const audioBuffer = Buffer.from(event.delta, "base64");
// Write to file or stream to audio output
fs.appendFileSync("output.pcm", audioBuffer);
});
ws.on("response.done", (event) => {
console.log("Response complete:", event.response.id);
});
// Send audio from file or buffer
function sendAudioFromFile(filePath: string) {
const audioBuffer = fs.readFileSync(filePath);
const base64Audio = audioBuffer.toString("base64");
ws.send({
type: "input_audio_buffer.append",
audio: base64Audio,
});
}
// Trigger response generation
ws.send({
type: "input_audio_buffer.commit",
});
// Close connection
ws.close();

Key features:

Uses the ws package for WebSocket support (add to dependencies: npm install ws @types/ws).

Both WebSocket clients support the same event handling interface:
// Connection events
ws.on("session.created", (event) => { /* Session initialization */ });
ws.on("session.updated", (event) => { /* Session configuration changed */ });
// Conversation events
ws.on("conversation.created", (event) => { /* New conversation */ });
ws.on("conversation.item.created", (event) => { /* New item added */ });
ws.on("conversation.item.deleted", (event) => { /* Item removed */ });
// Audio events (streaming)
ws.on("response.audio.delta", (event) => { /* Audio chunk received */ });
ws.on("response.audio.done", (event) => { /* Audio complete */ });
ws.on("response.audio_transcript.delta", (event) => { /* Transcript chunk */ });
ws.on("response.audio_transcript.done", (event) => { /* Transcript complete */ });
// Response events
ws.on("response.created", (event) => { /* Response started */ });
ws.on("response.done", (event) => { /* Response complete */ });
ws.on("response.cancelled", (event) => { /* Response cancelled */ });
ws.on("response.failed", (event) => { /* Response failed */ });
// Function calling events
ws.on("response.function_call_arguments.delta", (event) => { /* Function args streaming */ });
ws.on("response.function_call_arguments.done", (event) => { /* Function args complete */ });
// Error events
ws.on("error", (error) => { /* WebSocket or API error */ });
ws.on("close", (event) => { /* Connection closed */ });

Both clients use the same .send() method for sending commands:
// Append audio to input buffer
ws.send({
type: "input_audio_buffer.append",
audio: base64AudioString,
});
// Commit audio buffer (triggers VAD or manual processing)
ws.send({
type: "input_audio_buffer.commit",
});
// Clear audio buffer
ws.send({
type: "input_audio_buffer.clear",
});
// Update session configuration
ws.send({
type: "session.update",
session: {
instructions: "You are a helpful assistant.",
turn_detection: { type: "server_vad" },
},
});
// Create conversation item (text message)
ws.send({
type: "conversation.item.create",
item: {
type: "message",
role: "user",
content: [{ type: "input_text", text: "Hello!" }],
},
});
// Trigger response generation
ws.send({
type: "response.create",
response: {
modalities: ["text", "audio"],
instructions: "Respond briefly.",
},
});
// Cancel in-progress response
ws.send({
type: "response.cancel",
});

Both clients handle connection lifecycle automatically:
const ws = new OpenAIRealtimeWS({ model: "gpt-realtime" }, client);
// Connection opens automatically
ws.on("session.created", (event) => {
console.log("Connected and ready");
});
// Handle disconnections
ws.on("close", (event) => {
console.log("Connection closed:", event.code, event.reason);
});
// Handle errors
ws.on("error", (error) => {
console.error("Connection error:", error);
});
// Manually close connection
ws.close();

import OpenAI from "openai";
const client = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
});
// Create an ephemeral session token for client-side use
const response = await client.realtime.clientSecrets.create({
session: {
type: "realtime",
model: "gpt-realtime",
audio: {
input: {
format: { type: "audio/pcm", rate: 24000 },
turn_detection: {
type: "server_vad",
threshold: 0.5,
silence_duration_ms: 500,
},
},
output: {
format: { type: "audio/pcm", rate: 24000 },
voice: "marin",
},
},
},
});
const sessionToken = response.value;

import { OpenAIRealtimeWebSocket } from "openai/realtime/websocket";
const ws = new OpenAIRealtimeWebSocket(
{
model: "gpt-realtime",
dangerouslyAllowBrowser: false,
},
client
);
// Listen for events
ws.on("session.created", (event) => {
console.log("Session created:", event);
});
ws.on("conversation.item.created", (event) => {
console.log("Item created:", event.item);
});
ws.on("response.audio.delta", (event) => {
// Handle audio delta
const audioData = Buffer.from(event.delta, "base64");
playAudio(audioData);
});
// Send audio
ws.send({
type: "input_audio_buffer.append",
audio: audioBase64String,
});
// Commit audio buffer
ws.send({
type: "input_audio_buffer.commit",
});

The Realtime API operates through a WebSocket connection with an event-driven architecture:
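The event-driven flow can be sketched with a stand-in emitter (illustrative only, not the SDK's implementation): handlers are registered per event type with on(), and each incoming server event fans out to its registered handlers. The MiniEmitter name and emit() method are local to this sketch; the real client drives handlers from WebSocket frames.

```typescript
// Illustrative stand-in mirroring the client's on() shape
type Handler = (event: { type: string; [key: string]: unknown }) => void;

class MiniEmitter {
  private handlers = new Map<string, Handler[]>();

  on(type: string, handler: Handler): void {
    const list = this.handlers.get(type) ?? [];
    list.push(handler);
    this.handlers.set(type, list);
  }

  // In the real client this is driven by incoming WebSocket messages
  emit(event: { type: string; [key: string]: unknown }): void {
    for (const handler of this.handlers.get(event.type) ?? []) {
      handler(event);
    }
  }
}

const emitter = new MiniEmitter();
const received: string[] = [];
emitter.on("session.created", (e) => received.push(e.type));
emitter.on("response.done", (e) => received.push(e.type));

// Simulate the server's first event after connection, then a completed response
emitter.emit({ type: "session.created" });
emitter.emit({ type: "response.done" });
```

This mirrors why session.created is the natural "ready" signal: it is the first event the server emits once the connection is established.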
Generate ephemeral session tokens for secure client-side WebSocket connections.
/**
* Create a Realtime client secret with an associated session configuration.
* Returns an ephemeral token with 1-minute default TTL (configurable up to 2 hours).
*/
function create(
params: ClientSecretCreateParams
): Promise<ClientSecretCreateResponse>;
interface ClientSecretCreateParams {
/** Configuration for the client secret expiration */
expires_after?: {
/** Anchor point for expiration (only 'created_at' is supported) */
anchor?: "created_at";
/** Seconds from anchor to expiration (10-7200, defaults to 600) */
seconds?: number;
};
/** Session configuration (realtime or transcription session) */
session?:
| RealtimeSessionCreateRequest
| RealtimeTranscriptionSessionCreateRequest;
}
interface ClientSecretCreateResponse {
/** Expiration timestamp in seconds since epoch */
expires_at: number;
/** The session configuration */
session: RealtimeSessionCreateResponse | RealtimeTranscriptionSessionCreateResponse;
/** The generated client secret value */
value: string;
}
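Because expires_at is a seconds-since-epoch timestamp and the default TTL is short, client code typically checks the secret before connecting. A small illustrative helper (the function name and margin are this sketch's choices, not SDK API):

```typescript
// Decide whether an ephemeral client secret is still usable, given its
// expires_at (seconds since epoch) and a safety margin for connection setup.
function isSecretUsable(expiresAt: number, nowMs: number, marginSeconds = 10): boolean {
  const nowSeconds = Math.floor(nowMs / 1000);
  return expiresAt - nowSeconds > marginSeconds;
}

// With a 600-second TTL, a token checked at mint time is usable...
const mintedAt = 1_700_000_000_000; // fixed timestamp for the example
const expiresAt = Math.floor(mintedAt / 1000) + 600;
const freshOk = isSecretUsable(expiresAt, mintedAt);
// ...but 595 seconds later it falls inside the 10-second margin
const staleOk = isSecretUsable(expiresAt, mintedAt + 595_000);
```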
interface RealtimeSessionCreateResponse {
/** Ephemeral key for client environments */
client_secret: {
expires_at: number;
value: string;
};
/** Session type: always 'realtime' */
type: "realtime";
/** Audio configuration */
audio?: {
input?: {
format?: RealtimeAudioFormats;
noise_reduction?: { type?: NoiseReductionType };
transcription?: AudioTranscription;
turn_detection?: ServerVad | SemanticVad | null;
};
output?: {
format?: RealtimeAudioFormats;
speed?: number;
voice?: string;
};
};
/** Fields to include in server outputs */
include?: Array<"item.input_audio_transcription.logprobs">;
/** System instructions for the model */
instructions?: string;
/** Max output tokens (1-4096 or 'inf') */
max_output_tokens?: number | "inf";
/** Realtime model to use */
model?: string;
/** Output modalities ('text' | 'audio') */
output_modalities?: Array<"text" | "audio">;
/** Prompt template reference */
prompt?: ResponsePrompt | null;
/** Tool choice configuration */
tool_choice?: ToolChoiceOptions | ToolChoiceFunction | ToolChoiceMcp;
/** Available tools */
tools?: Array<RealtimeFunctionTool | McpTool>;
/** Tracing configuration */
tracing?: "auto" | TracingConfiguration | null;
/** Truncation behavior */
truncation?: RealtimeTruncation;
}

Manage incoming and outgoing SIP/WebRTC calls with the Realtime API.
/**
* Accept an incoming SIP call and configure the realtime session that will handle it
*/
function accept(
callID: string,
params: CallAcceptParams,
options?: RequestOptions
): Promise<void>;
/**
* End an active Realtime API call, whether it was initiated over SIP or WebRTC
*/
function hangup(
callID: string,
options?: RequestOptions
): Promise<void>;
/**
* Transfer an active SIP call to a new destination using the SIP REFER verb
*/
function refer(
callID: string,
params: CallReferParams,
options?: RequestOptions
): Promise<void>;
/**
* Decline an incoming SIP call by returning a SIP status code to the caller
*/
function reject(
callID: string,
params?: CallRejectParams,
options?: RequestOptions
): Promise<void>;
interface CallAcceptParams {
/** The type of session to create. Always 'realtime' for the Realtime API */
type: "realtime";
/** Configuration for input and output audio */
audio?: RealtimeAudioConfig;
/** Additional fields to include in server outputs */
include?: Array<"item.input_audio_transcription.logprobs">;
/** The default system instructions prepended to model calls */
instructions?: string;
/** Maximum number of output tokens for a single assistant response (1-4096 or 'inf') */
max_output_tokens?: number | "inf";
/** The Realtime model used for this session */
model?: string;
/** The set of modalities the model can respond with */
output_modalities?: Array<"text" | "audio">;
/** Reference to a prompt template and its variables */
prompt?: ResponsePrompt | null;
/** How the model chooses tools */
tool_choice?: RealtimeToolChoiceConfig;
/** Tools available to the model */
tools?: RealtimeToolsConfig;
/** Tracing configuration for the session */
tracing?: RealtimeTracingConfig | null;
/** Truncation behavior when conversation exceeds token limits */
truncation?: RealtimeTruncation;
}
interface CallReferParams {
/** URI that should appear in the SIP Refer-To header (e.g., 'tel:+14155550123' or 'sip:agent@example.com') */
target_uri: string;
}
interface CallRejectParams {
/** SIP response code to send back to the caller. Defaults to 603 (Decline) when omitted */
status_code?: number;
}

Available at: client.realtime.calls
Usage Example:
import OpenAI from "openai";
const client = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
});
// Accept incoming call
await client.realtime.calls.accept("call-123", {
type: "realtime",
model: "gpt-realtime",
audio: {
input: { format: { type: "audio/pcm", rate: 24000 } },
output: { format: { type: "audio/pcm", rate: 24000 }, voice: "marin" },
},
instructions: "You are a helpful phone assistant.",
});
// Hang up call
await client.realtime.calls.hangup("call-123");
// Reject incoming call
await client.realtime.calls.reject("call-123", {
status_code: 603, // Decline
});
// Transfer call
await client.realtime.calls.refer("call-123", {
target_uri: "tel:+14155550123",
});

Connect to the Realtime API using WebSocket with the OpenAIRealtimeWebSocket class.
/**
* WebSocket client for the Realtime API. Handles connection lifecycle,
* event streaming, and message sending.
*/
class OpenAIRealtimeWebSocket extends OpenAIRealtimeEmitter {
url: URL;
socket: WebSocket;
constructor(
props: {
model: string;
dangerouslyAllowBrowser?: boolean;
onURL?: (url: URL) => void;
__resolvedApiKey?: boolean;
},
client?: Pick<OpenAI, "apiKey" | "baseURL">
);
/**
* Factory method that resolves API key before connecting
*/
static create(
client: Pick<OpenAI, "apiKey" | "baseURL" | "_callApiKey">,
props: { model: string; dangerouslyAllowBrowser?: boolean }
): Promise<OpenAIRealtimeWebSocket>;
/**
* Factory method for Azure OpenAI connections
*/
static azure(
client: Pick<
AzureOpenAI,
"_callApiKey" | "apiVersion" | "apiKey" | "baseURL" | "deploymentName"
>,
options?: {
deploymentName?: string;
dangerouslyAllowBrowser?: boolean;
}
): Promise<OpenAIRealtimeWebSocket>;
/**
* Send a client event to the server
*/
send(event: RealtimeClientEvent): void;
/**
* Close the WebSocket connection
*/
close(props?: { code: number; reason: string }): void;
/**
* Register event listener
*/
on(event: string, listener: (event: any) => void): void;
}

Usage:
// Standard connection
const ws = await OpenAIRealtimeWebSocket.create(client, {
model: "gpt-realtime",
});
// Azure connection
const wsAzure = await OpenAIRealtimeWebSocket.azure(azureClient, {
deploymentName: "my-realtime-deployment",
});

Accept, reject, transfer, and hang up phone calls via SIP integration.
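When rejecting calls, the status_code is a standard SIP response code. A tiny illustrative lookup (local to this sketch, not SDK API) for the codes used in the examples below, with 603 matching the default applied when status_code is omitted:

```typescript
// Reason phrases (per RFC 3261) for SIP codes commonly passed to reject()
const SIP_REJECT_REASONS: Record<number, string> = {
  486: "Busy Here",
  603: "Decline",
};

// 603 Decline is the server's default when no status_code is given
function rejectReason(statusCode = 603): string {
  return SIP_REJECT_REASONS[statusCode] ?? `SIP ${statusCode}`;
}
```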
/**
* Accept an incoming SIP call and configure the realtime session
*/
function accept(callID: string, params: CallAcceptParams): Promise<void>;
/**
* End an active Realtime API call (SIP or WebRTC)
*/
function hangup(callID: string): Promise<void>;
/**
* Transfer an active SIP call to a new destination using SIP REFER
*/
function refer(callID: string, params: CallReferParams): Promise<void>;
/**
* Decline an incoming SIP call with a SIP status code
*/
function reject(
callID: string,
params?: CallRejectParams
): Promise<void>;
interface CallAcceptParams {
type: "realtime";
audio?: RealtimeAudioConfig;
include?: Array<"item.input_audio_transcription.logprobs">;
instructions?: string;
max_output_tokens?: number | "inf";
model?: string;
output_modalities?: Array<"text" | "audio">;
prompt?: ResponsePrompt | null;
tool_choice?: RealtimeToolChoiceConfig;
tools?: RealtimeToolsConfig;
tracing?: RealtimeTracingConfig | null;
truncation?: RealtimeTruncation;
}
interface CallReferParams {
/** URI in SIP Refer-To header (e.g., 'tel:+14155550123') */
target_uri: string;
}
interface CallRejectParams {
/** SIP response code (defaults to 603 Decline) */
status_code?: number;
}

Usage:
// Accept incoming call
await client.realtime.calls.accept("call_abc123", {
type: "realtime",
model: "gpt-realtime",
instructions: "You are a helpful assistant on a phone call.",
audio: {
output: { voice: "marin" },
},
});
// Transfer call
await client.realtime.calls.refer("call_abc123", {
target_uri: "tel:+14155550199",
});
// Reject call
await client.realtime.calls.reject("call_abc123", {
status_code: 486, // Busy Here
});
// Hang up
await client.realtime.calls.hangup("call_abc123");

Configure session parameters including audio formats, VAD, and model settings.
interface RealtimeSession {
id?: string;
expires_at?: number;
/** Fields to include in server outputs */
include?: Array<"item.input_audio_transcription.logprobs"> | null;
/** Input audio format: 'pcm16', 'g711_ulaw', or 'g711_alaw' */
input_audio_format?: "pcm16" | "g711_ulaw" | "g711_alaw";
/** Noise reduction configuration */
input_audio_noise_reduction?: {
type?: NoiseReductionType;
};
/** Transcription configuration */
input_audio_transcription?: AudioTranscription | null;
/** System instructions */
instructions?: string;
/** Max output tokens per response */
max_response_output_tokens?: number | "inf";
/** Response modalities */
modalities?: Array<"text" | "audio">;
/** Model identifier */
model?: string;
object?: "realtime.session";
/** Output audio format */
output_audio_format?: "pcm16" | "g711_ulaw" | "g711_alaw";
/** Prompt template reference */
prompt?: ResponsePrompt | null;
/** Audio playback speed (0.25-1.5) */
speed?: number;
/** Sampling temperature (0.6-1.2) */
temperature?: number;
/** Tool choice mode */
tool_choice?: string;
/** Available tools */
tools?: Array<RealtimeFunctionTool>;
/** Tracing configuration */
tracing?: "auto" | TracingConfiguration | null;
/** Turn detection configuration */
turn_detection?: RealtimeAudioInputTurnDetection | null;
/** Truncation behavior */
truncation?: RealtimeTruncation;
/** Output voice */
voice?: string;
}
interface AudioTranscription {
/** Language code (ISO-639-1, e.g., 'en') */
language?: string;
/** Transcription model */
model?:
| "whisper-1"
| "gpt-4o-mini-transcribe"
| "gpt-4o-transcribe"
| "gpt-4o-transcribe-diarize";
/** Transcription guidance prompt */
prompt?: string;
}
type NoiseReductionType = "near_field" | "far_field";
type RealtimeTruncation =
| "auto"
| "disabled"
| {
type: "retention_ratio";
/** Fraction of max context to retain (0.0-1.0) */
retention_ratio: number;
};

Configure voice activity detection for automatic turn taking.
/**
* Server VAD: Simple volume-based voice activity detection
*/
interface ServerVad {
type: "server_vad";
/** Auto-generate response on VAD stop */
create_response?: boolean;
/** Timeout for prompting user to continue (ms) */
idle_timeout_ms?: number | null;
/** Auto-interrupt on VAD start */
interrupt_response?: boolean;
/** Audio prefix padding (ms, default: 300) */
prefix_padding_ms?: number;
/** Silence duration to detect stop (ms, default: 500) */
silence_duration_ms?: number;
/** VAD activation threshold (0.0-1.0, default: 0.5) */
threshold?: number;
}
/**
* Semantic VAD: Model-based turn detection with dynamic timeouts
*/
interface SemanticVad {
type: "semantic_vad";
/** Auto-generate response on VAD stop */
create_response?: boolean;
/** Eagerness: 'low' (8s), 'medium' (4s), 'high' (2s), 'auto' */
eagerness?: "low" | "medium" | "high" | "auto";
/** Auto-interrupt on VAD start */
interrupt_response?: boolean;
}
type RealtimeAudioInputTurnDetection = ServerVad | SemanticVad;

Usage:
// Server VAD with custom settings
{
type: "server_vad",
threshold: 0.6,
silence_duration_ms: 700,
prefix_padding_ms: 300,
interrupt_response: true,
create_response: true,
idle_timeout_ms: 30000
}
// Semantic VAD for natural conversations
{
type: "semantic_vad",
eagerness: "medium",
interrupt_response: true,
create_response: true
}
// Manual turn detection (no VAD)
{
turn_detection: null
}

Configure input and output audio formats for the session.
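For the audio/pcm format, capture pipelines usually need a conversion step: Web Audio delivers Float32 samples in [-1, 1], while pcm16 expects 16-bit signed integers (which are then base64-encoded for input_audio_buffer.append). A minimal sketch of that conversion; the helper name is this example's, and resampling to 24 kHz is assumed to happen elsewhere:

```typescript
// Convert Float32 samples in [-1, 1] to 16-bit signed PCM
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to valid range
    // scale asymmetrically so both -1 and +1 map into int16 range
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}

const pcm = floatTo16BitPCM(new Float32Array([0, 1, -1, 0.5]));
```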
/**
* PCM 16-bit audio at 24kHz sample rate
*/
interface AudioPCM {
type?: "audio/pcm";
rate?: 24000;
}
/**
* G.711 μ-law format (commonly used in telephony)
*/
interface AudioPCMU {
type?: "audio/pcmu";
}
/**
* G.711 A-law format (commonly used in telephony)
*/
interface AudioPCMA {
type?: "audio/pcma";
}
type RealtimeAudioFormats = AudioPCM | AudioPCMU | AudioPCMA;
interface RealtimeAudioConfig {
input?: {
format?: RealtimeAudioFormats;
noise_reduction?: { type?: NoiseReductionType };
transcription?: AudioTranscription;
turn_detection?: RealtimeAudioInputTurnDetection | null;
};
output?: {
format?: RealtimeAudioFormats;
/** Playback speed multiplier (0.25-1.5) */
speed?: number;
/** Voice selection */
voice?:
| string
| "alloy"
| "ash"
| "ballad"
| "coral"
| "echo"
| "sage"
| "shimmer"
| "verse"
| "marin"
| "cedar";
};
}

Events sent from client to server to control the conversation.
/**
* Union of all client events
*/
type RealtimeClientEvent =
| ConversationItemCreateEvent
| ConversationItemDeleteEvent
| ConversationItemRetrieveEvent
| ConversationItemTruncateEvent
| InputAudioBufferAppendEvent
| InputAudioBufferClearEvent
| OutputAudioBufferClearEvent
| InputAudioBufferCommitEvent
| ResponseCancelEvent
| ResponseCreateEvent
| SessionUpdateEvent;
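Client events are plain objects discriminated by type, so they are easy to construct ahead of sending. An illustrative builder (local to this sketch) for the two events that make up a complete text turn, a user message followed by a response request:

```typescript
// Build a conversation.item.create event carrying a user text message
function userTextItem(text: string) {
  return {
    type: "conversation.item.create" as const,
    item: {
      type: "message" as const,
      role: "user" as const,
      content: [{ type: "input_text" as const, text }],
    },
  };
}

const createEvent = userTextItem("Hello!");
// Sending this second event asks the model to respond to the new item
const responseEvent = { type: "response.create" as const };
```

In application code these objects would be passed to ws.send() in sequence.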
/**
* Add conversation item (message, function call, or output)
*/
interface ConversationItemCreateEvent {
type: "conversation.item.create";
item: ConversationItem;
event_id?: string;
/** Insert after this item ID ('root' for beginning) */
previous_item_id?: string;
}
/**
* Delete conversation item by ID
*/
interface ConversationItemDeleteEvent {
type: "conversation.item.delete";
item_id: string;
event_id?: string;
}
/**
* Retrieve full item including audio data
*/
interface ConversationItemRetrieveEvent {
type: "conversation.item.retrieve";
item_id: string;
event_id?: string;
}
/**
* Truncate assistant audio message
*/
interface ConversationItemTruncateEvent {
type: "conversation.item.truncate";
item_id: string;
content_index: number;
/** Duration to keep in milliseconds */
audio_end_ms: number;
event_id?: string;
}
/**
* Append audio to input buffer
*/
interface InputAudioBufferAppendEvent {
type: "input_audio_buffer.append";
/** Base64-encoded audio bytes */
audio: string;
event_id?: string;
}
/**
* Clear input audio buffer
*/
interface InputAudioBufferClearEvent {
type: "input_audio_buffer.clear";
event_id?: string;
}
/**
* Commit input audio buffer to conversation
*/
interface InputAudioBufferCommitEvent {
type: "input_audio_buffer.commit";
event_id?: string;
}
/**
* WebRTC only: Clear output audio buffer
*/
interface OutputAudioBufferClearEvent {
type: "output_audio_buffer.clear";
event_id?: string;
}
/**
* Cancel in-progress response
*/
interface ResponseCancelEvent {
type: "response.cancel";
event_id?: string;
}
/**
* Request model response
*/
interface ResponseCreateEvent {
type: "response.create";
response?: {
modalities?: Array<"text" | "audio">;
instructions?: string;
voice?: string;
output_audio_format?: string;
tools?: Array<RealtimeFunctionTool>;
tool_choice?: string;
temperature?: number;
max_output_tokens?: number | "inf";
conversation?: "auto" | "none";
metadata?: Record<string, string>;
input?: Array<ConversationItemWithReference>;
};
event_id?: string;
}
/**
* Update session configuration
*/
interface SessionUpdateEvent {
type: "session.update";
session: Partial<RealtimeSession>;
event_id?: string;
}

Events sent from server to client during the conversation.
/**
* Union of all server events (50+ event types)
*/
type RealtimeServerEvent =
| ConversationCreatedEvent
| ConversationItemCreatedEvent
| ConversationItemDeletedEvent
| ConversationItemAdded
| ConversationItemDone
| ConversationItemRetrieved
| ConversationItemTruncatedEvent
| ConversationItemInputAudioTranscriptionCompletedEvent
| ConversationItemInputAudioTranscriptionDeltaEvent
| ConversationItemInputAudioTranscriptionFailedEvent
| ConversationItemInputAudioTranscriptionSegment
| InputAudioBufferClearedEvent
| InputAudioBufferCommittedEvent
| InputAudioBufferSpeechStartedEvent
| InputAudioBufferSpeechStoppedEvent
| InputAudioBufferTimeoutTriggered
| OutputAudioBufferStarted
| OutputAudioBufferStopped
| OutputAudioBufferCleared
| ResponseCreatedEvent
| ResponseDoneEvent
| ResponseOutputItemAddedEvent
| ResponseOutputItemDoneEvent
| ResponseContentPartAddedEvent
| ResponseContentPartDoneEvent
| ResponseAudioDeltaEvent
| ResponseAudioDoneEvent
| ResponseAudioTranscriptDeltaEvent
| ResponseAudioTranscriptDoneEvent
| ResponseTextDeltaEvent
| ResponseTextDoneEvent
| ResponseFunctionCallArgumentsDeltaEvent
| ResponseFunctionCallArgumentsDoneEvent
| ResponseMcpCallArgumentsDelta
| ResponseMcpCallArgumentsDone
| ResponseMcpCallInProgress
| ResponseMcpCallCompleted
| ResponseMcpCallFailed
| McpListToolsInProgress
| McpListToolsCompleted
| McpListToolsFailed
| SessionCreatedEvent
| SessionUpdatedEvent
| RateLimitsUpdatedEvent
| RealtimeErrorEvent;
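Because every server event carries a literal type field, TypeScript narrows the union inside a switch. A sketch using a pared-down local union (not the full SDK types) to show the pattern:

```typescript
// Pared-down stand-in for three server event shapes
type DemoServerEvent =
  | { type: "response.text.delta"; delta: string }
  | { type: "response.text.done"; text: string }
  | { type: "error"; error: { message: string } };

function describe(event: DemoServerEvent): string {
  switch (event.type) {
    case "response.text.delta":
      return `delta:${event.delta}`; // narrowed: delta is available here
    case "response.text.done":
      return `done:${event.text}`; // narrowed: text is available here
    case "error":
      return `error:${event.error.message}`;
  }
}

const deltaMsg = describe({ type: "response.text.delta", delta: "Hi" });
const errorMsg = describe({ type: "error", error: { message: "bad request" } });
```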
/**
* Session created (first event after connection)
*/
interface SessionCreatedEvent {
type: "session.created";
event_id: string;
session: RealtimeSession;
}
/**
* Session updated after client session.update
*/
interface SessionUpdatedEvent {
type: "session.updated";
event_id: string;
session: RealtimeSession;
}
/**
* Conversation created
*/
interface ConversationCreatedEvent {
type: "conversation.created";
event_id: string;
conversation: {
id?: string;
object?: "realtime.conversation";
};
}
/**
* Item created in conversation
*/
interface ConversationItemCreatedEvent {
type: "conversation.item.created";
event_id: string;
item: ConversationItem;
previous_item_id?: string | null;
}
/**
* Item added to conversation (may have partial content)
*/
interface ConversationItemAdded {
type: "conversation.item.added";
event_id: string;
item: ConversationItem;
previous_item_id?: string | null;
}
/**
* Item finalized with complete content
*/
interface ConversationItemDone {
type: "conversation.item.done";
event_id: string;
item: ConversationItem;
previous_item_id?: string | null;
}
/**
* Input audio buffer committed
*/
interface InputAudioBufferCommittedEvent {
type: "input_audio_buffer.committed";
event_id: string;
item_id: string;
previous_item_id?: string | null;
}
/**
* Speech detected in input buffer (VAD start)
*/
interface InputAudioBufferSpeechStartedEvent {
type: "input_audio_buffer.speech_started";
event_id: string;
item_id: string;
/** Milliseconds from session start */
audio_start_ms: number;
}
/**
* Speech ended in input buffer (VAD stop)
*/
interface InputAudioBufferSpeechStoppedEvent {
type: "input_audio_buffer.speech_stopped";
event_id: string;
item_id: string;
/** Milliseconds from session start */
audio_end_ms: number;
}
/**
* Response started
*/
interface ResponseCreatedEvent {
type: "response.created";
event_id: string;
response: RealtimeResponse;
}
/**
* Response completed
*/
interface ResponseDoneEvent {
type: "response.done";
event_id: string;
response: RealtimeResponse;
}
/**
* Audio delta (streaming audio chunk)
*/
interface ResponseAudioDeltaEvent {
type: "response.audio.delta";
event_id: string;
response_id: string;
item_id: string;
output_index: number;
content_index: number;
/** Base64-encoded audio bytes */
delta: string;
}
/**
* Audio generation completed
*/
interface ResponseAudioDoneEvent {
type: "response.audio.done";
event_id: string;
response_id: string;
item_id: string;
output_index: number;
content_index: number;
}
/**
* Text delta (streaming text chunk)
*/
interface ResponseTextDeltaEvent {
type: "response.text.delta";
event_id: string;
response_id: string;
item_id: string;
output_index: number;
content_index: number;
/** Text chunk */
delta: string;
}
/**
* Text generation completed
*/
interface ResponseTextDoneEvent {
type: "response.text.done";
event_id: string;
response_id: string;
item_id: string;
output_index: number;
content_index: number;
/** Complete text */
text: string;
}
/**
* Function call arguments delta
*/
interface ResponseFunctionCallArgumentsDeltaEvent {
type: "response.function_call_arguments.delta";
event_id: string;
response_id: string;
item_id: string;
output_index: number;
call_id: string;
/** JSON arguments chunk */
delta: string;
}
/**
* Function call arguments completed
*/
interface ResponseFunctionCallArgumentsDoneEvent {
type: "response.function_call_arguments.done";
event_id: string;
response_id: string;
item_id: string;
output_index: number;
call_id: string;
/** Complete JSON arguments */
arguments: string;
}
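Function-call arguments stream as JSON fragments in the delta events and arrive complete in the done event, so clients commonly buffer per call_id and parse once at the end. An illustrative accumulator (helper names are this sketch's, not SDK API):

```typescript
// Per-call buffers for streamed JSON argument fragments
const argBuffers = new Map<string, string>();

// Called for each response.function_call_arguments.delta event
function onArgsDelta(callId: string, delta: string): void {
  argBuffers.set(callId, (argBuffers.get(callId) ?? "") + delta);
}

// Called on response.function_call_arguments.done; the done event also
// carries the full string, so this mainly guards against missed deltas
function onArgsDone(callId: string): unknown {
  const json = argBuffers.get(callId) ?? "";
  argBuffers.delete(callId);
  return JSON.parse(json);
}

onArgsDelta("call_1", '{"location":');
onArgsDelta("call_1", '"Paris"}');
const args = onArgsDone("call_1") as { location: string };
```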
/**
* Error occurred
*/
interface RealtimeErrorEvent {
type: "error";
event_id: string;
error: {
type: string;
code?: string | null;
message: string;
param?: string | null;
event_id?: string | null;
};
}

Items that make up the conversation history.
/**
* Union of all conversation item types
*/
type ConversationItem =
| RealtimeConversationItemSystemMessage
| RealtimeConversationItemUserMessage
| RealtimeConversationItemAssistantMessage
| RealtimeConversationItemFunctionCall
| RealtimeConversationItemFunctionCallOutput
| RealtimeMcpApprovalResponse
| RealtimeMcpListTools
| RealtimeMcpToolCall
| RealtimeMcpApprovalRequest;
/**
* System message item
*/
interface RealtimeConversationItemSystemMessage {
type: "message";
role: "system";
content: Array<{
type?: "input_text";
text?: string;
}>;
id?: string;
object?: "realtime.item";
status?: "completed" | "incomplete" | "in_progress";
}
/**
* User message item (text, audio, or image)
*/
interface RealtimeConversationItemUserMessage {
type: "message";
role: "user";
content: Array<{
type?: "input_text" | "input_audio" | "input_image";
text?: string;
audio?: string; // Base64-encoded
transcript?: string;
image_url?: string; // Data URI
detail?: "auto" | "low" | "high";
}>;
id?: string;
object?: "realtime.item";
status?: "completed" | "incomplete" | "in_progress";
}
/**
* Assistant message item (text or audio)
*/
interface RealtimeConversationItemAssistantMessage {
type: "message";
role: "assistant";
content: Array<{
type?: "output_text" | "output_audio";
text?: string;
audio?: string; // Base64-encoded
transcript?: string;
}>;
id?: string;
object?: "realtime.item";
status?: "completed" | "incomplete" | "in_progress";
}
/**
* Function call item
*/
interface RealtimeConversationItemFunctionCall {
type: "function_call";
name: string;
/** JSON-encoded arguments */
arguments: string;
call_id?: string;
id?: string;
object?: "realtime.item";
status?: "completed" | "incomplete" | "in_progress";
}
/**
* Function call output item
*/
interface RealtimeConversationItemFunctionCallOutput {
type: "function_call_output";
call_id: string;
/** Function output (free text) */
output: string;
id?: string;
object?: "realtime.item";
status?: "completed" | "incomplete" | "in_progress";
}
/**
* MCP tool call item
*/
interface RealtimeMcpToolCall {
type: "mcp_call";
id: string;
server_label: string;
name: string;
arguments: string;
output?: string | null;
error?:
| { type: "protocol_error"; code: number; message: string }
| { type: "tool_execution_error"; message: string }
| { type: "http_error"; code: number; message: string }
| null;
approval_request_id?: string | null;
}
/**
* MCP approval request item
*/
interface RealtimeMcpApprovalRequest {
type: "mcp_approval_request";
id: string;
server_label: string;
name: string;
arguments: string;
}
/**
* MCP approval response item
*/
interface RealtimeMcpApprovalResponse {
type: "mcp_approval_response";
id: string;
approval_request_id: string;
approve: boolean;
reason?: string | null;
}

Define and use tools during real-time conversations.
/**
* Function tool definition for realtime conversations
*/
interface RealtimeFunctionTool {
type?: "function";
/** Function name */
name?: string;
/** Description and usage guidance */
description?: string;
/** JSON Schema for function parameters */
parameters?: unknown;
}
/**
* MCP (Model Context Protocol) tool configuration
*/
interface McpTool {
type: "mcp";
/** Label identifying the MCP server */
server_label: string;
/** MCP server URL or connector ID */
server_url?: string;
connector_id?:
| "connector_dropbox"
| "connector_gmail"
| "connector_googlecalendar"
| "connector_googledrive"
| "connector_microsoftteams"
| "connector_outlookcalendar"
| "connector_outlookemail"
| "connector_sharepoint";
/** Server description */
server_description?: string;
/** Allowed tools filter */
allowed_tools?:
| Array<string>
| {
tool_names?: Array<string>;
read_only?: boolean;
}
| null;
/** Approval requirements */
require_approval?:
| "always"
| "never"
| {
always?: { tool_names?: Array<string>; read_only?: boolean };
never?: { tool_names?: Array<string>; read_only?: boolean };
}
| null;
/** OAuth access token */
authorization?: string;
/** HTTP headers */
headers?: Record<string, string> | null;
}
type RealtimeToolsConfig = Array<RealtimeFunctionTool | McpTool>;
type RealtimeToolChoiceConfig =
| "auto"
| "none"
| "required"
| { type: "function"; function: { name: string } }
  | { type: "mcp"; mcp: { server_label: string; name: string } };

Usage:
// Define tools
const tools: RealtimeToolsConfig = [
{
type: "function",
name: "get_weather",
description: "Get current weather for a location",
parameters: {
type: "object",
properties: {
location: { type: "string" },
unit: { type: "string", enum: ["celsius", "fahrenheit"] },
},
required: ["location"],
},
},
{
type: "mcp",
server_label: "calendar",
connector_id: "connector_googlecalendar",
allowed_tools: {
tool_names: ["list_events", "create_event"],
},
},
];
// Update session with tools
ws.send({
type: "session.update",
session: {
tools,
tool_choice: "auto",
},
});
// Handle function call
ws.on("response.function_call_arguments.done", async (event) => {
const result = await executeFunction(event.call_id, event.arguments);
// Send function output
ws.send({
type: "conversation.item.create",
item: {
type: "function_call_output",
call_id: event.call_id,
output: JSON.stringify(result),
},
});
// Trigger new response
ws.send({
type: "response.create",
});
});

Configure individual response parameters.
/**
* Response resource
*/
interface RealtimeResponse {
id?: string;
object?: "realtime.response";
/** Conversation ID or null */
conversation_id?: string;
/** Status: 'in_progress', 'completed', 'cancelled', 'failed', 'incomplete' */
status?: RealtimeResponseStatus;
/** Usage statistics */
usage?: RealtimeResponseUsage;
/** Max output tokens */
max_output_tokens?: number | "inf";
/** Response modalities */
modalities?: Array<"text" | "audio">;
/** Instructions for this response */
instructions?: string;
/** Voice selection */
voice?: string;
/** Audio output configuration */
audio?: {
format?: RealtimeAudioFormats;
speed?: number;
voice?: string;
};
/** Response metadata */
metadata?: Record<string, string> | null;
/** Tool choice */
tool_choice?: RealtimeToolChoiceConfig;
/** Tools for this response */
tools?: RealtimeToolsConfig;
/** Temperature */
temperature?: number;
/** Output items */
output?: Array<ConversationItem>;
/** Status details */
status_details?: {
type?: "incomplete" | "failed" | "cancelled";
reason?: string;
error?: RealtimeError | null;
} | null;
}
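Per-response settings are supplied on the `response.create` client event rather than on the session. A hedged sketch of that payload, reusing field names from `RealtimeResponse` above (the exact set of fields accepted at response-creation time may differ from the response resource, so treat this as illustrative):

```typescript
// Hedged sketch: overriding session defaults for a single response.
const responseCreate = {
  type: "response.create" as const,
  response: {
    // One-off instructions for just this turn.
    instructions: "Answer in one short sentence.",
    modalities: ["audio"] as Array<"text" | "audio">,
    tool_choice: "none" as const, // skip tools for this turn
    metadata: { turn_kind: "clarification" }, // arbitrary string map
  },
};
// ws.send(responseCreate);
```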
interface RealtimeResponseStatus {
type: "in_progress" | "completed" | "cancelled" | "failed" | "incomplete";
/** Additional status information */
reason?: string;
}
interface RealtimeResponseUsage {
/** Total tokens (input + output) */
total_tokens?: number;
/** Input tokens */
input_tokens?: number;
/** Output tokens */
output_tokens?: number;
/** Input token breakdown */
input_token_details?: {
text_tokens?: number;
audio_tokens?: number;
image_tokens?: number;
cached_tokens?: number;
cached_tokens_details?: {
text_tokens?: number;
audio_tokens?: number;
image_tokens?: number;
};
};
/** Output token breakdown */
output_token_details?: {
text_tokens?: number;
audio_tokens?: number;
};
}

Configure and receive audio transcription during conversations.
/**
* Transcription configuration
*/
interface AudioTranscription {
/** Language code (ISO-639-1) */
language?: string;
/** Transcription model */
model?:
| "whisper-1"
| "gpt-4o-mini-transcribe"
| "gpt-4o-transcribe"
| "gpt-4o-transcribe-diarize";
/** Guidance prompt */
prompt?: string;
}
/**
* Transcription completed event
*/
interface ConversationItemInputAudioTranscriptionCompletedEvent {
type: "conversation.item.input_audio_transcription.completed";
event_id: string;
item_id: string;
content_index: number;
/** Transcribed text */
transcript: string;
/** Usage statistics */
usage:
| {
type: "tokens";
input_tokens: number;
output_tokens: number;
total_tokens: number;
input_token_details?: {
text_tokens?: number;
audio_tokens?: number;
};
}
| {
type: "duration";
/** Duration in seconds */
seconds: number;
};
/** Log probabilities (if enabled) */
logprobs?: Array<{
token: string;
logprob: number;
bytes: Array<number>;
}> | null;
}
/**
* Transcription delta event (streaming)
*/
interface ConversationItemInputAudioTranscriptionDeltaEvent {
type: "conversation.item.input_audio_transcription.delta";
event_id: string;
item_id: string;
content_index?: number;
/** Transcript chunk */
delta?: string;
/** Log probabilities (if enabled) */
logprobs?: Array<{
token: string;
logprob: number;
bytes: Array<number>;
}> | null;
}
/**
* Transcription segment (for diarization)
*/
interface ConversationItemInputAudioTranscriptionSegment {
type: "conversation.item.input_audio_transcription.segment";
event_id: string;
item_id: string;
content_index: number;
id: string;
/** Segment text */
text: string;
/** Speaker label */
speaker: string;
/** Start time in seconds */
start: number;
/** End time in seconds */
end: number;
}
/**
* Transcription failed event
*/
interface ConversationItemInputAudioTranscriptionFailedEvent {
type: "conversation.item.input_audio_transcription.failed";
event_id: string;
item_id: string;
content_index: number;
error: {
type?: string;
code?: string;
message?: string;
param?: string;
};
}

Usage:
// Enable transcription with log probabilities
ws.send({
type: "session.update",
session: {
input_audio_transcription: {
model: "gpt-4o-transcribe",
language: "en",
},
include: ["item.input_audio_transcription.logprobs"],
},
});
// Listen for transcription
ws.on("conversation.item.input_audio_transcription.delta", (event) => {
console.log("Transcript delta:", event.delta);
});
ws.on(
"conversation.item.input_audio_transcription.completed",
(event) => {
console.log("Full transcript:", event.transcript);
console.log("Usage:", event.usage);
}
);
// Diarization support
ws.send({
type: "session.update",
session: {
input_audio_transcription: {
model: "gpt-4o-transcribe-diarize",
},
},
});
ws.on(
"conversation.item.input_audio_transcription.segment",
(event) => {
console.log(
`[${event.speaker}] ${event.text} (${event.start}s - ${event.end}s)`
);
}
);

Handle errors and edge cases in real-time conversations.
/**
* Error event from server
*/
interface RealtimeErrorEvent {
type: "error";
event_id: string;
error: RealtimeError;
}
interface RealtimeError {
/** Error type */
type: string;
/** Error code (optional) */
code?: string | null;
/** Human-readable message */
message: string;
/** Related parameter (optional) */
param?: string | null;
/** Client event ID that caused error (optional) */
event_id?: string | null;
}
/**
* OpenAI Realtime error class
*/
class OpenAIRealtimeError extends Error {
constructor(message: string);
}

Common Error Types:
// Invalid request errors
{
type: "invalid_request_error",
code: "invalid_value",
message: "Invalid value for 'audio_format'",
param: "audio_format"
}
// Server errors
{
type: "server_error",
message: "Internal server error"
}
// Rate limit errors
{
type: "rate_limit_error",
message: "Rate limit exceeded"
}

Usage:
ws.on("error", (event: RealtimeErrorEvent) => {
console.error("Realtime error:", event.error);
if (event.error.type === "rate_limit_error") {
// Handle rate limiting
} else if (event.error.type === "invalid_request_error") {
// Handle validation errors
console.error("Invalid:", event.error.param, event.error.message);
}
});
// WebSocket errors
ws.socket.addEventListener("error", (error) => {
console.error("WebSocket error:", error);
});

Monitor rate limits during conversations.
/**
* Rate limits updated event
*/
interface RateLimitsUpdatedEvent {
type: "rate_limits.updated";
event_id: string;
rate_limits: Array<{
/** Rate limit name: 'requests' or 'tokens' */
name?: "requests" | "tokens";
/** Maximum allowed value */
limit?: number;
/** Remaining before limit reached */
remaining?: number;
/** Seconds until reset */
reset_seconds?: number;
}>;
}

Usage:
ws.on("rate_limits.updated", (event: RateLimitsUpdatedEvent) => {
event.rate_limits.forEach((limit) => {
console.log(`${limit.name}: ${limit.remaining}/${limit.limit}`);
console.log(`Resets in ${limit.reset_seconds}s`);
});
});

Configure distributed tracing for debugging and monitoring.
/**
* Tracing configuration
*/
type RealtimeTracingConfig =
| "auto"
| {
/** Workflow name in Traces Dashboard */
workflow_name?: string;
/** Group ID for filtering */
group_id?: string;
/** Arbitrary metadata */
metadata?: unknown;
}
  | null;

Usage:
// Auto tracing with defaults
{
tracing: "auto";
}
// Custom tracing configuration
{
tracing: {
workflow_name: "customer-support-bot",
group_id: "prod-us-west",
metadata: {
customer_id: "cust_123",
agent_version: "2.1.0"
}
}
}
// Disable tracing
{
tracing: null;
}

import OpenAI from "openai";
import { OpenAIRealtimeWebSocket } from "openai/realtime/websocket";
const client = new OpenAI();
// Create session token
const secret = await client.realtime.clientSecrets.create({
session: {
type: "realtime",
model: "gpt-realtime",
audio: {
input: {
format: { type: "audio/pcm", rate: 24000 },
turn_detection: {
type: "server_vad",
threshold: 0.5,
silence_duration_ms: 500,
interrupt_response: true,
},
transcription: {
model: "gpt-4o-transcribe",
},
},
output: {
format: { type: "audio/pcm", rate: 24000 },
voice: "marin",
},
},
instructions:
"You are a helpful voice assistant. Speak naturally and concisely.",
tools: [
{
type: "function",
name: "get_weather",
description: "Get weather for a location",
parameters: {
type: "object",
properties: {
location: { type: "string" },
},
required: ["location"],
},
},
],
},
});
// Connect WebSocket
const ws = await OpenAIRealtimeWebSocket.create(client, {
model: "gpt-realtime",
});
// Handle session
ws.on("session.created", (event) => {
console.log("Session created:", event.session.id);
});
// Handle conversation
ws.on("conversation.item.created", (event) => {
console.log("Item created:", event.item.type);
});
// Handle audio output
ws.on("response.audio.delta", (event) => {
const audioData = Buffer.from(event.delta, "base64");
playAudio(audioData); // Play to speaker
});
// Handle transcripts
ws.on("conversation.item.input_audio_transcription.completed", (event) => {
console.log("User said:", event.transcript);
});
ws.on("response.audio_transcript.delta", (event) => {
process.stdout.write(event.delta);
});
// Handle VAD
ws.on("input_audio_buffer.speech_started", () => {
console.log("User started speaking");
stopAudioPlayback(); // Interrupt assistant
});
ws.on("input_audio_buffer.speech_stopped", () => {
console.log("User stopped speaking");
});
// Handle function calls
ws.on("response.function_call_arguments.done", async (event) => {
console.log("Function call:", event.call_id);
const args = JSON.parse(event.arguments);
const result = await getWeather(args.location);
// Send result
ws.send({
type: "conversation.item.create",
item: {
type: "function_call_output",
call_id: event.call_id,
output: JSON.stringify(result),
},
});
// Continue conversation
ws.send({
type: "response.create",
});
});
// Handle errors
ws.on("error", (event) => {
console.error("Error:", event.error.message);
});
// Capture and send microphone audio
const audioStream = captureMicrophone();
audioStream.on("data", (chunk) => {
const base64 = chunk.toString("base64");
ws.send({
type: "input_audio_buffer.append",
audio: base64,
});
});
// Cleanup
process.on("SIGINT", () => {
ws.close();
process.exit(0);
});

import OpenAI from "openai";
import express from "express";
const client = new OpenAI();
const app = express();
app.use(express.json());
// Webhook for incoming calls
app.post("/realtime/webhook/incoming_call", async (req, res) => {
const event = req.body;
if (event.type === "realtime.call.incoming") {
const callId = event.data.id;
// Accept the call
await client.realtime.calls.accept(callId, {
type: "realtime",
model: "gpt-realtime",
instructions:
"You are a customer service agent. Be professional and helpful.",
audio: {
input: {
format: { type: "audio/pcmu" }, // G.711 for telephony
turn_detection: {
type: "server_vad",
silence_duration_ms: 700,
},
},
output: {
format: { type: "audio/pcmu" },
voice: "marin",
},
},
tools: [
{
type: "function",
name: "transfer_to_agent",
description: "Transfer to human agent",
parameters: {
type: "object",
properties: {
reason: { type: "string" },
},
},
},
],
});
console.log(`Accepted call: ${callId}`);
}
res.sendStatus(200);
});
// Webhook for call events
app.post("/realtime/webhook/call_events", async (req, res) => {
const event = req.body;
if (event.type === "realtime.response.function_call_output.done") {
const { call_id, function_name, arguments: args } = event.data;
if (function_name === "transfer_to_agent") {
// Transfer call
await client.realtime.calls.refer(call_id, {
target_uri: "sip:support@example.com",
});
}
}
res.sendStatus(200);
});
app.listen(3000, () => {
console.log("Webhook server running on port 3000");
});

type RealtimeClientEvent =
| ConversationItemCreateEvent
| ConversationItemDeleteEvent
| ConversationItemRetrieveEvent
| ConversationItemTruncateEvent
| InputAudioBufferAppendEvent
| InputAudioBufferClearEvent
| OutputAudioBufferClearEvent
| InputAudioBufferCommitEvent
| ResponseCancelEvent
| ResponseCreateEvent
| SessionUpdateEvent;
type RealtimeServerEvent =
| ConversationCreatedEvent
| ConversationItemCreatedEvent
| ConversationItemDeletedEvent
| ConversationItemAdded
| ConversationItemDone
| ConversationItemRetrieved
| ConversationItemTruncatedEvent
| ConversationItemInputAudioTranscriptionCompletedEvent
| ConversationItemInputAudioTranscriptionDeltaEvent
| ConversationItemInputAudioTranscriptionFailedEvent
| ConversationItemInputAudioTranscriptionSegment
| InputAudioBufferClearedEvent
| InputAudioBufferCommittedEvent
| InputAudioBufferSpeechStartedEvent
| InputAudioBufferSpeechStoppedEvent
| InputAudioBufferTimeoutTriggered
| OutputAudioBufferStarted
| OutputAudioBufferStopped
| OutputAudioBufferCleared
| ResponseCreatedEvent
| ResponseDoneEvent
| ResponseOutputItemAddedEvent
| ResponseOutputItemDoneEvent
| ResponseContentPartAddedEvent
| ResponseContentPartDoneEvent
| ResponseAudioDeltaEvent
| ResponseAudioDoneEvent
| ResponseAudioTranscriptDeltaEvent
| ResponseAudioTranscriptDoneEvent
| ResponseTextDeltaEvent
| ResponseTextDoneEvent
| ResponseFunctionCallArgumentsDeltaEvent
| ResponseFunctionCallArgumentsDoneEvent
| ResponseMcpCallArgumentsDelta
| ResponseMcpCallArgumentsDone
| ResponseMcpCallInProgress
| ResponseMcpCallCompleted
| ResponseMcpCallFailed
| McpListToolsInProgress
| McpListToolsCompleted
| McpListToolsFailed
| SessionCreatedEvent
| SessionUpdatedEvent
| RateLimitsUpdatedEvent
| RealtimeErrorEvent;
type ConversationItem =
| RealtimeConversationItemSystemMessage
| RealtimeConversationItemUserMessage
| RealtimeConversationItemAssistantMessage
| RealtimeConversationItemFunctionCall
| RealtimeConversationItemFunctionCallOutput
| RealtimeMcpApprovalResponse
| RealtimeMcpListTools
| RealtimeMcpToolCall
| RealtimeMcpApprovalRequest;
interface RealtimeSession {
id?: string;
object?: "realtime.session";
model?: string;
expires_at?: number;
modalities?: Array<"text" | "audio">;
instructions?: string;
voice?: string;
input_audio_format?: "pcm16" | "g711_ulaw" | "g711_alaw";
output_audio_format?: "pcm16" | "g711_ulaw" | "g711_alaw";
input_audio_transcription?: AudioTranscription | null;
turn_detection?: RealtimeAudioInputTurnDetection | null;
tools?: Array<RealtimeFunctionTool>;
tool_choice?: string;
temperature?: number;
max_response_output_tokens?: number | "inf";
speed?: number;
input_audio_noise_reduction?: {
type?: NoiseReductionType;
};
include?: Array<"item.input_audio_transcription.logprobs"> | null;
prompt?: ResponsePrompt | null;
tracing?: RealtimeTracingConfig | null;
truncation?: RealtimeTruncation;
}
interface RealtimeResponse {
id?: string;
object?: "realtime.response";
status?: RealtimeResponseStatus;
conversation_id?: string;
output?: Array<ConversationItem>;
usage?: RealtimeResponseUsage;
status_details?: {
type?: "incomplete" | "failed" | "cancelled";
reason?: string;
error?: RealtimeError | null;
} | null;
max_output_tokens?: number | "inf";
modalities?: Array<"text" | "audio">;
instructions?: string;
voice?: string;
audio?: {
format?: RealtimeAudioFormats;
speed?: number;
voice?: string;
};
metadata?: Record<string, string> | null;
tool_choice?: RealtimeToolChoiceConfig;
tools?: RealtimeToolsConfig;
temperature?: number;
}
interface AudioTranscription {
language?: string;
model?:
| "whisper-1"
| "gpt-4o-mini-transcribe"
| "gpt-4o-transcribe"
| "gpt-4o-transcribe-diarize";
prompt?: string;
}
type RealtimeAudioFormats =
| { type?: "audio/pcm"; rate?: 24000 }
| { type?: "audio/pcmu" }
| { type?: "audio/pcma" };
type NoiseReductionType = "near_field" | "far_field";
type RealtimeAudioInputTurnDetection =
| {
type: "server_vad";
threshold?: number;
prefix_padding_ms?: number;
silence_duration_ms?: number;
create_response?: boolean;
interrupt_response?: boolean;
idle_timeout_ms?: number | null;
}
| {
type: "semantic_vad";
eagerness?: "low" | "medium" | "high" | "auto";
create_response?: boolean;
interrupt_response?: boolean;
};
type RealtimeTruncation =
| "auto"
| "disabled"
| { type: "retention_ratio"; retention_ratio: number };
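The turn-detection and truncation types above compose into a `session.update` payload. A hedged sketch using `semantic_vad` with retention-ratio truncation (field names taken from `RealtimeAudioInputTurnDetection` and `RealtimeTruncation`; where the session object nests these fields can vary between the beta and GA session shapes):

```typescript
// Hedged sketch: semantic VAD plus retention-ratio truncation.
const sessionUpdate = {
  type: "session.update" as const,
  session: {
    turn_detection: {
      type: "semantic_vad" as const,
      eagerness: "medium" as const, // how quickly the model takes its turn
      create_response: true,
      interrupt_response: true,
    },
    truncation: {
      type: "retention_ratio" as const,
      retention_ratio: 0.8, // keep ~80% of the conversation when trimming
    },
  },
};
// ws.send(sessionUpdate);
```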
type RealtimeToolsConfig = Array<RealtimeFunctionTool | McpTool>;
type RealtimeToolChoiceConfig =
| "auto"
| "none"
| "required"
| { type: "function"; function: { name: string } }
| { type: "mcp"; mcp: { server_label: string; name: string } };
type RealtimeTracingConfig =
| "auto"
| {
workflow_name?: string;
group_id?: string;
metadata?: unknown;
}
| null;
interface RealtimeError {
type: string;
code?: string | null;
message: string;
param?: string | null;
event_id?: string | null;
}
interface RealtimeResponseUsage {
total_tokens?: number;
input_tokens?: number;
output_tokens?: number;
input_token_details?: {
text_tokens?: number;
audio_tokens?: number;
image_tokens?: number;
cached_tokens?: number;
cached_tokens_details?: {
text_tokens?: number;
audio_tokens?: number;
image_tokens?: number;
};
};
output_token_details?: {
text_tokens?: number;
audio_tokens?: number;
};
}
interface RealtimeResponseStatus {
type: "in_progress" | "completed" | "cancelled" | "failed" | "incomplete";
reason?: string;
}

Available Realtime API models:

gpt-realtime (latest)
gpt-realtime-2025-08-28
gpt-4o-realtime-preview
gpt-4o-realtime-preview-2024-10-01
gpt-4o-realtime-preview-2024-12-17
gpt-4o-realtime-preview-2025-06-03
gpt-4o-mini-realtime-preview
gpt-4o-mini-realtime-preview-2024-12-17
gpt-realtime-mini
gpt-realtime-mini-2025-10-06
gpt-audio-mini
gpt-audio-mini-2025-10-06

Use the marin or cedar voice for best quality. Use previous_item_id for precise insertion of conversation items.

const ws = await OpenAIRealtimeWebSocket.create(client, {
model: "gpt-realtime",
});
// Microphone → Input Buffer
micStream.on("data", (chunk) => {
ws.send({
type: "input_audio_buffer.append",
audio: chunk.toString("base64"),
});
});
// Output Audio → Speaker
ws.on("response.audio.delta", (event) => {
playAudio(Buffer.from(event.delta, "base64"));
});
// VAD-based interruption
ws.on("input_audio_buffer.speech_started", () => {
stopPlayback();
});

// Send text message
ws.send({
type: "conversation.item.create",
item: {
type: "message",
role: "user",
content: [{ type: "input_text", text: "Hello!" }],
},
});
// Request audio response
ws.send({
type: "response.create",
response: {
modalities: ["audio"],
},
});

ws.on("response.audio_transcript.delta", (event) => {
updateSubtitles(event.delta);
});
ws.on("conversation.item.input_audio_transcription.delta", (event) => {
updateUserTranscript(event.delta);
});

const tools = [
{
type: "function",
name: "search_database",
description: "Search customer database",
parameters: {
/* ... */
},
},
{
type: "mcp",
server_label: "calendar",
connector_id: "connector_googlecalendar",
},
];
ws.send({
type: "session.update",
session: { tools, tool_choice: "auto" },
});