0
# Realtime API
1
2
The Realtime API provides WebSocket-based real-time voice conversations with OpenAI models. It supports bidirectional audio streaming, server-side voice activity detection (VAD), function calling, and full conversation management. The API is designed for live voice applications including phone calls, voice assistants, and interactive conversational experiences.
3
4
## Package Information
5
6
- **Package Name**: openai
7
- **Package Type**: npm
8
- **Language**: TypeScript
9
- **Installation**: `npm install openai`
10
11
## API Status
12
13
The Realtime API is now generally available (GA) at `client.realtime.*`.
14
15
**Deprecation Notice**: The legacy beta Realtime API at `client.beta.realtime.*` is deprecated. If you are using the beta API, migrate to the GA API documented here. The beta API includes:
16
- `client.beta.realtime.sessions.create()` (deprecated - use `client.realtime.clientSecrets.create()` instead)
17
- `client.beta.realtime.transcriptionSessions.create()` (deprecated)
18
19
All new projects should use the GA Realtime API (`client.realtime.*`) documented on this page.
20
21
## Core Imports
22
23
```typescript
24
import OpenAI from "openai";
25
import { OpenAIRealtimeWebSocket } from "openai/realtime/websocket"; // Browser
26
import { OpenAIRealtimeWS } from "openai/realtime/ws"; // Node.js (requires 'ws' package)
27
```
28
29
## WebSocket Clients
30
31
The Realtime API provides two WebSocket client implementations for different runtime environments:
32
33
### OpenAIRealtimeWebSocket (Browser)
34
35
For browser environments, use `OpenAIRealtimeWebSocket` which uses the native browser WebSocket API.
36
37
```typescript
38
import { OpenAIRealtimeWebSocket } from "openai/realtime/websocket";
39
import OpenAI from "openai";
40
41
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
42
43
const ws = new OpenAIRealtimeWebSocket(
44
{
45
model: "gpt-realtime",
46
dangerouslyAllowBrowser: true, // Required for browser use
47
},
48
client
49
);
50
51
// Event handling
52
ws.on("session.created", (event) => {
53
console.log("Session started:", event.session.id);
54
});
55
56
ws.on("response.audio.delta", (event) => {
57
// Handle audio deltas - event.delta is base64 encoded audio
58
const audioData = atob(event.delta);
59
playAudio(audioData);
60
});
61
62
ws.on("error", (error) => {
63
console.error("WebSocket error:", error);
64
});
65
66
// Send audio to the server
67
function sendAudio(audioData: ArrayBuffer) {
68
const base64Audio = btoa(String.fromCharCode(...new Uint8Array(audioData)));
69
ws.send({
70
type: "input_audio_buffer.append",
71
audio: base64Audio,
72
});
73
}
74
75
// Commit audio buffer to trigger processing
76
ws.send({
77
type: "input_audio_buffer.commit",
78
});
79
80
// Close connection
81
ws.close();
82
```
83
84
**Key features:**
85
- Uses native browser WebSocket API
86
- Requires `dangerouslyAllowBrowser: true` in configuration
87
- Audio must be base64 encoded
88
- Automatic reconnection handling
89
- Built-in event emitter for all realtime events
90
91
### OpenAIRealtimeWS (Node.js)
92
93
For Node.js environments, use `OpenAIRealtimeWS` which uses the `ws` package for WebSocket support.
94
95
```typescript
96
import { OpenAIRealtimeWS } from "openai/realtime/ws";
97
import OpenAI from "openai";
98
import fs from "fs";
99
100
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
101
102
const ws = new OpenAIRealtimeWS(
103
{
104
model: "gpt-realtime",
105
},
106
client
107
);
108
109
// Event handling (same interface as browser version)
110
ws.on("session.created", (event) => {
111
console.log("Session started:", event.session.id);
112
});
113
114
ws.on("response.audio.delta", (event) => {
115
// Handle audio deltas
116
const audioBuffer = Buffer.from(event.delta, "base64");
117
// Write to file or stream to audio output
118
fs.appendFileSync("output.pcm", audioBuffer);
119
});
120
121
ws.on("response.done", (event) => {
122
console.log("Response complete:", event.response.id);
123
});
124
125
// Send audio from file or buffer
126
function sendAudioFromFile(filePath: string) {
127
const audioBuffer = fs.readFileSync(filePath);
128
const base64Audio = audioBuffer.toString("base64");
129
130
ws.send({
131
type: "input_audio_buffer.append",
132
audio: base64Audio,
133
});
134
}
135
136
// Trigger response generation
137
ws.send({
138
type: "input_audio_buffer.commit",
139
});
140
141
// Close connection
142
ws.close();
143
```
144
145
**Key features:**
146
- Uses `ws` package for WebSocket support (add to dependencies: `npm install ws @types/ws`)
147
- Same event interface as browser version for consistency
148
- Better Node.js stream integration
149
- Automatic reconnection handling
150
- Suitable for server-side applications
151
152
### Common Event Patterns
153
154
Both WebSocket clients support the same event handling interface:
155
156
```typescript
157
// Connection events
158
ws.on("session.created", (event) => { /* Session initialization */ });
159
ws.on("session.updated", (event) => { /* Session configuration changed */ });
160
161
// Conversation events
162
ws.on("conversation.created", (event) => { /* New conversation */ });
163
ws.on("conversation.item.created", (event) => { /* New item added */ });
164
ws.on("conversation.item.deleted", (event) => { /* Item removed */ });
165
166
// Audio events (streaming)
167
ws.on("response.audio.delta", (event) => { /* Audio chunk received */ });
168
ws.on("response.audio.done", (event) => { /* Audio complete */ });
169
ws.on("response.audio_transcript.delta", (event) => { /* Transcript chunk */ });
170
ws.on("response.audio_transcript.done", (event) => { /* Transcript complete */ });
171
172
// Response events
173
ws.on("response.created", (event) => { /* Response started */ });
174
ws.on("response.done", (event) => { /* Response complete */ });
175
ws.on("response.cancelled", (event) => { /* Response cancelled */ });
176
ws.on("response.failed", (event) => { /* Response failed */ });
177
178
// Function calling events
179
ws.on("response.function_call_arguments.delta", (event) => { /* Function args streaming */ });
180
ws.on("response.function_call_arguments.done", (event) => { /* Function args complete */ });
181
182
// Error events
183
ws.on("error", (error) => { /* WebSocket or API error */ });
184
ws.on("close", (event) => { /* Connection closed */ });
185
```
186
187
### Sending Commands
188
189
Both clients use the same `.send()` method for sending commands:
190
191
```typescript
192
// Append audio to input buffer
193
ws.send({
194
type: "input_audio_buffer.append",
195
audio: base64AudioString,
196
});
197
198
// Commit audio buffer (triggers VAD or manual processing)
199
ws.send({
200
type: "input_audio_buffer.commit",
201
});
202
203
// Clear audio buffer
204
ws.send({
205
type: "input_audio_buffer.clear",
206
});
207
208
// Update session configuration
209
ws.send({
210
type: "session.update",
211
session: {
212
instructions: "You are a helpful assistant.",
213
turn_detection: { type: "server_vad" },
214
},
215
});
216
217
// Create conversation item (text message)
218
ws.send({
219
type: "conversation.item.create",
220
item: {
221
type: "message",
222
role: "user",
223
content: [{ type: "input_text", text: "Hello!" }],
224
},
225
});
226
227
// Trigger response generation
228
ws.send({
229
type: "response.create",
230
response: {
231
modalities: ["text", "audio"],
232
instructions: "Respond briefly.",
233
},
234
});
235
236
// Cancel in-progress response
237
ws.send({
238
type: "response.cancel",
239
});
240
```
241
242
### Connection Lifecycle
243
244
Both clients handle connection lifecycle automatically:
245
246
```typescript
247
const ws = new OpenAIRealtimeWS({ model: "gpt-realtime" }, client);
248
249
// Connection opens automatically
250
ws.on("session.created", (event) => {
251
console.log("Connected and ready");
252
});
253
254
// Handle disconnections
255
ws.on("close", (event) => {
256
console.log("Connection closed:", event.code, event.reason);
257
});
258
259
// Handle errors
260
ws.on("error", (error) => {
261
console.error("Connection error:", error);
262
});
263
264
// Manually close connection
265
ws.close();
266
```
267
268
## Basic Usage
269
270
### Creating a Session Token
271
272
```typescript
273
import OpenAI from "openai";
274
275
const client = new OpenAI({
276
apiKey: process.env.OPENAI_API_KEY,
277
});
278
279
// Create an ephemeral session token for client-side use
280
const response = await client.realtime.clientSecrets.create({
281
session: {
282
type: "realtime",
283
model: "gpt-realtime",
284
audio: {
285
input: {
286
format: { type: "audio/pcm", rate: 24000 },
287
turn_detection: {
288
type: "server_vad",
289
threshold: 0.5,
290
silence_duration_ms: 500,
291
},
292
},
293
output: {
294
format: { type: "audio/pcm", rate: 24000 },
295
voice: "marin",
296
},
297
},
298
},
299
});
300
301
const sessionToken = response.value;
302
```
303
304
### Connecting via WebSocket
305
306
```typescript
307
import { OpenAIRealtimeWebSocket } from "openai/realtime/websocket";
308
309
const ws = new OpenAIRealtimeWebSocket(
310
{
311
model: "gpt-realtime",
312
dangerouslyAllowBrowser: false,
313
},
314
client
315
);
316
317
// Listen for events
318
ws.on("session.created", (event) => {
319
console.log("Session created:", event);
320
});
321
322
ws.on("conversation.item.created", (event) => {
323
console.log("Item created:", event.item);
324
});
325
326
ws.on("response.audio.delta", (event) => {
327
// Handle audio delta
328
const audioData = Buffer.from(event.delta, "base64");
329
playAudio(audioData);
330
});
331
332
// Send audio
333
ws.send({
334
type: "input_audio_buffer.append",
335
audio: audioBase64String,
336
});
337
338
// Commit audio buffer
339
ws.send({
340
type: "input_audio_buffer.commit",
341
});
342
```
343
344
## Architecture
345
346
The Realtime API operates through a WebSocket connection with an event-driven architecture:
347
348
- **Session Management**: Create ephemeral tokens server-side, connect from client
349
- **Audio Streaming**: Bidirectional PCM16/G.711 audio at 24kHz
350
- **Event System**: 50+ client-to-server and server-to-client events
351
- **VAD Integration**: Server-side voice activity detection with configurable parameters
352
- **Conversation Context**: Automatic conversation history management
353
- **Function Calling**: Real-time tool execution during conversations
354
- **Phone Integration**: SIP/WebRTC support for phone calls
355
356
## Capabilities
357
358
### Session Token Creation
359
360
Generate ephemeral session tokens for secure client-side WebSocket connections.
361
362
```typescript { .api }
363
/**
364
* Create a Realtime client secret with an associated session configuration.
365
* Returns an ephemeral token with 1-minute default TTL (configurable up to 2 hours).
366
*/
367
function create(
368
params: ClientSecretCreateParams
369
): Promise<ClientSecretCreateResponse>;
370
371
interface ClientSecretCreateParams {
372
/** Configuration for the client secret expiration */
373
expires_after?: {
374
/** Anchor point for expiration (only 'created_at' is supported) */
375
anchor?: "created_at";
376
/** Seconds from anchor to expiration (10-7200, defaults to 600) */
377
seconds?: number;
378
};
379
/** Session configuration (realtime or transcription session) */
380
session?:
381
| RealtimeSessionCreateRequest
382
| RealtimeTranscriptionSessionCreateRequest;
383
}
384
385
interface ClientSecretCreateResponse {
386
/** Expiration timestamp in seconds since epoch */
387
expires_at: number;
388
/** The session configuration */
389
session: RealtimeSessionCreateResponse | RealtimeTranscriptionSessionCreateResponse;
390
/** The generated client secret value */
391
value: string;
392
}
393
394
interface RealtimeSessionCreateResponse {
395
/** Ephemeral key for client environments */
396
client_secret: {
397
expires_at: number;
398
value: string;
399
};
400
/** Session type: always 'realtime' */
401
type: "realtime";
402
/** Audio configuration */
403
audio?: {
404
input?: {
405
format?: RealtimeAudioFormats;
406
noise_reduction?: { type?: NoiseReductionType };
407
transcription?: AudioTranscription;
408
turn_detection?: ServerVad | SemanticVad | null;
409
};
410
output?: {
411
format?: RealtimeAudioFormats;
412
speed?: number;
413
voice?: string;
414
};
415
};
416
/** Fields to include in server outputs */
417
include?: Array<"item.input_audio_transcription.logprobs">;
418
/** System instructions for the model */
419
instructions?: string;
420
/** Max output tokens (1-4096 or 'inf') */
421
max_output_tokens?: number | "inf";
422
/** Realtime model to use */
423
model?: string;
424
/** Output modalities ('text' | 'audio') */
425
output_modalities?: Array<"text" | "audio">;
426
/** Prompt template reference */
427
prompt?: ResponsePrompt | null;
428
/** Tool choice configuration */
429
tool_choice?: ToolChoiceOptions | ToolChoiceFunction | ToolChoiceMcp;
430
/** Available tools */
431
tools?: Array<RealtimeFunctionTool | McpTool>;
432
/** Tracing configuration */
433
tracing?: "auto" | TracingConfiguration | null;
434
/** Truncation behavior */
435
truncation?: RealtimeTruncation;
436
}
437
```
438
439
[Session Token Creation](./realtime.md#session-token-creation)
440
441
### SIP Call Management
442
443
Manage incoming and outgoing SIP/WebRTC calls with the Realtime API.
444
445
```typescript { .api }
446
/**
447
* Accept an incoming SIP call and configure the realtime session that will handle it
448
*/
449
function accept(
450
callID: string,
451
params: CallAcceptParams,
452
options?: RequestOptions
453
): Promise<void>;
454
455
/**
456
* End an active Realtime API call, whether it was initiated over SIP or WebRTC
457
*/
458
function hangup(
459
callID: string,
460
options?: RequestOptions
461
): Promise<void>;
462
463
/**
464
* Transfer an active SIP call to a new destination using the SIP REFER verb
465
*/
466
function refer(
467
callID: string,
468
params: CallReferParams,
469
options?: RequestOptions
470
): Promise<void>;
471
472
/**
473
* Decline an incoming SIP call by returning a SIP status code to the caller
474
*/
475
function reject(
476
callID: string,
477
params?: CallRejectParams,
478
options?: RequestOptions
479
): Promise<void>;
480
481
interface CallAcceptParams {
482
/** The type of session to create. Always 'realtime' for the Realtime API */
483
type: "realtime";
484
/** Configuration for input and output audio */
485
audio?: RealtimeAudioConfig;
486
/** Additional fields to include in server outputs */
487
include?: Array<"item.input_audio_transcription.logprobs">;
488
/** The default system instructions prepended to model calls */
489
instructions?: string;
490
/** Maximum number of output tokens for a single assistant response (1-4096 or 'inf') */
491
max_output_tokens?: number | "inf";
492
/** The Realtime model used for this session */
493
model?: string;
494
/** The set of modalities the model can respond with */
495
output_modalities?: Array<"text" | "audio">;
496
/** Reference to a prompt template and its variables */
497
prompt?: ResponsePrompt | null;
498
/** How the model chooses tools */
499
tool_choice?: RealtimeToolChoiceConfig;
500
/** Tools available to the model */
501
tools?: RealtimeToolsConfig;
502
/** Tracing configuration for the session */
503
tracing?: RealtimeTracingConfig | null;
504
/** Truncation behavior when conversation exceeds token limits */
505
truncation?: RealtimeTruncation;
506
}
507
508
interface CallReferParams {
509
/** URI that should appear in the SIP Refer-To header (e.g., 'tel:+14155550123' or 'sip:agent@example.com') */
510
target_uri: string;
511
}
512
513
interface CallRejectParams {
514
/** SIP response code to send back to the caller. Defaults to 603 (Decline) when omitted */
515
status_code?: number;
516
}
517
```
518
519
**Available at:** `client.realtime.calls`
520
521
**Usage Example:**
522
523
```typescript
524
import OpenAI from "openai";
525
526
const client = new OpenAI({
527
apiKey: process.env.OPENAI_API_KEY,
528
});
529
530
// Accept incoming call
531
await client.realtime.calls.accept("call-123", {
532
type: "realtime",
533
model: "gpt-realtime",
534
audio: {
535
input: { format: { type: "audio/pcm", rate: 24000 } },
536
output: { format: { type: "audio/pcm", rate: 24000 }, voice: "marin" },
537
},
538
instructions: "You are a helpful phone assistant.",
539
});
540
541
// Hang up call
542
await client.realtime.calls.hangup("call-123");
543
544
// Reject incoming call
545
await client.realtime.calls.reject("call-123", {
546
status_code: 603, // Decline
547
});
548
549
// Transfer call
550
await client.realtime.calls.refer("call-123", {
551
target_uri: "tel:+14155550123",
552
});
553
```
554
555
### WebSocket Connection
556
557
Connect to the Realtime API using WebSocket with the OpenAIRealtimeWebSocket class.
558
559
```typescript { .api }
560
/**
561
* WebSocket client for the Realtime API. Handles connection lifecycle,
562
* event streaming, and message sending.
563
*/
564
class OpenAIRealtimeWebSocket extends OpenAIRealtimeEmitter {
565
url: URL;
566
socket: WebSocket;
567
568
constructor(
569
props: {
570
model: string;
571
dangerouslyAllowBrowser?: boolean;
572
onURL?: (url: URL) => void;
573
__resolvedApiKey?: boolean;
574
},
575
client?: Pick<OpenAI, "apiKey" | "baseURL">
576
);
577
578
/**
579
* Factory method that resolves API key before connecting
580
*/
581
static create(
582
client: Pick<OpenAI, "apiKey" | "baseURL" | "_callApiKey">,
583
props: { model: string; dangerouslyAllowBrowser?: boolean }
584
): Promise<OpenAIRealtimeWebSocket>;
585
586
/**
587
* Factory method for Azure OpenAI connections
588
*/
589
static azure(
590
client: Pick<
591
AzureOpenAI,
592
"_callApiKey" | "apiVersion" | "apiKey" | "baseURL" | "deploymentName"
593
>,
594
options?: {
595
deploymentName?: string;
596
dangerouslyAllowBrowser?: boolean;
597
}
598
): Promise<OpenAIRealtimeWebSocket>;
599
600
/**
601
* Send a client event to the server
602
*/
603
send(event: RealtimeClientEvent): void;
604
605
/**
606
* Close the WebSocket connection
607
*/
608
close(props?: { code: number; reason: string }): void;
609
610
/**
611
* Register event listener
612
*/
613
on(event: string, listener: (event: any) => void): void;
614
}
615
```
616
617
**Usage:**
618
619
```typescript
620
// Standard connection
621
const ws = await OpenAIRealtimeWebSocket.create(client, {
622
model: "gpt-realtime",
623
});
624
625
// Azure connection
626
const wsAzure = await OpenAIRealtimeWebSocket.azure(azureClient, {
627
deploymentName: "my-realtime-deployment",
628
});
629
```
630
631
[WebSocket Connection](./realtime.md#websocket-connection)
632
633
### Phone Call Methods
634
635
Accept, reject, transfer, and hang up phone calls via SIP integration.
636
637
```typescript { .api }
638
/**
639
* Accept an incoming SIP call and configure the realtime session
640
*/
641
function accept(callID: string, params: CallAcceptParams): Promise<void>;
642
643
/**
644
* End an active Realtime API call (SIP or WebRTC)
645
*/
646
function hangup(callID: string): Promise<void>;
647
648
/**
649
* Transfer an active SIP call to a new destination using SIP REFER
650
*/
651
function refer(callID: string, params: CallReferParams): Promise<void>;
652
653
/**
654
* Decline an incoming SIP call with a SIP status code
655
*/
656
function reject(
657
callID: string,
658
params?: CallRejectParams
659
): Promise<void>;
660
661
interface CallAcceptParams {
662
type: "realtime";
663
audio?: RealtimeAudioConfig;
664
include?: Array<"item.input_audio_transcription.logprobs">;
665
instructions?: string;
666
max_output_tokens?: number | "inf";
667
model?: string;
668
output_modalities?: Array<"text" | "audio">;
669
prompt?: ResponsePrompt | null;
670
tool_choice?: RealtimeToolChoiceConfig;
671
tools?: RealtimeToolsConfig;
672
tracing?: RealtimeTracingConfig | null;
673
truncation?: RealtimeTruncation;
674
}
675
676
interface CallReferParams {
677
/** URI in SIP Refer-To header (e.g., 'tel:+14155550123') */
678
target_uri: string;
679
}
680
681
interface CallRejectParams {
682
/** SIP response code (defaults to 603 Decline) */
683
status_code?: number;
684
}
685
```
686
687
**Usage:**
688
689
```typescript
690
// Accept incoming call
691
await client.realtime.calls.accept("call_abc123", {
692
type: "realtime",
693
model: "gpt-realtime",
694
instructions: "You are a helpful assistant on a phone call.",
695
audio: {
696
output: { voice: "marin" },
697
},
698
});
699
700
// Transfer call
701
await client.realtime.calls.refer("call_abc123", {
702
target_uri: "tel:+14155550199",
703
});
704
705
// Reject call
706
await client.realtime.calls.reject("call_abc123", {
707
status_code: 486, // Busy Here
708
});
709
710
// Hang up
711
await client.realtime.calls.hangup("call_abc123");
712
```
713
714
[Phone Call Methods](./realtime.md#phone-call-methods)
715
716
### Session Configuration
717
718
Configure session parameters including audio formats, VAD, and model settings.
719
720
```typescript { .api }
721
interface RealtimeSession {
722
id?: string;
723
expires_at?: number;
724
/** Fields to include in server outputs */
725
include?: Array<"item.input_audio_transcription.logprobs"> | null;
726
/** Input audio format: 'pcm16', 'g711_ulaw', or 'g711_alaw' */
727
input_audio_format?: "pcm16" | "g711_ulaw" | "g711_alaw";
728
/** Noise reduction configuration */
729
input_audio_noise_reduction?: {
730
type?: NoiseReductionType;
731
};
732
/** Transcription configuration */
733
input_audio_transcription?: AudioTranscription | null;
734
/** System instructions */
735
instructions?: string;
736
/** Max output tokens per response */
737
max_response_output_tokens?: number | "inf";
738
/** Response modalities */
739
modalities?: Array<"text" | "audio">;
740
/** Model identifier */
741
model?: string;
742
object?: "realtime.session";
743
/** Output audio format */
744
output_audio_format?: "pcm16" | "g711_ulaw" | "g711_alaw";
745
/** Prompt template reference */
746
prompt?: ResponsePrompt | null;
747
/** Audio playback speed (0.25-1.5) */
748
speed?: number;
749
/** Sampling temperature (0.6-1.2) */
750
temperature?: number;
751
/** Tool choice mode */
752
tool_choice?: string;
753
/** Available tools */
754
tools?: Array<RealtimeFunctionTool>;
755
/** Tracing configuration */
756
tracing?: "auto" | TracingConfiguration | null;
757
/** Turn detection configuration */
758
turn_detection?: RealtimeAudioInputTurnDetection | null;
759
/** Truncation behavior */
760
truncation?: RealtimeTruncation;
761
/** Output voice */
762
voice?: string;
763
}
764
765
interface AudioTranscription {
766
/** Language code (ISO-639-1, e.g., 'en') */
767
language?: string;
768
/** Transcription model */
769
model?:
770
| "whisper-1"
771
| "gpt-4o-mini-transcribe"
772
| "gpt-4o-transcribe"
773
| "gpt-4o-transcribe-diarize";
774
/** Transcription guidance prompt */
775
prompt?: string;
776
}
777
778
type NoiseReductionType = "near_field" | "far_field";
779
780
type RealtimeTruncation =
781
| "auto"
782
| "disabled"
783
| {
784
type: "retention_ratio";
785
/** Fraction of max context to retain (0.0-1.0) */
786
retention_ratio: number;
787
};
788
```
789
790
[Session Configuration](./realtime.md#session-configuration)
791
792
### Turn Detection (VAD)
793
794
Configure voice activity detection for automatic turn taking.
795
796
```typescript { .api }
797
/**
798
* Server VAD: Simple volume-based voice activity detection
799
*/
800
interface ServerVad {
801
type: "server_vad";
802
/** Auto-generate response on VAD stop */
803
create_response?: boolean;
804
/** Timeout for prompting user to continue (ms) */
805
idle_timeout_ms?: number | null;
806
/** Auto-interrupt on VAD start */
807
interrupt_response?: boolean;
808
/** Audio prefix padding (ms, default: 300) */
809
prefix_padding_ms?: number;
810
/** Silence duration to detect stop (ms, default: 500) */
811
silence_duration_ms?: number;
812
/** VAD activation threshold (0.0-1.0, default: 0.5) */
813
threshold?: number;
814
}
815
816
/**
817
* Semantic VAD: Model-based turn detection with dynamic timeouts
818
*/
819
interface SemanticVad {
820
type: "semantic_vad";
821
/** Auto-generate response on VAD stop */
822
create_response?: boolean;
823
/** Eagerness: 'low' (8s), 'medium' (4s), 'high' (2s), 'auto' */
824
eagerness?: "low" | "medium" | "high" | "auto";
825
/** Auto-interrupt on VAD start */
826
interrupt_response?: boolean;
827
}
828
829
type RealtimeAudioInputTurnDetection = ServerVad | SemanticVad;
830
```
831
832
**Usage:**
833
834
```typescript
835
// Server VAD with custom settings
836
{
837
type: "server_vad",
838
threshold: 0.6,
839
silence_duration_ms: 700,
840
prefix_padding_ms: 300,
841
interrupt_response: true,
842
create_response: true,
843
idle_timeout_ms: 30000
844
}
845
846
// Semantic VAD for natural conversations
847
{
848
type: "semantic_vad",
849
eagerness: "medium",
850
interrupt_response: true,
851
create_response: true
852
}
853
854
// Manual turn detection (no VAD)
855
{
856
turn_detection: null
857
}
858
```
859
860
[Turn Detection](./realtime.md#turn-detection-vad)
861
862
### Audio Formats
863
864
Configure input and output audio formats for the session.
865
866
```typescript { .api }
867
/**
868
* PCM 16-bit audio at 24kHz sample rate
869
*/
870
interface AudioPCM {
871
type?: "audio/pcm";
872
rate?: 24000;
873
}
874
875
/**
876
* G.711 μ-law format (commonly used in telephony)
877
*/
878
interface AudioPCMU {
879
type?: "audio/pcmu";
880
}
881
882
/**
883
* G.711 A-law format (commonly used in telephony)
884
*/
885
interface AudioPCMA {
886
type?: "audio/pcma";
887
}
888
889
type RealtimeAudioFormats = AudioPCM | AudioPCMU | AudioPCMA;
890
891
interface RealtimeAudioConfig {
892
input?: {
893
format?: RealtimeAudioFormats;
894
noise_reduction?: { type?: NoiseReductionType };
895
transcription?: AudioTranscription;
896
turn_detection?: RealtimeAudioInputTurnDetection | null;
897
};
898
output?: {
899
format?: RealtimeAudioFormats;
900
/** Playback speed multiplier (0.25-1.5) */
901
speed?: number;
902
/** Voice selection */
903
voice?:
904
| string
905
| "alloy"
906
| "ash"
907
| "ballad"
908
| "coral"
909
| "echo"
910
| "sage"
911
| "shimmer"
912
| "verse"
913
| "marin"
914
| "cedar";
915
};
916
}
917
```
918
919
[Audio Formats](./realtime.md#audio-formats)
920
921
### Client-to-Server Events
922
923
Events sent from client to server to control the conversation.
924
925
```typescript { .api }
926
/**
927
* Union of all client events
928
*/
929
type RealtimeClientEvent =
930
| ConversationItemCreateEvent
931
| ConversationItemDeleteEvent
932
| ConversationItemRetrieveEvent
933
| ConversationItemTruncateEvent
934
| InputAudioBufferAppendEvent
935
| InputAudioBufferClearEvent
936
| OutputAudioBufferClearEvent
937
| InputAudioBufferCommitEvent
938
| ResponseCancelEvent
939
| ResponseCreateEvent
940
| SessionUpdateEvent;
941
942
/**
943
* Add conversation item (message, function call, or output)
944
*/
945
interface ConversationItemCreateEvent {
946
type: "conversation.item.create";
947
item: ConversationItem;
948
event_id?: string;
949
/** Insert after this item ID ('root' for beginning) */
950
previous_item_id?: string;
951
}
952
953
/**
954
* Delete conversation item by ID
955
*/
956
interface ConversationItemDeleteEvent {
957
type: "conversation.item.delete";
958
item_id: string;
959
event_id?: string;
960
}
961
962
/**
963
* Retrieve full item including audio data
964
*/
965
interface ConversationItemRetrieveEvent {
966
type: "conversation.item.retrieve";
967
item_id: string;
968
event_id?: string;
969
}
970
971
/**
972
* Truncate assistant audio message
973
*/
974
interface ConversationItemTruncateEvent {
975
type: "conversation.item.truncate";
976
item_id: string;
977
content_index: number;
978
/** Duration to keep in milliseconds */
979
audio_end_ms: number;
980
event_id?: string;
981
}
982
983
/**
984
* Append audio to input buffer
985
*/
986
interface InputAudioBufferAppendEvent {
987
type: "input_audio_buffer.append";
988
/** Base64-encoded audio bytes */
989
audio: string;
990
event_id?: string;
991
}
992
993
/**
994
* Clear input audio buffer
995
*/
996
interface InputAudioBufferClearEvent {
997
type: "input_audio_buffer.clear";
998
event_id?: string;
999
}
1000
1001
/**
1002
* Commit input audio buffer to conversation
1003
*/
1004
interface InputAudioBufferCommitEvent {
1005
type: "input_audio_buffer.commit";
1006
event_id?: string;
1007
}
1008
1009
/**
1010
* WebRTC only: Clear output audio buffer
1011
*/
1012
interface OutputAudioBufferClearEvent {
1013
type: "output_audio_buffer.clear";
1014
event_id?: string;
1015
}
1016
1017
/**
1018
* Cancel in-progress response
1019
*/
1020
interface ResponseCancelEvent {
1021
type: "response.cancel";
1022
event_id?: string;
1023
}
1024
1025
/**
1026
* Request model response
1027
*/
1028
interface ResponseCreateEvent {
1029
type: "response.create";
1030
response?: {
1031
modalities?: Array<"text" | "audio">;
1032
instructions?: string;
1033
voice?: string;
1034
output_audio_format?: string;
1035
tools?: Array<RealtimeFunctionTool>;
1036
tool_choice?: string;
1037
temperature?: number;
1038
max_output_tokens?: number | "inf";
1039
conversation?: "auto" | "none";
1040
metadata?: Record<string, string>;
1041
input?: Array<ConversationItemWithReference>;
1042
};
1043
event_id?: string;
1044
}
1045
1046
/**
1047
* Update session configuration
1048
*/
1049
interface SessionUpdateEvent {
1050
type: "session.update";
1051
session: Partial<RealtimeSession>;
1052
event_id?: string;
1053
}
1054
```
1055
1056
[Client Events](./realtime.md#client-to-server-events)
1057
1058
### Server-to-Client Events
1059
1060
Events sent from server to client during the conversation.
1061
1062
```typescript { .api }
1063
/**
1064
* Union of all server events (50+ event types)
1065
*/
1066
type RealtimeServerEvent =
1067
| ConversationCreatedEvent
1068
| ConversationItemCreatedEvent
1069
| ConversationItemDeletedEvent
1070
| ConversationItemAdded
1071
| ConversationItemDone
1072
| ConversationItemRetrieved
1073
| ConversationItemTruncatedEvent
1074
| ConversationItemInputAudioTranscriptionCompletedEvent
1075
| ConversationItemInputAudioTranscriptionDeltaEvent
1076
| ConversationItemInputAudioTranscriptionFailedEvent
1077
| ConversationItemInputAudioTranscriptionSegment
1078
| InputAudioBufferClearedEvent
1079
| InputAudioBufferCommittedEvent
1080
| InputAudioBufferSpeechStartedEvent
1081
| InputAudioBufferSpeechStoppedEvent
1082
| InputAudioBufferTimeoutTriggered
1083
| OutputAudioBufferStarted
1084
| OutputAudioBufferStopped
1085
| OutputAudioBufferCleared
1086
| ResponseCreatedEvent
1087
| ResponseDoneEvent
1088
| ResponseOutputItemAddedEvent
1089
| ResponseOutputItemDoneEvent
1090
| ResponseContentPartAddedEvent
1091
| ResponseContentPartDoneEvent
1092
| ResponseAudioDeltaEvent
1093
| ResponseAudioDoneEvent
1094
| ResponseAudioTranscriptDeltaEvent
1095
| ResponseAudioTranscriptDoneEvent
1096
| ResponseTextDeltaEvent
1097
| ResponseTextDoneEvent
1098
| ResponseFunctionCallArgumentsDeltaEvent
1099
| ResponseFunctionCallArgumentsDoneEvent
1100
| ResponseMcpCallArgumentsDelta
1101
| ResponseMcpCallArgumentsDone
1102
| ResponseMcpCallInProgress
1103
| ResponseMcpCallCompleted
1104
| ResponseMcpCallFailed
1105
| McpListToolsInProgress
1106
| McpListToolsCompleted
1107
| McpListToolsFailed
1108
| SessionCreatedEvent
1109
| SessionUpdatedEvent
1110
| RateLimitsUpdatedEvent
1111
| RealtimeErrorEvent;
1112
1113
/**
1114
* Session created (first event after connection)
1115
*/
1116
interface SessionCreatedEvent {
1117
type: "session.created";
1118
event_id: string;
1119
session: RealtimeSession;
1120
}
1121
1122
/**
1123
* Session updated after client session.update
1124
*/
1125
interface SessionUpdatedEvent {
1126
type: "session.updated";
1127
event_id: string;
1128
session: RealtimeSession;
1129
}
1130
1131
/**
1132
* Conversation created
1133
*/
1134
interface ConversationCreatedEvent {
1135
type: "conversation.created";
1136
event_id: string;
1137
conversation: {
1138
id?: string;
1139
object?: "realtime.conversation";
1140
};
1141
}
1142
1143
/**
1144
* Item created in conversation
1145
*/
1146
interface ConversationItemCreatedEvent {
1147
type: "conversation.item.created";
1148
event_id: string;
1149
item: ConversationItem;
1150
previous_item_id?: string | null;
1151
}
1152
1153
/**
1154
* Item added to conversation (may have partial content)
1155
*/
1156
interface ConversationItemAdded {
1157
type: "conversation.item.added";
1158
event_id: string;
1159
item: ConversationItem;
1160
previous_item_id?: string | null;
1161
}
1162
1163
/**
1164
* Item finalized with complete content
1165
*/
1166
interface ConversationItemDone {
1167
type: "conversation.item.done";
1168
event_id: string;
1169
item: ConversationItem;
1170
previous_item_id?: string | null;
1171
}
1172
1173
/**
1174
* Input audio buffer committed
1175
*/
1176
interface InputAudioBufferCommittedEvent {
1177
type: "input_audio_buffer.committed";
1178
event_id: string;
1179
item_id: string;
1180
previous_item_id?: string | null;
1181
}
1182
1183
/**
1184
* Speech detected in input buffer (VAD start)
1185
*/
1186
interface InputAudioBufferSpeechStartedEvent {
1187
type: "input_audio_buffer.speech_started";
1188
event_id: string;
1189
item_id: string;
1190
/** Milliseconds from session start */
1191
audio_start_ms: number;
1192
}
1193
1194
/**
1195
* Speech ended in input buffer (VAD stop)
1196
*/
1197
interface InputAudioBufferSpeechStoppedEvent {
1198
type: "input_audio_buffer.speech_stopped";
1199
event_id: string;
1200
item_id: string;
1201
/** Milliseconds from session start */
1202
audio_end_ms: number;
1203
}
1204
1205
/**
1206
* Response started
1207
*/
1208
interface ResponseCreatedEvent {
1209
type: "response.created";
1210
event_id: string;
1211
response: RealtimeResponse;
1212
}
1213
1214
/**
1215
* Response completed
1216
*/
1217
interface ResponseDoneEvent {
1218
type: "response.done";
1219
event_id: string;
1220
response: RealtimeResponse;
1221
}
1222
1223
/**
1224
* Audio delta (streaming audio chunk)
1225
*/
1226
interface ResponseAudioDeltaEvent {
1227
type: "response.audio.delta";
1228
event_id: string;
1229
response_id: string;
1230
item_id: string;
1231
output_index: number;
1232
content_index: number;
1233
/** Base64-encoded audio bytes */
1234
delta: string;
1235
}
1236
1237
/**
1238
* Audio generation completed
1239
*/
1240
interface ResponseAudioDoneEvent {
1241
type: "response.audio.done";
1242
event_id: string;
1243
response_id: string;
1244
item_id: string;
1245
output_index: number;
1246
content_index: number;
1247
}
1248
1249
/**
1250
* Text delta (streaming text chunk)
1251
*/
1252
interface ResponseTextDeltaEvent {
1253
type: "response.text.delta";
1254
event_id: string;
1255
response_id: string;
1256
item_id: string;
1257
output_index: number;
1258
content_index: number;
1259
/** Text chunk */
1260
delta: string;
1261
}
1262
1263
/**
1264
* Text generation completed
1265
*/
1266
interface ResponseTextDoneEvent {
1267
type: "response.text.done";
1268
event_id: string;
1269
response_id: string;
1270
item_id: string;
1271
output_index: number;
1272
content_index: number;
1273
/** Complete text */
1274
text: string;
1275
}
1276
1277
/**
1278
* Function call arguments delta
1279
*/
1280
interface ResponseFunctionCallArgumentsDeltaEvent {
1281
type: "response.function_call_arguments.delta";
1282
event_id: string;
1283
response_id: string;
1284
item_id: string;
1285
output_index: number;
1286
call_id: string;
1287
/** JSON arguments chunk */
1288
delta: string;
1289
}
1290
1291
/**
1292
* Function call arguments completed
1293
*/
1294
interface ResponseFunctionCallArgumentsDoneEvent {
1295
type: "response.function_call_arguments.done";
1296
event_id: string;
1297
response_id: string;
1298
item_id: string;
1299
output_index: number;
1300
call_id: string;
1301
/** Complete JSON arguments */
1302
arguments: string;
1303
}
1304
1305
/**
1306
* Error occurred
1307
*/
1308
interface RealtimeErrorEvent {
1309
type: "error";
1310
event_id: string;
1311
error: {
1312
type: string;
1313
code?: string | null;
1314
message: string;
1315
param?: string | null;
1316
event_id?: string | null;
1317
};
1318
}
1319
```
1320
1321
[Server Events](./realtime.md#server-to-client-events)
1322
1323
### Conversation Items
1324
1325
Items that make up the conversation history.
1326
1327
```typescript { .api }
1328
/**
1329
* Union of all conversation item types
1330
*/
1331
type ConversationItem =
1332
| RealtimeConversationItemSystemMessage
1333
| RealtimeConversationItemUserMessage
1334
| RealtimeConversationItemAssistantMessage
1335
| RealtimeConversationItemFunctionCall
1336
| RealtimeConversationItemFunctionCallOutput
1337
| RealtimeMcpApprovalResponse
1338
| RealtimeMcpListTools
1339
| RealtimeMcpToolCall
1340
| RealtimeMcpApprovalRequest;
1341
1342
/**
1343
* System message item
1344
*/
1345
interface RealtimeConversationItemSystemMessage {
1346
type: "message";
1347
role: "system";
1348
content: Array<{
1349
type?: "input_text";
1350
text?: string;
1351
}>;
1352
id?: string;
1353
object?: "realtime.item";
1354
status?: "completed" | "incomplete" | "in_progress";
1355
}
1356
1357
/**
1358
* User message item (text, audio, or image)
1359
*/
1360
interface RealtimeConversationItemUserMessage {
1361
type: "message";
1362
role: "user";
1363
content: Array<{
1364
type?: "input_text" | "input_audio" | "input_image";
1365
text?: string;
1366
audio?: string; // Base64-encoded
1367
transcript?: string;
1368
image_url?: string; // Data URI
1369
detail?: "auto" | "low" | "high";
1370
}>;
1371
id?: string;
1372
object?: "realtime.item";
1373
status?: "completed" | "incomplete" | "in_progress";
1374
}
1375
1376
/**
1377
* Assistant message item (text or audio)
1378
*/
1379
interface RealtimeConversationItemAssistantMessage {
1380
type: "message";
1381
role: "assistant";
1382
content: Array<{
1383
type?: "output_text" | "output_audio";
1384
text?: string;
1385
audio?: string; // Base64-encoded
1386
transcript?: string;
1387
}>;
1388
id?: string;
1389
object?: "realtime.item";
1390
status?: "completed" | "incomplete" | "in_progress";
1391
}
1392
1393
/**
1394
* Function call item
1395
*/
1396
interface RealtimeConversationItemFunctionCall {
1397
type: "function_call";
1398
name: string;
1399
/** JSON-encoded arguments */
1400
arguments: string;
1401
call_id?: string;
1402
id?: string;
1403
object?: "realtime.item";
1404
status?: "completed" | "incomplete" | "in_progress";
1405
}
1406
1407
/**
1408
* Function call output item
1409
*/
1410
interface RealtimeConversationItemFunctionCallOutput {
1411
type: "function_call_output";
1412
call_id: string;
1413
/** Function output (free text) */
1414
output: string;
1415
id?: string;
1416
object?: "realtime.item";
1417
status?: "completed" | "incomplete" | "in_progress";
1418
}
1419
1420
/**
1421
* MCP tool call item
1422
*/
1423
interface RealtimeMcpToolCall {
1424
type: "mcp_call";
1425
id: string;
1426
server_label: string;
1427
name: string;
1428
arguments: string;
1429
output?: string | null;
1430
error?:
1431
| { type: "protocol_error"; code: number; message: string }
1432
| { type: "tool_execution_error"; message: string }
1433
| { type: "http_error"; code: number; message: string }
1434
| null;
1435
approval_request_id?: string | null;
1436
}
1437
1438
/**
1439
* MCP approval request item
1440
*/
1441
interface RealtimeMcpApprovalRequest {
1442
type: "mcp_approval_request";
1443
id: string;
1444
server_label: string;
1445
name: string;
1446
arguments: string;
1447
}
1448
1449
/**
1450
* MCP approval response item
1451
*/
1452
interface RealtimeMcpApprovalResponse {
1453
type: "mcp_approval_response";
1454
id: string;
1455
approval_request_id: string;
1456
approve: boolean;
1457
reason?: string | null;
1458
}
1459
```
1460
1461
[Conversation Items](./realtime.md#conversation-items)
1462
1463
### Function Calling
1464
1465
Define and use tools during real-time conversations.
1466
1467
```typescript { .api }
1468
/**
1469
* Function tool definition for realtime conversations
1470
*/
1471
interface RealtimeFunctionTool {
1472
type?: "function";
1473
/** Function name */
1474
name?: string;
1475
/** Description and usage guidance */
1476
description?: string;
1477
/** JSON Schema for function parameters */
1478
parameters?: unknown;
1479
}
1480
1481
/**
1482
* MCP (Model Context Protocol) tool configuration
1483
*/
1484
interface McpTool {
1485
type: "mcp";
1486
/** Label identifying the MCP server */
1487
server_label: string;
1488
/** MCP server URL or connector ID */
1489
server_url?: string;
1490
connector_id?:
1491
| "connector_dropbox"
1492
| "connector_gmail"
1493
| "connector_googlecalendar"
1494
| "connector_googledrive"
1495
| "connector_microsoftteams"
1496
| "connector_outlookcalendar"
1497
| "connector_outlookemail"
1498
| "connector_sharepoint";
1499
/** Server description */
1500
server_description?: string;
1501
/** Allowed tools filter */
1502
allowed_tools?:
1503
| Array<string>
1504
| {
1505
tool_names?: Array<string>;
1506
read_only?: boolean;
1507
}
1508
| null;
1509
/** Approval requirements */
1510
require_approval?:
1511
| "always"
1512
| "never"
1513
| {
1514
always?: { tool_names?: Array<string>; read_only?: boolean };
1515
never?: { tool_names?: Array<string>; read_only?: boolean };
1516
}
1517
| null;
1518
/** OAuth access token */
1519
authorization?: string;
1520
/** HTTP headers */
1521
headers?: Record<string, string> | null;
1522
}
1523
1524
type RealtimeToolsConfig = Array<RealtimeFunctionTool | McpTool>;
1525
1526
type RealtimeToolChoiceConfig =
1527
| "auto"
1528
| "none"
1529
| "required"
1530
| { type: "function"; function: { name: string } }
1531
| { type: "mcp"; mcp: { server_label: string; name: string } };
1532
```
1533
1534
**Usage:**
1535
1536
```typescript
1537
// Define tools
1538
const tools: RealtimeToolsConfig = [
1539
{
1540
type: "function",
1541
name: "get_weather",
1542
description: "Get current weather for a location",
1543
parameters: {
1544
type: "object",
1545
properties: {
1546
location: { type: "string" },
1547
unit: { type: "string", enum: ["celsius", "fahrenheit"] },
1548
},
1549
required: ["location"],
1550
},
1551
},
1552
{
1553
type: "mcp",
1554
server_label: "calendar",
1555
connector_id: "connector_googlecalendar",
1556
allowed_tools: {
1557
tool_names: ["list_events", "create_event"],
1558
},
1559
},
1560
];
1561
1562
// Update session with tools
1563
ws.send({
1564
type: "session.update",
1565
session: {
1566
tools,
1567
tool_choice: "auto",
1568
},
1569
});
1570
1571
// Handle function call
1572
ws.on("response.function_call_arguments.done", async (event) => {
1573
const result = await executeFunction(event.call_id, event.arguments);
1574
1575
// Send function output
1576
ws.send({
1577
type: "conversation.item.create",
1578
item: {
1579
type: "function_call_output",
1580
call_id: event.call_id,
1581
output: JSON.stringify(result),
1582
},
1583
});
1584
1585
// Trigger new response
1586
ws.send({
1587
type: "response.create",
1588
});
1589
});
1590
```
1591
1592
[Function Calling](./realtime.md#function-calling)
1593
1594
### Response Configuration
1595
1596
Configure individual response parameters.
1597
1598
```typescript { .api }
1599
/**
1600
* Response resource
1601
*/
1602
interface RealtimeResponse {
1603
id?: string;
1604
object?: "realtime.response";
1605
/** Conversation ID or null */
1606
conversation_id?: string;
1607
/** Status: 'in_progress', 'completed', 'cancelled', 'failed', 'incomplete' */
1608
status?: RealtimeResponseStatus;
1609
/** Usage statistics */
1610
usage?: RealtimeResponseUsage;
1611
/** Max output tokens */
1612
max_output_tokens?: number | "inf";
1613
/** Response modalities */
1614
modalities?: Array<"text" | "audio">;
1615
/** Instructions for this response */
1616
instructions?: string;
1617
/** Voice selection */
1618
voice?: string;
1619
/** Audio output configuration */
1620
audio?: {
1621
format?: RealtimeAudioFormats;
1622
speed?: number;
1623
voice?: string;
1624
};
1625
/** Response metadata */
1626
metadata?: Record<string, string> | null;
1627
/** Tool choice */
1628
tool_choice?: RealtimeToolChoiceConfig;
1629
/** Tools for this response */
1630
tools?: RealtimeToolsConfig;
1631
/** Temperature */
1632
temperature?: number;
1633
/** Output items */
1634
output?: Array<ConversationItem>;
1635
/** Status details */
1636
status_details?: {
1637
type?: "incomplete" | "failed" | "cancelled";
1638
reason?: string;
1639
error?: RealtimeError | null;
1640
} | null;
1641
}
1642
1643
interface RealtimeResponseStatus {
1644
type: "in_progress" | "completed" | "cancelled" | "failed" | "incomplete";
1645
/** Additional status information */
1646
reason?: string;
1647
}
1648
1649
interface RealtimeResponseUsage {
1650
/** Total tokens (input + output) */
1651
total_tokens?: number;
1652
/** Input tokens */
1653
input_tokens?: number;
1654
/** Output tokens */
1655
output_tokens?: number;
1656
/** Input token breakdown */
1657
input_token_details?: {
1658
text_tokens?: number;
1659
audio_tokens?: number;
1660
image_tokens?: number;
1661
cached_tokens?: number;
1662
cached_tokens_details?: {
1663
text_tokens?: number;
1664
audio_tokens?: number;
1665
image_tokens?: number;
1666
};
1667
};
1668
/** Output token breakdown */
1669
output_token_details?: {
1670
text_tokens?: number;
1671
audio_tokens?: number;
1672
};
1673
}
1674
```
1675
1676
[Response Configuration](./realtime.md#response-configuration)
1677
1678
### Transcription
1679
1680
Configure and receive audio transcription during conversations.
1681
1682
```typescript { .api }
1683
/**
1684
* Transcription configuration
1685
*/
1686
interface AudioTranscription {
1687
/** Language code (ISO-639-1) */
1688
language?: string;
1689
/** Transcription model */
1690
model?:
1691
| "whisper-1"
1692
| "gpt-4o-mini-transcribe"
1693
| "gpt-4o-transcribe"
1694
| "gpt-4o-transcribe-diarize";
1695
/** Guidance prompt */
1696
prompt?: string;
1697
}
1698
1699
/**
1700
* Transcription completed event
1701
*/
1702
interface ConversationItemInputAudioTranscriptionCompletedEvent {
1703
type: "conversation.item.input_audio_transcription.completed";
1704
event_id: string;
1705
item_id: string;
1706
content_index: number;
1707
/** Transcribed text */
1708
transcript: string;
1709
/** Usage statistics */
1710
usage:
1711
| {
1712
type: "tokens";
1713
input_tokens: number;
1714
output_tokens: number;
1715
total_tokens: number;
1716
input_token_details?: {
1717
text_tokens?: number;
1718
audio_tokens?: number;
1719
};
1720
}
1721
| {
1722
type: "duration";
1723
/** Duration in seconds */
1724
seconds: number;
1725
};
1726
/** Log probabilities (if enabled) */
1727
logprobs?: Array<{
1728
token: string;
1729
logprob: number;
1730
bytes: Array<number>;
1731
}> | null;
1732
}
1733
1734
/**
1735
* Transcription delta event (streaming)
1736
*/
1737
interface ConversationItemInputAudioTranscriptionDeltaEvent {
1738
type: "conversation.item.input_audio_transcription.delta";
1739
event_id: string;
1740
item_id: string;
1741
content_index?: number;
1742
/** Transcript chunk */
1743
delta?: string;
1744
/** Log probabilities (if enabled) */
1745
logprobs?: Array<{
1746
token: string;
1747
logprob: number;
1748
bytes: Array<number>;
1749
}> | null;
1750
}
1751
1752
/**
1753
* Transcription segment (for diarization)
1754
*/
1755
interface ConversationItemInputAudioTranscriptionSegment {
1756
type: "conversation.item.input_audio_transcription.segment";
1757
event_id: string;
1758
item_id: string;
1759
content_index: number;
1760
id: string;
1761
/** Segment text */
1762
text: string;
1763
/** Speaker label */
1764
speaker: string;
1765
/** Start time in seconds */
1766
start: number;
1767
/** End time in seconds */
1768
end: number;
1769
}
1770
1771
/**
1772
* Transcription failed event
1773
*/
1774
interface ConversationItemInputAudioTranscriptionFailedEvent {
1775
type: "conversation.item.input_audio_transcription.failed";
1776
event_id: string;
1777
item_id: string;
1778
content_index: number;
1779
error: {
1780
type?: string;
1781
code?: string;
1782
message?: string;
1783
param?: string;
1784
};
1785
}
1786
```
1787
1788
**Usage:**
1789
1790
```typescript
1791
// Enable transcription with log probabilities
1792
ws.send({
1793
type: "session.update",
1794
session: {
1795
input_audio_transcription: {
1796
model: "gpt-4o-transcribe",
1797
language: "en",
1798
},
1799
include: ["item.input_audio_transcription.logprobs"],
1800
},
1801
});
1802
1803
// Listen for transcription
1804
ws.on("conversation.item.input_audio_transcription.delta", (event) => {
1805
console.log("Transcript delta:", event.delta);
1806
});
1807
1808
ws.on(
1809
"conversation.item.input_audio_transcription.completed",
1810
(event) => {
1811
console.log("Full transcript:", event.transcript);
1812
console.log("Usage:", event.usage);
1813
}
1814
);
1815
1816
// Diarization support
1817
ws.send({
1818
type: "session.update",
1819
session: {
1820
input_audio_transcription: {
1821
model: "gpt-4o-transcribe-diarize",
1822
},
1823
},
1824
});
1825
1826
ws.on(
1827
"conversation.item.input_audio_transcription.segment",
1828
(event) => {
1829
console.log(
1830
`[${event.speaker}] ${event.text} (${event.start}s - ${event.end}s)`
1831
);
1832
}
1833
);
1834
```
1835
1836
[Transcription](./realtime.md#transcription)
1837
1838
### Error Handling
1839
1840
Handle errors and edge cases in real-time conversations.
1841
1842
```typescript { .api }
1843
/**
1844
* Error event from server
1845
*/
1846
interface RealtimeErrorEvent {
1847
type: "error";
1848
event_id: string;
1849
error: RealtimeError;
1850
}
1851
1852
interface RealtimeError {
1853
/** Error type */
1854
type: string;
1855
/** Error code (optional) */
1856
code?: string | null;
1857
/** Human-readable message */
1858
message: string;
1859
/** Related parameter (optional) */
1860
param?: string | null;
1861
/** Client event ID that caused error (optional) */
1862
event_id?: string | null;
1863
}
1864
1865
/**
1866
* OpenAI Realtime error class
1867
*/
1868
class OpenAIRealtimeError extends Error {
1869
constructor(message: string);
1870
}
1871
```
1872
1873
**Common Error Types:**
1874
1875
```typescript
1876
// Invalid request errors
1877
{
1878
type: "invalid_request_error",
1879
code: "invalid_value",
1880
message: "Invalid value for 'audio_format'",
1881
param: "audio_format"
1882
}
1883
1884
// Server errors
1885
{
1886
type: "server_error",
1887
message: "Internal server error"
1888
}
1889
1890
// Rate limit errors
1891
{
1892
type: "rate_limit_error",
1893
message: "Rate limit exceeded"
1894
}
1895
```
1896
1897
**Usage:**
1898
1899
```typescript
1900
ws.on("error", (event: RealtimeErrorEvent) => {
1901
console.error("Realtime error:", event.error);
1902
1903
if (event.error.type === "rate_limit_error") {
1904
// Handle rate limiting
1905
} else if (event.error.type === "invalid_request_error") {
1906
// Handle validation errors
1907
console.error("Invalid:", event.error.param, event.error.message);
1908
}
1909
});
1910
1911
// WebSocket errors
1912
ws.socket.addEventListener("error", (error) => {
1913
console.error("WebSocket error:", error);
1914
});
1915
```
1916
1917
[Error Handling](./realtime.md#error-handling)
1918
1919
### Rate Limits
1920
1921
Monitor rate limits during conversations.
1922
1923
```typescript { .api }
1924
/**
1925
* Rate limits updated event
1926
*/
1927
interface RateLimitsUpdatedEvent {
1928
type: "rate_limits.updated";
1929
event_id: string;
1930
rate_limits: Array<{
1931
/** Rate limit name: 'requests' or 'tokens' */
1932
name?: "requests" | "tokens";
1933
/** Maximum allowed value */
1934
limit?: number;
1935
/** Remaining before limit reached */
1936
remaining?: number;
1937
/** Seconds until reset */
1938
reset_seconds?: number;
1939
}>;
1940
}
1941
```
1942
1943
**Usage:**
1944
1945
```typescript
1946
ws.on("rate_limits.updated", (event: RateLimitsUpdatedEvent) => {
1947
event.rate_limits.forEach((limit) => {
1948
console.log(`${limit.name}: ${limit.remaining}/${limit.limit}`);
1949
console.log(`Resets in ${limit.reset_seconds}s`);
1950
});
1951
});
1952
```
1953
1954
[Rate Limits](./realtime.md#rate-limits)
1955
1956
### Tracing
1957
1958
Configure distributed tracing for debugging and monitoring.
1959
1960
```typescript { .api }
1961
/**
1962
* Tracing configuration
1963
*/
1964
type RealtimeTracingConfig =
1965
| "auto"
1966
| {
1967
/** Workflow name in Traces Dashboard */
1968
workflow_name?: string;
1969
/** Group ID for filtering */
1970
group_id?: string;
1971
/** Arbitrary metadata */
1972
metadata?: unknown;
1973
}
1974
| null;
1975
```
1976
1977
**Usage:**
1978
1979
```typescript
1980
// Auto tracing with defaults
1981
{
1982
tracing: "auto";
1983
}
1984
1985
// Custom tracing configuration
1986
{
1987
tracing: {
1988
workflow_name: "customer-support-bot",
1989
group_id: "prod-us-west",
1990
metadata: {
1991
customer_id: "cust_123",
1992
agent_version: "2.1.0"
1993
}
1994
}
1995
}
1996
1997
// Disable tracing
1998
{
1999
tracing: null;
2000
}
2001
```
2002
2003
[Tracing](./realtime.md#tracing)
2004
2005
## Complete Example: Voice Assistant
2006
2007
```typescript
2008
import OpenAI from "openai";
2009
import { OpenAIRealtimeWebSocket } from "openai/realtime/websocket";
2010
2011
const client = new OpenAI();
2012
2013
// Create session token
2014
const secret = await client.realtime.clientSecrets.create({
2015
session: {
2016
type: "realtime",
2017
model: "gpt-realtime",
2018
audio: {
2019
input: {
2020
format: { type: "audio/pcm", rate: 24000 },
2021
turn_detection: {
2022
type: "server_vad",
2023
threshold: 0.5,
2024
silence_duration_ms: 500,
2025
interrupt_response: true,
2026
},
2027
transcription: {
2028
model: "gpt-4o-transcribe",
2029
},
2030
},
2031
output: {
2032
format: { type: "audio/pcm", rate: 24000 },
2033
voice: "marin",
2034
},
2035
},
2036
instructions:
2037
"You are a helpful voice assistant. Speak naturally and concisely.",
2038
tools: [
2039
{
2040
type: "function",
2041
name: "get_weather",
2042
description: "Get weather for a location",
2043
parameters: {
2044
type: "object",
2045
properties: {
2046
location: { type: "string" },
2047
},
2048
required: ["location"],
2049
},
2050
},
2051
],
2052
},
2053
});
2054
2055
// Connect WebSocket
2056
const ws = await OpenAIRealtimeWebSocket.create(client, {
2057
model: "gpt-realtime",
2058
});
2059
2060
// Handle session
2061
ws.on("session.created", (event) => {
2062
console.log("Session created:", event.session.id);
2063
});
2064
2065
// Handle conversation
2066
ws.on("conversation.item.created", (event) => {
2067
console.log("Item created:", event.item.type);
2068
});
2069
2070
// Handle audio output
2071
ws.on("response.audio.delta", (event) => {
2072
const audioData = Buffer.from(event.delta, "base64");
2073
playAudio(audioData); // Play to speaker
2074
});
2075
2076
// Handle transcripts
2077
ws.on("conversation.item.input_audio_transcription.completed", (event) => {
2078
console.log("User said:", event.transcript);
2079
});
2080
2081
ws.on("response.audio_transcript.delta", (event) => {
2082
process.stdout.write(event.delta);
2083
});
2084
2085
// Handle VAD
2086
ws.on("input_audio_buffer.speech_started", () => {
2087
console.log("User started speaking");
2088
stopAudioPlayback(); // Interrupt assistant
2089
});
2090
2091
ws.on("input_audio_buffer.speech_stopped", () => {
2092
console.log("User stopped speaking");
2093
});
2094
2095
// Handle function calls
2096
ws.on("response.function_call_arguments.done", async (event) => {
2097
console.log("Function call:", event.call_id);
2098
2099
const args = JSON.parse(event.arguments);
2100
const result = await getWeather(args.location);
2101
2102
// Send result
2103
ws.send({
2104
type: "conversation.item.create",
2105
item: {
2106
type: "function_call_output",
2107
call_id: event.call_id,
2108
output: JSON.stringify(result),
2109
},
2110
});
2111
2112
// Continue conversation
2113
ws.send({
2114
type: "response.create",
2115
});
2116
});
2117
2118
// Handle errors
2119
ws.on("error", (event) => {
2120
console.error("Error:", event.error.message);
2121
});
2122
2123
// Capture and send microphone audio
2124
const audioStream = captureMicrophone();
2125
audioStream.on("data", (chunk) => {
2126
const base64 = chunk.toString("base64");
2127
ws.send({
2128
type: "input_audio_buffer.append",
2129
audio: base64,
2130
});
2131
});
2132
2133
// Cleanup
2134
process.on("SIGINT", () => {
2135
ws.close();
2136
process.exit(0);
2137
});
2138
```
2139
2140
## Complete Example: Phone Call Handler

```typescript
import OpenAI from "openai";
import express from "express";

const client = new OpenAI();
const app = express();

app.use(express.json());

// Webhook for incoming calls
app.post("/realtime/webhook/incoming_call", async (req, res) => {
  const event = req.body;

  if (event.type === "realtime.call.incoming") {
    const callId = event.data.id;

    // Accept the call
    await client.realtime.calls.accept(callId, {
      type: "realtime",
      model: "gpt-realtime",
      instructions:
        "You are a customer service agent. Be professional and helpful.",
      audio: {
        input: {
          format: { type: "audio/pcmu" }, // G.711 for telephony
          turn_detection: {
            type: "server_vad",
            silence_duration_ms: 700,
          },
        },
        output: {
          format: { type: "audio/pcmu" },
          voice: "marin",
        },
      },
      tools: [
        {
          type: "function",
          name: "transfer_to_agent",
          description: "Transfer to human agent",
          parameters: {
            type: "object",
            properties: {
              reason: { type: "string" },
            },
          },
        },
      ],
    });

    console.log(`Accepted call: ${callId}`);
  }

  res.sendStatus(200);
});

// Webhook for call events
app.post("/realtime/webhook/call_events", async (req, res) => {
  const event = req.body;

  if (event.type === "realtime.response.function_call_output.done") {
    const { call_id, function_name, arguments: args } = event.data;

    if (function_name === "transfer_to_agent") {
      // Transfer call
      await client.realtime.calls.refer(call_id, {
        target_uri: "sip:support@example.com",
      });
    }
  }

  res.sendStatus(200);
});

app.listen(3000, () => {
  console.log("Webhook server running on port 3000");
});
```

## Type Reference

### Core Types

```typescript { .api }
type RealtimeClientEvent =
  | ConversationItemCreateEvent
  | ConversationItemDeleteEvent
  | ConversationItemRetrieveEvent
  | ConversationItemTruncateEvent
  | InputAudioBufferAppendEvent
  | InputAudioBufferClearEvent
  | OutputAudioBufferClearEvent
  | InputAudioBufferCommitEvent
  | ResponseCancelEvent
  | ResponseCreateEvent
  | SessionUpdateEvent;

type RealtimeServerEvent =
  | ConversationCreatedEvent
  | ConversationItemCreatedEvent
  | ConversationItemDeletedEvent
  | ConversationItemAdded
  | ConversationItemDone
  | ConversationItemRetrieved
  | ConversationItemTruncatedEvent
  | ConversationItemInputAudioTranscriptionCompletedEvent
  | ConversationItemInputAudioTranscriptionDeltaEvent
  | ConversationItemInputAudioTranscriptionFailedEvent
  | ConversationItemInputAudioTranscriptionSegment
  | InputAudioBufferClearedEvent
  | InputAudioBufferCommittedEvent
  | InputAudioBufferSpeechStartedEvent
  | InputAudioBufferSpeechStoppedEvent
  | InputAudioBufferTimeoutTriggered
  | OutputAudioBufferStarted
  | OutputAudioBufferStopped
  | OutputAudioBufferCleared
  | ResponseCreatedEvent
  | ResponseDoneEvent
  | ResponseOutputItemAddedEvent
  | ResponseOutputItemDoneEvent
  | ResponseContentPartAddedEvent
  | ResponseContentPartDoneEvent
  | ResponseAudioDeltaEvent
  | ResponseAudioDoneEvent
  | ResponseAudioTranscriptDeltaEvent
  | ResponseAudioTranscriptDoneEvent
  | ResponseTextDeltaEvent
  | ResponseTextDoneEvent
  | ResponseFunctionCallArgumentsDeltaEvent
  | ResponseFunctionCallArgumentsDoneEvent
  | ResponseMcpCallArgumentsDelta
  | ResponseMcpCallArgumentsDone
  | ResponseMcpCallInProgress
  | ResponseMcpCallCompleted
  | ResponseMcpCallFailed
  | McpListToolsInProgress
  | McpListToolsCompleted
  | McpListToolsFailed
  | SessionCreatedEvent
  | SessionUpdatedEvent
  | RateLimitsUpdatedEvent
  | RealtimeErrorEvent;

type ConversationItem =
  | RealtimeConversationItemSystemMessage
  | RealtimeConversationItemUserMessage
  | RealtimeConversationItemAssistantMessage
  | RealtimeConversationItemFunctionCall
  | RealtimeConversationItemFunctionCallOutput
  | RealtimeMcpApprovalResponse
  | RealtimeMcpListTools
  | RealtimeMcpToolCall
  | RealtimeMcpApprovalRequest;

interface RealtimeSession {
  id?: string;
  object?: "realtime.session";
  model?: string;
  expires_at?: number;
  modalities?: Array<"text" | "audio">;
  instructions?: string;
  voice?: string;
  input_audio_format?: "pcm16" | "g711_ulaw" | "g711_alaw";
  output_audio_format?: "pcm16" | "g711_ulaw" | "g711_alaw";
  input_audio_transcription?: AudioTranscription | null;
  turn_detection?: RealtimeAudioInputTurnDetection | null;
  tools?: Array<RealtimeFunctionTool>;
  tool_choice?: string;
  temperature?: number;
  max_response_output_tokens?: number | "inf";
  speed?: number;
  input_audio_noise_reduction?: {
    type?: NoiseReductionType;
  };
  include?: Array<"item.input_audio_transcription.logprobs"> | null;
  prompt?: ResponsePrompt | null;
  tracing?: RealtimeTracingConfig | null;
  truncation?: RealtimeTruncation;
}

interface RealtimeResponse {
  id?: string;
  object?: "realtime.response";
  status?: RealtimeResponseStatus;
  conversation_id?: string;
  output?: Array<ConversationItem>;
  usage?: RealtimeResponseUsage;
  status_details?: {
    type?: "incomplete" | "failed" | "cancelled";
    reason?: string;
    error?: RealtimeError | null;
  } | null;
  max_output_tokens?: number | "inf";
  modalities?: Array<"text" | "audio">;
  instructions?: string;
  voice?: string;
  audio?: {
    format?: RealtimeAudioFormats;
    speed?: number;
    voice?: string;
  };
  metadata?: Record<string, string> | null;
  tool_choice?: RealtimeToolChoiceConfig;
  tools?: RealtimeToolsConfig;
  temperature?: number;
}

interface AudioTranscription {
  language?: string;
  model?:
    | "whisper-1"
    | "gpt-4o-mini-transcribe"
    | "gpt-4o-transcribe"
    | "gpt-4o-transcribe-diarize";
  prompt?: string;
}

type RealtimeAudioFormats =
  | { type?: "audio/pcm"; rate?: 24000 }
  | { type?: "audio/pcmu" }
  | { type?: "audio/pcma" };

type NoiseReductionType = "near_field" | "far_field";

type RealtimeAudioInputTurnDetection =
  | {
      type: "server_vad";
      threshold?: number;
      prefix_padding_ms?: number;
      silence_duration_ms?: number;
      create_response?: boolean;
      interrupt_response?: boolean;
      idle_timeout_ms?: number | null;
    }
  | {
      type: "semantic_vad";
      eagerness?: "low" | "medium" | "high" | "auto";
      create_response?: boolean;
      interrupt_response?: boolean;
    };

type RealtimeTruncation =
  | "auto"
  | "disabled"
  | { type: "retention_ratio"; retention_ratio: number };

type RealtimeToolsConfig = Array<RealtimeFunctionTool | McpTool>;

type RealtimeToolChoiceConfig =
  | "auto"
  | "none"
  | "required"
  | { type: "function"; function: { name: string } }
  | { type: "mcp"; mcp: { server_label: string; name: string } };

type RealtimeTracingConfig =
  | "auto"
  | {
      workflow_name?: string;
      group_id?: string;
      metadata?: unknown;
    }
  | null;

interface RealtimeError {
  type: string;
  code?: string | null;
  message: string;
  param?: string | null;
  event_id?: string | null;
}

interface RealtimeResponseUsage {
  total_tokens?: number;
  input_tokens?: number;
  output_tokens?: number;
  input_token_details?: {
    text_tokens?: number;
    audio_tokens?: number;
    image_tokens?: number;
    cached_tokens?: number;
    cached_tokens_details?: {
      text_tokens?: number;
      audio_tokens?: number;
      image_tokens?: number;
    };
  };
  output_token_details?: {
    text_tokens?: number;
    audio_tokens?: number;
  };
}

interface RealtimeResponseStatus {
  type: "in_progress" | "completed" | "cancelled" | "failed" | "incomplete";
  reason?: string;
}
```

## Models

Available Realtime API models:

- `gpt-realtime` (latest)
- `gpt-realtime-2025-08-28`
- `gpt-4o-realtime-preview`
- `gpt-4o-realtime-preview-2024-10-01`
- `gpt-4o-realtime-preview-2024-12-17`
- `gpt-4o-realtime-preview-2025-06-03`
- `gpt-4o-mini-realtime-preview`
- `gpt-4o-mini-realtime-preview-2024-12-17`
- `gpt-realtime-mini`
- `gpt-realtime-mini-2025-10-06`
- `gpt-audio-mini`
- `gpt-audio-mini-2025-10-06`

## Best Practices

### Security

- **Never expose API keys in the browser**: Always use ephemeral session tokens (see the token-minting sketch after this list)
- **Token expiration**: Default 10 minutes, max 2 hours
- **Server-side validation**: Validate all tool calls server-side
- **Rate limiting**: Monitor rate limit events and handle them gracefully

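A common way to keep the API key server-side is a small endpoint that mints an ephemeral client secret and returns it to the browser. This is a minimal sketch using Express (already used in the phone-call example above); the `expires_after` option and the `value`/`expires_at` fields on the returned secret are assumptions, so check them against the current SDK types.

```typescript
import OpenAI from "openai";
import express from "express";

const client = new OpenAI(); // API key stays on the server
const app = express();

// The browser calls this endpoint, then opens its own Realtime connection
// using the returned ephemeral secret instead of the API key.
app.post("/session", async (_req, res) => {
  const secret = await client.realtime.clientSecrets.create({
    // Assumed option: expire the token after 10 minutes (the documented default).
    expires_after: { anchor: "created_at", seconds: 600 },
    session: {
      type: "realtime",
      model: "gpt-realtime",
      instructions: "You are a helpful voice assistant.",
    },
  });

  // Field names assumed: `value` is the ephemeral token, `expires_at` its expiry.
  res.json({ client_secret: secret.value, expires_at: secret.expires_at });
});

app.listen(3000);
```
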
### Performance

- **Audio chunking**: Send audio in chunks (1-5 seconds recommended)
- **VAD tuning**: Adjust threshold and silence duration for your environment (see the sketch after this list)
- **Voice selection**: Use `marin` or `cedar` for best quality
- **Caching**: Enable context caching for repeated conversations

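A noisy open-plan office usually needs a higher VAD threshold and a longer silence window than a quiet headset setup. Below is a sketch of tuning VAD mid-conversation with `session.update`, assuming the GA session shape with `turn_detection` nested under `audio.input`, as in the session-creation example above; the specific values are illustrative starting points, not recommendations.

```typescript
// Make server VAD less trigger-happy for a noisy environment.
ws.send({
  type: "session.update",
  session: {
    type: "realtime",
    audio: {
      input: {
        turn_detection: {
          type: "server_vad",
          threshold: 0.7, // the example above uses 0.5; higher = less sensitive
          prefix_padding_ms: 300, // audio retained before detected speech
          silence_duration_ms: 800, // wait longer before ending the user's turn
          interrupt_response: true,
        },
      },
    },
  },
});
```
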
### Audio Quality

- **Noise reduction**: Enable for far-field or noisy environments
- **Sample rate**: Always use 24kHz for PCM audio
- **Format selection**: Use G.711 (pcmu/pcma) for telephony, PCM for quality
- **Interrupt handling**: Clear audio buffers on interruption (see the sketch after this list)

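When server VAD reports that the user started speaking over the assistant, stop local playback and discard whatever the model is still generating. A sketch using the `response.cancel` and `output_audio_buffer.clear` client events from the type reference below; `output_audio_buffer.clear` is mainly relevant for WebRTC/SIP transports, and `stopAudioPlayback` is your own playback helper.

```typescript
ws.on("input_audio_buffer.speech_started", () => {
  // 1. Stop whatever is currently playing locally.
  stopAudioPlayback();

  // 2. Cancel the in-flight response so the model stops generating.
  ws.send({ type: "response.cancel" });

  // 3. Drop any output audio already buffered server-side (WebRTC/SIP transports).
  ws.send({ type: "output_audio_buffer.clear" });
});
```
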
### Conversation Management

- **Context length**: Monitor token usage and configure truncation
- **Function calling**: Keep tool outputs concise
- **System messages**: Use for mid-conversation context updates
- **Item ordering**: Use `previous_item_id` for precise insertion (see the sketch after this list)

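Truncation can be configured once on the session, and `previous_item_id` lets you splice a context update into a specific point in the history instead of appending it at the end. A sketch, assuming the `retention_ratio` truncation shape from the type reference below and that `conversation.item.create` accepts an event-level `previous_item_id`; the item id is hypothetical.

```typescript
// Keep roughly the most recent 75% of the context when the window fills up.
ws.send({
  type: "session.update",
  session: {
    type: "realtime",
    truncation: { type: "retention_ratio", retention_ratio: 0.75 },
  },
});

// Insert a system note right after a specific item instead of at the end.
ws.send({
  type: "conversation.item.create",
  previous_item_id: "item_abc123", // hypothetical id captured from an earlier event
  item: {
    type: "message",
    role: "system",
    content: [{ type: "input_text", text: "The customer has been verified." }],
  },
});
```
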
### Error Handling

- **Graceful degradation**: Handle WebSocket disconnections
- **Retry logic**: Implement exponential backoff for transient errors (see the sketch after this list)
- **Error logging**: Log all error events for debugging
- **User feedback**: Provide clear feedback on connection/processing status

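A sketch of reconnect logic with exponential backoff, using the `OpenAIRealtimeWebSocket.create` factory from the examples above. How you detect a dropped connection (a thrown error on connect, an `error` event, a closed socket) depends on your transport and runtime, so treat this as a starting point rather than a complete strategy.

```typescript
import OpenAI from "openai";
import { OpenAIRealtimeWebSocket } from "openai/realtime/websocket";

const client = new OpenAI();

async function connectWithRetry(maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const ws = await OpenAIRealtimeWebSocket.create(client, {
        model: "gpt-realtime",
      });

      // Log every error event; decide per error code whether to reconnect.
      ws.on("error", (event) => {
        console.error("Realtime error:", event.error.message);
      });

      return ws;
    } catch (err) {
      const delayMs = Math.min(1000 * 2 ** attempt, 30_000); // 1s, 2s, 4s, ... capped at 30s
      console.warn(`Connect failed (attempt ${attempt + 1}), retrying in ${delayMs}ms`, err);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error("Could not establish a realtime connection");
}

const ws = await connectWithRetry();
```
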
## Common Patterns

### Voice-to-Voice Assistant

```typescript
const ws = await OpenAIRealtimeWebSocket.create(client, {
  model: "gpt-realtime",
});

// Microphone → Input Buffer
micStream.on("data", (chunk) => {
  ws.send({
    type: "input_audio_buffer.append",
    audio: chunk.toString("base64"),
  });
});

// Output Audio → Speaker
ws.on("response.audio.delta", (event) => {
  playAudio(Buffer.from(event.delta, "base64"));
});

// VAD-based interruption
ws.on("input_audio_buffer.speech_started", () => {
  stopPlayback();
});
```

### Text-to-Voice Assistant

```typescript
// Send text message
ws.send({
  type: "conversation.item.create",
  item: {
    type: "message",
    role: "user",
    content: [{ type: "input_text", text: "Hello!" }],
  },
});

// Request audio response
ws.send({
  type: "response.create",
  response: {
    modalities: ["audio"],
  },
});
```

### Streaming Transcripts

```typescript
ws.on("response.audio_transcript.delta", (event) => {
  updateSubtitles(event.delta);
});

ws.on("conversation.item.input_audio_transcription.delta", (event) => {
  updateUserTranscript(event.delta);
});
```

### Multi-Tool Assistant

```typescript
const tools = [
  {
    type: "function",
    name: "search_database",
    description: "Search customer database",
    parameters: {
      /* ... */
    },
  },
  {
    type: "mcp",
    server_label: "calendar",
    connector_id: "connector_googlecalendar",
  },
];

ws.send({
  type: "session.update",
  session: { tools, tool_choice: "auto" },
});
```