0
# Realtime API
1
2
The Realtime API provides WebSocket-based real-time voice conversations with OpenAI models. It supports bidirectional audio streaming, server-side voice activity detection (VAD), function calling, and full conversation management. The API is designed for live voice applications including phone calls, voice assistants, and interactive conversational experiences.
3
4
## Package Information
5
6
- **Package Name**: openai
7
- **Package Type**: npm
8
- **Language**: TypeScript
9
- **Installation**: `npm install openai`
10
11
## API Status
12
13
The Realtime API is now generally available (GA) at `client.realtime.*`.
14
15
**Deprecation Notice**: The legacy beta Realtime API at `client.beta.realtime.*` is deprecated. If you are using the beta API, migrate to the GA API documented here. The beta API includes:
16
- `client.beta.realtime.sessions.create()` (deprecated - use `client.realtime.clientSecrets.create()` instead)
17
- `client.beta.realtime.transcriptionSessions.create()` (deprecated)
18
19
All new projects should use the GA Realtime API (`client.realtime.*`) documented on this page.
20
21
## Core Imports
22
23
```typescript
24
import OpenAI from "openai";
25
import { OpenAIRealtimeWebSocket } from "openai/realtime/websocket"; // Browser
26
import { OpenAIRealtimeWS } from "openai/realtime/ws"; // Node.js (requires 'ws' package)
27
```
28
29
## WebSocket Clients
30
31
The Realtime API provides two WebSocket client implementations for different runtime environments:
32
33
### OpenAIRealtimeWebSocket (Browser)
34
35
For browser environments, use `OpenAIRealtimeWebSocket` which uses the native browser WebSocket API.
36
37
```typescript
38
import { OpenAIRealtimeWebSocket } from "openai/realtime/websocket";
39
import OpenAI from "openai";
40
41
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
42
43
const ws = new OpenAIRealtimeWebSocket(
44
{
45
model: "gpt-realtime",
46
dangerouslyAllowBrowser: true, // Required for browser use
47
},
48
client
49
);
50
51
// Event handling
52
ws.on("session.created", (event) => {
53
console.log("Session started:", event.session.id);
54
});
55
56
ws.on("response.audio.delta", (event) => {
57
// Handle audio deltas - event.delta is base64 encoded audio
58
const audioData = atob(event.delta);
59
playAudio(audioData);
60
});
61
62
ws.on("error", (error) => {
63
console.error("WebSocket error:", error);
64
});
65
66
// Send audio to the server
67
function sendAudio(audioData: ArrayBuffer) {
68
const base64Audio = btoa(String.fromCharCode(...new Uint8Array(audioData)));
69
ws.send({
70
type: "input_audio_buffer.append",
71
audio: base64Audio,
72
});
73
}
74
75
// Commit audio buffer to trigger processing
76
ws.send({
77
type: "input_audio_buffer.commit",
78
});
79
80
// Close connection
81
ws.close();
82
```
83
84
**Key features:**
85
- Uses native browser WebSocket API
86
- Requires `dangerouslyAllowBrowser: true` in configuration
87
- Audio must be base64 encoded
88
- Automatic reconnection handling
89
- Built-in event emitter for all realtime events
90
91
### OpenAIRealtimeWS (Node.js)
92
93
For Node.js environments, use `OpenAIRealtimeWS` which uses the `ws` package for WebSocket support.
94
95
```typescript
96
import { OpenAIRealtimeWS } from "openai/realtime/ws";
97
import OpenAI from "openai";
98
import fs from "fs";
99
100
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
101
102
const ws = new OpenAIRealtimeWS(
103
{
104
model: "gpt-realtime",
105
},
106
client
107
);
108
109
// Event handling (same interface as browser version)
110
ws.on("session.created", (event) => {
111
console.log("Session started:", event.session.id);
112
});
113
114
ws.on("response.audio.delta", (event) => {
115
// Handle audio deltas
116
const audioBuffer = Buffer.from(event.delta, "base64");
117
// Write to file or stream to audio output
118
fs.appendFileSync("output.pcm", audioBuffer);
119
});
120
121
ws.on("response.done", (event) => {
122
console.log("Response complete:", event.response.id);
123
});
124
125
// Send audio from file or buffer
126
function sendAudioFromFile(filePath: string) {
127
const audioBuffer = fs.readFileSync(filePath);
128
const base64Audio = audioBuffer.toString("base64");
129
130
ws.send({
131
type: "input_audio_buffer.append",
132
audio: base64Audio,
133
});
134
}
135
136
// Trigger response generation
137
ws.send({
138
type: "input_audio_buffer.commit",
139
});
140
141
// Close connection
142
ws.close();
143
```
144
145
**Key features:**
146
- Uses `ws` package for WebSocket support (add to dependencies: `npm install ws @types/ws`)
147
- Same event interface as browser version for consistency
148
- Better Node.js stream integration
149
- Automatic reconnection handling
150
- Suitable for server-side applications
151
152
### Common Event Patterns
153
154
Both WebSocket clients support the same event handling interface:
155
156
```typescript
157
// Connection events
158
ws.on("session.created", (event) => { /* Session initialization */ });
159
ws.on("session.updated", (event) => { /* Session configuration changed */ });
160
161
// Conversation events
162
ws.on("conversation.created", (event) => { /* New conversation */ });
163
ws.on("conversation.item.created", (event) => { /* New item added */ });
164
ws.on("conversation.item.deleted", (event) => { /* Item removed */ });
165
166
// Audio events (streaming)
167
ws.on("response.audio.delta", (event) => { /* Audio chunk received */ });
168
ws.on("response.audio.done", (event) => { /* Audio complete */ });
169
ws.on("response.audio_transcript.delta", (event) => { /* Transcript chunk */ });
170
ws.on("response.audio_transcript.done", (event) => { /* Transcript complete */ });
171
172
// Response events
173
ws.on("response.created", (event) => { /* Response started */ });
174
ws.on("response.done", (event) => { /* Response complete */ });
175
ws.on("response.cancelled", (event) => { /* Response cancelled */ });
176
ws.on("response.failed", (event) => { /* Response failed */ });
177
178
// Function calling events
179
ws.on("response.function_call_arguments.delta", (event) => { /* Function args streaming */ });
180
ws.on("response.function_call_arguments.done", (event) => { /* Function args complete */ });
181
182
// Error events
183
ws.on("error", (error) => { /* WebSocket or API error */ });
184
ws.on("close", (event) => { /* Connection closed */ });
185
```
186
187
### Sending Commands
188
189
Both clients use the same `.send()` method for sending commands:
190
191
```typescript
192
// Append audio to input buffer
193
ws.send({
194
type: "input_audio_buffer.append",
195
audio: base64AudioString,
196
});
197
198
// Commit audio buffer (triggers VAD or manual processing)
199
ws.send({
200
type: "input_audio_buffer.commit",
201
});
202
203
// Clear audio buffer
204
ws.send({
205
type: "input_audio_buffer.clear",
206
});
207
208
// Update session configuration
209
ws.send({
210
type: "session.update",
211
session: {
212
instructions: "You are a helpful assistant.",
213
turn_detection: { type: "server_vad" },
214
},
215
});
216
217
// Create conversation item (text message)
218
ws.send({
219
type: "conversation.item.create",
220
item: {
221
type: "message",
222
role: "user",
223
content: [{ type: "input_text", text: "Hello!" }],
224
},
225
});
226
227
// Trigger response generation
228
ws.send({
229
type: "response.create",
230
response: {
231
modalities: ["text", "audio"],
232
instructions: "Respond briefly.",
233
},
234
});
235
236
// Cancel in-progress response
237
ws.send({
238
type: "response.cancel",
239
});
240
```
241
242
### Connection Lifecycle
243
244
Both clients handle connection lifecycle automatically:
245
246
```typescript
247
const ws = new OpenAIRealtimeWS({ model: "gpt-realtime" }, client);
248
249
// Connection opens automatically
250
ws.on("session.created", (event) => {
251
console.log("Connected and ready");
252
});
253
254
// Handle disconnections
255
ws.on("close", (event) => {
256
console.log("Connection closed:", event.code, event.reason);
257
});
258
259
// Handle errors
260
ws.on("error", (error) => {
261
console.error("Connection error:", error);
262
});
263
264
// Manually close connection
265
ws.close();
266
```
267
268
## Basic Usage
269
270
### Creating a Session Token
271
272
```typescript
273
import OpenAI from "openai";
274
275
const client = new OpenAI({
276
apiKey: process.env.OPENAI_API_KEY,
277
});
278
279
// Create an ephemeral session token for client-side use
280
const response = await client.realtime.clientSecrets.create({
281
session: {
282
type: "realtime",
283
model: "gpt-realtime",
284
audio: {
285
input: {
286
format: { type: "audio/pcm", rate: 24000 },
287
turn_detection: {
288
type: "server_vad",
289
threshold: 0.5,
290
silence_duration_ms: 500,
291
},
292
},
293
output: {
294
format: { type: "audio/pcm", rate: 24000 },
295
voice: "marin",
296
},
297
},
298
},
299
});
300
301
const sessionToken = response.value;
302
```
303
304
### Connecting via WebSocket
305
306
```typescript
307
import { OpenAIRealtimeWebSocket } from "openai/realtime/websocket";
308
309
const ws = new OpenAIRealtimeWebSocket(
310
{
311
model: "gpt-realtime",
312
dangerouslyAllowBrowser: false,
313
},
314
client
315
);
316
317
// Listen for events
318
ws.on("session.created", (event) => {
319
console.log("Session created:", event);
320
});
321
322
ws.on("conversation.item.created", (event) => {
323
console.log("Item created:", event.item);
324
});
325
326
ws.on("response.audio.delta", (event) => {
327
// Handle audio delta
328
const audioData = Buffer.from(event.delta, "base64");
329
playAudio(audioData);
330
});
331
332
// Send audio
333
ws.send({
334
type: "input_audio_buffer.append",
335
audio: audioBase64String,
336
});
337
338
// Commit audio buffer
339
ws.send({
340
type: "input_audio_buffer.commit",
341
});
342
```
343
344
## Architecture
345
346
The Realtime API operates through a WebSocket connection with an event-driven architecture:
347
348
- **Session Management**: Create ephemeral tokens server-side, connect from client
349
- **Audio Streaming**: Bidirectional PCM16/G.711 audio at 24kHz
350
- **Event System**: 50+ client-to-server and server-to-client events
351
- **VAD Integration**: Server-side voice activity detection with configurable parameters
352
- **Conversation Context**: Automatic conversation history management
353
- **Function Calling**: Real-time tool execution during conversations
354
- **Phone Integration**: SIP/WebRTC support for phone calls
355
356
## Capabilities
357
358
### Session Token Creation
359
360
Generate ephemeral session tokens for secure client-side WebSocket connections.
361
362
```typescript { .api }
363
/**
364
* Create a Realtime client secret with an associated session configuration.
365
* Returns an ephemeral token with 1-minute default TTL (configurable up to 2 hours).
366
*/
367
function create(
368
params: ClientSecretCreateParams
369
): Promise<ClientSecretCreateResponse>;
370
371
interface ClientSecretCreateParams {
372
/** Configuration for the client secret expiration */
373
expires_after?: {
374
/** Anchor point for expiration (only 'created_at' is supported) */
375
anchor?: "created_at";
376
/** Seconds from anchor to expiration (10-7200, defaults to 600) */
377
seconds?: number;
378
};
379
/** Session configuration (realtime or transcription session) */
380
session?:
381
| RealtimeSessionCreateRequest
382
| RealtimeTranscriptionSessionCreateRequest;
383
}
384
385
interface ClientSecretCreateResponse {
386
/** Expiration timestamp in seconds since epoch */
387
expires_at: number;
388
/** The session configuration */
389
session: RealtimeSessionCreateResponse | RealtimeTranscriptionSessionCreateResponse;
390
/** The generated client secret value */
391
value: string;
392
}
393
394
interface RealtimeSessionCreateResponse {
395
/** Ephemeral key for client environments */
396
client_secret: {
397
expires_at: number;
398
value: string;
399
};
400
/** Session type: always 'realtime' */
401
type: "realtime";
402
/** Audio configuration */
403
audio?: {
404
input?: {
405
format?: RealtimeAudioFormats;
406
noise_reduction?: { type?: NoiseReductionType };
407
transcription?: AudioTranscription;
408
turn_detection?: ServerVad | SemanticVad | null;
409
};
410
output?: {
411
format?: RealtimeAudioFormats;
412
speed?: number;
413
voice?: string;
414
};
415
};
416
/** Fields to include in server outputs */
417
include?: Array<"item.input_audio_transcription.logprobs">;
418
/** System instructions for the model */
419
instructions?: string;
420
/** Max output tokens (1-4096 or 'inf') */
421
max_output_tokens?: number | "inf";
422
/** Realtime model to use */
423
model?: string;
424
/** Output modalities ('text' | 'audio') */
425
output_modalities?: Array<"text" | "audio">;
426
/** Prompt template reference */
427
prompt?: ResponsePrompt | null;
428
/** Tool choice configuration */
429
tool_choice?: ToolChoiceOptions | ToolChoiceFunction | ToolChoiceMcp;
430
/** Available tools */
431
tools?: Array<RealtimeFunctionTool | McpTool>;
432
/** Tracing configuration */
433
tracing?: "auto" | TracingConfiguration | null;
434
/** Truncation behavior */
435
truncation?: RealtimeTruncation;
436
}
437
```
438
439
[Session Token Creation](./realtime.md#session-token-creation)
440
441
### SIP Call Management
442
443
Manage incoming and outgoing SIP/WebRTC calls with the Realtime API.
444
445
```typescript { .api }
446
/**
447
* Accept an incoming SIP call and configure the realtime session that will handle it
448
*/
449
function accept(
450
callID: string,
451
params: CallAcceptParams,
452
options?: RequestOptions
453
): Promise<void>;
454
455
/**
456
* End an active Realtime API call, whether it was initiated over SIP or WebRTC
457
*/
458
function hangup(
459
callID: string,
460
options?: RequestOptions
461
): Promise<void>;
462
463
/**
464
* Transfer an active SIP call to a new destination using the SIP REFER verb
465
*/
466
function refer(
467
callID: string,
468
params: CallReferParams,
469
options?: RequestOptions
470
): Promise<void>;
471
472
/**
473
* Decline an incoming SIP call by returning a SIP status code to the caller
474
*/
475
function reject(
476
callID: string,
477
params?: CallRejectParams,
478
options?: RequestOptions
479
): Promise<void>;
480
481
interface CallAcceptParams {
482
/** The type of session to create. Always 'realtime' for the Realtime API */
483
type: "realtime";
484
/** Configuration for input and output audio */
485
audio?: RealtimeAudioConfig;
486
/** Additional fields to include in server outputs */
487
include?: Array<"item.input_audio_transcription.logprobs">;
488
/** The default system instructions prepended to model calls */
489
instructions?: string;
490
/** Maximum number of output tokens for a single assistant response (1-4096 or 'inf') */
491
max_output_tokens?: number | "inf";
492
/** The Realtime model used for this session */
493
model?: string;
494
/** The set of modalities the model can respond with */
495
output_modalities?: Array<"text" | "audio">;
496
/** Reference to a prompt template and its variables */
497
prompt?: ResponsePrompt | null;
498
/** How the model chooses tools */
499
tool_choice?: RealtimeToolChoiceConfig;
500
/** Tools available to the model */
501
tools?: RealtimeToolsConfig;
502
/** Tracing configuration for the session */
503
tracing?: RealtimeTracingConfig | null;
504
/** Truncation behavior when conversation exceeds token limits */
505
truncation?: RealtimeTruncation;
506
}
507
508
interface CallReferParams {
509
/** URI that should appear in the SIP Refer-To header (e.g., 'tel:+14155550123' or 'sip:agent@example.com') */
510
target_uri: string;
511
}
512
513
interface CallRejectParams {
514
/** SIP response code to send back to the caller. Defaults to 603 (Decline) when omitted */
515
status_code?: number;
516
}
517
```
518
519
**Available at:** `client.realtime.calls`
520
521
**Usage Example:**
522
523
```typescript
524
import OpenAI from "openai";
525
526
const client = new OpenAI({
527
apiKey: process.env.OPENAI_API_KEY,
528
});
529
530
// Accept incoming call
531
await client.realtime.calls.accept("call-123", {
532
type: "realtime",
533
model: "gpt-realtime",
534
audio: {
535
input: { format: { type: "audio/pcm", rate: 24000 } },
536
output: { format: { type: "audio/pcm", rate: 24000 }, voice: "marin" },
537
},
538
instructions: "You are a helpful phone assistant.",
539
});
540
541
// Hang up call
542
await client.realtime.calls.hangup("call-123");
543
544
// Reject incoming call
545
await client.realtime.calls.reject("call-123", {
546
status_code: 603, // Decline
547
});
548
549
// Transfer call
550
await client.realtime.calls.refer("call-123", {
551
target_uri: "tel:+14155550123",
552
});
553
```
554
555
### WebSocket Connection
556
557
Connect to the Realtime API using WebSocket with the OpenAIRealtimeWebSocket class.
558
559
```typescript { .api }
560
/**
561
* WebSocket client for the Realtime API. Handles connection lifecycle,
562
* event streaming, and message sending.
563
*/
564
class OpenAIRealtimeWebSocket extends OpenAIRealtimeEmitter {
565
url: URL;
566
socket: WebSocket;
567
568
constructor(
569
props: {
570
model: string;
571
dangerouslyAllowBrowser?: boolean;
572
onURL?: (url: URL) => void;
573
__resolvedApiKey?: boolean;
574
},
575
client?: Pick<OpenAI, "apiKey" | "baseURL">
576
);
577
578
/**
579
* Factory method that resolves API key before connecting
580
*/
581
static create(
582
client: Pick<OpenAI, "apiKey" | "baseURL" | "_callApiKey">,
583
props: { model: string; dangerouslyAllowBrowser?: boolean }
584
): Promise<OpenAIRealtimeWebSocket>;
585
586
/**
587
* Factory method for Azure OpenAI connections
588
*/
589
static azure(
590
client: Pick<
591
AzureOpenAI,
592
"_callApiKey" | "apiVersion" | "apiKey" | "baseURL" | "deploymentName"
593
>,
594
options?: {
595
deploymentName?: string;
596
dangerouslyAllowBrowser?: boolean;
597
}
598
): Promise<OpenAIRealtimeWebSocket>;
599
600
/**
601
* Send a client event to the server
602
*/
603
send(event: RealtimeClientEvent): void;
604
605
/**
606
* Close the WebSocket connection
607
*/
608
close(props?: { code: number; reason: string }): void;
609
610
/**
611
* Register event listener
612
*/
613
on(event: string, listener: (event: any) => void): void;
614
}
615
```
616
617
**Usage:**
618
619
```typescript
620
// Standard connection
621
const ws = await OpenAIRealtimeWebSocket.create(client, {
622
model: "gpt-realtime",
623
});
624
625
// Azure connection
626
const wsAzure = await OpenAIRealtimeWebSocket.azure(azureClient, {
627
deploymentName: "my-realtime-deployment",
628
});
629
```
630
631
[WebSocket Connection](./realtime.md#websocket-connection)
632
633
### Phone Call Methods
634
635
Accept, reject, transfer, and hang up phone calls via SIP integration.
636
637
```typescript { .api }
638
/**
639
* Accept an incoming SIP call and configure the realtime session
640
*/
641
function accept(callID: string, params: CallAcceptParams): Promise<void>;
642
643
/**
644
* End an active Realtime API call (SIP or WebRTC)
645
*/
646
function hangup(callID: string): Promise<void>;
647
648
/**
649
* Transfer an active SIP call to a new destination using SIP REFER
650
*/
651
function refer(callID: string, params: CallReferParams): Promise<void>;
652
653
/**
654
* Decline an incoming SIP call with a SIP status code
655
*/
656
function reject(
657
callID: string,
658
params?: CallRejectParams
659
): Promise<void>;
660
661
interface CallAcceptParams {
662
type: "realtime";
663
audio?: RealtimeAudioConfig;
664
include?: Array<"item.input_audio_transcription.logprobs">;
665
instructions?: string;
666
max_output_tokens?: number | "inf";
667
model?: string;
668
output_modalities?: Array<"text" | "audio">;
669
prompt?: ResponsePrompt | null;
670
tool_choice?: RealtimeToolChoiceConfig;
671
tools?: RealtimeToolsConfig;
672
tracing?: RealtimeTracingConfig | null;
673
truncation?: RealtimeTruncation;
674
}
675
676
interface CallReferParams {
677
/** URI in SIP Refer-To header (e.g., 'tel:+14155550123') */
678
target_uri: string;
679
}
680
681
interface CallRejectParams {
682
/** SIP response code (defaults to 603 Decline) */
683
status_code?: number;
684
}
685
```
686
687
**Usage:**
688
689
```typescript
690
// Accept incoming call
691
await client.realtime.calls.accept("call_abc123", {
692
type: "realtime",
693
model: "gpt-realtime",
694
instructions: "You are a helpful assistant on a phone call.",
695
audio: {
696
output: { voice: "marin" },
697
},
698
});
699
700
// Transfer call
701
await client.realtime.calls.refer("call_abc123", {
702
target_uri: "tel:+14155550199",
703
});
704
705
// Reject call
706
await client.realtime.calls.reject("call_abc123", {
707
status_code: 486, // Busy Here
708
});
709
710
// Hang up
711
await client.realtime.calls.hangup("call_abc123");
712
```
713
714
[Phone Call Methods](./realtime.md#phone-call-methods)
715
716
### Session Configuration
717
718
Configure session parameters including audio formats, VAD, and model settings.
719
720
```typescript { .api }
721
interface RealtimeSession {
722
id?: string;
723
expires_at?: number;
724
/** Fields to include in server outputs */
725
include?: Array<"item.input_audio_transcription.logprobs"> | null;
726
/** Input audio format: 'pcm16', 'g711_ulaw', or 'g711_alaw' */
727
input_audio_format?: "pcm16" | "g711_ulaw" | "g711_alaw";
728
/** Noise reduction configuration */
729
input_audio_noise_reduction?: {
730
type?: NoiseReductionType;
731
};
732
/** Transcription configuration */
733
input_audio_transcription?: AudioTranscription | null;
734
/** System instructions */
735
instructions?: string;
736
/** Max output tokens per response */
737
max_response_output_tokens?: number | "inf";
738
/** Response modalities */
739
modalities?: Array<"text" | "audio">;
740
/** Model identifier */
741
model?: string;
742
object?: "realtime.session";
743
/** Output audio format */
744
output_audio_format?: "pcm16" | "g711_ulaw" | "g711_alaw";
745
/** Prompt template reference */
746
prompt?: ResponsePrompt | null;
747
/** Audio playback speed (0.25-1.5) */
748
speed?: number;
749
/** Sampling temperature (0.6-1.2) */
750
temperature?: number;
751
/** Tool choice mode */
752
tool_choice?: string;
753
/** Available tools */
754
tools?: Array<RealtimeFunctionTool>;
755
/** Tracing configuration */
756
tracing?: "auto" | TracingConfiguration | null;
757
/** Turn detection configuration */
758
turn_detection?: RealtimeAudioInputTurnDetection | null;
759
/** Truncation behavior */
760
truncation?: RealtimeTruncation;
761
/** Output voice */
762
voice?: string;
763
}
764
765
interface AudioTranscription {
766
/** Language code (ISO-639-1, e.g., 'en') */
767
language?: string;
768
/** Transcription model */
769
model?:
770
| "whisper-1"
771
| "gpt-4o-mini-transcribe"
772
| "gpt-4o-transcribe"
773
| "gpt-4o-transcribe-diarize";
774
/** Transcription guidance prompt */
775
prompt?: string;
776
}
777
778
type NoiseReductionType = "near_field" | "far_field";
779
780
type RealtimeTruncation =
781
| "auto"
782
| "disabled"
783
| {
784
type: "retention_ratio";
785
/** Fraction of max context to retain (0.0-1.0) */
786
retention_ratio: number;
787
};
788
```
789
790
[Session Configuration](./realtime.md#session-configuration)
791
792
### Turn Detection (VAD)
793
794
Configure voice activity detection for automatic turn taking.
795
796
```typescript { .api }
797
/**
798
* Server VAD: Simple volume-based voice activity detection
799
*/
800
interface ServerVad {
801
type: "server_vad";
802
/** Auto-generate response on VAD stop */
803
create_response?: boolean;
804
/** Timeout for prompting user to continue (ms) */
805
idle_timeout_ms?: number | null;
806
/** Auto-interrupt on VAD start */
807
interrupt_response?: boolean;
808
/** Audio prefix padding (ms, default: 300) */
809
prefix_padding_ms?: number;
810
/** Silence duration to detect stop (ms, default: 500) */
811
silence_duration_ms?: number;
812
/** VAD activation threshold (0.0-1.0, default: 0.5) */
813
threshold?: number;
814
}
815
816
/**
817
* Semantic VAD: Model-based turn detection with dynamic timeouts
818
*/
819
interface SemanticVad {
820
type: "semantic_vad";
821
/** Auto-generate response on VAD stop */
822
create_response?: boolean;
823
/** Eagerness: 'low' (8s), 'medium' (4s), 'high' (2s), 'auto' */
824
eagerness?: "low" | "medium" | "high" | "auto";
825
/** Auto-interrupt on VAD start */
826
interrupt_response?: boolean;
827
}
828
829
type RealtimeAudioInputTurnDetection = ServerVad | SemanticVad;
830
```
831
832
**Usage:**
833
834
```typescript
835
// Server VAD with custom settings
836
{
837
type: "server_vad",
838
threshold: 0.6,
839
silence_duration_ms: 700,
840
prefix_padding_ms: 300,
841
interrupt_response: true,
842
create_response: true,
843
idle_timeout_ms: 30000
844
}
845
846
// Semantic VAD for natural conversations
847
{
848
type: "semantic_vad",
849
eagerness: "medium",
850
interrupt_response: true,
851
create_response: true
852
}
853
854
// Manual turn detection (no VAD)
855
{
856
turn_detection: null
857
}
858
```
859
860
[Turn Detection](./realtime.md#turn-detection-vad)
861
862
### Audio Formats
863
864
Configure input and output audio formats for the session.
865
866
```typescript { .api }
867
/**
868
* PCM 16-bit audio at 24kHz sample rate
869
*/
870
interface AudioPCM {
871
type?: "audio/pcm";
872
rate?: 24000;
873
}
874
875
/**
876
* G.711 μ-law format (commonly used in telephony)
877
*/
878
interface AudioPCMU {
879
type?: "audio/pcmu";
880
}
881
882
/**
883
* G.711 A-law format (commonly used in telephony)
884
*/
885
interface AudioPCMA {
886
type?: "audio/pcma";
887
}
888
889
type RealtimeAudioFormats = AudioPCM | AudioPCMU | AudioPCMA;
890
891
interface RealtimeAudioConfig {
892
input?: {
893
format?: RealtimeAudioFormats;
894
noise_reduction?: { type?: NoiseReductionType };
895
transcription?: AudioTranscription;
896
turn_detection?: RealtimeAudioInputTurnDetection | null;
897
};
898
output?: {
899
format?: RealtimeAudioFormats;
900
/** Playback speed multiplier (0.25-1.5) */
901
speed?: number;
902
/** Voice selection */
903
voice?:
904
| string
905
| "alloy"
906
| "ash"
907
| "ballad"
908
| "coral"
909
| "echo"
910
| "sage"
911
| "shimmer"
912
| "verse"
913
| "marin"
914
| "cedar";
915
};
916
}
917
```
918
919
[Audio Formats](./realtime.md#audio-formats)
920
921
### Client-to-Server Events
922
923
Events sent from client to server to control the conversation.
924
925
```typescript { .api }
926
/**
927
* Union of all client events
928
*/
929
type RealtimeClientEvent =
930
| ConversationItemCreateEvent
931
| ConversationItemDeleteEvent
932
| ConversationItemRetrieveEvent
933
| ConversationItemTruncateEvent
934
| InputAudioBufferAppendEvent
935
| InputAudioBufferClearEvent
936
| OutputAudioBufferClearEvent
937
| InputAudioBufferCommitEvent
938
| ResponseCancelEvent
939
| ResponseCreateEvent
940
| SessionUpdateEvent;
941
942
/**
943
* Add conversation item (message, function call, or output)
944
*/
945
interface ConversationItemCreateEvent {
946
type: "conversation.item.create";
947
item: ConversationItem;
948
event_id?: string;
949
/** Insert after this item ID ('root' for beginning) */
950
previous_item_id?: string;
951
}
952
953
/**
954
* Delete conversation item by ID
955
*/
956
interface ConversationItemDeleteEvent {
957
type: "conversation.item.delete";
958
item_id: string;
959
event_id?: string;
960
}
961
962
/**
963
* Retrieve full item including audio data
964
*/
965
interface ConversationItemRetrieveEvent {
966
type: "conversation.item.retrieve";
967
item_id: string;
968
event_id?: string;
969
}
970
971
/**
972
* Truncate assistant audio message
973
*/
974
interface ConversationItemTruncateEvent {
975
type: "conversation.item.truncate";
976
item_id: string;
977
content_index: number;
978
/** Duration to keep in milliseconds */
979
audio_end_ms: number;
980
event_id?: string;
981
}
982
983
/**
984
* Append audio to input buffer
985
*/
986
interface InputAudioBufferAppendEvent {
987
type: "input_audio_buffer.append";
988
/** Base64-encoded audio bytes */
989
audio: string;
990
event_id?: string;
991
}
992
993
/**
994
* Clear input audio buffer
995
*/
996
interface InputAudioBufferClearEvent {
997
type: "input_audio_buffer.clear";
998
event_id?: string;
999
}
1000
1001
/**
1002
* Commit input audio buffer to conversation
1003
*/
1004
interface InputAudioBufferCommitEvent {
1005
type: "input_audio_buffer.commit";
1006
event_id?: string;
1007
}
1008
1009
/**
1010
* WebRTC only: Clear output audio buffer
1011
*/
1012
interface OutputAudioBufferClearEvent {
1013
type: "output_audio_buffer.clear";
1014
event_id?: string;
1015
}
1016
1017
/**
1018
* Cancel in-progress response
1019
*/
1020
interface ResponseCancelEvent {
1021
type: "response.cancel";
1022
event_id?: string;
1023
}
1024
1025
/**
1026
* Request model response
1027
*/
1028
interface ResponseCreateEvent {
1029
type: "response.create";
1030
response?: {
1031
modalities?: Array<"text" | "audio">;
1032
instructions?: string;
1033
voice?: string;
1034
output_audio_format?: string;
1035
tools?: Array<RealtimeFunctionTool>;
1036
tool_choice?: string;
1037
temperature?: number;
1038
max_output_tokens?: number | "inf";
1039
conversation?: "auto" | "none";
1040
metadata?: Record<string, string>;
1041
input?: Array<ConversationItemWithReference>;
1042
};
1043
event_id?: string;
1044
}
1045
1046
/**
1047
* Update session configuration
1048
*/
1049
interface SessionUpdateEvent {
1050
type: "session.update";
1051
session: Partial<RealtimeSession>;
1052
event_id?: string;
1053
}
1054
```
1055
1056
[Client Events](./realtime.md#client-to-server-events)
1057
1058
### Server-to-Client Events
1059
1060
Events sent from server to client during the conversation.
1061
1062
```typescript { .api }
1063
/**
1064
* Union of all server events (50+ event types)
1065
*/
1066
type RealtimeServerEvent =
1067
| ConversationCreatedEvent
1068
| ConversationItemCreatedEvent
1069
| ConversationItemDeletedEvent
1070
| ConversationItemAdded
1071
| ConversationItemDone
1072
| ConversationItemRetrieved
1073
| ConversationItemTruncatedEvent
1074
| ConversationItemInputAudioTranscriptionCompletedEvent
1075
| ConversationItemInputAudioTranscriptionDeltaEvent
1076
| ConversationItemInputAudioTranscriptionFailedEvent
1077
| ConversationItemInputAudioTranscriptionSegment
1078
| InputAudioBufferClearedEvent
1079
| InputAudioBufferCommittedEvent
1080
| InputAudioBufferSpeechStartedEvent
1081
| InputAudioBufferSpeechStoppedEvent
1082
| InputAudioBufferTimeoutTriggered
1083
| OutputAudioBufferStarted
1084
| OutputAudioBufferStopped
1085
| OutputAudioBufferCleared
1086
| ResponseCreatedEvent
1087
| ResponseDoneEvent
1088
| ResponseOutputItemAddedEvent
1089
| ResponseOutputItemDoneEvent
1090
| ResponseContentPartAddedEvent
1091
| ResponseContentPartDoneEvent
1092
| ResponseAudioDeltaEvent
1093
| ResponseAudioDoneEvent
1094
| ResponseAudioTranscriptDeltaEvent
1095
| ResponseAudioTranscriptDoneEvent
1096
| ResponseTextDeltaEvent
1097
| ResponseTextDoneEvent
1098
| ResponseFunctionCallArgumentsDeltaEvent
1099
| ResponseFunctionCallArgumentsDoneEvent
1100
| ResponseMcpCallArgumentsDelta
1101
| ResponseMcpCallArgumentsDone
1102
| ResponseMcpCallInProgress
1103
| ResponseMcpCallCompleted
1104
| ResponseMcpCallFailed
1105
| McpListToolsInProgress
1106
| McpListToolsCompleted
1107
| McpListToolsFailed
1108
| SessionCreatedEvent
1109
| SessionUpdatedEvent
1110
| RateLimitsUpdatedEvent
1111
| RealtimeErrorEvent;
1112
1113
/**
1114
* Session created (first event after connection)
1115
*/
1116
interface SessionCreatedEvent {
1117
type: "session.created";
1118
event_id: string;
1119
session: RealtimeSession;
1120
}
1121
1122
/**
1123
* Session updated after client session.update
1124
*/
1125
interface SessionUpdatedEvent {
1126
type: "session.updated";
1127
event_id: string;
1128
session: RealtimeSession;
1129
}
1130
1131
/**
1132
* Conversation created
1133
*/
1134
interface ConversationCreatedEvent {
1135
type: "conversation.created";
1136
event_id: string;
1137
conversation: {
1138
id?: string;
1139
object?: "realtime.conversation";
1140
};
1141
}
1142
1143
/**
1144
* Item created in conversation
1145
*/
1146
interface ConversationItemCreatedEvent {
1147
type: "conversation.item.created";
1148
event_id: string;
1149
item: ConversationItem;
1150
previous_item_id?: string | null;
1151
}
1152
1153
/**
1154
* Item added to conversation (may have partial content)
1155
*/
1156
interface ConversationItemAdded {
1157
type: "conversation.item.added";
1158
event_id: string;
1159
item: ConversationItem;
1160
previous_item_id?: string | null;
1161
}
1162
1163
/**
1164
* Item finalized with complete content
1165
*/
1166
interface ConversationItemDone {
1167
type: "conversation.item.done";
1168
event_id: string;
1169
item: ConversationItem;
1170
previous_item_id?: string | null;
1171
}
1172
1173
/**
1174
* Input audio buffer committed
1175
*/
1176
interface InputAudioBufferCommittedEvent {
1177
type: "input_audio_buffer.committed";
1178
event_id: string;
1179
item_id: string;
1180
previous_item_id?: string | null;
1181
}
1182
1183
/**
1184
* Speech detected in input buffer (VAD start)
1185
*/
1186
interface InputAudioBufferSpeechStartedEvent {
1187
type: "input_audio_buffer.speech_started";
1188
event_id: string;
1189
item_id: string;
1190
/** Milliseconds from session start */
1191
audio_start_ms: number;
1192
}
1193
1194
/**
1195
* Speech ended in input buffer (VAD stop)
1196
*/
1197
interface InputAudioBufferSpeechStoppedEvent {
1198
type: "input_audio_buffer.speech_stopped";
1199
event_id: string;
1200
item_id: string;
1201
/** Milliseconds from session start */
1202
audio_end_ms: number;
1203
}
1204
1205
/**
1206
* Response started
1207
*/
1208
interface ResponseCreatedEvent {
1209
type: "response.created";
1210
event_id: string;
1211
response: RealtimeResponse;
1212
}
1213
1214
/**
1215
* Response completed
1216
*/
1217
interface ResponseDoneEvent {
1218
type: "response.done";
1219
event_id: string;
1220
response: RealtimeResponse;
1221
}
1222
1223
/**
1224
* Audio delta (streaming audio chunk)
1225
*/
1226
interface ResponseAudioDeltaEvent {
1227
type: "response.audio.delta";
1228
event_id: string;
1229
response_id: string;
1230
item_id: string;
1231
output_index: number;
1232
content_index: number;
1233
/** Base64-encoded audio bytes */
1234
delta: string;
1235
}
1236
1237
/**
1238
* Audio generation completed
1239
*/
1240
interface ResponseAudioDoneEvent {
1241
type: "response.audio.done";
1242
event_id: string;
1243
response_id: string;
1244
item_id: string;
1245
output_index: number;
1246
content_index: number;
1247
}
1248
1249
/**
1250
* Text delta (streaming text chunk)
1251
*/
1252
interface ResponseTextDeltaEvent {
1253
type: "response.text.delta";
1254
event_id: string;
1255
response_id: string;
1256
item_id: string;
1257
output_index: number;
1258
content_index: number;
1259
/** Text chunk */
1260
delta: string;
1261
}
1262
1263
/**
1264
* Text generation completed
1265
*/
1266
interface ResponseTextDoneEvent {
1267
type: "response.text.done";
1268
event_id: string;
1269
response_id: string;
1270
item_id: string;
1271
output_index: number;
1272
content_index: number;
1273
/** Complete text */
1274
text: string;
1275
}
1276
1277
/**
1278
* Function call arguments delta
1279
*/
1280
interface ResponseFunctionCallArgumentsDeltaEvent {
1281
type: "response.function_call_arguments.delta";
1282
event_id: string;
1283
response_id: string;
1284
item_id: string;
1285
output_index: number;
1286
call_id: string;
1287
/** JSON arguments chunk */
1288
delta: string;
1289
}
1290
1291
/**
1292
* Function call arguments completed
1293
*/
1294
interface ResponseFunctionCallArgumentsDoneEvent {
1295
type: "response.function_call_arguments.done";
1296
event_id: string;
1297
response_id: string;
1298
item_id: string;
1299
output_index: number;
1300
call_id: string;
1301
/** Complete JSON arguments */
1302
arguments: string;
1303
}
1304
1305
/**
1306
* Error occurred
1307
*/
1308
interface RealtimeErrorEvent {
1309
type: "error";
1310
event_id: string;
1311
error: {
1312
type: string;
1313
code?: string | null;
1314
message: string;
1315
param?: string | null;
1316
event_id?: string | null;
1317
};
1318
}
1319
```
1320
1321
[Server Events](./realtime.md#server-to-client-events)
1322
1323
### Conversation Items
1324
1325
Items that make up the conversation history.
1326
1327
```typescript { .api }
1328
/**
1329
* Union of all conversation item types
1330
*/
1331
type ConversationItem =
1332
| RealtimeConversationItemSystemMessage
1333
| RealtimeConversationItemUserMessage
1334
| RealtimeConversationItemAssistantMessage
1335
| RealtimeConversationItemFunctionCall
1336
| RealtimeConversationItemFunctionCallOutput
1337
| RealtimeMcpApprovalResponse
1338
| RealtimeMcpListTools
1339
| RealtimeMcpToolCall
1340
| RealtimeMcpApprovalRequest;
1341
1342
/**
1343
* System message item
1344
*/
1345
interface RealtimeConversationItemSystemMessage {
1346
type: "message";
1347
role: "system";
1348
content: Array<{
1349
type?: "input_text";
1350
text?: string;
1351
}>;
1352
id?: string;
1353
object?: "realtime.item";
1354
status?: "completed" | "incomplete" | "in_progress";
1355
}
1356
1357
/**
1358
* User message item (text, audio, or image)
1359
*/
1360
interface RealtimeConversationItemUserMessage {
1361
type: "message";
1362
role: "user";
1363
content: Array<{
1364
type?: "input_text" | "input_audio" | "input_image";
1365
text?: string;
1366
audio?: string; // Base64-encoded
1367
transcript?: string;
1368
image_url?: string; // Data URI
1369
detail?: "auto" | "low" | "high";
1370
}>;
1371
id?: string;
1372
object?: "realtime.item";
1373
status?: "completed" | "incomplete" | "in_progress";
1374
}
1375
1376
/**
1377
* Assistant message item (text or audio)
1378
*/
1379
interface RealtimeConversationItemAssistantMessage {
1380
type: "message";
1381
role: "assistant";
1382
content: Array<{
1383
type?: "output_text" | "output_audio";
1384
text?: string;
1385
audio?: string; // Base64-encoded
1386
transcript?: string;
1387
}>;
1388
id?: string;
1389
object?: "realtime.item";
1390
status?: "completed" | "incomplete" | "in_progress";
1391
}
1392
1393
/**
1394
* Function call item
1395
*/
1396
interface RealtimeConversationItemFunctionCall {
1397
type: "function_call";
1398
name: string;
1399
/** JSON-encoded arguments */
1400
arguments: string;
1401
call_id?: string;
1402
id?: string;
1403
object?: "realtime.item";
1404
status?: "completed" | "incomplete" | "in_progress";
1405
}
1406
1407
/**
1408
* Function call output item
1409
*/
1410
interface RealtimeConversationItemFunctionCallOutput {
1411
type: "function_call_output";
1412
call_id: string;
1413
/** Function output (free text) */
1414
output: string;
1415
id?: string;
1416
object?: "realtime.item";
1417
status?: "completed" | "incomplete" | "in_progress";
1418
}
1419
1420
/**
1421
* MCP tool call item
1422
*/
1423
interface RealtimeMcpToolCall {
1424
type: "mcp_call";
1425
id: string;
1426
server_label: string;
1427
name: string;
1428
arguments: string;
1429
output?: string | null;
1430
error?:
1431
| { type: "protocol_error"; code: number; message: string }
1432
| { type: "tool_execution_error"; message: string }
1433
| { type: "http_error"; code: number; message: string }
1434
| null;
1435
approval_request_id?: string | null;
1436
}
1437
1438
/**
1439
* MCP approval request item
1440
*/
1441
interface RealtimeMcpApprovalRequest {
1442
type: "mcp_approval_request";
1443
id: string;
1444
server_label: string;
1445
name: string;
1446
arguments: string;
1447
}
1448
1449
/**
1450
* MCP approval response item
1451
*/
1452
interface RealtimeMcpApprovalResponse {
1453
type: "mcp_approval_response";
1454
id: string;
1455
approval_request_id: string;
1456
approve: boolean;
1457
reason?: string | null;
1458
}
1459
```
1460
1461
[Conversation Items](./realtime.md#conversation-items)
1462
1463
### Function Calling
1464
1465
Define and use tools during real-time conversations.
1466
1467
```typescript { .api }
1468
/**
1469
* Function tool definition for realtime conversations
1470
*/
1471
interface RealtimeFunctionTool {
1472
type?: "function";
1473
/** Function name */
1474
name?: string;
1475
/** Description and usage guidance */
1476
description?: string;
1477
/** JSON Schema for function parameters */
1478
parameters?: unknown;
1479
}
1480
1481
/**
1482
* MCP (Model Context Protocol) tool configuration
1483
*/
1484
interface McpTool {
1485
type: "mcp";
1486
/** Label identifying the MCP server */
1487
server_label: string;
1488
/** MCP server URL or connector ID */
1489
server_url?: string;
1490
connector_id?:
1491
| "connector_dropbox"
1492
| "connector_gmail"
1493
| "connector_googlecalendar"
1494
| "connector_googledrive"
1495
| "connector_microsoftteams"
1496
| "connector_outlookcalendar"
1497
| "connector_outlookemail"
1498
| "connector_sharepoint";
1499
/** Server description */
1500
server_description?: string;
1501
/** Allowed tools filter */
1502
allowed_tools?:
1503
| Array<string>
1504
| {
1505
tool_names?: Array<string>;
1506
read_only?: boolean;
1507
}
1508
| null;
1509
/** Approval requirements */
1510
require_approval?:
1511
| "always"
1512
| "never"
1513
| {
1514
always?: { tool_names?: Array<string>; read_only?: boolean };
1515
never?: { tool_names?: Array<string>; read_only?: boolean };
1516
}
1517
| null;
1518
/** OAuth access token */
1519
authorization?: string;
1520
/** HTTP headers */
1521
headers?: Record<string, string> | null;
1522
}
1523
1524
type RealtimeToolsConfig = Array<RealtimeFunctionTool | McpTool>;
1525
1526
type RealtimeToolChoiceConfig =
1527
| "auto"
1528
| "none"
1529
| "required"
1530
| { type: "function"; function: { name: string } }
1531
| { type: "mcp"; mcp: { server_label: string; name: string } };
1532
```
1533
1534
**Usage:**
1535
1536
```typescript
1537
// Define tools
1538
const tools: RealtimeToolsConfig = [
1539
{
1540
type: "function",
1541
name: "get_weather",
1542
description: "Get current weather for a location",
1543
parameters: {
1544
type: "object",
1545
properties: {
1546
location: { type: "string" },
1547
unit: { type: "string", enum: ["celsius", "fahrenheit"] },
1548
},
1549
required: ["location"],
1550
},
1551
},
1552
{
1553
type: "mcp",
1554
server_label: "calendar",
1555
connector_id: "connector_googlecalendar",
1556
allowed_tools: {
1557
tool_names: ["list_events", "create_event"],
1558
},
1559
},
1560
];
1561
1562
// Update session with tools
1563
ws.send({
1564
type: "session.update",
1565
session: {
1566
tools,
1567
tool_choice: "auto",
1568
},
1569
});
1570
1571
// Handle function call
1572
ws.on("response.function_call_arguments.done", async (event) => {
1573
const result = await executeFunction(event.call_id, event.arguments);
1574
1575
// Send function output
1576
ws.send({
1577
type: "conversation.item.create",
1578
item: {
1579
type: "function_call_output",
1580
call_id: event.call_id,
1581
output: JSON.stringify(result),
1582
},
1583
});
1584
1585
// Trigger new response
1586
ws.send({
1587
type: "response.create",
1588
});
1589
});
1590
```
1591
1592
[Function Calling](./realtime.md#function-calling)
1593
1594
### Response Configuration
1595
1596
Configure individual response parameters.
1597
1598
```typescript { .api }
1599
/**
1600
* Response resource
1601
*/
1602
interface RealtimeResponse {
1603
id?: string;
1604
object?: "realtime.response";
1605
/** Conversation ID or null */
1606
conversation_id?: string;
1607
/** Status: 'in_progress', 'completed', 'cancelled', 'failed', 'incomplete' */
1608
status?: RealtimeResponseStatus;
1609
/** Usage statistics */
1610
usage?: RealtimeResponseUsage;
1611
/** Max output tokens */
1612
max_output_tokens?: number | "inf";
1613
/** Response modalities */
1614
modalities?: Array<"text" | "audio">;
1615
/** Instructions for this response */
1616
instructions?: string;
1617
/** Voice selection */
1618
voice?: string;
1619
/** Audio output configuration */
1620
audio?: {
1621
format?: RealtimeAudioFormats;
1622
speed?: number;
1623
voice?: string;
1624
};
1625
/** Response metadata */
1626
metadata?: Record<string, string> | null;
1627
/** Tool choice */
1628
tool_choice?: RealtimeToolChoiceConfig;
1629
/** Tools for this response */
1630
tools?: RealtimeToolsConfig;
1631
/** Temperature */
1632
temperature?: number;
1633
/** Output items */
1634
output?: Array<ConversationItem>;
1635
/** Status details */
1636
status_details?: {
1637
type?: "incomplete" | "failed" | "cancelled";
1638
reason?: string;
1639
error?: RealtimeError | null;
1640
} | null;
1641
}
1642
1643
interface RealtimeResponseStatus {
1644
type: "in_progress" | "completed" | "cancelled" | "failed" | "incomplete";
1645
/** Additional status information */
1646
reason?: string;
1647
}
1648
1649
interface RealtimeResponseUsage {
1650
/** Total tokens (input + output) */
1651
total_tokens?: number;
1652
/** Input tokens */
1653
input_tokens?: number;
1654
/** Output tokens */
1655
output_tokens?: number;
1656
/** Input token breakdown */
1657
input_token_details?: {
1658
text_tokens?: number;
1659
audio_tokens?: number;
1660
image_tokens?: number;
1661
cached_tokens?: number;
1662
cached_tokens_details?: {
1663
text_tokens?: number;
1664
audio_tokens?: number;
1665
image_tokens?: number;
1666
};
1667
};
1668
/** Output token breakdown */
1669
output_token_details?: {
1670
text_tokens?: number;
1671
audio_tokens?: number;
1672
};
1673
}
1674
```
1675
1676
[Response Configuration](./realtime.md#response-configuration)
1677
1678
### Transcription
1679
1680
Configure and receive audio transcription during conversations.
1681
1682
```typescript { .api }
1683
/**
1684
* Transcription configuration
1685
*/
1686
interface AudioTranscription {
1687
/** Language code (ISO-639-1) */
1688
language?: string;
1689
/** Transcription model */
1690
model?:
1691
| "whisper-1"
1692
| "gpt-4o-mini-transcribe"
1693
| "gpt-4o-transcribe"
1694
| "gpt-4o-transcribe-diarize";
1695
/** Guidance prompt */
1696
prompt?: string;
1697
}
1698
1699
/**
1700
* Transcription completed event
1701
*/
1702
interface ConversationItemInputAudioTranscriptionCompletedEvent {
1703
type: "conversation.item.input_audio_transcription.completed";
1704
event_id: string;
1705
item_id: string;
1706
content_index: number;
1707
/** Transcribed text */
1708
transcript: string;
1709
/** Usage statistics */
1710
usage:
1711
| {
1712
type: "tokens";
1713
input_tokens: number;
1714
output_tokens: number;
1715
total_tokens: number;
1716
input_token_details?: {
1717
text_tokens?: number;
1718
audio_tokens?: number;
1719
};
1720
}
1721
| {
1722
type: "duration";
1723
/** Duration in seconds */
1724
seconds: number;
1725
};
1726
/** Log probabilities (if enabled) */
1727
logprobs?: Array<{
1728
token: string;
1729
logprob: number;
1730
bytes: Array<number>;
1731
}> | null;
1732
}
1733
1734
/**
1735
* Transcription delta event (streaming)
1736
*/
1737
interface ConversationItemInputAudioTranscriptionDeltaEvent {
1738
type: "conversation.item.input_audio_transcription.delta";
1739
event_id: string;
1740
item_id: string;
1741
content_index?: number;
1742
/** Transcript chunk */
1743
delta?: string;
1744
/** Log probabilities (if enabled) */
1745
logprobs?: Array<{
1746
token: string;
1747
logprob: number;
1748
bytes: Array<number>;
1749
}> | null;
1750
}
1751
1752
/**
1753
* Transcription segment (for diarization)
1754
*/
1755
interface ConversationItemInputAudioTranscriptionSegment {
1756
type: "conversation.item.input_audio_transcription.segment";
1757
event_id: string;
1758
item_id: string;
1759
content_index: number;
1760
id: string;
1761
/** Segment text */
1762
text: string;
1763
/** Speaker label */
1764
speaker: string;
1765
/** Start time in seconds */
1766
start: number;
1767
/** End time in seconds */
1768
end: number;
1769
}
1770
1771
/**
1772
* Transcription failed event
1773
*/
1774
interface ConversationItemInputAudioTranscriptionFailedEvent {
1775
type: "conversation.item.input_audio_transcription.failed";
1776
event_id: string;
1777
item_id: string;
1778
content_index: number;
1779
error: {
1780
type?: string;
1781
code?: string;
1782
message?: string;
1783
param?: string;
1784
};
1785
}
1786
```
1787
1788
**Usage:**
1789
1790
```typescript
1791
// Enable transcription with log probabilities
1792
ws.send({
1793
type: "session.update",
1794
session: {
1795
input_audio_transcription: {
1796
model: "gpt-4o-transcribe",
1797
language: "en",
1798
},
1799
include: ["item.input_audio_transcription.logprobs"],
1800
},
1801
});
1802
1803
// Listen for transcription
1804
ws.on("conversation.item.input_audio_transcription.delta", (event) => {
1805
console.log("Transcript delta:", event.delta);
1806
});
1807
1808
ws.on(
1809
"conversation.item.input_audio_transcription.completed",
1810
(event) => {
1811
console.log("Full transcript:", event.transcript);
1812
console.log("Usage:", event.usage);
1813
}
1814
);
1815
1816
// Diarization support
1817
ws.send({
1818
type: "session.update",
1819
session: {
1820
input_audio_transcription: {
1821
model: "gpt-4o-transcribe-diarize",
1822
},
1823
},
1824
});
1825
1826
ws.on(
1827
"conversation.item.input_audio_transcription.segment",
1828
(event) => {
1829
console.log(
1830
`[${event.speaker}] ${event.text} (${event.start}s - ${event.end}s)`
1831
);
1832
}
1833
);
1834
```
1835
1836
[Transcription](./realtime.md#transcription)
1837
1838
### Error Handling
1839
1840
Handle errors and edge cases in real-time conversations.
1841
1842
```typescript { .api }
1843
/**
1844
* Error event from server
1845
*/
1846
interface RealtimeErrorEvent {
1847
type: "error";
1848
event_id: string;
1849
error: RealtimeError;
1850
}
1851
1852
interface RealtimeError {
1853
/** Error type */
1854
type: string;
1855
/** Error code (optional) */
1856
code?: string | null;
1857
/** Human-readable message */
1858
message: string;
1859
/** Related parameter (optional) */
1860
param?: string | null;
1861
/** Client event ID that caused error (optional) */
1862
event_id?: string | null;
1863
}
1864
1865
/**
1866
* OpenAI Realtime error class
1867
*/
1868
class OpenAIRealtimeError extends Error {
1869
constructor(message: string);
1870
}
1871
```
1872
1873
**Common Error Types:**
1874
1875
```typescript
1876
// Invalid request errors
1877
{
1878
type: "invalid_request_error",
1879
code: "invalid_value",
1880
message: "Invalid value for 'audio_format'",
1881
param: "audio_format"
1882
}
1883
1884
// Server errors
1885
{
1886
type: "server_error",
1887
message: "Internal server error"
1888
}
1889
1890
// Rate limit errors
1891
{
1892
type: "rate_limit_error",
1893
message: "Rate limit exceeded"
1894
}
1895
```
1896
1897
**Usage:**
1898
1899
```typescript
1900
ws.on("error", (event: RealtimeErrorEvent) => {
1901
console.error("Realtime error:", event.error);
1902
1903
if (event.error.type === "rate_limit_error") {
1904
// Handle rate limiting
1905
} else if (event.error.type === "invalid_request_error") {
1906
// Handle validation errors
1907
console.error("Invalid:", event.error.param, event.error.message);
1908
}
1909
});
1910
1911
// WebSocket errors
1912
ws.socket.addEventListener("error", (error) => {
1913
console.error("WebSocket error:", error);
1914
});
1915
```
1916
1917
[Error Handling](./realtime.md#error-handling)
1918
1919
### Rate Limits
1920
1921
Monitor rate limits during conversations.
1922
1923
```typescript { .api }
1924
/**
1925
* Rate limits updated event
1926
*/
1927
interface RateLimitsUpdatedEvent {
1928
type: "rate_limits.updated";
1929
event_id: string;
1930
rate_limits: Array<{
1931
/** Rate limit name: 'requests' or 'tokens' */
1932
name?: "requests" | "tokens";
1933
/** Maximum allowed value */
1934
limit?: number;
1935
/** Remaining before limit reached */
1936
remaining?: number;
1937
/** Seconds until reset */
1938
reset_seconds?: number;
1939
}>;
1940
}
1941
```
1942
1943
**Usage:**
1944
1945
```typescript
1946
ws.on("rate_limits.updated", (event: RateLimitsUpdatedEvent) => {
1947
event.rate_limits.forEach((limit) => {
1948
console.log(`${limit.name}: ${limit.remaining}/${limit.limit}`);
1949
console.log(`Resets in ${limit.reset_seconds}s`);
1950
});
1951
});
1952
```
1953
1954
[Rate Limits](./realtime.md#rate-limits)
1955
1956
### Tracing
1957
1958
Configure distributed tracing for debugging and monitoring.
1959
1960
```typescript { .api }
1961
/**
1962
* Tracing configuration
1963
*/
1964
type RealtimeTracingConfig =
1965
| "auto"
1966
| {
1967
/** Workflow name in Traces Dashboard */
1968
workflow_name?: string;
1969
/** Group ID for filtering */
1970
group_id?: string;
1971
/** Arbitrary metadata */
1972
metadata?: unknown;
1973
}
1974
| null;
1975
```
1976
1977
**Usage:**
1978
1979
```typescript
1980
// Auto tracing with defaults
1981
{
1982
tracing: "auto";
1983
}
1984
1985
// Custom tracing configuration
1986
{
1987
tracing: {
1988
workflow_name: "customer-support-bot",
1989
group_id: "prod-us-west",
1990
metadata: {
1991
customer_id: "cust_123",
1992
agent_version: "2.1.0"
1993
}
1994
}
1995
}
1996
1997
// Disable tracing
1998
{
1999
tracing: null;
2000
}
2001
```
2002
2003
[Tracing](./realtime.md#tracing)
2004
2005
## Complete Example: Voice Assistant
2006
2007
```typescript
2008
import OpenAI from "openai";
2009
import { OpenAIRealtimeWebSocket } from "openai/realtime/websocket";
2010
2011
const client = new OpenAI();
2012
2013
// Create session token
2014
const secret = await client.realtime.clientSecrets.create({
2015
session: {
2016
type: "realtime",
2017
model: "gpt-realtime",
2018
audio: {
2019
input: {
2020
format: { type: "audio/pcm", rate: 24000 },
2021
turn_detection: {
2022
type: "server_vad",
2023
threshold: 0.5,
2024
silence_duration_ms: 500,
2025
interrupt_response: true,
2026
},
2027
transcription: {
2028
model: "gpt-4o-transcribe",
2029
},
2030
},
2031
output: {
2032
format: { type: "audio/pcm", rate: 24000 },
2033
voice: "marin",
2034
},
2035
},
2036
instructions:
2037
"You are a helpful voice assistant. Speak naturally and concisely.",
2038
tools: [
2039
{
2040
type: "function",
2041
name: "get_weather",
2042
description: "Get weather for a location",
2043
parameters: {
2044
type: "object",
2045
properties: {
2046
location: { type: "string" },
2047
},
2048
required: ["location"],
2049
},
2050
},
2051
],
2052
},
2053
});
2054
2055
// Connect WebSocket
2056
const ws = await OpenAIRealtimeWebSocket.create(client, {
2057
model: "gpt-realtime",
2058
});
2059
2060
// Handle session
2061
ws.on("session.created", (event) => {
2062
console.log("Session created:", event.session.id);
2063
});
2064
2065
// Handle conversation
2066
ws.on("conversation.item.created", (event) => {
2067
console.log("Item created:", event.item.type);
2068
});
2069
2070
// Handle audio output
2071
ws.on("response.audio.delta", (event) => {
2072
const audioData = Buffer.from(event.delta, "base64");
2073
playAudio(audioData); // Play to speaker
2074
});
2075
2076
// Handle transcripts
2077
ws.on("conversation.item.input_audio_transcription.completed", (event) => {
2078
console.log("User said:", event.transcript);
2079
});
2080
2081
ws.on("response.audio_transcript.delta", (event) => {
2082
process.stdout.write(event.delta);
2083
});
2084
2085
// Handle VAD
2086
ws.on("input_audio_buffer.speech_started", () => {
2087
console.log("User started speaking");
2088
stopAudioPlayback(); // Interrupt assistant
2089
});
2090
2091
ws.on("input_audio_buffer.speech_stopped", () => {
2092
console.log("User stopped speaking");
2093
});
2094
2095
// Handle function calls
2096
ws.on("response.function_call_arguments.done", async (event) => {
2097
console.log("Function call:", event.call_id);
2098
2099
const args = JSON.parse(event.arguments);
2100
const result = await getWeather(args.location);
2101
2102
// Send result
2103
ws.send({
2104
type: "conversation.item.create",
2105
item: {
2106
type: "function_call_output",
2107
call_id: event.call_id,
2108
output: JSON.stringify(result),
2109
},
2110
});
2111
2112
// Continue conversation
2113
ws.send({
2114
type: "response.create",
2115
});
2116
});
2117
2118
// Handle errors
2119
ws.on("error", (event) => {
2120
console.error("Error:", event.error.message);
2121
});
2122
2123
// Capture and send microphone audio
2124
const audioStream = captureMicrophone();
2125
audioStream.on("data", (chunk) => {
2126
const base64 = chunk.toString("base64");
2127
ws.send({
2128
type: "input_audio_buffer.append",
2129
audio: base64,
2130
});
2131
});
2132
2133
// Cleanup
2134
process.on("SIGINT", () => {
2135
ws.close();
2136
process.exit(0);
2137
});
2138
```
2139
2140
## Complete Example: Phone Call Handler

```typescript
import OpenAI from "openai";
import express from "express";

const client = new OpenAI();
const app = express();

app.use(express.json());

// Webhook for incoming calls
app.post("/realtime/webhook/incoming_call", async (req, res) => {
  const event = req.body;

  if (event.type === "realtime.call.incoming") {
    const callId = event.data.id;

    // Accept the call
    await client.realtime.calls.accept(callId, {
      type: "realtime",
      model: "gpt-realtime",
      instructions:
        "You are a customer service agent. Be professional and helpful.",
      audio: {
        input: {
          format: { type: "audio/pcmu" }, // G.711 for telephony
          turn_detection: {
            type: "server_vad",
            silence_duration_ms: 700,
          },
        },
        output: {
          format: { type: "audio/pcmu" },
          voice: "marin",
        },
      },
      tools: [
        {
          type: "function",
          name: "transfer_to_agent",
          description: "Transfer to human agent",
          parameters: {
            type: "object",
            properties: {
              reason: { type: "string" },
            },
          },
        },
      ],
    });

    console.log(`Accepted call: ${callId}`);
  }

  res.sendStatus(200);
});

// Webhook for call events
app.post("/realtime/webhook/call_events", async (req, res) => {
  const event = req.body;

  if (event.type === "realtime.response.function_call_output.done") {
    const { call_id, function_name, arguments: args } = event.data;

    if (function_name === "transfer_to_agent") {
      // Transfer call
      await client.realtime.calls.refer(call_id, {
        target_uri: "sip:support@example.com",
      });
    }
  }

  res.sendStatus(200);
});

app.listen(3000, () => {
  console.log("Webhook server running on port 3000");
});
```

## Type Reference

### Core Types

```typescript { .api }
type RealtimeClientEvent =
  | ConversationItemCreateEvent
  | ConversationItemDeleteEvent
  | ConversationItemRetrieveEvent
  | ConversationItemTruncateEvent
  | InputAudioBufferAppendEvent
  | InputAudioBufferClearEvent
  | OutputAudioBufferClearEvent
  | InputAudioBufferCommitEvent
  | ResponseCancelEvent
  | ResponseCreateEvent
  | SessionUpdateEvent;

type RealtimeServerEvent =
  | ConversationCreatedEvent
  | ConversationItemCreatedEvent
  | ConversationItemDeletedEvent
  | ConversationItemAdded
  | ConversationItemDone
  | ConversationItemRetrieved
  | ConversationItemTruncatedEvent
  | ConversationItemInputAudioTranscriptionCompletedEvent
  | ConversationItemInputAudioTranscriptionDeltaEvent
  | ConversationItemInputAudioTranscriptionFailedEvent
  | ConversationItemInputAudioTranscriptionSegment
  | InputAudioBufferClearedEvent
  | InputAudioBufferCommittedEvent
  | InputAudioBufferSpeechStartedEvent
  | InputAudioBufferSpeechStoppedEvent
  | InputAudioBufferTimeoutTriggered
  | OutputAudioBufferStarted
  | OutputAudioBufferStopped
  | OutputAudioBufferCleared
  | ResponseCreatedEvent
  | ResponseDoneEvent
  | ResponseOutputItemAddedEvent
  | ResponseOutputItemDoneEvent
  | ResponseContentPartAddedEvent
  | ResponseContentPartDoneEvent
  | ResponseAudioDeltaEvent
  | ResponseAudioDoneEvent
  | ResponseAudioTranscriptDeltaEvent
  | ResponseAudioTranscriptDoneEvent
  | ResponseTextDeltaEvent
  | ResponseTextDoneEvent
  | ResponseFunctionCallArgumentsDeltaEvent
  | ResponseFunctionCallArgumentsDoneEvent
  | ResponseMcpCallArgumentsDelta
  | ResponseMcpCallArgumentsDone
  | ResponseMcpCallInProgress
  | ResponseMcpCallCompleted
  | ResponseMcpCallFailed
  | McpListToolsInProgress
  | McpListToolsCompleted
  | McpListToolsFailed
  | SessionCreatedEvent
  | SessionUpdatedEvent
  | RateLimitsUpdatedEvent
  | RealtimeErrorEvent;

type ConversationItem =
  | RealtimeConversationItemSystemMessage
  | RealtimeConversationItemUserMessage
  | RealtimeConversationItemAssistantMessage
  | RealtimeConversationItemFunctionCall
  | RealtimeConversationItemFunctionCallOutput
  | RealtimeMcpApprovalResponse
  | RealtimeMcpListTools
  | RealtimeMcpToolCall
  | RealtimeMcpApprovalRequest;

interface RealtimeSession {
  id?: string;
  object?: "realtime.session";
  model?: string;
  expires_at?: number;
  modalities?: Array<"text" | "audio">;
  instructions?: string;
  voice?: string;
  input_audio_format?: "pcm16" | "g711_ulaw" | "g711_alaw";
  output_audio_format?: "pcm16" | "g711_ulaw" | "g711_alaw";
  input_audio_transcription?: AudioTranscription | null;
  turn_detection?: RealtimeAudioInputTurnDetection | null;
  tools?: Array<RealtimeFunctionTool>;
  tool_choice?: string;
  temperature?: number;
  max_response_output_tokens?: number | "inf";
  speed?: number;
  input_audio_noise_reduction?: {
    type?: NoiseReductionType;
  };
  include?: Array<"item.input_audio_transcription.logprobs"> | null;
  prompt?: ResponsePrompt | null;
  tracing?: RealtimeTracingConfig | null;
  truncation?: RealtimeTruncation;
}

interface RealtimeResponse {
  id?: string;
  object?: "realtime.response";
  status?: RealtimeResponseStatus;
  conversation_id?: string;
  output?: Array<ConversationItem>;
  usage?: RealtimeResponseUsage;
  status_details?: {
    type?: "incomplete" | "failed" | "cancelled";
    reason?: string;
    error?: RealtimeError | null;
  } | null;
  max_output_tokens?: number | "inf";
  modalities?: Array<"text" | "audio">;
  instructions?: string;
  voice?: string;
  audio?: {
    format?: RealtimeAudioFormats;
    speed?: number;
    voice?: string;
  };
  metadata?: Record<string, string> | null;
  tool_choice?: RealtimeToolChoiceConfig;
  tools?: RealtimeToolsConfig;
  temperature?: number;
}

interface AudioTranscription {
  language?: string;
  model?:
    | "whisper-1"
    | "gpt-4o-mini-transcribe"
    | "gpt-4o-transcribe"
    | "gpt-4o-transcribe-diarize";
  prompt?: string;
}

type RealtimeAudioFormats =
  | { type?: "audio/pcm"; rate?: 24000 }
  | { type?: "audio/pcmu" }
  | { type?: "audio/pcma" };

type NoiseReductionType = "near_field" | "far_field";

type RealtimeAudioInputTurnDetection =
  | {
      type: "server_vad";
      threshold?: number;
      prefix_padding_ms?: number;
      silence_duration_ms?: number;
      create_response?: boolean;
      interrupt_response?: boolean;
      idle_timeout_ms?: number | null;
    }
  | {
      type: "semantic_vad";
      eagerness?: "low" | "medium" | "high" | "auto";
      create_response?: boolean;
      interrupt_response?: boolean;
    };

type RealtimeTruncation =
  | "auto"
  | "disabled"
  | { type: "retention_ratio"; retention_ratio: number };

type RealtimeToolsConfig = Array<RealtimeFunctionTool | McpTool>;

type RealtimeToolChoiceConfig =
  | "auto"
  | "none"
  | "required"
  | { type: "function"; function: { name: string } }
  | { type: "mcp"; mcp: { server_label: string; name: string } };

type RealtimeTracingConfig =
  | "auto"
  | {
      workflow_name?: string;
      group_id?: string;
      metadata?: unknown;
    }
  | null;

interface RealtimeError {
  type: string;
  code?: string | null;
  message: string;
  param?: string | null;
  event_id?: string | null;
}

interface RealtimeResponseUsage {
  total_tokens?: number;
  input_tokens?: number;
  output_tokens?: number;
  input_token_details?: {
    text_tokens?: number;
    audio_tokens?: number;
    image_tokens?: number;
    cached_tokens?: number;
    cached_tokens_details?: {
      text_tokens?: number;
      audio_tokens?: number;
      image_tokens?: number;
    };
  };
  output_token_details?: {
    text_tokens?: number;
    audio_tokens?: number;
  };
}

interface RealtimeResponseStatus {
  type: "in_progress" | "completed" | "cancelled" | "failed" | "incomplete";
  reason?: string;
}
```

## Models

Available Realtime API models:

- `gpt-realtime` (latest)
- `gpt-realtime-2025-08-28`
- `gpt-4o-realtime-preview`
- `gpt-4o-realtime-preview-2024-10-01`
- `gpt-4o-realtime-preview-2024-12-17`
- `gpt-4o-realtime-preview-2025-06-03`
- `gpt-4o-mini-realtime-preview`
- `gpt-4o-mini-realtime-preview-2024-12-17`
- `gpt-realtime-mini`
- `gpt-realtime-mini-2025-10-06`
- `gpt-audio-mini`
- `gpt-audio-mini-2025-10-06`

## Best Practices

### Security

- **Never expose API keys in the browser**: Always use ephemeral session tokens (see the token-minting sketch after this list)
- **Token expiration**: Default 10 minutes, max 2 hours
- **Server-side validation**: Validate all tool calls server-side
- **Rate limiting**: Monitor rate limit events and handle them gracefully

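A common way to keep the API key server-side is a small endpoint that mints an ephemeral client secret and returns it to the browser. This is a minimal sketch using Express (already used in the phone-call example above); the `expires_after` option and the `value`/`expires_at` fields on the returned secret are assumptions, so check them against the current SDK types.

```typescript
import OpenAI from "openai";
import express from "express";

const client = new OpenAI(); // API key stays on the server
const app = express();

// The browser calls this endpoint, then opens its own Realtime connection
// using the returned ephemeral secret instead of the API key.
app.post("/session", async (_req, res) => {
  const secret = await client.realtime.clientSecrets.create({
    // Assumed option: expire the token after 10 minutes (the documented default).
    expires_after: { anchor: "created_at", seconds: 600 },
    session: {
      type: "realtime",
      model: "gpt-realtime",
      instructions: "You are a helpful voice assistant.",
    },
  });

  // Field names assumed: `value` is the ephemeral token, `expires_at` its expiry.
  res.json({ client_secret: secret.value, expires_at: secret.expires_at });
});

app.listen(3000);
```
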
### Performance

- **Audio chunking**: Send audio in chunks (1-5 seconds recommended)
- **VAD tuning**: Adjust threshold and silence duration for your environment (see the sketch after this list)
- **Voice selection**: Use `marin` or `cedar` for best quality
- **Caching**: Enable context caching for repeated conversations

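A noisy open-plan office usually needs a higher VAD threshold and a longer silence window than a quiet headset setup. Below is a sketch of tuning VAD mid-conversation with `session.update`, assuming the GA session shape with `turn_detection` nested under `audio.input`, as in the session-creation example above; the specific values are illustrative starting points, not recommendations.

```typescript
// Make server VAD less trigger-happy for a noisy environment.
ws.send({
  type: "session.update",
  session: {
    type: "realtime",
    audio: {
      input: {
        turn_detection: {
          type: "server_vad",
          threshold: 0.7, // the example above uses 0.5; higher = less sensitive
          prefix_padding_ms: 300, // audio retained before detected speech
          silence_duration_ms: 800, // wait longer before ending the user's turn
          interrupt_response: true,
        },
      },
    },
  },
});
```
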
### Audio Quality

- **Noise reduction**: Enable for far-field or noisy environments
- **Sample rate**: Always use 24kHz for PCM audio
- **Format selection**: Use G.711 (pcmu/pcma) for telephony, PCM for quality
- **Interrupt handling**: Clear audio buffers on interruption (see the sketch after this list)

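When server VAD reports that the user started speaking over the assistant, stop local playback and discard whatever the model is still generating. A sketch using the `response.cancel` and `output_audio_buffer.clear` client events from the type reference below; `output_audio_buffer.clear` is mainly relevant for WebRTC/SIP transports, and `stopAudioPlayback` is your own playback helper.

```typescript
ws.on("input_audio_buffer.speech_started", () => {
  // 1. Stop whatever is currently playing locally.
  stopAudioPlayback();

  // 2. Cancel the in-flight response so the model stops generating.
  ws.send({ type: "response.cancel" });

  // 3. Drop any output audio already buffered server-side (WebRTC/SIP transports).
  ws.send({ type: "output_audio_buffer.clear" });
});
```
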
### Conversation Management

- **Context length**: Monitor token usage and configure truncation
- **Function calling**: Keep tool outputs concise
- **System messages**: Use for mid-conversation context updates
- **Item ordering**: Use `previous_item_id` for precise insertion (see the sketch after this list)

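Truncation can be configured once on the session, and `previous_item_id` lets you splice a context update into a specific point in the history instead of appending it at the end. A sketch, assuming the `retention_ratio` truncation shape from the type reference below and that `conversation.item.create` accepts an event-level `previous_item_id`; the item id is hypothetical.

```typescript
// Keep roughly the most recent 75% of the context when the window fills up.
ws.send({
  type: "session.update",
  session: {
    type: "realtime",
    truncation: { type: "retention_ratio", retention_ratio: 0.75 },
  },
});

// Insert a system note right after a specific item instead of at the end.
ws.send({
  type: "conversation.item.create",
  previous_item_id: "item_abc123", // hypothetical id captured from an earlier event
  item: {
    type: "message",
    role: "system",
    content: [{ type: "input_text", text: "The customer has been verified." }],
  },
});
```
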
### Error Handling

- **Graceful degradation**: Handle WebSocket disconnections
- **Retry logic**: Implement exponential backoff for transient errors (see the sketch after this list)
- **Error logging**: Log all error events for debugging
- **User feedback**: Provide clear feedback on connection/processing status

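A sketch of reconnect logic with exponential backoff, using the `OpenAIRealtimeWebSocket.create` factory from the examples above. How you detect a dropped connection (a thrown error on connect, an `error` event, a closed socket) depends on your transport and runtime, so treat this as a starting point rather than a complete strategy.

```typescript
import OpenAI from "openai";
import { OpenAIRealtimeWebSocket } from "openai/realtime/websocket";

const client = new OpenAI();

async function connectWithRetry(maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const ws = await OpenAIRealtimeWebSocket.create(client, {
        model: "gpt-realtime",
      });

      // Log every error event; decide per error code whether to reconnect.
      ws.on("error", (event) => {
        console.error("Realtime error:", event.error.message);
      });

      return ws;
    } catch (err) {
      const delayMs = Math.min(1000 * 2 ** attempt, 30_000); // 1s, 2s, 4s, ... capped at 30s
      console.warn(`Connect failed (attempt ${attempt + 1}), retrying in ${delayMs}ms`, err);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error("Could not establish a realtime connection");
}

const ws = await connectWithRetry();
```
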
## Common Patterns

### Voice-to-Voice Assistant

```typescript
const ws = await OpenAIRealtimeWebSocket.create(client, {
  model: "gpt-realtime",
});

// Microphone → Input Buffer
micStream.on("data", (chunk) => {
  ws.send({
    type: "input_audio_buffer.append",
    audio: chunk.toString("base64"),
  });
});

// Output Audio → Speaker
ws.on("response.audio.delta", (event) => {
  playAudio(Buffer.from(event.delta, "base64"));
});

// VAD-based interruption
ws.on("input_audio_buffer.speech_started", () => {
  stopPlayback();
});
```

### Text-to-Voice Assistant

```typescript
// Send text message
ws.send({
  type: "conversation.item.create",
  item: {
    type: "message",
    role: "user",
    content: [{ type: "input_text", text: "Hello!" }],
  },
});

// Request audio response
ws.send({
  type: "response.create",
  response: {
    modalities: ["audio"],
  },
});
```

### Streaming Transcripts

```typescript
ws.on("response.audio_transcript.delta", (event) => {
  updateSubtitles(event.delta);
});

ws.on("conversation.item.input_audio_transcription.delta", (event) => {
  updateUserTranscript(event.delta);
});
```

### Multi-Tool Assistant

```typescript
const tools = [
  {
    type: "function",
    name: "search_database",
    description: "Search customer database",
    parameters: {
      /* ... */
    },
  },
  {
    type: "mcp",
    server_label: "calendar",
    connector_id: "connector_googlecalendar",
  },
];

ws.send({
  type: "session.update",
  session: { tools, tool_choice: "auto" },
});
```