# Realtime API

The Realtime API provides WebSocket-based real-time voice conversations with OpenAI models. It supports bidirectional audio streaming, server-side voice activity detection (VAD), function calling, and full conversation management. The API is designed for live voice applications including phone calls, voice assistants, and interactive conversational experiences.

## Package Information

- **Package Name**: openai
- **Package Type**: npm
- **Language**: TypeScript
- **Installation**: `npm install openai`

## API Status

The Realtime API is now generally available (GA) at `client.realtime.*`.

**Deprecation Notice**: The legacy beta Realtime API at `client.beta.realtime.*` is deprecated. If you are using the beta API, migrate to the GA API documented here. The beta API includes:

- `client.beta.realtime.sessions.create()` (deprecated - use `client.realtime.clientSecrets.create()` instead)
- `client.beta.realtime.transcriptionSessions.create()` (deprecated)

All new projects should use the GA Realtime API (`client.realtime.*`) documented on this page.
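
To make the migration concrete, here is a minimal sketch contrasting the two calls. The beta parameters shown (including the preview model name) are illustrative assumptions rather than an exact record of the old signature; the GA call matches the `clientSecrets.create()` usage documented below.

```typescript
import OpenAI from "openai";

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Before (deprecated beta API) - parameter shape shown here is illustrative
const betaSession = await client.beta.realtime.sessions.create({
  model: "gpt-4o-realtime-preview", // legacy/preview model name, for illustration only
});

// After (GA API) - mint an ephemeral client secret instead
const secret = await client.realtime.clientSecrets.create({
  session: {
    type: "realtime",
    model: "gpt-realtime",
  },
});
```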

## Core Imports

```typescript
import OpenAI from "openai";
import { OpenAIRealtimeWebSocket } from "openai/realtime/websocket"; // Browser
import { OpenAIRealtimeWS } from "openai/realtime/ws"; // Node.js (requires 'ws' package)
```

## WebSocket Clients

The Realtime API provides two WebSocket client implementations for different runtime environments:

### OpenAIRealtimeWebSocket (Browser)

For browser environments, use `OpenAIRealtimeWebSocket`, which uses the native browser WebSocket API.

```typescript
import { OpenAIRealtimeWebSocket } from "openai/realtime/websocket";
import OpenAI from "openai";

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const ws = new OpenAIRealtimeWebSocket(
  {
    model: "gpt-realtime",
    dangerouslyAllowBrowser: true, // Required for browser use
  },
  client
);

// Event handling
ws.on("session.created", (event) => {
  console.log("Session started:", event.session.id);
});

ws.on("response.audio.delta", (event) => {
  // Handle audio deltas - event.delta is base64 encoded audio
  const audioData = atob(event.delta);
  playAudio(audioData);
});

ws.on("error", (error) => {
  console.error("WebSocket error:", error);
});

// Send audio to the server
function sendAudio(audioData: ArrayBuffer) {
  const base64Audio = btoa(String.fromCharCode(...new Uint8Array(audioData)));
  ws.send({
    type: "input_audio_buffer.append",
    audio: base64Audio,
  });
}

// Commit audio buffer to trigger processing
ws.send({
  type: "input_audio_buffer.commit",
});

// Close connection
ws.close();
```

**Key features:**
- Uses native browser WebSocket API
- Requires `dangerouslyAllowBrowser: true` in configuration
- Audio must be base64 encoded
- Automatic reconnection handling
- Built-in event emitter for all realtime events

### OpenAIRealtimeWS (Node.js)

For Node.js environments, use `OpenAIRealtimeWS`, which uses the `ws` package for WebSocket support.

```typescript
import { OpenAIRealtimeWS } from "openai/realtime/ws";
import OpenAI from "openai";
import fs from "fs";

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const ws = new OpenAIRealtimeWS(
  {
    model: "gpt-realtime",
  },
  client
);

// Event handling (same interface as browser version)
ws.on("session.created", (event) => {
  console.log("Session started:", event.session.id);
});

ws.on("response.audio.delta", (event) => {
  // Handle audio deltas
  const audioBuffer = Buffer.from(event.delta, "base64");
  // Write to file or stream to audio output
  fs.appendFileSync("output.pcm", audioBuffer);
});

ws.on("response.done", (event) => {
  console.log("Response complete:", event.response.id);
});

// Send audio from file or buffer
function sendAudioFromFile(filePath: string) {
  const audioBuffer = fs.readFileSync(filePath);
  const base64Audio = audioBuffer.toString("base64");

  ws.send({
    type: "input_audio_buffer.append",
    audio: base64Audio,
  });
}

// Trigger response generation
ws.send({
  type: "input_audio_buffer.commit",
});

// Close connection
ws.close();
```

**Key features:**
- Uses `ws` package for WebSocket support (add to dependencies: `npm install ws @types/ws`)
- Same event interface as browser version for consistency
- Better Node.js stream integration
- Automatic reconnection handling
- Suitable for server-side applications

### Common Event Patterns

Both WebSocket clients support the same event handling interface:

```typescript
// Connection events
ws.on("session.created", (event) => { /* Session initialization */ });
ws.on("session.updated", (event) => { /* Session configuration changed */ });

// Conversation events
ws.on("conversation.created", (event) => { /* New conversation */ });
ws.on("conversation.item.created", (event) => { /* New item added */ });
ws.on("conversation.item.deleted", (event) => { /* Item removed */ });

// Audio events (streaming)
ws.on("response.audio.delta", (event) => { /* Audio chunk received */ });
ws.on("response.audio.done", (event) => { /* Audio complete */ });
ws.on("response.audio_transcript.delta", (event) => { /* Transcript chunk */ });
ws.on("response.audio_transcript.done", (event) => { /* Transcript complete */ });

// Response events
ws.on("response.created", (event) => { /* Response started */ });
ws.on("response.done", (event) => { /* Response complete */ });
ws.on("response.cancelled", (event) => { /* Response cancelled */ });
ws.on("response.failed", (event) => { /* Response failed */ });

// Function calling events
ws.on("response.function_call_arguments.delta", (event) => { /* Function args streaming */ });
ws.on("response.function_call_arguments.done", (event) => { /* Function args complete */ });

// Error events
ws.on("error", (error) => { /* WebSocket or API error */ });
ws.on("close", (event) => { /* Connection closed */ });
```

### Sending Commands

Both clients use the same `.send()` method for sending commands:

```typescript
// Append audio to input buffer
ws.send({
  type: "input_audio_buffer.append",
  audio: base64AudioString,
});

// Commit audio buffer (triggers VAD or manual processing)
ws.send({
  type: "input_audio_buffer.commit",
});

// Clear audio buffer
ws.send({
  type: "input_audio_buffer.clear",
});

// Update session configuration
ws.send({
  type: "session.update",
  session: {
    instructions: "You are a helpful assistant.",
    turn_detection: { type: "server_vad" },
  },
});

// Create conversation item (text message)
ws.send({
  type: "conversation.item.create",
  item: {
    type: "message",
    role: "user",
    content: [{ type: "input_text", text: "Hello!" }],
  },
});

// Trigger response generation
ws.send({
  type: "response.create",
  response: {
    modalities: ["text", "audio"],
    instructions: "Respond briefly.",
  },
});

// Cancel in-progress response
ws.send({
  type: "response.cancel",
});
```

### Connection Lifecycle

Both clients handle connection lifecycle automatically:

```typescript
const ws = new OpenAIRealtimeWS({ model: "gpt-realtime" }, client);

// Connection opens automatically
ws.on("session.created", (event) => {
  console.log("Connected and ready");
});

// Handle disconnections
ws.on("close", (event) => {
  console.log("Connection closed:", event.code, event.reason);
});

// Handle errors
ws.on("error", (error) => {
  console.error("Connection error:", error);
});

// Manually close connection
ws.close();
```

## Basic Usage

### Creating a Session Token

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

// Create an ephemeral session token for client-side use
const response = await client.realtime.clientSecrets.create({
  session: {
    type: "realtime",
    model: "gpt-realtime",
    audio: {
      input: {
        format: { type: "audio/pcm", rate: 24000 },
        turn_detection: {
          type: "server_vad",
          threshold: 0.5,
          silence_duration_ms: 500,
        },
      },
      output: {
        format: { type: "audio/pcm", rate: 24000 },
        voice: "marin",
      },
    },
  },
});

const sessionToken = response.value;
```

### Connecting via WebSocket

```typescript
import { OpenAIRealtimeWebSocket } from "openai/realtime/websocket";

const ws = new OpenAIRealtimeWebSocket(
  {
    model: "gpt-realtime",
    dangerouslyAllowBrowser: false,
  },
  client
);

// Listen for events
ws.on("session.created", (event) => {
  console.log("Session created:", event);
});

ws.on("conversation.item.created", (event) => {
  console.log("Item created:", event.item);
});

ws.on("response.audio.delta", (event) => {
  // Handle audio delta
  const audioData = Buffer.from(event.delta, "base64");
  playAudio(audioData);
});

// Send audio
ws.send({
  type: "input_audio_buffer.append",
  audio: audioBase64String,
});

// Commit audio buffer
ws.send({
  type: "input_audio_buffer.commit",
});
```

## Architecture

The Realtime API operates through a WebSocket connection with an event-driven architecture:

- **Session Management**: Create ephemeral tokens server-side, connect from client (see the sketch after this list)
- **Audio Streaming**: Bidirectional PCM16/G.711 audio at 24kHz
- **Event System**: 50+ client-to-server and server-to-client events
- **VAD Integration**: Server-side voice activity detection with configurable parameters
- **Conversation Context**: Automatic conversation history management
- **Function Calling**: Real-time tool execution during conversations
- **Phone Integration**: SIP/WebRTC support for phone calls
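
The server-side half of the token flow is small. Below is a minimal sketch using Node's built-in `http` module and the `clientSecrets.create()` call documented under Capabilities; the route path and port are arbitrary choices for illustration.

```typescript
import http from "http";
import OpenAI from "openai";

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// The browser calls GET /session, receives a short-lived client secret, and uses
// it to open its own realtime connection without ever seeing the real API key.
http
  .createServer(async (req, res) => {
    if (req.method === "GET" && req.url === "/session") {
      const secret = await client.realtime.clientSecrets.create({
        session: { type: "realtime", model: "gpt-realtime" },
      });
      res.writeHead(200, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ value: secret.value, expires_at: secret.expires_at }));
    } else {
      res.writeHead(404);
      res.end();
    }
  })
  .listen(3000);
```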

## Capabilities

### Session Token Creation

Generate ephemeral session tokens for secure client-side WebSocket connections.

```typescript { .api }
/**
 * Create a Realtime client secret with an associated session configuration.
 * Returns an ephemeral token with a 10-minute default TTL (configurable from
 * 10 seconds up to 2 hours via `expires_after`).
 */
function create(
  params: ClientSecretCreateParams
): Promise<ClientSecretCreateResponse>;

interface ClientSecretCreateParams {
  /** Configuration for the client secret expiration */
  expires_after?: {
    /** Anchor point for expiration (only 'created_at' is supported) */
    anchor?: "created_at";
    /** Seconds from anchor to expiration (10-7200, defaults to 600) */
    seconds?: number;
  };
  /** Session configuration (realtime or transcription session) */
  session?:
    | RealtimeSessionCreateRequest
    | RealtimeTranscriptionSessionCreateRequest;
}

interface ClientSecretCreateResponse {
  /** Expiration timestamp in seconds since epoch */
  expires_at: number;
  /** The session configuration */
  session: RealtimeSessionCreateResponse | RealtimeTranscriptionSessionCreateResponse;
  /** The generated client secret value */
  value: string;
}

interface RealtimeSessionCreateResponse {
  /** Ephemeral key for client environments */
  client_secret: {
    expires_at: number;
    value: string;
  };
  /** Session type: always 'realtime' */
  type: "realtime";
  /** Audio configuration */
  audio?: {
    input?: {
      format?: RealtimeAudioFormats;
      noise_reduction?: { type?: NoiseReductionType };
      transcription?: AudioTranscription;
      turn_detection?: ServerVad | SemanticVad | null;
    };
    output?: {
      format?: RealtimeAudioFormats;
      speed?: number;
      voice?: string;
    };
  };
  /** Fields to include in server outputs */
  include?: Array<"item.input_audio_transcription.logprobs">;
  /** System instructions for the model */
  instructions?: string;
  /** Max output tokens (1-4096 or 'inf') */
  max_output_tokens?: number | "inf";
  /** Realtime model to use */
  model?: string;
  /** Output modalities ('text' | 'audio') */
  output_modalities?: Array<"text" | "audio">;
  /** Prompt template reference */
  prompt?: ResponsePrompt | null;
  /** Tool choice configuration */
  tool_choice?: ToolChoiceOptions | ToolChoiceFunction | ToolChoiceMcp;
  /** Available tools */
  tools?: Array<RealtimeFunctionTool | McpTool>;
  /** Tracing configuration */
  tracing?: "auto" | TracingConfiguration | null;
  /** Truncation behavior */
  truncation?: RealtimeTruncation;
}
```
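
**Usage:** a brief sketch of the `expires_after` option, reusing the `client` instance from the earlier examples (the 30-minute TTL is just an example value):

```typescript
const secret = await client.realtime.clientSecrets.create({
  expires_after: { anchor: "created_at", seconds: 1800 }, // 30 minutes
  session: {
    type: "realtime",
    model: "gpt-realtime",
  },
});

console.log(secret.value, "expires at", new Date(secret.expires_at * 1000));
```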

[Session Token Creation](./realtime.md#session-token-creation)

### SIP Call Management

Manage incoming and outgoing SIP/WebRTC calls with the Realtime API.

```typescript { .api }
/**
 * Accept an incoming SIP call and configure the realtime session that will handle it
 */
function accept(
  callID: string,
  params: CallAcceptParams,
  options?: RequestOptions
): Promise<void>;

/**
 * End an active Realtime API call, whether it was initiated over SIP or WebRTC
 */
function hangup(
  callID: string,
  options?: RequestOptions
): Promise<void>;

/**
 * Transfer an active SIP call to a new destination using the SIP REFER verb
 */
function refer(
  callID: string,
  params: CallReferParams,
  options?: RequestOptions
): Promise<void>;

/**
 * Decline an incoming SIP call by returning a SIP status code to the caller
 */
function reject(
  callID: string,
  params?: CallRejectParams,
  options?: RequestOptions
): Promise<void>;

interface CallAcceptParams {
  /** The type of session to create. Always 'realtime' for the Realtime API */
  type: "realtime";
  /** Configuration for input and output audio */
  audio?: RealtimeAudioConfig;
  /** Additional fields to include in server outputs */
  include?: Array<"item.input_audio_transcription.logprobs">;
  /** The default system instructions prepended to model calls */
  instructions?: string;
  /** Maximum number of output tokens for a single assistant response (1-4096 or 'inf') */
  max_output_tokens?: number | "inf";
  /** The Realtime model used for this session */
  model?: string;
  /** The set of modalities the model can respond with */
  output_modalities?: Array<"text" | "audio">;
  /** Reference to a prompt template and its variables */
  prompt?: ResponsePrompt | null;
  /** How the model chooses tools */
  tool_choice?: RealtimeToolChoiceConfig;
  /** Tools available to the model */
  tools?: RealtimeToolsConfig;
  /** Tracing configuration for the session */
  tracing?: RealtimeTracingConfig | null;
  /** Truncation behavior when conversation exceeds token limits */
  truncation?: RealtimeTruncation;
}

interface CallReferParams {
  /** URI that should appear in the SIP Refer-To header (e.g., 'tel:+14155550123' or 'sip:agent@example.com') */
  target_uri: string;
}

interface CallRejectParams {
  /** SIP response code to send back to the caller. Defaults to 603 (Decline) when omitted */
  status_code?: number;
}
```

**Available at:** `client.realtime.calls`

**Usage Example:**

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

// Accept incoming call
await client.realtime.calls.accept("call-123", {
  type: "realtime",
  model: "gpt-realtime",
  audio: {
    input: { format: { type: "audio/pcm", rate: 24000 } },
    output: { format: { type: "audio/pcm", rate: 24000 }, voice: "marin" },
  },
  instructions: "You are a helpful phone assistant.",
});

// Hang up call
await client.realtime.calls.hangup("call-123");

// Reject incoming call
await client.realtime.calls.reject("call-123", {
  status_code: 603, // Decline
});

// Transfer call
await client.realtime.calls.refer("call-123", {
  target_uri: "tel:+14155550123",
});
```

### WebSocket Connection

Connect to the Realtime API using WebSocket with the OpenAIRealtimeWebSocket class.

```typescript { .api }
/**
 * WebSocket client for the Realtime API. Handles connection lifecycle,
 * event streaming, and message sending.
 */
class OpenAIRealtimeWebSocket extends OpenAIRealtimeEmitter {
  url: URL;
  socket: WebSocket;

  constructor(
    props: {
      model: string;
      dangerouslyAllowBrowser?: boolean;
      onURL?: (url: URL) => void;
      __resolvedApiKey?: boolean;
    },
    client?: Pick<OpenAI, "apiKey" | "baseURL">
  );

  /**
   * Factory method that resolves API key before connecting
   */
  static create(
    client: Pick<OpenAI, "apiKey" | "baseURL" | "_callApiKey">,
    props: { model: string; dangerouslyAllowBrowser?: boolean }
  ): Promise<OpenAIRealtimeWebSocket>;

  /**
   * Factory method for Azure OpenAI connections
   */
  static azure(
    client: Pick<
      AzureOpenAI,
      "_callApiKey" | "apiVersion" | "apiKey" | "baseURL" | "deploymentName"
    >,
    options?: {
      deploymentName?: string;
      dangerouslyAllowBrowser?: boolean;
    }
  ): Promise<OpenAIRealtimeWebSocket>;

  /**
   * Send a client event to the server
   */
  send(event: RealtimeClientEvent): void;

  /**
   * Close the WebSocket connection
   */
  close(props?: { code: number; reason: string }): void;

  /**
   * Register event listener
   */
  on(event: string, listener: (event: any) => void): void;
}
```

**Usage:**

```typescript
// Standard connection
const ws = await OpenAIRealtimeWebSocket.create(client, {
  model: "gpt-realtime",
});

// Azure connection
const wsAzure = await OpenAIRealtimeWebSocket.azure(azureClient, {
  deploymentName: "my-realtime-deployment",
});
```

[WebSocket Connection](./realtime.md#websocket-connection)

### Phone Call Methods

Accept, reject, transfer, and hang up phone calls via SIP integration.

```typescript { .api }
/**
 * Accept an incoming SIP call and configure the realtime session
 */
function accept(callID: string, params: CallAcceptParams): Promise<void>;

/**
 * End an active Realtime API call (SIP or WebRTC)
 */
function hangup(callID: string): Promise<void>;

/**
 * Transfer an active SIP call to a new destination using SIP REFER
 */
function refer(callID: string, params: CallReferParams): Promise<void>;

/**
 * Decline an incoming SIP call with a SIP status code
 */
function reject(
  callID: string,
  params?: CallRejectParams
): Promise<void>;

interface CallAcceptParams {
  type: "realtime";
  audio?: RealtimeAudioConfig;
  include?: Array<"item.input_audio_transcription.logprobs">;
  instructions?: string;
  max_output_tokens?: number | "inf";
  model?: string;
  output_modalities?: Array<"text" | "audio">;
  prompt?: ResponsePrompt | null;
  tool_choice?: RealtimeToolChoiceConfig;
  tools?: RealtimeToolsConfig;
  tracing?: RealtimeTracingConfig | null;
  truncation?: RealtimeTruncation;
}

interface CallReferParams {
  /** URI in SIP Refer-To header (e.g., 'tel:+14155550123') */
  target_uri: string;
}

interface CallRejectParams {
  /** SIP response code (defaults to 603 Decline) */
  status_code?: number;
}
```

**Usage:**

```typescript
// Accept incoming call
await client.realtime.calls.accept("call_abc123", {
  type: "realtime",
  model: "gpt-realtime",
  instructions: "You are a helpful assistant on a phone call.",
  audio: {
    output: { voice: "marin" },
  },
});

// Transfer call
await client.realtime.calls.refer("call_abc123", {
  target_uri: "tel:+14155550199",
});

// Reject call
await client.realtime.calls.reject("call_abc123", {
  status_code: 486, // Busy Here
});

// Hang up
await client.realtime.calls.hangup("call_abc123");
```

[Phone Call Methods](./realtime.md#phone-call-methods)

### Session Configuration

Configure session parameters including audio formats, VAD, and model settings.

```typescript { .api }
interface RealtimeSession {
  id?: string;
  expires_at?: number;
  /** Fields to include in server outputs */
  include?: Array<"item.input_audio_transcription.logprobs"> | null;
  /** Input audio format: 'pcm16', 'g711_ulaw', or 'g711_alaw' */
  input_audio_format?: "pcm16" | "g711_ulaw" | "g711_alaw";
  /** Noise reduction configuration */
  input_audio_noise_reduction?: {
    type?: NoiseReductionType;
  };
  /** Transcription configuration */
  input_audio_transcription?: AudioTranscription | null;
  /** System instructions */
  instructions?: string;
  /** Max output tokens per response */
  max_response_output_tokens?: number | "inf";
  /** Response modalities */
  modalities?: Array<"text" | "audio">;
  /** Model identifier */
  model?: string;
  object?: "realtime.session";
  /** Output audio format */
  output_audio_format?: "pcm16" | "g711_ulaw" | "g711_alaw";
  /** Prompt template reference */
  prompt?: ResponsePrompt | null;
  /** Audio playback speed (0.25-1.5) */
  speed?: number;
  /** Sampling temperature (0.6-1.2) */
  temperature?: number;
  /** Tool choice mode */
  tool_choice?: string;
  /** Available tools */
  tools?: Array<RealtimeFunctionTool>;
  /** Tracing configuration */
  tracing?: "auto" | TracingConfiguration | null;
  /** Turn detection configuration */
  turn_detection?: RealtimeAudioInputTurnDetection | null;
  /** Truncation behavior */
  truncation?: RealtimeTruncation;
  /** Output voice */
  voice?: string;
}

interface AudioTranscription {
  /** Language code (ISO-639-1, e.g., 'en') */
  language?: string;
  /** Transcription model */
  model?:
    | "whisper-1"
    | "gpt-4o-mini-transcribe"
    | "gpt-4o-transcribe"
    | "gpt-4o-transcribe-diarize";
  /** Transcription guidance prompt */
  prompt?: string;
}

type NoiseReductionType = "near_field" | "far_field";

type RealtimeTruncation =
  | "auto"
  | "disabled"
  | {
      type: "retention_ratio";
      /** Fraction of max context to retain (0.0-1.0) */
      retention_ratio: number;
    };
```
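
**Usage:** a sketch of a typical `session.update` built from the fields above; the specific values are illustrative:

```typescript
ws.send({
  type: "session.update",
  session: {
    instructions: "Answer in one or two short sentences.",
    voice: "marin",
    input_audio_transcription: { model: "gpt-4o-transcribe", language: "en" },
    turn_detection: { type: "server_vad", silence_duration_ms: 700 },
    max_response_output_tokens: 1024,
  },
});

// The server confirms the change
ws.on("session.updated", (event) => {
  console.log("Active voice:", event.session.voice);
});
```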

[Session Configuration](./realtime.md#session-configuration)

### Turn Detection (VAD)

Configure voice activity detection for automatic turn taking.

```typescript { .api }
/**
 * Server VAD: Simple volume-based voice activity detection
 */
interface ServerVad {
  type: "server_vad";
  /** Auto-generate response on VAD stop */
  create_response?: boolean;
  /** Timeout for prompting user to continue (ms) */
  idle_timeout_ms?: number | null;
  /** Auto-interrupt on VAD start */
  interrupt_response?: boolean;
  /** Audio prefix padding (ms, default: 300) */
  prefix_padding_ms?: number;
  /** Silence duration to detect stop (ms, default: 500) */
  silence_duration_ms?: number;
  /** VAD activation threshold (0.0-1.0, default: 0.5) */
  threshold?: number;
}

/**
 * Semantic VAD: Model-based turn detection with dynamic timeouts
 */
interface SemanticVad {
  type: "semantic_vad";
  /** Auto-generate response on VAD stop */
  create_response?: boolean;
  /** Eagerness: 'low' (8s), 'medium' (4s), 'high' (2s), 'auto' */
  eagerness?: "low" | "medium" | "high" | "auto";
  /** Auto-interrupt on VAD start */
  interrupt_response?: boolean;
}

type RealtimeAudioInputTurnDetection = ServerVad | SemanticVad;
```

**Usage:**

```typescript
// Server VAD with custom settings
{
  type: "server_vad",
  threshold: 0.6,
  silence_duration_ms: 700,
  prefix_padding_ms: 300,
  interrupt_response: true,
  create_response: true,
  idle_timeout_ms: 30000
}

// Semantic VAD for natural conversations
{
  type: "semantic_vad",
  eagerness: "medium",
  interrupt_response: true,
  create_response: true
}

// Manual turn detection (no VAD)
{
  turn_detection: null
}
```

[Turn Detection](./realtime.md#turn-detection-vad)

### Audio Formats

Configure input and output audio formats for the session.

```typescript { .api }
/**
 * PCM 16-bit audio at 24kHz sample rate
 */
interface AudioPCM {
  type?: "audio/pcm";
  rate?: 24000;
}

/**
 * G.711 μ-law format (commonly used in telephony)
 */
interface AudioPCMU {
  type?: "audio/pcmu";
}

/**
 * G.711 A-law format (commonly used in telephony)
 */
interface AudioPCMA {
  type?: "audio/pcma";
}

type RealtimeAudioFormats = AudioPCM | AudioPCMU | AudioPCMA;

interface RealtimeAudioConfig {
  input?: {
    format?: RealtimeAudioFormats;
    noise_reduction?: { type?: NoiseReductionType };
    transcription?: AudioTranscription;
    turn_detection?: RealtimeAudioInputTurnDetection | null;
  };
  output?: {
    format?: RealtimeAudioFormats;
    /** Playback speed multiplier (0.25-1.5) */
    speed?: number;
    /** Voice selection */
    voice?:
      | string
      | "alloy"
      | "ash"
      | "ballad"
      | "coral"
      | "echo"
      | "sage"
      | "shimmer"
      | "verse"
      | "marin"
      | "cedar";
  };
}
```
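
**Usage:** a sketch of an audio configuration for a telephony-style session (format, voice, and speed choices are illustrative); the object fits the `audio` field of a session or `calls.accept()` request:

```typescript
const audio: RealtimeAudioConfig = {
  input: {
    format: { type: "audio/pcmu" }, // G.711 u-law from the phone network
    noise_reduction: { type: "far_field" },
  },
  output: {
    format: { type: "audio/pcmu" },
    voice: "cedar",
    speed: 0.9,
  },
};
```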

[Audio Formats](./realtime.md#audio-formats)

### Client-to-Server Events

Events sent from client to server to control the conversation.

```typescript { .api }
/**
 * Union of all client events
 */
type RealtimeClientEvent =
  | ConversationItemCreateEvent
  | ConversationItemDeleteEvent
  | ConversationItemRetrieveEvent
  | ConversationItemTruncateEvent
  | InputAudioBufferAppendEvent
  | InputAudioBufferClearEvent
  | OutputAudioBufferClearEvent
  | InputAudioBufferCommitEvent
  | ResponseCancelEvent
  | ResponseCreateEvent
  | SessionUpdateEvent;

/**
 * Add conversation item (message, function call, or output)
 */
interface ConversationItemCreateEvent {
  type: "conversation.item.create";
  item: ConversationItem;
  event_id?: string;
  /** Insert after this item ID ('root' for beginning) */
  previous_item_id?: string;
}

/**
 * Delete conversation item by ID
 */
interface ConversationItemDeleteEvent {
  type: "conversation.item.delete";
  item_id: string;
  event_id?: string;
}

/**
 * Retrieve full item including audio data
 */
interface ConversationItemRetrieveEvent {
  type: "conversation.item.retrieve";
  item_id: string;
  event_id?: string;
}

/**
 * Truncate assistant audio message
 */
interface ConversationItemTruncateEvent {
  type: "conversation.item.truncate";
  item_id: string;
  content_index: number;
  /** Duration to keep in milliseconds */
  audio_end_ms: number;
  event_id?: string;
}

/**
 * Append audio to input buffer
 */
interface InputAudioBufferAppendEvent {
  type: "input_audio_buffer.append";
  /** Base64-encoded audio bytes */
  audio: string;
  event_id?: string;
}

/**
 * Clear input audio buffer
 */
interface InputAudioBufferClearEvent {
  type: "input_audio_buffer.clear";
  event_id?: string;
}

/**
 * Commit input audio buffer to conversation
 */
interface InputAudioBufferCommitEvent {
  type: "input_audio_buffer.commit";
  event_id?: string;
}

/**
 * WebRTC only: Clear output audio buffer
 */
interface OutputAudioBufferClearEvent {
  type: "output_audio_buffer.clear";
  event_id?: string;
}

/**
 * Cancel in-progress response
 */
interface ResponseCancelEvent {
  type: "response.cancel";
  event_id?: string;
}

/**
 * Request model response
 */
interface ResponseCreateEvent {
  type: "response.create";
  response?: {
    modalities?: Array<"text" | "audio">;
    instructions?: string;
    voice?: string;
    output_audio_format?: string;
    tools?: Array<RealtimeFunctionTool>;
    tool_choice?: string;
    temperature?: number;
    max_output_tokens?: number | "inf";
    conversation?: "auto" | "none";
    metadata?: Record<string, string>;
    input?: Array<ConversationItemWithReference>;
  };
  event_id?: string;
}

/**
 * Update session configuration
 */
interface SessionUpdateEvent {
  type: "session.update";
  session: Partial<RealtimeSession>;
  event_id?: string;
}
```
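
**Usage:** a sketch of two of the less common client events, `conversation.item.truncate` and `conversation.item.retrieve`; `lastAssistantItemId` and `playedMs` are placeholders an application would track itself:

```typescript
// Drop the unplayed tail of the assistant's audio after an interruption so the
// stored conversation matches what the user actually heard
ws.send({
  type: "conversation.item.truncate",
  item_id: lastAssistantItemId,
  content_index: 0,
  audio_end_ms: playedMs,
});

// Fetch a full item back, including its audio payload
ws.send({
  type: "conversation.item.retrieve",
  item_id: lastAssistantItemId,
});
```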

[Client Events](./realtime.md#client-to-server-events)

### Server-to-Client Events

Events sent from server to client during the conversation.

```typescript { .api }
/**
 * Union of all server events (50+ event types)
 */
type RealtimeServerEvent =
  | ConversationCreatedEvent
  | ConversationItemCreatedEvent
  | ConversationItemDeletedEvent
  | ConversationItemAdded
  | ConversationItemDone
  | ConversationItemRetrieved
  | ConversationItemTruncatedEvent
  | ConversationItemInputAudioTranscriptionCompletedEvent
  | ConversationItemInputAudioTranscriptionDeltaEvent
  | ConversationItemInputAudioTranscriptionFailedEvent
  | ConversationItemInputAudioTranscriptionSegment
  | InputAudioBufferClearedEvent
  | InputAudioBufferCommittedEvent
  | InputAudioBufferSpeechStartedEvent
  | InputAudioBufferSpeechStoppedEvent
  | InputAudioBufferTimeoutTriggered
  | OutputAudioBufferStarted
  | OutputAudioBufferStopped
  | OutputAudioBufferCleared
  | ResponseCreatedEvent
  | ResponseDoneEvent
  | ResponseOutputItemAddedEvent
  | ResponseOutputItemDoneEvent
  | ResponseContentPartAddedEvent
  | ResponseContentPartDoneEvent
  | ResponseAudioDeltaEvent
  | ResponseAudioDoneEvent
  | ResponseAudioTranscriptDeltaEvent
  | ResponseAudioTranscriptDoneEvent
  | ResponseTextDeltaEvent
  | ResponseTextDoneEvent
  | ResponseFunctionCallArgumentsDeltaEvent
  | ResponseFunctionCallArgumentsDoneEvent
  | ResponseMcpCallArgumentsDelta
  | ResponseMcpCallArgumentsDone
  | ResponseMcpCallInProgress
  | ResponseMcpCallCompleted
  | ResponseMcpCallFailed
  | McpListToolsInProgress
  | McpListToolsCompleted
  | McpListToolsFailed
  | SessionCreatedEvent
  | SessionUpdatedEvent
  | RateLimitsUpdatedEvent
  | RealtimeErrorEvent;

/**
 * Session created (first event after connection)
 */
interface SessionCreatedEvent {
  type: "session.created";
  event_id: string;
  session: RealtimeSession;
}

/**
 * Session updated after client session.update
 */
interface SessionUpdatedEvent {
  type: "session.updated";
  event_id: string;
  session: RealtimeSession;
}

/**
 * Conversation created
 */
interface ConversationCreatedEvent {
  type: "conversation.created";
  event_id: string;
  conversation: {
    id?: string;
    object?: "realtime.conversation";
  };
}

/**
 * Item created in conversation
 */
interface ConversationItemCreatedEvent {
  type: "conversation.item.created";
  event_id: string;
  item: ConversationItem;
  previous_item_id?: string | null;
}

/**
 * Item added to conversation (may have partial content)
 */
interface ConversationItemAdded {
  type: "conversation.item.added";
  event_id: string;
  item: ConversationItem;
  previous_item_id?: string | null;
}

/**
 * Item finalized with complete content
 */
interface ConversationItemDone {
  type: "conversation.item.done";
  event_id: string;
  item: ConversationItem;
  previous_item_id?: string | null;
}

/**
 * Input audio buffer committed
 */
interface InputAudioBufferCommittedEvent {
  type: "input_audio_buffer.committed";
  event_id: string;
  item_id: string;
  previous_item_id?: string | null;
}

/**
 * Speech detected in input buffer (VAD start)
 */
interface InputAudioBufferSpeechStartedEvent {
  type: "input_audio_buffer.speech_started";
  event_id: string;
  item_id: string;
  /** Milliseconds from session start */
  audio_start_ms: number;
}

/**
 * Speech ended in input buffer (VAD stop)
 */
interface InputAudioBufferSpeechStoppedEvent {
  type: "input_audio_buffer.speech_stopped";
  event_id: string;
  item_id: string;
  /** Milliseconds from session start */
  audio_end_ms: number;
}

/**
 * Response started
 */
interface ResponseCreatedEvent {
  type: "response.created";
  event_id: string;
  response: RealtimeResponse;
}

/**
 * Response completed
 */
interface ResponseDoneEvent {
  type: "response.done";
  event_id: string;
  response: RealtimeResponse;
}

/**
 * Audio delta (streaming audio chunk)
 */
interface ResponseAudioDeltaEvent {
  type: "response.audio.delta";
  event_id: string;
  response_id: string;
  item_id: string;
  output_index: number;
  content_index: number;
  /** Base64-encoded audio bytes */
  delta: string;
}

/**
 * Audio generation completed
 */
interface ResponseAudioDoneEvent {
  type: "response.audio.done";
  event_id: string;
  response_id: string;
  item_id: string;
  output_index: number;
  content_index: number;
}

/**
 * Text delta (streaming text chunk)
 */
interface ResponseTextDeltaEvent {
  type: "response.text.delta";
  event_id: string;
  response_id: string;
  item_id: string;
  output_index: number;
  content_index: number;
  /** Text chunk */
  delta: string;
}

/**
 * Text generation completed
 */
interface ResponseTextDoneEvent {
  type: "response.text.done";
  event_id: string;
  response_id: string;
  item_id: string;
  output_index: number;
  content_index: number;
  /** Complete text */
  text: string;
}

/**
 * Function call arguments delta
 */
interface ResponseFunctionCallArgumentsDeltaEvent {
  type: "response.function_call_arguments.delta";
  event_id: string;
  response_id: string;
  item_id: string;
  output_index: number;
  call_id: string;
  /** JSON arguments chunk */
  delta: string;
}

/**
 * Function call arguments completed
 */
interface ResponseFunctionCallArgumentsDoneEvent {
  type: "response.function_call_arguments.done";
  event_id: string;
  response_id: string;
  item_id: string;
  output_index: number;
  call_id: string;
  /** Complete JSON arguments */
  arguments: string;
}

/**
 * Error occurred
 */
interface RealtimeErrorEvent {
  type: "error";
  event_id: string;
  error: {
    type: string;
    code?: string | null;
    message: string;
    param?: string | null;
    event_id?: string | null;
  };
}
```
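
**Usage:** a sketch of simple barge-in handling built from these events; `stopLocalPlayback()` stands in for whatever audio pipeline the application uses:

```typescript
ws.on("input_audio_buffer.speech_started", (event) => {
  // The user started talking over the assistant: stop playback and cancel
  // whatever response is still streaming
  stopLocalPlayback();
  ws.send({ type: "response.cancel" });
});

ws.on("response.done", (event) => {
  console.log("Response finished with status:", event.response.status);
});
```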

[Server Events](./realtime.md#server-to-client-events)

### Conversation Items

Items that make up the conversation history.

```typescript { .api }
/**
 * Union of all conversation item types
 */
type ConversationItem =
  | RealtimeConversationItemSystemMessage
  | RealtimeConversationItemUserMessage
  | RealtimeConversationItemAssistantMessage
  | RealtimeConversationItemFunctionCall
  | RealtimeConversationItemFunctionCallOutput
  | RealtimeMcpApprovalResponse
  | RealtimeMcpListTools
  | RealtimeMcpToolCall
  | RealtimeMcpApprovalRequest;

/**
 * System message item
 */
interface RealtimeConversationItemSystemMessage {
  type: "message";
  role: "system";
  content: Array<{
    type?: "input_text";
    text?: string;
  }>;
  id?: string;
  object?: "realtime.item";
  status?: "completed" | "incomplete" | "in_progress";
}

/**
 * User message item (text, audio, or image)
 */
interface RealtimeConversationItemUserMessage {
  type: "message";
  role: "user";
  content: Array<{
    type?: "input_text" | "input_audio" | "input_image";
    text?: string;
    audio?: string; // Base64-encoded
    transcript?: string;
    image_url?: string; // Data URI
    detail?: "auto" | "low" | "high";
  }>;
  id?: string;
  object?: "realtime.item";
  status?: "completed" | "incomplete" | "in_progress";
}

/**
 * Assistant message item (text or audio)
 */
interface RealtimeConversationItemAssistantMessage {
  type: "message";
  role: "assistant";
  content: Array<{
    type?: "output_text" | "output_audio";
    text?: string;
    audio?: string; // Base64-encoded
    transcript?: string;
  }>;
  id?: string;
  object?: "realtime.item";
  status?: "completed" | "incomplete" | "in_progress";
}

/**
 * Function call item
 */
interface RealtimeConversationItemFunctionCall {
  type: "function_call";
  name: string;
  /** JSON-encoded arguments */
  arguments: string;
  call_id?: string;
  id?: string;
  object?: "realtime.item";
  status?: "completed" | "incomplete" | "in_progress";
}

/**
 * Function call output item
 */
interface RealtimeConversationItemFunctionCallOutput {
  type: "function_call_output";
  call_id: string;
  /** Function output (free text) */
  output: string;
  id?: string;
  object?: "realtime.item";
  status?: "completed" | "incomplete" | "in_progress";
}

/**
 * MCP tool call item
 */
interface RealtimeMcpToolCall {
  type: "mcp_call";
  id: string;
  server_label: string;
  name: string;
  arguments: string;
  output?: string | null;
  error?:
    | { type: "protocol_error"; code: number; message: string }
    | { type: "tool_execution_error"; message: string }
    | { type: "http_error"; code: number; message: string }
    | null;
  approval_request_id?: string | null;
}

/**
 * MCP approval request item
 */
interface RealtimeMcpApprovalRequest {
  type: "mcp_approval_request";
  id: string;
  server_label: string;
  name: string;
  arguments: string;
}

/**
 * MCP approval response item
 */
interface RealtimeMcpApprovalResponse {
  type: "mcp_approval_response";
  id: string;
  approval_request_id: string;
  approve: boolean;
  reason?: string | null;
}
```
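
**Usage:** a sketch that sends a user message mixing text and an image (the data URI is a placeholder) and then asks for a response:

```typescript
ws.send({
  type: "conversation.item.create",
  item: {
    type: "message",
    role: "user",
    content: [
      { type: "input_text", text: "What is shown on this receipt?" },
      { type: "input_image", image_url: "data:image/png;base64,...", detail: "low" },
    ],
  },
});

ws.send({ type: "response.create" });
```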

[Conversation Items](./realtime.md#conversation-items)

### Function Calling

Define and use tools during real-time conversations.

```typescript { .api }
/**
 * Function tool definition for realtime conversations
 */
interface RealtimeFunctionTool {
  type?: "function";
  /** Function name */
  name?: string;
  /** Description and usage guidance */
  description?: string;
  /** JSON Schema for function parameters */
  parameters?: unknown;
}

/**
 * MCP (Model Context Protocol) tool configuration
 */
interface McpTool {
  type: "mcp";
  /** Label identifying the MCP server */
  server_label: string;
  /** MCP server URL or connector ID */
  server_url?: string;
  connector_id?:
    | "connector_dropbox"
    | "connector_gmail"
    | "connector_googlecalendar"
    | "connector_googledrive"
    | "connector_microsoftteams"
    | "connector_outlookcalendar"
    | "connector_outlookemail"
    | "connector_sharepoint";
  /** Server description */
  server_description?: string;
  /** Allowed tools filter */
  allowed_tools?:
    | Array<string>
    | {
        tool_names?: Array<string>;
        read_only?: boolean;
      }
    | null;
  /** Approval requirements */
  require_approval?:
    | "always"
    | "never"
    | {
        always?: { tool_names?: Array<string>; read_only?: boolean };
        never?: { tool_names?: Array<string>; read_only?: boolean };
      }
    | null;
  /** OAuth access token */
  authorization?: string;
  /** HTTP headers */
  headers?: Record<string, string> | null;
}

type RealtimeToolsConfig = Array<RealtimeFunctionTool | McpTool>;

type RealtimeToolChoiceConfig =
  | "auto"
  | "none"
  | "required"
  | { type: "function"; function: { name: string } }
  | { type: "mcp"; mcp: { server_label: string; name: string } };
```

**Usage:**

```typescript
// Define tools
const tools: RealtimeToolsConfig = [
  {
    type: "function",
    name: "get_weather",
    description: "Get current weather for a location",
    parameters: {
      type: "object",
      properties: {
        location: { type: "string" },
        unit: { type: "string", enum: ["celsius", "fahrenheit"] },
      },
      required: ["location"],
    },
  },
  {
    type: "mcp",
    server_label: "calendar",
    connector_id: "connector_googlecalendar",
    allowed_tools: {
      tool_names: ["list_events", "create_event"],
    },
  },
];

// Update session with tools
ws.send({
  type: "session.update",
  session: {
    tools,
    tool_choice: "auto",
  },
});

// Handle function call
ws.on("response.function_call_arguments.done", async (event) => {
  const result = await executeFunction(event.call_id, event.arguments);

  // Send function output
  ws.send({
    type: "conversation.item.create",
    item: {
      type: "function_call_output",
      call_id: event.call_id,
      output: JSON.stringify(result),
    },
  });

  // Trigger new response
  ws.send({
    type: "response.create",
  });
});
```

[Function Calling](./realtime.md#function-calling)

### Response Configuration

Configure individual response parameters.

```typescript { .api }
/**
 * Response resource
 */
interface RealtimeResponse {
  id?: string;
  object?: "realtime.response";
  /** Conversation ID or null */
  conversation_id?: string;
  /** Status: 'in_progress', 'completed', 'cancelled', 'failed', 'incomplete' */
  status?: RealtimeResponseStatus;
  /** Usage statistics */
  usage?: RealtimeResponseUsage;
  /** Max output tokens */
  max_output_tokens?: number | "inf";
  /** Response modalities */
  modalities?: Array<"text" | "audio">;
  /** Instructions for this response */
  instructions?: string;
  /** Voice selection */
  voice?: string;
  /** Audio output configuration */
  audio?: {
    format?: RealtimeAudioFormats;
    speed?: number;
    voice?: string;
  };
  /** Response metadata */
  metadata?: Record<string, string> | null;
  /** Tool choice */
  tool_choice?: RealtimeToolChoiceConfig;
  /** Tools for this response */
  tools?: RealtimeToolsConfig;
  /** Temperature */
  temperature?: number;
  /** Output items */
  output?: Array<ConversationItem>;
  /** Status details */
  status_details?: {
    type?: "incomplete" | "failed" | "cancelled";
    reason?: string;
    error?: RealtimeError | null;
  } | null;
}

interface RealtimeResponseStatus {
  type: "in_progress" | "completed" | "cancelled" | "failed" | "incomplete";
  /** Additional status information */
  reason?: string;
}

interface RealtimeResponseUsage {
  /** Total tokens (input + output) */
  total_tokens?: number;
  /** Input tokens */
  input_tokens?: number;
  /** Output tokens */
  output_tokens?: number;
  /** Input token breakdown */
  input_token_details?: {
    text_tokens?: number;
    audio_tokens?: number;
    image_tokens?: number;
    cached_tokens?: number;
    cached_tokens_details?: {
      text_tokens?: number;
      audio_tokens?: number;
      image_tokens?: number;
    };
  };
  /** Output token breakdown */
  output_token_details?: {
    text_tokens?: number;
    audio_tokens?: number;
  };
}
```
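
**Usage:** a sketch that reads the usage block once a response completes:

```typescript
ws.on("response.done", (event) => {
  const usage = event.response.usage;
  if (usage) {
    console.log("input tokens:", usage.input_tokens, "output tokens:", usage.output_tokens);
    console.log("audio output tokens:", usage.output_token_details?.audio_tokens);
  }
});
```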

1675

1676

[Response Configuration](./realtime.md#response-configuration)

1677

1678

### Transcription

1679

1680

Configure and receive audio transcription during conversations.

1681

1682

```typescript { .api }

1683

/**

1684

* Transcription configuration

1685

*/

1686

interface AudioTranscription {

1687

/** Language code (ISO-639-1) */

1688

language?: string;

1689

/** Transcription model */

1690

model?:

1691

| "whisper-1"

1692

| "gpt-4o-mini-transcribe"

1693

| "gpt-4o-transcribe"

1694

| "gpt-4o-transcribe-diarize";

1695

/** Guidance prompt */

1696

prompt?: string;

1697

}

1698

1699

/**

1700

* Transcription completed event

1701

*/

1702

interface ConversationItemInputAudioTranscriptionCompletedEvent {

1703

type: "conversation.item.input_audio_transcription.completed";

1704

event_id: string;

1705

item_id: string;

1706

content_index: number;

1707

/** Transcribed text */

1708

transcript: string;

1709

/** Usage statistics */

1710

usage:

1711

| {

1712

type: "tokens";

1713

input_tokens: number;

1714

output_tokens: number;

1715

total_tokens: number;

1716

input_token_details?: {

1717

text_tokens?: number;

1718

audio_tokens?: number;

1719

};

1720

}

1721

| {

1722

type: "duration";

1723

/** Duration in seconds */

1724

seconds: number;

1725

};

1726

/** Log probabilities (if enabled) */

1727

logprobs?: Array<{

1728

token: string;

1729

logprob: number;

1730

bytes: Array<number>;

1731

}> | null;

1732

}

1733

1734

/**

1735

* Transcription delta event (streaming)

1736

*/

1737

interface ConversationItemInputAudioTranscriptionDeltaEvent {

1738

type: "conversation.item.input_audio_transcription.delta";

1739

event_id: string;

1740

item_id: string;

1741

content_index?: number;

1742

/** Transcript chunk */

1743

delta?: string;

1744

/** Log probabilities (if enabled) */

1745

logprobs?: Array<{

1746

token: string;

1747

logprob: number;

1748

bytes: Array<number>;

1749

}> | null;

1750

}

1751

1752

/**

1753

* Transcription segment (for diarization)

1754

*/

1755

interface ConversationItemInputAudioTranscriptionSegment {

1756

type: "conversation.item.input_audio_transcription.segment";

1757

event_id: string;

1758

item_id: string;

1759

content_index: number;

1760

id: string;

1761

/** Segment text */

1762

text: string;

1763

/** Speaker label */

1764

speaker: string;

1765

/** Start time in seconds */

1766

start: number;

1767

/** End time in seconds */

1768

end: number;

1769

}

1770

1771

/**

1772

* Transcription failed event

1773

*/

1774

interface ConversationItemInputAudioTranscriptionFailedEvent {

1775

type: "conversation.item.input_audio_transcription.failed";

1776

event_id: string;

1777

item_id: string;

1778

content_index: number;

1779

error: {

1780

type?: string;

1781

code?: string;

1782

message?: string;

1783

param?: string;

1784

};

1785

}

1786

```

1787

1788

**Usage:**

1789

1790

```typescript

1791

// Enable transcription with log probabilities

1792

ws.send({

1793

type: "session.update",

1794

session: {

1795

input_audio_transcription: {

1796

model: "gpt-4o-transcribe",

1797

language: "en",

1798

},

1799

include: ["item.input_audio_transcription.logprobs"],

1800

},

1801

});

1802

1803

// Listen for transcription

1804

ws.on("conversation.item.input_audio_transcription.delta", (event) => {

1805

console.log("Transcript delta:", event.delta);

1806

});

1807

1808

ws.on(

1809

"conversation.item.input_audio_transcription.completed",

1810

(event) => {

1811

console.log("Full transcript:", event.transcript);

1812

console.log("Usage:", event.usage);

1813

}

1814

);

1815

1816

// Diarization support

1817

ws.send({

1818

type: "session.update",

1819

session: {

1820

input_audio_transcription: {

1821

model: "gpt-4o-transcribe-diarize",

1822

},

1823

},

1824

});

1825

1826

ws.on(

1827

"conversation.item.input_audio_transcription.segment",

1828

(event) => {

1829

console.log(

1830

`[${event.speaker}] ${event.text} (${event.start}s - ${event.end}s)`

1831

);

1832

}

1833

);

1834

```

1835

1836

[Transcription](./realtime.md#transcription)

1837

1838

### Error Handling

1839

1840

Handle errors and edge cases in real-time conversations.

1841

1842

```typescript { .api }

1843

/**

1844

* Error event from server

1845

*/

1846

interface RealtimeErrorEvent {

1847

type: "error";

1848

event_id: string;

1849

error: RealtimeError;

1850

}

1851

1852

interface RealtimeError {

1853

/** Error type */

1854

type: string;

1855

/** Error code (optional) */

1856

code?: string | null;

1857

/** Human-readable message */

1858

message: string;

1859

/** Related parameter (optional) */

1860

param?: string | null;

1861

/** Client event ID that caused error (optional) */

1862

event_id?: string | null;

1863

}

1864

1865

/**

1866

* OpenAI Realtime error class

1867

*/

1868

class OpenAIRealtimeError extends Error {

1869

constructor(message: string);

1870

}

1871

```

1872

1873

**Common Error Types:**

1874

1875

```typescript

1876

// Invalid request errors

1877

{

1878

type: "invalid_request_error",

1879

code: "invalid_value",

1880

message: "Invalid value for 'audio_format'",

1881

param: "audio_format"

1882

}

1883

1884

// Server errors

1885

{

1886

type: "server_error",

1887

message: "Internal server error"

1888

}

1889

1890

// Rate limit errors

1891

{

1892

type: "rate_limit_error",

1893

message: "Rate limit exceeded"

1894

}

1895

```

1896

1897

**Usage:**

1898

1899

```typescript

1900

ws.on("error", (event: RealtimeErrorEvent) => {

1901

console.error("Realtime error:", event.error);

1902

1903

if (event.error.type === "rate_limit_error") {

1904

// Handle rate limiting

1905

} else if (event.error.type === "invalid_request_error") {

1906

// Handle validation errors

1907

console.error("Invalid:", event.error.param, event.error.message);

1908

}

1909

});

1910

1911

// WebSocket errors

1912

ws.socket.addEventListener("error", (error) => {

1913

console.error("WebSocket error:", error);

1914

});

1915

```

1916

1917

[Error Handling](./realtime.md#error-handling)

1918

1919

### Rate Limits

1920

1921

Monitor rate limits during conversations.

1922

1923

```typescript { .api }

1924

/**

1925

* Rate limits updated event

1926

*/

1927

interface RateLimitsUpdatedEvent {

1928

type: "rate_limits.updated";

1929

event_id: string;

1930

rate_limits: Array<{

1931

/** Rate limit name: 'requests' or 'tokens' */

1932

name?: "requests" | "tokens";

1933

/** Maximum allowed value */

1934

limit?: number;

1935

/** Remaining before limit reached */

1936

remaining?: number;

1937

/** Seconds until reset */

1938

reset_seconds?: number;

1939

}>;

1940

}

1941

```

1942

1943

**Usage:**

1944

1945

```typescript

1946

ws.on("rate_limits.updated", (event: RateLimitsUpdatedEvent) => {

1947

event.rate_limits.forEach((limit) => {

1948

console.log(`${limit.name}: ${limit.remaining}/${limit.limit}`);

1949

console.log(`Resets in ${limit.reset_seconds}s`);

1950

});

1951

});

1952

```

[Rate Limits](./realtime.md#rate-limits)

### Tracing

Configure distributed tracing for debugging and monitoring.

```typescript { .api }
/**
 * Tracing configuration
 */
type RealtimeTracingConfig =
  | "auto"
  | {
      /** Workflow name in Traces Dashboard */
      workflow_name?: string;
      /** Group ID for filtering */
      group_id?: string;
      /** Arbitrary metadata */
      metadata?: unknown;
    }
  | null;
```

**Usage:**

```typescript
// Auto tracing with defaults
{
  tracing: "auto";
}

// Custom tracing configuration
{
  tracing: {
    workflow_name: "customer-support-bot",
    group_id: "prod-us-west",
    metadata: {
      customer_id: "cust_123",
      agent_version: "2.1.0"
    }
  }
}

// Disable tracing
{
  tracing: null;
}
```
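
These fragments show the shape of the `tracing` field; in practice it is set on the session. A minimal sketch, assuming `tracing` is sent like any other `RealtimeSession` field (see the Type Reference below) in a `session.update` client event:

```typescript
// Apply a tracing configuration to an active session.
// Only the fields being changed need to be included.
ws.send({
  type: "session.update",
  session: {
    tracing: {
      workflow_name: "customer-support-bot",
      group_id: "prod-us-west",
    },
  },
});
```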

[Tracing](./realtime.md#tracing)

## Complete Example: Voice Assistant

```typescript
import OpenAI from "openai";
import { OpenAIRealtimeWebSocket } from "openai/realtime/websocket";

const client = new OpenAI();

// Create session token
const secret = await client.realtime.clientSecrets.create({
  session: {
    type: "realtime",
    model: "gpt-realtime",
    audio: {
      input: {
        format: { type: "audio/pcm", rate: 24000 },
        turn_detection: {
          type: "server_vad",
          threshold: 0.5,
          silence_duration_ms: 500,
          interrupt_response: true,
        },
        transcription: {
          model: "gpt-4o-transcribe",
        },
      },
      output: {
        format: { type: "audio/pcm", rate: 24000 },
        voice: "marin",
      },
    },
    instructions:
      "You are a helpful voice assistant. Speak naturally and concisely.",
    tools: [
      {
        type: "function",
        name: "get_weather",
        description: "Get weather for a location",
        parameters: {
          type: "object",
          properties: {
            location: { type: "string" },
          },
          required: ["location"],
        },
      },
    ],
  },
});

// Connect WebSocket
const ws = await OpenAIRealtimeWebSocket.create(client, {
  model: "gpt-realtime",
});

// Handle session
ws.on("session.created", (event) => {
  console.log("Session created:", event.session.id);
});

// Handle conversation
ws.on("conversation.item.created", (event) => {
  console.log("Item created:", event.item.type);
});

// Handle audio output
ws.on("response.audio.delta", (event) => {
  const audioData = Buffer.from(event.delta, "base64");
  playAudio(audioData); // Play to speaker
});

// Handle transcripts
ws.on("conversation.item.input_audio_transcription.completed", (event) => {
  console.log("User said:", event.transcript);
});

ws.on("response.audio_transcript.delta", (event) => {
  process.stdout.write(event.delta);
});

// Handle VAD
ws.on("input_audio_buffer.speech_started", () => {
  console.log("User started speaking");
  stopAudioPlayback(); // Interrupt assistant
});

ws.on("input_audio_buffer.speech_stopped", () => {
  console.log("User stopped speaking");
});

// Handle function calls
ws.on("response.function_call_arguments.done", async (event) => {
  console.log("Function call:", event.call_id);

  const args = JSON.parse(event.arguments);
  const result = await getWeather(args.location);

  // Send result
  ws.send({
    type: "conversation.item.create",
    item: {
      type: "function_call_output",
      call_id: event.call_id,
      output: JSON.stringify(result),
    },
  });

  // Continue conversation
  ws.send({
    type: "response.create",
  });
});

// Handle errors
ws.on("error", (event) => {
  console.error("Error:", event.error.message);
});

// Capture and send microphone audio
const audioStream = captureMicrophone();
audioStream.on("data", (chunk) => {
  const base64 = chunk.toString("base64");
  ws.send({
    type: "input_audio_buffer.append",
    audio: base64,
  });
});

// Cleanup
process.on("SIGINT", () => {
  ws.close();
  process.exit(0);
});
```

## Complete Example: Phone Call Handler

```typescript
import OpenAI from "openai";
import express from "express";

const client = new OpenAI();
const app = express();

app.use(express.json());

// Webhook for incoming calls
app.post("/realtime/webhook/incoming_call", async (req, res) => {
  const event = req.body;

  if (event.type === "realtime.call.incoming") {
    const callId = event.data.id;

    // Accept the call
    await client.realtime.calls.accept(callId, {
      type: "realtime",
      model: "gpt-realtime",
      instructions:
        "You are a customer service agent. Be professional and helpful.",
      audio: {
        input: {
          format: { type: "audio/pcmu" }, // G.711 for telephony
          turn_detection: {
            type: "server_vad",
            silence_duration_ms: 700,
          },
        },
        output: {
          format: { type: "audio/pcmu" },
          voice: "marin",
        },
      },
      tools: [
        {
          type: "function",
          name: "transfer_to_agent",
          description: "Transfer to human agent",
          parameters: {
            type: "object",
            properties: {
              reason: { type: "string" },
            },
          },
        },
      ],
    });

    console.log(`Accepted call: ${callId}`);
  }

  res.sendStatus(200);
});

// Webhook for call events
app.post("/realtime/webhook/call_events", async (req, res) => {
  const event = req.body;

  if (event.type === "realtime.response.function_call_output.done") {
    const { call_id, function_name, arguments: args } = event.data;

    if (function_name === "transfer_to_agent") {
      // Transfer call
      await client.realtime.calls.refer(call_id, {
        target_uri: "sip:support@example.com",
      });
    }
  }

  res.sendStatus(200);
});

app.listen(3000, () => {
  console.log("Webhook server running on port 3000");
});
```

## Type Reference

### Core Types

```typescript { .api }
type RealtimeClientEvent =
  | ConversationItemCreateEvent
  | ConversationItemDeleteEvent
  | ConversationItemRetrieveEvent
  | ConversationItemTruncateEvent
  | InputAudioBufferAppendEvent
  | InputAudioBufferClearEvent
  | OutputAudioBufferClearEvent
  | InputAudioBufferCommitEvent
  | ResponseCancelEvent
  | ResponseCreateEvent
  | SessionUpdateEvent;

type RealtimeServerEvent =
  | ConversationCreatedEvent
  | ConversationItemCreatedEvent
  | ConversationItemDeletedEvent
  | ConversationItemAdded
  | ConversationItemDone
  | ConversationItemRetrieved
  | ConversationItemTruncatedEvent
  | ConversationItemInputAudioTranscriptionCompletedEvent
  | ConversationItemInputAudioTranscriptionDeltaEvent
  | ConversationItemInputAudioTranscriptionFailedEvent
  | ConversationItemInputAudioTranscriptionSegment
  | InputAudioBufferClearedEvent
  | InputAudioBufferCommittedEvent
  | InputAudioBufferSpeechStartedEvent
  | InputAudioBufferSpeechStoppedEvent
  | InputAudioBufferTimeoutTriggered
  | OutputAudioBufferStarted
  | OutputAudioBufferStopped
  | OutputAudioBufferCleared
  | ResponseCreatedEvent
  | ResponseDoneEvent
  | ResponseOutputItemAddedEvent
  | ResponseOutputItemDoneEvent
  | ResponseContentPartAddedEvent
  | ResponseContentPartDoneEvent
  | ResponseAudioDeltaEvent
  | ResponseAudioDoneEvent
  | ResponseAudioTranscriptDeltaEvent
  | ResponseAudioTranscriptDoneEvent
  | ResponseTextDeltaEvent
  | ResponseTextDoneEvent
  | ResponseFunctionCallArgumentsDeltaEvent
  | ResponseFunctionCallArgumentsDoneEvent
  | ResponseMcpCallArgumentsDelta
  | ResponseMcpCallArgumentsDone
  | ResponseMcpCallInProgress
  | ResponseMcpCallCompleted
  | ResponseMcpCallFailed
  | McpListToolsInProgress
  | McpListToolsCompleted
  | McpListToolsFailed
  | SessionCreatedEvent
  | SessionUpdatedEvent
  | RateLimitsUpdatedEvent
  | RealtimeErrorEvent;

type ConversationItem =
  | RealtimeConversationItemSystemMessage
  | RealtimeConversationItemUserMessage
  | RealtimeConversationItemAssistantMessage
  | RealtimeConversationItemFunctionCall
  | RealtimeConversationItemFunctionCallOutput
  | RealtimeMcpApprovalResponse
  | RealtimeMcpListTools
  | RealtimeMcpToolCall
  | RealtimeMcpApprovalRequest;

interface RealtimeSession {
  id?: string;
  object?: "realtime.session";
  model?: string;
  expires_at?: number;
  modalities?: Array<"text" | "audio">;
  instructions?: string;
  voice?: string;
  input_audio_format?: "pcm16" | "g711_ulaw" | "g711_alaw";
  output_audio_format?: "pcm16" | "g711_ulaw" | "g711_alaw";
  input_audio_transcription?: AudioTranscription | null;
  turn_detection?: RealtimeAudioInputTurnDetection | null;
  tools?: Array<RealtimeFunctionTool>;
  tool_choice?: string;
  temperature?: number;
  max_response_output_tokens?: number | "inf";
  speed?: number;
  input_audio_noise_reduction?: {
    type?: NoiseReductionType;
  };
  include?: Array<"item.input_audio_transcription.logprobs"> | null;
  prompt?: ResponsePrompt | null;
  tracing?: RealtimeTracingConfig | null;
  truncation?: RealtimeTruncation;
}

interface RealtimeResponse {
  id?: string;
  object?: "realtime.response";
  status?: RealtimeResponseStatus;
  conversation_id?: string;
  output?: Array<ConversationItem>;
  usage?: RealtimeResponseUsage;
  status_details?: {
    type?: "incomplete" | "failed" | "cancelled";
    reason?: string;
    error?: RealtimeError | null;
  } | null;
  max_output_tokens?: number | "inf";
  modalities?: Array<"text" | "audio">;
  instructions?: string;
  voice?: string;
  audio?: {
    format?: RealtimeAudioFormats;
    speed?: number;
    voice?: string;
  };
  metadata?: Record<string, string> | null;
  tool_choice?: RealtimeToolChoiceConfig;
  tools?: RealtimeToolsConfig;
  temperature?: number;
}

interface AudioTranscription {
  language?: string;
  model?:
    | "whisper-1"
    | "gpt-4o-mini-transcribe"
    | "gpt-4o-transcribe"
    | "gpt-4o-transcribe-diarize";
  prompt?: string;
}

type RealtimeAudioFormats =
  | { type?: "audio/pcm"; rate?: 24000 }
  | { type?: "audio/pcmu" }
  | { type?: "audio/pcma" };

type NoiseReductionType = "near_field" | "far_field";

type RealtimeAudioInputTurnDetection =
  | {
      type: "server_vad";
      threshold?: number;
      prefix_padding_ms?: number;
      silence_duration_ms?: number;
      create_response?: boolean;
      interrupt_response?: boolean;
      idle_timeout_ms?: number | null;
    }
  | {
      type: "semantic_vad";
      eagerness?: "low" | "medium" | "high" | "auto";
      create_response?: boolean;
      interrupt_response?: boolean;
    };

type RealtimeTruncation =
  | "auto"
  | "disabled"
  | { type: "retention_ratio"; retention_ratio: number };

type RealtimeToolsConfig = Array<RealtimeFunctionTool | McpTool>;

type RealtimeToolChoiceConfig =
  | "auto"
  | "none"
  | "required"
  | { type: "function"; function: { name: string } }
  | { type: "mcp"; mcp: { server_label: string; name: string } };

type RealtimeTracingConfig =
  | "auto"
  | {
      workflow_name?: string;
      group_id?: string;
      metadata?: unknown;
    }
  | null;

interface RealtimeError {
  type: string;
  code?: string | null;
  message: string;
  param?: string | null;
  event_id?: string | null;
}

interface RealtimeResponseUsage {
  total_tokens?: number;
  input_tokens?: number;
  output_tokens?: number;
  input_token_details?: {
    text_tokens?: number;
    audio_tokens?: number;
    image_tokens?: number;
    cached_tokens?: number;
    cached_tokens_details?: {
      text_tokens?: number;
      audio_tokens?: number;
      image_tokens?: number;
    };
  };
  output_token_details?: {
    text_tokens?: number;
    audio_tokens?: number;
  };
}

interface RealtimeResponseStatus {
  type: "in_progress" | "completed" | "cancelled" | "failed" | "incomplete";
  reason?: string;
}
```

## Models

Available Realtime API models:

- `gpt-realtime` (latest)
- `gpt-realtime-2025-08-28`
- `gpt-4o-realtime-preview`
- `gpt-4o-realtime-preview-2024-10-01`
- `gpt-4o-realtime-preview-2024-12-17`
- `gpt-4o-realtime-preview-2025-06-03`
- `gpt-4o-mini-realtime-preview`
- `gpt-4o-mini-realtime-preview-2024-12-17`
- `gpt-realtime-mini`
- `gpt-realtime-mini-2025-10-06`
- `gpt-audio-mini`
- `gpt-audio-mini-2025-10-06`

## Best Practices

### Security

- **Never expose API keys in browser**: Always use ephemeral session tokens (see the sketch after this list)
- **Token expiration**: Default 10 minutes, max 2 hours
- **Server-side validation**: Validate all tool calls server-side
- **Rate limiting**: Monitor rate limit events and handle gracefully
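
A minimal sketch of the first two bullets: an Express endpoint that mints a short-lived client secret with `client.realtime.clientSecrets.create()` and hands only that value to the browser, keeping the account API key on the server. The `value` and `expires_at` field names are assumptions about the response shape; check the return type in your SDK version.

```typescript
import OpenAI from "openai";
import express from "express";

const app = express();
const client = new OpenAI(); // real API key stays server-side

// Mint an ephemeral client secret per browser session.
// The browser authenticates the Realtime connection with this value,
// never with the account API key.
app.post("/realtime/session", async (_req, res) => {
  const secret = await client.realtime.clientSecrets.create({
    session: { type: "realtime", model: "gpt-realtime" },
  });
  res.json({ clientSecret: secret.value, expiresAt: secret.expires_at });
});

app.listen(3000);
```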

### Performance

- **Audio chunking**: Send audio in chunks (1-5 seconds recommended)
- **VAD tuning**: Adjust threshold and silence duration for your environment (see the sketch after this list)
- **Voice selection**: Use `marin` or `cedar` for best quality
- **Caching**: Enable context caching for repeated conversations
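
As an illustration of the VAD tuning bullet, this sketch raises the activation threshold and lengthens the silence window for a noisier room via `session.update`. Field placement follows the flat `RealtimeSession` shape in the Type Reference; newer session shapes may nest `turn_detection` under `audio.input` as in the Voice Assistant example.

```typescript
// Make server VAD less trigger-happy in a noisy environment:
// require a stronger signal and a longer pause before ending the turn.
ws.send({
  type: "session.update",
  session: {
    turn_detection: {
      type: "server_vad",
      threshold: 0.7,           // higher = less sensitive to background noise
      prefix_padding_ms: 300,
      silence_duration_ms: 800, // wait longer before treating speech as done
    },
  },
});
```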

2474

2475

### Audio Quality

2476

2477

- **Noise reduction**: Enable for far-field or noisy environments

2478

- **Sample rate**: Always use 24kHz for PCM audio

2479

- **Format selection**: Use G.711 (pcmu/pcma) for telephony, PCM for quality

2480

- **Interrupt handling**: Clear audio buffers on interruption
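
A sketch of the noise-reduction and interrupt-handling bullets. `input_audio_noise_reduction` is a `RealtimeSession` field from the Type Reference; the `conversation.item.truncate` payload fields, and the `lastAssistantItemId` / `playbackPositionMs()` helpers, are assumptions standing in for state your playback layer would track.

```typescript
// Enable far-field noise reduction for a conference-room microphone.
ws.send({
  type: "session.update",
  session: {
    input_audio_noise_reduction: { type: "far_field" },
  },
});

// On interruption, stop local playback and truncate the assistant item
// so the server's view of the conversation matches what was actually heard.
// lastAssistantItemId and playbackPositionMs() are hypothetical helpers.
ws.on("input_audio_buffer.speech_started", () => {
  stopAudioPlayback();
  ws.send({
    type: "conversation.item.truncate",
    item_id: lastAssistantItemId,
    content_index: 0,
    audio_end_ms: playbackPositionMs(),
  });
});
```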

2481

2482

### Conversation Management

2483

2484

- **Context length**: Monitor token usage, configure truncation

2485

- **Function calling**: Keep tool outputs concise

2486

- **System messages**: Use for mid-conversation context updates

2487

- **Item ordering**: Use `previous_item_id` for precise insertion
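
A sketch combining the last two bullets: injecting a system message at a precise point in the conversation by setting `previous_item_id` on `conversation.item.create`. The `someItemId` placeholder stands in for an id captured from an earlier `conversation.item.created` event.

```typescript
// Insert updated context as a system message, anchored right after a
// known item instead of at the end of the conversation.
// someItemId is a placeholder captured from a previous event.
ws.send({
  type: "conversation.item.create",
  previous_item_id: someItemId,
  item: {
    type: "message",
    role: "system",
    content: [
      {
        type: "input_text",
        text: "The caller has now been verified; account details may be discussed.",
      },
    ],
  },
});
```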

2488

2489

### Error Handling

2490

2491

- **Graceful degradation**: Handle WebSocket disconnections

2492

- **Retry logic**: Implement exponential backoff for transient errors

2493

- **Error logging**: Log all error events for debugging

2494

- **User feedback**: Provide clear feedback on connection/processing status
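
A minimal reconnect sketch for the first two bullets. It assumes the underlying socket's `close` event can be observed through `ws.socket` (mirroring the WebSocket error handler shown earlier) and that a fresh connection is made with `OpenAIRealtimeWebSocket.create()`; the backoff schedule is illustrative.

```typescript
// Reconnect with exponential backoff after failures or disconnects.
async function connectWithRetry(maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const ws = await OpenAIRealtimeWebSocket.create(client, {
        model: "gpt-realtime",
      });
      // Assumed: the raw socket exposes a standard "close" event.
      ws.socket.addEventListener("close", () => {
        console.warn("Disconnected; attempting to reconnect...");
        void connectWithRetry(maxAttempts);
      });
      return ws;
    } catch (err) {
      const delayMs = 1000 * 2 ** attempt; // 1s, 2s, 4s, ...
      console.error(`Connect failed (attempt ${attempt + 1}):`, err);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error("Unable to reconnect to the Realtime API");
}
```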

2495

2496

## Common Patterns

2497

2498

### Voice-to-Voice Assistant

2499

2500

```typescript

2501

const ws = await OpenAIRealtimeWebSocket.create(client, {

2502

model: "gpt-realtime",

2503

});

2504

2505

// Microphone → Input Buffer

2506

micStream.on("data", (chunk) => {

2507

ws.send({

2508

type: "input_audio_buffer.append",

2509

audio: chunk.toString("base64"),

2510

});

2511

});

2512

2513

// Output Audio → Speaker

2514

ws.on("response.audio.delta", (event) => {

2515

playAudio(Buffer.from(event.delta, "base64"));

2516

});

2517

2518

// VAD-based interruption

2519

ws.on("input_audio_buffer.speech_started", () => {

2520

stopPlayback();

2521

});

2522

```

2523

2524

### Text-to-Voice Assistant

2525

2526

```typescript

2527

// Send text message

2528

ws.send({

2529

type: "conversation.item.create",

2530

item: {

2531

type: "message",

2532

role: "user",

2533

content: [{ type: "input_text", text: "Hello!" }],

2534

},

2535

});

2536

2537

// Request audio response

2538

ws.send({

2539

type: "response.create",

2540

response: {

2541

modalities: ["audio"],

2542

},

2543

});

2544

```

2545

2546

### Streaming Transcripts

2547

2548

```typescript

2549

ws.on("response.audio_transcript.delta", (event) => {

2550

updateSubtitles(event.delta);

2551

});

2552

2553

ws.on("conversation.item.input_audio_transcription.delta", (event) => {

2554

updateUserTranscript(event.delta);

2555

});

2556

```

2557

2558

### Multi-Tool Assistant

2559

2560

```typescript

2561

const tools = [

2562

{

2563

type: "function",

2564

name: "search_database",

2565

description: "Search customer database",

2566

parameters: {

2567

/* ... */

2568

},

2569

},

2570

{

2571

type: "mcp",

2572

server_label: "calendar",

2573

connector_id: "connector_googlecalendar",

2574

},

2575

];

2576

2577

ws.send({

2578

type: "session.update",

2579

session: { tools, tool_choice: "auto" },

2580

});

2581

```

2582