# Chat Completions

Create conversational responses using OpenAI's language models with support for text, vision inputs, audio, function calling, structured outputs, and streaming. Chat completions are the primary interface for interacting with models like GPT-4 and GPT-3.5.

## Access Patterns

Chat completions are accessible via:

- `client.chat.completions` - Primary access pattern
- `client.beta.chat.completions` - Beta namespace alias (identical to `client.chat.completions`)

## Capabilities

### Create Chat Completion

Generate a response for a chat conversation with extensive configuration options.

```python { .api }
def create(
    self,
    *,
    messages: Iterable[ChatCompletionMessageParam],
    model: str | ChatModel,
    audio: ChatCompletionAudioParam | None | Omit = omit,
    frequency_penalty: float | None | Omit = omit,
    function_call: dict | str | Omit = omit,
    functions: Iterable[dict] | Omit = omit,
    logit_bias: dict[str, int] | None | Omit = omit,
    logprobs: bool | None | Omit = omit,
    max_completion_tokens: int | None | Omit = omit,
    max_tokens: int | None | Omit = omit,
    metadata: dict[str, str] | None | Omit = omit,
    modalities: list[Literal["text", "audio"]] | None | Omit = omit,
    n: int | None | Omit = omit,
    parallel_tool_calls: bool | Omit = omit,
    prediction: dict | None | Omit = omit,
    presence_penalty: float | None | Omit = omit,
    prompt_cache_key: str | Omit = omit,
    prompt_cache_retention: Literal["in-memory", "24h"] | None | Omit = omit,
    reasoning_effort: Literal["none", "minimal", "low", "medium", "high"] | None | Omit = omit,
    response_format: completion_create_params.ResponseFormat | Omit = omit,
    safety_identifier: str | Omit = omit,
    seed: int | None | Omit = omit,
    service_tier: Literal["auto", "default", "flex", "scale", "priority"] | None | Omit = omit,
    stop: str | list[str] | None | Omit = omit,
    store: bool | None | Omit = omit,
    stream: bool | Omit = omit,
    stream_options: dict | None | Omit = omit,
    temperature: float | None | Omit = omit,
    tool_choice: ChatCompletionToolChoiceOptionParam | Omit = omit,
    tools: Iterable[ChatCompletionToolUnionParam] | Omit = omit,
    top_logprobs: int | None | Omit = omit,
    top_p: float | None | Omit = omit,
    user: str | Omit = omit,
    verbosity: Literal["low", "medium", "high"] | None | Omit = omit,
    web_search_options: dict | Omit = omit,
    extra_headers: dict[str, str] | None = None,
    extra_query: dict[str, object] | None = None,
    extra_body: dict[str, object] | None = None,
    timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,
) -> ChatCompletion | Stream[ChatCompletionChunk]:
    """
    Create a model response for the given chat conversation.

    Args:
        messages: List of messages comprising the conversation. Each message can be:
            - System message: {"role": "system", "content": "..."}
            - User message: {"role": "user", "content": "..."}
            - Assistant message: {"role": "assistant", "content": "..."}
            - Tool message: {"role": "tool", "content": "...", "tool_call_id": "..."}
            Supports text, images, and audio content.

        model: Model ID like "gpt-4", "gpt-4-turbo", "gpt-3.5-turbo", or "o1".
            See https://platform.openai.com/docs/models for available models.

        audio: Parameters for audio output when modalities includes "audio".
            {"voice": "alloy|echo|fable|onyx|nova|shimmer", "format": "wav|mp3|flac|opus|pcm16"}

        frequency_penalty: Number between -2.0 and 2.0. Penalizes tokens based on their
            frequency in the text so far, reducing repetition. Default 0.

        function_call: (Deprecated, use tool_choice) Controls function calling.
            - "none": No function calls
            - "auto": Model decides
            - {"name": "function_name"}: Force specific function

        functions: (Deprecated, use tools) List of function definitions.

        logit_bias: Modify token probabilities. Maps token IDs to bias values from
            -100 to 100. Values near ±1 slightly adjust probability; ±100 bans/forces tokens.

        logprobs: If true, returns log probabilities of output tokens.

        max_completion_tokens: Maximum tokens for the completion, including reasoning tokens.
            Preferred over max_tokens for o-series models.

        max_tokens: (Deprecated for o-series) Maximum tokens in the completion.
            Use max_completion_tokens instead.

        metadata: Up to 16 key-value pairs for storing additional object information.
            Keys max 64 chars, values max 512 chars.

        modalities: Output types to generate. Options: ["text"], ["audio"], ["text", "audio"].
            Default ["text"]. Audio requires the gpt-4o-audio-preview model.

        n: Number of completion choices to generate. Default 1.
            Costs scale with the number of choices.

        parallel_tool_calls: Enable parallel function calling during tool use.
            Default true when tools are present.

        prediction: Static predicted output for regeneration tasks (e.g., file content).

        presence_penalty: Number between -2.0 and 2.0. Penalizes tokens based on their
            presence in the text so far, encouraging new topics. Default 0.

        prompt_cache_key: Cache identifier for optimizing similar requests.
            Replaces the user field for caching.

        prompt_cache_retention: Cache retention policy. "24h" enables extended caching
            up to 24 hours. Default "in-memory".

        reasoning_effort: Effort level for reasoning models (o-series).
            Options: "none", "minimal", "low", "medium", "high".
            - gpt-5.1: defaults to "none"
            - Other models: default "medium"
            - gpt-5-pro: only supports "high"

        response_format: Output format specification.
            - {"type": "text"}: Plain text (default)
            - {"type": "json_object"}: Valid JSON
            - {"type": "json_schema", "json_schema": {...}}: Structured Outputs

        safety_identifier: Stable user identifier (hashed) for policy violation detection.

        seed: For deterministic sampling (Beta). Same seed + parameters should return the
            same result, but this is not guaranteed. Check system_fingerprint for changes.

        service_tier: Processing type for serving. Options: "auto", "default", "flex",
            "scale", "priority". Affects latency and pricing.

        stop: Up to 4 sequences where generation stops. Can be a string or list of strings.

        store: If true, stores the completion for model distillation/evals.

        stream: If true, returns an SSE stream of ChatCompletionChunk objects.
            Returns Stream[ChatCompletionChunk] instead of ChatCompletion.

        stream_options: Streaming configuration. Accepts a dict with:
            - "include_usage": bool - If true, includes token usage in the final chunk
            - "include_obfuscation": bool - If true (default), adds random characters
              to an obfuscation field on streaming delta events to normalize payload
              sizes as mitigation for side-channel attacks. Set to false to optimize
              bandwidth.

        temperature: Sampling temperature between 0 and 2. Higher values (e.g., 0.8)
            make output more random, lower values (e.g., 0.2) more deterministic.
            Default 1. Alter this or top_p, not both.

        tool_choice: Controls tool/function calling.
            - "none": No tools called
            - "auto": Model decides (default when tools present)
            - "required": Model must call at least one tool
            - {"type": "function", "function": {"name": "..."}}: Force specific tool

        tools: List of tools/functions available to the model. Each tool:
            {
                "type": "function",
                "function": {
                    "name": "function_name",
                    "description": "What it does",
                    "parameters": {...}  # JSON Schema
                }
            }

        top_logprobs: Number of most likely tokens to return with logprobs (0-20).
            Requires logprobs=true.

        top_p: Nucleus sampling parameter between 0 and 1. The model considers tokens
            with top_p probability mass; e.g., 0.1 means only tokens in the top 10%.
            Default 1. Alter this or temperature, not both.

        user: (Deprecated for caching, use prompt_cache_key) Unique user identifier
            for abuse monitoring.

        verbosity: Output detail level for reasoning models. Options: "low", "medium", "high".

        web_search_options: Web search configuration (if available for the model).

        extra_headers: Additional HTTP headers for the request.

        extra_query: Additional query parameters for the request.

        extra_body: Additional JSON fields for the request body.

        timeout: Request timeout in seconds.

    Returns:
        ChatCompletion: If stream=False (default), returns the complete response.
        Stream[ChatCompletionChunk]: If stream=True, returns a streaming response.

    Raises:
        BadRequestError: Invalid parameters
        AuthenticationError: Invalid API key
        RateLimitError: Rate limit exceeded
        APIError: Other API errors
    """
```

Usage examples:

```python
from openai import OpenAI

client = OpenAI()

# Basic chat completion
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
print(response.choices[0].message.content)

# With multiple messages and temperature
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a creative writer."},
        {"role": "user", "content": "Write a haiku about coding."},
    ],
    temperature=0.8,
    max_tokens=100
)

# With vision (image input)
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/image.jpg"}
                }
            ]
        }
    ]
)

# With function calling / tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name, e.g., San Francisco"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["location"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What's the weather in Boston?"}],
    tools=tools,
    tool_choice="auto"
)

# Check if the model wants to call a function
if response.choices[0].message.tool_calls:
    for tool_call in response.choices[0].message.tool_calls:
        print(f"Function: {tool_call.function.name}")
        print(f"Arguments: {tool_call.function.arguments}")

# Streaming response
stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

# With reasoning effort (o-series models)
response = client.chat.completions.create(
    model="o1",
    messages=[
        {"role": "user", "content": "Solve this complex math problem: ..."}
    ],
    reasoning_effort="high"
)

# With structured output (JSON Schema)
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "List 3 colors"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "colors_response",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "colors": {
                        "type": "array",
                        "items": {"type": "string"}
                    }
                },
                "required": ["colors"],
                "additionalProperties": False
            }
        }
    }
)
```
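
The tool-calling example above stops after printing the requested call; completing the round trip means executing the function yourself and sending the result back as a `"tool"` message tied to the call by `tool_call_id`. A minimal sketch of that dispatch step, using a hand-written stand-in for the API's tool call (hypothetical values) rather than a live response:

```python
import json

# Stand-in for response.choices[0].message.tool_calls[0] (hypothetical values)
tool_call = {
    "id": "call_abc123",
    "type": "function",
    "function": {
        "name": "get_weather",
        "arguments": '{"location": "Boston", "unit": "celsius"}',
    },
}

def get_weather(location: str, unit: str = "celsius") -> str:
    # A real implementation would query a weather service here
    return f"22 degrees {unit} in {location}"

dispatch_table = {"get_weather": get_weather}

# Arguments arrive as a JSON string and must be parsed before dispatch
args = json.loads(tool_call["function"]["arguments"])
result = dispatch_table[tool_call["function"]["name"]](**args)

# Append this to `messages` and call create() again for the final answer
tool_message = {
    "role": "tool",
    "tool_call_id": tool_call["id"],
    "content": result,
}
print(tool_message["content"])  # 22 degrees celsius in Boston
```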

### Parse with Structured Output

Create a chat completion with automatic Pydantic model parsing for structured outputs.

```python { .api }
def parse(
    self,
    *,
    messages: Iterable[ChatCompletionMessageParam],
    model: str | ChatModel,
    response_format: type[ResponseFormatT] | Omit = omit,
    audio: ChatCompletionAudioParam | None | Omit = omit,
    frequency_penalty: float | None | Omit = omit,
    function_call: dict | str | Omit = omit,
    functions: Iterable[dict] | Omit = omit,
    logit_bias: dict[str, int] | None | Omit = omit,
    logprobs: bool | None | Omit = omit,
    max_completion_tokens: int | None | Omit = omit,
    max_tokens: int | None | Omit = omit,
    metadata: dict[str, str] | None | Omit = omit,
    modalities: list[Literal["text", "audio"]] | None | Omit = omit,
    n: int | None | Omit = omit,
    parallel_tool_calls: bool | Omit = omit,
    prediction: dict | None | Omit = omit,
    presence_penalty: float | None | Omit = omit,
    prompt_cache_key: str | Omit = omit,
    prompt_cache_retention: Literal["in-memory", "24h"] | None | Omit = omit,
    reasoning_effort: Literal["none", "minimal", "low", "medium", "high"] | None | Omit = omit,
    safety_identifier: str | Omit = omit,
    seed: int | None | Omit = omit,
    service_tier: Literal["auto", "default", "flex", "scale", "priority"] | None | Omit = omit,
    stop: str | list[str] | None | Omit = omit,
    store: bool | None | Omit = omit,
    stream_options: dict | None | Omit = omit,
    temperature: float | None | Omit = omit,
    tool_choice: ChatCompletionToolChoiceOptionParam | Omit = omit,
    tools: Iterable[ChatCompletionToolUnionParam] | Omit = omit,
    top_logprobs: int | None | Omit = omit,
    top_p: float | None | Omit = omit,
    user: str | Omit = omit,
    verbosity: Literal["low", "medium", "high"] | None | Omit = omit,
    web_search_options: dict | Omit = omit,
    extra_headers: dict[str, str] | None = None,
    extra_query: dict[str, object] | None = None,
    extra_body: dict[str, object] | None = None,
    timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,
) -> ParsedChatCompletion[ResponseFormatT]:
    """
    Create a chat completion with automatic Pydantic model parsing.

    Converts the Pydantic model to a JSON schema, sends it to the API, and parses
    the response back into the model. Also automatically parses function tool calls
    when using pydantic_function_tool() or strict mode.

    Args:
        messages: List of conversation messages.
        model: Model ID to use.
        response_format: Pydantic model class for structured output.
        (other parameters same as the create method)

    Returns:
        ParsedChatCompletion[ResponseFormatT]: Completion with parsed content.
            Access via completion.choices[0].message.parsed

    Raises:
        Same as the create method, plus validation errors for malformed responses.
    """
```

Usage example:

```python
from typing import Literal

from pydantic import BaseModel
from openai import OpenAI

# Define the response structure
class Step(BaseModel):
    explanation: str
    output: str

class MathResponse(BaseModel):
    steps: list[Step]
    final_answer: str

client = OpenAI()

# parse() returns a strongly-typed response
completion = client.chat.completions.parse(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful math tutor."},
        {"role": "user", "content": "Solve: 8x + 31 = 2"}
    ],
    response_format=MathResponse
)

# Access parsed content with full type safety
message = completion.choices[0].message
if message.parsed:
    for step in message.parsed.steps:
        print(f"{step.explanation}: {step.output}")
    print(f"Answer: {message.parsed.final_answer}")

# With function tools using pydantic
from openai import pydantic_function_tool

class WeatherParams(BaseModel):
    location: str
    unit: Literal["celsius", "fahrenheit"] = "celsius"

tool = pydantic_function_tool(
    WeatherParams,
    name="get_weather",
    description="Get current weather"
)

completion = client.chat.completions.parse(
    model="gpt-4",
    messages=[{"role": "user", "content": "What's the weather in NYC?"}],
    tools=[tool]
)

# Tool calls are automatically parsed
if completion.choices[0].message.tool_calls:
    for call in completion.choices[0].message.tool_calls:
        if call.type == "function":
            # call.function.parsed_arguments is a WeatherParams instance
            params = call.function.parsed_arguments
            print(f"Location: {params.location}, Unit: {params.unit}")
```
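
What `parse()` automates can be sketched by hand, assuming pydantic v2: derive a JSON schema from the model class (this is what gets sent as the response format), then validate the model's raw JSON reply back into the class:

```python
from pydantic import BaseModel

class Step(BaseModel):
    explanation: str
    output: str

class MathResponse(BaseModel):
    steps: list[Step]
    final_answer: str

# 1. parse() derives a JSON schema like this from the class
schema = MathResponse.model_json_schema()
print(sorted(schema["required"]))  # ['final_answer', 'steps']

# 2. ...and validates the model's raw JSON reply back into the class
raw = '{"steps": [{"explanation": "Subtract 31 from both sides", "output": "8x = -29"}], "final_answer": "x = -29/8"}'
parsed = MathResponse.model_validate_json(raw)
print(parsed.final_answer)  # x = -29/8
```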

### Stored Chat Completions

Store, retrieve, update, and delete chat completions for persistent conversation management.

```python { .api }
def retrieve(
    self,
    completion_id: str,
    *,
    extra_headers: dict[str, str] | None = None,
    extra_query: dict[str, object] | None = None,
    extra_body: dict[str, object] | None = None,
    timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,
) -> ChatCompletion:
    """
    Retrieve a previously stored chat completion by its ID.

    Args:
        completion_id: The ID of the stored chat completion to retrieve.

    Returns:
        ChatCompletion: The stored completion object.
    """

def update(
    self,
    completion_id: str,
    *,
    metadata: dict[str, str] | None,
    extra_headers: dict[str, str] | None = None,
    extra_query: dict[str, object] | None = None,
    extra_body: dict[str, object] | None = None,
    timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,
) -> ChatCompletion:
    """
    Update metadata for a stored chat completion.

    Args:
        completion_id: The ID of the stored completion to update.
        metadata: Updated metadata key-value pairs (max 16 pairs). Required parameter.

    Returns:
        ChatCompletion: The updated completion object.
    """

def list(
    self,
    *,
    after: str | Omit = omit,
    limit: int | Omit = omit,
    metadata: dict[str, str] | None | Omit = omit,
    model: str | Omit = omit,
    order: Literal["asc", "desc"] | Omit = omit,
    extra_headers: dict[str, str] | None = None,
    extra_query: dict[str, object] | None = None,
    extra_body: dict[str, object] | None = None,
    timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,
) -> SyncCursorPage[ChatCompletion]:
    """
    List stored chat completions with cursor-based pagination.

    Args:
        after: Cursor for pagination (ID of the object to start after).
        limit: Maximum number of completions to return (default 20, max 100).
        metadata: Filter by metadata key-value pairs. Only returns completions with matching metadata.
        model: Filter by model. Only returns completions generated with the specified model.
        order: Sort order: "asc" (oldest first) or "desc" (newest first).

    Returns:
        SyncCursorPage[ChatCompletion]: Paginated list of completions.
    """

def delete(
    self,
    completion_id: str,
    *,
    extra_headers: dict[str, str] | None = None,
    extra_query: dict[str, object] | None = None,
    extra_body: dict[str, object] | None = None,
    timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,
) -> ChatCompletionDeleted:
    """
    Delete a stored chat completion.

    Args:
        completion_id: The ID of the stored completion to delete.

    Returns:
        ChatCompletionDeleted: Confirmation of deletion with deleted=True.
    """
```

Access stored completion messages:

```python { .api }
# Via client.chat.completions.messages.list()
def list(
    self,
    completion_id: str,
    *,
    after: str | Omit = omit,
    limit: int | Omit = omit,
    order: Literal["asc", "desc"] | Omit = omit,
    extra_headers: dict[str, str] | None = None,
    extra_query: dict[str, object] | None = None,
    extra_body: dict[str, object] | None = None,
    timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,
) -> SyncCursorPage[ChatCompletionStoreMessage]:
    """
    List messages from a stored chat completion.

    Args:
        completion_id: The ID of the stored completion.
        after: Cursor for pagination.
        limit: Maximum number of messages to return.
        order: Sort order: "asc" or "desc".

    Returns:
        SyncCursorPage[ChatCompletionStoreMessage]: Paginated list of messages.
    """
```

#### Usage Example

```python
from openai import OpenAI

client = OpenAI()

# Create a chat completion with store=True
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me about Python"}],
    store=True,
    metadata={"user_id": "user123", "session": "abc"}
)

completion_id = response.id
print(f"Stored completion: {completion_id}")

# Retrieve the stored completion later
stored = client.chat.completions.retrieve(completion_id)
print(stored.choices[0].message.content)

# Update metadata
updated = client.chat.completions.update(
    completion_id,
    metadata={"user_id": "user123", "session": "abc", "reviewed": "true"}
)

# List all stored completions
page = client.chat.completions.list(limit=10, order="desc")
for completion in page.data:
    print(f"{completion.id}: {completion.created}")

# List messages from a specific completion
messages_page = client.chat.completions.messages.list(completion_id)
for message in messages_page.data:
    print(f"{message.role}: {message.content}")

# Delete when no longer needed
deleted = client.chat.completions.delete(completion_id)
print(f"Deleted: {deleted.deleted}")
```
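
The `list` call above fetches a single page; cursor pagination means passing the last ID of each page as `after` until a short page comes back. A stdlib sketch of that loop against a fake in-memory store (the SDK's `SyncCursorPage` can also iterate across pages for you):

```python
# Fake store of completion IDs standing in for the API's stored completions
STORE = [f"chatcmpl-{i:03d}" for i in range(45)]

def fake_list(after=None, limit=20):
    # Return up to `limit` IDs after the cursor, like the list endpoint
    start = STORE.index(after) + 1 if after else 0
    return STORE[start:start + limit]

collected = []
cursor = None
while True:
    page = fake_list(after=cursor, limit=20)
    collected.extend(page)
    if len(page) < 20:  # a short page means we've reached the end
        break
    cursor = page[-1]   # the last ID becomes the next `after` cursor

print(len(collected))  # 45
```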

### Stream Chat Completions

Wrapper over `create(stream=True)` that provides a more granular event API and automatic accumulation of deltas. Requires use within a context manager.

```python { .api }
def stream(
    self,
    *,
    messages: Iterable[ChatCompletionMessageParam],
    model: str | ChatModel,
    audio: ChatCompletionAudioParam | None | Omit = omit,
    frequency_penalty: float | None | Omit = omit,
    function_call: dict | str | Omit = omit,
    functions: Iterable[dict] | Omit = omit,
    logit_bias: dict[str, int] | None | Omit = omit,
    logprobs: bool | None | Omit = omit,
    max_completion_tokens: int | None | Omit = omit,
    max_tokens: int | None | Omit = omit,
    metadata: dict[str, str] | None | Omit = omit,
    modalities: list[Literal["text", "audio"]] | None | Omit = omit,
    n: int | None | Omit = omit,
    parallel_tool_calls: bool | Omit = omit,
    prediction: dict | None | Omit = omit,
    presence_penalty: float | None | Omit = omit,
    prompt_cache_key: str | Omit = omit,
    prompt_cache_retention: Literal["in-memory", "24h"] | None | Omit = omit,
    reasoning_effort: Literal["none", "minimal", "low", "medium", "high"] | None | Omit = omit,
    response_format: completion_create_params.ResponseFormat | Omit = omit,
    safety_identifier: str | Omit = omit,
    seed: int | None | Omit = omit,
    service_tier: Literal["auto", "default", "flex", "scale", "priority"] | None | Omit = omit,
    stop: str | list[str] | None | Omit = omit,
    store: bool | None | Omit = omit,
    stream_options: dict | None | Omit = omit,
    temperature: float | None | Omit = omit,
    tool_choice: ChatCompletionToolChoiceOptionParam | Omit = omit,
    tools: Iterable[ChatCompletionToolUnionParam] | Omit = omit,
    top_logprobs: int | None | Omit = omit,
    top_p: float | None | Omit = omit,
    user: str | Omit = omit,
    verbosity: Literal["low", "medium", "high"] | None | Omit = omit,
    web_search_options: dict | Omit = omit,
    extra_headers: dict[str, str] | None = None,
    extra_query: dict[str, object] | None = None,
    extra_body: dict[str, object] | None = None,
    timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,
) -> ChatCompletionStreamManager:
    """
    Streaming wrapper with a granular event API and automatic delta accumulation.

    Unlike create(stream=True), this method requires a context manager to prevent
    resource leaks. Yields detailed events including content.delta and content.done,
    and provides accumulated snapshots.

    Args:
        Same parameters as the create() method.

    Returns:
        ChatCompletionStreamManager: Context manager yielding stream events.
    """
```

**Usage Example:**

```python
from openai import OpenAI

client = OpenAI()

# Must be used within a context manager
with client.chat.completions.stream(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me a story"}],
) as stream:
    for event in stream:
        if event.type == "content.delta":
            print(event.delta, flush=True, end="")
        elif event.type == "content.done":
            print(f"\nFinal content: {event.content}")

    # Access the accumulated completion after streaming
    completion = stream.get_final_completion()
    print(f"Model: {completion.model}")
    print(f"Total tokens: {completion.usage.total_tokens}")
```
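
Under the hood, the stream manager builds its final snapshot by accumulating `content.delta` events; the accumulation itself is just concatenation over the non-empty deltas, sketched here with stand-in values:

```python
# Stand-ins for the delta payloads of successive content.delta events;
# None marks chunks (role, finish_reason) that carry no text
deltas = ["Once", " upon", " a", " time", None]

parts = []
for delta in deltas:
    if delta:  # skip non-content chunks
        parts.append(delta)

final_content = "".join(parts)
print(final_content)  # Once upon a time
```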

### Streaming with Helpers

Advanced streaming with context managers for easier handling.

```python
from openai import OpenAI

client = OpenAI()

# Using create(stream=True) as a context manager
with client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Count to 5"}],
    stream=True
) as stream:
    for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="")

# Iterating the stream object directly
stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me a joke"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

# Stream with usage information
stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
    stream_options={"include_usage": True}
)

for chunk in stream:
    # With include_usage, the final chunk carries usage and an empty
    # choices list, so guard before indexing
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
    if chunk.usage:
        print(f"\nTokens used: {chunk.usage.total_tokens}")
```

762

763

## Types

764

765

```python { .api }

766

from typing import Literal, TypeVar, Generic

767

from pydantic import BaseModel

768

from openai.types.chat import (

769

ChatCompletionToolUnionParam,

770

ChatCompletionToolChoiceOptionParam,

771

completion_create_params,

772

)

773

774

# Message types

775

ChatCompletionMessageParam = dict[str, Any] # Union of message types

776

777

class ChatCompletionSystemMessageParam(TypedDict):

778

role: Literal["system"]

779

content: str

780

name: NotRequired[str]

781

782

class ChatCompletionUserMessageParam(TypedDict):

783

role: Literal["user"]

784

content: str | list[dict] # Text or multimodal content

785

name: NotRequired[str]

786

787

class ChatCompletionAssistantMessageParam(TypedDict):

788

role: Literal["assistant"]

789

content: str | None

790

name: NotRequired[str]

791

tool_calls: NotRequired[list[dict]]

792

793

class ChatCompletionToolMessageParam(TypedDict):

794

role: Literal["tool"]

795

content: str

796

tool_call_id: str

797

798

# Response types

799

class ChatCompletion(BaseModel):

800

id: str

801

choices: list[Choice]

802

created: int

803

model: str

804

object: Literal["chat.completion"]

805

system_fingerprint: str | None

806

usage: CompletionUsage | None

807

808

class Choice(BaseModel):

809

finish_reason: Literal["stop", "length", "tool_calls", "content_filter", "function_call"]

810

index: int

811

logprobs: Logprobs | None

812

message: ChatCompletionMessage

813

814

class ChatCompletionMessage(BaseModel):

815

content: str | None

816

role: Literal["assistant"]

817

tool_calls: list[ChatCompletionMessageToolCall] | None

818

function_call: FunctionCall | None # Deprecated

819

audio: Audio | None # When modalities includes audio

class ChatCompletionStoreMessage(BaseModel):
    """Message from a stored chat completion."""
    content: str | None
    role: Literal["system", "user", "assistant", "tool"]
    tool_calls: list[ChatCompletionMessageToolCall] | None
    tool_call_id: str | None  # For tool messages

class ChatCompletionMessageToolCall(BaseModel):
    id: str
    function: Function
    type: Literal["function"]

class Function(BaseModel):
    arguments: str  # JSON string
    name: str

class CompletionUsage(BaseModel):
    completion_tokens: int
    prompt_tokens: int
    total_tokens: int
    completion_tokens_details: CompletionTokensDetails | None

# Streaming types
class ChatCompletionChunk(BaseModel):
    id: str
    choices: list[ChunkChoice]
    created: int
    model: str
    object: Literal["chat.completion.chunk"]
    system_fingerprint: str | None
    usage: CompletionUsage | None  # Only in final chunk with include_usage

class ChunkChoice(BaseModel):
    delta: ChoiceDelta
    finish_reason: str | None
    index: int
    logprobs: Logprobs | None

class ChoiceDelta(BaseModel):
    content: str | None
    role: Literal["assistant"] | None
    tool_calls: list[ChoiceDeltaToolCall] | None

# Parsed completion types
ResponseFormatT = TypeVar("ResponseFormatT", bound=BaseModel)

class ParsedChatCompletion(Generic[ResponseFormatT], ChatCompletion):
    """ChatCompletion with parsed content."""
    choices: list[ParsedChoice[ResponseFormatT]]

class ParsedChoice(Generic[ResponseFormatT], Choice):
    message: ParsedChatCompletionMessage[ResponseFormatT]

class ParsedChatCompletionMessage(Generic[ResponseFormatT], ChatCompletionMessage):
    parsed: ResponseFormatT | None
    tool_calls: list[ParsedFunctionToolCall] | None

class ParsedFunctionToolCall(ChatCompletionMessageToolCall):
    function: ParsedFunction
    type: Literal["function"]

class ParsedFunction(Function):
    parsed_arguments: BaseModel | None

# Deletion type
class ChatCompletionDeleted(BaseModel):
    id: str
    deleted: bool
    object: Literal["chat.completion"]

# Tool/function definitions
class ChatCompletionToolParam(TypedDict):
    type: Literal["function"]
    function: FunctionDefinition

class FunctionDefinition(TypedDict):
    name: str
    description: NotRequired[str]
    parameters: dict  # JSON Schema
    strict: NotRequired[bool]  # Enable strict schema adherence

# Response format types
class ResponseFormatText(TypedDict):
    type: Literal["text"]

class ResponseFormatJSONObject(TypedDict):
    type: Literal["json_object"]

class ResponseFormatJSONSchema(TypedDict):
    type: Literal["json_schema"]
    json_schema: JSONSchema

class JSONSchema(TypedDict):
    name: str
    description: NotRequired[str]
    schema: dict  # JSON Schema object
    strict: NotRequired[bool]

# Audio types
class ChatCompletionAudioParam(TypedDict):
    voice: Literal["alloy", "echo", "fable", "onyx", "nova", "shimmer"]
    format: Literal["wav", "mp3", "flac", "opus", "pcm16"]

# Streaming options
class ChatCompletionStreamOptionsParam(TypedDict):
    include_usage: NotRequired[bool]

# Tool choice types
ChatCompletionToolChoiceOptionParam = (
    Literal["none", "auto", "required"] | dict
)

class ToolChoiceFunction(TypedDict):
    type: Literal["function"]
    function: FunctionChoice

class FunctionChoice(TypedDict):
    name: str

# Stream wrapper type
class Stream(Generic[T]):
    def __iter__(self) -> Iterator[T]: ...
    def __next__(self) -> T: ...
    def __enter__(self) -> Stream[T]: ...
    def __exit__(self, *args) -> None: ...
    def close(self) -> None: ...
```
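The `Stream` wrapper above implements both the iterator and context-manager protocols, so a streaming response can be consumed with a plain `for` loop and closed deterministically with `with`. A minimal stand-in illustrating that protocol (the `FakeStream` class below is hypothetical, not part of the library; it only mirrors the methods `Stream[T]` declares):

```python
from typing import Generic, Iterator, TypeVar

T = TypeVar("T")

class FakeStream(Generic[T]):
    """Hypothetical stand-in exposing the same protocol as Stream[T]."""

    def __init__(self, items: list[T]) -> None:
        self._iter = iter(items)
        self.closed = False

    def __iter__(self) -> Iterator[T]:
        return self

    def __next__(self) -> T:
        return next(self._iter)

    def __enter__(self) -> "FakeStream[T]":
        return self

    def __exit__(self, *args) -> None:
        self.close()

    def close(self) -> None:
        self.closed = True

# Consuming inside `with` guarantees close() runs even if iteration fails:
with FakeStream(["Hel", "lo"]) as stream:
    text = "".join(stream)
```

Using `with` (or calling `close()` explicitly) matters for real streams because it releases the underlying HTTP connection as soon as you are done with the response.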

## Async Usage

All chat completion methods are available in async variants through `AsyncOpenAI`:

```python
import asyncio
from openai import AsyncOpenAI

async def main():
    client = AsyncOpenAI()

    # Async create - returns ChatCompletion or AsyncStream[ChatCompletionChunk]
    response = await client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.choices[0].message.content)

    # Async streaming
    async for chunk in await client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Tell me a story"}],
        stream=True
    ):
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="")

    # Async parse - returns ParsedChatCompletion with structured output
    from pydantic import BaseModel

    class CalendarEvent(BaseModel):
        name: str
        date: str
        participants: list[str]

    response = await client.chat.completions.parse(
        model="gpt-4o-2024-08-06",
        messages=[{"role": "user", "content": "Alice and Bob are meeting on Friday"}],
        response_format=CalendarEvent
    )
    event = response.choices[0].message.parsed

    # Other async methods: retrieve, update, list, delete, stream
    # All have the same signatures as sync versions

asyncio.run(main())
```

**Note**: `AsyncOpenAI` uses `AsyncStream[ChatCompletionChunk]` for streaming responses instead of `Stream[ChatCompletionChunk]`.

## Error Handling

```python
from openai import OpenAI, APIError, APIStatusError, RateLimitError

client = OpenAI()

try:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello"}]
    )
except RateLimitError as e:
    print(f"Rate limit hit: {e}")
    # Handle rate limiting (e.g., retry with backoff)
except APIStatusError as e:
    print(f"API error: {e.status_code} - {e.message}")
    # Handle other non-2xx API responses
except APIError as e:
    print(f"API error: {e}")
    # Handle connection errors, timeouts, and other API failures
```
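The "retry with backoff" comment above can be made concrete with a small generic helper. This sketch is plain standard library; the commented usage at the bottom assumes the `client` and `RateLimitError` from the example above:

```python
import random
import time

def retry_with_backoff(fn, retryable=(Exception,), max_retries=5, base_delay=1.0):
    """Call fn(), retrying on the given exceptions with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except retryable:
            if attempt == max_retries - 1:
                raise  # Exhausted retries: re-raise the last exception
            # Wait base_delay * 2^attempt, plus proportional random jitter
            time.sleep(base_delay * (2 ** attempt) + random.random() * base_delay)

# Hypothetical usage with the client from the error-handling example:
# retry_with_backoff(
#     lambda: client.chat.completions.create(
#         model="gpt-4",
#         messages=[{"role": "user", "content": "Hello"}],
#     ),
#     retryable=(RateLimitError,),
# )
```

Note that the SDK already retries some failures internally (configurable via `max_retries` on the client), so an application-level helper like this is mainly useful for longer backoff windows than the built-in policy provides.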