# Chat Completions

Create conversational responses using OpenAI's language models, with support for text, vision inputs, audio, function calling, structured outputs, and streaming. Chat completions are the primary interface for interacting with models such as GPT-4 and GPT-3.5.

## Access Patterns

Chat completions are accessible via:

- `client.chat.completions` - Primary access pattern
- `client.beta.chat.completions` - Beta namespace alias (identical to `client.chat.completions`)
## Capabilities

### Create Chat Completion

Generate a response for a chat conversation with extensive configuration options.
```python { .api }
def create(
    self,
    *,
    messages: Iterable[ChatCompletionMessageParam],
    model: str | ChatModel,
    audio: ChatCompletionAudioParam | None | Omit = omit,
    frequency_penalty: float | None | Omit = omit,
    function_call: dict | str | Omit = omit,
    functions: Iterable[dict] | Omit = omit,
    logit_bias: dict[str, int] | None | Omit = omit,
    logprobs: bool | None | Omit = omit,
    max_completion_tokens: int | None | Omit = omit,
    max_tokens: int | None | Omit = omit,
    metadata: dict[str, str] | None | Omit = omit,
    modalities: list[Literal["text", "audio"]] | None | Omit = omit,
    n: int | None | Omit = omit,
    parallel_tool_calls: bool | Omit = omit,
    prediction: dict | None | Omit = omit,
    presence_penalty: float | None | Omit = omit,
    prompt_cache_key: str | Omit = omit,
    prompt_cache_retention: Literal["in-memory", "24h"] | None | Omit = omit,
    reasoning_effort: Literal["none", "minimal", "low", "medium", "high"] | None | Omit = omit,
    response_format: completion_create_params.ResponseFormat | Omit = omit,
    safety_identifier: str | Omit = omit,
    seed: int | None | Omit = omit,
    service_tier: Literal["auto", "default", "flex", "scale", "priority"] | None | Omit = omit,
    stop: str | list[str] | None | Omit = omit,
    store: bool | None | Omit = omit,
    stream: bool | Omit = omit,
    stream_options: dict | None | Omit = omit,
    temperature: float | None | Omit = omit,
    tool_choice: ChatCompletionToolChoiceOptionParam | Omit = omit,
    tools: Iterable[ChatCompletionToolUnionParam] | Omit = omit,
    top_logprobs: int | None | Omit = omit,
    top_p: float | None | Omit = omit,
    user: str | Omit = omit,
    verbosity: Literal["low", "medium", "high"] | None | Omit = omit,
    web_search_options: dict | Omit = omit,
    extra_headers: dict[str, str] | None = None,
    extra_query: dict[str, object] | None = None,
    extra_body: dict[str, object] | None = None,
    timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,
) -> ChatCompletion | Stream[ChatCompletionChunk]:
    """
    Create a model response for the given chat conversation.

    Args:
        messages: List of messages comprising the conversation. Each message can be:
            - System message: {"role": "system", "content": "..."}
            - User message: {"role": "user", "content": "..."}
            - Assistant message: {"role": "assistant", "content": "..."}
            - Tool message: {"role": "tool", "content": "...", "tool_call_id": "..."}
            Supports text, images, and audio content.

        model: Model ID like "gpt-4", "gpt-4-turbo", "gpt-3.5-turbo", or "o1".
            See https://platform.openai.com/docs/models for available models.

        audio: Parameters for audio output when modalities includes "audio".
            {"voice": "alloy|echo|fable|onyx|nova|shimmer", "format": "wav|mp3|flac|opus|pcm16"}

        frequency_penalty: Number between -2.0 and 2.0. Penalizes tokens based on
            their frequency in the text so far, reducing repetition. Default 0.

        function_call: (Deprecated, use tool_choice) Controls function calling.
            - "none": No function calls
            - "auto": Model decides
            - {"name": "function_name"}: Force specific function

        functions: (Deprecated, use tools) List of function definitions.

        logit_bias: Modify token probabilities. Maps token IDs to bias values from
            -100 to 100. Values near ±1 slightly adjust probability; ±100 bans/forces tokens.

        logprobs: If true, returns log probabilities of output tokens.

        max_completion_tokens: Maximum tokens for the completion, including reasoning
            tokens. Preferred over max_tokens for o-series models.

        max_tokens: (Deprecated for o-series) Maximum tokens in the completion.
            Use max_completion_tokens instead.

        metadata: Up to 16 key-value pairs for storing additional object information.
            Keys max 64 chars, values max 512 chars.

        modalities: Output types to generate. Options: ["text"], ["audio"], ["text", "audio"].
            Default ["text"]. Audio requires the gpt-4o-audio-preview model.

        n: Number of completion choices to generate. Default 1.
            Costs scale with the number of choices.

        parallel_tool_calls: Enable parallel function calling during tool use.
            Default true when tools are present.

        prediction: Static predicted output for regeneration tasks (e.g., file content).

        presence_penalty: Number between -2.0 and 2.0. Penalizes tokens based on
            their presence in the text so far, encouraging new topics. Default 0.

        prompt_cache_key: Cache identifier for optimizing similar requests.
            Replaces the user field for caching.

        prompt_cache_retention: Cache retention policy. "24h" enables extended caching
            up to 24 hours. Default "in-memory".

        reasoning_effort: Effort level for reasoning models (o-series).
            Options: "none", "minimal", "low", "medium", "high".
            - gpt-5.1: defaults to "none"
            - Other models: default "medium"
            - gpt-5-pro: only supports "high"

        response_format: Output format specification.
            - {"type": "text"}: Plain text (default)
            - {"type": "json_object"}: Valid JSON
            - {"type": "json_schema", "json_schema": {...}}: Structured Outputs

        safety_identifier: Stable user identifier (hashed) for policy violation detection.

        seed: For deterministic sampling (Beta). The same seed and parameters should
            return the same result, but this is not guaranteed. Check system_fingerprint
            for backend changes.

        service_tier: Processing type for serving. Options: "auto", "default", "flex",
            "scale", "priority". Affects latency and pricing.

        stop: Up to 4 sequences where generation stops. Can be a string or list of strings.

        store: If true, stores the completion for model distillation/evals.

        stream: If true, returns an SSE stream of ChatCompletionChunk objects.
            Returns Stream[ChatCompletionChunk] instead of ChatCompletion.

        stream_options: Streaming configuration. Accepts a dict with:
            - "include_usage": bool - If true, includes token usage in the final chunk
            - "include_obfuscation": bool - If true (default), adds random characters
              to the obfuscation field on streaming delta events to normalize payload
              sizes as a mitigation for side-channel attacks. Set to false to optimize
              bandwidth.

        temperature: Sampling temperature between 0 and 2. Higher values (e.g., 0.8)
            make output more random; lower values (e.g., 0.2) more deterministic.
            Default 1. Alter this or top_p, not both.

        tool_choice: Controls tool/function calling.
            - "none": No tools called
            - "auto": Model decides (default when tools present)
            - "required": Model must call at least one tool
            - {"type": "function", "function": {"name": "..."}}: Force specific tool

        tools: List of tools/functions available to the model. Each tool:
            {
                "type": "function",
                "function": {
                    "name": "function_name",
                    "description": "What it does",
                    "parameters": {...}  # JSON Schema
                }
            }

        top_logprobs: Number of most likely tokens to return with logprobs (0-20).
            Requires logprobs=true.

        top_p: Nucleus sampling parameter between 0 and 1. The model considers tokens
            within the top_p probability mass. E.g., 0.1 means only tokens in the top 10%.
            Default 1. Alter this or temperature, not both.

        user: (Deprecated for caching, use prompt_cache_key) Unique user identifier
            for abuse monitoring.

        verbosity: Output detail level for reasoning models. Options: "low", "medium", "high".

        web_search_options: Web search configuration (if available for the model).

        extra_headers: Additional HTTP headers for the request.

        extra_query: Additional query parameters for the request.

        extra_body: Additional JSON fields for the request body.

        timeout: Request timeout in seconds.

    Returns:
        ChatCompletion: If stream=False (default), returns the complete response.
        Stream[ChatCompletionChunk]: If stream=True, returns a streaming response.

    Raises:
        BadRequestError: Invalid parameters
        AuthenticationError: Invalid API key
        RateLimitError: Rate limit exceeded
        APIError: Other API errors
    """
```
Usage examples:

```python
from openai import OpenAI

client = OpenAI()

# Basic chat completion
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
print(response.choices[0].message.content)

# With multiple messages and temperature
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a creative writer."},
        {"role": "user", "content": "Write a haiku about coding."},
    ],
    temperature=0.8,
    max_tokens=100
)

# With vision (image input)
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/image.jpg"}
                }
            ]
        }
    ]
)

# With function calling / tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name, e.g., San Francisco"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["location"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What's the weather in Boston?"}],
    tools=tools,
    tool_choice="auto"
)

# Check if the model wants to call a function
if response.choices[0].message.tool_calls:
    for tool_call in response.choices[0].message.tool_calls:
        print(f"Function: {tool_call.function.name}")
        print(f"Arguments: {tool_call.function.arguments}")

# Streaming response
stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

# With reasoning effort (o-series models)
response = client.chat.completions.create(
    model="o1",
    messages=[
        {"role": "user", "content": "Solve this complex math problem: ..."}
    ],
    reasoning_effort="high"
)

# With structured output (JSON Schema)
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "List 3 colors"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "colors_response",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "colors": {
                        "type": "array",
                        "items": {"type": "string"}
                    }
                },
                "required": ["colors"],
                "additionalProperties": False
            }
        }
    }
)
```
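The function-calling example stops at detecting the requested call. To complete the round trip, execute the tool locally and send the result back in a `role="tool"` message. The sketch below assumes a hypothetical local `get_weather` implementation and a `run_tool_calls` helper; neither is part of the SDK.

```python
import json


def get_weather(location: str, unit: str = "celsius") -> str:
    # Hypothetical local implementation backing the get_weather tool.
    return json.dumps({"location": location, "temperature": 22, "unit": unit})


def run_tool_calls(message, available) -> list[dict]:
    """Execute each requested tool call and build the role="tool" replies.

    `message` is the assistant message whose tool_calls we are answering;
    `available` maps tool names to local callables.
    """
    replies = []
    for call in message.tool_calls:
        args = json.loads(call.function.arguments)  # arguments is a JSON string
        result = available[call.function.name](**args)
        replies.append({"role": "tool", "tool_call_id": call.id, "content": result})
    return replies
```

Append `response.choices[0].message` and then the replies from `run_tool_calls(...)` to `messages`, and call `create()` again so the model can produce a final answer from the tool output.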
### Parse with Structured Output

Create a chat completion with automatic Pydantic model parsing for structured outputs.

```python { .api }
def parse(
    self,
    *,
    messages: Iterable[ChatCompletionMessageParam],
    model: str | ChatModel,
    response_format: type[ResponseFormatT] | Omit = omit,
    audio: ChatCompletionAudioParam | None | Omit = omit,
    frequency_penalty: float | None | Omit = omit,
    function_call: dict | str | Omit = omit,
    functions: Iterable[dict] | Omit = omit,
    logit_bias: dict[str, int] | None | Omit = omit,
    logprobs: bool | None | Omit = omit,
    max_completion_tokens: int | None | Omit = omit,
    max_tokens: int | None | Omit = omit,
    metadata: dict[str, str] | None | Omit = omit,
    modalities: list[Literal["text", "audio"]] | None | Omit = omit,
    n: int | None | Omit = omit,
    parallel_tool_calls: bool | Omit = omit,
    prediction: dict | None | Omit = omit,
    presence_penalty: float | None | Omit = omit,
    prompt_cache_key: str | Omit = omit,
    prompt_cache_retention: Literal["in-memory", "24h"] | None | Omit = omit,
    reasoning_effort: Literal["none", "minimal", "low", "medium", "high"] | None | Omit = omit,
    safety_identifier: str | Omit = omit,
    seed: int | None | Omit = omit,
    service_tier: Literal["auto", "default", "flex", "scale", "priority"] | None | Omit = omit,
    stop: str | list[str] | None | Omit = omit,
    store: bool | None | Omit = omit,
    stream_options: dict | None | Omit = omit,
    temperature: float | None | Omit = omit,
    tool_choice: ChatCompletionToolChoiceOptionParam | Omit = omit,
    tools: Iterable[ChatCompletionToolUnionParam] | Omit = omit,
    top_logprobs: int | None | Omit = omit,
    top_p: float | None | Omit = omit,
    user: str | Omit = omit,
    verbosity: Literal["low", "medium", "high"] | None | Omit = omit,
    web_search_options: dict | Omit = omit,
    extra_headers: dict[str, str] | None = None,
    extra_query: dict[str, object] | None = None,
    extra_body: dict[str, object] | None = None,
    timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,
) -> ParsedChatCompletion[ResponseFormatT]:
    """
    Create a chat completion with automatic Pydantic model parsing.

    Converts the Pydantic model to a JSON schema, sends it to the API, and parses
    the response back into the model. Also automatically parses function tool calls
    when using pydantic_function_tool() or strict mode.

    Args:
        messages: List of conversation messages.
        model: Model ID to use.
        response_format: Pydantic model class for structured output.
        (other parameters same as the create method)

    Returns:
        ParsedChatCompletion[ResponseFormatT]: Completion with parsed content.
            Access via completion.choices[0].message.parsed

    Raises:
        Same as the create method, plus validation errors for malformed responses.
    """
```
Usage example:

```python
from typing import Literal

from pydantic import BaseModel
from openai import OpenAI

# Define response structure
class Step(BaseModel):
    explanation: str
    output: str

class MathResponse(BaseModel):
    steps: list[Step]
    final_answer: str

client = OpenAI()

# parse() returns a strongly typed response
completion = client.chat.completions.parse(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful math tutor."},
        {"role": "user", "content": "Solve: 8x + 31 = 2"}
    ],
    response_format=MathResponse
)

# Access parsed content with full type safety
message = completion.choices[0].message
if message.parsed:
    for step in message.parsed.steps:
        print(f"{step.explanation}: {step.output}")
    print(f"Answer: {message.parsed.final_answer}")

# With function tools using Pydantic
from openai import pydantic_function_tool

class WeatherParams(BaseModel):
    location: str
    unit: Literal["celsius", "fahrenheit"] = "celsius"

tool = pydantic_function_tool(
    WeatherParams,
    name="get_weather",
    description="Get current weather"
)

completion = client.chat.completions.parse(
    model="gpt-4",
    messages=[{"role": "user", "content": "What's the weather in NYC?"}],
    tools=[tool],
    response_format=MathResponse  # For the assistant's final response
)

# Tool calls are automatically parsed
if completion.choices[0].message.tool_calls:
    for call in completion.choices[0].message.tool_calls:
        if call.type == "function":
            # call.function.parsed_arguments is a WeatherParams instance
            params = call.function.parsed_arguments
            print(f"Location: {params.location}, Unit: {params.unit}")
```
### Stored Chat Completions

Store, retrieve, update, and delete chat completions for persistent conversation management.

```python { .api }
def retrieve(
    self,
    completion_id: str,
    *,
    extra_headers: dict[str, str] | None = None,
    extra_query: dict[str, object] | None = None,
    extra_body: dict[str, object] | None = None,
    timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,
) -> ChatCompletion:
    """
    Retrieve a previously stored chat completion by its ID.

    Args:
        completion_id: The ID of the stored chat completion to retrieve.

    Returns:
        ChatCompletion: The stored completion object.
    """

def update(
    self,
    completion_id: str,
    *,
    metadata: dict[str, str] | None,
    extra_headers: dict[str, str] | None = None,
    extra_query: dict[str, object] | None = None,
    extra_body: dict[str, object] | None = None,
    timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,
) -> ChatCompletion:
    """
    Update metadata for a stored chat completion.

    Args:
        completion_id: The ID of the stored completion to update.
        metadata: Updated metadata key-value pairs (max 16 pairs). Required parameter.

    Returns:
        ChatCompletion: The updated completion object.
    """

def list(
    self,
    *,
    after: str | Omit = omit,
    limit: int | Omit = omit,
    metadata: dict[str, str] | None | Omit = omit,
    model: str | Omit = omit,
    order: Literal["asc", "desc"] | Omit = omit,
    extra_headers: dict[str, str] | None = None,
    extra_query: dict[str, object] | None = None,
    extra_body: dict[str, object] | None = None,
    timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,
) -> SyncCursorPage[ChatCompletion]:
    """
    List stored chat completions with cursor-based pagination.

    Args:
        after: Cursor for pagination (ID of the object to start after).
        limit: Maximum number of completions to return (default 20, max 100).
        metadata: Filter by metadata key-value pairs. Only returns completions with matching metadata.
        model: Filter by model. Only returns completions generated with the specified model.
        order: Sort order: "asc" (oldest first) or "desc" (newest first).

    Returns:
        SyncCursorPage[ChatCompletion]: Paginated list of completions.
    """

def delete(
    self,
    completion_id: str,
    *,
    extra_headers: dict[str, str] | None = None,
    extra_query: dict[str, object] | None = None,
    extra_body: dict[str, object] | None = None,
    timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,
) -> ChatCompletionDeleted:
    """
    Delete a stored chat completion.

    Args:
        completion_id: The ID of the stored completion to delete.

    Returns:
        ChatCompletionDeleted: Confirmation of deletion with deleted=True.
    """
```
Access stored completion messages:

```python { .api }
# Via client.chat.completions.messages.list()
def list(
    self,
    completion_id: str,
    *,
    after: str | Omit = omit,
    limit: int | Omit = omit,
    order: Literal["asc", "desc"] | Omit = omit,
    extra_headers: dict[str, str] | None = None,
    extra_query: dict[str, object] | None = None,
    extra_body: dict[str, object] | None = None,
    timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,
) -> SyncCursorPage[ChatCompletionStoreMessage]:
    """
    List messages from a stored chat completion.

    Args:
        completion_id: The ID of the stored completion.
        after: Cursor for pagination.
        limit: Maximum number of messages to return.
        order: Sort order: "asc" or "desc".

    Returns:
        SyncCursorPage[ChatCompletionStoreMessage]: Paginated list of messages.
    """
```
#### Usage Example

```python
from openai import OpenAI

client = OpenAI()

# Create a chat completion with store=True
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me about Python"}],
    store=True,
    metadata={"user_id": "user123", "session": "abc"}
)

completion_id = response.id
print(f"Stored completion: {completion_id}")

# Retrieve the stored completion later
stored = client.chat.completions.retrieve(completion_id)
print(stored.choices[0].message.content)

# Update metadata
updated = client.chat.completions.update(
    completion_id,
    metadata={"user_id": "user123", "session": "abc", "reviewed": "true"}
)

# List all stored completions
page = client.chat.completions.list(limit=10, order="desc")
for completion in page.data:
    print(f"{completion.id}: {completion.created}")

# List messages from a specific completion
messages_page = client.chat.completions.messages.list(completion_id)
for message in messages_page.data:
    print(f"{message.role}: {message.content}")

# Delete when no longer needed
deleted = client.chat.completions.delete(completion_id)
print(f"Deleted: {deleted.deleted}")
```
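Because `list()` paginates with an `after` cursor, collecting every stored completion means walking pages until the server reports no more. The generator below is an illustrative sketch, not an SDK function; `list_page_fn` stands in for `client.chat.completions.list` and is assumed to return pages exposing `.data` and `.has_more`.

```python
def iter_all_completions(list_page_fn, page_size: int = 100):
    """Yield every stored completion by following the `after` cursor.

    `list_page_fn` stands in for client.chat.completions.list; it is
    expected to return pages with `.data` (list of completions, each
    having an `.id`) and `.has_more` (bool).
    """
    after = None
    while True:
        page = list_page_fn(after=after, limit=page_size)
        yield from page.data
        if not page.has_more or not page.data:
            break  # no further pages to fetch
        after = page.data[-1].id  # resume after the last item seen
```

Usage would look like `for completion in iter_all_completions(client.chat.completions.list): ...`, assuming the real page objects expose the same attributes.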
### Stream Chat Completions

A wrapper over `create(stream=True)` that provides a more granular event API and automatic accumulation of deltas. Requires use within a context manager.

```python { .api }
def stream(
    self,
    *,
    messages: Iterable[ChatCompletionMessageParam],
    model: str | ChatModel,
    audio: ChatCompletionAudioParam | None | Omit = omit,
    frequency_penalty: float | None | Omit = omit,
    function_call: dict | str | Omit = omit,
    functions: Iterable[dict] | Omit = omit,
    logit_bias: dict[str, int] | None | Omit = omit,
    logprobs: bool | None | Omit = omit,
    max_completion_tokens: int | None | Omit = omit,
    max_tokens: int | None | Omit = omit,
    metadata: dict[str, str] | None | Omit = omit,
    modalities: list[Literal["text", "audio"]] | None | Omit = omit,
    n: int | None | Omit = omit,
    parallel_tool_calls: bool | Omit = omit,
    prediction: dict | None | Omit = omit,
    presence_penalty: float | None | Omit = omit,
    prompt_cache_key: str | Omit = omit,
    prompt_cache_retention: Literal["in-memory", "24h"] | None | Omit = omit,
    reasoning_effort: Literal["none", "minimal", "low", "medium", "high"] | None | Omit = omit,
    response_format: completion_create_params.ResponseFormat | Omit = omit,
    safety_identifier: str | Omit = omit,
    seed: int | None | Omit = omit,
    service_tier: Literal["auto", "default", "flex", "scale", "priority"] | None | Omit = omit,
    stop: str | list[str] | None | Omit = omit,
    store: bool | None | Omit = omit,
    stream_options: dict | None | Omit = omit,
    temperature: float | None | Omit = omit,
    tool_choice: ChatCompletionToolChoiceOptionParam | Omit = omit,
    tools: Iterable[ChatCompletionToolUnionParam] | Omit = omit,
    top_logprobs: int | None | Omit = omit,
    top_p: float | None | Omit = omit,
    user: str | Omit = omit,
    verbosity: Literal["low", "medium", "high"] | None | Omit = omit,
    web_search_options: dict | Omit = omit,
    extra_headers: dict[str, str] | None = None,
    extra_query: dict[str, object] | None = None,
    extra_body: dict[str, object] | None = None,
    timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,
) -> ChatCompletionStreamManager:
    """
    Streaming wrapper with a granular event API and automatic delta accumulation.

    Unlike create(stream=True), this method requires a context manager to prevent
    resource leaks. Yields detailed events including content.delta and content.done,
    and provides accumulated snapshots.

    Args:
        Same parameters as the create() method.

    Returns:
        ChatCompletionStreamManager: Context manager yielding stream events.
    """
```
**Usage Example:**

```python
from openai import OpenAI

client = OpenAI()

# Must be used within a context manager
with client.chat.completions.stream(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me a story"}],
) as stream:
    for event in stream:
        if event.type == "content.delta":
            print(event.delta, flush=True, end="")
        elif event.type == "content.done":
            print(f"\nFinal content: {event.content}")

# Access the accumulated result after streaming
print(f"Model: {stream.response.model}")
print(f"Total tokens: {stream.response.usage.total_tokens}")
```
### Streaming with Helpers

Advanced streaming with context managers for easier handling.

```python
from openai import OpenAI

client = OpenAI()

# Using create(stream=True) as a context manager ensures the connection is closed
with client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Count to 5"}],
    stream=True
) as stream:
    for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="")

# Iterating the stream directly (closed once fully consumed)
stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me a joke"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

# Stream with usage information
stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
    stream_options={"include_usage": True}
)

for chunk in stream:
    # The final chunk has an empty choices list, so guard before indexing
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
    # Usage is included only in the final chunk
    if chunk.usage:
        print(f"\nTokens used: {chunk.usage.total_tokens}")
```
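The `stream()` helper accumulates deltas for you; with raw `create(stream=True)` you do it yourself. A minimal sketch of manual accumulation, assuming chunks shaped like the examples above (`accumulate_content` is an illustrative helper, not an SDK function):

```python
def accumulate_content(chunks) -> str:
    """Rebuild the full assistant message text from streamed chunks.

    Each chunk carries an incremental `delta.content` fragment (or None);
    concatenating the fragments in order reproduces the complete message.
    """
    parts: list[str] = []
    for chunk in chunks:
        # Guard: the final chunk can have an empty choices list when
        # stream_options={"include_usage": True} is set.
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)
```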
## Types

```python { .api }
from typing import Any, Generic, Iterator, Literal, TypeVar
from typing_extensions import NotRequired, TypedDict

from pydantic import BaseModel
from openai.types.chat import (
    ChatCompletionToolUnionParam,
    ChatCompletionToolChoiceOptionParam,
    completion_create_params,
)

# Message types
ChatCompletionMessageParam = dict[str, Any]  # Union of message types

class ChatCompletionSystemMessageParam(TypedDict):
    role: Literal["system"]
    content: str
    name: NotRequired[str]

class ChatCompletionUserMessageParam(TypedDict):
    role: Literal["user"]
    content: str | list[dict]  # Text or multimodal content
    name: NotRequired[str]

class ChatCompletionAssistantMessageParam(TypedDict):
    role: Literal["assistant"]
    content: str | None
    name: NotRequired[str]
    tool_calls: NotRequired[list[dict]]

class ChatCompletionToolMessageParam(TypedDict):
    role: Literal["tool"]
    content: str
    tool_call_id: str

# Response types
class ChatCompletion(BaseModel):
    id: str
    choices: list[Choice]
    created: int
    model: str
    object: Literal["chat.completion"]
    system_fingerprint: str | None
    usage: CompletionUsage | None

class Choice(BaseModel):
    finish_reason: Literal["stop", "length", "tool_calls", "content_filter", "function_call"]
    index: int
    logprobs: Logprobs | None
    message: ChatCompletionMessage

class ChatCompletionMessage(BaseModel):
    content: str | None
    role: Literal["assistant"]
    tool_calls: list[ChatCompletionMessageToolCall] | None
    function_call: FunctionCall | None  # Deprecated
    audio: Audio | None  # When modalities includes audio

class ChatCompletionStoreMessage(BaseModel):
    """Message from a stored chat completion."""
    content: str | None
    role: Literal["system", "user", "assistant", "tool"]
    tool_calls: list[ChatCompletionMessageToolCall] | None
    tool_call_id: str | None  # For tool messages

class ChatCompletionMessageToolCall(BaseModel):
    id: str
    function: Function
    type: Literal["function"]

class Function(BaseModel):
    arguments: str  # JSON string
    name: str

class CompletionUsage(BaseModel):
    completion_tokens: int
    prompt_tokens: int
    total_tokens: int
    completion_tokens_details: CompletionTokensDetails | None

# Streaming types
class ChatCompletionChunk(BaseModel):
    id: str
    choices: list[ChunkChoice]
    created: int
    model: str
    object: Literal["chat.completion.chunk"]
    system_fingerprint: str | None
    usage: CompletionUsage | None  # Only in the final chunk with include_usage

class ChunkChoice(BaseModel):
    delta: ChoiceDelta
    finish_reason: str | None
    index: int
    logprobs: Logprobs | None

class ChoiceDelta(BaseModel):
    content: str | None
    role: Literal["assistant"] | None
    tool_calls: list[ChoiceDeltaToolCall] | None

# Parsed completion types
ResponseFormatT = TypeVar("ResponseFormatT", bound=BaseModel)

class ParsedChatCompletion(Generic[ResponseFormatT], ChatCompletion):
    """ChatCompletion with parsed content."""
    choices: list[ParsedChoice[ResponseFormatT]]

class ParsedChoice(Generic[ResponseFormatT], Choice):
    message: ParsedChatCompletionMessage[ResponseFormatT]

class ParsedChatCompletionMessage(Generic[ResponseFormatT], ChatCompletionMessage):
    parsed: ResponseFormatT | None
    tool_calls: list[ParsedFunctionToolCall] | None

class ParsedFunctionToolCall(ChatCompletionMessageToolCall):
    function: ParsedFunction
    type: Literal["function"]

class ParsedFunction(Function):
    parsed_arguments: BaseModel | None

# Deletion type
class ChatCompletionDeleted(BaseModel):
    id: str
    deleted: bool
    object: Literal["chat.completion.deleted"]

# Tool/function definitions
class ChatCompletionToolParam(TypedDict):
    type: Literal["function"]
    function: FunctionDefinition

class FunctionDefinition(TypedDict):
    name: str
    description: NotRequired[str]
    parameters: dict  # JSON Schema
    strict: NotRequired[bool]  # Enable strict schema adherence

# Response format types
class ResponseFormatText(TypedDict):
    type: Literal["text"]

class ResponseFormatJSONObject(TypedDict):
    type: Literal["json_object"]

class ResponseFormatJSONSchema(TypedDict):
    type: Literal["json_schema"]
    json_schema: JSONSchema

class JSONSchema(TypedDict):
    name: str
    description: NotRequired[str]
    schema: dict  # JSON Schema object
    strict: NotRequired[bool]

# Audio types
class ChatCompletionAudioParam(TypedDict):
    voice: Literal["alloy", "echo", "fable", "onyx", "nova", "shimmer"]
    format: Literal["wav", "mp3", "flac", "opus", "pcm16"]

# Streaming options
class ChatCompletionStreamOptionsParam(TypedDict):
    include_usage: NotRequired[bool]
    include_obfuscation: NotRequired[bool]

# Tool choice types
ChatCompletionToolChoiceOptionParam = (
    Literal["none", "auto", "required"] | dict
)

class ToolChoiceFunction(TypedDict):
    type: Literal["function"]
    function: FunctionChoice

class FunctionChoice(TypedDict):
    name: str

# Stream wrapper type
T = TypeVar("T")

class Stream(Generic[T]):
    def __iter__(self) -> Iterator[T]: ...
    def __next__(self) -> T: ...
    def __enter__(self) -> Stream[T]: ...
    def __exit__(self, *args) -> None: ...
    def close(self) -> None: ...
```
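When tool calls are streamed, `ChoiceDelta.tool_calls` delivers fragments keyed by `index`, with `function.arguments` split across chunks. The helper below is an illustrative sketch of reassembling them, not an SDK function; it assumes delta objects shaped like the types above (each fragment carrying `index`, an optional `id`, and partial `function.name`/`function.arguments`).

```python
def merge_tool_call_deltas(chunks) -> list[dict]:
    """Reassemble streamed tool calls from chunk deltas.

    Fragments for the same call share an `index`; the id and name arrive
    once, while the arguments JSON string arrives in pieces that must be
    concatenated in order.
    """
    calls: dict[int, dict] = {}
    for chunk in chunks:
        delta = chunk.choices[0].delta
        for tc in delta.tool_calls or []:
            entry = calls.setdefault(tc.index, {"id": None, "name": None, "arguments": ""})
            if tc.id:
                entry["id"] = tc.id
            if tc.function and tc.function.name:
                entry["name"] = tc.function.name
            if tc.function and tc.function.arguments:
                entry["arguments"] += tc.function.arguments
    return [calls[i] for i in sorted(calls)]
```

Once merged, each `arguments` string can be decoded with `json.loads` and dispatched like a non-streamed tool call.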
## Async Usage

All chat completion methods are available in async variants through `AsyncOpenAI`:

```python
import asyncio

from openai import AsyncOpenAI
from pydantic import BaseModel

async def main():
    client = AsyncOpenAI()

    # Async create - returns ChatCompletion or AsyncStream[ChatCompletionChunk]
    response = await client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.choices[0].message.content)

    # Async streaming
    stream = await client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Tell me a story"}],
        stream=True
    )
    async for chunk in stream:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="")

    # Async parse - returns ParsedChatCompletion with structured output
    class CalendarEvent(BaseModel):
        name: str
        date: str
        participants: list[str]

    response = await client.chat.completions.parse(
        model="gpt-4o-2024-08-06",
        messages=[{"role": "user", "content": "Alice and Bob are meeting on Friday"}],
        response_format=CalendarEvent
    )
    event = response.choices[0].message.parsed

    # Other async methods: retrieve, update, list, delete, stream
    # All have the same signatures as the sync versions

asyncio.run(main())
```

**Note**: `AsyncOpenAI` uses `AsyncStream[ChatCompletionChunk]` for streaming responses instead of `Stream[ChatCompletionChunk]`.
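Because the client is async, independent requests can run concurrently rather than one after another. The sketch below uses `asyncio.gather` for fan-out; `ask_many` is a hypothetical helper, not part of the SDK.

```python
import asyncio


async def ask_many(client, prompts, model: str = "gpt-4"):
    """Run several chat completions concurrently; results keep prompt order."""
    async def one(prompt: str) -> str:
        response = await client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

    # gather preserves input order even though requests complete out of order
    return await asyncio.gather(*(one(p) for p in prompts))
```

Typical use: `answers = await ask_many(client, ["Q1", "Q2", "Q3"])` inside an async function, subject to your rate limits.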
## Error Handling

```python
from openai import OpenAI, APIError, APIStatusError, RateLimitError

client = OpenAI()

try:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello"}]
    )
except RateLimitError as e:
    print(f"Rate limit hit: {e}")
    # Handle rate limiting (e.g., retry with backoff)
except APIStatusError as e:
    print(f"API error: {e.status_code} - {e.message}")
    # Handle other HTTP error responses
except APIError as e:
    print(f"API error: {e}")
    # Handle non-HTTP failures such as connection errors
```
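The rate-limit branch above suggests retrying with backoff. A generic sketch of that pattern follows; `retry_with_backoff` is a hypothetical helper, and note the SDK also retries some failures itself via the client's `max_retries` option.

```python
import random
import time


def retry_with_backoff(fn, *, retry_on=(Exception,), max_retries: int = 5,
                       base_delay: float = 1.0, sleep=time.sleep):
    """Call fn(), retrying on retry_on errors with jittered exponential backoff."""
    for attempt in range(max_retries):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error to the caller
            # Wait base_delay * 2^attempt plus random jitter to avoid thundering herd
            sleep(base_delay * 2 ** attempt + random.random())
```

Usage would look like `retry_with_backoff(lambda: client.chat.completions.create(model="gpt-4", messages=msgs), retry_on=(RateLimitError,))`.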