# Uploads

Upload large files in chunks for use with Assistants, Fine-tuning, and Batch processing. The Uploads API enables efficient multipart upload of files up to 8 GB, splitting them into 64 MB parts that can be uploaded in parallel.

## Capabilities

### Create Upload

Create an intermediate Upload object that accepts multiple parts.

```python { .api }
def create(
    self,
    *,
    bytes: int,
    filename: str,
    mime_type: str,
    purpose: FilePurpose,
    expires_after: dict | Omit = omit,
    extra_headers: dict[str, str] | None = None,
    extra_query: dict[str, object] | None = None,
    extra_body: dict[str, object] | None = None,
    timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,
) -> Upload:
    """
    Create an intermediate Upload object for adding file parts.

    Args:
        bytes: Total number of bytes in the file being uploaded.

        filename: Name of the file to upload.

        mime_type: MIME type of the file. Must be supported for the specified purpose.
            See https://platform.openai.com/docs/assistants/tools/file-search#supported-files

        purpose: Intended purpose of the file. Options:
            - "assistants": For use with the Assistants API
            - "batch": For batch processing
            - "fine-tune": For fine-tuning
            - "vision": For vision capabilities
            See https://platform.openai.com/docs/api-reference/files/create#files-create-purpose

        expires_after: Expiration policy for the file. Default: files with purpose="batch"
            expire after 30 days; others persist until manually deleted.
            {"anchor": "created_at", "days": 7} expires 7 days after creation.

        extra_headers: Additional HTTP headers.
        extra_query: Additional query parameters.
        extra_body: Additional JSON fields.
        timeout: Request timeout in seconds.

    Returns:
        Upload: Upload object with an ID to use for adding parts.
            Contains status, expires_at, and other metadata.

    Notes:
        - Maximum upload size: 8 GB
        - Upload expires 1 hour after creation
        - Must complete the upload before expiration
        - Each part can be at most 64 MB
    """
```
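
The 8 GB and 64 MB limits in the notes above determine how many part requests an upload needs. A quick sketch of that arithmetic (the constants come from the limits stated above; `part_count` is an illustrative helper, not part of the SDK):

```python
MAX_UPLOAD_BYTES = 8 * 1024**3  # 8 GB total upload limit
MAX_PART_BYTES = 64 * 1024**2   # 64 MB per-part limit

def part_count(total_bytes: int, part_size: int = MAX_PART_BYTES) -> int:
    """Number of parts needed to upload total_bytes in part_size chunks."""
    if total_bytes > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds the 8 GB upload limit")
    return -(-total_bytes // part_size)  # ceiling division

print(part_count(1024**3))  # a 1 GB file splits into 16 parts of 64 MB
```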

### Complete Upload

Finalize the upload after all parts have been added.

```python { .api }
def complete(
    self,
    upload_id: str,
    *,
    part_ids: list[str],
    md5: str | Omit = omit,
    extra_headers: dict[str, str] | None = None,
    extra_query: dict[str, object] | None = None,
    extra_body: dict[str, object] | None = None,
    timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,
) -> Upload:
    """
    Complete the Upload and create a File object.

    Args:
        upload_id: ID of the Upload to complete.

        part_ids: Ordered list of Part IDs. Order determines how parts are assembled.

        md5: Optional MD5 checksum to verify that the uploaded bytes match expectations.

        extra_headers: Additional HTTP headers.
        extra_query: Additional query parameters.
        extra_body: Additional JSON fields.
        timeout: Request timeout in seconds.

    Returns:
        Upload: Completed Upload object containing a nested File object
            ready for use in the rest of the platform.

    Notes:
        - Total bytes uploaded must match the bytes specified in create()
        - No parts can be added after completion
        - The Upload must not be cancelled or expired
    """
```

### Cancel Upload

Cancel an upload that is no longer needed.

```python { .api }
def cancel(
    self,
    upload_id: str,
    *,
    extra_headers: dict[str, str] | None = None,
    extra_query: dict[str, object] | None = None,
    extra_body: dict[str, object] | None = None,
    timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,
) -> Upload:
    """
    Cancel an Upload.

    Args:
        upload_id: ID of the Upload to cancel.

        extra_headers: Additional HTTP headers.
        extra_query: Additional query parameters.
        extra_body: Additional JSON fields.
        timeout: Request timeout in seconds.

    Returns:
        Upload: Cancelled Upload object with status="cancelled".

    Notes:
        - No parts can be added after cancellation
        - Previously uploaded parts are discarded
    """
```

### Upload File Chunked

High-level helper that handles the entire upload process automatically.

```python { .api }
def upload_file_chunked(
    self,
    *,
    file: str | os.PathLike | bytes,
    mime_type: str,
    purpose: FilePurpose,
    filename: str | None = None,
    bytes: int | None = None,
    part_size: int | None = None,
    md5: str | Omit = omit,
) -> Upload:
    """
    Upload a file in chunks automatically.

    This convenience method handles:
    1. Creating the Upload
    2. Splitting the file into parts
    3. Uploading each part sequentially
    4. Completing the Upload

    Args:
        file: File to upload. Can be:
            - Path-like object: Path("my-paper.pdf")
            - String path: "my-paper.pdf"
            - bytes: In-memory file data (requires filename and bytes args)

        mime_type: MIME type of the file (e.g., "application/pdf").

        purpose: Intended purpose ("assistants", "batch", "fine-tune", "vision").

        filename: Filename (required if file is bytes, optional otherwise).

        bytes: Total file size in bytes (required if file is bytes, optional otherwise).
            If not provided for a path, automatically determined from the file.

        part_size: Size of each part in bytes. Default: 64 MB (64 * 1024 * 1024).
            Each part uploads as a separate request.

        md5: Optional MD5 checksum for verification.

    Returns:
        Upload: Completed Upload object containing the File.

    Raises:
        TypeError: If filename or bytes is not provided for in-memory files.
        ValueError: If the file path is invalid or the file cannot be read.
    """
```

Usage examples:

```python
from pathlib import Path
from openai import OpenAI

client = OpenAI()

# Upload a file from disk (simplest approach)
upload = client.uploads.upload_file_chunked(
    file=Path("training_data.jsonl"),
    mime_type="application/jsonl",
    purpose="fine-tune",
)

print(f"Upload complete! File ID: {upload.file.id}")

# Upload with custom part size (e.g., 32 MB parts)
upload = client.uploads.upload_file_chunked(
    file="large_dataset.jsonl",
    mime_type="application/jsonl",
    purpose="batch",
    part_size=32 * 1024 * 1024,
)

# Upload in-memory bytes
file_data = b"..."  # Your file data
upload = client.uploads.upload_file_chunked(
    file=file_data,
    filename="document.pdf",
    bytes=len(file_data),
    mime_type="application/pdf",
    purpose="assistants",
)

# Upload with MD5 verification
upload = client.uploads.upload_file_chunked(
    file="important_data.csv",
    mime_type="text/csv",
    purpose="assistants",
    md5="5d41402abc4b2a76b9719d911017c592",
)
```
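
The `md5` value in the last example is the hex digest of the complete file's contents. One way to compute it with the standard library (a sketch; it streams the file in chunks so large files never need to fit in memory):

```python
import hashlib

def file_md5(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()
```

The result can be passed directly as the `md5` argument to `upload_file_chunked` or `complete`.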

### Create Part

Add a single part to an Upload.

```python { .api }
def create(
    self,
    upload_id: str,
    *,
    data: FileTypes,
    extra_headers: dict[str, str] | None = None,
    extra_query: dict[str, object] | None = None,
    extra_body: dict[str, object] | None = None,
    timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,
) -> UploadPart:
    """
    Add a Part to an Upload.

    Args:
        upload_id: ID of the Upload to add this Part to.

        data: Chunk of bytes for this Part. Maximum 64 MB.

        extra_headers: Additional HTTP headers.
        extra_query: Additional query parameters.
        extra_body: Additional JSON fields.
        timeout: Request timeout in seconds.

    Returns:
        UploadPart: Part object with an ID to use when completing the Upload.

    Notes:
        - Each Part can be at most 64 MB
        - Total size across all parts cannot exceed 8 GB
        - Parts can be added in parallel for faster uploads
        - Order is determined when completing the Upload
    """
```

Advanced manual upload example:

```python
import os

from openai import OpenAI

client = OpenAI()

# Step 1: Create the Upload
file_path = "large_file.pdf"
file_size = os.path.getsize(file_path)

upload = client.uploads.create(
    bytes=file_size,
    filename="large_file.pdf",
    mime_type="application/pdf",
    purpose="assistants",
)

print(f"Created upload: {upload.id}")

# Step 2: Upload parts
part_size = 64 * 1024 * 1024  # 64 MB
part_ids = []

with open(file_path, "rb") as f:
    while True:
        chunk = f.read(part_size)
        if not chunk:
            break

        part = client.uploads.parts.create(
            upload_id=upload.id,
            data=chunk,
        )
        part_ids.append(part.id)
        print(f"Uploaded part {len(part_ids)}: {part.id}")

# Step 3: Complete the Upload
completed = client.uploads.complete(
    upload_id=upload.id,
    part_ids=part_ids,
)

print(f"Upload complete! File ID: {completed.file.id}")

# Handle errors by cancelling
try:
    # ... upload process ...
    pass
except Exception as e:
    print(f"Error during upload: {e}")
    client.uploads.cancel(upload_id=upload.id)
    print("Upload cancelled")
```
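
On long uploads, individual part requests can fail transiently. A minimal retry helper that the part-upload loop above could wrap around `client.uploads.parts.create` (a sketch, not part of the SDK):

```python
import time

def with_retries(fn, *, attempts: int = 3, base_delay: float = 1.0):
    """Call fn(), retrying with exponential backoff on any exception."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the last error
            time.sleep(base_delay * 2**attempt)
```

Usage inside the loop: `part = with_retries(lambda: client.uploads.parts.create(upload_id=upload.id, data=chunk))`. Because parts are idempotent until `complete()` is called, retrying a failed part request is safe.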

Parallel upload example:

```python
import os
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

client = OpenAI()

def upload_part(upload_id: str, part_data: bytes) -> str:
    """Upload a single part and return its ID."""
    part = client.uploads.parts.create(
        upload_id=upload_id,
        data=part_data,
    )
    return part.id

# Create upload
file_path = "large_file.pdf"
file_size = os.path.getsize(file_path)

upload = client.uploads.create(
    bytes=file_size,
    filename="large_file.pdf",
    mime_type="application/pdf",
    purpose="assistants",
)

# Split file into chunks
part_size = 64 * 1024 * 1024
chunks = []

with open(file_path, "rb") as f:
    while True:
        chunk = f.read(part_size)
        if not chunk:
            break
        chunks.append(chunk)

# Upload parts in parallel; executor.map preserves input order,
# so part_ids ends up in the correct assembly order
with ThreadPoolExecutor(max_workers=4) as executor:
    part_ids = list(executor.map(
        lambda chunk: upload_part(upload.id, chunk),
        chunks,
    ))

# Complete upload
completed = client.uploads.complete(
    upload_id=upload.id,
    part_ids=part_ids,
)

print(f"Parallel upload complete! File ID: {completed.file.id}")
```

## Async Usage

```python
import asyncio

from openai import AsyncOpenAI

async def upload_file():
    client = AsyncOpenAI()

    # Async upload
    upload = await client.uploads.upload_file_chunked(
        file="data.jsonl",
        mime_type="application/jsonl",
        purpose="fine-tune",
    )

    return upload.file.id

file_id = asyncio.run(upload_file())
```

## Types

```python { .api }
from typing import Literal, Optional, Tuple, Union

from pydantic import BaseModel

class Upload(BaseModel):
    """Upload object containing metadata and status."""
    id: str
    bytes: int
    created_at: int
    expires_at: int
    filename: str
    object: Literal["upload"]
    purpose: FilePurpose
    status: Literal["pending", "completed", "cancelled", "expired"]
    file: FileObject | None  # Present when status="completed"

class UploadPart(BaseModel):
    """Part object representing a chunk of an upload."""
    id: str
    created_at: int
    object: Literal["upload.part"]
    upload_id: str

FilePurpose = Literal["assistants", "batch", "fine-tune", "vision"]

FileTypes = Union[
    FileContent,
    Tuple[Optional[str], FileContent],
    Tuple[Optional[str], FileContent, Optional[str]],
]

class Omit:
    """Sentinel value for omitted parameters."""
```
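
Since `file` is only populated when `status == "completed"`, callers may want a small guard before using the nested File object. A sketch (`completed_file_id` is an illustrative helper; `upload` can be any object with the fields above):

```python
def completed_file_id(upload) -> str:
    """Return the nested File ID, raising if the Upload did not complete."""
    if upload.status != "completed" or upload.file is None:
        raise RuntimeError(
            f"upload {upload.id} is not usable (status={upload.status})"
        )
    return upload.file.id
```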

## Access Pattern

```python
# Synchronous
from openai import OpenAI
client = OpenAI()
client.uploads.create(...)
client.uploads.complete(...)
client.uploads.cancel(...)
client.uploads.upload_file_chunked(...)
client.uploads.parts.create(...)

# Asynchronous
from openai import AsyncOpenAI
client = AsyncOpenAI()
await client.uploads.create(...)
await client.uploads.complete(...)
await client.uploads.cancel(...)
await client.uploads.upload_file_chunked(...)
await client.uploads.parts.create(...)
```