# Folder Operations

Recursive downloading of Google Drive folders with directory-structure preservation and batch file handling.

## Capabilities

### Folder Download Function

Downloads entire Google Drive folders with recursive structure preservation, supporting up to 50 files per folder.

```python { .api }
from typing import Union, List

def download_folder(
    url=None,
    id=None,
    output=None,
    quiet=False,
    proxy=None,
    speed=None,
    use_cookies=True,
    remaining_ok=False,
    verify=True,
    user_agent=None,
    skip_download: bool = False,
    resume=False,
) -> Union[List[str], List[GoogleDriveFileToDownload], None]:
    """
    Downloads an entire folder from a Google Drive URL.

    Parameters:
    - url (str): Google Drive folder URL of the form 'https://drive.google.com/drive/folders/{id}'.
    - id (str): Google Drive folder ID. Cannot be used together with the url parameter.
    - output (str): Output directory path. If None, the folder name from Google Drive is used.
    - quiet (bool): Suppress terminal output. Default: False.
    - proxy (str): Proxy configuration of the form 'protocol://host:port'.
    - speed (float): Download speed limit in bytes per second.
    - use_cookies (bool): Use cookies from ~/.cache/gdown/cookies.txt. Default: True.
    - remaining_ok (bool): Allow downloading folders at the maximum file limit (50 files). Default: False.
    - verify (bool/str): TLS certificate verification. True/False, or a path to a CA bundle. Default: True.
    - user_agent (str): Custom user-agent string.
    - skip_download (bool): Return the file list without downloading (dry run). Default: False.
    - resume (bool): Resume interrupted downloads and skip completed files. Default: False.

    Returns:
        Union[List[str], List[GoogleDriveFileToDownload], None]:
        - If skip_download=False: list of downloaded file paths, or None if the download failed.
        - If skip_download=True: list of GoogleDriveFileToDownload objects.

    Raises:
        FolderContentsMaximumLimitError: The folder contains more than 50 files.
        FileURLRetrievalError: The folder cannot be accessed or file URLs cannot be retrieved.
        ValueError: Both url and id are specified, or neither is.
    """
```

### Data Types

```python { .api }
import collections

GoogleDriveFileToDownload = collections.namedtuple(
    "GoogleDriveFileToDownload",
    ("id", "path", "local_path")
)
```

Named tuple container for file download information, with the following fields:

- **id** (str): Google Drive file ID
- **path** (str): Relative path within the folder structure
- **local_path** (str): Local filesystem path where the file will be saved

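Because it is a plain named tuple, the dry-run listing (shown under Usage Examples below) can be reshaped freely. A minimal sketch that builds a `{relative path: file ID}` mapping from the fields above; the folder URL is a placeholder:

```python
import gdown

# Dry run: returns GoogleDriveFileToDownload tuples instead of downloading
files = gdown.download_folder(
    "https://drive.google.com/drive/folders/FOLDER_ID",
    skip_download=True,
)

# Index the listing by relative path using the named-tuple fields
path_to_id = {f.path: f.id for f in files}
print(path_to_id)
```
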
## Usage Examples

### Basic Folder Download

```python
import gdown

# Download an entire folder
folder_url = "https://drive.google.com/drive/folders/15uNXeRBIhVvZJIhL4yTw4IsStMhUaaxl"
downloaded_files = gdown.download_folder(folder_url, output="./my_folder")

# download_folder returns None on failure, so guard before iterating
if downloaded_files is not None:
    print(f"Downloaded {len(downloaded_files)} files:")
    for file_path in downloaded_files:
        print(f"  {file_path}")
```

### Folder Download with ID

```python
# Using the folder ID directly
folder_id = "15uNXeRBIhVvZJIhL4yTw4IsStMhUaaxl"
downloaded_files = gdown.download_folder(id=folder_id, output="./dataset")
```

### Dry Run (List Files Without Downloading)

```python
# Get the file list without downloading
folder_url = "https://drive.google.com/drive/folders/FOLDER_ID"
file_info = gdown.download_folder(folder_url, skip_download=True)

print("Files in folder:")
for file_obj in file_info:
    print(f"ID: {file_obj.id}")
    print(f"Path: {file_obj.path}")
    print(f"Local path: {file_obj.local_path}")
    print("---")
```

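A dry run pairs naturally with selective downloading: individual files from the listing can be fetched one at a time with `gdown.download`, which accepts an `id`. A sketch continuing from the block above; the `.csv` filter is illustrative:

```python
import os

# Download only the CSV files found by the dry run, one by one
for file_obj in file_info:
    if file_obj.path.endswith(".csv"):
        # Create the target directory before downloading into it
        os.makedirs(os.path.dirname(file_obj.local_path) or ".", exist_ok=True)
        gdown.download(id=file_obj.id, output=file_obj.local_path, quiet=False)
```
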
### Resume Interrupted Downloads

```python
# Resume a partial folder download
gdown.download_folder(
    folder_url,
    output="./large_dataset",
    resume=True,
    quiet=False,  # show progress for resumed files
)
```

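Because `resume=True` skips files that are already complete, it combines well with a simple retry loop on flaky connections. A minimal sketch (the retry count and delay are arbitrary, and production code might also catch gdown's exceptions):

```python
import time

# Retry the folder download, resuming from wherever the last attempt stopped
files = None
for attempt in range(3):
    files = gdown.download_folder(folder_url, output="./large_dataset", resume=True)
    if files is not None:
        break  # download completed
    time.sleep(10)  # back off before retrying
```
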
### Advanced Configuration

```python
# Folder download with a speed limit and a proxy
gdown.download_folder(
    url=folder_url,
    output="./data",
    speed=2 * 1024 * 1024,  # 2 MB/s limit
    proxy="http://corporate-proxy:8080",
    use_cookies=True,
    remaining_ok=True,  # allow folders at the 50-file limit
)
```

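The `verify` and `user_agent` parameters are documented above but not demonstrated. A sketch for networks with TLS interception; the CA-bundle path and user-agent string are placeholders:

```python
# Custom CA bundle and user agent for proxied or TLS-inspected networks
gdown.download_folder(
    url=folder_url,
    output="./data",
    verify="/etc/ssl/certs/corporate-ca.pem",  # placeholder path to a CA bundle
    user_agent="my-data-pipeline/1.0",  # placeholder user-agent string
)
```
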
## Folder Structure Preservation

gdown maintains the original Google Drive folder structure:

```
Original Google Drive:
📁 Dataset/
├── 📁 train/
│   ├── image1.jpg
│   └── image2.jpg
├── 📁 test/
│   └── image3.jpg
└── README.txt

Downloaded Structure:
./my_folder/
├── train/
│   ├── image1.jpg
│   └── image2.jpg
├── test/
│   └── image3.jpg
└── README.txt
```

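To confirm the structure locally after a download, a short sketch that prints the directory tree with `os.walk`:

```python
import os

# Print the downloaded directory tree with simple indentation
base = "./my_folder"
for root, dirs, files in os.walk(base):
    rel = os.path.relpath(root, base)
    depth = 0 if rel == "." else rel.count(os.sep) + 1
    print("  " * depth + (os.path.basename(root) or root) + "/")
    for name in files:
        print("  " * (depth + 1) + name)
```
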
## Limitations and Constraints

### File Count Limit

- **Maximum**: 50 files per folder (a limit of how gdown retrieves folder listings)
- **Behavior**: Raises `FolderContentsMaximumLimitError` by default
- **Override**: Use `remaining_ok=True` to allow downloading at the limit

### Supported File Types

- All file types supported by Google Drive
- Google Workspace documents (Docs/Sheets/Slides) are downloaded in their default export formats
- Binary files, images, archives, etc.

### Authentication

```python
# For private folders, place cookies in ~/.cache/gdown/cookies.txt
# Format: Mozilla/Netscape cookie jar

# Or disable cookies for public folders
gdown.download_folder(url, use_cookies=False)
```

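Whether a cookie jar is present can be checked up front, falling back to cookie-less access for public folders. A small sketch:

```python
import os.path as osp

# Use cookies only when the jar actually exists
cookies_txt = osp.expanduser("~/.cache/gdown/cookies.txt")
gdown.download_folder(url, use_cookies=osp.exists(cookies_txt))
```
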
## Error Handling

```python
import gdown
from gdown.exceptions import FolderContentsMaximumLimitError, FileURLRetrievalError

try:
    files = gdown.download_folder("https://drive.google.com/drive/folders/FOLDER_ID")
    # download_folder returns None on failure without raising
    if files is None:
        print("Download failed")
    else:
        print(f"Successfully downloaded {len(files)} files")

except FolderContentsMaximumLimitError:
    print("Folder contains more than 50 files. Use remaining_ok=True to download anyway.")

except FileURLRetrievalError as e:
    print(f"Failed to access folder: {e}")
    # Check folder permissions, URL validity, or network connectivity

except ValueError as e:
    print(f"Invalid parameters: {e}")
```

### Handling Large Folders

```python
def download_large_folder(folder_url, output_dir):
    """Download a folder with proper error handling for the size limit."""
    try:
        # First try a normal download
        return gdown.download_folder(folder_url, output=output_dir)

    except FolderContentsMaximumLimitError:
        print("Folder is at the maximum file limit (50 files)")

        # Option 1: download anyway
        response = input("Download anyway? (y/n): ")
        if response.lower() == "y":
            return gdown.download_folder(
                folder_url,
                output=output_dir,
                remaining_ok=True,
            )

        # Option 2: get the file list for manual selection
        # (remaining_ok=True is needed here too; the 50-file check
        # also applies to dry runs)
        file_list = gdown.download_folder(
            folder_url, skip_download=True, remaining_ok=True
        )
        print(f"Folder contains {len(file_list)} files:")
        for i, file_obj in enumerate(file_list[:10]):  # show first 10
            print(f"{i + 1}. {file_obj.path}")

        return None
```

## Best Practices

### Batch Processing

```python
def process_dataset_folder(folder_url):
    """Download and process an entire dataset folder."""

    # Download with resume support
    files = gdown.download_folder(
        folder_url,
        output="./dataset",
        resume=True,
        quiet=False,
    )
    if files is None:
        return None  # download failed

    # Process files by type
    for file_path in files:
        if file_path.endswith(".csv"):
            # Process CSV files
            print(f"Processing CSV: {file_path}")
        elif file_path.endswith((".jpg", ".png")):
            # Process images
            print(f"Processing image: {file_path}")

    return files
```

### Monitoring Progress

```python
# For large folders, monitor download progress
import os

import gdown

def monitor_folder_download(folder_url, output_dir):
    """Download a folder with progress monitoring."""

    # Get the file list first; pass output so each local_path
    # matches the real download location
    file_list = gdown.download_folder(
        folder_url, output=output_dir, skip_download=True
    )
    total_files = len(file_list)

    print(f"Preparing to download {total_files} files...")

    # Start the actual download
    downloaded_files = gdown.download_folder(
        folder_url,
        output=output_dir,
        quiet=False,
        resume=True,
    )

    if downloaded_files:
        print(f"✓ Successfully downloaded {len(downloaded_files)}/{total_files} files")

    # Verify that all expected files exist
    missing = []
    for expected_file in file_list:
        if not os.path.exists(expected_file.local_path):
            missing.append(expected_file.path)

    if missing:
        print(f"⚠️ Missing {len(missing)} files:")
        for path in missing[:5]:  # show first 5
            print(f"  - {path}")

    return downloaded_files
```