# Folder Operations

Recursive downloading of Google Drive folders with directory structure preservation and batch file handling.

## Capabilities

### Folder Download Function

Downloads entire Google Drive folders with recursive structure preservation, supporting up to 50 files per folder.

```python { .api }
from typing import Union, List

def download_folder(
    url=None,
    id=None,
    output=None,
    quiet=False,
    proxy=None,
    speed=None,
    use_cookies=True,
    remaining_ok=False,
    verify=True,
    user_agent=None,
    skip_download: bool = False,
    resume=False
) -> Union[List[str], List[GoogleDriveFileToDownload], None]:
    """
    Downloads an entire folder from a Google Drive URL.

    Parameters:
    - url (str): Google Drive folder URL in the format 'https://drive.google.com/drive/folders/{id}'.
    - id (str): Google Drive folder ID. Cannot be used together with the url parameter.
    - output (str): Output directory path. If None, uses the folder name from Google Drive.
    - quiet (bool): Suppress terminal output. Default: False.
    - proxy (str): Proxy configuration in the format 'protocol://host:port'.
    - speed (float): Download speed limit in bytes per second.
    - use_cookies (bool): Use cookies from ~/.cache/gdown/cookies.txt. Default: True.
    - remaining_ok (bool): Allow downloading folders at the maximum file limit (50 files). Default: False.
    - verify (bool/str): TLS certificate verification. True/False, or a path to a CA bundle. Default: True.
    - user_agent (str): Custom user agent string.
    - skip_download (bool): Return the file list without downloading (dry run). Default: False.
    - resume (bool): Resume interrupted downloads and skip completed files. Default: False.

    Returns:
    Union[List[str], List[GoogleDriveFileToDownload], None]:
    - If skip_download=False: list of downloaded file paths, or None if the download failed.
    - If skip_download=True: list of GoogleDriveFileToDownload objects.

    Raises:
    FolderContentsMaximumLimitError: When the folder contains more than 50 files.
    FileURLRetrievalError: When unable to access the folder or retrieve file URLs.
    ValueError: When both url and id are specified, or neither.
    """
```

### Data Types

```python { .api }
import collections

GoogleDriveFileToDownload = collections.namedtuple(
    "GoogleDriveFileToDownload",
    ("id", "path", "local_path")
)
```

Named tuple container for file download information with the following fields:

- **id** (str): Google Drive file ID
- **path** (str): Relative path within the folder structure
- **local_path** (str): Local filesystem path where the file will be saved
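
Because `GoogleDriveFileToDownload` is a plain `collections.namedtuple`, instances support both attribute access and positional unpacking. A short sketch built on a dry run (the folder URL is a placeholder):

```python
import gdown

# skip_download=True returns GoogleDriveFileToDownload tuples
files = gdown.download_folder(
    "https://drive.google.com/drive/folders/FOLDER_ID", skip_download=True
)

# Attribute access: build an id -> local path manifest
manifest = {f.id: f.local_path for f in files}

# Tuple unpacking follows the field order (id, path, local_path)
for file_id, path, local_path in files:
    print(file_id, "->", local_path)
```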

## Usage Examples

### Basic Folder Download

```python
import gdown

# Download entire folder
folder_url = "https://drive.google.com/drive/folders/15uNXeRBIhVvZJIhL4yTw4IsStMhUaaxl"
downloaded_files = gdown.download_folder(folder_url, output="./my_folder")

print(f"Downloaded {len(downloaded_files)} files:")
for file_path in downloaded_files:
    print(f"  {file_path}")
```

### Folder Download with ID

```python
# Using the folder ID directly
folder_id = "15uNXeRBIhVvZJIhL4yTw4IsStMhUaaxl"
downloaded_files = gdown.download_folder(id=folder_id, output="./dataset")
```

### Dry Run (List Files Without Downloading)

```python
# Get the file list without downloading
folder_url = "https://drive.google.com/drive/folders/FOLDER_ID"
file_info = gdown.download_folder(folder_url, skip_download=True)

print("Files in folder:")
for file_obj in file_info:
    print(f"ID: {file_obj.id}")
    print(f"Path: {file_obj.path}")
    print(f"Local path: {file_obj.local_path}")
    print("---")
```

### Resume Interrupted Downloads

```python
# Resume a partial folder download
gdown.download_folder(
    folder_url,
    output="./large_dataset",
    resume=True,
    quiet=False  # Show progress for resumed files
)
```

### Advanced Configuration

```python
# Folder download with a speed limit and proxy
gdown.download_folder(
    url=folder_url,
    output="./data",
    speed=2 * 1024 * 1024,  # 2 MB/s limit
    proxy="http://corporate-proxy:8080",
    use_cookies=True,
    remaining_ok=True  # Allow folders at the 50-file limit
)
```
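
The `verify` parameter from the signature above also accepts a CA bundle path, which is useful behind TLS-intercepting proxies. A brief sketch (the bundle path below is hypothetical):

```python
# Verify TLS against a custom CA bundle instead of the system store
gdown.download_folder(
    url=folder_url,
    output="./data",
    verify="/etc/ssl/certs/corporate-ca-bundle.pem",  # hypothetical path
)
```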

## Folder Structure Preservation

gdown maintains the original Google Drive folder structure:

```
Original Google Drive:
πŸ“ Dataset/
β”œβ”€β”€ πŸ“ train/
β”‚   β”œβ”€β”€ image1.jpg
β”‚   └── image2.jpg
β”œβ”€β”€ πŸ“ test/
β”‚   └── image3.jpg
└── README.txt

Downloaded Structure:
./my_folder/
β”œβ”€β”€ train/
β”‚   β”œβ”€β”€ image1.jpg
β”‚   └── image2.jpg
β”œβ”€β”€ test/
β”‚   └── image3.jpg
└── README.txt
```
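
To confirm the preserved layout after a download completes, a quick walk over the output directory with the standard library is enough. A minimal sketch, assuming the `./my_folder` output used above:

```python
import os

# List every downloaded file relative to the output directory
for root, _dirs, files in os.walk("./my_folder"):
    for name in files:
        print(os.path.relpath(os.path.join(root, name), "./my_folder"))
```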

## Limitations and Constraints

### File Count Limit

- **Maximum**: 50 files per folder (Google Drive API restriction)
- **Behavior**: Raises `FolderContentsMaximumLimitError` by default
- **Override**: Use `remaining_ok=True` to allow the download at the limit (a pre-check sketch follows this list)
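
If you want to know in advance whether a folder will hit the limit, a dry run can act as a probe. A minimal sketch, assuming the listing step performs the same limit check as a real download:

```python
import gdown
from gdown.exceptions import FolderContentsMaximumLimitError

def folder_file_count(folder_url):
    """Return the number of listed files, or None if the folder exceeds the limit."""
    try:
        # skip_download=True lists the files without downloading them
        return len(gdown.download_folder(folder_url, skip_download=True))
    except FolderContentsMaximumLimitError:
        # More than 50 files; the listing itself is capped
        return None
```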

### Supported File Types

- All file types supported by Google Drive
- Google Workspace documents (Docs/Sheets/Slides) are downloaded in their default export formats
- Binary files, images, archives, etc.

### Authentication

```python
# For private folders, place cookies in ~/.cache/gdown/cookies.txt
# Format: Mozilla/Netscape cookie jar

# Or disable cookies for public folders only
gdown.download_folder(url, use_cookies=False)
```
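
If cookie-based access misbehaves, you can sanity-check that the file parses as a Mozilla/Netscape jar using the standard library. A small diagnostic sketch, separate from the gdown API:

```python
import os
from http.cookiejar import MozillaCookieJar

# Load the same cookie file gdown reads by default
cookie_path = os.path.expanduser("~/.cache/gdown/cookies.txt")
jar = MozillaCookieJar(cookie_path)
jar.load(ignore_discard=True, ignore_expires=True)  # raises LoadError on a bad format
print(f"Loaded {len(jar)} cookies")
```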

## Error Handling

```python
from gdown.exceptions import FolderContentsMaximumLimitError, FileURLRetrievalError

try:
    files = gdown.download_folder("https://drive.google.com/drive/folders/FOLDER_ID")
    print(f"Successfully downloaded {len(files)} files")

except FolderContentsMaximumLimitError:
    print("Folder contains more than 50 files. Use remaining_ok=True to download anyway.")

except FileURLRetrievalError as e:
    print(f"Failed to access folder: {e}")
    # Check folder permissions, URL validity, or network connectivity

except ValueError as e:
    print(f"Invalid parameters: {e}")
```

### Handling Large Folders

```python
def download_large_folder(folder_url, output_dir):
    """Download a folder with proper error handling for size limits."""
    try:
        # First try a normal download
        return gdown.download_folder(folder_url, output=output_dir)

    except FolderContentsMaximumLimitError:
        print("Folder is at the maximum size limit (50 files)")

        # Option 1: Download anyway
        response = input("Download anyway? (y/n): ")
        if response.lower() == 'y':
            return gdown.download_folder(
                folder_url,
                output=output_dir,
                remaining_ok=True
            )

        # Option 2: Get the file list for manual selection
        # (remaining_ok=True so the listing does not re-raise the limit error)
        file_list = gdown.download_folder(
            folder_url, skip_download=True, remaining_ok=True
        )
        print(f"Folder contains {len(file_list)} files:")
        for i, file_obj in enumerate(file_list[:10]):  # Show the first 10
            print(f"{i + 1}. {file_obj.path}")

        return None
```

## Best Practices

### Batch Processing

```python
def process_dataset_folder(folder_url):
    """Download and process an entire dataset folder."""

    # Download with resume support
    files = gdown.download_folder(
        folder_url,
        output="./dataset",
        resume=True,
        quiet=False
    )

    # download_folder returns None on failure, so guard before iterating
    if files is None:
        return None

    # Process files by type
    for file_path in files:
        if file_path.endswith('.csv'):
            # Process CSV files
            print(f"Processing CSV: {file_path}")
        elif file_path.endswith(('.jpg', '.png')):
            # Process images
            print(f"Processing image: {file_path}")

    return files
```

### Monitoring Progress

```python
# For large folders, monitor download progress
import os

def monitor_folder_download(folder_url, output_dir):
    """Download a folder with progress monitoring."""

    # Get the file list first (pass output so local_path matches the real target)
    file_list = gdown.download_folder(
        folder_url, output=output_dir, skip_download=True
    )
    total_files = len(file_list)

    print(f"Preparing to download {total_files} files...")

    # Start the actual download
    downloaded_files = gdown.download_folder(
        folder_url,
        output=output_dir,
        quiet=False,
        resume=True
    )

    if downloaded_files:
        print(f"βœ… Successfully downloaded {len(downloaded_files)}/{total_files} files")

    # Verify that all expected files exist
    missing = []
    for expected_file in file_list:
        if not os.path.exists(expected_file.local_path):
            missing.append(expected_file.path)

    if missing:
        print(f"⚠️ Missing {len(missing)} files:")
        for path in missing[:5]:  # Show the first 5
            print(f"  - {path}")

    return downloaded_files
```