# Folder Operations

Recursive downloading of Google Drive folders with directory structure preservation and batch file handling.

## Capabilities

### Folder Download Function

Downloads entire Google Drive folders with recursive structure preservation, supporting up to 50 files per folder.

```python { .api }
from typing import Union, List

def download_folder(
    url=None,
    id=None,
    output=None,
    quiet=False,
    proxy=None,
    speed=None,
    use_cookies=True,
    remaining_ok=False,
    verify=True,
    user_agent=None,
    skip_download: bool = False,
    resume=False
) -> Union[List[str], List[GoogleDriveFileToDownload], None]:
    """
    Downloads an entire folder from a Google Drive URL.

    Parameters:
    - url (str): Google Drive folder URL in the format 'https://drive.google.com/drive/folders/{id}'.
    - id (str): Google Drive folder ID. Cannot be used together with the url parameter.
    - output (str): Output directory path. If None, uses the folder name from Google Drive.
    - quiet (bool): Suppress terminal output. Default: False.
    - proxy (str): Proxy configuration in the format 'protocol://host:port'.
    - speed (float): Download speed limit in bytes per second.
    - use_cookies (bool): Use cookies from ~/.cache/gdown/cookies.txt. Default: True.
    - remaining_ok (bool): Allow downloading folders at the maximum file limit (50 files). Default: False.
    - verify (bool/str): TLS certificate verification. True/False, or a path to a CA bundle. Default: True.
    - user_agent (str): Custom user agent string.
    - skip_download (bool): Return the file list without downloading (dry run). Default: False.
    - resume (bool): Resume interrupted downloads and skip completed files. Default: False.

    Returns:
    Union[List[str], List[GoogleDriveFileToDownload], None]:
    - If skip_download=False: list of downloaded file paths, or None if the download failed.
    - If skip_download=True: list of GoogleDriveFileToDownload objects.

    Raises:
    FolderContentsMaximumLimitError: When the folder contains more than 50 files.
    FileURLRetrievalError: When unable to access the folder or retrieve file URLs.
    ValueError: When both url and id are specified, or neither.
    """
```

### Data Types

```python { .api }
import collections

GoogleDriveFileToDownload = collections.namedtuple(
    "GoogleDriveFileToDownload",
    ("id", "path", "local_path")
)
```

Named tuple container for file download information with the following fields:

- **id** (str): Google Drive file ID
- **path** (str): Relative path within the folder structure
- **local_path** (str): Local filesystem path where the file will be saved
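
Because `GoogleDriveFileToDownload` is a plain `collections.namedtuple`, instances support both attribute access and positional unpacking. A short sketch built on a dry run (the folder URL is a placeholder):

```python
import gdown

# skip_download=True returns GoogleDriveFileToDownload tuples
files = gdown.download_folder(
    "https://drive.google.com/drive/folders/FOLDER_ID", skip_download=True
)

# Attribute access: build an id -> local path manifest
manifest = {f.id: f.local_path for f in files}

# Tuple unpacking follows the field order (id, path, local_path)
for file_id, path, local_path in files:
    print(file_id, "->", local_path)
```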

## Usage Examples

### Basic Folder Download

```python
import gdown

# Download entire folder
folder_url = "https://drive.google.com/drive/folders/15uNXeRBIhVvZJIhL4yTw4IsStMhUaaxl"
downloaded_files = gdown.download_folder(folder_url, output="./my_folder")

print(f"Downloaded {len(downloaded_files)} files:")
for file_path in downloaded_files:
    print(f"  {file_path}")
```

### Folder Download with ID

```python
# Using the folder ID directly
folder_id = "15uNXeRBIhVvZJIhL4yTw4IsStMhUaaxl"
downloaded_files = gdown.download_folder(id=folder_id, output="./dataset")
```

### Dry Run (List Files Without Downloading)

```python
# Get the file list without downloading
folder_url = "https://drive.google.com/drive/folders/FOLDER_ID"
file_info = gdown.download_folder(folder_url, skip_download=True)

print("Files in folder:")
for file_obj in file_info:
    print(f"ID: {file_obj.id}")
    print(f"Path: {file_obj.path}")
    print(f"Local path: {file_obj.local_path}")
    print("---")
```

### Resume Interrupted Downloads

```python
# Resume a partial folder download
gdown.download_folder(
    folder_url,
    output="./large_dataset",
    resume=True,
    quiet=False  # Show progress for resumed files
)
```

### Advanced Configuration

```python
# Folder download with a speed limit and proxy
gdown.download_folder(
    url=folder_url,
    output="./data",
    speed=2 * 1024 * 1024,  # 2 MB/s limit
    proxy="http://corporate-proxy:8080",
    use_cookies=True,
    remaining_ok=True  # Allow folders at the 50-file limit
)
```
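
The `verify` parameter from the signature above also accepts a CA bundle path, which is useful behind TLS-intercepting proxies. A brief sketch (the bundle path below is hypothetical):

```python
# Verify TLS against a custom CA bundle instead of the system store
gdown.download_folder(
    url=folder_url,
    output="./data",
    verify="/etc/ssl/certs/corporate-ca-bundle.pem",  # hypothetical path
)
```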

## Folder Structure Preservation

gdown maintains the original Google Drive folder structure:

```
Original Google Drive:
πŸ“ Dataset/
β”œβ”€β”€ πŸ“ train/
β”‚   β”œβ”€β”€ image1.jpg
β”‚   └── image2.jpg
β”œβ”€β”€ πŸ“ test/
β”‚   └── image3.jpg
└── README.txt

Downloaded Structure:
./my_folder/
β”œβ”€β”€ train/
β”‚   β”œβ”€β”€ image1.jpg
β”‚   └── image2.jpg
β”œβ”€β”€ test/
β”‚   └── image3.jpg
└── README.txt
```
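
To confirm the preserved layout after a download completes, a quick walk over the output directory with the standard library is enough. A minimal sketch, assuming the `./my_folder` output used above:

```python
import os

# List every downloaded file relative to the output directory
for root, _dirs, files in os.walk("./my_folder"):
    for name in files:
        print(os.path.relpath(os.path.join(root, name), "./my_folder"))
```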

## Limitations and Constraints

### File Count Limit

- **Maximum**: 50 files per folder (Google Drive API restriction)
- **Behavior**: Raises `FolderContentsMaximumLimitError` by default
- **Override**: Use `remaining_ok=True` to allow the download at the limit (a pre-check sketch follows this list)
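
If you want to know in advance whether a folder will hit the limit, a dry run can act as a probe. A minimal sketch, assuming the listing step performs the same limit check as a real download:

```python
import gdown
from gdown.exceptions import FolderContentsMaximumLimitError

def folder_file_count(folder_url):
    """Return the number of listed files, or None if the folder exceeds the limit."""
    try:
        # skip_download=True lists the files without downloading them
        return len(gdown.download_folder(folder_url, skip_download=True))
    except FolderContentsMaximumLimitError:
        # More than 50 files; the listing itself is capped
        return None
```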

### Supported File Types

- All file types supported by Google Drive
- Google Workspace documents (Docs/Sheets/Slides) are downloaded in their default export formats
- Binary files, images, archives, etc.

### Authentication

```python
# For private folders, place cookies in ~/.cache/gdown/cookies.txt
# Format: Mozilla/Netscape cookie jar

# Or disable cookies for public folders only
gdown.download_folder(url, use_cookies=False)
```
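
If cookie-based access misbehaves, you can sanity-check that the file parses as a Mozilla/Netscape jar using the standard library. A small diagnostic sketch, separate from the gdown API:

```python
import os
from http.cookiejar import MozillaCookieJar

# Load the same cookie file gdown reads by default
cookie_path = os.path.expanduser("~/.cache/gdown/cookies.txt")
jar = MozillaCookieJar(cookie_path)
jar.load(ignore_discard=True, ignore_expires=True)  # raises LoadError on a bad format
print(f"Loaded {len(jar)} cookies")
```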

## Error Handling

```python
from gdown.exceptions import FolderContentsMaximumLimitError, FileURLRetrievalError

try:
    files = gdown.download_folder("https://drive.google.com/drive/folders/FOLDER_ID")
    print(f"Successfully downloaded {len(files)} files")

except FolderContentsMaximumLimitError:
    print("Folder contains more than 50 files. Use remaining_ok=True to download anyway.")

except FileURLRetrievalError as e:
    print(f"Failed to access folder: {e}")
    # Check folder permissions, URL validity, or network connectivity

except ValueError as e:
    print(f"Invalid parameters: {e}")
```

### Handling Large Folders

```python
def download_large_folder(folder_url, output_dir):
    """Download a folder with proper error handling for size limits."""
    try:
        # First try a normal download
        return gdown.download_folder(folder_url, output=output_dir)

    except FolderContentsMaximumLimitError:
        print("Folder is at the maximum size limit (50 files)")

        # Option 1: Download anyway
        response = input("Download anyway? (y/n): ")
        if response.lower() == 'y':
            return gdown.download_folder(
                folder_url,
                output=output_dir,
                remaining_ok=True
            )

        # Option 2: Get the file list for manual selection
        # (remaining_ok=True so the listing does not re-raise the limit error)
        file_list = gdown.download_folder(
            folder_url, skip_download=True, remaining_ok=True
        )
        print(f"Folder contains {len(file_list)} files:")
        for i, file_obj in enumerate(file_list[:10]):  # Show the first 10
            print(f"{i + 1}. {file_obj.path}")

        return None
```

## Best Practices

### Batch Processing

```python
def process_dataset_folder(folder_url):
    """Download and process an entire dataset folder."""

    # Download with resume support
    files = gdown.download_folder(
        folder_url,
        output="./dataset",
        resume=True,
        quiet=False
    )

    # download_folder returns None on failure, so guard before iterating
    if files is None:
        return None

    # Process files by type
    for file_path in files:
        if file_path.endswith('.csv'):
            # Process CSV files
            print(f"Processing CSV: {file_path}")
        elif file_path.endswith(('.jpg', '.png')):
            # Process images
            print(f"Processing image: {file_path}")

    return files
```

### Monitoring Progress

```python
# For large folders, monitor download progress
import os

def monitor_folder_download(folder_url, output_dir):
    """Download a folder with progress monitoring."""

    # Get the file list first (pass output so local_path matches the real target)
    file_list = gdown.download_folder(
        folder_url, output=output_dir, skip_download=True
    )
    total_files = len(file_list)

    print(f"Preparing to download {total_files} files...")

    # Start the actual download
    downloaded_files = gdown.download_folder(
        folder_url,
        output=output_dir,
        quiet=False,
        resume=True
    )

    if downloaded_files:
        print(f"βœ… Successfully downloaded {len(downloaded_files)}/{total_files} files")

    # Verify that all expected files exist
    missing = []
    for expected_file in file_list:
        if not os.path.exists(expected_file.local_path):
            missing.append(expected_file.path)

    if missing:
        print(f"⚠️ Missing {len(missing)} files:")
        for path in missing[:5]:  # Show the first 5
            print(f"  - {path}")

    return downloaded_files
```