
# Download Protocols

Specialized downloader classes for different protocols and authentication methods. Each downloader handles one protocol and exposes options for customizing authentication, headers, and connection parameters.

## Capabilities

### Automatic Downloader Selection

Automatically chooses the appropriate downloader based on the URL's protocol (`http`, `https`, `ftp`, `sftp`, or `doi`).

```python { .api }
def choose_downloader(url: str, progressbar: bool = False) -> callable:
    """
    Choose the appropriate downloader for the given URL.

    Parameters:
    - url: The URL for which to choose a downloader
    - progressbar: If True, the returned downloader will display a progress bar

    Returns:
    A downloader instance (a callable) appropriate for the URL's protocol

    Raises:
    - ValueError: If the URL's protocol is not recognized
    """
```
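The dispatch can be pictured as a lookup from the URL's scheme to a downloader class. The sketch below is a simplified stand-in, not Pooch's own code: `pick_downloader_name` and `KNOWN_DOWNLOADERS` are hypothetical names, and the helper returns a class name as a string rather than constructing a downloader, so it runs without Pooch installed.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from URL scheme to the downloader class that would handle it
KNOWN_DOWNLOADERS = {
    "http": "HTTPDownloader",
    "https": "HTTPDownloader",
    "ftp": "FTPDownloader",
    "sftp": "SFTPDownloader",
    "doi": "DOIDownloader",
}


def pick_downloader_name(url: str) -> str:
    """Return the name of the downloader class suited to the URL's scheme."""
    # urlsplit lowercases the scheme; "doi:10.5281/..." yields the scheme "doi"
    scheme = urlsplit(url).scheme
    if scheme not in KNOWN_DOWNLOADERS:
        raise ValueError(
            f"Unrecognized URL protocol '{scheme}' in '{url}'. "
            f"Must be one of {sorted(KNOWN_DOWNLOADERS)}."
        )
    return KNOWN_DOWNLOADERS[scheme]


print(pick_downloader_name("doi:10.5281/zenodo.3939050/data.csv"))  # DOIDownloader
```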

### HTTP/HTTPS Downloads

Downloads files over HTTP/HTTPS with support for authentication, custom headers, and progress bars. Authentication, headers, and other request options are passed as keyword arguments, which are forwarded to `requests.get`.

```python { .api }
class HTTPDownloader:
    """Download files over HTTP/HTTPS with optional authentication."""

    def __init__(
        self,
        progressbar: bool = False,
        chunk_size: int = 1024,
        **kwargs
    ):
        """
        Parameters:
        - progressbar: If True, will display a progress bar during download. Requires tqdm
        - chunk_size: Files are streamed/downloaded in chunks of this size (in bytes)
        - **kwargs: Extra keyword arguments to forward to requests.get
        """

    def __call__(self, url: str, output_file: str, pooch: object) -> None:
        """
        Download the given URL to the given output file.

        Parameters:
        - url: The URL to the file that will be downloaded
        - output_file: Path (and file name) to which the file will be downloaded
        - pooch: The Pooch instance that is calling this method
        """
```
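To illustrate what `chunk_size` controls, here is a minimal chunked copy. It assumes an in-memory `io.BytesIO` in place of the HTTP response stream that `HTTPDownloader` actually reads, and `copy_in_chunks` is a hypothetical helper, not part of Pooch.

```python
import io


def copy_in_chunks(source, destination, chunk_size=1024):
    """Copy a binary stream in chunks of at most chunk_size bytes.

    Returns the number of chunks written.
    """
    n_chunks = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty bytes object signals the end of the stream
            break
        destination.write(chunk)
        n_chunks += 1
    return n_chunks


# 2500 bytes with the default chunk size: two full chunks plus one partial chunk
source = io.BytesIO(b"x" * 2500)
destination = io.BytesIO()
print(copy_in_chunks(source, destination))  # 3
```

Larger chunks mean fewer read/write steps at the cost of more memory held per step.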

### FTP Downloads

Downloads files over FTP with optional authentication.

```python { .api }
class FTPDownloader:
    """Download files over FTP with optional authentication."""

    def __init__(
        self,
        port: int = 21,
        username: str = "anonymous",
        password: str = "",
        account: str = "",
        timeout: float | None = None,
        progressbar: bool = False,
        chunk_size: int = 1024
    ):
        """
        Parameters:
        - port: Port used by the FTP server. Defaults to 21
        - username: The username used to log in to the FTP server. Defaults to 'anonymous'
        - password: The password used to log in to the FTP server. Defaults to an empty string
        - account: Account information for the FTP server. Usually not required
        - timeout: Timeout in seconds for blocking operations
        - progressbar: If True, will display a progress bar during download. Requires tqdm
        - chunk_size: Files are streamed/downloaded in chunks of this size (in bytes)
        """

    def __call__(self, url: str, output_file: str, pooch: object) -> None:
        """
        Download the given URL to the given output file.

        Parameters:
        - url: The URL to the file that will be downloaded
        - output_file: Path (and file name) to which the file will be downloaded
        - pooch: The Pooch instance that is calling this method
        """
```
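For orientation, the sequence of standard-library `ftplib` calls behind an FTP download (connect, log in, `RETR` in blocks, quit) is sketched below. It is an illustration under assumptions, not Pooch's source: `ftp_download` and its `ftp_factory` hook are hypothetical, and the port comes from the argument rather than from the URL, mirroring the `port` parameter above.

```python
import ftplib
from urllib.parse import urlsplit


def ftp_download(url, output_file, port=21, username="anonymous", password="",
                 timeout=None, chunk_size=1024, ftp_factory=ftplib.FTP):
    """Fetch an ftp:// URL and write it to the local path output_file.

    ftp_factory is a hypothetical hook so a fake client can be used in tests.
    """
    parts = urlsplit(url)
    ftp = ftp_factory(timeout=timeout)
    ftp.connect(host=parts.hostname, port=port)
    try:
        ftp.login(user=username, passwd=password)
        with open(output_file, "wb") as out:
            # RETR streams the remote file; each block of up to chunk_size
            # bytes is handed to out.write
            ftp.retrbinary(f"RETR {parts.path}", out.write, blocksize=chunk_size)
    finally:
        ftp.quit()
```

Injecting the client keeps the network out of tests; in normal use the default `ftplib.FTP` is used.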

### SFTP Downloads

Downloads files over SFTP (SSH File Transfer Protocol) with authentication. Requires paramiko to be installed.

```python { .api }
class SFTPDownloader:
    """Download files over SFTP (SSH File Transfer Protocol)."""

    def __init__(
        self,
        port: int = 22,
        username: str = "anonymous",
        password: str = "",
        account: str = "",
        timeout: float | None = None,
        progressbar: bool = False
    ):
        """
        Parameters:
        - port: Port used by the SFTP server. Defaults to 22
        - username: The username used to log in to the SFTP server. Defaults to 'anonymous'
        - password: The password used to log in to the SFTP server. Defaults to an empty string
        - account: Account information for the SFTP server. Usually not required
        - timeout: Timeout in seconds for the connection
        - progressbar: If True, will display a progress bar during download. Requires tqdm
        """

    def __call__(self, url: str, output_file: str, pooch: object) -> None:
        """
        Download the given URL to the given output file.

        Parameters:
        - url: The URL to the file that will be downloaded
        - output_file: Path (and file name) to which the file will be downloaded
        - pooch: The Pooch instance that is calling this method
        """
```

### DOI-based Repository Downloads

Downloads files from data repositories (Zenodo, Figshare, Dataverse) using DOI identifiers. Uses the repository APIs to resolve DOI URLs to actual HTTP download links.

```python { .api }
class DOIDownloader:
    """
    Download files from data repositories using DOI identifiers.

    Supported repositories:
    - figshare (www.figshare.com)
    - Zenodo (www.zenodo.org)
    - Dataverse instances (dataverse.org)

    DOI URL format: doi:{DOI}/{filename}
    Example: doi:10.5281/zenodo.3939050/data.csv
    """

    def __init__(
        self,
        progressbar: bool = False,
        chunk_size: int = 1024,
        **kwargs
    ):
        """
        Parameters:
        - progressbar: If True, will display a progress bar during download. Requires tqdm
        - chunk_size: Files are streamed/downloaded in chunks of this size (in bytes)
        - **kwargs: Extra keyword arguments to forward to requests.get for HTTP requests
        """

    def __call__(self, url: str, output_file: str, pooch: object) -> None:
        """
        Download the given DOI URL to the given output file.

        Parameters:
        - url: The DOI URL in format 'doi:{DOI}/{filename}' pointing to a file in a supported repository
        - output_file: Path (and file name) to which the file will be downloaded
        - pooch: The Pooch instance that is calling this method
        """
```
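Because a DOI itself contains slashes, the file name in a `doi:{DOI}/{filename}` URL can be taken as everything after the last `/`. The hypothetical `split_doi_url` below sketches that split; it is a simplification, and Pooch's own URL parsing may treat some cases (for example, file names containing `/`) differently.

```python
def split_doi_url(url: str) -> tuple:
    """Split 'doi:{DOI}/{filename}' into (DOI, filename).

    Simplified: assumes the file name is everything after the last '/'.
    """
    prefix = "doi:"
    if not url.startswith(prefix):
        raise ValueError(f"Not a DOI URL: '{url}'")
    # rpartition splits on the last '/', keeping the DOI's own slashes intact
    doi, sep, filename = url[len(prefix):].rpartition("/")
    if not sep or not doi or not filename:
        raise ValueError(f"Expected 'doi:{{DOI}}/{{filename}}', got '{url}'")
    return doi, filename


print(split_doi_url("doi:10.5281/zenodo.3939050/data.csv"))
# ('10.5281/zenodo.3939050', 'data.csv')
```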

### DOI Helper Functions

Utility functions for working with DOI-based downloads.

```python { .api }
def doi_to_url(doi: str) -> str:
    """
    Follow a DOI link to resolve the URL of the archive.

    Parameters:
    - doi: The DOI of the archive

    Returns:
    The URL of the archive in the data repository
    """


def doi_to_repository(doi: str) -> object:
    """
    Instantiate a data repository instance from a given DOI.

    Parameters:
    - doi: The DOI of the archive

    Returns:
    The data repository object for the DOI
    """
```
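Conceptually, the two helpers follow the `https://doi.org/{DOI}` redirect to the archive's landing page and then decide which repository hosts it. The sketch below is illustrative only and assumes a simple host-name check (`resolve_doi`, `guess_repository`, and `KNOWN_HOSTS` are hypothetical). Dataverse instances run on many different domains, so they cannot be recognized this way; a real implementation needs to query the repository's API for them.

```python
import urllib.request
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical host-name table; Dataverse is deliberately absent (many domains)
KNOWN_HOSTS = {"zenodo.org": "Zenodo", "figshare.com": "figshare"}


def resolve_doi(doi: str, timeout: float = 30) -> str:
    """Follow the doi.org redirect chain and return the final URL (needs network)."""
    with urllib.request.urlopen(f"https://doi.org/{doi}", timeout=timeout) as response:
        # urlopen follows redirects; geturl() reports where it ended up
        return response.geturl()


def guess_repository(archive_url: str) -> Optional[str]:
    """Guess the hosting repository from the landing URL's host name."""
    host = (urlsplit(archive_url).hostname or "").lower()
    for domain, name in KNOWN_HOSTS.items():
        if host == domain or host.endswith("." + domain):
            return name
    return None  # unknown host: possibly a Dataverse instance
```

A typical call would be `guess_repository(resolve_doi("10.5281/zenodo.3939050"))`, which requires network access.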

## Usage Examples

### HTTP Downloads with Authentication

```python
import pooch

# Create an HTTP downloader with basic authentication and custom headers
# (both are forwarded to requests.get)
downloader = pooch.HTTPDownloader(
    progressbar=True,
    auth=("username", "password"),
    headers={"User-Agent": "MyApp/1.0"}
)

# Use with retrieve
fname = pooch.retrieve(
    "https://example.com/protected/data.csv",
    known_hash="md5:abc123...",
    downloader=downloader
)
```
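Hard-coding credentials, as above, is convenient for testing but risky in shared code. One option, sketched here under assumptions (the variable names `DATA_USER` and `DATA_PASSWORD` and the helper `http_downloader_kwargs` are hypothetical), is to read them from environment variables and pass the result as keyword arguments.

```python
import os


def http_downloader_kwargs(user_var="DATA_USER", password_var="DATA_PASSWORD"):
    """Build keyword arguments for an HTTP downloader from environment variables.

    The default variable names are hypothetical; choose your own.
    """
    kwargs = {"headers": {"User-Agent": "MyApp/1.0"}}
    user = os.environ.get(user_var)
    password = os.environ.get(password_var)
    if user and password:
        # requests.get treats a (user, password) tuple as HTTP Basic auth
        kwargs["auth"] = (user, password)
    return kwargs
```

With Pooch installed, this would be used as `pooch.HTTPDownloader(progressbar=True, **http_downloader_kwargs())`.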

### FTP Downloads

```python
import pooch

# Create FTP downloader with authentication
downloader = pooch.FTPDownloader(
    port=21,
    username="myuser",
    password="mypassword",
    progressbar=True
)

# Use with retrieve
fname = pooch.retrieve(
    "ftp://ftp.example.com/data/dataset.zip",
    known_hash="sha256:def456...",
    downloader=downloader
)
```

### SFTP Downloads

```python
import pooch

# Create SFTP downloader
downloader = pooch.SFTPDownloader(
    port=22,
    username="myuser",
    password="mypassword",
    progressbar=True
)

# Use with retrieve
fname = pooch.retrieve(
    "sftp://secure.example.com/data/dataset.tar.gz",
    known_hash="sha256:ghi789...",
    downloader=downloader
)
```

### DOI-based Downloads

```python
import pooch

# Create DOI downloader
downloader = pooch.DOIDownloader(progressbar=True)

# Download from Zenodo using DOI
fname = pooch.retrieve(
    "doi:10.5281/zenodo.3939050/tiny-data.txt",
    known_hash="md5:70e2afd3fd7e336ae478b1e740a5f08e",
    downloader=downloader
)

# Or omit the downloader: retrieve automatically chooses DOIDownloader for doi: URLs
fname = pooch.retrieve(
    "doi:10.5281/zenodo.3939050/tiny-data.txt",
    known_hash="md5:70e2afd3fd7e336ae478b1e740a5f08e",
)
```

### Custom Downloaders

```python
import pooch


def my_custom_downloader(url, output_file, pooch_instance):
    """Custom downloader function.

    Pooch calls it as downloader(url, output_file, pooch). The third
    parameter is renamed here so it does not shadow the pooch module.
    """
    # Implement custom download logic: fetch `url` and write its
    # contents to `output_file`
    raise NotImplementedError("Replace with your own download logic")


# Use custom downloader
fname = pooch.retrieve(
    "custom://example.com/data.txt",
    known_hash="sha256:abc123...",
    downloader=my_custom_downloader
)
```
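As a concrete variant of the placeholder above, the following sketch serves files from a local mirror directory. It follows the same `(url, output_file, pooch)` calling convention; `make_mirror_downloader` and the `<mirror>/<host>/<path>` layout are assumptions made for illustration.

```python
import shutil
from pathlib import Path
from urllib.parse import urlsplit


def make_mirror_downloader(mirror_dir):
    """Return a downloader that copies files from a local mirror directory.

    Hypothetical layout: custom://<host>/<path> maps to <mirror_dir>/<host>/<path>.
    """
    mirror = Path(mirror_dir)

    def mirror_downloader(url, output_file, pooch_instance):
        parts = urlsplit(url)
        source = mirror / (parts.hostname or "") / parts.path.lstrip("/")
        if not source.is_file():
            raise FileNotFoundError(f"'{url}' not found in mirror at '{source}'")
        # Handle both an already-open binary file object and a file path
        if hasattr(output_file, "write"):
            with open(source, "rb") as src:
                shutil.copyfileobj(src, output_file)
        else:
            shutil.copyfile(source, output_file)

    return mirror_downloader
```

It could then be passed as `downloader=make_mirror_downloader("/srv/mirror")` in the `pooch.retrieve` call above, where `/srv/mirror` is a hypothetical path.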