# Download Protocols

Specialized downloader classes for different protocols and authentication methods. Each downloader handles one protocol and exposes options for authentication, headers, and connection parameters.

## Capabilities

### Automatic Downloader Selection

`choose_downloader` selects the appropriate downloader based on the URL's protocol.

```python { .api }
def choose_downloader(url: str, progressbar: bool = False) -> callable:
    """
    Choose the appropriate downloader for the given URL.

    Parameters:
    - url: The URL for which to choose a downloader
    - progressbar: If True, will use a downloader that displays a progress bar

    Returns:
    A downloader function appropriate for the URL's protocol
    """
```
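The selection can be pictured as a lookup keyed on the URL scheme. The following is only a sketch of that idea, not pooch's actual implementation: `pick_downloader_name` and the `SCHEME_TO_DOWNLOADER` table are hypothetical names, and the mapping is inferred from the downloader classes documented in this section.

```python
from urllib.parse import urlsplit

# Hypothetical lookup table: URL scheme -> name of the downloader class
# documented in this section. This is an assumption for illustration.
SCHEME_TO_DOWNLOADER = {
    "http": "HTTPDownloader",
    "https": "HTTPDownloader",
    "ftp": "FTPDownloader",
    "sftp": "SFTPDownloader",
    "doi": "DOIDownloader",
}


def pick_downloader_name(url: str) -> str:
    """Return the name of the downloader class that would handle `url`."""
    scheme = urlsplit(url).scheme.lower()
    if scheme not in SCHEME_TO_DOWNLOADER:
        raise ValueError(f"Unrecognized URL protocol '{scheme}' in '{url}'.")
    return SCHEME_TO_DOWNLOADER[scheme]
```

A URL with an unknown scheme raises `ValueError` in this sketch, which is why a custom downloader has to be passed explicitly for non-standard protocols.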

### HTTP/HTTPS Downloads

Downloads files over HTTP/HTTPS with support for authentication, custom headers, and progress bars.

```python { .api }
class HTTPDownloader:
    """Download files over HTTP/HTTPS with optional authentication."""

    def __init__(
        self,
        progressbar: bool = False,
        chunk_size: int = 1024,
        **kwargs
    ):
        """
        Parameters:
        - progressbar: If True, will display a progress bar during download. Requires tqdm
        - chunk_size: Files are streamed/downloaded in chunks of this size (in bytes)
        - **kwargs: Extra keyword arguments to forward to requests.get
        """

    def __call__(self, url: str, output_file: str, pooch: object) -> None:
        """
        Download the given URL to the given output file.

        Parameters:
        - url: The URL to the file that will be downloaded
        - output_file: Path (and file name) to which the file will be downloaded
        - pooch: The Pooch instance that is calling this method
        """
```
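To make the role of `chunk_size` concrete, here is a minimal sketch of chunked streaming using in-memory streams. It is an illustration only: `copy_in_chunks` is a hypothetical helper, and the real downloader reads from a network response rather than a `BytesIO` object.

```python
import io


def copy_in_chunks(source, destination, chunk_size=1024):
    """Copy a binary stream piece by piece and return the bytes written."""
    total = 0
    while True:
        chunk = source.read(chunk_size)  # at most chunk_size bytes per read
        if not chunk:  # an empty read means the stream is exhausted
            break
        destination.write(chunk)
        total += len(chunk)
    return total


# 2500 bytes are copied as chunks of 1024, 1024, and 452 bytes
source = io.BytesIO(b"x" * 2500)
destination = io.BytesIO()
written = copy_in_chunks(source, destination, chunk_size=1024)
```

Streaming this way keeps memory use bounded by the chunk size instead of the file size.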

### FTP Downloads

Downloads files over FTP with optional authentication.

```python { .api }
class FTPDownloader:
    """Download files over FTP with optional authentication."""

    def __init__(
        self,
        port: int = 21,
        username: str = "anonymous",
        password: str = "",
        account: str = "",
        timeout: float | None = None,
        progressbar: bool = False,
        chunk_size: int = 1024
    ):
        """
        Parameters:
        - port: Port used by the FTP server. Defaults to 21
        - username: The username used to login to the FTP server. Defaults to 'anonymous'
        - password: The password used to login to the FTP server. Defaults to empty string
        - account: Account information for the FTP server. Usually not required
        - timeout: Timeout in seconds for blocking operations
        - progressbar: If True, will display a progress bar during download. Requires tqdm
        - chunk_size: Files are streamed/downloaded in chunks of this size (in bytes)
        """

    def __call__(self, url: str, output_file: str, pooch: object) -> None:
        """
        Download the given URL to the given output file.

        Parameters:
        - url: The URL to the file that will be downloaded
        - output_file: Path (and file name) to which the file will be downloaded
        - pooch: The Pooch instance that is calling this method
        """
```

### SFTP Downloads

Downloads files over SFTP (SSH File Transfer Protocol) with authentication.

```python { .api }
class SFTPDownloader:
    """Download files over SFTP (SSH File Transfer Protocol)."""

    def __init__(
        self,
        port: int = 22,
        username: str = "anonymous",
        password: str = "",
        account: str = "",
        timeout: float | None = None,
        progressbar: bool = False
    ):
        """
        Parameters:
        - port: Port used by the SFTP server. Defaults to 22
        - username: The username used to login to the SFTP server. Defaults to 'anonymous'
        - password: The password used to login to the SFTP server. Defaults to empty string
        - account: Account information for the SFTP server. Usually not required
        - timeout: Timeout in seconds for the connection
        - progressbar: If True, will display a progress bar during download. Requires tqdm
        """

    def __call__(self, url: str, output_file: str, pooch: object) -> None:
        """
        Download the given URL to the given output file.

        Parameters:
        - url: The URL to the file that will be downloaded
        - output_file: Path (and file name) to which the file will be downloaded
        - pooch: The Pooch instance that is calling this method
        """
```

### DOI-based Repository Downloads

Downloads files from data repositories (Zenodo, Figshare, Dataverse) using DOIs. The downloader queries the repository's API to resolve a DOI URL to the actual HTTP download link.

```python { .api }
class DOIDownloader:
    """
    Download files from data repositories using DOI identifiers.

    Supported repositories:
    - figshare (www.figshare.com)
    - Zenodo (www.zenodo.org)
    - Dataverse instances (dataverse.org)

    DOI URL format: doi:{DOI}/{filename}
    Example: doi:10.5281/zenodo.3939050/data.csv
    """

    def __init__(
        self,
        progressbar: bool = False,
        chunk_size: int = 1024,
        **kwargs
    ):
        """
        Parameters:
        - progressbar: If True, will display a progress bar during download. Requires tqdm
        - chunk_size: Files are streamed/downloaded in chunks of this size (in bytes)
        - **kwargs: Extra keyword arguments to forward to requests.get for HTTP requests
        """

    def __call__(self, url: str, output_file: str, pooch: object) -> None:
        """
        Download the given DOI URL to the given output file.

        Parameters:
        - url: The DOI URL in format 'doi:{DOI}/{filename}' pointing to a file in a supported repository
        - output_file: Path (and file name) to which the file will be downloaded
        - pooch: The Pooch instance that is calling this method
        """
```
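The `doi:{DOI}/{filename}` format can be split into its two parts with plain string handling. The sketch below is an illustration under stated assumptions, not pooch's own parsing code: `split_doi_url` is a hypothetical helper, and it assumes the file name itself contains no slash.

```python
from typing import Tuple


def split_doi_url(url: str) -> Tuple[str, str]:
    """Split 'doi:{DOI}/{filename}' into a (DOI, filename) pair."""
    prefix = "doi:"
    if not url.startswith(prefix):
        raise ValueError(f"Not a DOI URL: '{url}'")
    # A DOI contains a slash of its own, so split on the LAST slash:
    # everything before it is the DOI, everything after is the file name.
    doi, separator, filename = url[len(prefix):].rpartition("/")
    if not separator or not doi or not filename:
        raise ValueError(f"Expected 'doi:{{DOI}}/{{filename}}', got '{url}'")
    return doi, filename


doi, filename = split_doi_url("doi:10.5281/zenodo.3939050/data.csv")
```

Here `doi` holds `10.5281/zenodo.3939050` and `filename` holds `data.csv`.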

### DOI Helper Functions

Utility functions for working with DOI-based downloads.

```python { .api }
def doi_to_url(doi: str) -> str:
    """
    Follow a DOI link to resolve the URL of the archive.

    Parameters:
    - doi: The DOI of the archive

    Returns:
    The URL of the archive in the data repository
    """

def doi_to_repository(doi: str) -> object:
    """
    Instantiate a data repository instance from a given DOI.

    Parameters:
    - doi: The DOI of the archive

    Returns:
    The data repository object for the DOI
    """
```

## Usage Examples

### HTTP Downloads with Authentication

```python
import pooch

# Create an HTTP downloader with basic authentication and custom headers
downloader = pooch.HTTPDownloader(
    progressbar=True,
    auth=("username", "password"),
    headers={"User-Agent": "MyApp/1.0"}
)

# Use with retrieve
fname = pooch.retrieve(
    "https://example.com/protected/data.csv",
    known_hash="md5:abc123...",
    downloader=downloader
)
```

### FTP Downloads

```python
import pooch

# Create FTP downloader with authentication
downloader = pooch.FTPDownloader(
    port=21,
    username="myuser",
    password="mypassword",
    progressbar=True
)

# Use with retrieve
fname = pooch.retrieve(
    "ftp://ftp.example.com/data/dataset.zip",
    known_hash="sha256:def456...",
    downloader=downloader
)
```

### SFTP Downloads

```python
import pooch

# Create SFTP downloader
downloader = pooch.SFTPDownloader(
    port=22,
    username="myuser",
    password="mypassword",
    progressbar=True
)

# Use with retrieve
fname = pooch.retrieve(
    "sftp://secure.example.com/data/dataset.tar.gz",
    known_hash="sha256:ghi789...",
    downloader=downloader
)
```

### DOI-based Downloads

```python
import pooch

# Create DOI downloader
downloader = pooch.DOIDownloader(progressbar=True)

# Download from Zenodo using DOI
fname = pooch.retrieve(
    "doi:10.5281/zenodo.3939050/tiny-data.txt",
    known_hash="md5:70e2afd3fd7e336ae478b1e740a5f08e",
    downloader=downloader
)

# Or use automatic downloader selection
fname = pooch.retrieve(
    "doi:10.5281/zenodo.3939050/tiny-data.txt",
    known_hash="md5:70e2afd3fd7e336ae478b1e740a5f08e",
    # Automatically chooses DOIDownloader for doi: URLs
)
```

### Custom Downloaders

```python
import pooch

def my_custom_downloader(url, output_file, pooch):
    """Custom downloader function."""
    # Implement custom download logic
    pass

# Use custom downloader
fname = pooch.retrieve(
    "custom://example.com/data.txt",
    known_hash="sha256:abc123...",
    downloader=my_custom_downloader
)
```
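The placeholder above can be filled in with any callable that accepts `(url, output_file, pooch)`. As a minimal sketch, the hypothetical `local_file_downloader` below handles `file://` URLs by copying a local file using only the standard library. The name and the choice of `file://` are assumptions made for illustration, and the function is exercised directly here rather than through `pooch.retrieve`.

```python
import shutil
import tempfile
from pathlib import Path
from urllib.parse import urlsplit
from urllib.request import url2pathname


def local_file_downloader(url, output_file, pooch):
    """Hypothetical downloader: copy the local file named by a file:// URL."""
    parts = urlsplit(url)
    if parts.scheme != "file":
        raise ValueError(f"Expected a file:// URL, got '{url}'")
    source = url2pathname(parts.path)  # convert the URL path to an OS path
    shutil.copyfile(source, output_file)


# Call the downloader directly; the `pooch` argument is unused in this sketch
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source.txt"
    source.write_text("hello")
    target = Path(tmp) / "copy.txt"
    local_file_downloader(source.as_uri(), str(target), pooch=None)
    copied = target.read_text()
```

A function like this would be passed as `downloader=local_file_downloader`, in the same way as `my_custom_downloader` in the example above.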