# Pooch


Pooch is a Python library that manages data by downloading files from servers (HTTP, FTP, and data repositories such as Zenodo and figshare) only when needed and storing them in a local data cache. It is implemented in pure Python with minimal dependencies, includes built-in post-processors for unzipping and decompressing data, and is designed to be extended with custom downloaders and processors.


## Package Information


- **Package Name**: pooch
- **Language**: Python
- **Installation**: `pip install pooch`


## Core Imports


```python
import pooch
```


For common usage patterns:


```python
from pooch import retrieve, create, Pooch
```


## Basic Usage


```python
import pooch

# Download a single file with hash verification
fname = pooch.retrieve(
    url="https://github.com/fatiando/pooch/raw/v1.8.2/data/tiny-data.txt",
    known_hash="md5:70e2afd3fd7e336ae478b1e740a5f08e",
)

# For managing multiple files, create a Pooch instance
data_manager = pooch.create(
    path=pooch.os_cache("myproject"),
    base_url="https://github.com/myproject/data/raw/{version}/",
    version="v1.0.0",
    registry={
        "dataset1.csv": "md5:ab12cd34ef56...",
        "dataset2.zip": "sha256:12345abc...",
    }
)

# Fetch files from the registry
data_file = data_manager.fetch("dataset1.csv")
```


## Architecture


Pooch is built around three main concepts:


- **Data Management**: Central `Pooch` class manages registries of files with their expected hashes and download URLs
- **Download Protocol**: Extensible downloader system supporting HTTP, FTP, SFTP, and DOI-based repositories
- **Post-Processing**: Processor chain for automatic decompression, unpacking, and custom transformations


This design enables scientific reproducibility by ensuring consistent data versions across different environments while supporting flexible data hosting and processing workflows.


## Capabilities


### Core Data Management


Primary functionality for downloading and caching individual files, or for managing collections of data files with version-specific URLs and cache folders and hash verification.


```python { .api }
def retrieve(url, known_hash, fname=None, path=None, processor=None, downloader=None, progressbar=False): ...
def create(path, base_url, version=None, version_dev="master", env=None, registry=None, urls=None, retry_if_failed=0, allow_updates=True): ...
class Pooch: ...
```


[Core Data Management](./core-data-management.md)


### File Download Protocols


Specialized downloader classes for different protocols and authentication methods, including HTTP/HTTPS with custom headers, FTP with authentication, SFTP, and DOI-based repository downloads.


```python { .api }
class HTTPDownloader: ...
class FTPDownloader: ...
class SFTPDownloader: ...
class DOIDownloader: ...
def choose_downloader(url, progressbar=False): ...
def doi_to_url(doi): ...
def doi_to_repository(doi): ...
```


[Download Protocols](./download-protocols.md)


### File Processing


Post-download processors for automatic decompression, archive extraction, and custom file transformations that execute after successful downloads.


```python { .api }
class Decompress: ...
class Unzip: ...
class Untar: ...
```


[File Processing](./file-processing.md)


### Utilities and Helpers


Helper functions for cache management, version handling, file hashing, and registry creation to support data management workflows.


```python { .api }
def os_cache(project): ...
def check_version(version, fallback="master"): ...
def file_hash(fname, alg="sha256"): ...
def make_registry(directory, output, recursive=True): ...
def get_logger(): ...
```


[Utilities and Helpers](./utilities-helpers.md)


## Version Information


```python { .api }
__version__: str  # Package version string with 'v' prefix
```


## Testing


```python { .api }
def test(doctest=True, verbose=True, coverage=False): ...
```