**tessl/pypi-joblib** describes `pypi/joblib@1.5.x`. To install, run:

```shell
npx @tessl/cli install tessl/pypi-joblib@1.5.0
```

# Joblib

Lightweight pipelining with Python functions, providing transparent disk-caching, embarrassingly parallel computing, and fast compressed persistence. Joblib is optimized for fast, robust operation on large data, with specific optimizations for NumPy arrays, and serves as a foundation for computationally heavy jobs that require performance and reproducibility.

## Package Information

- **Package Name**: joblib
- **Language**: Python (3.9+)
- **Installation**: `pip install joblib`

## Core Imports

```python
import joblib
```

Common imports for specific functionality:

```python
from joblib import Memory, Parallel, delayed
from joblib import dump, load
```

## Basic Usage

```python
from joblib import Memory, Parallel, delayed, dump, load
import numpy as np

# 1. Caching expensive computations
mem = Memory(location='./cache')

@mem.cache
def expensive_function(x):
    return x ** 2 + np.sum(x)

data = np.random.random(1000)
result = expensive_function(data)  # Cached for future calls

# 2. Parallel processing
def process_item(item):
    return item ** 2

results = Parallel(n_jobs=4)(delayed(process_item)(i) for i in range(10))

# 3. Fast persistence
large_data = {'array': np.random.random((1000, 1000)), 'metadata': {'version': 1}}
dump(large_data, 'data.pkl', compress=3)
loaded_data = load('data.pkl')
```

## Architecture

Joblib provides three core architectural components:

- **Memory System**: Transparent disk-caching with configurable storage backends and automatic invalidation
- **Parallel Execution**: Backend-agnostic parallel processing with automatic load balancing and error handling
- **Persistence Layer**: Optimized serialization for NumPy arrays with compression and memory-mapping support

This design enables seamless integration into scientific computing workflows, providing performance optimizations while maintaining simple, readable code patterns.

## Capabilities

### Memory and Caching

Transparent disk-caching of function results using the memoize pattern. Provides automatic cache invalidation, configurable storage backends, and memory-mapped array support for handling large datasets efficiently.

```python { .api }
class Memory(Logger):
    def __init__(self, location=None, backend="local", mmap_mode=None, compress=False, verbose=1, backend_options=None): ...
    def cache(self, func=None, ignore=None, verbose=None, mmap_mode=False, cache_validation_callback=None): ...

class MemorizedResult:
    def get(self): ...
    def clear(self): ...

def expires_after(days=0, seconds=0, microseconds=0, milliseconds=0, minutes=0, hours=0, weeks=0): ...
def register_store_backend(backend_name, backend): ...
```

[Memory and Caching](./memory-caching.md)

### Parallel Processing

Embarrassingly parallel computing with readable list-comprehension syntax. Supports multiple backends (threading, multiprocessing, loky, dask) with automatic backend selection and comprehensive configuration options.

```python { .api }
class Parallel(Logger):
    def __init__(self, n_jobs=None, backend=None, return_as="list", verbose=0, timeout=None,
                 pre_dispatch="2 * n_jobs", batch_size="auto", temp_folder=None,
                 max_nbytes="1M", mmap_mode="r", prefer=None, require=None, **backend_kwargs): ...
    def __call__(self, iterable): ...

def delayed(function): ...
def cpu_count(only_physical_cores=False): ...
def effective_n_jobs(n_jobs=-1): ...

class parallel_config: ...
class parallel_backend: ...
def register_parallel_backend(name, factory, make_default=False): ...
```

[Parallel Processing](./parallel-processing.md)

### Persistence and Serialization

Fast compressed persistence optimized for Python objects containing large NumPy arrays. Provides memory-mapping support, multiple compression algorithms, and cross-platform compatibility as a replacement for pickle.

```python { .api }
def dump(value, filename, compress=0, protocol=None): ...
def load(filename, mmap_mode=None, ensure_native_byte_order="auto"): ...
```

[Persistence and Serialization](./persistence-serialization.md)

### Utilities and Infrastructure

Core utilities including object hashing, logging with timing, backend infrastructure, and compression management for extending joblib's functionality.

```python { .api }
def hash(obj, hash_name="md5", coerce_mmap=False): ...
def wrap_non_picklable_objects(obj, keep_wrapper=True): ...

class Logger:
    def __init__(self, depth=3, name=None): ...

class PrintTime:
    def __init__(self, logfile=None, logdir=None): ...

class ParallelBackendBase: ...
class StoreBackendBase: ...
def register_compressor(compressor_name, compressor, force=False): ...
```

[Utilities and Infrastructure](./utilities-infrastructure.md)

## Types

```python { .api }
# Core backend types
class ParallelBackendBase:
    """Abstract base class for parallel execution backends."""
    default_n_jobs = 1
    supports_inner_max_num_threads = False
    supports_retrieve_callback = False
    supports_return_generator = False
    supports_timeout = False

    def effective_n_jobs(self, n_jobs): ...
    def submit(self, func, callback=None): ...
    def retrieve_result(self, futures, timeout=None): ...

class StoreBackendBase:
    """Abstract base class for storage backends."""
    def _open_item(self, f, mode): ...
    def _item_exists(self, location): ...
    def _move_item(self, src, dst): ...
    def clear_item(self, call_id): ...
    def clear_path(self, path): ...
    def clear(self): ...

# Configuration types
from typing import Union, Optional

BackendName = str  # "threading", "multiprocessing", "loky", "sequential", "dask"
CompressionLevel = Union[int, bool, tuple]  # 0-9, True/False, or (method, level)
MMapMode = Optional[str]  # None, "r+", "r", "w+", "c"
```