Lightweight pipelining with Python functions for disk-caching, parallel computing, and fast compressed persistence
npx @tessl/cli install tessl/pypi-joblib@1.5.00
# Joblib

Lightweight pipelining with Python functions providing transparent disk-caching of results, embarrassingly parallel computing, and fast compressed persistence. Joblib is optimized for fast, robust operation on large data, with specific optimizations for NumPy arrays, and serves as a foundation for computation-heavy jobs that require performance and reproducibility.

## Package Information

- **Package Name**: joblib
- **Language**: Python (3.9+)
- **Installation**: `pip install joblib`

## Core Imports

```python
import joblib
```

Common imports for specific functionality:

```python
from joblib import Memory, Parallel, delayed
from joblib import dump, load
```

## Basic Usage

```python
from joblib import Memory, Parallel, delayed, dump, load
import numpy as np

# 1. Caching expensive computations
mem = Memory(location='./cache')

@mem.cache
def expensive_function(x):
    return x ** 2 + np.sum(x)

data = np.random.random(1000)
result = expensive_function(data)  # Recomputed only for new inputs; otherwise read from cache

# 2. Parallel processing
def process_item(item):
    return item ** 2

results = Parallel(n_jobs=4)(delayed(process_item)(i) for i in range(10))

# 3. Fast persistence
large_data = {'array': np.random.random((1000, 1000)), 'metadata': {'version': 1}}
dump(large_data, 'data.pkl', compress=3)
loaded_data = load('data.pkl')
```

## Architecture

Joblib provides three core architectural components:

- **Memory System**: Transparent disk-caching with configurable storage backends and automatic invalidation when a function's code or inputs change
- **Parallel Execution**: Backend-agnostic parallel processing with automatic batching and error propagation
- **Persistence Layer**: Optimized serialization for NumPy arrays with compression and memory-mapping support

This design integrates cleanly into scientific computing workflows, providing performance optimizations while keeping code patterns simple and readable.

## Capabilities

### Memory and Caching

Transparent disk-caching of function results using the memoize pattern. Provides automatic cache invalidation, configurable storage backends, and memory-mapped array support for handling large datasets efficiently.

```python { .api }
class Memory(Logger):
    def __init__(self, location=None, backend="local", mmap_mode=None, compress=False, verbose=1, backend_options=None): ...
    def cache(self, func=None, ignore=None, verbose=None, mmap_mode=False, cache_validation_callback=None): ...

class MemorizedResult:
    def get(self): ...
    def clear(self): ...

def expires_after(days=0, seconds=0, microseconds=0, milliseconds=0, minutes=0, hours=0, weeks=0): ...
def register_store_backend(backend_name, backend): ...
```

[Memory and Caching](./memory-caching.md)

### Parallel Processing

Embarrassingly parallel computing with readable list-comprehension syntax. Supports multiple backends (threading, multiprocessing, loky, dask) with automatic backend selection and comprehensive configuration options.

```python { .api }
class Parallel(Logger):
    def __init__(self, n_jobs=None, backend=None, return_as="list", verbose=0, timeout=None,
                 pre_dispatch="2 * n_jobs", batch_size="auto", temp_folder=None,
                 max_nbytes="1M", mmap_mode="r", prefer=None, require=None, **backend_kwargs): ...
    def __call__(self, iterable): ...

def delayed(function): ...
def cpu_count(only_physical_cores=False): ...
def effective_n_jobs(n_jobs=-1): ...

class parallel_config: ...
class parallel_backend: ...
def register_parallel_backend(name, factory, make_default=False): ...
```

[Parallel Processing](./parallel-processing.md)

### Persistence and Serialization

Fast compressed persistence optimized for Python objects containing large NumPy arrays. Provides memory-mapping support, multiple compression algorithms, and cross-platform compatibility as a replacement for pickle.

```python { .api }
def dump(value, filename, compress=0, protocol=None): ...
def load(filename, mmap_mode=None, ensure_native_byte_order="auto"): ...
```

[Persistence and Serialization](./persistence-serialization.md)

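
A sketch of round-tripping data (filenames and the scratch directory are illustrative): `compress` trades speed for size, while memory mapping applies only to uncompressed files:

```python
import os
import tempfile
import numpy as np
from joblib import dump, load

workdir = tempfile.mkdtemp()  # illustrative scratch directory
weights = np.arange(6, dtype=np.float64).reshape(2, 3)

# compress accepts a level (0-9) or a (method, level) tuple
path = os.path.join(workdir, 'model.joblib')
dump({'weights': weights}, path, compress=('zlib', 3))
restored = load(path)
print(restored['weights'].shape)  # (2, 3)

# Memory mapping requires an uncompressed file; arrays are then read lazily
raw_path = os.path.join(workdir, 'weights.joblib')
dump(weights, raw_path)
arr = load(raw_path, mmap_mode='r')
print(arr[0, 1])  # 1.0
```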
### Utilities and Infrastructure

Core utilities including object hashing, logging with timing, backend infrastructure, and compression management for extending joblib's functionality.

```python { .api }
def hash(obj, hash_name="md5", coerce_mmap=False): ...
def wrap_non_picklable_objects(obj, keep_wrapper=True): ...

class Logger:
    def __init__(self, depth=3, name=None): ...

class PrintTime:
    def __init__(self, logfile=None, logdir=None): ...

class ParallelBackendBase: ...
class StoreBackendBase: ...
def register_compressor(compressor_name, compressor, force=False): ...
```

[Utilities and Infrastructure](./utilities-infrastructure.md)

## Types

```python { .api }
# Core backend types
class ParallelBackendBase:
    """Abstract base class for parallel execution backends."""
    default_n_jobs = 1
    supports_inner_max_num_threads = False
    supports_retrieve_callback = False
    supports_return_generator = False
    supports_timeout = False

    def effective_n_jobs(self, n_jobs): ...
    def submit(self, func, callback=None): ...
    def retrieve_result(self, futures, timeout=None): ...

class StoreBackendBase:
    """Abstract base class for storage backends."""
    def _open_item(self, f, mode): ...
    def _item_exists(self, location): ...
    def _move_item(self, src, dst): ...
    def clear_item(self, call_id): ...
    def clear_path(self, path): ...
    def clear(self): ...

# Configuration types
from typing import Union, Optional

BackendName = str  # "threading", "multiprocessing", "loky", "sequential", "dask"
CompressionLevel = Union[int, bool, tuple]  # 0-9, True/False, or (method, level)
MMapMode = Optional[str]  # None, "r+", "r", "w+", "c"
```