Parallel scripting library for executing workflows across diverse computing resources
npx @tessl/cli install tessl/pypi-parsl@2024.12.00
# Parsl

Parsl (Parallel Scripting Library) is a Python library that extends parallelism in Python beyond a single computer. It lets users chain functions into multi-step workflows and automatically launches each function once its inputs and computing resources become available. Parsl provides decorators that make functions parallel, supports execution across multiple cores and nodes, and offers configuration for a range of computing resources, including local machines, clusters, and cloud platforms.

## Package Information

- **Package Name**: parsl
- **Language**: Python
- **Installation**: `pip install parsl`

## Core Imports

```python
import parsl
from parsl import *
```

For specific components:

```python
from parsl import python_app, bash_app, join_app
from parsl.config import Config
from parsl.data_provider.files import File
from parsl.executors import HighThroughputExecutor, ThreadPoolExecutor, WorkQueueExecutor, MPIExecutor, FluxExecutor
from parsl.providers import LocalProvider, SlurmProvider
from parsl.monitoring import MonitoringHub

# RadicalPilotExecutor requires separate import (optional dependency)
from parsl.executors.radical import RadicalPilotExecutor
```

## Basic Usage

Setting up Parsl with a configuration and creating parallel apps:

```python
import parsl
from parsl import python_app, bash_app
from parsl.config import Config
from parsl.executors import ThreadPoolExecutor

# Configure Parsl
config = Config(
    executors=[ThreadPoolExecutor(max_threads=4)]
)
parsl.load(config)

# Create parallel apps
@python_app
def add_numbers(a, b):
    return a + b

@bash_app
def create_file(filename, contents, outputs=[]):
    # The returned string is run as a shell command; outputs[0] is the target file
    return f'echo "{contents}" > {outputs[0]}'

# Execute parallel tasks (each call returns a future immediately)
future1 = add_numbers(10, 20)
future2 = add_numbers(30, 40)

# Get results (blocks until each task has finished)
result1 = future1.result()  # 30
result2 = future2.result()  # 70

# Clean up: shut down the executors, then clear the loaded configuration
parsl.dfk().cleanup()
parsl.clear()
```

## Architecture

Parsl's architecture enables scalable parallel execution:

- **DataFlowKernel (DFK)**: Core workflow execution engine managing task dependencies, scheduling, and data flow
- **Apps**: Python functions decorated with `@python_app`, `@bash_app`, or `@join_app` that become parallel tasks
- **Executors**: Execution backends that run tasks on various resources (local threads, HPC clusters, cloud)
- **Providers**: Resource providers that interface with different computing platforms and schedulers
- **Launchers**: Job launchers that handle task startup on HPC systems
- **Config**: Configuration system binding executors, providers, and execution policies
- **File**: Data management system for handling file dependencies across distributed execution
- **Monitoring**: Optional system for tracking workflow execution, resource usage, and performance

This design allows workflows to scale from laptops to supercomputers while maintaining the same programming interface.

## Capabilities

### App Decorators

Core decorators that transform Python functions into parallel apps capable of distributed execution across various computing resources.

```python { .api }
def python_app(function=None, data_flow_kernel=None, cache=False,
               executors='all', ignore_for_cache=None): ...
def bash_app(function=None, data_flow_kernel=None, cache=False,
             executors='all', ignore_for_cache=None): ...
def join_app(function=None, data_flow_kernel=None, cache=False,
             ignore_for_cache=None): ...
```

[App Decorators](./app-decorators.md)

### Configuration System

Parsl configuration system for specifying executors, monitoring, checkpointing, and workflow execution policies.

```python { .api }
class Config:
    def __init__(self, executors=None, app_cache=True,
                 checkpoint_files=None, checkpoint_mode=None,
                 dependency_resolver=None, monitoring=None,
                 usage_tracking=None, initialize_logging=True): ...
```

[Configuration](./configuration.md)

### Execution Backends

Execution backends for running parallel tasks on different computing resources, from local machines to HPC systems and cloud platforms.

```python { .api }
class HighThroughputExecutor: ...
class ThreadPoolExecutor: ...
class WorkQueueExecutor: ...
class MPIExecutor: ...
class FluxExecutor: ...
class RadicalPilotExecutor: ...
```

[Executors](./executors.md)

### Resource Providers

Resource providers that interface Parsl with various computing platforms, schedulers, and cloud services.

```python { .api }
class LocalProvider: ...
class SlurmProvider: ...
class AWSProvider: ...
class KubernetesProvider: ...
# ... and 8 more providers
```

[Providers](./providers.md)

### Data Management

File handling system supporting local and remote files with various protocols including Globus data transfer.

```python { .api }
class File:
    def __init__(self, url): ...
    @property
    def filepath(self): ...
    def cleancopy(self): ...
```

[Data Management](./data-management.md)

### Job Launchers

Command wrappers that handle job launching on different HPC systems and computing platforms, interfacing with various resource managers and execution environments.

```python { .api }
class SimpleLauncher: ...
class SingleNodeLauncher: ...
class SrunLauncher: ...
class AprunLauncher: ...
class JsrunLauncher: ...
# ... and 5 more launchers
```

[Launchers](./launchers.md)

### Workflow Management

Core workflow management functions for loading configurations, managing execution state, and controlling task execution.

```python { .api }
def load(config): ...
def clear(): ...
def wait_for_current_tasks(): ...
# The active DataFlowKernel is available via parsl.dfk()
```

[Workflow Management](./workflow-management.md)

### Monitoring and Logging

Monitoring system for tracking workflow execution, resource usage, and performance metrics with optional database storage.

```python { .api }
class MonitoringHub:
    def __init__(self, hub_address=None, hub_port=None,
                 monitoring_debug=False, resource_monitoring_interval=30): ...

def set_stream_logger(name='parsl', level=logging.DEBUG): ...
def set_file_logger(filename, name='parsl', level=logging.DEBUG): ...
```

[Monitoring](./monitoring.md)

## Error Handling

```python { .api }
# Core Parsl Errors
class ParslError(Exception): ...
class ConfigurationError(ParslError): ...
class OptionalModuleMissing(ParslError): ...
class InternalConsistencyError(ParslError): ...
class NoDataFlowKernelError(ParslError): ...

# App Execution Errors
class AppException(ParslError): ...
class BashExitFailure(AppException): ...
class AppTimeout(AppException): ...
class BashAppNoReturn(AppException): ...
class MissingOutputs(ParslError): ...
class BadStdStreamFile(ParslError): ...
class AppBadFormatting(ParslError): ...

# DataFlow Errors
class DataFlowException(ParslError): ...
class BadCheckpoint(DataFlowException): ...
class DependencyError(DataFlowException): ...
class JoinError(DataFlowException): ...

# Executor Errors
class ExecutorError(ParslError): ...
class BadStateException(ExecutorError): ...
class UnsupportedFeatureError(ExecutorError): ...
class InvalidResourceSpecification(ExecutorError): ...
class ScalingFailed(ExecutorError): ...

# Provider Errors
class ExecutionProviderException(ParslError): ...
class ScaleOutFailed(ExecutionProviderException): ...
class SubmitException(ExecutionProviderException): ...
class BadLauncher(ExecutionProviderException): ...

# Serialization Errors
class SerializationError(ParslError): ...
class DeserializationError(ParslError): ...

# Monitoring Errors
class MonitoringHubStartError(ParslError): ...
```

Common error scenarios include configuration validation failures, app execution timeouts, dependency resolution errors, executor scaling issues, job submission failures, and serialization problems across distributed workers.

## Constants

```python { .api }
AUTO_LOGNAME = -1  # Special value for automatic log filename construction
```