tessl/pypi-parsl

Parallel scripting library for executing workflows across diverse computing resources

Describes: pkg:pypi/parsl@2024.12.x

To install, run

npx @tessl/cli install tessl/pypi-parsl@2024.12.0


Parsl

Parsl (Parallel Scripting Library) is a Python library that extends parallelism in Python beyond a single computer. It enables users to chain functions together in multi-step workflows, automatically launching each function as inputs and computing resources become available. Parsl provides decorators to make functions parallel, supports execution across multiple cores and nodes, and offers configuration for various computing resources including local machines, clusters, and cloud platforms.

Package Information

  • Package Name: parsl
  • Language: Python
  • Installation: pip install parsl

Core Imports

import parsl
from parsl import *

For specific components:

from parsl import python_app, bash_app, join_app
from parsl.config import Config
from parsl.data_provider.files import File
from parsl.executors import HighThroughputExecutor, ThreadPoolExecutor, WorkQueueExecutor, MPIExecutor, FluxExecutor
from parsl.providers import LocalProvider, SlurmProvider
from parsl.monitoring import MonitoringHub

# RadicalPilotExecutor requires separate import (optional dependency)
from parsl.executors.radical import RadicalPilotExecutor

Basic Usage

Setting up Parsl with configuration and creating parallel apps:

import parsl
from parsl import python_app, bash_app
from parsl.config import Config
from parsl.executors import ThreadPoolExecutor

# Configure Parsl
config = Config(
    executors=[ThreadPoolExecutor(max_threads=4)]
)
parsl.load(config)

# Create parallel apps
@python_app
def add_numbers(a, b):
    return a + b

@bash_app
def create_file(contents, outputs=[]):
    return f'echo "{contents}" > {outputs[0]}'

# Execute parallel tasks
future1 = add_numbers(10, 20)
future2 = add_numbers(30, 40)

# Get results
result1 = future1.result()  # 30
result2 = future2.result()  # 70
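
# The bash app defined above is invoked the same way; this sketch uses
# Parsl's File abstraction for the output (see Data Management below)
from parsl.data_provider.files import File

file_future = create_file("hello world", outputs=[File("hello.txt")])
file_future.result()  # blocks until the shell command has written hello.txt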

# Clean up
parsl.clear()

Architecture

Parsl's architecture enables scalable parallel execution:

  • DataFlowKernel (DFK): Core workflow execution engine managing task dependencies, scheduling, and data flow
  • Apps: Python functions decorated with @python_app, @bash_app, or @join_app that become parallel tasks
  • Executors: Execution backends that run tasks on various resources (local threads, HPC clusters, cloud)
  • Providers: Resource providers that interface with different computing platforms and schedulers
  • Launchers: Job launchers that handle task startup on HPC systems
  • Config: Configuration system binding executors, providers, and execution policies
  • File: Data management system for handling file dependencies across distributed execution
  • Monitoring: Optional system for tracking workflow execution, resource usage, and performance

This design allows workflows to scale from laptops to supercomputers while maintaining the same programming interface.
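
As a minimal sketch of how these pieces compose (class names are from this package; the parameter values are illustrative):

import parsl
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.providers import LocalProvider
from parsl.launchers import SingleNodeLauncher

# A Config binds an executor to a provider and launcher; parsl.load() hands
# it to the DataFlowKernel, which then schedules decorated apps onto it
config = Config(
    executors=[
        HighThroughputExecutor(
            label="htex_local",
            provider=LocalProvider(
                init_blocks=1,
                max_blocks=1,
                launcher=SingleNodeLauncher(),
            ),
        )
    ]
)
parsl.load(config)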

Capabilities

App Decorators

Core decorators that transform Python functions into parallel apps capable of distributed execution across various computing resources.

def python_app(function=None, data_flow_kernel=None, cache=False, 
               executors='all', ignore_for_cache=None): ...
def bash_app(function=None, data_flow_kernel=None, cache=False,
             executors='all', ignore_for_cache=None): ...
def join_app(function=None, data_flow_kernel=None, cache=False,
             ignore_for_cache=None): ...

See docs/app-decorators.md

Configuration System

Parsl configuration system for specifying executors, monitoring, checkpointing, and workflow execution policies.

class Config:
    def __init__(self, executors=None, app_cache=True, 
                 checkpoint_files=None, checkpoint_mode=None,
                 dependency_resolver=None, monitoring=None,
                 usage_tracking=None, initialize_logging=True): ...
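
A sketch of the policy knobs above; "task_exit" is one of Parsl's documented checkpoint modes, and cached apps must also opt in with cache=True on their decorator:

import parsl
from parsl import python_app
from parsl.config import Config
from parsl.executors import ThreadPoolExecutor

# Cache app results and checkpoint them as each task exits
config = Config(
    executors=[ThreadPoolExecutor(max_threads=8)],
    app_cache=True,
    checkpoint_mode="task_exit",
)
parsl.load(config)

@python_app(cache=True)
def expensive(x):
    return x ** 2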

See docs/configuration.md

Execution Backends

Execution backends for running parallel tasks on different computing resources from local machines to HPC systems and cloud platforms.

class HighThroughputExecutor: ...
class ThreadPoolExecutor: ...
class WorkQueueExecutor: ...
class MPIExecutor: ...
class FluxExecutor: ...
class RadicalPilotExecutor: ...
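
Executors are identified by label, and the executors parameter on the app decorators (default 'all') can pin an app to a subset. A sketch mixing two backends:

import parsl
from parsl import python_app
from parsl.config import Config
from parsl.executors import ThreadPoolExecutor, HighThroughputExecutor
from parsl.providers import LocalProvider

config = Config(executors=[
    ThreadPoolExecutor(label="local_threads", max_threads=4),
    HighThroughputExecutor(label="htex", provider=LocalProvider()),
])
parsl.load(config)

@python_app(executors=["local_threads"])
def light_task():
    return "ran on the thread pool"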

See docs/executors.md

Resource Providers

Resource providers that interface Parsl with various computing platforms, schedulers, and cloud services.

class LocalProvider: ...
class SlurmProvider: ...
class AWSProvider: ...
class KubernetesProvider: ...
# ... and 8 more providers
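
A hedged sketch of a Slurm-backed configuration; the partition name, walltime, and worker_init line are site-specific placeholders:

from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.providers import SlurmProvider
from parsl.launchers import SrunLauncher

# Each block is one Slurm job; the provider scales between min and max blocks
config = Config(executors=[
    HighThroughputExecutor(
        label="slurm_htex",
        provider=SlurmProvider(
            partition="compute",               # site-specific queue
            nodes_per_block=2,
            init_blocks=1,
            min_blocks=0,
            max_blocks=4,
            walltime="01:00:00",
            worker_init="module load python",  # site-specific setup
            launcher=SrunLauncher(),
        ),
    )
])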

See docs/providers.md

Data Management

File handling system supporting local and remote files with various protocols including Globus data transfer.

class File:
    def __init__(self, url): ...
    @property 
    def filepath(self): ...
    def cleancopy(self): ...
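
Apps declare file dependencies through the inputs and outputs keyword arguments; Parsl stages the files and exposes each produced file as a DataFuture on the app future's outputs list. A sketch with local paths:

from parsl import bash_app
from parsl.data_provider.files import File

@bash_app
def sort_file(inputs=[], outputs=[]):
    return f"sort {inputs[0].filepath} > {outputs[0].filepath}"

future = sort_file(inputs=[File("unsorted.txt")],
                   outputs=[File("sorted.txt")])
future.outputs[0].result()  # DataFuture for the produced sorted.txt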

See docs/data-management.md

Job Launchers

Command wrappers that handle job launching on different HPC systems and computing platforms, interfacing with various resource managers and execution environments.

class SimpleLauncher: ...
class SingleNodeLauncher: ...
class SrunLauncher: ...
class AprunLauncher: ...
class JsrunLauncher: ...
# ... and 5 more launchers
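
A launcher is passed to a provider and decides how the worker command is started inside an allocation. A short sketch (the queue name is a placeholder):

from parsl.providers import SlurmProvider
from parsl.launchers import SrunLauncher

# SrunLauncher prefixes the worker command with srun so it spans the
# allocation's nodes; SingleNodeLauncher would target a single node instead
provider = SlurmProvider(
    partition="compute",   # placeholder queue name
    nodes_per_block=2,
    launcher=SrunLauncher(),
)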

See docs/launchers.md

Workflow Management

Core workflow management functions for loading configurations, managing execution state, and controlling task execution.

def load(config): ...
def clear(): ...
def wait_for_current_tasks(): ...
def dfk(): ...  # returns the currently active DataFlowKernel
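
A sketch of the lifecycle functions; dfk() returns the active DataFlowKernel (its run_dir attribute is assumed here for illustration):

import parsl
from parsl.config import Config
from parsl.executors import ThreadPoolExecutor

parsl.load(Config(executors=[ThreadPoolExecutor()]))

# ... define and invoke apps here ...

parsl.wait_for_current_tasks()  # block until all outstanding tasks finish
print(parsl.dfk().run_dir)      # inspect the active DataFlowKernel
parsl.clear()                   # tear down the DFK and its executors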

See docs/workflow-management.md

Monitoring and Logging

Monitoring system for tracking workflow execution, resource usage, and performance metrics with optional database storage.

class MonitoringHub:
    def __init__(self, hub_address=None, hub_port=None, 
                 monitoring_debug=False, resource_monitoring_interval=30): ...

def set_stream_logger(name='parsl', level=logging.DEBUG): ...
def set_file_logger(filename, name='parsl', level=logging.DEBUG): ...
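
Monitoring is enabled by attaching a MonitoringHub to the Config; by default task and resource records land in a SQLite database (monitoring.db) under the run directory. A sketch using address_by_hostname from parsl.addresses:

import parsl
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.providers import LocalProvider
from parsl.monitoring import MonitoringHub
from parsl.addresses import address_by_hostname

config = Config(
    executors=[HighThroughputExecutor(provider=LocalProvider())],
    monitoring=MonitoringHub(
        hub_address=address_by_hostname(),
        resource_monitoring_interval=10,  # seconds between resource samples
    ),
)
parsl.load(config)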

See docs/monitoring.md

Error Handling

# Core Parsl Errors
class ParslError(Exception): ...
class ConfigurationError(ParslError): ...
class OptionalModuleMissing(ParslError): ...
class InternalConsistencyError(ParslError): ...
class NoDataFlowKernelError(ParslError): ...

# App Execution Errors
class AppException(ParslError): ...
class BashExitFailure(AppException): ...
class AppTimeout(AppException): ...
class BashAppNoReturn(AppException): ...
class MissingOutputs(ParslError): ...
class BadStdStreamFile(ParslError): ...
class AppBadFormatting(ParslError): ...

# DataFlow Errors
class DataFlowException(ParslError): ...
class BadCheckpoint(DataFlowException): ...
class DependencyError(DataFlowException): ...
class JoinError(DataFlowException): ...

# Executor Errors
class ExecutorError(ParslError): ...
class BadStateException(ExecutorError): ...
class UnsupportedFeatureError(ExecutorError): ...
class InvalidResourceSpecification(ExecutorError): ...
class ScalingFailed(ExecutorError): ...

# Provider Errors
class ExecutionProviderException(ParslError): ...
class ScaleOutFailed(ExecutionProviderException): ...
class SubmitException(ExecutionProviderException): ...
class BadLauncher(ExecutionProviderException): ...

# Serialization Errors
class SerializationError(ParslError): ...
class DeserializationError(ParslError): ...

# Monitoring Errors
class MonitoringHubStartError(ParslError): ...

Common error scenarios include configuration validation failures, app execution timeouts, dependency resolution errors, executor scaling issues, job submission failures, and serialization problems across distributed workers.
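
A sketch of catching an app-level failure; BashExitFailure is assumed importable from parsl.app.errors, with its exitcode attribute recording the shell's exit status:

import parsl
from parsl import bash_app
from parsl.config import Config
from parsl.executors import ThreadPoolExecutor
from parsl.app.errors import BashExitFailure

parsl.load(Config(executors=[ThreadPoolExecutor()]))

@bash_app
def failing():
    return "exit 7"

try:
    failing().result()
except BashExitFailure as e:
    print(f"bash app failed with exit code {e.exitcode}")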

Constants

AUTO_LOGNAME = -1  # Special value for automatic log filename construction
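
A sketch of AUTO_LOGNAME in use: passing it as an app's stdout or stderr asks Parsl to construct the log path, which is then available on the app future:

import parsl
from parsl import bash_app
from parsl.config import Config
from parsl.executors import ThreadPoolExecutor

parsl.load(Config(executors=[ThreadPoolExecutor()]))

@bash_app
def step(stdout=parsl.AUTO_LOGNAME, stderr=parsl.AUTO_LOGNAME):
    return "echo hello"

f = step()
f.result()
print(f.stdout)  # path Parsl chose for this task's stdout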