Toil Python Package

Overview

Toil is a Python pipeline management and workflow execution system designed for distributed computing environments. It provides job scheduling, cloud provisioning, and container execution across batch systems including Slurm, LSF, and Kubernetes, as well as single-machine local execution. Toil supports multiple workflow languages, including CWL (Common Workflow Language) and WDL (Workflow Description Language), making it a versatile solution for scientific computing and data processing pipelines.

Package Information

  • Package Name: toil
  • Version: 9.0.0 (managed dynamically via toil.version module)
  • Description: Pipeline management software for clusters supporting distributed computing, cloud provisioning, and container execution
  • Main Module: toil
  • Installation: pip install toil (optional extras such as toil[cwl] or toil[aws] pull in workflow-language and cloud support)

Core Imports

# Essential workflow components
from toil.common import Toil, Config
from toil.job import Job, JobDescription, Promise, AcceleratorRequirement
from toil.fileStores import FileID
from toil.fileStores.abstractFileStore import AbstractFileStore

# Exception handling
from toil.exceptions import FailedJobsException

# Utility functions
from toil.lib.conversions import human2bytes, bytes2human
from toil.lib.retry import retry
from toil import physicalMemory, physicalDisk, toilPackageDirPath

Basic Usage

Simple Workflow Creation

from toil.common import Toil
from toil.job import Job

class HelloWorldJob(Job):
    def __init__(self, message):
        # 100MB memory, 1 core, 100MB disk
        super().__init__(memory="100M", cores=1, disk="100M")
        self.message = message

    def run(self, fileStore):
        fileStore.logToMaster(f"Hello {self.message}")
        return f"Processed: {self.message}"

# Create and run the workflow
if __name__ == "__main__":
    options = Job.Runner.getDefaultOptions("file:my-job-store")
    options.logLevel = "INFO"

    with Toil(options) as toil:
        root_job = HelloWorldJob("World")
        result = toil.start(root_job)
        print(f"Result: {result}")

Function-Based Jobs

from toil.common import Toil
from toil.job import Job

def process_data(job, input_data):
    # Wrapped job functions receive the job object as their first argument;
    # unless overridden, jobs get the configured defaults (2Gi memory, 1 core, 2Gi disk)
    job.fileStore.logToMaster(f"Processing: {input_data}")
    return input_data.upper()

def combine_results(job, *results):
    combined = " + ".join(results)
    job.fileStore.logToMaster(f"Combined: {combined}")
    return combined

def build_pipeline(job):
    # Fan out two processing jobs, then combine their promised return values
    job1 = job.addChildJobFn(process_data, "hello")
    job2 = job.addChildJobFn(process_data, "world")
    return job.addFollowOnJobFn(combine_results, job1.rv(), job2.rv()).rv()

if __name__ == "__main__":
    options = Job.Runner.getDefaultOptions("file:my-job-store")

    with Toil(options) as toil:
        result = toil.start(Job.wrapJobFn(build_pipeline))
        print(f"Final result: {result}")

Architecture

Toil's architecture consists of several key components that work together to provide scalable workflow execution:

Core Components

  1. Job Management Layer: Job, JobDescription, and Promise classes handle job definition, scheduling, and result handling
  2. Batch System Layer: Abstracts different compute environments (local, Slurm, Kubernetes, cloud services)
  3. Job Store Layer: Persistent storage for job metadata and workflow state (file system, AWS S3, Google Cloud Storage)
  4. File Store Layer: Manages file I/O operations and temporary file handling during job execution
  5. Leader-Worker Architecture: Centralized leader coordinates job scheduling while distributed workers execute tasks
  6. Provisioning Layer: Automatic cloud resource provisioning and scaling
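
How these layers surface in user code: the job store layer is selected by the locator string used to build the workflow options, and the batch system layer by an option mirroring the --batchSystem CLI flag. A minimal sketch, assuming the locator values and option names follow Toil's documented forms:

from toil.job import Job

# Job store layer: "file:..." targets a local directory job store;
# an S3-backed store would use a locator like "aws:us-west-2:my-store" (illustrative)
options = Job.Runner.getDefaultOptions("file:my-job-store")

# Batch system layer: "single_machine" is the local default;
# cluster backends are named the same way as on the command line
options.batchSystem = "single_machine"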

Workflow Execution Flow

  1. Configuration: Define workflow parameters using Config class
  2. Job Definition: Create job hierarchy using Job classes or function decorators
  3. Workflow Execution: Use Toil context manager to execute the workflow
  4. Resource Management: Automatic allocation and cleanup of compute and storage resources
  5. Result Handling: Collect results through Promise objects and return values
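
Putting the five steps together, a minimal end-to-end sketch; the restart branch follows Toil's documented context-manager convention for resuming an interrupted run, and names like hello are illustrative:

from toil.common import Toil
from toil.exceptions import FailedJobsException
from toil.job import Job

def hello(job, message):
    # 2. Job definition
    job.fileStore.logToMaster(f"Hello {message}")
    return f"Processed: {message}"

if __name__ == "__main__":
    # 1. Configuration
    options = Job.Runner.getDefaultOptions("file:my-job-store")
    options.logLevel = "INFO"

    try:
        # 3. Workflow execution (4. resources are allocated and cleaned up automatically)
        with Toil(options) as toil:
            if toil.options.restart:
                result = toil.restart()  # resume an interrupted run from the job store
            else:
                result = toil.start(Job.wrapJobFn(hello, "World"))
        # 5. Result handling
        print(f"Result: {result}")
    except FailedJobsException:
        print("Workflow finished with failed jobs; the job store is kept for inspection")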

Capabilities

Core Workflow Management

Basic job creation, execution, and chaining capabilities with resource management and promise-based result handling.

Key APIs:

  • Job(memory, cores, disk, accelerators, preemptible, checkpoint) - Job definition with resource requirements
  • Job.addChild(childJob) - Add dependent child jobs
  • Job.addFollowOn(followOnJob) - Add sequential follow-on jobs
  • Job.rv(*path) - Create promise for job return value
  • Toil(options).start(rootJob) - Execute workflow with root job
  • Config() - Workflow configuration and batch system settings
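
A brief sketch of the job-graph calls together (resource values are illustrative):

from toil.job import Job

# Declare resource requirements up front
parent = Job(memory="1G", cores=2, disk="2G", preemptible=True)
child = Job(memory="512M", cores=1, disk="1G")
follow_on = Job(memory="256M", cores=1, disk="1G")

parent.addChild(child)         # child runs after parent finishes
parent.addFollowOn(follow_on)  # runs after parent and all its descendants

promise = parent.rv()  # placeholder resolved to parent's return value at run time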

See docs/core-workflow.md for details.

Batch System Integration

Support for multiple compute environments including local execution, HPC schedulers, and cloud services.

Key APIs:

  • AbstractBatchSystem.issueBatchJob(jobNode) - Submit job to batch system
  • AbstractBatchSystem.getUpdatedBatchJob(maxWait) - Monitor job status
  • AbstractScalableBatchSystem.nodeTypes() - Query available node types
  • KubernetesBatchSystem, SlurmBatchSystem, LSFBatchSystem - Concrete implementations
  • BatchJobExitReason - Job completion status enumeration
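
Workflow code rarely calls these classes directly: naming a backend in the options (mirroring the --batchSystem flag) causes Toil to drive the matching implementation's issueBatchJob/getUpdatedBatchJob cycle behind the scenes. A hedged sketch:

from toil.common import Toil
from toil.job import Job

def noop(job):
    # trivial job; submission and monitoring go through the selected batch system
    pass

if __name__ == "__main__":
    options = Job.Runner.getDefaultOptions("file:my-job-store")
    options.batchSystem = "slurm"  # or "kubernetes", "lsf", "single_machine"

    with Toil(options) as toil:
        toil.start(Job.wrapJobFn(noop))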

See docs/batch-systems.md for details.

Job Store Management

Persistent storage backends for workflow metadata and state management across different storage systems.

Key APIs:

  • AbstractJobStore.create(jobDescription) - Store job metadata
  • AbstractJobStore.load(jobStoreID) - Retrieve job by ID
  • AbstractJobStore.writeFile(localFilePath) - Store file in job store
  • AbstractJobStore.importFile(srcUrl, sharedFileName) - Import external files
  • FileJobStore, AWSJobStore, GoogleJobStore - Storage backend implementations
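
In workflow code the job store is usually reached through the Toil object, which can stage external files into the store before the run starts. A sketch using the camelCase method names listed above (URL and paths illustrative):

from toil.common import Toil
from toil.job import Job

def read_input(job, file_id):
    # FileID handles reference data held in the job store
    local_path = job.fileStore.readGlobalFile(file_id)
    with open(local_path) as f:
        return f.read()

if __name__ == "__main__":
    options = Job.Runner.getDefaultOptions("file:my-job-store")
    with Toil(options) as toil:
        # Stage an external file into the job store before starting
        input_id = toil.importFile("file:///tmp/input.txt")
        contents = toil.start(Job.wrapJobFn(read_input, input_id))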

See docs/job-stores.md for details.

File Management

Comprehensive file handling for temporary files, shared data, and persistent storage during workflow execution.

Key APIs:

  • AbstractFileStore.writeGlobalFile(localFileName) - Store globally accessible files
  • AbstractFileStore.readGlobalFile(fileStoreID, userPath, cache) - Read shared files
  • AbstractFileStore.getLocalTempDir() - Get temporary directory
  • AbstractFileStore.logToMaster(text, level) - Send logs to workflow leader
  • FileID - File identifier type for referencing stored files
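
Within a job's run method, the file store mediates all file I/O; a short sketch (file names illustrative):

import os

def transform(job, input_id):
    # Fetch a globally stored file into local scratch space
    local_in = job.fileStore.readGlobalFile(input_id)

    # Write results under the job's managed temporary directory
    local_out = os.path.join(job.fileStore.getLocalTempDir(), "out.txt")
    with open(local_in) as src, open(local_out, "w") as dst:
        dst.write(src.read().upper())

    job.fileStore.logToMaster("transform finished")
    # Publish the output; the returned FileID can be passed to downstream jobs
    return job.fileStore.writeGlobalFile(local_out)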

See docs/file-management.md for details.

Workflow Language Integration

Native support for CWL and WDL workflow specifications with seamless translation to Toil execution.

Key APIs:

  • toil-cwl-runner - Command-line CWL workflow execution
  • toil-wdl-runner - Command-line WDL workflow execution
  • toil.cwl.cwltoil.main() - Programmatic CWL execution
  • toil.wdl.wdltoil.main() - Programmatic WDL execution
  • CWL and WDL utility functions for workflow processing
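
Typical command-line invocations (workflow and input file names are illustrative; both runners also accept the standard Toil options):

toil-cwl-runner example.cwl example-inputs.yaml
toil-wdl-runner example.wdl example-inputs.json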

See docs/workflow-languages.md for details.

Cloud Provisioning

Automatic cloud resource provisioning and cluster management for scalable workflow execution.

Key APIs:

  • AbstractProvisioner - Base provisioner interface
  • AWS and Google Cloud provisioners - Cloud-specific implementations
  • toil launch-cluster - Cluster creation subcommand of the toil CLI
  • toil destroy-cluster - Cluster cleanup subcommand
  • Dynamic node scaling and resource management
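
Cluster lifecycle runs through the main toil CLI; an illustrative AWS example (cluster name, zone, and instance type are placeholders):

toil launch-cluster my-cluster --provisioner aws --zone us-west-2a --leaderNodeType t2.medium
toil destroy-cluster my-cluster --provisioner aws --zone us-west-2a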

See docs/provisioning.md for details.

Utilities and CLI Tools

Comprehensive command-line tools and utilities for workflow management, debugging, and monitoring.

Key APIs:

  • toil - Main CLI entry point; the utilities below are its subcommands
  • toil stats - Statistics collection and analysis
  • toil status - Workflow monitoring and status
  • toil clean - Cleanup of job stores
  • toil kill - Workflow termination
  • Various debugging and cluster management tools
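
Each utility operates on a workflow's job store locator, for example:

toil stats file:my-job-store     # aggregate runtime and resource statistics
toil status file:my-job-store    # report job states for a running workflow
toil kill file:my-job-store      # stop a running workflow
toil clean file:my-job-store     # delete the job store and its contents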

See docs/utilities.md for details.