
tessl/pypi-toil

Pipeline management software for clusters.

  • Agent Success: 67% (agent success rate when using this tile)
  • Improvement: 1.05x (success-rate improvement over the baseline)
  • Baseline: 64% (agent success rate without this tile)


evals/scenario-1/task.md

Data Processing Pipeline with File Management

Build a pipeline that downloads data files from external sources, processes them, and exports the results to designated locations. The pipeline should handle temporary file management and coordinate data between jobs.

Requirements

Create a workflow with three jobs that demonstrates proper file management:

  1. Data Downloader Job: Downloads input files from external URLs and makes them available to downstream jobs

    • Accept a list of source URLs as input
    • Import each file into the job store
    • Return file references for downstream processing
  2. Data Processor Job: Processes the downloaded files and generates output

    • Read input files from the job store to local temporary locations
    • Process the data (count lines in the file and create a summary)
    • Write processed results to new files in the job store
    • Use temporary directories for intermediate work
    • Return references to the output files
  3. Data Exporter Job: Exports processed files to external destinations

    • Accept file references and destination URLs
    • Export files from the job store to the specified destinations
    • Confirm successful export

The workflow should:

  • Chain these jobs together with proper dependencies
  • Handle file lifecycle (import, storage, cleanup)
  • Use appropriate file management methods for each operation
  • Support common URL schemes (file://, http://, etc.)
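The chaining requirement can be sketched with Toil's promise mechanism. Assuming the three Job subclasses from the API section below, a minimal create_pipeline might read:

```python
# Assumes DownloaderJob, ProcessorJob, and ExporterJob as defined in the
# API section below (Toil Job subclasses).

def create_pipeline(source_urls, dest_urls):
    """Wire the three jobs together; returns the root job."""
    downloader = DownloaderJob(source_urls)
    # .rv() is a promise for a job's run() return value; Toil resolves it
    # when the child executes, so file IDs flow between the jobs.
    processor = ProcessorJob(downloader.rv())
    exporter = ExporterJob(processor.rv(), dest_urls)
    # A child job only starts after its parent's run() has finished.
    downloader.addChild(processor)
    processor.addChild(exporter)
    return downloader
```

Promises plus parent/child edges give the import → process → export ordering without any explicit synchronization.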

Test Cases

  • When given a local file path as input, the pipeline successfully imports, processes, and exports the file @test
  • The processor job can read files written by the downloader job and create new output files @test
  • Temporary files created during processing are isolated per job @test
  • The exporter successfully writes files to the destination path @test
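The first and last test cases can be rehearsed without a cluster. The toy stand-in below replaces job-store import/export with plain file copies; every name here is illustrative and not part of Toil's API:

```python
import os
import shutil
import tempfile

def run_local_pipeline(src_path, dest_path):
    """Mimic the three stages with plain copies: import -> process -> export."""
    with tempfile.TemporaryDirectory() as store:
        # "Import": copy the source into a private store directory.
        staged = os.path.join(store, "staged.txt")
        shutil.copy(src_path, staged)
        # "Process": count lines and write a summary into the store.
        with open(staged) as f:
            line_count = sum(1 for _ in f)
        summary = os.path.join(store, "summary.txt")
        with open(summary, "w") as out:
            out.write(f"{line_count} lines\n")
        # "Export": copy the result to its destination.
        shutil.copy(summary, dest_path)
    return dest_path
```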

Implementation

@generates

API

from toil.job import Job

class DownloaderJob(Job):
    """Downloads files from external sources into the job store."""
    def __init__(self, source_urls):
        """
        Args:
            source_urls: List of source URLs to download
        """
        super().__init__()
        self.source_urls = source_urls

    def run(self, fileStore):
        """
        Import files and return their IDs.

        Returns:
            List of file IDs in the job store
        """
        pass

class ProcessorJob(Job):
    """Processes files from the job store."""
    def __init__(self, input_file_ids):
        """
        Args:
            input_file_ids: List of file IDs to process
        """
        super().__init__()
        self.input_file_ids = input_file_ids

    def run(self, fileStore):
        """
        Process input files and create output files.

        Returns:
            List of output file IDs
        """
        pass

class ExporterJob(Job):
    """Exports files from the job store to external destinations."""
    def __init__(self, file_ids, dest_urls):
        """
        Args:
            file_ids: List of file IDs to export
            dest_urls: List of destination URLs
        """
        super().__init__()
        self.file_ids = file_ids
        self.dest_urls = dest_urls

    def run(self, fileStore):
        """Export files to destinations."""
        pass

def create_pipeline(source_urls, dest_urls):
    """
    Create the complete file processing pipeline.

    Args:
        source_urls: List of source file URLs
        dest_urls: List of destination file URLs

    Returns:
        Root job for the pipeline
    """
    pass

Dependencies { .dependencies }

toil { .dependency }

Provides workflow management and file handling capabilities.

tessl i tessl/pypi-toil@9.0.0
