
tessl/pypi-toil

Pipeline management software for clusters.

  • Agent Success: 67% (agent success rate when using this tile)
  • Improvement: 1.05x (agent success rate improvement when using this tile compared to baseline)
  • Baseline: 64% (agent success rate without this tile)


evals/scenario-10/task.md

Data Processing Pipeline with Job Promises

Build a data processing pipeline that uses job promises to pass data between workflow stages. The pipeline should analyze a list of numbers, compute statistics in parallel jobs, and aggregate results using the promise system.

Requirements

Implement a workflow with three stages:

  1. Data Splitter Job: Takes a list of numbers and splits it into two sublists (even and odd indices). Returns a dictionary with keys "even_indices" and "odd_indices".

  2. Stats Calculator Jobs: Two parallel jobs that each receive one sublist and compute:

    • sum: Sum of all numbers
    • count: Number of elements
    • max: Maximum value
    • min: Minimum value

    Each job returns a dictionary with these statistics.

  3. Aggregator Job: Receives both statistics dictionaries and computes combined statistics:

    • total_sum: Sum of both sums
    • total_count: Sum of both counts
    • overall_max: Maximum of both max values
    • overall_min: Minimum of both min values

    Returns the combined statistics dictionary.

The workflow should be constructed such that the stats calculator jobs receive their data via promises from the splitter job, and the aggregator job receives its data via promises from both calculator jobs.
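As a sanity check on the expected values, the three stages can be traced in plain Python before introducing Toil's promise machinery. The function names here are illustrative, not part of the required API:

```python
def split_data(numbers):
    # Stage 1: split into even- and odd-index sublists
    return {"even_indices": numbers[::2], "odd_indices": numbers[1::2]}

def calc_stats(sublist):
    # Stage 2: statistics for one sublist
    return {"sum": sum(sublist), "count": len(sublist),
            "max": max(sublist), "min": min(sublist)}

def aggregate(a, b):
    # Stage 3: combine the two statistics dictionaries
    return {"total_sum": a["sum"] + b["sum"],
            "total_count": a["count"] + b["count"],
            "overall_max": max(a["max"], b["max"]),
            "overall_min": min(a["min"], b["min"])}

halves = split_data([10, 20, 30, 40, 50, 60])
result = aggregate(calc_stats(halves["even_indices"]),
                   calc_stats(halves["odd_indices"]))
# result == {"total_sum": 210, "total_count": 6, "overall_max": 60, "overall_min": 10}
```

In the Toil version, the dictionary lookups above (`halves["even_indices"]`) become promise path selections, and the function calls become jobs.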

Test Cases

  • When given the list [10, 20, 30, 40, 50, 60], the workflow returns {"total_sum": 210, "total_count": 6, "overall_max": 60, "overall_min": 10} @test

  • When given the list [5, -3, 8, -1, 12], the workflow returns {"total_sum": 21, "total_count": 5, "overall_max": 12, "overall_min": -3} @test

  • The stats calculator jobs correctly receive sublists via promise path selection from the splitter job's return value @test

Implementation

@generates

API

from toil.job import Job
from typing import List, Dict, Any

def create_pipeline(input_data: List[int]) -> Job:
    """
    Creates and returns the root job of the data processing pipeline.

    Args:
        input_data: List of integers to process

    Returns:
        The root Job object that orchestrates the pipeline
    """
    pass

Dependencies { .dependencies }

toil { .dependency }

Provides workflow management and promise-based data flow.

tessl i tessl/pypi-toil@9.0.0
