
tessl/pypi-toil

Pipeline management software for clusters.

Agent Success

Agent success rate when using this tile

67%

Improvement

Agent success rate improvement when using this tile compared to baseline

1.05x

Baseline

Agent success rate without this tile

64%


evals/scenario-6/task.md

Data Quality Workflow

Build a data processing pipeline that conditionally processes datasets based on quality checks. The workflow should validate input data and only proceed with expensive processing steps if quality thresholds are met.

Requirements

Create a workflow that:

  1. Takes a data file path as input
  2. Runs an initial quality check that returns a quality score (0-100)
  3. Conditionally executes different processing paths:
    • If quality score >= 80: Run full processing pipeline
    • If quality score >= 50 and < 80: Run partial processing with data cleaning
    • If quality score < 50: Skip processing and generate an error report
  4. Each processing path should be a separate job in the workflow
  5. The workflow should return the final results or error report
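
The thresholds in step 3 can be sketched as a small pure function, which makes the boundary values (exactly 80 and exactly 50) easy to check before any workflow code is written. The function name `select_path` and the three labels it returns are hypothetical and chosen only for illustration; the task does not prescribe them.

```python
def select_path(score):
    """Map a quality score (0-100) to a processing path label.

    ASSUMPTION: the name `select_path` and the returned labels are
    illustrative only; they are not part of the task's API.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"quality score must be in 0-100, got {score}")
    if score >= 80:
        return "full_processing"
    if score >= 50:
        return "partial_processing"
    return "error_report"


# The scores from the test cases, plus the two boundary values.
for example_score in (85, 80, 79, 65, 50, 49, 30):
    print(example_score, select_path(example_score))
```

Because the branches are checked from the highest threshold down, the middle branch does not need an explicit `< 80` test.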

Implementation Details

  • Use the workflow system's native Python API to define jobs and control flow
  • Implement conditional execution using return values from the quality check job
  • Create a job hierarchy where child jobs are added conditionally based on the quality score
  • Handle the different execution paths within the same workflow definition

Test Cases

  • Given an input file with a quality score of 85, the workflow executes FullProcessingJob and returns "fully_processed" @test
  • Given an input file with a quality score of 65, the workflow executes PartialProcessingJob and returns "partially_processed" @test
  • Given an input file with a quality score of 30, the workflow executes ErrorReportJob and returns "error_report" @test

Implementation

@generates

API

from toil.job import Job
from toil.common import Toil

class QualityCheckJob(Job):
    """Performs initial quality check on input data."""
    def run(self, fileStore):
        # Returns quality score (0-100)
        pass

class FullProcessingJob(Job):
    """Runs full processing pipeline for high-quality data."""
    def run(self, fileStore):
        pass

class PartialProcessingJob(Job):
    """Runs partial processing with cleaning for medium-quality data."""
    def run(self, fileStore):
        pass

class ErrorReportJob(Job):
    """Generates error report for low-quality data."""
    def run(self, fileStore):
        pass

class DataWorkflow(Job):
    """Main workflow that conditionally executes processing based on quality."""
    def __init__(self, input_file):
        super().__init__()
        self.input_file = input_file

    def run(self, fileStore):
        # Implement conditional workflow logic
        pass

def main():
    """Entry point that creates and runs the workflow."""
    pass

Dependencies { .dependencies }

toil { .dependency }

Provides workflow management and job orchestration capabilities.

@satisfied-by

tessl i tessl/pypi-toil@9.0.0
