or run

tessl search
Log in

Version

Workspace
tessl
Visibility
Public
Created
Last updated
Describes
pypipkg:pypi/kedro@1.1.x
tile.json

tessl/pypi-kedro

tessl install tessl/pypi-kedro@1.1.0

Kedro helps you build production-ready data and analytics pipelines

Agent Success

Agent success rate when using this tile

98%

Improvement

Agent success rate improvement when using this tile compared to baseline

1.32x

Baseline

Agent success rate without this tile

74%

task.mdevals/scenario-2/

Data Pipeline Versioning System

Build a data processing pipeline that tracks and manages multiple versions of datasets to enable time-travel and reproducibility.

Requirements

Create a three-stage data pipeline that processes customer data with automatic version tracking for all intermediate and output datasets.

Pipeline Structure

The pipeline should have three sequential processing nodes:

  1. Clean and preprocess raw customer data (input: raw_data, output: cleaned_data)
  2. Calculate summary metrics from cleaned data (input: cleaned_data, output: customer_metrics)
  3. Generate a final report from metrics (input: customer_metrics, output: final_report)

Versioning Configuration

Configure all three output datasets (cleaned_data, customer_metrics, final_report) to use automatic timestamp-based versioning.

Test Cases

  • Running the pipeline with raw_data {"name": "Alice", "age": 30} creates versioned datasets at timestamp T1 @test
  • Running the pipeline again with raw_data {"name": "Bob", "age": 25} creates new versions at timestamp T2 @test
  • Loading cleaned_data without a version specification returns the latest version (from T2) @test
  • Loading cleaned_data with version T1 returns the data from the first run @test

Implementation

@generates

API

def preprocess_data(raw_data: dict) -> dict:
    """Clean and preprocess raw customer data."""
    pass

def calculate_metrics(cleaned_data: dict) -> dict:
    """Calculate summary metrics from cleaned data."""
    pass

def generate_report(metrics: dict) -> dict:
    """Generate final report from metrics."""
    pass

def create_pipeline():
    """Create the data processing pipeline with three nodes."""
    pass

def create_versioned_catalog():
    """Create a data catalog with versioned datasets."""
    pass

Dependencies { .dependencies }

kedro { .dependency }

Provides data pipeline framework with versioning support.

@satisfied-by