
tessl/pypi-kedro

Kedro helps you build production-ready data and analytics pipelines


Data Pipeline Versioning System

Build a data processing pipeline that tracks and manages multiple versions of datasets to enable time-travel and reproducibility.

Requirements

Create a three-stage data pipeline that processes customer data with automatic version tracking for all intermediate and output datasets.

Pipeline Structure

The pipeline should have three sequential processing nodes:

  1. Clean and preprocess raw customer data (input: raw_data, output: cleaned_data)
  2. Calculate summary metrics from cleaned data (input: cleaned_data, output: customer_metrics)
  3. Generate a final report from metrics (input: customer_metrics, output: final_report)

Versioning Configuration

Configure all three output datasets (cleaned_data, customer_metrics, final_report) to use automatic timestamp-based versioning.

Test Cases

  • Running the pipeline with raw_data {"name": "Alice", "age": 30} creates versioned datasets at timestamp T1 @test
  • Running the pipeline again with raw_data {"name": "Bob", "age": 25} creates new versions at timestamp T2 @test
  • Loading cleaned_data without a version specification returns the latest version (from T2) @test
  • Loading cleaned_data with version T1 returns the data from the first run @test
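
The load semantics these test cases describe can be modelled in plain Python. This is a standard-library sketch of the behaviour, not Kedro's implementation: it mirrors the `<filepath>/<version>/<filename>` layout that Kedro's versioned datasets use on disk, and the helpers `save_versioned` and `load_versioned` as well as the concrete `T1` and `T2` values are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical timestamps in a sortable format similar to Kedro's version strings.
T1 = "2024-01-01T10.00.00.000Z"
T2 = "2024-01-02T10.00.00.000Z"


def save_versioned(base: Path, name: str, data: dict, version: str) -> Path:
    """Write data to <base>/<name>.json/<version>/<name>.json, never overwriting."""
    target = base / f"{name}.json" / version / f"{name}.json"
    target.parent.mkdir(parents=True, exist_ok=False)
    target.write_text(json.dumps(data))
    return target


def load_versioned(base: Path, name: str, version=None) -> dict:
    """Load a specific version, or the latest one when no version is given."""
    root = base / f"{name}.json"
    if version is None:
        # The timestamp format sorts lexicographically, so max() is the newest.
        version = max(p.name for p in root.iterdir() if p.is_dir())
    return json.loads((root / version / f"{name}.json").read_text())


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    save_versioned(base, "cleaned_data", {"name": "Alice", "age": 30}, T1)
    save_versioned(base, "cleaned_data", {"name": "Bob", "age": 25}, T2)
    latest = load_versioned(base, "cleaned_data")             # no version: T2 data
    first = load_versioned(base, "cleaned_data", version=T1)  # pinned: T1 data
```

Because every run writes to its own version directory, the first run's data stays readable after the second run, which is what makes loading by `T1` possible.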

Implementation

@generates

API

def preprocess_data(raw_data: dict) -> dict:
    """Clean and preprocess raw customer data."""
    pass

def calculate_metrics(cleaned_data: dict) -> dict:
    """Calculate summary metrics from cleaned data."""
    pass

def generate_report(metrics: dict) -> dict:
    """Generate final report from metrics."""
    pass

def create_pipeline():
    """Create the data processing pipeline with three nodes."""
    pass

def create_versioned_catalog():
    """Create a data catalog with versioned datasets."""
    pass

Dependencies { .dependencies }

kedro { .dependency }

Provides the data pipeline framework, including dataset versioning support.

@satisfied-by
