Kedro helps you build production-ready data and analytics pipelines
Overall score: 98%
A utility that manages data processing workflows using a centralized data catalog system. The manager should provide a unified interface for loading, saving, and checking the existence of datasets.
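As an illustration, the configuration dictionary passed to the catalog could mirror the structure of a Kedro `catalog.yml` expressed as a Python dict. The dataset names and type strings below are examples, not part of the spec:

```python
# A possible catalog configuration, modelled on Kedro's catalog.yml
# structure; entry names and dataset types here are illustrative.
catalog_config = {
    "reviews": {
        "type": "pandas.CSVDataset",          # file-backed dataset
        "filepath": "data/01_raw/reviews.csv",
    },
    "model_input": {
        "type": "MemoryDataset",              # transient, in-memory
    },
}
```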
@generates
from typing import Any, Dict


def create_catalog_from_config(config: Dict[str, Any]) -> Any:
    """
    Create a DataCatalog instance from a configuration dictionary.

    Args:
        config: Configuration dictionary for the catalog

    Returns:
        Data catalog instance
    """
    pass
def save_dataset(catalog: Any, dataset_name: str, data: Any) -> None:
    """
    Save data to a dataset in the catalog.

    Args:
        catalog: The data catalog instance
        dataset_name: Name of the dataset to save to
        data: Data to save
    """
    pass
def load_dataset(catalog: Any, dataset_name: str) -> Any:
    """
    Load data from a dataset in the catalog.

    Args:
        catalog: The data catalog instance
        dataset_name: Name of the dataset to load from

    Returns:
        The loaded data
    """
    pass
def dataset_exists(catalog: Any, dataset_name: str) -> bool:
    """
    Check if a dataset exists in the catalog.

    Args:
        catalog: The data catalog instance
        dataset_name: Name of the dataset to check

    Returns:
        True if dataset exists, False otherwise
    """
    pass
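One way the four functions above could be realised is sketched below, using a plain dict of in-memory datasets as the catalog; the `_InMemoryDataset` helper is hypothetical. With Kedro itself, `kedro.io.DataCatalog` already offers `from_config`, `save`, `load`, and `exists` methods that map one-to-one onto these functions:

```python
from typing import Any, Dict


class _InMemoryDataset:
    """Hypothetical stand-in for a catalog dataset; real Kedro would
    resolve each config entry's "type" key to a dataset class."""

    def __init__(self) -> None:
        self._data: Any = None

    def save(self, data: Any) -> None:
        self._data = data

    def load(self) -> Any:
        return self._data


def create_catalog_from_config(config: Dict[str, Any]) -> Dict[str, _InMemoryDataset]:
    # Register one dataset per configured name.
    return {name: _InMemoryDataset() for name in config}


def save_dataset(catalog: Dict[str, _InMemoryDataset], dataset_name: str, data: Any) -> None:
    catalog[dataset_name].save(data)


def load_dataset(catalog: Dict[str, _InMemoryDataset], dataset_name: str) -> Any:
    return catalog[dataset_name].load()


def dataset_exists(catalog: Dict[str, _InMemoryDataset], dataset_name: str) -> bool:
    # Kedro's DataCatalog.exists also consults the dataset itself;
    # here existence simply means the name is registered.
    return dataset_name in catalog
```

This keeps the unified load/save/exists interface the spec asks for while leaving the dataset type pluggable.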
class DictMemoryDataset:
    """
    A custom in-memory dataset for storing Python dictionaries.
    Should implement load() and save() methods to work with the data catalog.
    """
    pass

Provides data catalog management and dataset abstractions.
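A minimal sketch of `DictMemoryDataset`, satisfying the load()/save() contract above. Note that production Kedro custom datasets usually subclass `kedro.io.AbstractDataset` and implement `_load`/`_save`/`_describe` instead; the copy-on-access behaviour here is an assumption, chosen so callers cannot mutate stored state:

```python
import copy
from typing import Any, Dict, Optional


class DictMemoryDataset:
    """In-memory dataset restricted to Python dictionaries."""

    def __init__(self, data: Optional[Dict[str, Any]] = None) -> None:
        self._data = copy.deepcopy(data) if data is not None else None

    def save(self, data: Dict[str, Any]) -> None:
        if not isinstance(data, dict):
            raise TypeError("DictMemoryDataset only stores dictionaries")
        # Deep-copy so later mutation of the caller's dict is not visible.
        self._data = copy.deepcopy(data)

    def load(self) -> Dict[str, Any]:
        if self._data is None:
            raise ValueError("No data has been saved yet")
        return copy.deepcopy(self._data)
```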
Install with Tessl CLI
npx tessl i tessl/pypi-kedro