Workspace: tessl
Visibility: Public
Describes: pkg:pypi/kedro@1.1.x

tessl/pypi-kedro

tessl install tessl/pypi-kedro@1.1.0

Kedro helps you build production-ready data and analytics pipelines

Agent Success

  • Agent success rate when using this tile: 98%
  • Improvement over baseline: 1.32x
  • Baseline (agent success rate without this tile): 74%


Dataset Pattern Resolver

A utility that resolves dataset names to their configuration templates based on pattern matching with specificity ranking.

Background

When working with large data pipelines, you often need to handle multiple datasets that follow similar naming conventions. Rather than configuring each dataset individually, pattern-based configuration allows you to define templates that match multiple dataset names. When multiple patterns could match a dataset name, the system needs to determine which pattern is the best match based on specificity.

Requirements

Build a dataset pattern resolver that accepts a list of pattern definitions and resolves dataset names to their configurations:

Pattern Format

  • Patterns use curly braces {} to denote placeholders (e.g., {namespace}, {name})
  • Patterns can mix literal text and placeholders (e.g., "data.int_{name}")
  • A pattern consisting of a single asterisk * is a catch-all that matches any dataset name
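
The pattern format above can be illustrated by translating each pattern into a regular expression. This is only a minimal sketch using the standard library: the helper name `compile_pattern` is hypothetical, and it assumes that a placeholder matches one or more arbitrary characters, that placeholder names are valid Python identifiers and unique within a pattern, and that only a bare `*` acts as the catch-all.

```python
import re

# Matches one placeholder such as {name}; the group captures the name.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def compile_pattern(pattern: str) -> re.Pattern:
    """Translate a placeholder pattern into a compiled regular expression."""
    if pattern == "*":
        # Assumption: the bare asterisk matches any dataset name.
        return re.compile(r".*", re.DOTALL)
    # Splitting on a regex with one capture group yields alternating
    # pieces: literal, placeholder name, literal, placeholder name, ...
    parts = PLACEHOLDER.split(pattern)
    regex = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+?)"
        for i, part in enumerate(parts)
    )
    return re.compile(regex)


match = compile_pattern("data.int_{name}").fullmatch("data.int_orders")
print(match.groupdict())  # {'name': 'orders'}
```

Using `fullmatch` rather than `search` means the whole dataset name has to fit the pattern, so "data.int_{name}" does not match "mydata.int_orders".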

Specificity Ranking

When multiple patterns match a dataset name:

  • Patterns with more literal (non-placeholder) characters are more specific
  • Among patterns with the same number of literal characters, those with more placeholders are more specific
  • The catch-all pattern * is always least specific
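
The ranking rules above can be expressed as a sort key, so that sorting a list of patterns puts the most specific one first. This is a sketch under stated assumptions: `specificity_key` is a hypothetical helper, and the final alphabetical tie-breaker is an addition for deterministic ordering that the requirements do not specify.

```python
import re

# Matches one placeholder such as {name}.
PLACEHOLDER = re.compile(r"\{\w+\}")


def specificity_key(pattern: str) -> tuple[int, int, int, str]:
    """Return a sort key; smaller keys mean more specific patterns."""
    if pattern == "*":
        # The catch-all always sorts last.
        return (1, 0, 0, pattern)
    literal_chars = len(PLACEHOLDER.sub("", pattern))
    placeholders = len(PLACEHOLDER.findall(pattern))
    # Negate the counts so that larger counts sort first.
    # Assumption: remaining ties are broken alphabetically.
    return (0, -literal_chars, -placeholders, pattern)


patterns = ["*", "{namespace}.{name}", "data.{name}", "{name}"]
print(sorted(patterns, key=specificity_key))
# ['data.{name}', '{namespace}.{name}', '{name}', '*']
```

Here "data.{name}" has five literal characters, "{namespace}.{name}" has one, and "{name}" has none, which gives the order shown.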

Functionality

  • Resolve dataset names to the configuration of the best matching pattern
  • Extract placeholder values from matched patterns
  • Return None when no pattern matches

Test Cases

  • Given patterns [("{namespace}.{name}", {"type": "A"}), ("data.{name}", {"type": "B"}), ("*", {"type": "C"})], resolving "data.sales" returns {"type": "B"} because "data.{name}" is more specific than "{namespace}.{name}" (more literals) @test

  • Given patterns [("{namespace}.{dataset}", {"type": "A"}), ("{prefix}_{suffix}", {"type": "B"})], resolving "sales.revenue" returns {"type": "A"} because only "{namespace}.{dataset}" matches: the name contains a dot but no underscore @test

  • Given patterns [("raw.{name}", {"type": "A"}), ("*", {"type": "B"})], resolving "processed.data" returns {"type": "B"} because only the catch-all matches @test

  • Given patterns [("{namespace}.{name}", {"type": "A"})], extracting placeholders from "sales.revenue" returns {"namespace": "sales", "name": "revenue"} @test

Implementation

@generates

API

class PatternResolver:
    """Resolves dataset names to configurations based on pattern matching with specificity ranking."""

    def __init__(self, patterns: list[tuple[str, dict]]):
        """
        Initialize the resolver with a list of patterns.

        Args:
            patterns: List of (pattern_string, config_dict) tuples
        """
        pass

    def resolve(self, dataset_name: str) -> dict | None:
        """
        Resolve a dataset name to its configuration.

        Args:
            dataset_name: The name of the dataset to resolve

        Returns:
            The configuration dict for the best matching pattern, or None if no match
        """
        pass

    def extract_placeholders(self, dataset_name: str) -> dict[str, str] | None:
        """
        Extract placeholder values from a dataset name using the best matching pattern.

        Args:
            dataset_name: The name of the dataset

        Returns:
            Dictionary mapping placeholder names to their values, or None if no match
        """
        pass

Dependencies { .dependencies }

kedro { .dependency }

Provides data catalog and pattern matching support.