tessl/pypi-kedro (describes pypi/kedro@1.1.x)

tessl install tessl/pypi-kedro@1.1.0

Kedro helps you build production-ready data and analytics pipelines.

Agent success rate with this tile: 98%, vs. a 74% baseline without it (a 1.32x improvement).

docs/guides/configuration-management.md

Configuration Management Guide

Best practices for managing multi-environment configurations in Kedro.

Basic Configuration Structure

conf/
├── base/                 # Shared configuration (committed to git)
│   ├── catalog.yml      # Data catalog
│   ├── parameters.yml   # Pipeline parameters
│   └── logging.yml      # Logging configuration
├── local/                # Local overrides (gitignored)
│   ├── catalog.yml      # Local data sources
│   └── credentials.yml  # Local credentials
└── prod/                 # Production environment
    ├── catalog.yml      # Production data sources
    └── parameters.yml   # Production parameters

Loading Configuration

from kedro.config import OmegaConfigLoader

# Load configuration for specific environment
loader = OmegaConfigLoader(
    conf_source="conf",
    env="prod"  # Loads base + prod configs
)

# Access configurations
catalog_config = loader["catalog"]
parameters = loader["parameters"]

Environment-Specific Configuration

Base Configuration

# conf/base/parameters.yml
model:
  learning_rate: 0.001
  epochs: 100

data:
  path: "/data/project"

Environment Override

# conf/prod/parameters.yml
model:
  epochs: 200  # Override base value

data:
  path: "/prod/data"  # Override base value

Result (with soft merge strategy)

model:
  learning_rate: 0.001  # From base
  epochs: 200           # From prod (overridden)

data:
  path: "/prod/data"    # From prod (overridden)

Merge Strategies

Soft Merge

Recursively merges environment configuration into the base configuration:

loader = OmegaConfigLoader(
    conf_source="conf",
    merge_strategy={"parameters": "soft"}
)

Destructive Merge (Default)

Replaces any base configuration that the environment also defines, rather than merging into it:

loader = OmegaConfigLoader(
    conf_source="conf",
    merge_strategy={"parameters": "destructive"}  # Default behaviour
)
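At the dict level, the two strategies behave as follows. Here soft_merge is a minimal illustrative sketch of the semantics, not Kedro's actual implementation:

```python
def soft_merge(base: dict, override: dict) -> dict:
    """Recursive (soft) merge: overridden keys win, untouched keys survive."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = soft_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

base = {"model": {"learning_rate": 0.001, "epochs": 100}}
prod = {"model": {"epochs": 200}}

print(soft_merge(base, prod))  # {'model': {'learning_rate': 0.001, 'epochs': 200}}
print({**base, **prod})        # destructive: {'model': {'epochs': 200}}
```

With the destructive strategy the learning_rate from base is gone, because the whole model entry was replaced rather than merged.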

Runtime Parameter Overrides

# Override parameters at runtime (a nested mapping, matching the
# structure that kedro run --params produces)
loader = OmegaConfigLoader(
    conf_source="conf",
    env="prod",
    runtime_params={
        "model": {"learning_rate": 0.01},
        "data": {"batch_size": 128}
    }
)

Variable Interpolation

# parameters.yml
data_dir: /data/project

raw_data_path: ${data_dir}/raw
processed_data_path: ${data_dir}/processed

model:
  learning_rate: 0.001

training:
  lr: ${model.learning_rate}  # Reference other parameters

Credentials Management

Store Credentials Separately

# conf/base/catalog.yml
database:
  type: pandas.SQLTableDataset
  credentials: db_credentials  # Reference credentials
  table_name: users

# conf/local/credentials.yml (gitignored)
db_credentials:
  con: postgresql://user:password@localhost:5432/dbname

Environment-Specific Credentials

# conf/local/credentials.yml
db_credentials:
  con: postgresql://user:password@localhost:5432/local_db

# conf/prod/credentials.yml
db_credentials:
  con: postgresql://user:password@prod-server:5432/prod_db

Common Patterns

Pattern: Environment Variables

# parameters.yml
database_url: ${oc.env:DATABASE_URL}
api_key: ${oc.env:API_KEY}

Pattern: Multi-Environment Catalog

Keys that start with an underscore are treated as templating values rather than dataset definitions:

# conf/base/catalog.yml
model_input:
  type: pandas.CSVDataset
  filepath: ${_base_location}/input.csv

# conf/local/catalog.yml
_base_location: /local/data

# conf/prod/catalog.yml
_base_location: s3://prod-bucket/data

Pattern: Feature Flags

# parameters.yml
features:
  use_new_algorithm: false
  enable_caching: true
  debug_mode: false

# conf/local/parameters.yml
features:
  debug_mode: true

# conf/prod/parameters.yml
features:
  use_new_algorithm: true
  enable_caching: true
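A node consuming params:features might branch on these flags. The select_algorithm function and the algorithm names here are hypothetical, purely for illustration:

```python
def select_algorithm(features: dict) -> str:
    """Pick a code path based on feature flags (names from the example above)."""
    if features.get("use_new_algorithm", False):
        return "new_algorithm"
    return "legacy_algorithm"

# Flags as they would arrive from params:features in prod
prod_features = {"use_new_algorithm": True, "enable_caching": True, "debug_mode": False}
print(select_algorithm(prod_features))  # new_algorithm
```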

Parameter Handling

Kedro provides a parameter management system built around the params: prefix, which automatically injects configuration parameters into pipeline nodes.

The params: Prefix

Use the params: prefix in node inputs to automatically access parameters from your configuration:

from kedro.pipeline import node

def train_model(data, model_params):
    """Train model with hyperparameters from the model parameter group."""
    model = Model(
        learning_rate=model_params["learning_rate"],
        epochs=model_params["epochs"],
    )
    return model.fit(data)

# Reference entire parameter group
node(
    train_model,
    inputs=["training_data", "params:model"],
    outputs="trained_model"
)

# conf/base/parameters.yml
model:
  learning_rate: 0.001
  epochs: 100
  batch_size: 32

When the pipeline runs, params:model is automatically resolved to the model dictionary from parameters.yml.

Parameter Loading Mechanism

Parameters are loaded and managed by KedroContext through the following process:

  1. Configuration Loading: The OmegaConfigLoader loads parameters.yml from the configuration directory
  2. Automatic Dataset Creation: For each parameter or parameter group, Kedro automatically creates a MemoryDataset in the catalog with the params: prefix
  3. Runtime Resolution: When a node requests params:something, Kedro looks up the corresponding dataset in the catalog

from kedro.framework.session import KedroSession

with KedroSession.create() as session:
    context = session.load_context()

    # Parameters are accessible through context
    all_params = context.params
    print(all_params)  # {'model': {'learning_rate': 0.001, ...}}

    # Parameters are also available in catalog with params: prefix
    model_params = context.catalog.load("params:model")
    print(model_params)  # {'learning_rate': 0.001, ...}

Nested Parameter Access

Kedro supports nested parameter access using dot notation, creating automatic datasets for each level:

# conf/base/parameters.yml
model:
  neural_network:
    learning_rate: 0.001
    layers:
      - 128
      - 64
      - 32
  optimizer:
    type: adam
    beta1: 0.9
    beta2: 0.999

With this configuration, all of the following are automatically available:

# Access entire model config
node(func, "params:model", "output")
# Receives: {'neural_network': {...}, 'optimizer': {...}}

# Access nested neural_network config
node(func, "params:model.neural_network", "output")
# Receives: {'learning_rate': 0.001, 'layers': [128, 64, 32]}

# Access specific parameter
node(func, "params:model.neural_network.learning_rate", "output")
# Receives: 0.001

# Access optimizer config
node(func, "params:model.optimizer", "output")
# Receives: {'type': 'adam', 'beta1': 0.9, 'beta2': 0.999}

# Access specific optimizer parameter
node(func, "params:model.optimizer.type", "output")
# Receives: "adam"

Important: Kedro automatically creates parameter datasets for every possible path through the nested structure. You don't need to explicitly register them.
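The naming scheme can be illustrated with a small helper. Note that param_dataset_names is hypothetical, not part of Kedro's API:

```python
def param_dataset_names(params: dict, prefix: str = "params:"):
    """Illustrative only: enumerate the params: dataset names Kedro would
    register for a nested parameter dict (one per path through the structure)."""
    for key, value in params.items():
        name = f"{prefix}{key}"
        yield name
        if isinstance(value, dict):
            yield from param_dataset_names(value, f"{name}.")

params = {"model": {"neural_network": {"learning_rate": 0.001},
                    "optimizer": {"type": "adam"}}}
print(sorted(param_dataset_names(params)))
# ['params:model', 'params:model.neural_network',
#  'params:model.neural_network.learning_rate',
#  'params:model.optimizer', 'params:model.optimizer.type']
```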

Parameter Resolution Order

When parameters are loaded, they follow this resolution order:

  1. Base Configuration: Load from conf/base/parameters.yml
  2. Environment Override: Merge with conf/{env}/parameters.yml (e.g., conf/prod/parameters.yml)
  3. Runtime Parameters: Apply runtime parameter overrides
  4. Catalog Registration: Register all parameters and nested paths as params:* datasets

from kedro.config import OmegaConfigLoader

# Example: runtime parameter overrides (a nested mapping)
loader = OmegaConfigLoader(
    conf_source="conf",
    env="prod",
    runtime_params={
        "model": {"learning_rate": 0.01, "epochs": 200}
    }
)

# Resolution of model.learning_rate:
# 1. conf/base/parameters.yml: learning_rate = 0.001
# 2. conf/prod/parameters.yml: learning_rate = 0.005 (if present)
# 3. runtime_params: learning_rate = 0.01 (final value)

Usage Examples

Basic Parameter Access in Nodes

def preprocess_data(data, params):
    """Preprocess with configurable settings."""
    threshold = params["threshold"]
    method = params["method"]
    return process(data, threshold=threshold, method=method)

node(
    preprocess_data,
    inputs=["raw_data", "params:preprocessing"],
    outputs="processed_data"
)

# conf/base/parameters.yml
preprocessing:
  threshold: 0.5
  method: "standardize"

Accessing Nested Parameters

def train_with_hyperparams(data, learning_rate, batch_size):
    """Train with specific hyperparameters."""
    return train(data, lr=learning_rate, batch_size=batch_size)

# Access individual nested parameters
node(
    train_with_hyperparams,
    inputs=[
        "training_data",
        "params:model.learning_rate",
        "params:model.batch_size"
    ],
    outputs="model"
)

Runtime Parameter Overrides

from kedro.framework.session import KedroSession

# Override parameters for this session
with KedroSession.create(
    extra_params={
        "model": {"learning_rate": 0.01, "epochs": 50},
        "preprocessing": {"threshold": 0.7}
    }
) as session:
    session.run()

Command line override:

kedro run --params="model.learning_rate=0.01,model.epochs=50"

Using Parameters in Catalog Configuration

Runtime parameters can be referenced in catalog configuration through the runtime_params resolver:

# conf/base/catalog.yml
processed_data:
  type: pandas.CSVDataset
  filepath: data/processed/output.csv
  save_args:
    index: false
    sep: ${runtime_params:separator}  # Resolved from runtime parameters

kedro run --params="separator=;"

Multiple Parameter Groups

def complex_processing(data, data_params, model_params, output_params):
    """Process data with multiple parameter groups."""
    cleaned = clean(data, **data_params)
    processed = transform(cleaned, **model_params)
    return format_output(processed, **output_params)

node(
    complex_processing,
    inputs=[
        "raw_data",
        "params:data_processing",
        "params:model_config",
        "params:output_format"
    ],
    outputs="final_output"
)

Named Parameter Inputs

def train_model(data, model_type, hyperparams):
    """Train with named parameter inputs."""
    return train(data, model_type=model_type, **hyperparams)

# Use dict syntax for named inputs
node(
    train_model,
    inputs={
        "data": "training_data",
        "model_type": "params:model.type",
        "hyperparams": "params:model.hyperparameters"
    },
    outputs="trained_model"
)

Best Practices

1. Never Commit Secrets

# .gitignore
conf/local/
conf/**/credentials*
**/*credentials*

2. Use Environment Variables for Secrets

# ✅ Good: Use environment variables
api_key: ${oc.env:API_KEY}

# ❌ Bad: Hardcode secrets
api_key: "secret123"

3. Organize by Concern

conf/
├── base/
│   ├── catalog.yml
│   ├── parameters.yml
│   ├── spark.yml
│   └── mlflow.yml
└── prod/
    ├── catalog.yml
    ├── parameters.yml
    └── spark.yml

4. Document Configuration

# parameters.yml
# Model hyperparameters
model:
  learning_rate: 0.001  # Learning rate for gradient descent
  epochs: 100           # Number of training epochs
  batch_size: 32        # Mini-batch size

See also:

  • OmegaConfigLoader API - Complete API documentation