
tessl/pypi-dagster

A cloud-native data pipeline orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability.

Workspace: tessl
Visibility: Public
Describes: pkg:pypi/dagster@1.11.x

To install, run

npx @tessl/cli install tessl/pypi-dagster@1.11.0


Dagster Knowledge Tile

Overview

Dagster is a cloud-native data pipeline orchestrator with integrated lineage, observability, and a declarative programming model. It provides a framework for building, testing, and deploying data pipelines with software engineering best practices, enabling teams to build reliable, scalable data platforms with rich metadata and powerful automation.

Package Information

Package: dagster
Version: 1.11.8
API Elements: 700+ public API elements
Functional Areas: 26 major areas

Core Imports

import dagster
from dagster import (
    # Core decorators
    asset, multi_asset, op, job, graph, sensor, schedule, resource,
    
    # Definition classes
    Definitions, AssetsDefinition, OpDefinition, JobDefinition,
    
    # Configuration
    Config, ConfigurableResource, Field, Shape,
    
    # Execution
    materialize, execute_job, build_op_context, build_asset_context,
    
    # Types
    In, Out, AssetIn, AssetOut, DagsterType,
    
    # Events and Results
    AssetMaterialization, AssetObservation, MaterializeResult,
    
    # Storage
    IOManager, fs_io_manager, mem_io_manager,
    
    # Partitions
    DailyPartitionsDefinition, HourlyPartitionsDefinition, StaticPartitionsDefinition,
    
    # Error handling
    DagsterError, Failure, RetryRequested
)

Architecture Overview

Dagster follows a modular architecture organized around these core concepts:

1. Software-Defined Assets (SDAs)

The primary abstraction for data artifacts. Assets represent data products that exist or should exist.

@asset
def my_asset() -> MaterializeResult:
    """Define a software-defined asset."""
    # Asset computation logic
    return MaterializeResult()

2. Operations and Jobs

Fundamental computational units and their orchestration.

@op
def my_op() -> str:
    """Define an operation."""
    return "result"

@job
def my_job():
    """Define a job."""
    my_op()

3. Resources

External dependencies and services injected into computations.

@resource
def my_resource():
    """Define a resource."""
    return "resource_value"

4. Schedules and Sensors

Automation triggers for pipeline execution.

@schedule(cron_schedule="0 0 * * *", job=my_job)
def daily_schedule():
    """Define a daily schedule."""
    return {}

Key Capabilities

Asset Management

  • Asset Definition: @asset decorator and AssetsDefinition class
  • Multi-Asset Support: @multi_asset for defining multiple related assets
  • Asset Dependencies: Automatic dependency inference and explicit specification
  • Asset Checks: Data quality validation with @asset_check
  • Asset Lineage: Automatic lineage tracking and visualization

See: Core Definitions
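
As an illustration of asset checks, here is a minimal sketch; the orders asset and the check name are hypothetical, not part of the package:

from dagster import AssetCheckResult, asset, asset_check

@asset
def orders() -> list:
    """Hypothetical upstream asset producing order records."""
    return [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 25.5}]

@asset_check(asset=orders)
def orders_has_rows(orders: list) -> AssetCheckResult:
    """Data quality check: the orders asset must contain at least one record."""
    return AssetCheckResult(passed=len(orders) > 0, metadata={"row_count": len(orders)})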

Configuration System

  • Type-Safe Config: Pydantic-based configuration with Config class
  • Configurable Resources: ConfigurableResource for dependency injection
  • Environment Variables: EnvVar for environment-based configuration
  • Schema Validation: Comprehensive validation with helpful error messages

See: Configuration System
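
A minimal sketch of the configuration model; the names QueryConfig, ApiClient, and the API_TOKEN variable are illustrative assumptions:

from dagster import Config, ConfigurableResource, Definitions, EnvVar, asset

class QueryConfig(Config):
    """Type-safe run configuration validated by Pydantic."""
    limit: int = 100

class ApiClient(ConfigurableResource):
    """Configurable resource; the token is resolved from an environment variable at launch."""
    base_url: str
    token: str

@asset
def fetched_records(config: QueryConfig, api_client: ApiClient) -> list:
    """Combines run config with an injected resource."""
    return [f"{api_client.base_url}?limit={config.limit}"]

defs = Definitions(
    assets=[fetched_records],
    resources={"api_client": ApiClient(base_url="https://example.com/api", token=EnvVar("API_TOKEN"))},
)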

Execution Engine

  • Multiple Executors: In-process, multiprocess, and custom execution
  • Rich Contexts: Execution contexts with metadata, logging, and resources
  • Asset Materialization: materialize() function for asset execution
  • Job Execution: execute_job() for operation-based workflows

See: Execution and Contexts
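
A short sketch of in-process execution and direct invocation for tests; the greeting asset is illustrative:

from dagster import asset, build_asset_context, materialize

@asset
def greeting(context) -> str:
    """The `context` parameter exposes logging, metadata, and resources."""
    context.log.info("computing greeting")
    return "hello"

# Ad-hoc, in-process materialization (e.g. from a script or test)
result = materialize([greeting])
assert result.success

# Direct invocation with a built context, useful in unit tests
assert greeting(build_asset_context()) == "hello"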

Storage and I/O

  • I/O Managers: Pluggable storage backends with IOManager interface
  • Built-in I/O: Filesystem, memory, and universal path I/O managers
  • Custom I/O: Configurable I/O managers for any storage system
  • Asset Value Loading: Efficient loading of materialized asset values

See: Storage and I/O
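
A sketch of a custom I/O manager built on ConfigurableIOManager; the class name, pickle format, and base directory are illustrative choices, not part of the package:

import os
import pickle

from dagster import ConfigurableIOManager, InputContext, OutputContext, asset, materialize

class LocalPickleIOManager(ConfigurableIOManager):
    """Illustrative I/O manager that pickles asset outputs under a base directory."""

    base_dir: str

    def _path(self, context) -> str:
        return os.path.join(self.base_dir, "_".join(context.asset_key.path))

    def handle_output(self, context: OutputContext, obj) -> None:
        os.makedirs(self.base_dir, exist_ok=True)
        with open(self._path(context), "wb") as f:
            pickle.dump(obj, f)

    def load_input(self, context: InputContext):
        with open(self._path(context), "rb") as f:
            return pickle.load(f)

@asset
def numbers() -> list:
    return [1, 2, 3]

result = materialize(
    [numbers],
    resources={"io_manager": LocalPickleIOManager(base_dir="/tmp/dagster_pickle_demo")},
)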

Events and Metadata

  • Rich Metadata: Comprehensive metadata system with typed values
  • Asset Events: Materialization and observation events
  • Custom Events: User-defined events for pipeline observability
  • Table Metadata: Specialized metadata for tabular data

See: Events and Metadata
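
A sketch of attaching typed metadata to a materialization; the asset and metadata keys are illustrative:

from dagster import MaterializeResult, MetadataValue, asset

@asset
def summary_report() -> MaterializeResult:
    """Attach typed metadata to the asset's materialization event."""
    totals = {"EU": 42, "US": 17}
    return MaterializeResult(
        metadata={
            "row_count": len(totals),
            "preview": MetadataValue.json(totals),
        }
    )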

Automation

  • Asset Sensors: Event-driven execution based on asset changes
  • Schedule System: Time-based execution with cron expressions
  • Auto-Materialization: Declarative policies for automatic execution
  • Run Status Sensors: Sensors for pipeline failure handling

See: Sensors and Schedules
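
A sketch of an event-driven sensor; the trigger-file path, op, and job names are hypothetical:

import os

from dagster import RunRequest, SkipReason, job, op, sensor

@op
def ingest_file():
    """Placeholder work kicked off by the sensor."""

@job
def ingest_job():
    ingest_file()

@sensor(job=ingest_job)
def new_file_sensor(context):
    """Requests a run whenever a trigger file appears; otherwise explains why it skipped."""
    path = "/tmp/trigger_file"
    if not os.path.exists(path):
        return SkipReason("no trigger file present")
    return RunRequest(run_key=str(os.path.getmtime(path)))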

Partitioning

  • Time Partitions: Daily, hourly, weekly, monthly partitions
  • Static Partitions: Fixed sets of partitions
  • Dynamic Partitions: Runtime-defined partitions
  • Multi-Dimensional: Complex partitioning schemes

See: Partitions System
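
A sketch of a daily-partitioned asset; the asset name and start date are illustrative:

from dagster import AssetExecutionContext, DailyPartitionsDefinition, asset

daily_partitions = DailyPartitionsDefinition(start_date="2024-01-01")

@asset(partitions_def=daily_partitions)
def daily_events(context: AssetExecutionContext) -> list:
    """Each materialization covers one day, identified by the partition key."""
    day = context.partition_key  # e.g. "2024-01-05"
    return [f"event for {day}"]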

Error Handling

  • Structured Errors: Comprehensive error hierarchy
  • Retry Policies: Configurable retry strategies
  • Failure Events: Rich failure information and debugging
  • Graceful Degradation: Partial execution and recovery

See: Error Handling
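
A sketch of the retry and failure primitives; the random failures stand in for real external calls:

import random

from dagster import Failure, RetryPolicy, RetryRequested, op

@op(retry_policy=RetryPolicy(max_retries=3, delay=5))
def policy_retried_op() -> str:
    """Any exception here is retried by the configured policy (3 attempts, 5 s apart)."""
    if random.random() < 0.5:  # stand-in for an unreliable external call
        raise ConnectionError("transient failure")
    return "ok"

@op
def manually_controlled_op() -> str:
    """RetryRequested asks for a retry explicitly; Failure fails with structured metadata."""
    outcome = random.random()
    if outcome < 0.3:
        raise RetryRequested(max_retries=2, seconds_to_wait=1)
    if outcome < 0.4:
        raise Failure(description="unrecoverable input", metadata={"outcome": outcome})
    return "ok"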

Basic Usage Example

import pandas as pd
from dagster import asset, materialize, Definitions

@asset
def raw_data() -> pd.DataFrame:
    """Load raw data from source."""
    return pd.DataFrame({"id": [1, 2, 3], "value": [10, 20, 30]})

@asset
def processed_data(raw_data: pd.DataFrame) -> pd.DataFrame:
    """Process raw data."""
    return raw_data.assign(processed_value=raw_data["value"] * 2)

@asset  
def analysis_result(processed_data: pd.DataFrame) -> dict:
    """Generate analysis from processed data."""
    return {
        "total_records": len(processed_data),
        "average_value": processed_data["processed_value"].mean()
    }

# Define the complete set of definitions
defs = Definitions(
    assets=[raw_data, processed_data, analysis_result]
)

# Materialize assets
if __name__ == "__main__":
    result = materialize([raw_data, processed_data, analysis_result])
    print(f"Materialized {len(result.asset_materializations)} assets")

Advanced Features

Components System (Beta)

Modular, reusable components for complex data platforms:

from dagster import Definitions, load_assets_from_modules

# `my_assets_module` stands in for a module containing @asset definitions,
# and `database_resource` for a resource defined elsewhere in the project.
defs = Definitions(
    assets=load_assets_from_modules([my_assets_module]),
    resources={"database": database_resource}
)

Pipes Integration

Execute external processes with full Dagster integration:

from dagster import AssetExecutionContext, PipesSubprocessClient, asset

@asset
def external_asset(
    context: AssetExecutionContext,
    pipes_subprocess_client: PipesSubprocessClient,
):
    """Run an external Python script, forwarding the Dagster context via Pipes."""
    return pipes_subprocess_client.run(
        command=["python", "external_script.py"],
        context=context,
    ).get_materialize_result()

Declarative Automation

Sophisticated automation policies:

from dagster import AutoMaterializePolicy, AutoMaterializeRule, asset

@asset
def upstream_asset() -> int:
    """Placeholder upstream dependency."""
    return 1

@asset(
    auto_materialize_policy=AutoMaterializePolicy.eager()
    .with_rules(AutoMaterializeRule.materialize_on_parent_updated())
)
def auto_asset(upstream_asset: int) -> int:
    """Materialized automatically whenever its parent updates."""
    return upstream_asset * 2

Integration Capabilities

Dagster integrates with the entire modern data stack:

  • Data Warehouses: Snowflake, BigQuery, Redshift, PostgreSQL
  • Data Lakes: S3, GCS, Azure, Delta Lake, Iceberg
  • ML Platforms: MLflow, Weights & Biases, SageMaker
  • Orchestration: Kubernetes, Docker, cloud platforms
  • Observability: Slack, email, PagerDuty, custom webhooks
  • Version Control: Git-based deployments and CI/CD integration

Getting Started

  1. Install Dagster: pip install dagster dagster-webserver
  2. Define Assets: Create Python functions decorated with @asset
  3. Create Definitions: Bundle assets, resources, and schedules in Definitions
  4. Run Dagster: Use dagster dev for local development
  5. Deploy: Use Dagster Cloud or self-hosted deployment options
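
As a minimal sketch of steps 2 through 4, assuming a single-file project (file and asset names are illustrative):

# definitions.py — loadable with `dagster dev -f definitions.py`
from dagster import Definitions, ScheduleDefinition, asset, define_asset_job

@asset
def hello_asset() -> str:
    return "hello"

daily_refresh = ScheduleDefinition(
    job=define_asset_job("refresh_job", selection=[hello_asset]),
    cron_schedule="0 6 * * *",
)

defs = Definitions(assets=[hello_asset], schedules=[daily_refresh])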

Documentation Structure

This Knowledge Tile provides comprehensive API documentation organized by functional area:

  • Core Definitions
  • Configuration System
  • Execution and Contexts
  • Storage and I/O
  • Events and Metadata
  • Sensors and Schedules
  • Partitions System
  • Error Handling

Each section provides complete API documentation with function signatures, type definitions, usage examples, and cross-references to the rest of the Dagster framework.