A cloud-native data pipeline orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability.
npx @tessl/cli install tessl/pypi-dagster@1.11.0Dagster is a cloud-native data pipeline orchestrator with integrated lineage, observability, and a declarative programming model. It provides a comprehensive framework for building, testing, and deploying data pipelines with software engineering best practices. Dagster enables teams to build reliable, scalable data platforms with rich metadata, comprehensive observability, and powerful automation capabilities.
Package: dagster
Version: 1.11.8
API Elements: 700+ public API elements
Functional Areas: 26 major areas
import dagster
from dagster import (
# Core decorators
asset, multi_asset, op, job, graph, sensor, schedule, resource,
# Definition classes
Definitions, AssetsDefinition, OpDefinition, JobDefinition,
# Configuration
Config, ConfigurableResource, Field, Shape,
# Execution
materialize, execute_job, build_op_context, build_asset_context,
# Types
In, Out, AssetIn, AssetOut, DagsterType,
# Events and Results
AssetMaterialization, AssetObservation, MaterializeResult,
# Storage
IOManager, fs_io_manager, mem_io_manager,
# Partitions
DailyPartitionsDefinition, HourlyPartitionsDefinition, StaticPartitionsDefinition,
# Error handling
DagsterError, Failure, RetryRequested
)Dagster follows a modular architecture organized around these core concepts:
The primary abstraction for data artifacts. Assets represent data products that exist or should exist.
@asset
def my_asset() -> MaterializeResult:
"""Define a software-defined asset."""
# Asset computation logic
return MaterializeResult()Fundamental computational units and their orchestration.
@op
def my_op() -> str:
"""Define an operation."""
return "result"
@job
def my_job():
"""Define a job."""
my_op()External dependencies and services injected into computations.
@resource
def my_resource():
"""Define a resource."""
return "resource_value"Automation triggers for pipeline execution.
@schedule(cron_schedule="0 0 * * *", job=my_job)
def daily_schedule():
"""Define a daily schedule."""
return {}@asset decorator and AssetsDefinition class@multi_asset for defining multiple related assets@asset_checkSee: Core Definitions
Config classConfigurableResource for dependency injectionEnvVar for environment-based configurationSee: Configuration System
materialize() function for asset executionexecute_job() for operation-based workflowsIOManager interfaceSee: Storage and I/O
See: Events and Metadata
See: Partitions System
See: Error Handling
import pandas as pd
from dagster import asset, materialize, Definitions
@asset
def raw_data() -> pd.DataFrame:
"""Load raw data from source."""
return pd.DataFrame({"id": [1, 2, 3], "value": [10, 20, 30]})
@asset
def processed_data(raw_data: pd.DataFrame) -> pd.DataFrame:
"""Process raw data."""
return raw_data.assign(processed_value=raw_data["value"] * 2)
@asset
def analysis_result(processed_data: pd.DataFrame) -> dict:
"""Generate analysis from processed data."""
return {
"total_records": len(processed_data),
"average_value": processed_data["processed_value"].mean()
}
# Define the complete set of definitions
defs = Definitions(
assets=[raw_data, processed_data, analysis_result]
)
# Materialize assets
if __name__ == "__main__":
result = materialize([raw_data, processed_data, analysis_result])
print(f"Materialized {len(result.asset_materializations)} assets")Modular, reusable components for complex data platforms:
from dagster import Component, Definitions
# Use components for modularity
defs = Definitions(
assets=load_assets_from_modules([my_assets_module]),
resources={"database": database_resource}
)Execute external processes with full Dagster integration:
from dagster import PipesSubprocessClient, asset
@asset
def external_asset(pipes_subprocess_client: PipesSubprocessClient):
"""Run external Python script with Dagster context."""
return pipes_subprocess_client.run(
command=["python", "external_script.py"],
context={"asset_key": "external_asset"}
).get_results()Sophisticated automation policies:
from dagster import AutoMaterializePolicy, AutoMaterializeRule
@asset(
auto_materialize_policy=AutoMaterializePolicy.eager()
.with_rules(AutoMaterializeRule.materialize_on_parent_updated())
)
def auto_asset(upstream_asset):
"""Automatically materialized asset."""
return process(upstream_asset)Dagster integrates with the entire modern data stack:
pip install dagster dagster-webserver@assetDefinitionsdagster dev for local developmentThis Knowledge Tile provides comprehensive API documentation organized by functional area:
Each section provides complete API documentation with function signatures, type definitions, usage examples, and cross-references to enable comprehensive understanding and usage of the Dagster framework.