A cloud-native data pipeline orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability.
npx @tessl/cli install tessl/pypi-dagster@1.11.8
# Dagster Knowledge Tile

## Overview

Dagster is a cloud-native data pipeline orchestrator with integrated lineage, observability, and a declarative programming model. It applies software engineering best practices such as testing, typed configuration, and version control to building, testing, and deploying data pipelines, enabling teams to run reliable, scalable data platforms with rich metadata and powerful automation.

## Package Information

**Package:** `dagster`
**Version:** 1.11.8
**API Elements:** 700+ public API elements
**Functional Areas:** 26 major areas

## Core Imports

```python
import dagster

from dagster import (
    # Core decorators
    asset, multi_asset, op, job, graph, sensor, schedule, resource,

    # Definition classes
    Definitions, AssetsDefinition, OpDefinition, JobDefinition,

    # Configuration
    Config, ConfigurableResource, Field, Shape,

    # Execution
    materialize, execute_job, build_op_context, build_asset_context,

    # Types
    In, Out, AssetIn, AssetOut, DagsterType,

    # Events and Results
    AssetMaterialization, AssetObservation, MaterializeResult,

    # Storage
    IOManager, fs_io_manager, mem_io_manager,

    # Partitions
    DailyPartitionsDefinition, HourlyPartitionsDefinition, StaticPartitionsDefinition,

    # Error handling
    DagsterError, Failure, RetryRequested
)
```

## Architecture Overview

Dagster follows a modular architecture organized around these core concepts:

### 1. Software-Defined Assets (SDAs)
The primary abstraction for data artifacts. Assets represent data products that exist or should exist.

```python { .api }
@asset
def my_asset() -> MaterializeResult:
    """Define a software-defined asset."""
    # Asset computation logic
    return MaterializeResult()
```

### 2. Operations and Jobs
Fundamental computational units and their orchestration.

```python { .api }
@op
def my_op() -> str:
    """Define an operation."""
    return "result"

@job
def my_job():
    """Define a job."""
    my_op()
```

### 3. Resources
External dependencies and services injected into computations.

```python { .api }
@resource
def my_resource():
    """Define a resource."""
    return "resource_value"
```

### 4. Schedules and Sensors
Automation triggers for pipeline execution.

```python { .api }
@schedule(cron_schedule="0 0 * * *", job=my_job)
def daily_schedule():
    """Define a daily schedule."""
    return {}  # empty run config for each scheduled run
```

## Key Capabilities

### Asset Management
- **Asset Definition**: `@asset` decorator and `AssetsDefinition` class
- **Multi-Asset Support**: `@multi_asset` for defining multiple related assets
- **Asset Dependencies**: Automatic dependency inference and explicit specification
- **Asset Checks**: Data quality validation with `@asset_check`
- **Asset Lineage**: Automatic lineage tracking and visualization

See: [Core Definitions](./core-definitions.md)
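
As a quick sketch of the asset-check workflow: the following defines a small asset and a quality check against it. The `orders` asset and the non-empty rule are illustrative, not part of the Dagster API.

```python
from dagster import asset, asset_check, AssetCheckResult

@asset
def orders() -> list:
    """Illustrative asset holding order records."""
    return [{"id": 1, "amount": 100}, {"id": 2, "amount": 250}]

@asset_check(asset=orders)
def orders_not_empty(orders: list) -> AssetCheckResult:
    """Data quality check: fail if the asset materialized no records."""
    return AssetCheckResult(passed=len(orders) > 0)
```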

### Configuration System
- **Type-Safe Config**: Pydantic-based configuration with `Config` class
- **Configurable Resources**: `ConfigurableResource` for dependency injection
- **Environment Variables**: `EnvVar` for environment-based configuration
- **Schema Validation**: Comprehensive validation with helpful error messages

See: [Configuration System](./configuration.md)
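
A minimal sketch of the typed-configuration pattern; `IngestConfig`, `WarehouseResource`, and the `WAREHOUSE_URL` variable are hypothetical names chosen for this example.

```python
from dagster import Config, ConfigurableResource, Definitions, EnvVar, asset

class IngestConfig(Config):
    """Run-level config, validated by Pydantic."""
    limit: int = 100

class WarehouseResource(ConfigurableResource):
    """Hypothetical resource wrapping a connection string."""
    connection_url: str

@asset
def ingested_rows(config: IngestConfig, warehouse: WarehouseResource) -> list:
    # Config and resources are type-checked and injected at runtime
    return [f"row {i} via {warehouse.connection_url}" for i in range(config.limit)]

defs = Definitions(
    assets=[ingested_rows],
    resources={"warehouse": WarehouseResource(connection_url=EnvVar("WAREHOUSE_URL"))},
)
```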

### Execution Engine
- **Multiple Executors**: In-process, multiprocess, and custom execution
- **Rich Contexts**: Execution contexts with metadata, logging, and resources
- **Asset Materialization**: `materialize()` function for asset execution
- **Job Execution**: `execute_job()` for operation-based workflows

See: [Execution and Contexts](./execution-contexts.md)
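
Because `materialize()` runs assets in process and returns a result object, it doubles as a test harness. A small sketch (the asset names are illustrative):

```python
from dagster import asset, materialize

@asset
def numbers() -> list:
    return [1, 2, 3]

@asset
def total(numbers: list) -> int:
    return sum(numbers)

# In-process execution returns an ExecuteInProcessResult for assertions
result = materialize([numbers, total])
assert result.success
assert result.output_for_node("total") == 6
```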

### Storage and I/O
- **I/O Managers**: Pluggable storage backends with `IOManager` interface
- **Built-in I/O**: Filesystem, memory, and universal path I/O managers
- **Custom I/O**: Configurable I/O managers for any storage system
- **Asset Value Loading**: Efficient loading of materialized asset values

See: [Storage and I/O](./storage-io.md)
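
A sketch of a custom `IOManager` that pickles asset values to a local directory; the class name and path scheme are illustrative, not a built-in.

```python
import os
import pickle

from dagster import IOManager, io_manager

class LocalPickleIOManager(IOManager):
    """Illustrative I/O manager persisting outputs as pickle files."""

    def __init__(self, base_dir: str):
        self.base_dir = base_dir

    def _path(self, context) -> str:
        # One file per asset key
        return os.path.join(self.base_dir, *context.asset_key.path)

    def handle_output(self, context, obj) -> None:
        os.makedirs(self.base_dir, exist_ok=True)
        with open(self._path(context), "wb") as f:
            pickle.dump(obj, f)

    def load_input(self, context):
        with open(self._path(context), "rb") as f:
            return pickle.load(f)

@io_manager
def local_pickle_io_manager():
    return LocalPickleIOManager(base_dir="/tmp/dagster_storage")
```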

### Events and Metadata
- **Rich Metadata**: Comprehensive metadata system with typed values
- **Asset Events**: Materialization and observation events
- **Custom Events**: User-defined events for pipeline observability
- **Table Metadata**: Specialized metadata for tabular data

See: [Events and Metadata](./events-metadata.md)
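
A sketch of attaching typed metadata to a materialization via `MaterializeResult`; the row count and markdown preview are placeholder values.

```python
from dagster import MaterializeResult, MetadataValue, asset

@asset
def summarized_table() -> MaterializeResult:
    row_count = 42  # placeholder value
    return MaterializeResult(
        metadata={
            "row_count": MetadataValue.int(row_count),
            "preview": MetadataValue.md("| id |\n|----|\n| 1 |"),
        }
    )
```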

### Automation
- **Asset Sensors**: Event-driven execution based on asset changes
- **Schedule System**: Time-based execution with cron expressions
- **Auto-Materialization**: Declarative policies for automatic execution
- **Run Status Sensors**: Sensors for pipeline failure handling

See: [Sensors and Schedules](./sensors-schedules.md)
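
A sketch of an event-driven trigger; `ingest_job` and the marker-file path are illustrative.

```python
import os

from dagster import RunRequest, SkipReason, job, op, sensor

@op
def ingest():
    """Placeholder ingestion step."""

@job
def ingest_job():
    ingest()

@sensor(job=ingest_job)
def new_file_sensor():
    """Request a run whenever a marker file appears."""
    path = "/tmp/new_data.marker"  # illustrative location
    if os.path.exists(path):
        # run_key de-duplicates requests for the same event
        return RunRequest(run_key=path)
    return SkipReason("no new file")
```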

### Partitioning
- **Time Partitions**: Daily, hourly, weekly, monthly partitions
- **Static Partitions**: Fixed sets of partitions
- **Dynamic Partitions**: Runtime-defined partitions
- **Multi-Dimensional**: Complex partitioning schemes

See: [Partitions System](./partitions.md)
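
A sketch of a daily-partitioned asset; the start date and event payload are placeholders.

```python
from dagster import AssetExecutionContext, DailyPartitionsDefinition, asset

daily = DailyPartitionsDefinition(start_date="2024-01-01")

@asset(partitions_def=daily)
def daily_events(context: AssetExecutionContext) -> list:
    """One materialization per calendar day."""
    day = context.partition_key  # e.g. "2024-01-01"
    return [f"event for {day}"]
```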

### Error Handling
- **Structured Errors**: Comprehensive error hierarchy
- **Retry Policies**: Configurable retry strategies
- **Failure Events**: Rich failure information and debugging
- **Graceful Degradation**: Partial execution and recovery

See: [Error Handling](./error-handling.md)
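
A sketch combining `RetryRequested` and `Failure`; `fetch_remote_data` is a stand-in for any fallible external call.

```python
from dagster import Failure, RetryRequested, op

def fetch_remote_data():
    """Stand-in for a fallible external call."""
    raise TimeoutError

@op
def flaky_op():
    try:
        result = fetch_remote_data()
    except TimeoutError as e:
        # Ask Dagster to re-run this op with a delay between attempts
        raise RetryRequested(max_retries=3, seconds_to_wait=10) from e
    if result is None:
        # Structured failure with a searchable description
        raise Failure(description="remote source returned no data")
    return result
```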

## Basic Usage Example

```python { .api }
import pandas as pd
from dagster import asset, materialize, Definitions

@asset
def raw_data() -> pd.DataFrame:
    """Load raw data from source."""
    return pd.DataFrame({"id": [1, 2, 3], "value": [10, 20, 30]})

@asset
def processed_data(raw_data: pd.DataFrame) -> pd.DataFrame:
    """Process raw data."""
    return raw_data.assign(processed_value=raw_data["value"] * 2)

@asset
def analysis_result(processed_data: pd.DataFrame) -> dict:
    """Generate analysis from processed data."""
    return {
        "total_records": len(processed_data),
        "average_value": processed_data["processed_value"].mean(),
    }

# Define the complete set of definitions
defs = Definitions(
    assets=[raw_data, processed_data, analysis_result]
)

# Materialize assets
if __name__ == "__main__":
    result = materialize([raw_data, processed_data, analysis_result])
    print(f"Materialized {len(result.get_asset_materialization_events())} assets")
```

## Advanced Features

### Components System (Beta)
Modular, reusable components for complex data platforms:

```python { .api }
from dagster import Definitions, load_assets_from_modules

# Load assets modularly; my_assets_module and database_resource are
# placeholders for your own module and resource
defs = Definitions(
    assets=load_assets_from_modules([my_assets_module]),
    resources={"database": database_resource},
)
```

### Pipes Integration
Execute external processes with full Dagster integration:

```python { .api }
from dagster import AssetExecutionContext, PipesSubprocessClient, asset

@asset
def external_asset(
    context: AssetExecutionContext,
    pipes_subprocess_client: PipesSubprocessClient,
):
    """Run an external Python script with Dagster context."""
    return pipes_subprocess_client.run(
        command=["python", "external_script.py"],
        context=context,
    ).get_results()
```

### Declarative Automation
Sophisticated automation policies:

```python { .api }
from dagster import AutoMaterializePolicy, AutoMaterializeRule, asset

@asset(
    auto_materialize_policy=AutoMaterializePolicy.eager()
    .with_rules(AutoMaterializeRule.materialize_on_parent_updated())
)
def auto_asset(upstream_asset):
    """Automatically materialized asset."""
    return process(upstream_asset)  # process() is a placeholder transformation
```
243
244
## Integration Capabilities
245
246
Dagster integrates with the entire modern data stack:
247
248
- **Data Warehouses**: Snowflake, BigQuery, Redshift, PostgreSQL
249
- **Data Lakes**: S3, GCS, Azure, Delta Lake, Iceberg
250
- **ML Platforms**: MLflow, Weights & Biases, SageMaker
251
- **Orchestration**: Kubernetes, Docker, cloud platforms
252
- **Observability**: Slack, email, PagerDuty, custom webhooks
253
- **Version Control**: Git-based deployments and CI/CD integration
254
255
## Getting Started

1. **Install Dagster**: `pip install dagster dagster-webserver`
2. **Define Assets**: Create Python functions decorated with `@asset`
3. **Create Definitions**: Bundle assets, resources, and schedules in `Definitions`
4. **Run Dagster**: Use `dagster dev` for local development
5. **Deploy**: Use Dagster Cloud or self-hosted deployment options
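
As a sketch of steps 2 through 4, a single hypothetical `definitions.py` is enough to start the local UI:

```python
# definitions.py
from dagster import Definitions, asset

@asset
def hello() -> str:
    return "world"

defs = Definitions(assets=[hello])
```

Then `dagster dev -f definitions.py` serves the UI locally.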

## Documentation Structure

This Knowledge Tile provides comprehensive API documentation organized by functional area:

- **[Core Definitions](./core-definitions.md)** - Assets, operations, jobs, and repositories
- **[Configuration System](./configuration.md)** - Type-safe configuration and resources
- **[Execution and Contexts](./execution-contexts.md)** - Runtime system and execution contexts
- **[Storage and I/O](./storage-io.md)** - Data persistence and I/O management
- **[Events and Metadata](./events-metadata.md)** - Event system and metadata framework
- **[Sensors and Schedules](./sensors-schedules.md)** - Automation and triggering systems
- **[Partitions System](./partitions.md)** - Data partitioning and time-based execution
- **[Error Handling](./error-handling.md)** - Error hierarchy and failure management
Each section provides complete API documentation, including function signatures, type definitions, usage examples, and cross-references for working with the Dagster framework.