# Apache Airflow

A platform to programmatically author, schedule and monitor workflows. Airflow lets you author workflows as Directed Acyclic Graphs (DAGs) of tasks and execute them on an array of workers while following the specified dependencies, and it provides rich monitoring and troubleshooting capabilities.

## Package Information

- **Package Name**: airflow
- **Version**: 1.6.0
- **Language**: Python
- **Installation**: `pip install airflow`

## Core Imports

Basic operator imports:

```python
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator, BranchPythonOperator, ShortCircuitOperator
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.email_operator import EmailOperator
from airflow.operators.mysql_operator import MySqlOperator
from airflow.operators.postgres_operator import PostgresOperator
from airflow.operators.sqlite_operator import SqliteOperator
from airflow.operators.http_operator import SimpleHttpOperator
from airflow.operators.subdag_operator import SubDagOperator
from airflow.operators.sensors import BaseSensorOperator, SqlSensor, HdfsSensor
```

Hook imports:

```python
from airflow.hooks.base_hook import BaseHook
from airflow.hooks.dbapi_hook import DbApiHook
from airflow.hooks.http_hook import HttpHook
```

Core utilities:

```python
from airflow.models import BaseOperator, DAG
from airflow.utils import State, TriggerRule, AirflowException, apply_defaults
```

## Basic Usage

```python
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator
from airflow.operators.dummy_operator import DummyOperator

# Define DAG
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2023, 1, 1),
    'retries': 1,
    'retry_delay': timedelta(minutes=5)
}

dag = DAG(
    'example_workflow',
    default_args=default_args,
    schedule_interval=timedelta(days=1)
)

# Define tasks
start_task = DummyOperator(
    task_id='start',
    dag=dag
)

def process_data(**context):
    print(f"Processing data for {context['ds']}")
    return "Data processed successfully"

python_task = PythonOperator(
    task_id='process_data',
    python_callable=process_data,
    provide_context=True,
    dag=dag
)

bash_task = BashOperator(
    task_id='cleanup',
    bash_command='echo "Cleanup completed"',
    dag=dag
)

# Set dependencies
start_task >> python_task >> bash_task
```

## Architecture

Apache Airflow's architecture centers around several key concepts:

- **Operators**: Define atomic units of work (tasks) in workflows
- **Hooks**: Provide interfaces to external systems and services
- **DAGs**: Directed Acyclic Graphs that define workflow structure and dependencies
- **Task Instances**: Runtime representations of tasks with execution context

**Operators vs Hooks**: Operators focus on task execution and workflow logic, while hooks abstract external system connectivity. Operators often use hooks internally for data source interactions.

**Task Lifecycle**: Tasks progress through states (queued → running → success/failed) with support for retries, upstream dependency checking, and conditional execution via trigger rules.

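Retries and trigger rules are configured per task. The following is a minimal sketch, assuming the `dag`, `python_task`, and `bash_task` objects from the Basic Usage example are in scope; the task id and retry count are illustrative.

```python
from airflow.operators.bash_operator import BashOperator
from airflow.utils import TriggerRule

# A cleanup step that runs once all upstream tasks have finished,
# whether they succeeded or failed, and is retried twice on error.
final_cleanup = BashOperator(
    task_id='final_cleanup',
    bash_command='echo "Tearing down temporary resources"',
    trigger_rule=TriggerRule.ALL_DONE,  # default rule only fires on all-success
    retries=2,
    dag=dag
)

python_task >> final_cleanup
bash_task >> final_cleanup
```
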
## Capabilities

### Core Task Operators

Essential operators for task execution including bash commands, Python functions, workflow branching, and email notifications. These operators form the building blocks of most Airflow workflows.

```python { .api }
class BashOperator(BaseOperator):
    def __init__(self, bash_command, xcom_push=False, env=None, **kwargs): ...

class PythonOperator(BaseOperator):
    def __init__(self, python_callable, op_args=[], op_kwargs={}, provide_context=False, **kwargs): ...

class DummyOperator(BaseOperator):
    def __init__(self, **kwargs): ...

class EmailOperator(BaseOperator):
    def __init__(self, to, subject, html_content, files=None, **kwargs): ...
```

[Core Operators](./operators.md)

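As an illustration of the branching mentioned above, here is a minimal sketch; the task ids, weekday rule, and recipient address are illustrative, and `dag` is assumed to be the DAG from the Basic Usage example.

```python
from airflow.operators.python_operator import BranchPythonOperator
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.email_operator import EmailOperator

def choose_branch(**context):
    # Return the task_id of the single downstream branch that should run.
    return 'send_report' if context['execution_date'].weekday() < 5 else 'skip_report'

branch = BranchPythonOperator(
    task_id='choose_branch',
    python_callable=choose_branch,
    provide_context=True,
    dag=dag
)

send_report = EmailOperator(
    task_id='send_report',
    to='team@example.com',
    subject='Daily report',
    html_content='<p>The daily run finished.</p>',
    dag=dag
)

skip_report = DummyOperator(task_id='skip_report', dag=dag)

branch >> send_report
branch >> skip_report
```

Branches that are not selected are skipped rather than failed, so a downstream join task typically needs a trigger rule other than the default all-success.
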
### Database Operations

SQL execution operators for various database systems including MySQL, PostgreSQL, and SQLite with connection management, parameter binding, and transaction control.

```python { .api }
class MySqlOperator(BaseOperator):
    def __init__(self, sql, mysql_conn_id='mysql_default', parameters=None, **kwargs): ...

class PostgresOperator(BaseOperator):
    def __init__(self, sql, postgres_conn_id='postgres_default', autocommit=False, parameters=None, **kwargs): ...

class SqliteOperator(BaseOperator):
    def __init__(self, sql, sqlite_conn_id='sqlite_default', parameters=None, **kwargs): ...
```

[Database Operators](./operators.md#database-operations)

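A minimal sketch of a daily SQL load, assuming a configured `reporting_db` connection, the `dag` object from Basic Usage, and illustrative table names; the `sql` argument is a templated field, so the `{{ ds }}` execution-date macro is rendered at run time.

```python
from airflow.operators.postgres_operator import PostgresOperator

load_summary = PostgresOperator(
    task_id='load_daily_summary',
    postgres_conn_id='reporting_db',
    sql="""
        INSERT INTO daily_summary (day, row_count)
        SELECT DATE '{{ ds }}', COUNT(*)
        FROM events
        WHERE event_date = DATE '{{ ds }}';
    """,
    autocommit=True,   # commit the statement when it completes
    # parameters={'limit': 100} would instead bind values via the DB-API driver
    dag=dag
)
```
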
### HTTP and Web Operations

Execute HTTP requests against external APIs and web services with response validation and customizable request parameters.

```python { .api }
class SimpleHttpOperator(BaseOperator):
    def __init__(self, endpoint, method='POST', data=None, headers=None,
                 response_check=None, http_conn_id='http_default', **kwargs): ...
```

[HTTP Operations](./operators.md#http-operations)

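A minimal sketch of calling an external API, assuming a `metrics_api` connection whose base URL is configured in Airflow; the endpoint and payload are illustrative, and `dag` comes from Basic Usage. `response_check` receives the HTTP response and fails the task when it returns False.

```python
from airflow.operators.http_operator import SimpleHttpOperator

trigger_rebuild = SimpleHttpOperator(
    task_id='trigger_rebuild',
    http_conn_id='metrics_api',      # base URL comes from the stored connection
    endpoint='v1/rebuild',
    method='POST',
    data='{"scope": "daily"}',
    headers={'Content-Type': 'application/json'},
    # The task fails unless the check returns True for the response object.
    response_check=lambda response: response.status_code == 200,
    dag=dag
)
```
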
### Sensor Operations

Monitor external systems and wait for conditions to be met with configurable polling intervals and timeout handling.

```python { .api }
class BaseSensorOperator(BaseOperator):
    def __init__(self, poke_interval=60, timeout=60*60*24*7, **kwargs): ...

class SqlSensor(BaseSensorOperator):
    def __init__(self, conn_id, sql, **kwargs): ...

class HdfsSensor(BaseSensorOperator):
    def __init__(self, filepath, hdfs_conn_id='hdfs_default', **kwargs): ...
```

[Sensor Operations](./operators.md#sensor-operations)

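A minimal sketch of gating a task on upstream data arrival, assuming the same `reporting_db` connection and the `dag`/`python_task` objects from Basic Usage; the sensor keeps poking until its query returns a truthy first cell.

```python
from airflow.operators.sensors import SqlSensor

wait_for_rows = SqlSensor(
    task_id='wait_for_todays_rows',
    conn_id='reporting_db',
    # Poking succeeds once the count is greater than zero.
    sql="SELECT COUNT(*) FROM events WHERE event_date = CURRENT_DATE",
    poke_interval=300,       # check every 5 minutes
    timeout=60 * 60 * 6,     # give up after 6 hours
    dag=dag
)

wait_for_rows >> python_task
```
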
### Workflow Composition

Create complex workflows with sub-DAGs for modular and reusable workflow components.

```python { .api }
class SubDagOperator(BaseOperator):
    def __init__(self, subdag, executor=DEFAULT_EXECUTOR, **kwargs): ...
```

[Workflow Composition](./operators.md#workflow-composition)

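A minimal sketch of a sub-DAG factory, assuming `dag` and `default_args` from Basic Usage; the child DAG id conventionally takes the form `<parent_dag_id>.<task_id>`, and the table names are illustrative.

```python
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.subdag_operator import SubDagOperator

def build_load_subdag(parent_dag_id, task_id, args):
    # Child DAG id follows the '<parent>.<task>' naming convention.
    subdag = DAG(
        dag_id='%s.%s' % (parent_dag_id, task_id),
        default_args=args,
        schedule_interval=dag.schedule_interval
    )
    for table in ('users', 'orders', 'payments'):
        DummyOperator(task_id='load_%s' % table, dag=subdag)
    return subdag

load_tables = SubDagOperator(
    task_id='load_tables',
    subdag=build_load_subdag('example_workflow', 'load_tables', default_args),
    dag=dag
)
```
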
### System Integration Hooks

Hooks provide standardized interfaces for connecting to external systems including databases, HTTP APIs, and custom services with built-in connection management and error handling.

```python { .api }
class BaseHook:
    @classmethod
    def get_connection(cls, conn_id): ...
    @classmethod
    def get_connections(cls, conn_id): ...

class DbApiHook(BaseHook):
    def get_records(self, sql, parameters=None): ...
    def run(self, sql, autocommit=False, parameters=None): ...

class HttpHook(BaseHook):
    def run(self, endpoint, data=None, headers=None, extra_options=None): ...
```

[System Hooks](./hooks.md)

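A minimal sketch of using hooks from a PythonOperator callable. The `metrics_api` connection id is illustrative, the `HttpHook` constructor arguments are an assumption (only its `run()` method is documented above), and `dag` comes from Basic Usage.

```python
from airflow.hooks.base_hook import BaseHook
from airflow.hooks.http_hook import HttpHook
from airflow.operators.python_operator import PythonOperator

def check_service(**context):
    # Look up the host and credentials stored under the 'metrics_api' connection id.
    conn = BaseHook.get_connection('metrics_api')
    print("Pinging service at %s" % conn.host)

    # Constructor arguments are an assumption; run() is the documented call.
    hook = HttpHook(method='GET', http_conn_id='metrics_api')
    response = hook.run('v1/health')
    if response.status_code != 200:
        raise ValueError("Service unhealthy: %s" % response.status_code)

health_check = PythonOperator(
    task_id='health_check',
    python_callable=check_service,
    provide_context=True,
    dag=dag
)
```
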
### Foundation Classes and Utilities

Base classes and utility functions that provide the core framework for operator development, state management, error handling, and workflow control.

```python { .api }
class BaseOperator:
    def __init__(self, task_id, owner='airflow', retries=0, **kwargs): ...
    def execute(self, context): ...

class State:
    QUEUED = "queued"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"

@apply_defaults
def operator_constructor(self, **kwargs): ...
```

[Core Framework](./core.md)

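A minimal sketch of a custom operator built from the framework pieces above, using only the imports documented under Core utilities; the `GreetingOperator` name and its `greeting` parameter are illustrative.

```python
from airflow.models import BaseOperator
from airflow.utils import AirflowException, apply_defaults

class GreetingOperator(BaseOperator):

    @apply_defaults
    def __init__(self, greeting, *args, **kwargs):
        # apply_defaults merges the DAG's default_args into kwargs before
        # BaseOperator.__init__ sees them (owner, retries, and so on).
        super(GreetingOperator, self).__init__(*args, **kwargs)
        self.greeting = greeting

    def execute(self, context):
        # Called by the worker with the runtime context for this task instance.
        if not self.greeting:
            raise AirflowException("greeting must not be empty")
        print("%s (run date: %s)" % (self.greeting, context['ds']))

# Used like any other operator; 'dag' is assumed from the Basic Usage example.
greet = GreetingOperator(task_id='greet', greeting='Hello from Airflow', dag=dag)
```
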